
MC/SG nihilistic package control scripting

 
Ralph Grothe
Honored Contributor

Hello,

I presume the odd subject line lured you into this thread?

But that is the only way I can describe how our customer expects us to deploy their MC/SG cluster.
I just received their requirements, and I must conclude that their wishes defy all HA concepts, or at least are not MC/SG-clusterable (ooh, is there such a noun as "clusterability" in English?).

Without boring you with the gory details just let me give you an example.

When a package is stopped on one cluster node, it should do the following:

1. Look up the node of another, obviously closely related package.
2. Find a certain process of this related package in the other host's process table.
3. Revoke the execute bits from certain executables in a local (not cluster shared) filesystem on that node.
4. Send the looked-up PID a SIGHUP (still all on the other node).
5. Kill various other processes on the other node.
6. Wait for *15 MINUTES!!!* (ha, ha).
7. Look up the shared memory segments on the other node, and remove those which belong to a certain uid and no longer have any processes attached.
8. Only then gracefully halt its own stuff (i.e. the package to be halted).

Now that you have read all this, I guess you will agree that this cannot be described as anything other than "nihilistic package control"?

Ok, I don't care anymore about the odd wishes of our dear customers, and I will script this, no matter how decent the outcome turns out to be.

So my question is: how can I signal to the package control halt script that an abnormal condition initiated its execution, i.e. when the node TOCs (to at least bypass that stupid 15 minute wait then)?

Your comments are most welcome!

Ralph
Madness, thy name is system administration
Chris Wilshaw
Honored Contributor

Re: MC/SG nihilistic package control scripting

You could try to set up the script so that if a "normal" halt is actioned, a trigger file is written to the other host, which then causes the other processes to sleep for the 15 minutes before continuing.

If the node just crashes, this file wouldn't be created, allowing things to run immediately.
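
A minimal sketch of this trigger file idea, assuming a plain POSIX shell. The names `TRIGGER_FILE`, `WAIT_SECS`, `mark_clean_halt` and `wait_if_clean_halt` are made up for illustration, and how the file actually gets onto the other host (remote shell, shared filesystem, ...) is left open:

```sh
#!/bin/sh
# Sketch only: all names and paths below are hypothetical.
TRIGGER_FILE=${TRIGGER_FILE:-/var/tmp/pkg_clean_halt.trigger}
WAIT_SECS=${WAIT_SECS:-900}    # the customer's 15 minutes

# Run for the other host when a "normal" halt is requested.
# In real use this would have to be executed on (or written to) the peer node.
mark_clean_halt() {
    date > "$TRIGGER_FILE"
}

# Run on the other host before the cleanup continues.
# Trigger file present: normal halt, so honour the wait.
# Trigger file absent: the node crashed, so carry on immediately.
wait_if_clean_halt() {
    if [ -f "$TRIGGER_FILE" ]; then
        rm -f "$TRIGGER_FILE"
        sleep "$WAIT_SECS"
        echo "waited"
    else
        echo "skipped"
    fi
}
```

Removing the file before sleeping means a stale trigger cannot survive into the next halt.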
John Palmer
Honored Contributor

Re: MC/SG nihilistic package control scripting

Hi Ralph,

If the node crashes the package halt script just won't get run.

Two possibilities spring to mind (I've called your package the 'slave' and the other 'closely related' one the 'master'):

1. If the slave restarts elsewhere on another node then you could duplicate your shutdown code in the slave startup script. You'll need to differentiate between a 'normal' package start and a 'crashed' restart somehow, perhaps by checking your execute bits or whether a process is running on the master.

2. If the slave doesn't restart then you'll have to have some sort of monitoring in place within the master such that if the slave dies, the master self destructs.

On further thought, as the two are so closely related, perhaps each should be monitoring the other. If either fails then action has to be taken. You'll have to differentiate between normal close and crash though. Presumably you wouldn't close (or start) one without the other?
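
As a rough sketch of such monitoring: a small script that polls the peer and exits once it is gone could be configured as a package service, so that its exit counts as a service failure. Everything here is an assumption for illustration: `PEER_CHECK`, `POLL_SECS`, `peer_is_alive` and `monitor_peer` are made-up names, and the placeholder check only looks at a local PID, so a real check across nodes would need something else entirely.

```sh
#!/bin/sh
# Sketch only: none of these names are part of Serviceguard.
PEER_PID=${PEER_PID:-}
POLL_SECS=${POLL_SECS:-30}
PEER_CHECK=${PEER_CHECK:-peer_is_alive}

# Placeholder check: is a local process with PID $PEER_PID still there?
# Replace with whatever tells you the peer package is healthy.
peer_is_alive() {
    [ -n "$PEER_PID" ] && kill -0 "$PEER_PID" 2>/dev/null
}

# Poll until the check fails, then report and return non-zero
# so the caller can take action (e.g. let the service fail).
monitor_peer() {
    while "$PEER_CHECK"; do
        sleep "$POLL_SECS"
    done
    echo "peer gone"
    return 1
}
```

This does not yet tell a normal close from a crash; that would need an extra marker of some kind.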

Another thought, how many servers are in your cluster? If only two, then perhaps you should only have one package rather than two. The master would be responsible for starting itself and the slave (and monitoring both).

It's not very HA 'friendly' though is it?

Regards,
John
Elmar P. Kolkman
Honored Contributor

Re: MC/SG nihilistic package control scripting

Let's assume the package that initiates it is package pkg1, and that package pkg2 is run on the other node and killed by package pkg1.

Why not just run a cmhaltpkg pkg2 from the stop-commands in pkg1? Then you can do all the stuff to stop your processes in the halt-commandlist of pkg2, where it belongs...
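
Something along these lines, as a sketch only. `PEER_PKG`, `CMHALTPKG` and `halt_peer_package` are names invented here; the function would be called from wherever pkg1's control script runs its custom halt commands:

```sh
#!/bin/sh
# Sketch only: variable and function names are hypothetical.
PEER_PKG=${PEER_PKG:-pkg2}
CMHALTPKG=${CMHALTPKG:-cmhaltpkg}   # overridable so the logic can be dry-run

# Ask the cluster to halt the related package; its own halt commands
# then do the process cleanup on the node where it runs.
halt_peer_package() {
    if "$CMHALTPKG" "$PEER_PKG"; then
        echo "halted $PEER_PKG"
    else
        echo "could not halt $PEER_PKG, continuing with own halt" >&2
        return 1
    fi
}
```

Whether pkg1's halt should carry on or fail when pkg2 cannot be halted is a design decision the sketch leaves to the caller.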
Every problem has at least one solution. Only some solutions are harder to find.
Ralph Grothe
Honored Contributor

Re: MC/SG nihilistic package control scripting

Chris,

yes I could drop some sort of semaphore file on the other node.

John,

of course, it's obvious that if a node crashes there won't be any time to execute cleanup processes in an ordered manner.
So as you correctly wrote I have to distinguish between a "clean" start, and an "unclean" start, viz. one after the package had crashed (or rather its hosting node).
Maybe I could do it similarly to the way filesystems are marked unclean on mounting, and only toggled to clean after a successful unmount.
I think this would be easier than scanning the package's control logs on each cluster node only to find out where it ran last time, and if it was stopped cleanly.
Yes, since the packages (there are 4 of them, running on 3 nodes) are so closely related maybe it would be best to set up service monitors for them.
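
The clean/unclean marker could look roughly like this. It is a sketch under assumptions: `STATE_FILE` and all function names are made up, and the state file would have to live somewhere that follows the package (e.g. on its shared filesystem) so the next node can see it.

```sh
#!/bin/sh
# Sketch only: path and function names are hypothetical.
STATE_FILE=${STATE_FILE:-/var/tmp/pkg1.state}

mark_dirty() { echo dirty > "$STATE_FILE"; }
mark_clean() { echo clean > "$STATE_FILE"; }

# True if the last run ended with a proper halt, or if there was no last run.
was_clean() {
    [ ! -f "$STATE_FILE" ] || [ "$(cat "$STATE_FILE")" = "clean" ]
}

# Call from the package start commands.
package_start() {
    if was_clean; then
        echo "normal start"
    else
        echo "recovery start"   # previous halt never completed: do the cleanup now
    fi
    mark_dirty                  # stays dirty until a halt completes
}

# Call as the very last step of the package halt commands.
package_halt() {
    mark_clean
}
```

If the node TOCs, `package_halt` never runs, the marker stays "dirty", and the next start takes the recovery branch.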

Of the 3 nodes one is some sort of "leader", viz. he who carries the database server.
So the other 3 packages only act as application servers, whose life on their own is pretty redundant.

Elmar,

also an interesting hint, to run cmhaltpkg for another package from within a package's halt script.

This all is getting quite bizar...
Madness, thy name is system administration
Ralph Grothe
Honored Contributor

Re: MC/SG nihilistic package control scripting

Ouch, "BIZARRE"!
They should include a spell checker for us non-native English speakers.
Madness, thy name is system administration