Melody Pulling
Occasional Contributor

Service Guard 11.12 cluster question

In a 3 node cluster with one server used as a hot stand-by for fail-over and 2 production nodes ... we want to be sure that either production node will fail over to the stand by if they crash (or pkgs are brought down) singly. However, should both production nodes fail at the same time we want to give only one server the ability to fail to the standby node (or give it priority) and have the other production just fail... down. Does anyone know how we can accomplish this with as little pain as possible?

This is apparently unheard of by HP, per support call. There is nothing in the manuals nor any of the documentation we have reviewed.

Thank you in advance for any assistance you can lend.

Regards,
Melody
Luc Bussieres_1
Trusted Contributor

Re: Service Guard 11.12 cluster question

I think the best solution would be to have the package that should run on the failover node create a file when it starts (for example /tmp/package1_running) and remove it when it stops. The monitor script of the second package then looks for that file; if the file exists, it shuts down the application and then kills itself so the package stops completely.

Regards
Luc
John Poff
Honored Contributor

Re: Service Guard 11.12 cluster question

Hello,

We have run a similar configuration here before, except we configured our failover node to run both production packages at the same time from the other nodes.

Here is one way to try it:

If your nodes are node1, node2, and node3, and pkg1 runs on node1, pkg3 runs on node3; let's assume that if node1 and node3 both fail that you just want pkg1 to run on node2. I think you can do it if you put a command in the customer_defined_run_commands part of the control script for pkg1 that says:

cmhaltpkg -n node2 pkg3

which should just halt pkg3 if it is running on node2.

Then, you could put some code in the control script for pkg3 to check whether 1) pkg3 is trying to start on node2 (hostname=node2) and 2) pkg1 is already running on node2. If both are true, pkg3 refuses to start.
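
A minimal sketch of the two-part check described above, written as a shell function that the pkg3 control script could call. The function name should_refuse_start and the variable PKG1_NODE are made up for illustration. How pkg1's current node is obtained (for instance by parsing cmviewcl output) is left out on purpose, because the output layout depends on the Serviceguard version.

```shell
#!/bin/sh
# Hypothetical guard for the pkg3 control script. Node and package names
# (node2, pkg1, pkg3) follow the example in this post; the function name
# and PKG1_NODE are assumptions, not part of Serviceguard.

# should_refuse_start LOCAL_NODE STANDBY_NODE PRIORITY_PKG_NODE
# Returns 0 (refuse) when this node is the standby node and the priority
# package is already running on it; returns 1 (allow) otherwise.
should_refuse_start() {
    local_node=$1
    standby_node=$2
    priority_pkg_node=$3   # node where pkg1 is up, or "" if pkg1 is down

    if [ "$local_node" = "$standby_node" ] &&
       [ "$priority_pkg_node" = "$standby_node" ]; then
        return 0
    fi
    return 1
}

# Possible use near the top of the pkg3 start path (PKG1_NODE must be
# filled in from cluster status by some other means):
#
#   if should_refuse_start "$(hostname)" node2 "$PKG1_NODE"; then
#       echo "pkg1 holds node2; pkg3 will not start here" >&2
#       exit 1
#   fi
```

Keeping the decision in a small function with plain arguments makes it easy to try out by hand before wiring it into a live control script.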


Don't worry about HP not having heard of it before; I've done all kinds of wild and crazy stuff that they've never seen before. That's the fun part of this job!

I'm just curious, but why can't you run both packages on the failover node at the same time?

JP
A. Clay Stephenson
Acclaimed Contributor

Re: Service Guard 11.12 cluster question

This would be my approach (and I'm thinking out loud):

In the package startup script (that you want to fail if both servers are down):

1) Determine which node I am running on.
2) If I'm on the dedicated failover node then
enter this special block of code otherwise
start the package.
3) Ping the other two servers with a timeout
(I already have a Perl script that does this and returns a status).
4) If I get a bad exit status from both other hosts then exit the package startup; otherwise start the package.
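
The steps above could be sketched in shell roughly as follows. This is not the Perl script mentioned in this post. The names may_start, probe_host and PROBE_CMD are hypothetical, and the actual reachability check is left pluggable because ping options and timeouts differ between platforms.

```shell
#!/bin/sh
# Hypothetical startup guard for the lower-priority package.

# probe_host HOST: return 0 if HOST answers, non-zero otherwise.
# It runs the command named in PROBE_CMD (an assumed variable) with HOST
# as its only argument. With PROBE_CMD unset, every host counts as down,
# which is the conservative default.
probe_host() {
    ${PROBE_CMD:-false} "$1" >/dev/null 2>&1
}

# may_start LOCAL_NODE STANDBY_NODE PEER_A PEER_B
# Returns 0 if the package may start on LOCAL_NODE, 1 if it must not.
may_start() {
    local_node=$1
    standby_node=$2
    peer_a=$3
    peer_b=$4

    # Steps 1 and 2: anywhere but the standby node, start normally.
    if [ "$local_node" != "$standby_node" ]; then
        return 0
    fi

    # Step 3: probe both production nodes.
    if probe_host "$peer_a"; then
        return 0
    fi
    if probe_host "$peer_b"; then
        return 0
    fi

    # Step 4: neither answered, so treat it as a double failure
    # and refuse to start.
    return 1
}
```

In a control script this might be called as may_start "$(hostname)" node2 node1 node3, exiting the startup with a non-zero status when it returns 1.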

If you want the Perl ping script let me know.

-------------------------------------

I must add that this is really out there in that in a well-designed cluster, the simultaneous failure of two nodes is extremely unlikely. If you have a scenario that could result in the failure of two nodes, you should probably rethink your cluster.


If it ain't broke, I can fix that.
Sanjay_6
Honored Contributor

Re: Service Guard 11.12 cluster question

Hi Melody,

I think Luc's suggestion is what you should look at. Say you have three nodes, node1, node2 and node3, and two packages, pkg1 and pkg2. By default pkg1 runs on node1 and pkg2 runs on node2. Let the package startup script create a file in some directory, say /tmp, and have the same startup script look for this file before the package starts on the node. So if pkg1 fails over to node3, it looks for the file there and starts, since the file is absent. It then creates the file in /tmp on that node. Now say pkg2 fails and tries to start on node3: it looks for the file, finds that it exists in /tmp, and stops the startup of pkg2. It can then try to start on node1, where no package is running, if configured that way.

The package shutdown script can be configured to remove the file from the /tmp directory.

Do remember that after an abnormal shutdown (say a crash) the file will not be removed from /tmp, because the package never gets a chance to remove it. In that situation the file needs to be removed manually by the SA.
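
A rough sketch of the marker-file idea in shell, assuming the startup script calls claim_node and the shutdown script calls release_node. Both function names and the marker path in the usage comment are made up. Writing the package name into the file, and letting the same package reclaim its own leftover marker, are additions of this sketch and not part of the suggestion above; a marker left behind by a different package still has to be removed by hand.

```shell
#!/bin/sh
# Hypothetical marker-file helpers for the package startup and
# shutdown scripts.

# claim_node MARKER PKG
# Succeeds (0) if PKG now holds the marker, fails (non-zero) if another
# package holds it.
claim_node() {
    marker=$1
    pkg=$2
    # With noclobber (set -C) the redirect fails if the file already
    # exists, so the check and the create happen in one step.
    if ( set -C; echo "$pkg" > "$marker" ) 2>/dev/null; then
        return 0
    fi
    # Marker exists: allow a restart of the same package, refuse others.
    [ "$(cat "$marker" 2>/dev/null)" = "$pkg" ]
}

# release_node MARKER PKG
# Removes the marker, but only if PKG is the package that created it.
release_node() {
    marker=$1
    pkg=$2
    if [ "$(cat "$marker" 2>/dev/null)" = "$pkg" ]; then
        rm -f "$marker"
    fi
}

# Possible use (the marker path is an example only):
#   startup:   claim_node /tmp/standby_in_use pkg2 || exit 1
#   shutdown:  release_node /tmp/standby_in_use pkg2
```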

Hope this helps.

Regds
Melody Pulling
Occasional Contributor

Re: Service Guard 11.12 cluster question

Thank you all very much for your suggestions!

FYI: The reason why both nodes cannot fail over to the failover server at the same time is political. Each production node is owned by a different line of business ...

I will read, in more depth, your offerings. But once again, thank you so much for your quick responses and the helpful information.

Melody