Service Guard Fail over question

 
Mike Crawley
New Member

I am trying to find out whether, if I lost 80% of the nodes in a cluster, I could still bring packages up. I have seen information in an HP presentation saying that if you lose more than 50% of your nodes, you cannot bring any packages up on the remaining nodes.
Manix
Honored Contributor

Re: Service Guard Fail over question

Hello Mike,

If the cluster is not able to meet quorum, that is, it has less than 50% of the nodes accessible, then the node or nodes that have access to the lock LUN/disk or quorum server form the cluster, until the last node is available.

It depends on the number of nodes and the kind of "split brain" solution you have.

Please post your cluster configuration file, the number of nodes, and the cmviewcl -v output.

Hope this helps.

Thanks

Manix
HP-UX been always lovable - Mani Kalra
melvyn burnard
Honored Contributor

Re: Service Guard Fail over question

Well, you need to understand that it depends on how you lose the nodes.
If by "lost 80%" you mean they failed/died, then you have lost quorum, and any remaining nodes will panic (TOC).
They will then reboot and attempt to start a cluster, but as 100% of the nodes will NOT respond, this will time out after 10 minutes (if left at the default).
Once this has occurred, you CAN start the cluster on the remaining nodes by using:
cmruncl -n nodename
and then on any other node that is up do:
cmrunnode
Then verify whether all your packages have started and, where necessary, enable any switching and start any packages that did not start.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
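For anyone who wants to rehearse the manual restart above before they need it, here is a rough dry-run sketch. It only prints the commands unless you set DRY_RUN=0. The node name nodeD and package name pkg1 are made up, and cmrunpkg and cmmodpkg -e are assumed here as the commands for "start any packages that did not start" and "enable any switching"; check the man pages for your Serviceguard release before relying on it.

```shell
#!/bin/sh
# Dry-run sketch of a manual cluster start after quorum loss.
# Prints each command by default; set DRY_RUN=0 to really execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# manual_start NODE PACKAGE...
# NODE and PACKAGE names are hypothetical placeholders.
manual_start() {
    node=$1
    shift
    run cmruncl -n "$node"              # form the cluster on the surviving node (prompts for confirmation)
    run cmviewcl -v                     # check which packages came up
    for pkg in "$@"; do
        run cmrunpkg -n "$node" "$pkg"  # start a package that did not start
        run cmmodpkg -e "$pkg"          # re-enable switching for that package
    done
}

manual_start nodeD pkg1
```

In practice you would look at the cmviewcl -v output first and only pass the packages that are actually down.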
melvyn burnard
Honored Contributor

Re: Service Guard Fail over question

@Manix
>If the cluster is not able to meet quorum, that is, it has less than 50% of the nodes accessible, then the node or nodes that have access to the lock LUN/disk or quorum server form the cluster, until the last node is available.

Sorry, but that is completely incorrect.
Even if there IS a quorum device, if what remains is LESS than 50% of the nodes, they will not even look for it and they will TOC.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
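The rule being argued about here can be written down as a small function. This is only a simplified model of what is said in this thread, not how Serviceguard is implemented. The function name and the outcome labels are made up, and the "exactly 50% with no tie-breaker" case is an assumption (treated as a failure to re-form).

```shell
#!/bin/sh
# Simplified model of the quorum rule discussed in this thread.
# quorum_outcome PREV SURVIVORS LOCK
#   PREV      - number of nodes in the cluster before the failure
#   SURVIVORS - number of those nodes still up and talking to each other
#   LOCK      - "yes" if the survivors can reach a lock disk/LUN or quorum server
quorum_outcome() {
    prev=$1
    survivors=$2
    lock=$3
    doubled=$((survivors * 2))      # compare 2*survivors with prev to avoid fractions
    if [ "$doubled" -gt "$prev" ]; then
        echo "reform"               # more than 50% left: cluster re-forms
    elif [ "$doubled" -eq "$prev" ] && [ "$lock" = "yes" ]; then
        echo "reform-with-lock"     # exactly 50% left: the tie-breaker decides
    else
        echo "toc"                  # less than 50% left: lock is not consulted
                                    # (50% with no lock also lands here: assumption)
    fi
}

quorum_outcome 4 2 yes   # two of four nodes survive, lock reachable
quorum_outcome 4 1 yes   # one of four nodes survives, lock reachable
```

The second call is the situation in question: one node left out of four ends in a TOC even though the lock is reachable.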
Mike Crawley
New Member

Re: Service Guard Fail over question

Hi all, thanks for your help. I have attached the cmviewcl output. I found the white paper that talks about this. What I got from it, and from what you are all saying, is that it depends. Here is what I was worried about: on one SD I have 5 packages on 3 nodes, and my failover target is another SD with 1 node that can run all 5 if needed. So I was worried that if I lost all of SD1 and needed to fail over, more than 50% of my nodes would be inaccessible. But from what I now understand, I could still have those packages come up on my failover node. I would just be split brain, and I could run that way until I get my other SD up. Correct me if I am wrong, and thanks again for your help.
melvyn burnard
Honored Contributor

Re: Service Guard Fail over question

>On one SD I have 5 packages on 3 nodes, and my failover target is another SD with 1 node that can run all 5 if needed. So I was worried that if I lost all of SD1 and needed to fail over, more than 50% of my nodes would be inaccessible.

Well, this would leave 1 out of 4 nodes, so you have less than 50% of the nodes and therefore no quorum.
As per my previous response, the remaining node (on the 2nd SD) would panic (TOC).
Once it comes back up, there would be the timeout period to get through (10 minutes by default), and then you could start the cluster MANUALLY with cmruncl -n nodename.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Mike Crawley
New Member

Re: Service Guard Fail over question

OK, I understand now. Thanks for the information.
Emil Velez
Honored Contributor

Re: Service Guard Fail over question

It depends on how you lose the systems.

Let's say you have a 4-node cluster.

If you lose 2 nodes and the other 2 nodes have a quorum mechanism, they will form a cluster.

If you lose 3 nodes instantly, all at one time, you should not be able to form a cluster.

You can start the cluster on a subset of the nodes with the -n option of cmruncl.

Let's say you have a 6-node cluster defined, but you want to start the cluster on only the first 2 nodes. Normally you would need to wait for a timeout period, and if you had 3 nodes plus the quorum mechanism you could start the cluster. But you can start the cluster with

cmruncl -n node1 -n node2

It will ask you to verify, but it will then start the cluster on only that subset of nodes.
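If you keep the list of nodes you want in a variable or a script argument, a small helper can build that command line for you. This is just a sketch: the function name is made up, and it only prints the command rather than running it, so you can check it before pasting it in.

```shell
#!/bin/sh
# Build (but do not run) a cmruncl command line for a subset of nodes.
# subset_start_cmd NODE...
subset_start_cmd() {
    cmd="cmruncl"
    for node in "$@"; do
        cmd="$cmd -n $node"     # one -n option per node, as in the example above
    done
    echo "$cmd"
}

subset_start_cmd node1 node2
```

Run as shown, it prints the same cmruncl -n node1 -n node2 line given above.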