Nancy Calderón_1
Occasional Contributor

Cluster Behavior

Hello,
I have configured a two-node cluster with two packages. Right now it is working fine, but I have a question concerning a test I did.
I shut down both nodes and then started only one of them. I expected the starting node to bring the cluster up, along with the two packages, but the cluster did not form.

When I tried to start the cluster manually with
#cmruncl -v

I could not do it until the other node was running again.

Is there a way to bring the cluster up even if the other node is down?

Is the test itself conceptually correct?


Thanks for your help; any advice is welcome.
Nancy Calderón_1
Occasional Contributor

Re: Cluster Behavior


Hi again,

I forgot to ask: is there a configuration file where I can set a value that allows the cluster to start with only 50% of the nodes?

Thanks again for your help.

NC
bhavin asokan
Honored Contributor
Solution

Re: Cluster Behavior

Hi,

You can use cmruncl -n nodename to start the cluster on only one node.

If the cluster is already running on one node, you have to use the cmrunnode and cmrunpkg commands to start the other node and the package, respectively.
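
A rough sketch of that sequence, with hypothetical node names (node1, node2) and package name (pkg1); substitute your own:

# cmruncl -v -n node1
(forms the cluster on node1 only)
# cmrunnode -v node2
(joins node2 to the running cluster once it is back up)
# cmrunpkg -n node2 pkg1
(starts the package on node2 if it does not start automatically)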

regds,
Rainer von Bongartz
Honored Contributor

Re: Cluster Behavior

Serviceguard expects a quorum of more than 50% of the cluster nodes to be active at startup.
In a two-node cluster this means that both nodes need to be up and running for a normal cluster startup.

If one of your servers is down, Serviceguard cannot form the cluster. You have to start the cluster as a one-node cluster using

cmruncl -n nodename

This way SG will start up. At a later time you may bring up the second node and join it to the cluster.

Regards
Rainer

He's a real UNIX Man, sitting in his UNIX LAN making all his UNIX plans for nobody ...
Mobeen_1
Esteemed Contributor

Re: Cluster Behavior

Don't you think that, just like we do in VMS, having a QUORUM disk would help in this scenario?

rgds
Mobeen
Steve Lewis
Honored Contributor

Re: Cluster Behavior

The thing about this situation is that in a two-node cluster, when just one of the nodes comes up, it cannot tell whether the other node is really down, or whether merely the network connections are at fault and the other server is in fact still up with switched packages, actively using the storage. If this node started up packages automatically without having communicated with another active server, it could re-mount already-mounted filesystems, start databases and corrupt the data. How is a node in this situation to know whether only it TOC'd, or both of them did?

So, when in doubt, ServiceGuard always stays down and therefore keeps your data safe and uncorrupted. This forces you, the system administrator, to investigate the situation and to decide whether to force up the remaining node, using the commands described in previous postings.

We do have the concept of quorum servers that arbitrate.
melvyn burnard
Honored Contributor

Re: Cluster Behavior

This is standard Serviceguard behaviour. At the time of starting the cluster, all nodes need to be present and able to join the cluster. If one node is not available, all other nodes will attempt to form a cluster, waiting for the default timeout of 10 minutes, at which time they will cease to attempt to start/join the cluster.

To start a cluster where one or more nodes are unavailable, use the cmruncl -n nodename command to get the cluster to start on the first node, then, if there are additional nodes, do cmrunnode on each of them to get them to join the already running cluster.
There is NO WAY to bypass this designed method of starting the cluster.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Florian Heigl (new acc)
Honored Contributor

Re: Cluster Behavior

If you have an old D- or A-Class standing around, install it as an MC/SG quorum server; the product is free, as far as I remember.
By placing it outside of the heartbeat LAN, it is a safe way of eliminating LAN-related issues.
One node that sees the quorum server gets >50% -> it comes up; the other one (obviously disconnected in some way) panics.
Two nodes that see each other -> the cluster comes up.
Two nodes that see each other and the quorum server -> happily ever after.
Also, if the nodes are close to each other, try to have both a separate heartbeat LAN and a public LAN, and add a serial heartbeat on top of that. It costs less than $100 but helps a lot.
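
In case it helps, the quorum server is then pointed to from the cluster ASCII configuration file. A rough sketch, with a hypothetical quorum server hostname (qshost) and example timing values in microseconds; check your own configuration template for the exact defaults:

# quorum server entries in the cluster configuration file
QS_HOST                 qshost
QS_POLLING_INTERVAL     300000000
QS_TIMEOUT_EXTENSION    2000000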

According to my book here, there's no quorum disk support on HP-UX, sorry :)
yesterday I stood at the edge. Today I'm one step ahead.
melvyn burnard
Honored Contributor

Re: Cluster Behavior

Having a quorum server or whatever has nothing to do with the observed actions of the cluster, and would make no difference to these actions. As stated, this is the method of operation that was designed into Serviceguard to try to ensure, as much as possible, the safety of customer data.
Basically, when in doubt, don't do anything.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Geoff Wild
Honored Contributor

Re: Cluster Behavior

This is by design.

cmruncl -n nodename

More info in:

http://docs.hp.com/en/B3936-90079/index.html

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Florian Heigl (new acc)
Honored Contributor

Re: Cluster Behavior

I have to correct myself one more time ;)

There's a cluster lock disk mechanism available.
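
For a two-node cluster the lock disk acts as the tie-breaker. A rough sketch of the relevant lines in the cluster ASCII configuration file, with hypothetical volume group, node and device names, and the NODE_NAME stanzas trimmed down to just the lock entries:

FIRST_CLUSTER_LOCK_VG   /dev/vglock

NODE_NAME               node1
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
NODE_NAME               node2
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0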

Good luck,
florian
yesterday I stood at the edge. Today I'm one step ahead.
Jan van den Ende
Honored Contributor

Re: Cluster Behavior

Melvyn,

Being a VMS man myself who is "strongly advised (!!)" by my employer to "gather Unix knowledge as well", I am trying to follow this forum too.
Please accept my different perspective, and relative ignorance.


"At the time of starting the cluster, all nodes need to be present and able to join in the cluster"


Is that really true? How do you go about it if, after some time, you need to add an extra node for extra capacity, or to replace an older system with a newer, more powerful model?

Do I read your quote correctly as meaning that in that case you need to bring down the cluster for a re-config, or am I missing something?

Would that not be a serious breach of 24 * 365 operation?

From what I have understood so far about Tru64 clusters, they seem to be able to add nodes on the fly; I was assuming the same for HP-UX as well.

... just trying to learn...

Proost.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Nancy Calderón_1
Occasional Contributor

Re: Cluster Behavior

Hi everyone,

Thanks for helping me to clearly understand the way SG works.