How to verify cluster lock is working?

 
Jack Wu
Occasional Contributor

Hi,

Is there any way to test that my cluster lock is properly configured and working normally? I've configured two nodes with one network for heartbeat and data IP, plus a standby network. When I removed both the active and standby network connections from one node, the system did a TOC. However, after reformation, the node without the network problem was halted and rebooted, while the node without a network connection stayed alive. I understand that at reformation time, the node that wins the race and obtains the lock disk stays up as a single-node cluster. However, it seems that neither node got the lock, as I found the following in both nodes' syslog.log.

May 27 10:09:38 umist1 cmcld: Timed out node umist2. It may have failed.
May 27 10:09:38 umist1 cmcld: Attempting to adjust cluster membership
May 27 10:09:38 umist1 cmcld: Obtaining Cluster Lock
May 27 10:09:38 umist1 cmcld: Request to obtain cluster lock /dev/dsk/c4t0d4 failed: Device busy
May 27 10:09:38 umist1 cmcld: Failed to request cluster lock.
May 27 10:09:38 umist1 cmcld: Attempting to form a new cluster
May 27 10:09:38 umist1 cmcld: Changed serial device status to UNINITIALIZED

Any idea?

Also, how can I ensure that the node that still has network connectivity gets higher priority to obtain the lock during reformation?

Any help would be highly appreciated.

Thank you in advance!
melvyn burnard
Honored Contributor
Solution

Re: How to verify cluster lock is working?

Well, if you pulled out the network connections from one node and it crashed with a TOC, I assume the other node stayed up? If so, it can only have stayed running if it got the cluster lock disc.

You also cannot set a priority for the selection of the cluster lock disc.

I suggest you ensure you have the latest SG patch installed, and monitor the cluster syslogs for a day. If the cluster lock disc is unavailable or not set, this will be noted in at least one of the nodes' syslogs.
If you do not get an error message saying it is not available, then it IS available, as SG polls the cluster lock disc every 60 minutes.
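
As a rough aid for that kind of monitoring, here is a minimal Python sketch that picks cluster lock failures out of syslog lines. It only knows the two cmcld failure messages quoted in the original post; the wording of the periodic "lock disc not available" message is not shown in this thread, so that pattern would have to be added once it has been seen. The function name and the sample list are made up for illustration.

```python
import re

# Message patterns copied from the cmcld lines quoted in the original post.
# Any other wording (e.g. from the periodic lock check) is NOT covered here.
LOCK_FAILURE_PATTERNS = [
    re.compile(r"cmcld: Request to obtain cluster lock \S+ failed"),
    re.compile(r"cmcld: Failed to request cluster lock"),
]


def find_lock_problems(lines):
    """Return (timestamp, host, message) for each cluster lock failure line."""
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if not any(p.search(line) for p in LOCK_FAILURE_PATTERNS):
            continue
        # Assumed syslog layout: "Mon DD HH:MM:SS host daemon: text"
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        problems.append((" ".join(parts[:3]), parts[3], parts[4]))
    return problems


# Sample input: lines taken from the syslog excerpt in the question.
SAMPLE = [
    "May 27 10:09:38 umist1 cmcld: Obtaining Cluster Lock",
    "May 27 10:09:38 umist1 cmcld: Request to obtain cluster lock /dev/dsk/c4t0d4 failed: Device busy",
    "May 27 10:09:38 umist1 cmcld: Failed to request cluster lock.",
]

for timestamp, host, message in find_lock_problems(SAMPLE):
    print(timestamp, host, message)
```

On a live node the same function could be fed the lines of syslog.log (normally /var/adm/syslog/syslog.log on HP-UX) instead of the sample list.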


It also appears that you have a serial heartbeat configured. This is SUPPOSED to ensure that if one node loses all network connectivity, the node that still HAS network connectivity is the one that gets the cluster lock. Unfortunately this is not always so. The serial heartbeat is not a full heartbeat, and in fact it should not be used if you can supply a heartbeat network with a standby. This is in the Release Notes of the SG software.

Also bear in mind that losing ALL network connectivity at the same time is considered an MPOF, or Multiple Points of Failure, which SG is NOT designed to cater for.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Stephen Doud
Honored Contributor

Re: How to verify cluster lock is working?

Hello Jack,

The "Managing MC/ServiceGuard" manual states:
"If the heartbeat network card fails on one node, having a serial line heartbeat keeps the cluster up just long enough to detect the LAN controller card status and to fail the node with bad network connections while the healthy node stays up and runs all the packages."

Using the serial heartbeat is supposed to ensure that the node that still has network connectivity will be the only node to perform the cluster lock race. The serial heartbeat provides a form of heartbeat transmission while each node sorts out its ability to transmit over its LANs (every 2 seconds they test their LANs for transmission ability; see NETWORK_POLLING_INTERVAL).
The node which learns that its HB LANs cannot communicate will TOC (reboot) itself. The other node will not; it will note a loss of serial HB transmission from the TOC'd node and will then perform the cluster reformation, including the cluster lock disk race.

However, the syslog.log that you provided indicates a critical failure: the cluster lock disk was busy at the time it was needed most. This caused the node to fail to get the lock, effectively forcing a TOC.
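
To make the consequence concrete, here is a small Python sketch of the quorum rule as I understand it from the Serviceguard documentation: a group with more than half of the nodes re-forms on its own, a group with exactly half needs the cluster lock as a tie-breaker, and anything smaller halts. The function name and return strings are illustrative only, not anything SG itself prints.

```python
def reformation_outcome(reachable_nodes, total_nodes, got_cluster_lock):
    """Decide whether a group of nodes that can still talk to each other
    may re-form the cluster (simplified model, not SG's actual code)."""
    if reachable_nodes * 2 > total_nodes:
        # Strict majority: the lock is not needed.
        return "reform"
    if reachable_nodes * 2 == total_nodes:
        # Exactly half: the cluster lock is the tie-breaker.
        return "reform" if got_cluster_lock else "halt (TOC)"
    # Fewer than half of the nodes can never re-form.
    return "halt (TOC)"


# Two-node cluster, one node left on its own:
print(reformation_outcome(1, 2, got_cluster_lock=True))
print(reformation_outcome(1, 2, got_cluster_lock=False))
```

In a two-node cluster each node on its own is exactly half, so a "Device busy" failure on the lock disk leaves that node with no way to re-form, which matches the log above.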

My recommendation - investigate why the cluster lock disk was "busy".
-s.
Jack Wu
Occasional Contributor

Re: How to verify cluster lock is working?

Hello Stephen,

Thank you for your help. In fact, I am not sure how the cluster lock volume group should be configured. Should it be vgchange -c y -S y or vgchange -c y -S n? Can the lock disk volume group be activated/mounted in exclusive read/write mode by one node while the cluster is running? Or do I have to run vgchange -a n for the lock disk to work?

Thank you again!

Regards,
Jack Wu
Tim D Fulford
Honored Contributor

Re: How to verify cluster lock is working?

Jack

The lock disk is defined in the .ascii/.conf files (I forget exactly which, but it should say LOCK_DISK). When you run "cmapplyconf", SG does its magic and writes the information into the reserved (LVM) area at the head of the disk. This means that the only requirement is that the disk in the volume group is visible to both nodes (it can even be referenced by different device names, e.g. c1t2d0 and c5t2d0, as long as it is the same disk).
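
For checking what is actually defined, here is a rough Python sketch that pulls the lock entries out of a cluster ASCII file. It assumes the usual parameter names FIRST_CLUSTER_LOCK_VG (cluster-wide) and FIRST_CLUSTER_LOCK_PV (one per NODE_NAME). The sample text is invented for illustration: the volume group name and the umist2 device file are hypothetical, and only /dev/dsk/c4t0d4 comes from the log in the question.

```python
def lock_settings(config_text):
    """Collect the lock volume group and each node's lock physical volume."""
    lock_vg = None
    lock_pvs = {}
    current_node = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        fields = line.split()
        if len(fields) < 2:
            continue
        key, value = fields[0], fields[1]
        if key == "FIRST_CLUSTER_LOCK_VG":
            lock_vg = value
        elif key == "NODE_NAME":
            current_node = value
        elif key == "FIRST_CLUSTER_LOCK_PV" and current_node is not None:
            lock_pvs[current_node] = value
    return lock_vg, lock_pvs


# Hypothetical excerpt of a cluster ASCII file (names are assumptions).
SAMPLE_CONFIG = """
CLUSTER_NAME            cluster1
FIRST_CLUSTER_LOCK_VG   /dev/vglock    # hypothetical VG name

NODE_NAME               umist1
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d4

NODE_NAME               umist2
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c5t0d4  # hypothetical device file
"""

vg, pvs = lock_settings(SAMPLE_CONFIG)
print("lock VG:", vg)
for node, pv in pvs.items():
    print(node, "->", pv)
```

A node with no lock PV listed, or a missing lock VG, would be the first thing to look at; different device file names per node are fine as long as they point at the same disk.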

I've never used vgchange -c y -S [y|n] ... I've only used vgchange -c y ... usually for additional volume groups in the cluster/packages.

The header information for the lock disk (and VGID) is backed up in /etc/lvmconf/vg??.conf when you run vgcfgbackup.

I hope this helps

Tim