Operating System - HP-UX

Replacing a failed cluster lock disk - while online?

 
Mark DeBoer
Occasional Contributor

We lost the cluster lock (CL) disk on our two-node cluster. I replaced the disk and ran vgcfgrestore against it, and I thought all was well. Now we are getting messages in syslog about a missing cluster lock disk. After reading more about replacing a failed CL disk, I understand that I have to run cmapplyconf again to re-initialize it.

Can I do this without halting the cluster?

If I wanted to add another CL disk (to prevent a Single Point of Failure) could I do this online as well?

What's the best way to prevent a failed CL disk from bringing down the whole cluster?
4 REPLIES
Emil Velez
Honored Contributor

Re: Replacing a failed cluster lock disk - while online?

cmapplyconf is quite often run while the cluster is up, for example to add a package or to change the order of nodes that a package fails over to. So yes, it can be done online, and it is worth getting comfortable with it because there are circumstances where you will need to. When you run cmapplyconf, it does a vgchange -c y on every volume group listed in the cluster configuration file, which marks them as cluster volume groups for MC/ServiceGuard.
James R. Ferguson
Acclaimed Contributor

Re: Replacing a failed cluster lock disk - while online?

Mark:

You will need to down the cluster. From document #W3618850:

PROBLEM:
The following messages appear in /var/adm/syslog/syslog.log about my cluster lock disk:

WARNING: Cluster lock on disk /dev/dsk/c3t0d0 is missing! Until it is fixed, a single failure could cause all nodes in the cluster to crash.

RESOLUTION:

This event has been known to be caused by the following:

a. During the most recent cluster configuration, the cluster lock VG was activated (vgchange -a y) on one of the adoptive nodes in the cluster.

b. The cluster lock disk was replaced or moved to a different disk.

The following fix will help regardless of the reason for the disk lock problem:

1. Halt the packages and cluster.

2. De-activate the cluster lock VG:

vgchange -a n

Note: You must de-activate the volume group on all nodes in the cluster, and then activate the volume group on the node from which you will run cmapplyconf.

If necessary, execute the following command on the VG:
vgchange -c n

3. Update the cluster binary configuration file:

cmcheckconf

4. Re-distribute the cluster binary configuration file:

cmapplyconf
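
The four steps above can be sketched as a small shell function that only prints the command sequence for review and executes nothing. The volume group /dev/vglock and the file /etc/cmcluster/cluster.ascii are hypothetical placeholders, not names from the document. The -f option to cmhaltcl, the -C option naming the configuration file, and the closing cmruncl are assumptions about a typical setup, so check the man pages for your ServiceGuard release before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print the lock re-initialization sequence for review.
# Nothing is executed. The default VG and config file are hypothetical.
print_lock_reinit_plan() {
    vg="${1:-/dev/vglock}"                      # hypothetical cluster lock VG
    conf="${2:-/etc/cmcluster/cluster.ascii}"   # hypothetical cluster ASCII file

    echo "cmhaltcl -f"            # step 1: halt the packages and the cluster
    echo "vgchange -a n $vg"      # step 2: de-activate the lock VG (on every node)
    echo "vgchange -c n $vg"      # step 2, only if necessary: clear the cluster flag
    echo "vgchange -a y $vg"      # activate on the node that will run cmapplyconf
    echo "cmcheckconf -C $conf"   # step 3
    echo "cmapplyconf -C $conf"   # step 4
    echo "cmruncl"                # assumed follow-up: restart the cluster
}

# Usage (arguments are examples only):
#   print_lock_reinit_plan /dev/vg01 /etc/cmcluster/cmclconf.ascii
```

Printing the plan first makes it easy to confirm the VG name and file path with a colleague before the cluster is actually halted.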

You get the message once every hour because that is the frequency at which MC/ServiceGuard checks the connectivity with the cluster lock disk. This helps prevent a surprise node failure.
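
Because the warning repeats hourly, one rough way to confirm the fix is to count the warnings in syslog before and after. This is a minimal sketch that assumes the message text matches the warning quoted above; the function name is hypothetical.

```shell
#!/bin/sh
# Count "Cluster lock on disk ... is missing" warnings in a syslog file.
# Defaults to the syslog path quoted in the problem description.
count_lock_warnings() {
    log="${1:-/var/adm/syslog/syslog.log}"
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'Cluster lock on disk .* is missing' "$log"
}

# Usage:
#   count_lock_warnings                      # check the live syslog
#   count_lock_warnings /tmp/syslog.copy     # check a saved copy
```

If the count is unchanged more than an hour after the cmapplyconf, the lock disk is being found again.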

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: Replacing a failed cluster lock disk - while online?

Another thread, just today (!), appears to have the same issue:

http://forums.itrc.hp.com/cm/QuestionAnswer/1,1150,0x440ca24d9abcd4118fef0090279cd0f9,00.html

See KB document #KBRC00001982 and/or #W3618850 for corrective action.

...JRF...
Stephen Doud
Honored Contributor
Solution

Re: Replacing a failed cluster lock disk - while online?

If you are willing to use a proven but unsupported command provided by the Response Center, open a case and ask for "cminitlock".

I have used it successfully in restoring a clusterlock structure while the cluster is running.

Otherwise the cluster must be downed for the cmapplyconf to install the clusterlock structure.

Also, we've occasionally seen cmapplyconf fail to install the lock structure. Running cmdeleteconf before cmapplyconf takes care of it, unless the disk has a hidden problem.