Darren Gibbs
Advisor

Node crashed

Our primary node lost connection with the secondary node in our two-node cluster because the secondary node lost a CPU, which caused that system to hang. When the primary node attempted to get the cluster lock, it failed with the messages below:

Mar 23 01:39:22 parrot cmcld: Timed out node quail. It may have failed.
Mar 23 01:39:22 parrot cmcld: Attempting to adjust cluster membership
Mar 23 01:39:39 parrot cmcld: Obtaining Cluster Lock
Mar 23 01:39:40 parrot cmcld: WARNING: Cluster lock on disk /dev/dsk/c13t3d3 is missing!
Mar 23 01:39:40 parrot cmcld: Until it is fixed, a single failure could
Mar 23 01:39:40 parrot cmcld: cause all nodes in the cluster to crash
Mar 23 01:39:40 parrot cmcld: Failed to obtain Cluster Lock: I/O error

After 2 1/2 minutes of attempting to obtain a cluster lock, the primary node crashed. What caused this?

There are two things to know about this situation. First, the WARNING message about the cluster lock has been appearing for some time but has never caused a crash. I've got the cminitlock utility but haven't had a chance to test it yet.

Second, the VG that holds the assigned cluster lock disk was not marked as MCSG cluster aware (i.e. with vgchange -c y vgxx) and was activated in read-only mode instead of exclusive mode.
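For reference, a minimal sketch of how the lock VG would normally be set up (vgxx is a placeholder name; adjust to your configuration):

vgchange -a n /dev/vgxx   # deactivate the VG first
vgchange -c y /dev/vgxx   # mark it cluster (MCSG) aware; the cluster must be running
vgchange -a e /dev/vgxx   # activate it in exclusive mode on the owning node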

I'm wondering which of these two issues caused the crash?

Our cluster consists of two N-class servers sharing disks from an XP256 via Fibre Channel switches.
5 REPLIES
Christopher McCray_1
Honored Contributor

Re: Node crashed

The first thing I would do is run cmscancl and review the output as it pertains to the cluster lock disk. The message is saying that the cluster lock disk MCSG expects to find isn't there. By chance, was that disk once the cluster lock disk and then moved away? You will probably end up redefining the cluster lock VG/disk and recompiling the cluster binary file (the cluster must be down), either by editing the cluster ASCII file or by running cmquerycl again. Remember to save the previous cluster ASCII file before doing this.
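A minimal sketch of that procedure, assuming the config lives in /etc/cmcluster and using the node names from the log above (paths and exact options may differ on your release):

cmscancl -o /tmp/scancl.out                       # collect node/disk data; check the lock disk entries
cd /etc/cmcluster
cp cluster.ascii cluster.ascii.bak                # save the previous ASCII file
cmquerycl -v -C cluster.ascii -n parrot -n quail  # regenerate; then set FIRST_CLUSTER_LOCK_VG/PV
cmcheckconf -v -C cluster.ascii                   # validate the edited config
cmapplyconf -v -C cluster.ascii                   # recompile the binary file (cluster must be down)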

Good luck

Chris
It wasn't me!!!!
Mark van Hassel
Respected Contributor

Re: Node crashed

Darren,

You have applied the MCSG config with a cluster lock device: you specified FIRST_CLUSTER_LOCK_VG and FIRST_CLUSTER_LOCK_PV in the cluster ASCII file, so the volume group needs to be cluster aware.
The VG needs to be cluster aware but does not need to be activated. However, when it is activated read-only, I can imagine that the cmcld daemon cannot write to the lock disk, and the node therefore cannot obtain the cluster lock.

To make the vg cluster aware, do the following (with the cluster up):

vgchange -a n vgname   # deactivate the VG first
vgchange -c y vgname   # set the cluster-aware flag

Alternatively, you can add the vg to the cluster ASCII file and re-apply the cluster config.
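A minimal sketch of that alternative route, assuming the ASCII file is /etc/cmcluster/cluster.ascii and using mycluster as a placeholder cluster name:

cmgetconf -v -c mycluster /etc/cmcluster/cluster.ascii   # dump the current config
vi /etc/cmcluster/cluster.ascii                          # add the VOLUME_GROUP entry for the vg
cmcheckconf -v -C /etc/cmcluster/cluster.ascii           # validate
cmapplyconf -v -C /etc/cmcluster/cluster.ascii           # re-apply the binary config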

HtH,

Mark
The surest sign that life exists elsewhere in the universe is that none of it has tried to contact us
Sanjay_6
Honored Contributor
Solution

Re: Node crashed

Hi Darren,

The cluster lock vg has to be an SG-aware volume group, and it should be accessible from both nodes. Since your lock vg was not SG aware, the surviving node was unable to get hold of the lock disk when the other node failed.

Hope this helps.

Regds
Darren Gibbs
Advisor

Re: Node crashed

I had stated that the VG that houses the cluster lock disk was activated in read-only mode instead of exclusive mode on the primary node. I should have stated that the VG was activated using vgchange -a y vgxx instead of vgchange -a e vgxx.
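For clarity, a minimal sketch of the difference (vgxx is a placeholder VG name):

vgchange -a y /dev/vgxx   # plain read-write activation (what actually ran)
vgchange -a e /dev/vgxx   # exclusive activation (requires the VG to be cluster aware via vgchange -c y)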

My theory is that the true failure was caused by the VG housing the cluster lock disk not being MCSG aware at the time of the failure. Could this be?

The reason for it not being MCSG aware was a change mistakenly made by someone else on the team the previous weekend.
Sanjay_6
Honored Contributor

Re: Node crashed

Hi Darren,

You are correct. Since the VG housing the cluster lock disk was made cluster unaware, when the node where this vg was activated with "vgchange -a y /dev/vg_name" went down, the other node was unable to activate the vg and could not get hold of the cluster lock disk. The cluster lock vg should never be made cluster unaware.
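A minimal sketch of restoring and verifying the lock VG afterwards (vgxx is a placeholder; output details vary by HP-UX release):

vgchange -a n /dev/vgxx   # deactivate before changing the flag
vgchange -c y /dev/vgxx   # restore the cluster-aware flag (cluster must be up)
vgchange -a e /dev/vgxx   # re-activate exclusively on the owning node
vgdisplay /dev/vgxx       # confirm the VG status
cmviewcl -v               # confirm cluster and node status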

Hope this helps.

Regds