
WARNING: cluster lock disk is missing...

 
Kenneth Platz
Esteemed Contributor


Hello everyone,

Yesterday one of our nodes (the failover node) in our MC/ServiceGuard cluster started spitting out the following error:

Jan 19 15:13:32 westh cmcld: WARNING: Cluster lock on disk /dev/dsk/c4t14d1 is missing!
Jan 19 15:13:32 westh cmcld: Until it is fixed, a single failure could
Jan 19 15:13:32 westh cmcld: cause all nodes in the cluster to crash
Jan 19 15:13:33 westh cmclconfd[2076]: Updated file /var/adm/cmcluster/frdump.cmcld.8 for node westh (length = 108603).

Now this disk is available on the primary node of the cluster (wylieh), and the only disk-related error we've seen on that node is:

Jan 19 05:00:02 wylieh vmunix: msgcnt 1 vxfs: mesg 010: vx_ialloc - /oracle/WAD file system inode 13 not free
Jan 19 05:00:02 wylieh vmunix: msgcnt 2 vxfs: mesg 016: vx_ilisterr - /oracle/WAD file system error reading inode 13

The first error has been occurring hourly since last night around 1700hrs. The second error has only occurred once (0500 hrs this morning).
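
An hourly pattern like this can be confirmed by pulling the warning timestamps out of syslog. A minimal sketch, assuming the log is at /var/adm/syslog/syslog.log (the usual HP-UX location; pass another path if yours differs). The function name is made up for illustration:

```shell
# Print the date and time of every "Cluster lock ... is missing" warning.
# $1 is the syslog file to scan; the default path is an assumption.
lock_warning_times() {
    logfile=${1:-/var/adm/syslog/syslog.log}
    # Match only the first line of each warning, then keep month, day, time.
    grep 'cmcld: WARNING: Cluster lock on disk' "$logfile" |
        awk '{ print $1, $2, $3 }'
}
```

Running lock_warning_times on each node shows at a glance whether the warnings are spaced about an hour apart and which node is logging them.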

These disks are on an EMC array, and we are using EMC PowerPath software for both failover and load balancing. The disks appear as follows:

On the troubled node:

[/var/adm/syslog] root@westh #inq 2> /dev/null | grep 888
/dev/rdsk/c4t14d1 :EMC :SYMMETRIX :5670 :33888000 :70709760
/dev/rdsk/c6t14d1 :EMC :SYMMETRIX :5670 :33888000 :70709760

And on the trouble-free node:
[/var/adm/syslog] root@wylieh #inq 2> /dev/null | grep 888
/dev/rdsk/c2t14d1 :EMC :SYMMETRIX :5670 :33888000 :70709760
/dev/rdsk/c4t14d1 :EMC :SYMMETRIX :5670 :33888000 :70709760
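
To make it obvious that the two device files on each node are just two paths to the same LUN, the inq listing can be grouped by serial number. A rough sketch, assuming the colon-separated columns are device, vendor, product, revision, serial number and capacity (that column order is an assumption based on the usual inq header, which was filtered out above). The function name is hypothetical:

```shell
# Read inq output on stdin and print "serial: path path ..." per LUN.
# Assumed field order: device :vendor :product :rev :serial :capacity
paths_by_serial() {
    awk -F':' '
        NF >= 6 {
            path = $1; serial = $5
            gsub(/[ \t]/, "", path)      # strip the padding around fields
            gsub(/[ \t]/, "", serial)
            if (serial in paths) {
                paths[serial] = paths[serial] " " path
            } else {
                paths[serial] = path
                order[++n] = serial      # remember first-seen order
            }
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " paths[order[i]] }
    '
}
```

Typical use would be: inq 2> /dev/null | grep 888 | paths_by_serial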

Any suggestions what could be causing this, and how to resolve it?
I think, therefore I am... I think!
Steven E. Protter
Exalted Contributor

Re: WARNING: cluster lock disk is missing...

ioscan -fnC disk

Compare this to your cluster configuration and make sure the lock disk is still present.
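
One way to do that comparison without eyeballing the listing is to search the ioscan output for the exact lock disk device file. A small sketch, assuming ioscan -fn prints the device files as whitespace-separated words on their own line. The function name is made up, and the device path is the one from the syslog warning:

```shell
# Read "ioscan -fnC disk" output on stdin and succeed only if the device
# file given as $1 appears as a whole word (so c4t14d1 does not match c4t14d10).
lock_disk_visible() {
    awk -v dev="$1" '
        { for (i = 1; i <= NF; i++) if ($i == dev) found = 1 }
        END { exit !found }
    '
}
```

For example: ioscan -fnC disk | lock_disk_visible /dev/dsk/c4t14d1 && echo "lock disk present"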

Perhaps test it with mstm or cstm or xstm.

The cluster lock disk is supposed to be the tie-breaker if the heartbeat is lost in the cluster.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Kent Ostby
Honored Contributor

Re: WARNING: cluster lock disk is missing...

Either you have lost physical access to the cluster lock disk or the data structures on the disk have been removed accidentally.

ITRC document UXSGKBAN00000022 contains the fix for this.

"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
melvyn burnard
Honored Contributor

Re: WARNING: cluster lock disk is missing...

Your system has a problem talking to the disc assigned as the cluster lock disc.
I would check the hardware along the path to that disc, cables etc.
Also check the EMC for errors, as it could have an internal issue.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Stephen Doud
Honored Contributor
Solution

Re: WARNING: cluster lock disk is missing...

Serviceguard (on each node) tests the cluster lock structure once an hour. Since one server is reporting the errors and the other is not, it would appear that the server reporting the errors cannot perform I/O on the disk.
See if the system can read from the disk:
# dd if=/dev/dsk/c4t14d1 of=/dev/null bs=64k
If this succeeds, perform 'cmviewconf' on the node reporting the error and confirm that the lock disk path on that node is correct.
Also consider performing
# cksum /etc/cmcluster/cmclconfig
on both nodes and confirm the numbers match across nodes.
Finally, download 'cminitlock' from
ftp://contrib:9unsupp8@192.170.19.51/crash/cminit.tar.shar
unpack it, use the README to execute the command on the node where the "missing" messages are reported.
Rita C Workman
Honored Contributor

Re: WARNING: cluster lock disk is missing...

Well...IMHO...on the 'good' node I'd first be looking at that vxfs error. I'd suggest taking that filesystem down and running a filesystem check (full option) to see if that cleans up the error.

Now on the lock disk..
Obviously you can see the disk from the EMC command line. But one thing you can check is PowerPath. This sometimes gets messed up. Not sure of your version, but try running: powermt check
This runs a utility looking for dead connections. Relax on the word dead. Sometimes PowerPath holds on to old or bad disk info, so it marks one address as dead and will not allow the same disk (showing as active in PowerPath) to gain control of the disk. Sounds confusing...but think of it like a lookup in a hosts file. Once it hits a match it stops..but if that is a bad lookup or old info, you never get farther in the file (same principle here).
If it hits any dead connections, then say Y to remove them and keep doing that till it finishes. Then run powermt config and then powermt save..to save what you have.
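
The sequence described above, written out as one helper. This is a sketch only: the function name is made up, and powermt check is interactive, so expect to answer its prompts about dead paths before the later steps run.

```shell
# Run the PowerPath cleanup steps in order, stopping at the first failure.
refresh_powerpath() {
    powermt check  &&   # look for dead paths; prompts before removing each
    powermt config &&   # reconfigure PowerPath with the current paths
    powermt save        # save the resulting configuration
}
```

Run it as root on the node that is logging the lock disk warning.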

Just a quick thought, because I think there may be more going on there.

Rgrds,
Rita
Kenneth Platz
Esteemed Contributor

Re: WARNING: cluster lock disk is missing...

Everyone,

We discovered the reason for these errors. One of the EMC admins had accidentally presented these disks to another host in our environment, and that other host had performed pvcreate -f's and vgcreate's on those disks, which in turn perform a "power word nuke" on the LVM header area where the cluster lock information is kept. Since they've also newfs'ed those filesystems, we're currently attempting to determine the least painful means of recovering their data.

Funny thing is... the database which resides on those disks is still fat, dumb, and happy. Now the problem is trying to convince the DBAs how bad the problem *really* is...
I think, therefore I am... I think!