Operating System - HP-UX

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

 
BERTRAND_7
Frequent Advisor

MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

We have a two-node SG cluster (A.11.18) running on two rx3600 servers with two MSA1000 disk bays.

We have one node and one bay in a cabinet and the other node and other MSA1000 in another cabinet. Each node is connected to each MSA1000 through an FC link.

We have defined two heartbeats on two different networks, and two cluster lock physical volumes, one in each MSA1000 cabinet:
FIRST_CLUSTER_LOCK_PV /dev/dsk/c6t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c4t0d2

Today an entire cabinet lost power abruptly, so we lost one rx3600 and one MSA1000 at the same time.

The package running on the failed node switched to the remaining node. A few seconds later, however, 'cmdisklockd' failed:
Oct 2 10:43:54 ELESTR1A cmdisklockd[2807]: Timed out waiting for cluster lock disk /dev/dsk/c4t0d2
Oct 2 10:43:54 ELESTR1A cmdisklockd[2807]: Assertion failed: !request_state(lock->state), file: disklock/dl_controller.c, line: 307
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Service cmdisklockd terminated due to a signal(6).
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Utility Daemon cmdisklockd died unexpectedly! It may be due to a pending reboot or panic
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Exiting with status 1.

The full log is provided as an attachment.

Why did cmdisklockd die when it could not reach the lock disk that had failed? Why didn't it try to use the other lock disk?

How can I solve the problem in this case?
Is there anything to change in the cluster configuration?

Thanks for your help
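
For readers following the log excerpt above, here is a minimal sketch (not part of the original post) that picks the cmdisklockd and cmcld entries out of a syslog copy and labels the timed-out device with its role from the cluster configuration. The function name `summarize`, the `LOCK_PVS` table, and the regular expressions are illustrative assumptions; only the device paths and the log lines come from the post.

```python
import re

# Lock PV roles exactly as quoted from the cluster configuration in the post.
LOCK_PVS = {
    "/dev/dsk/c6t0d1": "FIRST_CLUSTER_LOCK_PV",
    "/dev/dsk/c4t0d2": "SECOND_CLUSTER_LOCK_PV",
}

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <daemon>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"Timed out waiting for cluster lock disk (\S+)")


def summarize(lines):
    """Return (timestamp, daemon, message) tuples for cmdisklockd/cmcld lines."""
    events = []
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if not match or match.group("daemon") not in ("cmdisklockd", "cmcld"):
            continue
        msg = match.group("msg")
        timeout = TIMEOUT_RE.search(msg)
        if timeout:
            # Annotate the timed-out device with its configured role.
            role = LOCK_PVS.get(timeout.group(1), "unknown lock PV")
            msg = f"{msg} [{role}]"
        events.append((match.group("stamp"), match.group("daemon"), msg))
    return events


# The five lines quoted in the post.
SAMPLE = """\
Oct 2 10:43:54 ELESTR1A cmdisklockd[2807]: Timed out waiting for cluster lock disk /dev/dsk/c4t0d2
Oct 2 10:43:54 ELESTR1A cmdisklockd[2807]: Assertion failed: !request_state(lock->state), file: disklock/dl_controller.c, line: 307
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Service cmdisklockd terminated due to a signal(6).
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Utility Daemon cmdisklockd died unexpectedly! It may be due to a pending reboot or panic
Oct 2 10:43:55 ELESTR1A cmcld[2796]: Exiting with status 1.
"""

for stamp, daemon, msg in summarize(SAMPLE.splitlines()):
    print(stamp, daemon, msg)
```

Run against the quoted lines, this tags the timed-out device /dev/dsk/c4t0d2 as SECOND_CLUSTER_LOCK_PV, which makes it easier to see which lock disk the daemon was waiting on when it aborted.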
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

Shalom,

One of these volumes is not visible to one of the nodes:

FIRST_CLUSTER_LOCK_PV /dev/dsk/c6t0d1
SECOND_CLUSTER_LOCK_PV /dev/dsk/c4t0d2

Check ioscan status on these two disks.

Make sure they have the same address on both systems. They should.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rita C Workman
Honored Contributor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

Can both servers "SEE" both disks?

I say that because you can read your explanation to mean that each node is connected to each MSA1000 on a one-2-one connection. But not necessarily on a cross connection so that each node could see the other MSA1000.

So...can BOTH servers SEE BOTH lock disks?

Just a thought,
Rgrds,
Rita
BERTRAND_7
Frequent Advisor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

When everything is normal, both servers can see both lock disks.
The lock disks have the same address on both systems.

Yet, when the failure was detected, the second node was off and the second MSA1000 was off. This means that the second lock disk '/dev/dsk/c4t0d2' was no longer available.
At the same time, the first lock disk '/dev/dsk/c6t0d1' was still available.

I would have expected the remaining node to run as a single node cluster using the lock disk on the remaining MSA1000.

Any suggestion is welcome.
melvyn burnard
Honored Contributor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

What patch level do you have for your SG installation?
do:
what /usr/lbin/cmcld

I suspect you may be missing a Serviceguard patch.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
BERTRAND_7
Frequent Advisor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

ELESTR1A#what /usr/lbin/cmcld
/usr/lbin/cmcld:
Cluster Monitor Product $Revision: 82.2 $
Cluster Monitor Product Only $Revision: 82.2 $
Daemon
A.11.18.00 Date: 03/15/07
Build date: Thu Mar 15 16:07:05 PDT 2007
Build id: ibld_sg1118_1123_product
Build platform: hpux
ELESTR1A#
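
As an aside for anyone comparing several nodes, the `what` output above can be reduced to its version fields with a short script. This is only a sketch: `parse_what` and the regular expressions are hypothetical helpers, and they assume the output keeps the layout shown in this post. The script extracts the strings and does not decide by itself whether a patch is installed.

```python
import re

# Output of `what /usr/lbin/cmcld` exactly as pasted in the post.
WHAT_OUTPUT = """\
/usr/lbin/cmcld:
Cluster Monitor Product $Revision: 82.2 $
Cluster Monitor Product Only $Revision: 82.2 $
Daemon
A.11.18.00 Date: 03/15/07
Build date: Thu Mar 15 16:07:05 PDT 2007
Build id: ibld_sg1118_1123_product
Build platform: hpux
"""

# Assumed layout: a line starting with the release string followed by "Date:".
VERSION_RE = re.compile(r"^\s*(A\.\d+\.\d+\.\d+)\s+Date:\s+(\S+)", re.MULTILINE)
BUILD_ID_RE = re.compile(r"^\s*Build id:\s+(\S+)", re.MULTILINE)


def parse_what(text):
    """Return the version, date and build id found in `what` output, or None."""
    version = VERSION_RE.search(text)
    if version is None:
        return None
    build = BUILD_ID_RE.search(text)
    return {
        "version": version.group(1),
        "date": version.group(2),
        "build_id": build.group(1) if build else None,
    }


print(parse_what(WHAT_OUTPUT))
```

Saving the `what` output from each node to a file and passing its contents to `parse_what` gives a quick way to confirm that both nodes report the same Serviceguard build.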
melvyn burnard
Honored Contributor

Re: MC ServiceGuard A.11.18 cmdisklockd terminate, cmcld exit

So you do not have any patches installed for Serviceguard.
For HP-UX 11i v2, install PHSS_38423.
For HP-UX 11i v3, install PHSS_38424.

There are numerous fixes in these patches, and this is your starting point.
Also ensure you have the latest general patch bundle on your servers.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!