Re: HPUX CLuster : message in the /var/adm/syslog/syslog.log

Mahesh Babbar · ‎01-14-2004

Hi HPUX/MCSG gurus,

I am getting the following messages in /var/adm/syslog/syslog.log:

On node 1 :

Jan 15 10:23:38 cmgtpn1 cmcld: WARNING: Cluster lock on disk /dev/dsk/c8t15d7 i!
Jan 15 10:23:38 cmgtpn1 cmcld: Until it is fixed, a single failure could
Jan 15 10:23:38 cmgtpn1 cmcld: cause all nodes in the cluster to crash
Jan 15 10:23:39 cmgtpn1 cmclconfd[2540]: Updated file /var/adm/cmcluster/frdump.

On node 2

Jan 15 10:23:42 cmgtpn2 cmcld: WARNING: Cluster lock on disk /dev/dsk/c6t15d7 is missing!
Jan 15 10:23:42 cmgtpn2 cmcld: Until it is fixed, a single failure could
Jan 15 10:23:42 cmgtpn2 cmcld: cause all nodes in the cluster to crash
Jan 15 10:23:43 cmgtpn2 cmclconfd[2627]: Updated file /var/adm/cmcluster/frdump.

Both HP-UX nodes are connected to an XP512 disk array.

The lock disk appears under a different device file on each node, but it is the same physical device on the disk array.

The message itself is scary: '... a single failure can cause all the nodes in the cluster to crash.'

My queries are

Is this really something that needs to be taken care of?

How can I get rid of the message?

Can I get rid of the message without taking the packages down?

TIA

Mahesh
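
To see at a glance which node is complaining about which lock device, the cmcld warnings quoted above can be pulled out of the log. This is only a minimal sketch: lock_warnings is a made-up helper name, and it assumes the warning lines keep the same field layout as in the excerpts above.

```shell
# Sketch: list "<node> <lock device>" for every cluster lock warning in a log.
# lock_warnings is a hypothetical helper, not a Serviceguard command.
lock_warnings() {
    # Default to the log path from the thread; pass another file as $1.
    log=${1:-/var/adm/syslog/syslog.log}
    awk '/cmcld: WARNING: Cluster lock on disk/ {
        # Field 4 is the host name; the device follows the word "disk".
        for (i = 1; i <= NF; i++)
            if ($i == "disk") print $4, $(i + 1)
    }' "$log"
}
```

Running lock_warnings on each node should show whether both nodes report the lock as missing, and under which device file.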

Sunil Sharma_1 · ‎01-14-2004

Hi,

Is this LUN working fine?

It seems the cluster lock disk has some problem.

Sunil

*** Dream as if you'll live forever. Live as if you'll die today ***

Jakes Louw · ‎01-14-2004

I would suggest running a cmquerycl -v and checking the status of the lock devices.

Trying is the first step to failure - Homer Simpson

Mahesh Babbar · ‎01-14-2004

Hi Sunil,

The LUN should be working fine.

diskinfo output is as below

# diskinfo -v /dev/rdsk/c8t15d7
SCSI describe of /dev/rdsk/c8t15d7:
vendor: HP
product id: OPEN-E-CVS
type: direct access
size: 102960 Kbytes
bytes per sector: 512
rev level: 0118
blocks per disk: 205920
ISO version: 0
ECMA version: 0
ANSI version: 2
removable media: no
response format: 2
(Additional inquiry bytes: (32)34 (33)30 (34)30 (35)37 (36)46 (37)32 (38)44 )
#

Mahesh

Roger Grotle · ‎01-14-2004

I ran into that problem about two years ago.

The cluster lock disk is used when the connection between the nodes is lost. In that case, all running nodes will attempt to grab hold of the disk. The node that gets it first will continue running; the others will crash (TOC). If no node can reach the disk, all of them will crash.

First, check that the disk is actually readable, not just that diskinfo works.

If it is, then the problem is probably that some header on the disk has been destroyed.

The supported way to fix that was to stop the entire cluster and run cmapplyconf -C with the cluster configuration file. However, HP provided me with an unsupported program that fixed the header. I can't remember its name, but maybe HP can help you.

G. Vrijhoeven · ‎01-14-2004

Hi Mahesh,

The diskinfo output is convincing. It proves the disk is accessible. However, I assume you have two paths to the disk, and MC/ServiceGuard does not check the alternate path.
Could it be that the alternate path has become the primary path? Check with vgdisplay -v. The cmquerycl
that Jakes advises is also a good option.

Gideon

Elmar P. Kolkman · ‎01-14-2004

I would do something like:
dd if=/dev/dsk/c8t15d7 of=/dev/null
on node1 to make sure the disk is readable.

Also, you might try:
fuser /dev/dsk/c8t15d7
to find possible locks on the device.

You also need to check the disk is writable on all nodes. Checking this might be tricky if the disk also contains a filesystem.

Every problem has at least one solution. Only some solutions are harder to find.
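
The dd read test suggested above can be wrapped so that it reports a clear pass or fail. This is only a sketch: check_readable is a hypothetical helper name, and the 64k block size is an arbitrary choice. It reads the whole device, which is quick for a lock LUN of roughly 100 MB like the one shown in the diskinfo output.

```shell
# Sketch: read a device (or any file) end to end and report the result.
# check_readable is a hypothetical helper, not an HP-UX command.
check_readable() {
    dev=$1
    # dd exits non-zero if the device cannot be opened or a read fails.
    if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
        echo "readable: $dev"
    else
        echo "READ FAILED: $dev"
        return 1
    fi
}
```

For example, check_readable /dev/dsk/c8t15d7 on node 1 and check_readable /dev/dsk/c6t15d7 on node 2. A successful read only shows that the LUN answers I/O; it does not prove the cluster lock information on it is intact.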

melvyn burnard · ‎01-14-2004

This is a serious issue that you need to fix, otherwise you could see BOTH nodes TOC if/when there is some serious problem that would require a node accessing the cluster lock disk.
The correct way to fix this is to halt the cluster, activate the cluster lock vg on one node, and from that node re-apply the cmapplyconf command.
Failing that, you could log a software call with your local HP office and request the unsupported utility called cminitlock, which will allow you to first check the disk in question, and then to set the bits on the physical disk to fix this problem without halting the cluster. Note that this utility will NOT allow you to change the cluster lock disk.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
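
The offline fix described above (halt the cluster, activate the lock VG on one node, re-apply the configuration) could look roughly like the sketch below. The VG name /dev/vglock and the file /etc/cmcluster/cluster.ascii are placeholders and do not come from this thread. The script only prints the commands unless DRY_RUN=0 is set. If the VG is marked cluster-aware, it may need vgchange -c n before it can be activated with the cluster down, so check against your own setup and HP's advice first.

```shell
# Sketch of the offline repair sequence; prints commands unless DRY_RUN=0.
# LOCK_VG and CLUSTER_ASCII are placeholder values, not taken from the thread.
DRY_RUN=${DRY_RUN:-1}
LOCK_VG=${LOCK_VG:-/dev/vglock}
CLUSTER_ASCII=${CLUSTER_ASCII:-/etc/cmcluster/cluster.ascii}

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
reapply_cluster_lock() {
    run cmhaltcl -f &&                       # halt the cluster and its packages
    run vgchange -a y "$LOCK_VG" &&          # activate the lock VG on this node
    run cmapplyconf -C "$CLUSTER_ASCII" &&   # re-apply the cluster configuration
    run vgchange -a n "$LOCK_VG" &&          # deactivate the VG again
    run cmruncl                              # restart the cluster
}
```

Calling reapply_cluster_lock with the default DRY_RUN=1 just lists the five steps, which makes it easy to review the sequence before a maintenance window.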

Mahesh Babbar · ‎01-15-2004

Thanks to all for the valuable inputs,

status is

Gideon : I have single path to the XP array.

Elmar : dd is working fine, though I did not try fuser for fear that it might cause an issue.

Melvyn : Thanks for making me aware of the risk. However, the second node, which is a kind of standby (I mean not active), was shut down in the same condition 2 days back and the packages/cluster remained up.

HP said, in order to get it resolved, one must bring the cluster down which I wanted to avoid.

Best Regards

Mahesh

melvyn burnard · ‎01-15-2004

If the second node was halted/shut down in an orderly fashion, i.e. you issued cmhaltnode or the shutdown command, then this would NOT need the cluster lock disc. It is when there is a failure that the lock will be needed, and if the disk is not available, the remaining node will TOC due to its inability to grab the cluster lock.
As I said, if you log a call and request the cminitlock script, then this will assist you in repairing this online.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Jakes Louw · ‎01-15-2004

Melvyn,

Any chance that HP will be changing the rule of a mandatory lock disk for a 2-node cluster?

Trying is the first step to failure - Homer Simpson

melvyn burnard · ‎01-15-2004

As for the mandatory requirement for a lock mechanism, to my knowledge there are no plans to drop this, as it would compromise the idea behind ensuring HA integrity.
You must have either a lock disc, or a quorum server for a two node cluster.
If you do not wish to use the cluster lock disc, then look at using the free quorum server product to supply the tie-breaking mechanism.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Bernhard Mueller · ‎01-15-2004

Mahesh,

I remember that error occurring when a cluster lock disk failed and had to be replaced. (This was in a JBOD, not an array LUN.)

The tool to fix the problem without bringing the cluster down was "cminitlock", which initializes a disk as a lock disk while the cluster is up. I wonder why Melvyn did not point you in that direction (maybe it is not officially supported or there are problems with LUNs being the lock disk). Would be interesting to know...

Regards,
Bernhard

Jakes Louw · ‎01-15-2004

Bernhard

You clearly did not read the thread.....

He mentioned it in his first posting.

Trying is the first step to failure - Homer Simpson
