Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

Cliff Lim Kok Hwee
Regular Advisor

Hi Forumers,

I have a 2-node cluster setup.

I've noticed that the cluster lock LUN shows a status of down even when the cluster is up. To get the lock LUN to show up, I have to manually run cmhaltcl and then cmruncl.

Any idea why?

Thanks/Cliff
9 REPLIES
melvyn burnard
Honored Contributor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

posted in wrong forum, moved to more appropriate forum
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
melvyn burnard
Honored Contributor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

Not quite sure what you mean here. Are you saying there is a problem with the lock LUN?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Cliff Lim Kok Hwee
Regular Advisor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

Hi,

The lock LUN status in cmviewcl shows down after one node is rebooted. To get it to show up again, I need to do a cmhaltcl and then a cmruncl.

Cliff

melvyn burnard
Honored Contributor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

if the other node is showing up, then why not simply halt the node that shows down and reboot it, rather than the whole cluster?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Cliff Lim Kok Hwee
Regular Advisor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

In a nutshell, this is what I meant:

Whenever I do a cmruncl the status of the cluster lock lun will be as below:

NodeA
Cluster_Lock_LUN:
DEVICE STATUS
/dev/sda1 up


NodeB
Cluster_Lock_LUN:
DEVICE STATUS
/dev/sda1 up

But when I reboot either node, the lock LUN status on that node shows down, even though the node is able to join the cluster. If I then do a cmhaltnode followed by a cmrunnode, the lock LUN status shows up again.

Any reason for this?

Thanks/Cliff

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

What patch level of SG are you running? There was a cmlocklund defect fixed in serviceguard-A.11.16.03, but the symptoms are not exactly what you describe: that defect also logs an additional error in /var/log/messages like: "Could not set lock disk status db entry, (2)"

Are you seeing that message on any node from cmcld? If you are running a version of SG earlier than 11.16.03 you may want to consider patching because that patch is 16 months old now.

Are you seeing any messages in syslog complaining about the availability of the lock lun when it is in this state? Also, does cmviewcl -v report "unknown" or "down"? Does it report the same from both nodes?
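
For comparing what each node reports, here is a minimal shell sketch that pulls just the lock LUN device and status out of cmviewcl -v output. It assumes the output contains a Cluster_Lock_LUN: block laid out like the excerpt posted earlier in this thread (a DEVICE STATUS header followed by one device line); real output may differ by version. The function name lock_lun_status is made up for illustration.

```shell
# Hypothetical helper: reads cmviewcl -v style text on stdin and prints
# "<device> <status>" for each Cluster_Lock_LUN block it finds.
# Assumes the block layout shown in the excerpt in this thread.
lock_lun_status() {
    awk '
        /Cluster_Lock_LUN:/        { in_lock = 1; next }   # start of a lock LUN block
        in_lock && $1 == "DEVICE"  { next }                # skip the column header
        in_lock && NF >= 2         { print $1, $2; in_lock = 0 }
    '
}
```

Run it as `cmviewcl -v | lock_lun_status` on each node and compare the two results.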

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

I agree with Mike here.

Please check the syslog messages and see whether they match the following.


: Could not set lock disk status db entry, (2)
cmlocklund can also fail and be re-started by cmcld.
In this case you would get messages:
May 23 00:54:49 fmwml01 cmcld[24446]: Failed to receive from quorum server 127.0.0.1 port 46841, closing down connection.
May 23 00:54:49 fmwml01 cmcld[24446]: Connection failure to quorum server localhost. Please check server's log.
May 23 00:54:49 fmwml01 cmcld[24446]: Service cmlocklund terminated due to an exit(1).
May 23 00:54:49 fmwml01 cmcld[24446]: WARNING: The quorum device localhost is down.
May 23 00:54:49 fmwml01 cmcld[24446]: Until it is fixed, a single failure could
May 23 00:54:49 fmwml01 cmcld[24446]: cause all nodes in the cluster to crash.
May 23 00:54:49 fmwml01 cmcld[24446]: Automatically restarted service cmlocklund for the 1st time after failure.

If so, please apply the SG A.11.16.03 patch.
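
A quick way to check a syslog file for the two messages quoted above is a grep count. This is only a sketch: the function name count_locklun_defect_msgs is invented here, and the two search strings are taken from the messages quoted in this thread, so adjust them if your log wording differs.

```shell
# Hypothetical helper: prints how many lines in the given log file contain
# either of the two cmlocklund defect messages quoted in this thread.
count_locklun_defect_msgs() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so the exit status is deliberately ignored.
    grep -c -e 'Could not set lock disk status db entry' \
            -e 'cmlocklund terminated' "$1" || true
}
```

For example, `count_locklun_defect_msgs /var/log/messages` on each node; a non-zero count suggests the symptoms match.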

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

There is another problem of a similar nature fixed in SG A.11.16.04. Please check the symptoms below and apply the SG A.11.16.04 patch if appropriate.

On RH4/U2 X86_64 and IPF the locklun device will sometimes fail with the following messages:

Oct 23 05:17:20 axe410 cmlocklund[9077]: Disk device /dev/sde1 has been associated (bound) to /dev/raw/raw1.
Oct 23 05:17:20 axe410 cmcld[9042]: Lock LUN initialized (port = 33059).
Oct 23 05:17:20 axe410 cmlocklund[9077]: Could not open cluster lock lun /dev/raw/raw1: No such file or directory
Oct 23 05:17:20 axe410 cmlocklund[9077]: Error opening device /dev/raw/raw1 no such device.

This occurs when a node tries to join the cluster after it has been rebooted. To resolve the problem, Serviceguard on the node has to be halted using "cmhaltnode" and then restarted using "cmrunnode".
Cliff Lim Kok Hwee
Regular Advisor

Re: Serviceguard A.11.16 in Redhat Linux version 4 Updates 4

Guys, thanks for responding.

Sorry for not updating the forum earlier.

I managed to solve the issue by doing the following:

Edit the /etc/sysconfig/rawdevices file so that it contains the line:
/dev/raw/raw1 /dev/sda1
then start the rawdevices service.
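
To confirm the binding line is really in the file before rebooting, something like the sketch below may help. The function name has_raw_binding is made up for illustration, and it assumes the file format is one "raw-device block-device" pair per line, with # starting a comment.

```shell
# Hypothetical helper: succeeds (exit 0) if FILE contains an uncommented
# line binding RAWDEV to BLOCKDEV, and fails (exit 1) otherwise.
# Usage: has_raw_binding FILE RAWDEV BLOCKDEV
has_raw_binding() {
    awk -v raw="$2" -v blk="$3" '
        $1 ~ /^#/                { next }        # ignore comment lines
        $1 == raw && $2 == blk   { found = 1 }   # exact match on both fields
        END                      { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example: `has_raw_binding /etc/sysconfig/rawdevices /dev/raw/raw1 /dev/sda1 && echo "binding present"`. On RHEL 4 the service is typically started with `service rawdevices restart`, and `raw -qa` lists the bindings currently in effect.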

Now, after a reboot, the lock LUN comes up cleanly without errors, as the log below shows:

Feb 8 03:22:11 lsgpas17 cmcld[6847]: Lock LUN Device is /dev/sda1
Feb 8 03:22:11 lsgpas17 cmcld[6847]: The quorum device localhost is being initialized.
Feb 8 03:22:11 lsgpas17 cmcld[6847]: rcomm health: Initializing timeout to 120000000 microseconds
Feb 8 03:22:11 lsgpas17 cmlocklund[6880]: Total allocated: 540672 bytes, used: 0 bytes, unused 540672 bytes
Feb 8 03:22:11 lsgpas17 cmlocklund[6880]: Port number returned by locklund_setup: 32786
Feb 8 03:22:11 lsgpas17 cmlocklund[6880]: Disk device /dev/sda1 has been associated (bound) to /dev/raw/raw1.
Feb 8 03:22:11 lsgpas17 cmcld[6847]: Lock LUN initialized (port = 32786).

Yahoo!/cliff