
serviceguard cluster

 
SOLVED
NDO
Super Advisor


Hi all!

I recently joined a company that has, among other things, a 2-node cluster running hp-ux 11.33. Its storage is on a NetApp, it uses Veritas as opposed to LVM, and its MCSG version is A.11.19.00. I don't have any training on MC/ServiceGuard, but I was asked to find out why the cluster crashes twice a month for no apparent reason. I set up a script to check the network, but what I found was the following:
1. When running cmviewcl -v, it returns unknown lock disks.
2. When going to SAM, the IP address of the server is not present (/etc/hosts), only one IP, which I presume is for the heartbeat.
3. When I go to /etc/cmcluster to view any logs or configuration files, I cannot see any of them.

Please can you tell me where I can find the logs or any other relevant information, and where this heartbeat IP is configured?
Please help

F.R.
Rita C Workman
Honored Contributor
Solution

Re: serviceguard cluster

First and foremost, go to the following URL and download the cluster administration guides:
http://docs.hp.com/en/ha.html

1. To view what the actual standing of the current cluster is, run:
cmscancl -o

That will create a file that gives you the information on the existing cluster and its nodes. This is your starting point!

2. Run:
cmviewcl -v >

This creates the above file and gives you detailed information about the running cluster.


2. Look at /etc/hosts and confirm the following exists:
Each MC/SG node IP
Each MC/SG package IP
Each MC/SG heartbeat IP

Now make sure these entries exist on every node, in their respective /etc/hosts files.
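
The /etc/hosts check described above can be sketched as a small shell function. This is only an illustration: the function name missing_from_hosts is hypothetical, and the names passed to it in the usage line are the node names that appear later in this thread, so substitute your own node, package and heartbeat names or addresses.

```shell
# Hypothetical helper: print every name or IP from the argument list that
# does not appear as a whole field on a non-comment line of the hosts file.
missing_from_hosts() {
    hosts_file=$1
    shift
    for entry in "$@"; do
        awk -v want="$entry" '
            { sub(/#.*/, "") }    # strip comments before looking at fields
            { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file" || printf '%s\n' "$entry"
    done
}

# Usage (run on each node and compare the results):
# missing_from_hosts /etc/hosts dbnode0 dbnode1
```

An empty result means every listed entry was found; anything printed is missing from that node's file.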

3. Your first job is to look and find out exactly what the lockdisk is. Make sure that your lockdisk can be seen by every node in your cluster.
Not all, but many, sites make the lockdisk a simple one-disk volume group. If that is the case in your small cluster, then confirm that the volume group is not just active but owned by the cluster. Now, I know LVM, but I am not a Veritas command person. So, if you find your lock disk is a separate volume group, then you need to make it owned by the cluster, and exclusive to the node it exists on.
Example using LVM commands on a currently active volume group (not part of the cluster):
vgchange -a n /dev/vglock
vgchange -c y /dev/vglock

At this stage, the cluster, when it starts up, would take /dev/vglock and change it to exclusive by running
vgchange -a e /dev/vglock

Lastly,
For logs:
Package logs are located (depends on your box) under the package subdirectory:
Ex: /etc/cmcluster/packages//

I would suggest you go back and review your syslog.log file to see what you can find first. There is always a reason....no apparent reason simply means they don't know SG.

I don't know how much this will help you, but I hope it does give you some starting point.

Regards,
Rita

NDO
Super Advisor

Re: serviceguard cluster

Hi Rita!

Thanks for your help, very good one. I'm worried about this: when I run cmviewcl -v it tells me that the lock disks are in an UNKNOWN state. Is this the problem? I will follow your advice, and I did find a PDF with some worksheets that I have filled in.
Thanks again

FR
Stephen Doud
Honored Contributor

Re: serviceguard cluster

If a cluster lock VG is configured but missing, it can produce a system crash in the event that all heartbeat networks are disabled for some reason.

To reconstitute the cluster configuration ASCII file, run:
# cd /etc/cmcluster
# cmgetconf cluster.ascii

The cluster lock FIRST_CLUSTER_LOCK_PV references for each node may differ if the device files on each node are ordered differently by instance number. Use 'ioscan -kfnC disk' on each node to compare the device file naming used for the given hardware paths.
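
Once cmgetconf has written cluster.ascii, the per-node lock PV and heartbeat entries can be pulled out of it for a side-by-side comparison. The sketch below is an assumption-laden illustration: the function name lock_and_heartbeat_summary is hypothetical, and it assumes the file uses the NODE_NAME, HEARTBEAT_IP and FIRST_CLUSTER_LOCK_PV keywords, one per line, as a typical cluster ASCII file does.

```shell
# Hypothetical helper: summarize, per node, the heartbeat IPs and the
# cluster lock PV recorded in a Serviceguard cluster ASCII file.
lock_and_heartbeat_summary() {
    awk '
        { sub(/#.*/, "") }    # ignore comments
        $1 == "NODE_NAME"             { node = $2 }
        $1 == "HEARTBEAT_IP"          { print node, "heartbeat", $2 }
        $1 == "FIRST_CLUSTER_LOCK_PV" { print node, "lock_pv", $2 }
    ' "$1"
}

# Usage:
# lock_and_heartbeat_summary /etc/cmcluster/cluster.ascii
```

Each output line has the form "node keyword value", which also shows where the heartbeat IP is configured.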

What crash/panic messages are in /etc/shutdownlog?
If you have a software support contract with HP, you can engage us to help analyze the crash dumps in /var/adm/crash.

For Serviceguard commands to operate correctly, every fixed IP on each server must be listed in /etc/hosts, and aliased to the simple hostname of that server. This requirement is validated in the Managing Serviceguard manual that Rita pointed you to.


Package log destination can be determined using either
# cmviewconf | grep log
or
# cmviewcl -v -f line | grep log

/var/adm/syslog/syslog.log captures some data that is helpful with conditions surrounding Serviceguard.
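
As a rough way to pick the Serviceguard messages out of syslog.log, the sketch below filters on the daemon name. It assumes that the daemons of interest all have process names starting with "cm" and log in the usual "name[pid]:" form; the function name sg_syslog_lines is hypothetical.

```shell
# Hypothetical helper: show syslog lines written by processes whose name
# starts with "cm" (assumed here to be the Serviceguard daemons).
sg_syslog_lines() {
    grep -E ' cm[a-z]+\[[0-9]+\]: ' "$1"
}

# Usage, narrowed to lock disk trouble:
# sg_syslog_lines /var/adm/syslog/syslog.log | grep -i 'lock disk'
```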

NDO
Super Advisor

Re: serviceguard cluster

Stephen!
There are no panics, nothing in /var/adm/crash. The only strange thing that I saw was the following in the syslog file:
Aug 13 00:10:50 dbnode0 cmdisklockd[4936]: Unable to convert device to I/O tree node: I/O tree node does not exist.
Aug 13 00:10:50 dbnode0 cmdisklockd[4936]: Failed to configure lock disk /dev/disk/disk97, will retry
Aug 13 00:10:52 dbnode0 cmserviced[4941]: Request to perform run service cmlockd
Aug 13 00:10:52 dbnode0 cmlockd[4948]: Changed to working directory /var/adm/cmcluster/cmlockd.
Aug 13 00:10:52 dbnode0 cmlockd[4948]: Executing command: rm -f /var/adm/cmcluster/.cmlock.*.socket
Aug 13 00:12:05 dbnode0 cmdisklockd[4936]: Unable to convert device to I/O tree node: I/O tree node does not exist.



# cat syslog.log | grep -i warning
Aug 13 00:10:28 dbnode0 vmunix: GAB WARNING V-15-1-20115 Port d registration failed, GAB not configured
Aug 13 00:10:28 dbnode0 vmunix: ODM WARNING V-41-6-5 odm_gms_api_start_msgs fails
#


If I run cmviewcl -v it shows me the status of the lock disks as UNKNOWN.

FR
Rita C Workman
Honored Contributor

Re: serviceguard cluster

Nandinho,

If you find that the disks are 'unknown' then that is definitely your problem. The disks cannot be used in that state. Since 'unknown' can mean a lot of things, and I have no idea what those disks are or how they are set up, you need to get the disks back to 'claimed'.

Or you need to set up a new disk as the lock disk, and change your cluster and nodes accordingly.

Hi Stephen !!

Regards,
Rita
NDO
Super Advisor

Re: serviceguard cluster

Hi Rita!

It's very strange: if I do ioscan those disks are CLAIMED, but if I run cmviewcl -v it shows UNKNOWN. It's a pity I'm not in the office now, but tomorrow I'll send you attachments with the output of those commands.

F.R.
melvyn burnard
Honored Contributor

Re: serviceguard cluster

Well, the physical disks may respond, but they appear NOT to be marked as Cluster Lock disks.
You may need to halt the cluster, activate the Cluster Lock VG(s), and then re-apply the cluster lock bits using cmapplyconf.
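
The three steps above can be written down as a dry-run plan to review before a maintenance window. This is a sketch, not a verified procedure: the function name print_relock_plan is hypothetical, it only prints commands and executes nothing, and the final deactivate and restart steps are assumptions added for completeness rather than part of the advice above. Check the Managing Serviceguard manual for your release before running anything.

```shell
# Hypothetical dry-run helper: PRINT (do not execute) the halt / activate /
# re-apply sequence so it can be reviewed first.
print_relock_plan() {
    lock_vg=$1       # e.g. /dev/vglock
    ascii_file=$2    # e.g. /etc/cmcluster/cluster.ascii (from cmgetconf)
    # Assumption: if the VG is marked cluster-aware, activating it while the
    # cluster is down may be refused until that flag is cleared (vgchange -c n).
    printf '%s\n' \
        "cmhaltcl -f" \
        "vgchange -a y $lock_vg" \
        "cmapplyconf -C $ascii_file" \
        "vgchange -a n $lock_vg" \
        "cmruncl"
}

# Usage:
# print_relock_plan /dev/vglock /etc/cmcluster/cluster.ascii
```

Printing the plan instead of running it keeps the cluster untouched until someone who knows the environment has checked every line.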
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
NDO
Super Advisor

Re: serviceguard cluster

Hi!

I'll come back to you all tomorrow. On this side of the world it's already dark, 20:30 local time, and I've already left the office.
I'll try tomorrow.

F.R.
NDO
Super Advisor

Re: serviceguard cluster

Hi

this is the O/P of cmviewcl on one node:
NODE       STATUS    STATE
dbnode1    up        running

Cluster_Lock_LVM:
VOLUME_GROUP    PHYSICAL_VOLUME      STATUS
/dev/vglock     /dev/disk/disk100    unknown

Network_Parameters:
INTERFACE    STATUS    PATH             NAME
PRIMARY      up        LinkAgg0         lan900
PRIMARY      up        0/0/6/1/0        lan2
STANDBY      up        1/0/1/1/0/6/0    lan6

and that is the O/P on the other:

NODE       STATUS    STATE
dbnode0    up        running

Cluster_Lock_LVM:
VOLUME_GROUP    PHYSICAL_VOLUME     STATUS
/dev/vglock     /dev/disk/disk97    unknown

Network_Parameters:
INTERFACE    STATUS    PATH             NAME
PRIMARY      up        LinkAgg0         lan900
PRIMARY      up        0/0/6/1/0        lan2
STANDBY      up        1/0/1/1/0/6/0    lan6

I've got a feeling that this UNKNOWN status comes from the fact that lock disks must be configured in LVM, and in this case VERITAS was used. I might be wrong; correct me if so. As you can see, the device file names are in the new format (persistent DSFs), but the disks in vg00 are in the usual format /dev/dsk/CxTxDx.


F.R.
melvyn burnard
Honored Contributor

Re: serviceguard cluster

>I've got a feeling that this UNKNOWN status comes from the fact that lock disks must be configured in LVM and on this case it was used VERITAS.

No, you CANNOT use a VxVM DG disk for the Cluster Lock disk; it has to be in an LVM Volume Group.
Other methods are to use a Lock LUN (not in ANY VxVM DG or LVM VG), or a Quorum Server that is a node OUTSIDE the cluster.

>As you can see the device filenames are in the new format (DSF v PERSISTENT). But the disks in vg00 are in the usual format /dev/dsk/CxTxDx

The Legacy and Agile addressing can be mixed and matched, should be no problem.
The UNKNOWN state means that the disks that are configured to be used as the cluster lock disk do NOT have the bits set to indicate this. You need to correct this as per my previous response, or, if you do NOT want to take the cluster down, contact your local HP Response Centre and log a call requesting the unsupported cminitlock utility.
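
To spot this condition quickly on each node, the lock disk status can be picked out of the cmviewcl -v output. The sketch below assumes the Cluster_Lock_LVM section has the three-column layout shown earlier in this thread (VOLUME_GROUP, PHYSICAL_VOLUME, STATUS); the function name unknown_lock_disks is hypothetical.

```shell
# Hypothetical helper: read `cmviewcl -v` output on stdin and print the
# physical volume of every cluster lock disk whose STATUS is "unknown".
unknown_lock_disks() {
    awk '
        $1 == "Cluster_Lock_LVM:"       { in_lock = 1; next }
        in_lock && $1 == "VOLUME_GROUP" { next }    # column header row
        in_lock && NF == 3 {
            if (tolower($3) == "unknown") print $2
            next
        }
        { in_lock = 0 }    # blank line or the start of another section
    '
}

# Usage:
# cmviewcl -v | unknown_lock_disks
```

No output means no lock disk in that section was reported as unknown.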
NDO
Super Advisor

Re: serviceguard cluster

Hi Melvyn!

I will log a call with HP, but just one more query: if I do "ioscan -m dsf /dev/disk/disk97", which is the LOCK disk, it shows me the corresponding "/dev/dsk/CxTxDx", and if I do ioscan for those corresponding disks, they are CLAIMED. So those disks seem to be fine, but somehow there is a problem with them!!

F.R.
melvyn burnard
Honored Contributor

Re: serviceguard cluster

The problem is that the cluster lock bit is NOT set, as per the UNKNOWN status. That does NOT mean the disk/LUN is faulty.
NDO
Super Advisor

Re: serviceguard cluster

Hi Melvyn!


Thank you for your help, you too Rita, 10 points for both of you.

F.R.

Next I'm assigning points.
NDO
Super Advisor

Re: serviceguard cluster

Thanks to all

F.R.