Operating System - HP-UX

serviceguard failover problem

 
CGEYROTH
Frequent Advisor

serviceguard failover problem

We have 2 x L2000 (rp5450) servers in a Serviceguard configuration. This is set up as a campus cluster, with one node + FC60 controller + SC10 in each data centre. The cluster is set up with two cluster lock disks, one in each array.

We had a failure on the production server which caused a failover to the node/array in the second centre. However, the node in the second centre hung while starting the package; the following errors were in the package control log when it tried to activate the first of the 9 Serviceguard volume groups:

Feb 27 10:02:55 - "hostname": Activating volume group vg01 with exclusive option.
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c9t0d0":
The path of the physical volume refers to a device that does not exist, or is not configured into the kernel.
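(For anyone hitting the same vgchange message, the usual first checks on the node that failed the activation are along these lines, using the device file from the log above; a sketch, not a fix in itself:)

# ioscan -fnC disk             # is c9t0d0 CLAIMED, or showing NO_HW?
# insf -e -C disk              # recreate any missing disk device files
# diskinfo /dev/rdsk/c9t0d0    # does the raw device actually answer?
# strings /etc/lvmtab          # is c9t0d0 what lvmtab lists for vg01?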
melvyn burnard
Honored Contributor

Re: serviceguard failover problem

Looks like you need to check the LVM configuration and disk availability on the second node.
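For example, something like this on each node (prod and backup are the hostnames used later in the thread; remsh is assumed to be configured, ssh would do equally):

# strings /etc/lvmtab                          # device files each VG expects on this node
# ioscan -fnC disk | grep -e CLAIMED -e NO_HW  # disks the node can actually see
# remsh prod strings /etc/lvmtab               # compare against the other node's view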
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Ravi_8
Honored Contributor

Re: serviceguard failover problem


Use vgcfgrestore on the second node.

Also check for a loose connection to the disk (use ioscan -fnC disk).
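For example (a sketch, assuming a configuration backup for vg01 exists in /etc/lvmconf from an earlier vgcfgbackup):

# ioscan -fnC disk                             # confirm the disk is CLAIMED first
# vgcfgrestore -l -n /dev/vg01                 # list what the saved config holds
# vgcfgrestore -n /dev/vg01 /dev/rdsk/c9t0d0   # write the LVM headers back to the PV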
never give up
Keith Bryson
Honored Contributor

Re: serviceguard failover problem

Hi

Is there any way that you can restore the cluster to the working node and re-vgexport ALL VGs? You can then re-import the LVM configuration on the failover node.
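As a sketch on the working node (the vg02..vg09 names are assumed from the "9 volume groups" above; -p keeps vgexport in preview mode, so nothing is actually removed):

for vg in vg01 vg02 vg03 vg04 vg05 vg06 vg07 vg08 vg09
do
    vgexport -p -s -v -m /tmp/${vg}.map ${vg}    # writes the map file only
done
# then copy /tmp/*.map to the failover node and vgimport -s -v -m each one there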

Keith
Arse-cover at all costs
CGEYROTH
Frequent Advisor

Re: serviceguard failover problem

melvyn,

When you say check the LVM config, what do you mean: the LVM config files, or something else?
When you say check the disk availability, do you mean ioscan, diskinfo? I'm not sure I can run diskinfo on the disk in question unless the disks are activated on that node.

I did run a script written by Dietmar Konermann that uses the VGDA to identify the device names from each node, and this is what it shows:

***** LVM-VG: 0161901557-0965999312
2 backup:c8t0d0 0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  backup:c9t0d0 0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c6t0d0   0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c9t0d0   0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)

As you can see, both prod and backup share the same device ID for one of the routes to the disk. Is that normal?
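(For reference, lssf on each node shows which hardware path a device file resolves to:)

# lssf /dev/dsk/c9t0d0    # prints the hardware path behind the device file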
Bharat Katkar
Honored Contributor

Re: serviceguard failover problem

Hi,
An LVM configuration issue means the VG configuration is not properly replicated across the two nodes.

Normally what we do is

On node 1:
1. We activate the VG and create a map file using the command "vgexport -s -v -p -m mapfilename vgname" (-p is preview mode, so the VG is not actually exported).

2. Identify the PVs used by the VG on node 1 and confirm that the corresponding disks seen from node 2 (e.g. PV1 and PV2) belong to that VG.

3. On node 2:
# mkdir /dev/vgxx
# mknod /dev/vgxx/group c 64 0xNN0000
(NN being a hex minor number not already used by another VG on that node)

4. Copy this map file to node 2, then run "vgimport -s -v -m mapfile /dev/vgxx". With -s, vgimport finds the PVs by scanning the disks for the VGID in the map file, so PV1 and PV2 do not need to be listed.

5. Then deactivate the VG on node 1 and you can activate it on node 2 (a minimal end-to-end sketch of these steps follows below).

So the VG won't activate if the map file is not correct. It basically copies the entire VG structure from node 1 to node 2.
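A minimal end-to-end sketch of steps 1-5 (node1/node2, vg01 and the 0x010000 minor number are placeholders; pick a minor that is free on node 2):

On node 1, with the VG active:
# vgexport -p -s -v -m /tmp/vg01.map vg01
# rcp /tmp/vg01.map node2:/tmp/vg01.map

On node 2:
# mkdir /dev/vg01
# mknod /dev/vg01/group c 64 0x010000
# vgimport -s -v -m /tmp/vg01.map /dev/vg01

Hand the VG over:
# vgchange -a n vg01    (on node 1)
# vgchange -a y vg01    (on node 2; Serviceguard itself activates with -a e)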

See man vgexport and vgimport.

Hope that helps.
Regards,
You need to know a lot to actually know how little you know
CGEYROTH
Frequent Advisor

Re: serviceguard failover problem

OK Bharat, what is strange is that this cluster has been built and running for 5 years, so no changes should have taken place. However, it increasingly looks like a rebuild job.

What is also strange is that both servers share the same device ID to this disk. Is that right?

Also, when the backup server couldn't see the array in the main data centre (and should therefore have started the package from its local array), I couldn't do a diskinfo on the disk c9t0d0. However, once the link was re-established I could do a diskinfo from the backup server to that disk device!
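(Looking at it again, that would make sense if the path itself was down: diskinfo queries the device over the path, and it needs the raw device file, i.e.:)

# diskinfo /dev/rdsk/c9t0d0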
melvyn burnard
Honored Contributor

Re: serviceguard failover problem

My advice is to check and confirm which disks are in use for each VG, then confirm your lvmtab file, compare the two nodes, etc.
Do an ioscan to see what hardware is showing as NO_HW, etc.
Also check that your package scripts start the VGs in exclusive mode with no quorum: in the layout you have, you will not meet VG quorum requirements if contact is lost with the other side.
Also confirm where this disk lies, i.e. is it local or remote to the node in question, and check your syslogs for any other data.
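For reference, in the package control script the relevant setting is usually along these lines (-a e for exclusive activation, -q n to skip the quorum check):

VGCHANGE="vgchange -a e -q n"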
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!