Operating System - HP-UX

serviceguard failover problem

 
CGEYROTH
Frequent Advisor

serviceguard failover problem

We have 2 x L2000 (rp5450) servers in a Serviceguard configuration. This is set up as a campus cluster, with one node + FC60 controller + SC10 in each data centre. The cluster is set up with two cluster lock disks, one in each array.

We had a failure on the production server which caused a failover to the node/array in the second centre. However, the node in the second centre hung while starting the package; the following errors were in the package control log when it tried to activate the first of the 9 Serviceguard volume groups:

Feb 27 10:02:55 - "hostname": Activating volume group vg01 with exclusive option.
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c9t0d0":
The path of the physical volume refers to a device that does not exist, or is not configured into the kernel.
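(For anyone hitting the same vgchange message, the usual first checks on the node that failed the activation are along these lines, using the device file from the log above; a sketch, not a fix in itself:)

# ioscan -fnC disk             # is c9t0d0 CLAIMED, or showing NO_HW?
# insf -e -C disk              # recreate any missing disk device files
# diskinfo /dev/rdsk/c9t0d0    # does the raw device actually answer?
# strings /etc/lvmtab          # is c9t0d0 what lvmtab lists for vg01?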
melvyn burnard
Honored Contributor

Re: serviceguard failover problem

Looks like you need to check the LVM configuration and disk availability on the second node.
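For example, something like this on each node (prod and backup are the hostnames used later in the thread; remsh is assumed to be configured, ssh would do equally):

# strings /etc/lvmtab                          # device files each VG expects on this node
# ioscan -fnC disk | grep -e CLAIMED -e NO_HW  # disks the node can actually see
# remsh prod strings /etc/lvmtab               # compare against the other node's view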
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Ravi_8
Honored Contributor

Re: serviceguard failover problem


Use vgcfgrestore on the second node.

Also check for a loose connection to the disk (use ioscan -fnC disk).
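For example (a sketch, assuming a configuration backup for vg01 exists in /etc/lvmconf from an earlier vgcfgbackup):

# ioscan -fnC disk                             # confirm the disk is CLAIMED first
# vgcfgrestore -l -n /dev/vg01                 # list what the saved config holds
# vgcfgrestore -n /dev/vg01 /dev/rdsk/c9t0d0   # write the LVM headers back to the PV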
never give up
Keith Bryson
Honored Contributor

Re: serviceguard failover problem

Hi

Is there any way that you can restore the cluster to the working node and re-vgexport ALL VGs? You can then re-import the LVM configuration on the failover node.
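As a sketch on the working node (the vg02..vg09 names are assumed from the "9 volume groups" above; -p keeps vgexport in preview mode, so nothing is actually removed):

for vg in vg01 vg02 vg03 vg04 vg05 vg06 vg07 vg08 vg09
do
    vgexport -p -s -v -m /tmp/${vg}.map ${vg}    # writes the map file only
done
# then copy /tmp/*.map to the failover node and vgimport -s -v -m each one there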

Keith
Arse-cover at all costs
CGEYROTH
Frequent Advisor

Re: serviceguard failover problem

melvyn,

When you say check the LVM config, what do you mean: the LVM config files, or something else?
When you say check the disk availability, do you mean ioscan, diskinfo? I'm not sure I can run diskinfo on the disk in question unless the disks are activated on that node.

I did run a script written by Dietmar Konermann that uses the VGDA to identify the device names from each node, and this is what it shows:

***** LVM-VG: 0161901557-0965999312
2 backup:c8t0d0 0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  backup:c9t0d0 0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c6t0d0   0161901557-0971456414 0/2/0/0.8.0.4.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)
  prod:c9t0d0   0161901557-0971456414 0/6/0/0.8.0.5.0.0.0 HP/A5277A (0x01/vg01/0161901557-0965999312)

As you can see, both prod and backup share the same device ID for one of the routes to the disk. Is that normal?
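(For reference, lssf on each node shows which hardware path a device file resolves to:)

# lssf /dev/dsk/c9t0d0    # prints the hardware path behind the device file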
Bharat Katkar
Honored Contributor

Re: serviceguard failover problem

Hi,
An LVM configuration issue means the VG configuration is not properly replicated across the two nodes.

Normally what we do is

On node 1:
1. We activate the VG and create a map file using the command "vgexport -s -v -p -m mapfilename vgname" (-p is preview mode, so the VG is not actually exported).

2. Identify the PVs used by the VG on node 1 and confirm that the corresponding disks seen from node 2 (e.g. PV1 and PV2) belong to that VG.

3. On node 2:
# mkdir /dev/vgxx
# mknod /dev/vgxx/group c 64 0xNN0000
(NN being a hex minor number not already used by another VG on that node)

4. Copy this map file to node 2, then run "vgimport -s -v -m mapfile /dev/vgxx". With -s, vgimport finds the PVs by scanning the disks for the VGID in the map file, so PV1 and PV2 do not need to be listed.

5. Then deactivate the VG on node 1 and you can activate it on node 2 (a minimal end-to-end sketch of these steps follows below).

So the VG won't activate if the map file is not correct. It basically copies the entire VG structure from node 1 to node 2.
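A minimal end-to-end sketch of steps 1-5 (node1/node2, vg01 and the 0x010000 minor number are placeholders; pick a minor that is free on node 2):

On node 1, with the VG active:
# vgexport -p -s -v -m /tmp/vg01.map vg01
# rcp /tmp/vg01.map node2:/tmp/vg01.map

On node 2:
# mkdir /dev/vg01
# mknod /dev/vg01/group c 64 0x010000
# vgimport -s -v -m /tmp/vg01.map /dev/vg01

Hand the VG over:
# vgchange -a n vg01    (on node 1)
# vgchange -a y vg01    (on node 2; Serviceguard itself activates with -a e)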

See man vgexport and vgimport.

Hope that helps.
Regards,
You need to know a lot to actually know how little you know
CGEYROTH
Frequent Advisor

Re: serviceguard failover problem

OK Bharat, what is strange is that this cluster has been built and running for 5 years, so no changes should have taken place. However, it increasingly looks like a rebuild job.

What is also strange is that both servers share the same device ID to this disk. Is that right?

Also, when the backup server couldn't see the array in the main data centre (and should therefore have started the package from its local array), I couldn't do a diskinfo on the disk c9t0d0. However, once the link was re-established I could do a diskinfo from the backup server to that disk device!
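(Looking at it again, that would make sense if the path itself was down: diskinfo queries the device over the path, and it needs the raw device file, i.e.:)

# diskinfo /dev/rdsk/c9t0d0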
melvyn burnard
Honored Contributor

Re: serviceguard failover problem

My advice is to check and confirm which disks are in use for each VG, then confirm your lvmtab file, compare the two nodes, etc.
Do an ioscan to see what hardware is showing as NO_HW, etc.
Also check that your package scripts start the VGs in exclusive mode with no quorum: in the layout you have, you will not meet VG quorum requirements if contact is lost with the other side.
Also confirm where this disk lies, i.e. is it local or remote to the node in question, and check your syslogs for any other data.
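For reference, in the package control script the relevant setting is usually along these lines (-a e for exclusive activation, -q n to skip the quorum check):

VGCHANGE="vgchange -a e -q n"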
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!