Serviceguard

Major VG problems after MC/SG cluster crashed hard

John Rayl
Occasional Advisor

Major VG problems after MC/SG cluster crashed hard

Storage is EVA 6000
Type: HSV200
Version: 6100
Software: CR0EB0xc3p-6100

Two-node SGLX cluster: RHEL AS 4, SG 11.18, ProLiant DL380 G5 servers with two 4 Gb QLogic FC cards each.

Several 1TB LUNs, three 500 GB.

A day ago, the first node had a hard disk failure (3-drive RAID) that left the machine barely able to boot, with no way to log in.

The NFS/SMB fileserver package moved to node two with no problems.

While working on getting node one back up, node two crashed.

Of course SGLX could not start, since node one was not back up to form a cluster.

For a sanity check I wanted to just activate the VGs, mount the lvols, and give the data a once-over.

No such luck. No VGs to activate.

pvscan -d output:


Incorrect metadata area header checksum
Incorrect metadata area header checksum
Incorrect metadata area header checksum
PV /dev/cciss/c0d0p2 VG VolGroup00 lvm2 [136.56 GB / 96.00 MB free]
PV /dev/sda1 lvm2 [1023.80 MB]
PV /dev/sdb1 lvm2 [999.99 GB]
PV /dev/sdc1 lvm2 [900.00 GB]
PV /dev/sdd1 lvm2 [350.00 GB]
PV /dev/sde1 lvm2 [499.99 GB]
PV /dev/sdf1 lvm2 [549.99 GB]
8 REPLIES
John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

vgscan output:

Reading all physical volumes. This may take a while...
Incorrect metadata area header checksum
Found volume group "VolGroup00" using metadata type lvm2

No problems with this cluster or the EVA for nearly two years, then this...

I do have very recent snapshots on the EVA but that is it.

I need to recover this data!
Matti_Kurkela
Honored Contributor

Re: Major VG problems after MC/SG cluster crashed hard

Multiple hardware failures. Ouch.

"Incorrect metadata area header checksum" sounds like the system crashed at the very moment it was updating the LVM metadata.

You'll need an LVM metadata consistency check (vgck); the vgscan command will only pick up intact VGs.

What does "vgck -v" report?

For more diagnostics, you might run "vgscan -vvv". The output may be rather large: redirect it to a file.

To check for Serviceguard's VG lock tags, the command "vgs -o +tags" might be useful too. If the node that was running the package has crashed, the lock tag may still be in place; you may have to remove it before you can activate the VG manually.
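If a stale tag does turn up, clearing it is straightforward. A sketch only: the VG name and dead-node name below are placeholders, not values from this thread, and DRY_RUN=1 prints the commands rather than running them.

```shell
# Sketch: inspect and clear a stale Serviceguard activation tag on a VG.
# VG and node names are placeholders. DRY_RUN=1 prints instead of runs.
clear_sg_tag() {
    vg="$1" dead_node="$2"
    run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

    run vgs -o vg_name,vg_tags --noheadings "$vg"   # show current tags (read-only)
    run vgchange --deltag "$dead_node" "$vg"        # drop the crashed node's tag
    run vgchange --addtag "$(hostname)" "$vg"       # claim the VG for this node
}

DRY_RUN=1 clear_sg_tag vg_04 dead-node.example.com
```

Unset DRY_RUN only once you are sure the other node is really down, or you risk activating the VG on two nodes at once.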

MK
John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

Wow, thanks for the quick reply Matti!

Here is output from your suggestions:

vgck -v
Finding all volume groups
Incorrect metadata area header checksum
Finding volume group "VolGroup00"

vgs -o +tags
Incorrect metadata area header checksum
VG #PV #LV #SN Attr VSize VFree VG Tags
VolGroup00 1 2 0 wz--n- 136.56G 96.00M

John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

Output from the vgscan -vvv is attached in my message above.
John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

The applicable part of the vgscan -vvv output:

Opened /dev/sdb RO
/dev/sdb: size is 2097152000 sectors
/dev/sdb: block size is 4096 bytes
/dev/sdb: Skipping: Partition table signature found
Closed /dev/sdb
/dev/md16: Skipping (sysfs)
Opened /dev/sdb1 RO
/dev/sdb1: size is 2097141102 sectors
Closed /dev/sdb1
/dev/sdb1: size is 2097141102 sectors
Opened /dev/sdb1 RO O_DIRECT
/dev/sdb1: block size is 1024 bytes
Closed /dev/sdb1
Using /dev/sdb1
Opened /dev/sdb1 RO O_DIRECT
/dev/sdb1: block size is 1024 bytes
/dev/sdb1: lvm2 label detected
lvmcache: /dev/sdb1: now orphaned
Closed /dev/sdb1
Opened /dev/sdc RO
/dev/sdc: size is 1887436800 sectors
/dev/sdc: block size is 4096 bytes
/dev/sdc: Skipping: Partition table signature found
Closed /dev/sdc
Opened /dev/sdc1 RO
/dev/sdc1: size is 1887428592 sectors
Closed /dev/sdc1
/dev/sdc1: size is 1887428592 sectors
Opened /dev/sdc1 RO O_DIRECT
/dev/sdc1: block size is 4096 bytes
Closed /dev/sdc1
Using /dev/sdc1
Opened /dev/sdc1 RO O_DIRECT
/dev/sdc1: block size is 4096 bytes
/dev/sdc1: lvm2 label detected
lvmcache: /dev/sdc1: now orphaned
Closed /dev/sdc1
Opened /dev/sdd RO
/dev/sdd: size is 734003200 sectors
/dev/sdd: block size is 4096 bytes
/dev/sdd: Skipping: Partition table signature found
Closed /dev/sdd
Opened /dev/sdd1 RO
/dev/sdd1: size is 733993722 sectors
Closed /dev/sdd1
/dev/sdd1: size is 733993722 sectors
Opened /dev/sdd1 RO O_DIRECT
/dev/sdd1: block size is 1024 bytes
Closed /dev/sdd1
Using /dev/sdd1
Opened /dev/sdd1 RO O_DIRECT
/dev/sdd1: block size is 1024 bytes
/dev/sdd1: lvm2 label detected
lvmcache: /dev/sdd1: now orphaned
Closed /dev/sdd1
Opened /dev/sde RO
/dev/sde: size is 1048576000 sectors
/dev/sde: block size is 4096 bytes
/dev/sde: Skipping: Partition table signature found
Closed /dev/sde
Opened /dev/sde1 RO
/dev/sde1: size is 1048562487 sectors
Closed /dev/sde1
/dev/sde1: size is 1048562487 sectors
Opened /dev/sde1 RO O_DIRECT
/dev/sde1: block size is 512 bytes
Closed /dev/sde1
Using /dev/sde1
Opened /dev/sde1 RO O_DIRECT
/dev/sde1: block size is 512 bytes
/dev/sde1: lvm2 label detected
lvmcache: /dev/sde1: now orphaned
Closed /dev/sde1
Opened /dev/sdf RO
/dev/sdf: size is 1153433600 sectors
/dev/sdf: block size is 4096 bytes
/dev/sdf: Skipping: Partition table signature found
Closed /dev/sdf
Opened /dev/sdf1 RO
/dev/sdf1: size is 1153418742 sectors
Closed /dev/sdf1
/dev/sdf1: size is 1153418742 sectors
Opened /dev/sdf1 RO O_DIRECT
/dev/sdf1: block size is 1024 bytes
Closed /dev/sdf1
Using /dev/sdf1
Opened /dev/sdf1 RO O_DIRECT
/dev/sdf1: block size is 1024 bytes
/dev/sdf1: lvm2 label detected
lvmcache: /dev/sdf1: now orphaned
Closed /dev/sdf1
John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

SWEEEEETT!! I figured it out! I just resurrected THE DEAD!

After I thought the Google well had run dry, I went back one more time and found this little gem:

pvcreate --uuid "cqH4SD-VrCw-jMsN-GcwH-omCq-ThpE-dO9KmJ" --restorefile /etc/lvm/backup/vg_04 /dev/sdd1

Now this ONLY works if you have vgcfgbackup left ON! So leave it on for times like this, when you must carry out digital miracles!

IF you have automatic vgcfgbackup enabled, then every time the VG metadata changes (on a vgchange, for example) a file gets created in /etc/lvm/backup, and older versions get moved to /etc/lvm/archive.
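For reference, the knobs that control this live in the backup section of /etc/lvm/lvm.conf. These are the stock LVM2 settings (defaults shown), not anything special to this cluster:

```
backup {
    # Automatic metadata backup after every VG change
    backup = 1
    backup_dir = "/etc/lvm/backup"

    # Keep prior versions of the metadata
    archive = 1
    archive_dir = "/etc/lvm/archive"
    retain_min = 10       # keep at least this many archives
    retain_days = 30      # keep archives at least this many days
}
```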

Look for the latest one in either spot, then look in the file for the id next to the /dev/sdXX device file name, under the "physical_volumes" section:

physical_volumes {

pv0 {
id = "cqH4SD-VrCw-jMsN-GcwH-omCq-ThpE-dO9KmJ"
device = "/dev/sdd1" # Hint only

status = ["ALLOCATABLE"]
pe_start = 384
pe_count = 11199 # 349.969 Gigabytes
}
}
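Digging that id out by eye works fine, but a little helper can grep it from the backup file too. pv_uuid_for_dev is my own hypothetical name, not an LVM command; it just walks the file and reports the id that precedes the matching device hint:

```shell
# Hypothetical helper: pull the PV UUID for a given device out of a
# vgcfgbackup file such as /etc/lvm/backup/vg_04.
pv_uuid_for_dev() {
    # $1 = backup/archive file, $2 = device name (the "Hint only" field)
    awk -v dev="$2" '
        $1 == "id"     { gsub(/"/, "", $3); uuid = $3 }    # remember last id seen
        $1 == "device" { gsub(/"/, "", $3)                 # device hint follows its id
                         if ($3 == dev) { print uuid; exit } }
    ' "$1"
}
```

Then something like pvcreate --uuid "$(pv_uuid_for_dev /etc/lvm/backup/vg_04 /dev/sdd1)" ... saves retyping that long UUID by hand.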

So you run (example UUID):
pvcreate --uuid "cqH4SD-VrCw-jMsN-GcwH-omCq-ThpE-dO9KmJ" --restorefile /etc/lvm/backup/vg_04 /dev/sdd1

vgcfgrestore vg_04

Activation tags, if you use them in your SGLX cluster: vgchange --addtag machine.name.com vg_04

vgchange -a y vg_04

fsck /dev/vg_04/lvolxxx

mount /dev/vg_04/lvolxxx /back/from/the/dead
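Pulled together, the whole resurrection sequence looks something like this. A sketch only: the UUID and device are the examples from above, the lvol name and tag are placeholder values of mine, and DRY_RUN=1 prints each command instead of running it.

```shell
# Sketch of the full recovery sequence from this thread. Unset DRY_RUN only
# when you are sure. UUID/device are the thread's examples; lvol name,
# tag, and mount point are illustrative placeholders.
recover_vg() {
    run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

    run pvcreate --uuid "cqH4SD-VrCw-jMsN-GcwH-omCq-ThpE-dO9KmJ" \
        --restorefile /etc/lvm/backup/vg_04 /dev/sdd1   # rewrite the PV label
    run vgcfgrestore vg_04                              # restore VG metadata
    run vgchange --addtag "$(hostname)" vg_04           # SGLX tag, if you use them
    run vgchange -a y vg_04                             # activate the VG
    run fsck /dev/vg_04/lvol1                           # check each filesystem
    run mount /dev/vg_04/lvol1 /back/from/the/dead
}

DRY_RUN=1 recover_vg
```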


This is a must-have for your bag of digital healing!

Quick somebody give me some points for figuring this out!
John Rayl
Occasional Advisor

Re: Major VG problems after MC/SG cluster crashed hard

Thanks for looking.
Zinky
Honored Contributor

Re: Major VG problems after MC/SG cluster crashed hard

SWEET too!
Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler