
rp3440 cluster error

Dear Gurus,

I have a problem that I have tried tirelessly to solve, but I have not found a solution yet. Can anyone help me out? This is the problem: I have two rp3440 servers in a cluster, connected to two MSA30 enclosures (each MSA has one hard disk). I came to the office one morning and found that one of the disks had its light off. Suspecting that disk, I inserted a new one, but its light came on for a few seconds and then went off. I ran vgdisplay on each node but got different results. Attached is the output from the diagnostics I ran. Thanks.

WAD
Knowledge is vital but knowledge without understanding is nothing.
Matti_Kurkela

Re: rp3440 cluster error

You're obviously using HP-UX and ServiceGuard. Which versions of them?

Too bad you did not show us "vgdisplay -v vgshare" from Node 1, just the shorter "vgdisplay vgshare". The verbose output would contain important information about the state of each individual physical disk.
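
If you can still run it, check the per-physical-volume section of the verbose output; roughly (field names as in standard LVM output, values illustrative):

  vgdisplay -v vgshare
  ...
  --- Physical volumes ---
  PV Name     /dev/dsk/c6t0d0
  PV Status   unavailable      <- a failed or missing disk typically shows up like this
  ...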

The output of "ioscan -fnkCdisk" on both nodes would also have been helpful for understanding the physical disk configuration (to map the /dev/dsk/* paths to actual physical devices).
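
For example (the -k flag reads the kernel's cached data instead of probing the hardware):

  ioscan -fnkCdisk

Each disk entry shows its hardware path, driver state (CLAIMED/NO_HW), product description, and the /dev/dsk and /dev/rdsk device files belonging to it.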

A good way to get an in-depth view of the cluster's current state would be the "cmviewcl -v" command.
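
If you want to collect all of this in one go for posting, something along these lines works (the output file name is just an example):

  { vgdisplay -v vgshare; ioscan -fnkCdisk; cmviewcl -v; } > /tmp/cluster_diag.txt 2>&1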

Based on the output of "vgdisplay vgshare", your cluster volume group "vgshare" has indeed lost a disk: Act PV is 1 while Cur PV is 2.

Both the vgdisplay command on Node 1 and the cluster daemon (cmcld) are issuing severe warnings about /dev/dsk/c6t0d0. This is to be expected, *IF* this is the disk that failed.

Looks like your vgshare volume group was mirrored using MirrorDisk/UX, so your data is probably safe.

(MirrorDisk does not auto-recover because it does not want to second-guess the admin's intentions. For example, if you have a disk failure at the time your system is nearly at maximum load, you might, in some situations, wish to delay the resynchronization to off-peak time rather than take an I/O performance hit immediately.)

You should now start the MirrorDisk recovery, using the standard procedure. Refer to HP's very good document "When Good Disks Go Bad":

http://docs.hp.com/en/5991-1236/When_Good_Disks_Go_Bad.pdf

The procedure you want is "Replacing the Disk", Chapter 6. You'll find step-by-step instructions there.
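
Very roughly, for a hot-swappable mirrored disk the sequence looks like this; please follow the document for the exact steps for your HP-UX version (I'm assuming c6t0d0 is the replaced disk and the volume group stays activated on this node):

  vgcfgrestore -n /dev/vgshare /dev/rdsk/c6t0d0   (restore the saved LVM configuration to the new disk)
  vgchange -a e /dev/vgshare                      (re-activate in exclusive mode so LVM attaches the restored PV)
  vgsync /dev/vgshare                             (resynchronize the stale mirror extents)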

The note on page 19 about "Replacing a LVM Disk in an HP ServiceGuard Cluster Volume Group" refers to volume groups in _shared_ mode. Your /dev/vgshare is in _exclusive_ mode, so the note is not applicable to you. (The "VG Status" line in vgdisplay output says "available, exclusive". In shared mode it would say "available, shared".)

An added complication is that the failed disk is/was used as a cluster lock disk.
The ServiceGuard documentation indicates that the vgcfgrestore command in the standard disk-replacement procedure (in the When Good Disks Go Bad document, see above) will restore the lock disk status automatically. After running the vgcfgrestore command, both nodes should produce a log message within 75 seconds indicating they have detected that the lock disk is working again.

See:
Replacing Disks -> Replacing a Lock Disk in the "Managing ServiceGuard" manual:
http://docs.hp.com/en/B3936-90122/ch08s03.html#cegjbiej
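
To see when each node has re-detected the lock disk, you can watch the system log (assuming the default HP-UX syslog location):

  tail -f /var/adm/syslog/syslog.log | grep -i cmcld

The cluster daemon should log a message on both nodes once the lock disk is operational again.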

If you cannot run the vgcfgrestore command for some reason, the "Managing ServiceGuard" manual says you should see "man cmdisklock" for instructions on recreating the lock.
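
For reference, cmdisklock has "check" and "reset" operations; this is only a rough sketch, as the exact argument format depends on your ServiceGuard version and lock type (see the man page):

  cmdisklock check /dev/dsk/c6t0d0     (verify the current state of the lock)
  cmdisklock reset /dev/dsk/c6t0d0     (re-initialize the lock if the check fails)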

MK

Re: rp3440 cluster error

This is in answer to your request.

WAD
Knowledge is vital but knowledge without understanding is nothing.

Re: rp3440 cluster error

Dear Mr. Matti Kurkela:

Firstly, I am so grateful to have you in this forum, and secondly for your reply, which has just helped me solve my problem. Thanks a lot.

WAD
Knowledge is vital but knowledge without understanding is nothing.