HPE 9000 and HPE e3000 Servers

rp3440 cluster error
09-16-2008 09:30 AM
Dear Gurus,
I have a problem that I have tried tirelessly to solve but haven't found a solution yet. Can anyone help me out? This is the problem: I have two rp3440 servers in a cluster, connected to two MSA30 enclosures (each MSA has one hard disk). I came to the office one morning and found that one of the disks had its light off. I suspected the disk had failed and inserted a new one, but the new disk's light came on for a few seconds and then went off. I ran vgdisplay on each node but got different results. Attached is the result of the diagnostics I ran. Thanks.
WAD
Knowledge is vital but knowledge without understanding is nothing.
3 REPLIES
09-16-2008 02:20 PM
Solution
You're obviously using HP-UX and ServiceGuard. Which versions of them?
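If you're not sure, something like this should show both (the swlist flags are standard, though exact product names vary a little between releases):

  # uname -r
  # swlist -l product | grep -i -e serviceguard -e mirror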
Too bad you did not show us "vgdisplay -v vgshare" from Node 1, just the shorter "vgdisplay vgshare". The longer version from Node 1 would have important information about the state of the individual physical disks.
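For the record, that command would be:

  # vgdisplay -v /dev/vgshare

The "--- Physical volumes ---" section at the end of its output lists each PV with its PV Status; a failed disk typically shows "unavailable" instead of "available".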
The output of "ioscan -fnkCdisk" on both nodes would have been nice to understand the physical disk configuration (to map the /dev/dsk/* paths to actual physical devices).
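That is:

  # ioscan -fnkCdisk

(-f gives the full listing, -n adds the device file names, -k reads the kernel's cached data so it is safe to run even while a device is misbehaving, and -C disk restricts the output to the disk class.)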
A good way to get an in-depth view of the cluster's current state would be the "cmviewcl -v" command.
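In other words:

  # cmviewcl -v

It shows the state of the cluster as a whole, plus each node and each package in detail.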
Based on the output of "vgdisplay vgshare", your cluster volume group "vgshare" has indeed lost a disk: Act PV is 1 while Cur PV is 2.
Both the vgdisplay command on Node 1 and the cluster daemon (cmcld) are issuing severe warnings about /dev/dsk/c6t0d0. This is to be expected, *IF* this is the disk that failed.
Looks like your vgshare volume group was mirrored using MirrorDisk/UX, so your data is probably safe.
(MirrorDisk does not auto-recover because it does not want to second-guess the admin's intentions. For example, if you have a disk failure at the time your system is nearly at maximum load, you might, in some situations, wish to delay the resynchronization to off-peak time rather than take an I/O performance hit immediately.)
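If you want to see the current state of the mirrors yourself, you can look for stale extents per logical volume (lvol1 is just a placeholder; use your own LV names from vgdisplay -v):

  # lvdisplay -v /dev/vgshare/lvol1 | grep -i -e stale -e "???"

Extents that were on the failed disk will show as stale, and the failed PV itself may show up as "???" in the extent listing.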
You should now start the MirrorDisk recovery, using the standard procedure. Refer to HP's very good document "When Good Disks Go Bad":
http://docs.hp.com/en/5991-1236/When_Good_Disks_Go_Bad.pdf
The procedure you want is "Replacing the Disk", Chapter 6. You'll find step-by-step instructions there.
The note on page 19 about "Replacing a LVM Disk in an HP ServiceGuard Cluster Volume Group" refers to volume groups in _shared_ mode. Your /dev/vgshare is in _exclusive_ mode, so the note is not applicable to you. (The "VG Status" line in vgdisplay output says "available, exclusive". In shared mode it would say "available, shared".)
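To give you a rough idea (this is only a sketch, not a substitute for the document's step-by-step instructions; verify the device path against your own ioscan output first), the hot-swap replacement of a mirrored LVM disk generally goes like this, assuming /dev/dsk/c6t0d0 is the failed disk:

  # pvchange -a n /dev/dsk/c6t0d0
  (physically swap the disk)
  # vgcfgrestore -n /dev/vgshare /dev/rdsk/c6t0d0
  # pvchange -a y /dev/dsk/c6t0d0
  # vgsync /dev/vgshare

The vgcfgrestore writes the saved LVM configuration onto the new disk, pvchange -a y attaches it back, and vgsync resynchronizes any remaining stale mirror extents. Run these on the node that currently has vgshare activated.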
An added complication is that the failed disk is/was used as a cluster lock disk.
The ServiceGuard documentation indicates that the vgcfgrestore command in the standard disk-replacement procedure (in the When Good Disks Go Bad document, see above) will restore the lock disk status automatically. After running vgcfgrestore, each node should log a message within 75 seconds indicating that it has detected the lock disk working again.
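You can watch for those messages in the syslog on each node (this is the standard HP-UX syslog location, where cmcld logs):

  # tail -f /var/adm/syslog/syslog.log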
See "Replacing Disks -> Replacing a Lock Disk" in the "Managing ServiceGuard" manual:
http://docs.hp.com/en/B3936-90122/ch08s03.html#cegjbiej
If you cannot run the vgcfgrestore command for some reason, the "Managing ServiceGuard" manual says you should see "man cmdisklock" for instructions on recreating the lock.
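If your ServiceGuard version includes cmdisklock, the check and the reset would look roughly like this (read the man page's warnings before resetting, and use your own lock disk's device path):

  # cmdisklock check /dev/dsk/c6t0d0
  # cmdisklock reset /dev/dsk/c6t0d0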
MK
09-16-2008 06:50 PM
Re: rp3440 cluster error
This is in answer to your request.
WAD
Knowledge is vital but knowledge without understanding is nothing.
09-16-2008 06:55 PM
Re: rp3440 cluster error
Dear Mr. Matti Kurkela:
Firstly, I am so grateful to have you in this forum, and secondly for your reply, which just helped me solve my problem. Thanks a lot.
WAD
Knowledge is vital but knowledge without understanding is nothing.