Operating System - Linux
06-26-2009 04:36 AM
MD Raid system VG crashes with more than 1 md device
NOTE: This is a standard Red Hat Enterprise Linux 5.2 build on a DL580
I started by creating a "system" LVM VG (containing root and swap) that contains a number of MD raid devices, as per this thread:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1349124
But what I've found is that after the VG is extended to contain the extra md devices, the system crashes on reboot with this error:
Red Hat nash version 5.1.19.6 starting
Reading all physical volumes. This may take a while...
Couldn't find device with uuid 'xXG4FE-rq9K-IZvZ-uxQx-u2fv-FgWq-0AeI2T'.
Couldn't find all physical volumes for the volume group system.
See the attached screenshot for the full set of messages. It's just fine when it's just the one md device (md1).
The UUID it's complaining about is the UUID for the next md device in the list, md2:
--- Physical volumes ---
PV Name /dev/md1
PV UUID jPe1pE-MAzL-dn0g-N6gd-PX0s-Y7wO-dLAIkf
PV Status allocatable
Total PE / Free PE 4364 / 2252
PV Name /dev/md2
PV UUID xXG4FE-rq9K-IZvZ-uxQx-u2fv-FgWq-0AeI2T
PV Status allocatable
Total PE / Free PE 4374 / 4374
I had a look at the nash init script in the initrd image and sure enough it was only starting the one md device:
echo Scanning and configuring dmraid supported devices
raidautorun /dev/md1
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
So I rebuilt the initrd, forcing it to probe for RAID volumes, and on the surface that appeared to improve things:
# mkinitrd -f --force-raid-probe /boot/initrd-2.6.18-92.el5.img 2.6.18-92.el5
The new initrd image now contains a nash init script updated with all the md devices:
echo Scanning and configuring dmraid supported devices
raidautorun /dev/md1
raidautorun /dev/md2
raidautorun /dev/md3
raidautorun /dev/md4
raidautorun /dev/md5
raidautorun /dev/md6
raidautorun /dev/md7
raidautorun /dev/md8
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
I thought I had it licked at that point, but the system still crashes in this configuration with the same error.
What am I missing? Could this be a bug? If raidautorun successfully assembles the md devices, then surely the lvm vgscan should pick them up.
There's nothing wrong with the VG itself. I can boot from a SystemRescueCD ISO and assemble the system VG without any issue:
# mdadm -Esb > /etc/mdadm.conf
# mdadm --assemble --scan
# vgscan
# vgchange -a y system
# vgdisplay -v
The "root" LV is mountable at that point and everything looks good. Running vgreduce to shrink the VG back down to just md1 gets me a bootable system again.
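For reference, the `mdadm -Esb` step above emits brief ARRAY lines, one per assembled array, which is what lets `--assemble --scan` find everything (the UUIDs below are illustrative placeholders, not the real values from this system):

```
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=6b8b4567:327b23c6:643c9869:66334873
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=74b0dc51:19495cff:2ae8944a:625558ec
```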
Rgds,
John
3 REPLIES
06-26-2009 06:44 AM
Re: MD Raid system VG crashes with more than 1 md device
Shalom,
DL580 systems generally ship with a hardware RAID controller, which is usually a better-performing option than software RAID.
I suspect there is a problem with one of the disks.
Take a look at dmesg output and run some dd tests to validate the disks.
SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
06-26-2009 07:14 AM
Re: MD Raid system VG crashes with more than 1 md device
There is no problem with any of the disks. dmesg shows nothing, and /proc/mdstat shows all RAID sets fully populated.
# cat /proc/mdstat
Personalities : [raid1]
md8 : active raid1 cciss/c0d7p1[0] cciss/c1d7p1[1]
143331776 blocks [2/2] [UU]
md7 : active raid1 cciss/c0d6p1[0] cciss/c1d6p1[1]
143331776 blocks [2/2] [UU]
md6 : active raid1 cciss/c0d5p1[0] cciss/c1d5p1[1]
143331776 blocks [2/2] [UU]
md5 : active raid1 cciss/c0d4p1[0] cciss/c1d4p1[1]
143331776 blocks [2/2] [UU]
md4 : active raid1 cciss/c0d3p1[0] cciss/c1d3p1[1]
143331776 blocks [2/2] [UU]
md3 : active raid1 cciss/c0d2p1[0] cciss/c1d2p1[1]
143331776 blocks [2/2] [UU]
md2 : active raid1 cciss/c0d1p1[0] cciss/c1d1p1[1]
143331776 blocks [2/2] [UU]
md0 : active raid1 cciss/c1d0p1[1] cciss/c0d0p1[0]
305088 blocks [2/2] [UU]
md1 : active raid1 cciss/c1d0p2[1] cciss/c0d0p2[0]
143026624 blocks [2/2] [UU]
unused devices:
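The [UU] flags in that listing are what show each mirror is fully populated; a degraded array would show [U_] or [_U]. That check can be done mechanically. A sketch (run here against a captured excerpt of the output above; on a live box you would read /proc/mdstat directly):

```shell
# Count arrays in a /proc/mdstat capture that are NOT fully populated ([UU]).
# A captured excerpt stands in for /proc/mdstat here.
mdstat=$(cat <<'EOF'
md2 : active raid1 cciss/c0d1p1[0] cciss/c1d1p1[1]
      143331776 blocks [2/2] [UU]
md1 : active raid1 cciss/c1d0p2[1] cciss/c0d0p2[0]
      143026624 blocks [2/2] [UU]
EOF
)
degraded=$(printf '%s\n' "$mdstat" | grep 'blocks' | grep -vc '\[UU\]')
echo "degraded arrays: $degraded"
# prints: degraded arrays: 0
```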
Also, about hardware RAID: this system is part of a highly available solution. Database disks will live on an EVA and be mirrored: dual EVA controllers, dual FC paths via dual FC switches to dual FC HBAs. Dual NICs for three networks, and two DL580s forming a dual Oracle 11g RAC cluster. Dual clusters too: one Live and one Data Guard cluster elsewhere on the campus.
So I'm not going to compromise resilience by putting local data (system disk, etc.) on a RAID 5 volume attached to a single controller, especially when this customer has had previous experience of RAID controllers failing in other kit. Also, RAID 10 volumes win over RAID 5 in my view anyway.
06-29-2009 03:54 AM
Re: MD Raid system VG crashes with more than 1 md device
I've decided to sidestep this issue completely and just put the O/S volume group on a single md RAID device, treating the local data disks separately in their own VG.
Clearly a system VG spanning multiple md devices is not a configuration many people are familiar with, and I don't want to be "out there" on systems that need to be solid and stable.
Rgds,
John