ProLiant Servers (ML,DL,SL)
02-06-2008 11:17 AM
Single drive failure causes logical RAID5 failure
Unfortunately the server is not under a service contract, so I thought I would post to see if there are any thoughts or suggestions.
The question in short:
Is it possible to convince/force an SA6400 to attempt to boot using RAID set members that it thinks are bad?
Server: DL380 G3
Controller: Smart Array 6400
Physical drives: 6x 146 GB
Logical drives: 5-drive RAID5 (IDs 0-4), 1 hot spare (ID 5)
It appears that drive ID 3 failed and caused some error on the SCSI bus that confused the controller or corrupted info on some of the other drives.
The server was running fine until a reboot attempt following the quarterly Microsoft patch installation. No system errors had ever been observed on the drive array. The server never came back up after the Windows restart. I was not physically watching the box during the reboot, so it's unknown what errors, if any, were displayed during shutdown and the initial reboot attempt.
After the reboot, the lone logical drive's status was Failed. In fact, with all drives connected, the SA6400 showed 0 logical drives available. With the apparently failed drive (ID 3) removed, the SA6400 showed the other drives as present: ID 2 was labeled OK, but IDs 0, 1, and 4 were flagged as requiring replacement.
We've already restored the data to another box but thought this might be a valuable learning experience.
Regards
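For reference, one way to see exactly which members the controller considers failed is the Array Configuration Utility CLI (hpacucli), assuming it can be run from the installed OS or an offline SmartStart environment; the slot number below is only an example, not taken from this server:

    hpacucli ctrl all show config detail         # controller, logical drive and per-disk status
    hpacucli ctrl slot=0 physicaldrive all show  # state of each physical drive (OK / Failed)
    hpacucli ctrl slot=0 logicaldrive all show   # logical drive status (OK / Failed / Interim Recovery)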
2 REPLIES
09-29-2008 04:03 AM
Re: Single drive failure causes logical RAID5 failure
Hi Michael,
I wonder why the online spare didn't kick in?
Unless it happened like this:
ID 3 failed, so ID 5 kicked in.
Another drive failed while ID 5 was rebuilding, or perhaps when you rebooted.
With RAID5 you can't lose two drives.
When it rebooted (and you weren't watching), the array controller will have flagged up that horrible F1 or F2 prompt (I've killed someone's server by accident by not fully understanding this!).
And it DEFAULTS to F2 within 30 seconds, which is "Fail logical drives" or something similar. Which I wish they would change!
(F1 is "Continue with logical drives disabled".)
Not much help, I'm afraid, but it could be an explanation.
Thanks
Mark...
---------------------------------------------------------------------------------
Please click the white Kudos star to the left if this post is helpful :)
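As a rough sketch of how one might check whether the spare ever activated and how far a rebuild got, the same hpacucli tool (if it is available on the box) can report spare assignment and rebuild state; the slot and logical drive numbers here are assumptions:

    hpacucli ctrl slot=0 show config           # shows which physical drive is assigned as the spare
    hpacucli ctrl slot=0 logicaldrive 1 show   # status line indicates a rebuild in progress, e.g. "Recovering"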
09-30-2008 04:33 AM
Re: Single drive failure causes logical RAID5 failure
The old SCSI bus is a parallel bus, and if you are very unlucky a single drive can cause a disaster.
You could have a bad backplane, SCSI cable or array controller.
If you get a multiple-drive failure, the Smart Array controller (SA) will stop!
You can power down the box and reseat the drives. If that brings them back, the SA will prompt you, saying that drives previously marked bad appear to be back online, and ask if you want to re-enable the LUN(s).
If it succeeds, you can then replace the drives one by one, as quickly as possible.
If you have a bad backplane, cable or SA, those can be replaced.
If you really do have a multiple-disk failure, then there is only your backup.
When hard drives have been running for many years, a power cycle may cause drives and power supplies to fail.
>Mark, I think you may have misread the prompt.
If you reboot or repower an SA with a failed or missing drive:
It will prompt you to press F1 or F2, and F2 is the default within 30 seconds.
F1 = disable affected LUNs: they will only be disabled until the next boot, and you will be prompted again.
F2 = remain in interim recovery mode: the LUNs will stay active/running, and as soon as you install a new drive, the rebuild will start automatically.
If you incorrectly select F1 and you can't boot, then simply reboot the server and select F2.
If you select F1 and it's only data LUNs that are disabled, then you can enable them online using the ACU.
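If the drives come back after a reseat but the controller still lists the logical drive as failed, the ACU's command-line version (hpacucli) also has a forced re-enable for a logical drive. This is a last-resort sketch only: the slot and logical drive numbers are examples, and forcing a re-enable risks serving stale or corrupt data, so it should only be tried when the data is already backed up elsewhere, as it was here:

    hpacucli ctrl slot=0 logicaldrive 1 modify reenable forced   # force the failed logical drive back online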