
Drive failure??

 
murthybruit
Visitor

Hi,

 

I inherited a ProLiant DL180 G6, built by my predecessor, which is our Veeam backup server. I believe the server was built with 3 x 2TB drives as supplied, to which 4 x 2TB drives were added at a later date. The server OS shows a spanned drive covering a 3525GB and a 5588GB drive, presumably two separate RAID arrays?
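As a rough plausibility check on those two sizes, here is a minimal sketch. It assumes both arrays are RAID 5, that each "2TB" drive holds exactly 2 x 10^12 bytes, and that Windows reports binary gigabytes; none of these is confirmed in this thread, and the `raid5_usable_gib` helper is made up for illustration.

```python
# Rough plausibility check only. Assumptions (not confirmed in this thread):
# both arrays are RAID 5, and each "2TB" drive holds exactly 2 * 10**12 bytes.
DRIVE_BYTES = 2 * 10**12
GIB = 2**30  # Windows labels binary gigabytes as "GB"


def raid5_usable_gib(drive_count: int, drive_bytes: int = DRIVE_BYTES) -> float:
    """Usable size of a RAID 5 array: one drive's worth of space goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drive_count - 1) * drive_bytes / GIB


for count in (3, 4):
    size = raid5_usable_gib(count)
    print(f"{count} x 2TB in RAID 5: about {size:.0f} GB as Windows reports it")
```

Under these assumptions a four-drive RAID 5 set comes out at about 5588, which matches the larger volume. A three-drive set comes out near 3725 rather than the 3525 quoted, so the array configuration report remains the authority on how the drives are actually grouped.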

 

For the last few weeks, I have noticed that the bottom left drive, when looking at the server front, had an amber light, and I have been meaning to investigate what that meant. However, after a Windows update that required a reboot yesterday, the server returned a message from the Smart Array P410: bay 1 error - Unrecoverable Media Errors Detected on Drives ...Errors will be fixed automatically when these sectors are overwritten. Backup and restore recommended. Or words to that effect; on reflection, I wish I'd taken a photo. The message also wasn't exactly the same on every boot attempt.

 
Following which the server hung on a black screen with a flashing cursor.
 
On removing and reinserting the drive that had an amber light, it appeared to go green, but then the drive above it had no light (I'm sure it did before!). However, this time the OS booted and finished its 50,000 registry checks, then rebooted. Back to a flashing cursor.
 
I then removed the first drive (top left) and re-inserted it, and this time the OS booted and has remained up all day. The top left drive appears to be dead, but the server is operational.
 
My question is, is that drive dead? Can I replace it and expect the RAID to rebuild? The backups the night before were still running the next day, which fits with the notion that you can expect poor write speed when a drive fails in RAID5. What are my replacement options?
 
Thanks in advance
 
Martyn

 

5 REPLIES
Meph
Occasional Visitor

Re: Drive failure??

Hi Martyn,

 

As this is a Windows machine, I would recommend installing the HP Array Configuration Utility. That should tell you a lot about the physical and logical drives, including which have failed.

 

I run RAID 5 configurations, and when a drive failed the other day it had an amber light (iirc) while the rest of the drives in the array had blue status lights, so it was easy to see what was going on. When I removed the bad drive, however, the other blue lights went out.

 

If there are 4x drives, they may be in RAID 10, 5 or 6. If a drive fails on RAID 5 or 6 there will be no slowdown in performance; the slowdown occurs when you insert the new drive and the RAID begins rebuilding. Reading from RAID 10 with a failed drive may be slower, as it is reading from one drive rather than two combined.

 

If you replace the drive with one of the same size, I believe that the RAID will just rebuild; it did in my case.
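For intuition on what a RAID 5 rebuild does, here is a minimal sketch, assuming plain XOR parity over a single stripe. The block contents and the `xor_blocks` helper are made up for illustration; the P410's actual stripe layout is not described in this thread.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical 4-drive RAID 5 set: three data blocks plus parity.
data = [b"VEEAM-01", b"VEEAM-02", b"VEEAM-03"]
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same calculation has to run for every stripe on the replacement drive, reading all the surviving members each time, which is why a rebuild on 2TB drives can take a long while.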

 

Keep in mind my setup has a P410 RAID card running firmware 6.64.

 

Install the Array Configuration Utility; you will get a lot of information there.

 

-Phil

murthybruit
Visitor

Re: Drive failure??

Hi Phil,

 

Thank you for your extremely helpful response, I wasn't on site yesterday, or I would have replied sooner.

 

Ok, I've followed your advice and installed the utility, which has been a big help but has thrown up some confusion over what I'm actually seeing on the front of the server...

 

Following my first post, the drive lights reverted to the state they were all in before the upset: all flashing green, bar the bottom left drive, which is solid amber.

 

However the Array Config Utility is reporting the following:

 

Physical Drive (2 TB SAS) 1I:1:2 Critical The physical drive has failed.


Logical Drive 1 Warning Logical drive state: The current array controller has a bad or missing drive. This logical drive is operating in interim recovery mode with reduced performance and a further physical drive failure may result in data loss depending on the fault tolerance.


Logical Drive 2 Warning Logical drive state: The current array controller has a bad or missing drive. This logical drive is operating in interim recovery mode with reduced performance and a further physical drive failure may result in data loss depending on the fault tolerance.

 

Drive 1I:1:2 sits here:

 

Internal Drive Cage at Port 1I : Box 1

Drive Cage on Port 1I

Physical Drive (2 TB SAS) 1I:1:1

Physical Drive (2 TB SAS) 1I:1:2 ******
Physical Drive (2 TB SAS) 1I:1:3
Physical Drive (2 TB SAS) 1I:1:4

Internal Drive Cage at Port 2I : Box 1

Drive Cage on Port 2I

Physical Drive (2 TB SAS) 2I:1:5
Physical Drive (2 TB SAS) 2I:1:6
Physical Drive (2 TB SAS) 2I:1:7

 

This would suggest that, looking at the front of my server, the second from top left should be the failed drive? Or am I misunderstanding the numbers?

 

On reflection, I suppose if they are numbered top down, top down, then the bottom left would be 1I:1:2, which would fit with that drive showing the amber light?

 

Have I just answered my own question!?

 

Cheers,

Martyn

murthybruit
Visitor

Re: Drive failure??

I've attached the utility report if that's any help!

Meph
Occasional Visitor

Re: Drive failure??

Hi Martyn,

 

Yes, you have answered your own question. The bays run top to bottom, left to right.

 

For some reason, not all of my 180s are alike in their drive setup. My main server has vents in the top row, and the next two rows run top to bottom, left to right, starting with 1. The first 4 are on Port 1I and the second set are on Port 2I.

 

So anyway, yes, I believe your bottom leftmost is drive 2, the drive that has failed. It seems to be part of two logical drives.
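The numbering described above can be sketched as follows. This is a minimal illustration, assuming the address reads as port:box:bay and that the cage has two rows numbered top to bottom, then left to right. The helper names and the `rows=2` default are made up for this example, and other DL180 cages may be laid out differently.

```python
def parse_drive_address(address: str) -> tuple[str, int, int]:
    """Split an address such as '1I:1:2' into (port, box, bay)."""
    port, box, bay = address.split(":")
    return port, int(box), int(bay)


def bay_position(bay: int, rows: int = 2) -> tuple[int, int]:
    """Return 1-based (column, row), numbering top to bottom, then left to right.

    rows=2 is an assumption based on the two-row cage described in this thread.
    """
    if bay < 1:
        raise ValueError("bay numbers start at 1")
    column, row = divmod(bay - 1, rows)
    return column + 1, row + 1


port, box, bay = parse_drive_address("1I:1:2")
# Bay 2 lands in column 1 (leftmost), row 2 (bottom).
print(port, box, bay, bay_position(bay))
```

With these assumptions, bay 2 maps to the bottom of the leftmost column, which agrees with the amber light on the bottom left drive.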

 

This log has a lot of information so here are some recommendations:

- Update the firmware on your controller. It seems to be on 3.00 and the latest is over 6.6.

- Also, your array is set to cache 100% read and 0% write. Remember that most RAID levels have a write penalty, so I would move that closer to 50% / 50%.

- Is the battery on the controller flat? The report doesn't seem to show one present.

murthybruit
Visitor

Re: Drive failure??

Hi Phil,

 

Thanks again for your advice. Is the firmware version something I can upgrade through Windows? I have no idea if the battery is flat; how would I tell?

 

I have spoken to a vendor with regard to getting a replacement drive, only to be told they are end of life and that they can't get hold of them. Is there an alternative drive that I can retrofit? Will any HP 2TB 7.2K hard drive fit? For example: http://www.amazon.co.uk/695507-002-SC-2TB-7-2K-LFF-SAS/dp/B00PY6TQQ4

 

Thanks in advance

Martyn