ProLiant Servers (ML,DL,SL)

Problem with RAID 5 Array

Ariel Filotti
New Member


The server is a DL380 G4 with Smart Array 6i disk controller. There is a RAID 5 array with 4 disks, 300GB each.
The other day, a red light started blinking on one disk, so I ran an array diagnostic and got a message that said:

SCSI Port 2 Drive ID 2 has exceeded the following threshold(s)
Pred failure errors
SOLUTION: Please replace this drive when conditions permit.

This server is only 5 months old, so I thought it was a false alarm and figured I would pull out the disk and reinsert it. So I pulled out the disk with the blinking red light, plugged it back in, and the rebuild process started.

After some time, I checked with the Array Configuration Utility, and I see this in status messages:

Background parity initialization is currently queued or in progress on logical drive 1 ( RAID 5 in array A). If background parity initialization is queued, it will start when I/O is performed on the drive. When background parity initialization completes, the performance of the logical drive will improve.

I waited some more, but nothing happened. Then I noticed something really strange. This server hosts a 90GB SQL database, and the SQL backups were failing with an I/O error. I started checking everything, and then I noticed that the disk I had pulled was number 3, while the one with the errors was number 2!

Now, I'm pretty sure that I pulled the disk with the red light, and disk 2 never showed a red light. The problem is that now I can't finish rebuilding the array, the SQL backups are failing, I can't copy the data to another server, and I don't know how to fix this without breaking the array.

I already got a replacement drive, but I can't pull out disk 2 because disk 3 is still rebuilding, and I don't know if it's ever gonna finish...

I have two logical disks, and this is what I get with the ACU CLI:

Smart Array 6i in Slot 0
logicaldrive 1
Size: 43.9 GB
Fault Tolerance: 5
Heads: 255
Sectors per Track: 32
Cylinders: 11294
Stripe Size: 64 KB
Status: Ok
Array Accelerator: Enabled
Has Data On Drive: True
Parity Initialization Status: In Progress
Preferred Controller Chassis Slot: 1
Disk Name: \\.\PhysicalDrive0
Mount Points: C:\ 39 GB, P:\ 4.87 GB



Smart Array 6i in Slot 0
logicaldrive 2
Size: 794 GB
Fault Tolerance: 5
Heads: 255
Sectors per Track: 32
Cylinders: 65535
Stripe Size: 64 KB
Status: Ready for recovery.
Array Accelerator: Enabled
Has Data On Drive: True
Preferred Controller Chassis Slot: 1
Disk Name: \\.\PhysicalDrive1
Mount Points: D:\ 794 GB

Is there any hope of rebuilding the array?
sandeep_raman
Honored Contributor

Re: Problem with RAID 5 Array

Hello Ariel,

It's not good practice to pull a disk out of a RAID array.
If you can schedule downtime, boot the server from the Firmware Maintenance CD 7.50
http://h18023.www1.hp.com/support/files/server/us/download/24777.html
and update all the firmware.
Then power on the server and see if it helps.

SRH
Attila Szabó
Frequent Advisor

Re: Problem with RAID 5 Array

Dear Ariel,

As I see it, you have one array of 4 disks with two RAID 5 logical drives on it (~44 GB and 794 GB).
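For reference, here is a small Python sketch of the RAID 5 capacity arithmetic for this layout (the function name is made up for illustration). With four 300 GB drives, one drive's worth of space goes to parity, leaving about 900 GB in decimal units, or roughly 838 GiB in binary units. The binary figure appears consistent with the 43.9 GB + 794 GB of the two logical drives in the ACU output, assuming ACU reports binary gigabytes.

```python
# Minimal sketch: usable capacity of a RAID 5 array built from equal-size drives.
# Assumption: a "300GB" drive holds exactly 300 * 10**9 bytes; real drives and
# controller metadata can shift the numbers slightly.

GIB = 2 ** 30  # one binary gigabyte (GiB)


def raid5_usable_bytes(num_drives: int, drive_bytes: int) -> int:
    """RAID 5 gives up one drive's worth of space to parity (spread across all drives)."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_bytes


usable = raid5_usable_bytes(4, 300 * 10**9)
print(f"{usable / 10**9:.0f} GB decimal, {usable / GIB:.1f} GiB binary")
# prints: 900 GB decimal, 838.2 GiB binary
```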
If a drive's activity LED (the arrow) is on, off, or flashing, its online LED is on or off (not flashing!), and its fault LED is flashing, then "A predictive failure alert has been received for this drive. Replace the drive as soon as possible." When you restart the server, the RAID controller will tell you which disk(s) are bad. If there is only one bad disk, you may lose some data, but a chkdsk may solve the problem. (Often it is just a file system error, which occurs when more than one disk was pulled out of the RAID or the RAID failed completely. If the file system is NTFS, you have a good chance of getting the data back.)
To see which disks are online, run the Array Configuration Utility and look at the "Physical View"; you will see which disks are really healthy members of the RAID.
There is a way to speed up the rebuild process: start ACU and, in the controller settings, set the rebuild priority to high (if a rebuild is already in progress, a restart may be needed).
BUT! If it's a very important database for the company, ask a specialist data recovery company to try to recover your data. (Here in Hungary there is one such company, and only one, that can retrieve data from disks with a very good chance of success even if they are burned or crashed, at a very high price. But that is much lower than the cost of rewriting the database.)
A firmware upgrade can help, but while the RAID controller is rebuilding, it carries a very high risk.
I hope it helps...
Attila Szabó
Frequent Advisor

Re: Problem with RAID 5 Array

Just a short note, since maybe I was not clear. When one disk fails in a RAID 5 array, you will not lose data, but when you remove a second disk, the controller disables the whole RAID array. After you reinsert the disk (provided there is only one bad disk in the array) and restart the server, you get an option to re-enable the array in interim recovery mode (meaning that when you insert a replacement disk in place of the bad one, the controller will rebuild the array onto it).
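The point about one versus two missing disks follows from how RAID 5 parity works. Below is a simplified Python sketch, with made-up function names and block contents, of XOR parity on a single stripe: any one missing block can be recomputed from the survivors, but with two missing there is nothing left to solve for. The same logic suggests why a bad sector on another drive while one drive is still rebuilding can leave unreadable data in that stripe.

```python
# Simplified sketch of RAID 5 XOR parity for a single stripe on a hypothetical
# 4-drive array (3 data blocks + 1 parity block). Real controllers rotate the
# parity block across drives and do this in hardware.


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def rebuild_member(stripe: list[bytes], missing: set[int]) -> bytes:
    """Recover one missing stripe member by XOR-ing all surviving members."""
    if len(missing) != 1:
        raise RuntimeError("RAID 5 can rebuild exactly one missing member per stripe")
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


data = [b"BLOCK-A.", b"BLOCK-B.", b"BLOCK-C."]
parity = xor_blocks(data)
stripe = data + [parity]

# One drive missing: its block is recomputed from the other three.
print(rebuild_member(stripe, {1}))  # prints: b'BLOCK-B.'

# Two drives missing: not enough information is left to recover either block.
try:
    rebuild_member(stripe, {1, 2})
except RuntimeError as err:
    print(err)
```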
The problem is the cache (on the operating system side and on the controller). If any data in the cache was waiting to be written to the disks when the RAID crashed, that data is lost. That's why file system errors can occur. Because the file system is NTFS, you have a good chance of recovering all the damaged files from the disk.
In your situation, I think the RAID array was not disabled, but there were some problems, and data in the array was damaged.
Simply run chkdsk (not in repair mode at first!) to check for file system errors. Because a file system repair can delete files, make sure you have a recent backup before you try to repair!
Ariel Filotti
New Member

Re: Problem with RAID 5 Array

I ran CHKDSK /R last night, and it found and fixed a bad sector (in the SQL database). Then I tried the backup again, and it was successful!
Now I'm waiting for the array to finish rebuilding so I can remove the disk with the bad sector (#2) and replace it.
The first part of the rebuild is done, but now it's doing the parity initialization. I'll keep the thread open until that is finished. Does anyone know how much time I can expect it to take on a production server with big disks? The first stage of the rebuild had a progress bar, but this parity stage doesn't.
Attila Szab├│
Frequent Advisor

Re: Problem with RAID 5 Array

It can take half a day if the rebuild is set to low priority and there is disk activity; sometimes it can take longer... it's hard to calculate. :)
sandeep_raman
Honored Contributor

Re: Problem with RAID 5 Array

Time Required for a Rebuild
The time required for a rebuild varies considerably, depending on several factors:
• The priority that the rebuild is given over normal I/O operations (you can change the priority setting by using ACU)
• The amount of I/O activity during the rebuild operation
• The rotational speed of the hard drives
• The availability of drive cache
• The brand, model, and age of the drives
• The amount of unused capacity on the drives
• The number of drives in the array (for RAID 5 and RAID ADG)
Allow approximately 15 minutes per gigabyte for the rebuild process to be completed. This figure is conservative, and newer drive models usually require less time to rebuild.
System performance is affected during the rebuild, and the system is unprotected against further drive failure until the rebuild has finished. Therefore, replace drives during periods of low activity when possible.
CAUTION: If the Online LED of the replacement drive stops blinking and the amber Fault LED glows, or if other drive LEDs in the array go out, the replacement drive has failed and is producing unrecoverable disk errors. Remove and replace the failed replacement drive.
When automatic data recovery has finished, the Online LED of the replacement drive stops blinking and begins to glow steadily.

SRH
Ariel Filotti
New Member

Re: Problem with RAID 5 Array

That means I should wait at least 840 x 15 minutes = 8 days and 18 hours before worrying... :)
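As a quick check of that arithmetic, here is a Python sketch that applies the conservative 15-minutes-per-gigabyte rule of thumb from the HP text quoted earlier. The helper names are hypothetical, and applying the rule to the combined size of both logical drives (840 GB, from 43.9 GB + 794 GB) is one reading of the guideline rather than an HP formula; actual times depend on the factors listed in that text.

```python
# Hypothetical helpers: apply HP's "about 15 minutes per gigabyte" rule of thumb
# and format the result as days/hours/minutes.


def estimated_rebuild_minutes(capacity_gb: float, minutes_per_gb: float = 15.0) -> float:
    """Conservative rebuild-time estimate: capacity times minutes per gigabyte."""
    return capacity_gb * minutes_per_gb


def format_duration(minutes: float) -> str:
    """Split a duration in minutes into days, hours, and minutes."""
    days, rem = divmod(int(round(minutes)), 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days} days, {hours} hours, {mins} minutes"


total = estimated_rebuild_minutes(840)
print(total, format_duration(total))
# prints: 12600.0 8 days, 18 hours, 0 minutes
```

Since HP describes the figure as conservative, the result is better treated as a rough upper bound than a prediction.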

As it is now, three of the drives have the green cylinder lit, the one that is rebuilding has the green cylinder off, and the green activity arrows are blinking randomly across all four drives. If I understand correctly, the array is fault tolerant again only after the green cylinder lights up on the rebuilding drive.
NMory
Respected Contributor

Re: Problem with RAID 5 Array

Ariel:

I suggest you wait that long. If it's still the same then, call HP, but before you call them, run an ADU report on that array, because they will ask for it, and be ready to use HP Insight Diagnostics as well.

LN