
E. Tang
Occasional Contributor

Strange array accelerator issue with a Proliant ML530 G2

I support a remote Proliant ML530 G2. Recently, this server reported a predictive failure in one of the array hard drives. The array is a RAID-5 set of eight 72GB hard drives connected to a SmartArray 6400 controller. All of the drives are in the internal storage cage.

We use a third party for support since the server is out of the factory warranty period. They replaced the drive and all seemed normal for about 10 minutes.

The array controller reported it was rebuilding for a few minutes, then it stopped and reported that the array was "ready for recovery." A reboot did not change this message.

The support company replaced the drive again, with no change in status. The array reported it was still "ready for recovery." We even tried putting the original predictive failure drive in and that did not change anything either. In both cases, the replaced drive came up green with no errors. The controller just would not rebuild.

Looking at some of the hardware information, it looks like one of the other drives in the array is reporting a lot of read errors, more than the predictive failure drive. But the status of this second drive is green; it's not showing any failures.

Could the read errors on this drive be preventing the controller from rebuilding? If so, what are my options at this point?

The driver and firmware for the SA6400 are current. The firmware versions on the individual drives vary from one drive to the next.

The support company's suggestion is to delete the array and recreate it, and restore data from backup. This idea is complicated by the fact that the server is backed up over the wire to a tape library in California so the restore time isn't going to be fast. (Plus, we do eternal incremental backups on all of our servers so there isn't a single full backup of all of the data.)

One idea that I discussed with the support company is to remove that second drive with the read errors and replace it. However, the big unanswered question is whether or not the original predictive failure drive is still an active member of the array. If it is, then removing the second drive *should* be equivalent to a situation where the second drive fails on its own while all the others remain green.

I found other incidents in the forums that were pretty much the same as this one, and unfortunately those incidents ended with the array being rebuilt from scratch. I'm hoping someone here might have an idea or input that will help avoid that.

Thanks!
Michael A. McKenney
Respected Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

Upgrade the firmware on the controller, the drives, and the server first. I have seen this fix the issue. If the problem persists, call HP and replace the drives.
Michael A. McKenney
Respected Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

Do you have a spare drive installed? If not, put one in and let it rebuild. If the spare has errors, add another spare. The errors could also be coming from the backplane or the controller.
E. Tang
Occasional Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

I did send out a spare drive but I think the current status of the array is preventing me from adding it as a spare. There are two arrays in the server; one for the OS and one for data. I can designate a spare for the OS array, but not the data array.

Which server firmware do you think would be required?
Terry Hutchings
Honored Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

I suspect there is another drive in the array which is having problems. The drive has not failed yet, but will soon. Do you have the System Management Homepage installed? If so, you may be able to see which drive is having a problem by selecting each drive, then looking at the statistics for each. It may also require pulling the output from the Array Diagnostic Utility in order to identify the problem drive. This does not occur very often, but when it does my first recommendation is to back up immediately, as it will require replacing at least two drives in the array to permanently resolve the problem.

Rebuilding the array from scratch will only resolve the problem if you replace whichever drives are having problems.
The truth is out there, but I forgot the URL..
E. Tang
Occasional Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

As I mentioned in my original post, I did go over the individual statistics for each drive and found another drive in the array that was reporting a lot of read errors but was still showing a green indicator.

The speculation is that this drive is preventing the rebuild from happening since a rebuild requires reading data from the drives.
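To illustrate why that would matter, here is a minimal sketch of generic RAID-5 parity reconstruction. It is not the SmartArray firmware's actual logic, which I can't see; the function names and the tiny two-byte blocks are made up for illustration. The point is only that rebuilding one member means reading the matching block from every other member, so a single unreadable block on a surviving drive leaves that stripe unrecoverable.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(stripe, missing):
    """Reconstruct stripe[missing] from all other members of the stripe.

    stripe is a list of bytes objects, one per drive; None marks a block
    that could not be read (hypothetical stand-in for a read error).
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != missing]
    if any(blk is None for blk in survivors):
        raise IOError("unreadable block on a surviving drive; stripe cannot be rebuilt")
    return xor_blocks(survivors)


# Illustrative stripe: three data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
stripe = data + [parity]

# Drive 0 has been replaced, so its block is recomputed from the others.
recovered = rebuild_block(stripe, missing=0)
```

With every surviving block readable, `recovered` equals the original `data[0]`. If any surviving entry is replaced with `None`, `rebuild_block` raises instead of returning data, which is the generic analogue of a rebuild that cannot proceed.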

The question now is whether or not the originally failed drive contains any of the array data. If it does, then it *might* be possible to replace the other drive with all the read errors.

Even if this is possible, I know it would be a significant risk.

Our other alternative, which is looking better and better each day, is to get the replacement server online as quickly as possible. Unfortunately this is in a remote location so I have no control over the timetable it takes to do so.
Terry Hutchings
Honored Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

I am not sure I understand. If you pull a perfectly good drive out of a server, it will place the array into a degraded state. The drive that you pulled out DOES have array data on it, BUT that data is not useful, as it will be destroyed/overwritten when the drive is reinserted into the server.
E. Tang
Occasional Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

Here is the sequence of events:

1. Drive at Port 1 Drive 1 reported it was in predictive failure. I opened a ticket with our 3rd-party support vendor to replace it. This occurred 3/18.

2. The vendor replaced the drive and the array controller reported it was rebuilding, but then switched to "ready for recovery" status within a few minutes. This condition persisted after a server reboot the Sunday after the drive was replaced.

3. I reported the status to the vendor and they replaced the drive a second time with another drive on Monday. This resulted in the same outcome noted in item #2.

4. On Tuesday, the original drive was put back in at Port 1 Drive 1. The support vendor thought it would be best to let the drive fail, which they believed would trigger a rebuild when it was replaced.

5. Tuesday afternoon I looked through the SMH and found a lot of read errors on the drive at Port 2 Drive 2. This drive was still showing a green indicator, which seemed odd given the number of read errors (almost 3x as many as the drive at P1D1).

After item #5, no other disk movement has been performed. The array remains in "ready for recovery" status and I'm discussing options with the remote site and the support vendor.

The current theory is that the drive at P2D2 is reporting enough read errors that the rebuild can't start. However, since there is already another drive in the array that's reporting a predictive failure, it's unclear whether or not this array can be recovered without deleting the array and recreating it. Our backup environment would make a restore a time-consuming chore.

The next plan is to get the replacement server up and running and copy the data over in advance of a planned migration. However, since this site is remote relative to me, I do not know how long they would need to do this. The server is already configured so all they need to do is plug it in. But they have to move the new server into position, and after seeing pictures of the computer room, that looks like a fair amount of work. I don't think there is a local IT staff in that building so I'm at the mercy of the staff schedules out there.
Terry Hutchings
Honored Contributor

Re: Strange array accelerator issue with a Proliant ML530 G2

The drive with the predictive failure should still work (for a while anyway), but now that it has been removed from the server, the controller will need to rebuild onto that drive if it is reinserted.