Re: Bad Drives in Smart Array P822

Occasional Visitor

Bad Drives in Smart Array P822

I have been working on this for a while now and can't figure it out. I have a backup server in our DR location with the following setup.

DL380p Gen 8 Server
Smart Array P822 Controller (4 External Ports)
6 x D2600 / BK765A External Disk Arrays
Each disk array contains 12 x 2 TB hard drives
I have 2 logical drives set up. Each is a RAID 60 with 36 drives.
There are actually 33 drives in the RAID set and 3 hot spares per logical drive, 1 hot spare in each external array.

Before you ask: yes, I did update every BIOS, firmware and driver I could find on this server. All RAID controllers are at the latest firmware, and I did firmware updates on the disks themselves and on the enclosures as well.

That said, the first of the 2 RAID sets works perfectly. No issues.
The second one, however, is a problem. I used to have 2 identical servers, one in Prod and one in DR, each with 3 external arrays. The one in Prod had this same issue, and when we retired it we moved its 3 disk arrays to DR and hooked them to the sister server there. The problem persists.

Once I start filling that RAID set with backup data it eventually fails. Different drives show as failed in the Smart Array utility, and it isn't always the same drive. On multiple occasions I saw that 2 drives had failed. When either situation happens the drive letter goes offline in Windows. The logical drive in the controller cannot be re-enabled unless I reboot the server. Once the reboot completes I can go into the utility and re-enable the logical drive. At that point it shows no bad drives either. However, after about 24-48 hours of writing data the same issue occurs.

Clearly I have some bad drives. The problem is I can't figure out 2 things: which drives specifically are bad, and why the Smart Array controller isn't simply marking the drives bad and failing over to a hot spare. Even when the utility shows a drive as bad, it never attempts to fail over to a hot spare and instead simply disables the logical drive. This logical drive currently has no data on it, and I am using a utility to write zeros to all sectors to see what happens.
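One way to narrow down which drives are really at fault is to write down the box and bay of every drive the utility reports as failed, then tally the incidents. The sketch below is only an illustration: the `incidents` list is made-up sample data, not readings from this server, and the function names are hypothetical. A bay that fails repeatedly stands out in the per-drive count, while failures spread across many bays of one box may point at something shared rather than at individual disks.

```python
from collections import Counter

# HYPOTHETICAL sample data: one (box, bay) entry per drive reported failed.
incidents = [(4, 7), (5, 2), (4, 7), (6, 11), (4, 7), (4, 3)]


def tally(incidents):
    """Count failures per individual drive and per enclosure (box)."""
    by_drive = Counter(incidents)
    by_box = Counter(box for box, _bay in incidents)
    return by_drive, by_box


def repeat_offenders(incidents, threshold=2):
    """Return the (box, bay) pairs reported failed at least `threshold` times."""
    by_drive, _by_box = tally(incidents)
    return sorted(drive for drive, count in by_drive.items() if count >= threshold)


by_drive, by_box = tally(incidents)
for (box, bay), count in by_drive.most_common():
    print(f"box {box} bay {bay}: failed {count} time(s)")
for box, count in sorted(by_box.items()):
    print(f"box {box}: {count} failure(s) in total")
print("repeat offenders:", repeat_offenders(incidents))
```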

Occasional Visitor

Re: Bad Drives in Smart Array P822

Alright, I now have a failure. I am using a tool called HDScan to write zeros to the entire disk, and at about 14% the array controller reported a bad drive and began the rebuild.

The problem is that the drive reported as bad had just been replaced. During previous troubleshooting I noticed at least 1 drive that had failed more than twice, so I purchased a new HP drive from our vendor and replaced that bad one. Yet here we are with the exact same drive showing bad blocks and rebuilding itself. At least this time it actually did rebuild, though.

I paused the erase process and will let the drive rebuild to the hot spare before I continue. This is going to take a while.