
Re: ML350 G6. Drive or controller failure?

 
Andy Baguley
Occasional Advisor

ML350 G6. Drive or controller failure?

I have an ML350 G6 with a P410i, Smart Array Advanced Pack (SAAP), and 6x 146GB hot-swap drives in RAID 6.

 

It runs SBS 2008, on 4 partitions, and has been OK for 3.5yrs.

 

I had occasion to recover a file from the fourth (client data) F: partition, and SBS Backup / Recovery was not showing this drive in recovery mode.

 

Backup history showed the F: drive had not been backed up for about 3 weeks. It was showing on earlier backups. The daily e-mail continued to report the backup as OK though!!!

 

Looking into SBS backup it showed an I/O error on the F: drive, in the backup history.

 

Visual inspection shows drives 5 and 6 have permanent red lights! All were green about a month ago, as I remember checking. Drives are populated from left to right on the lower row of caddies.

 

Google suggests it could indicate either drive failure, or lost connection.

 

Strange that two drives have failed within such a short time span, or am I looking at a controller failure?

 

What diagnostics should I download and run to identify the problem?

 

Thanks 

7 REPLIES
Suman_1978
HPE Pro

Re: ML350 G6. Drive or controller failure?

Hi,

 

It could be the drives themselves, the controller, or even the drive backplane.

But it's difficult to tell without running any diagnostics.

 

Have you tried ACU or Insight Diagnostics from HP? If not, download these and run the diagnostics.

Once you have installed ACU, choose Diagnostics to run the array diagnostics.

Once you have installed Insight Diagnostics, choose the Maintenance tab and run the tests on the drives or controllers, if listed.

 

 

Thank You!
I am a HP employee


I work for HPE.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


Andy Baguley
Occasional Advisor

Re: ML350 G6. Drive or controller failure?

Thanks.

 

I've run ACU diagnostics, and the log is attached.

 

Would it be possible for you to please take a look and maybe advise further?

 

Have also installed HP Insight Diagnostics Online Edition for Windows x64 Editions, as per your link.

 

Running that opens IE pointed at https://localhost:2381/hpdiags/frontend2/startup.php, but unfortunately IE cannot display the webpage.

 

Thanks and Regards

 

Suman_1978
HPE Pro

Re: ML350 G6. Drive or controller failure?

Hi,

 

According to the ADU Report:

 

4 hard drives under Internal Drive Cage at Port 1I : Box 1

2 hard drives under Internal Drive Cage at Port 2I : Box 1

 

The 2 physical drives at 2I:1:5 and 2I:1:6 show as failed.
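
The drive addresses in the report read as port:box:bay, so 2I:1:5 is port 2I, box 1, bay 5. As a minimal sketch (the helper names `parse_address` and `failures_by_port` are made up for illustration and are not part of any HP tool), the failed drives can be grouped by controller port to see whether they share a channel:

```python
from collections import defaultdict
from typing import NamedTuple


class DriveAddress(NamedTuple):
    port: str  # controller port, e.g. "2I"
    box: int   # drive cage number
    bay: int   # bay within the cage


def parse_address(text: str) -> DriveAddress:
    """Split a 'port:box:bay' string such as '2I:1:5' into its parts."""
    port, box, bay = text.strip().split(":")
    return DriveAddress(port, int(box), int(bay))


def failures_by_port(failed: list[str]) -> dict[str, list[int]]:
    """Group the bay numbers of failed drives by controller port."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for text in failed:
        addr = parse_address(text)
        grouped[addr.port].append(addr.bay)
    return dict(grouped)


# The two failed drives named in the report.
print(failures_by_port(["2I:1:5", "2I:1:6"]))  # {'2I': [5, 6]}
```

Both failures land on port 2I, which is the port serving only those two drives. That pattern alone does not prove a cable or backplane fault, but it is worth keeping in mind.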

 

The Smart Array P410i in the Embedded Slot is running firmware version 2.74 (Jan 2010), but the latest version available from HP is 6.40(B).

 

The report does not show any failures for the controller, but it's recommended that the P410i firmware be updated.

ADU cannot detect cable or backplane faults.

 

In my opinion: replace both drives, let the array rebuild, and monitor it. If drives fail again, try swapping the SAS cables and/or the drive backplane.

 

 

Thank You!
I am a HP employee.

 



Andy Baguley
Occasional Advisor

Re: ML350 G6. Drive or controller failure?

Thank you for your time in analysing the log.

 

I have two replacement drives on order.

 

Given the fragile state of this array should I replace both at the same time, or one at a time allowing the array to rebuild before the second drive is changed?
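
To put a number on "fragile": RAID 6 keeps two drives' worth of parity, so it survives the loss of any two member drives. The sketch below only does that arithmetic, and the function name `remaining_tolerance` is invented for illustration. It says nothing about which replacement order is safer.

```python
RAID6_PARITY_DRIVES = 2  # RAID 6 survives the loss of any two member drives


def remaining_tolerance(failed_drives: int) -> int:
    """Return how many more drive failures a RAID 6 array can absorb.

    Raises ValueError if the count is negative or exceeds what RAID 6 survives.
    """
    if failed_drives < 0:
        raise ValueError("failed_drives must not be negative")
    if failed_drives > RAID6_PARITY_DRIVES:
        raise ValueError("more drives failed than RAID 6 can survive")
    return RAID6_PARITY_DRIVES - failed_drives


# Healthy array, one drive failed, two drives failed (the current state).
for failed in range(3):
    print(failed, "failed ->", remaining_tolerance(failed), "more tolerated")
```

With two drives already failed the result is 0, meaning the array is still running but has no redundancy left until at least one rebuild completes.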

 

Should I fully power down to change drives, or merely replace (hot plug) whilst the server is running?

 

Presumably I shouldn't need to do anything software or RAID configuration-wise, as the P410i is clever enough to detect the drive swap and sort it all out for me?

 

Your support is appreciated.

Thanks.

Torsten.
Acclaimed Contributor

Re: ML350 G6. Drive or controller failure?

I wonder why both disks on the same channel have failed.
As a test, I would swap the 2 internal SAS cables over and see what happens. Another test would be to move the 4 disks on the left to the right and the 2 on the right to the left. Power on and press F8 while the Smart Array is initializing to check the status. Maybe there is something wrong with the disk cage, the cable, or even the controller itself.

Hope this helps!
Regards
Torsten.

Andy Baguley
Occasional Advisor

Re: ML350 G6. Drive or controller failure?

Yes, my thoughts too.

 

A coincidence that two drives on the second channel fail within a short timespan?

 

I'm reluctant to play too much with it presently, as I don't have a full backup of the F: Drive. Last thing I want is to see the array vaporise because I've swapped things around too much.

 

I now have two new replacement drives, and will swap one in over the weekend to see whether that port on the P410i and that backplane are OK. If the new drive goes red, that confirms the problem is on the controller side.

 

If someone can clarify the drive change procedure as per my post above, it would be appreciated. Or can I just add it by plugging it into one of the empty slots on that side?

 

I'm treading carefully and cautiously with this one.

 

Regards

Andy Baguley
Occasional Advisor

Re: ML350 G6. Drive or controller failure?

Two new drives have arrived, and been fitted, and the array has re-built.

 

They were actually 300GB not 146GB, but marked externally with the 146GB part number.
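
For reference, the usual RAID 6 rule is that each member contributes only as much space as the smallest drive, and two drives' worth goes to parity. A minimal sketch of that arithmetic follows; the function name `raid6_usable_gb` is made up for illustration, the sizes are the nominal drive sizes, and it assumes the controller follows the standard rule.

```python
def raid6_usable_gb(drive_sizes_gb: list[int]) -> int:
    """Usable capacity of a RAID 6 set under the standard rule.

    Every member is treated as being the size of the smallest drive,
    and two members' worth of space is consumed by parity.
    """
    if len(drive_sizes_gb) < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (len(drive_sizes_gb) - 2) * min(drive_sizes_gb)


original = [146] * 6              # the array as first built
mixed = [146] * 4 + [300] * 2     # after fitting the two 300GB replacements

print(raid6_usable_gb(original))  # 584
print(raid6_usable_gb(mixed))     # 584
```

On that assumption the larger replacements do not change the array size. The extra space on the two 300GB drives simply goes unused.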

 

The SBS backup is still failing, however, only on the fourth (F:) partition, same as originally. A backup of only C:, D:, and E: completes without error. I am running this presently to keep a lid on the Exchange logs. The F: backup fails right at the end with an I/O error.

 

A bit of Googling suggests it might be corrupt sectors at the end of the drive, so I tried shrinking it. Following a defrag I can shrink it by up to a maximum of 300MB, but if I try any more, the Disk Management snap-in hangs. Lots of Event ID 11 entries ("The driver detected a controller error on \Device\Harddisk0\DR0") get logged at the same time as it hangs. This partition is 286GB with 209GB in use.

 

The latest Array diagnostics report is attached.

 

Any pointers on where to go now?

 

Regards