ProLiant Servers (ML,DL,SL)

Re: SmartArray 5i Data Loss

 
Kay Behrmann_1
Occasional Contributor

SmartArray 5i Data Loss

Dear Forum,

on a DL380G3 we operate a Smartarray5i plus Battery-backed Write-cash. It worked well for four years, but last weekend four disks failed in a single night and we lost data. After a reboot, all disks were fine again, which leads us to suspect that the controller, not the disks, was at fault.

The System Event log starts with a "cpqissm" Event 9 (device did not answer within time), then a CPQISSE Event 24683 (SCSI bus fault occurred on Storage Box 0, Port 1 of Array Controller [Embedded]. This may result in a "downshift" in transfer rate for one or more hard drives on the bus.).
Later there was CPQISSE Event 24597 (Physical Drive on SCSI Port 2, ID 0 of Array Controller [Embedded] has failed. Failure Code: 0x0c.) and after a few more 24683, 24597, 24598 there were Events 24600 (drive failed) and 24800 (fatal error on a read/write request on the volume).
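
For anyone sifting through a similar System Event Log, tallying the event IDs can show at a glance whether bus faults or drive failures dominate. The following is only a minimal Python sketch: the function name `summarize_events` and the sample list are hypothetical, and the descriptions are just the ones quoted in this post, not an official event reference.

```python
from collections import Counter

# Descriptions copied from the post above; IDs without a quoted
# description (e.g. 24598) are deliberately left out.
EVENT_NOTES = {
    9: "device did not answer within time",
    24683: "SCSI bus fault, possible transfer-rate downshift",
    24597: "physical drive has failed",
    24600: "drive failed",
    24800: "fatal error on a read/write request on the volume",
}


def summarize_events(event_ids):
    """Return (event_id, count, note) tuples, most frequent first.

    Ties are broken by ascending event ID so the order is deterministic.
    """
    counts = Counter(event_ids)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (event_id, count, EVENT_NOTES.get(event_id, "not described in the post"))
        for event_id, count in ordered
    ]


if __name__ == "__main__":
    # Hypothetical sequence, loosely following the order described above.
    sample = [9, 24683, 24597, 24683, 24597, 24598, 24600, 24800]
    for event_id, count, note in summarize_events(sample):
        print(f"{event_id:>6}  x{count}  {note}")
```

If bus-fault events (24683) clearly outnumber the per-drive failures, that would be one more hint toward a shared component rather than the individual disks.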

The 5i controller was running firmware version 2.74, which should be fairly recent. The Disks were BD14688278/BD07288277/BF14684970 SCSI HotPlug Disks with Firmware HPB4 and HPB5, which also should be acceptably recent.

Does anybody have any idea what went wrong? The server is now up and running; however, events of type 24683 are appearing again, and I suspect it may fail again soon.

regards,
Kay
6 REPLIES
Gary Antonio Benavides
Trusted Contributor

Re: SmartArray 5i Data Loss

Run the Array Diagnostic Utility, save the report, and attach it to this thread so that we can try to pinpoint the issue. It might also be the hard drive backplane.
If it's not fun, you're not doing it right
MT19
Valued Contributor

Re: SmartArray 5i Data Loss

My vote's with the backplane. Over the past year we've had too many DL380 G3s with identical issues where we updated firmware and replaced controllers, and in the end it was always replacing the SCSI backplane that finally resolved the issue.
Kay Behrmann_1
Occasional Contributor

Re: SmartArray 5i Data Loss

OK, here's the output from ArrayDiagnostics
Regards
Kay
James ~ Happy Dude
Honored Contributor

Re: SmartArray 5i Data Loss

Dear Kay,

I believe it's a firmware issue. Boot the server from the latest Firmware Maintenance CD (8.1) and update all firmware at once.

"SCSI bus" faults, as you mention in the error, usually point to the cables. It's no biggie unless a cable is actually failing. The "downshift" means that the data transfer rate will drop.
This could also be the behavior of an upset backplane.

Once you have updated the firmware, post the ADU report as Gary suggested. We can then look into it for any other issues.

Regards,

*Battery-backed Write-**CACHE** ;-)
James ~ Happy Dude
Honored Contributor

Re: SmartArray 5i Data Loss

Just looked into the ADU report.

SCSI Port 2 Drive ID 0 is not stamped for monitoring.
Last Failure Reason: 0x14 (Drive removed from hot plug)

Updating the firmware and restarting should take care of this (a basic controller re-initialization is needed).
Also, this particular HDD has old firmware.

Surface analysis delay: 15 secs.

Why have you delayed the surface scan?

SCSI Port 2, Drive ID 2
SCSI Port 2, Drive ID 3
SCSI Port 2, Drive ID 4
... has the same old firmware as the above & this drive is NOT ready.


SCSI Port 2, Drive ID 5 ... has the OLD firmware and failed to spin up while it was attempting to recover from the other HDD failure.

SCSI BUS 1 PARAMETERS:
Inquiry Data Valid: No
Physical Connector: J3 (controller connector attached to drive)

This seems to have an issue.

RIS says the drives went "BAD/NOT present" most of the time, which I would guess means the controller was not able to communicate with them.

Could you change the cables?

Regards,
Kay Behrmann_1
Occasional Contributor

Re: SmartArray 5i Data Loss

Thanks for the reply. I followed your advice and downloaded the latest Firmware Maintenance CD (version 8.10). The "Smart Update Manager" reported that all firmware is up to date; there are no later versions.

Our six disks are all on SCSI Port 2, i.e. we use only one SCSI bus on our controller. For this "simplex SCSI installation", Port 2 is the default, so Port 1 is left empty. This explains why SCSI Bus 1 reports "no valid data".

How do you modify the "Surface analysis delay"? I have never come across this feature and have no idea why it is set to 15 seconds. What is the default value?

It seems the best guess so far is to replace the backplane and get new internal cabling. Any comments?

Kind regards,
Kay