HP AutoRAID hanging with 1 lost disk

Aaron Miller_4
Occasional Visitor

Hi

I have an HP 9000 running HP-UX 11 with an HP AutoRAID array:

11x9GB drives
Logical capacity 70725MB
Unallocated 0MB
Active spare 8638MB
Data redundancy 16107MB
Total 95515MB

Active spare enabled
Auto-Rebuild enabled
Rebuild Priority high

We lost a disk on the system, and for a while I/O continued, with messages like these in the hparray logs:

Controller error record for Subsystem 0000001371ED at Sat Jun 5 02:14:04 2004
Controller timestamp = 2808172
Event code = 166
Event code description = Internal SCSI Event
Event count = 1
Component ID = 7
FRU ID = 9
FRU description = Disk in slot B5


Controller error record for Subsystem 0000001371ED at Sat Jun 5 02:14:04 2004
Controller timestamp = 2808172
Event code = 98
Event code description = Read Recovered With RAID 5 Redundancy
Event count = 1
Component ID = 7
FRU ID = 129
FRU description = Reporting Controller
...etc
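
For anyone triaging similar logs, here is a minimal Python sketch that groups controller error records by FRU. It assumes the records look exactly like the two excerpts above ("Key = value" lines under a "Controller error record for Subsystem ..." header). The function names parse_records and count_by_fru are made up for illustration and are not part of any HP tool.

```python
import re
from collections import Counter

# Header line that opens each record, as seen in the excerpt above.
RECORD_START = re.compile(
    r"^Controller error record for Subsystem (\S+) at (.+)$"
)

def parse_records(text):
    """Return one dict per controller error record found in text."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = RECORD_START.match(line)
        if match:
            # Start a new record and remember subsystem ID and time.
            current = {"Subsystem": match.group(1), "Time": match.group(2)}
            records.append(current)
        elif current is not None and " = " in line:
            # "Key = value" lines belong to the most recent record.
            key, value = line.split(" = ", 1)
            current[key.strip()] = value.strip()
    return records

def count_by_fru(records):
    """Count records per FRU description (hypothetical helper)."""
    return Counter(r.get("FRU description", "unknown") for r in records)

# The two records quoted in the post, used as sample input.
SAMPLE = """\
Controller error record for Subsystem 0000001371ED at Sat Jun 5 02:14:04 2004
Controller timestamp = 2808172
Event code = 166
Event code description = Internal SCSI Event
Event count = 1
Component ID = 7
FRU ID = 9
FRU description = Disk in slot B5

Controller error record for Subsystem 0000001371ED at Sat Jun 5 02:14:04 2004
Controller timestamp = 2808172
Event code = 98
Event code description = Read Recovered With RAID 5 Redundancy
Event count = 1
Component ID = 7
FRU ID = 129
FRU description = Reporting Controller
"""

records = parse_records(SAMPLE)
for fru, count in count_by_fru(records).items():
    print(fru, count)
```

Run against a full log, a tally like this makes it easier to see whether one slot is generating most of the events before I/O stops.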

Then all filesystems on the array became unavailable, and applications using them started to hang. No further messages appeared in the hparray logs.

This state continued for several hours until the hardware vendor arrived and replaced the disk. I/O became available again shortly afterwards, and the auto-rebuild later completed OK.

I don't understand why, with an active spare and RAID 5 redundancy, the LUNs were totally unavailable with only 1 lost disk.

Am I missing something here?

Thanks

Aaron
5 REPLIES
Uwe Zessin
Honored Contributor

Re: HP AutoRAID hanging with 1 lost disk

Aaron,
I am not familiar with this type of array, but a defective disk drive can block an entire SCSI bus.
.
Aaron Miller_4
Occasional Visitor

Re: HP AutoRAID hanging with 1 lost disk

Thanks Uwe for the quick response!

Don't suppose you know if this behaviour (a failed array disk hanging the SCSI bus) is documented anywhere?

Aaron
Uwe Zessin
Honored Contributor

Re: HP AutoRAID hanging with 1 lost disk

I am afraid I have NEVER seen this in any documentation, but I have seen it myself and have heard about it from colleagues and other people.
.
A. Clay Stephenson
Acclaimed Contributor

Re: HP AutoRAID hanging with 1 lost disk

I have had tens of disks fail in AutoRAIDs over the years and I have never experienced your problem. However, firmware HP60 does address a problem which sounds suspiciously like yours. If you are not already at HP62, I would download the firmware and install it using the "download" command. See "man download" for details. I suspect that if you had actually pulled the drive, the array would have recovered. By the way, you are killing your array performance by allocating all the space to LUNs.


If it ain't broke, I can fix that.
Yew Lee
Advisor

Re: HP AutoRAID hanging with 1 lost disk

I once encountered a bad disk hanging the whole AutoRAID. I got a little suspicious because the light on the bad disk stayed on the whole time while none of the others blinked. A system check indicated that I/O access to the AutoRAID had hung. I pulled out the faulty disk and the rest of the disks started blinking again. Then I got HP to replace the faulty disk. :)

So I guess the bad disk hung the internal SCSI bus of the AutoRAID. I have also seen a Jamaica (JBOD) disk failure hang the entire FWD SCSI bus.
On the move....