
RA4100 disk failure

Martin Bedford
Occasional Visitor

Hello,

Please see below a problem that a colleague of mine recently experienced:

'One of our clusters has several RAID 1 and RAID 5 arrays connected to a RAID
Array 4100.

One of the disks in the Raid 5 array was reporting a PFA Alert. The stats
on the drive confirmed this with 2892 Hard Read errors and 1 Recovered write
error.

The event log on the NT cluster repeatedly reported: "The device,
\Device\ScsiPort3, did not respond within the timeout period."

However, before we could replace the disk, the Oracle database on the
cluster reported an OS error 21 and took datafile 67 offline. The cluster
service also reported "Cluster disk resource 'Disk L' did not respond to a
SCSI inquiry command." and "Cluster resource 'Disk L' failed."

My view is that, because of the disk errors, NT failed to access the RAID
array for a short period, during which Oracle timed out and failed.

The Oracle error is upsetting, as we have had to do media recovery on a
150 GB Oracle database.

We replaced the 36 GB disk with a new one, and the array has rebuilt OK with
no apparent data loss.'

This may be a firmware issue that has been fixed in a later revision, but we are not sure.

Can anybody please comment?

Regards

Martin