ProLiant Servers (ML,DL,SL)

Frequently failing logical drives in Smart Arrays under 100% I/O utilization

TomHod2

Hi,
We have 10 DL180/DL380 servers with Smart Array controllers and largish RAID 5 logical drives of 6-10 TB each.

The logical partitions are exported over NFS, and the NFS client processes load the disks to 100% utilization for about 3 hours at a time.
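For anyone who wants to check the utilization figure on their own hardware, here is a minimal sketch of how a percent-busy value of the kind iostat reports as %util can be derived on Linux from two samples of /proc/diskstats. It is only an illustration, not necessarily how the 100% figure above was measured. The device name "cciss/c0d0" and the 5-second interval are assumptions, so substitute whatever your system lists.

```python
import time


def io_ticks_ms(diskstats_text, device):
    """Return the 'time spent doing I/Os' counter (ms) for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the per-device counters.
        # The 10th counter (index 12) is milliseconds spent doing I/O.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def utilization_percent(before_text, after_text, device, interval_s):
    """Percent of the interval during which the device had I/O in flight."""
    busy_ms = io_ticks_ms(after_text, device) - io_ticks_ms(before_text, device)
    return 100.0 * busy_ms / (interval_s * 1000.0)


def sample(device="cciss/c0d0", interval_s=5):
    """Take two snapshots of /proc/diskstats and report utilization.

    The default device name is an assumed example, not a value from this post.
    """
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return utilization_percent(before, after, device, interval_s)
```

Calling sample() on the NFS server while the clients are running should return a value close to 100 if the logical drive really is saturated.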

Quite frequently we get alerts that a logical drive has failed because more than one disk has failed in quick succession and needs to be re-seated. (Generally, once a disk is re-seated it works OK again.)

Other times the logical drive has failed with no specific disk failure message and simply needs to be re-enabled.

Once the partition is available again, we can repair the disk and the data is accessible, sometimes with some files lost.

This is happening on several different servers, each with slightly different hardware. The only common factor seems to be the cciss driver under heavy load.

Any suggestions on how to proceed with troubleshooting?