MSA50s failed with high I/O load

Andrew_346
Regular Advisor

Had a very strange failure with some MSA50s this weekend.

Here's the configuration:

ML530G2
2x P600
3x MSA50, each with 10x 72GB 10k SAS SFF drives

I changed the arrays from three 10-disk RAID 10s to one 20-disk RAID 10, leaving the third 10-disk RAID 10 as it was.

I started a 40GB file copy from another server (a 10-disk RAID 10 with 15k SAS drives) over a 1Gb network to the 20-disk RAID 10. During that transfer, which took only a few minutes, two disks in the 20-disk array on the MSA50s went offline: one with failure code 7, the other with code 32.
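For a sense of how hard that copy could have pushed the array: a 40GB file over a 1Gb link takes roughly 320 seconds at full line rate, which caps the incoming write stream at about 125 MB/s. Below is a minimal Python sketch of that arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits/s), and the `efficiency` parameter is a hypothetical stand-in for protocol overhead, not a measured figure; real throughput will differ.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate seconds to move size_gb gigabytes over a link_gbps link.

    Assumes decimal units. 'efficiency' is a hypothetical fraction of line
    rate actually achieved (1.0 = ideal, no protocol or disk overhead).
    """
    gigabits = size_gb * 8  # bytes -> bits
    return gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    ideal = transfer_seconds(40, 1)
    # 0.8 is an assumed efficiency, chosen only for illustration.
    slower = transfer_seconds(40, 1, efficiency=0.8)
    print(f"ideal: {ideal:.0f} s ({ideal / 60:.1f} min)")
    print(f"at 80% of line rate: {slower:.0f} s ({slower / 60:.1f} min)")
```

Either way the copy lands in the "few minutes" range, so the array was seeing a network-limited write stream rather than an unbounded local one.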

I reseated those drives and they came back online with no problem and rebuilt from their mirrored pair.

Then I started restoring a database from that 40GB file to the arrays on the MSA50s. At that point, the drives started lighting up like a Christmas tree. I lost ten drives in the 20-disk RAID 10 (one from each mirrored pair), and three on the 10-disk array.
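Losing exactly one drive from each mirrored pair leaves a RAID 10 array online but with zero redundancy: the next failure in any pair takes the whole array down. A minimal Python sketch of that rule follows. It assumes a hypothetical numbering where disks 2k and 2k+1 form mirrored pair k; the Smart Array controller's actual pairing may differ, so this only illustrates the logic, not how the P600 reports status.

```python
from typing import Iterable, Set


def raid10_status(num_disks: int, failed: Iterable[int]) -> str:
    """Classify a RAID 10 array as 'ok', 'degraded', or 'failed'.

    Assumes (hypothetically) that disks 2k and 2k+1 are mirrored pair k.
    The array fails only if both members of some pair are lost.
    """
    if num_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    failed_set: Set[int] = set(failed)
    degraded = False
    for pair in range(num_disks // 2):
        members = {2 * pair, 2 * pair + 1}
        lost = len(members & failed_set)
        if lost == 2:
            return "failed"  # both copies of this stripe member are gone
        if lost == 1:
            degraded = True  # still readable, but no mirror left for this pair
    return "degraded" if degraded else "ok"


if __name__ == "__main__":
    # One drive lost from each of the ten pairs (disks 0, 2, ..., 18).
    print(raid10_status(20, range(0, 20, 2)))  # degraded
    # Both members of pair 0 lost.
    print(raid10_status(20, [0, 1]))  # failed
```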

All of these failures were failure code 7, 20, or 32. Every drive in the arrays shows 4 SCSI bus faults, and the drives marked failed also show a count of 3 under "Other Failures".

All of the disks are model DG072A8B54 and are running an older firmware version, HPD4.

Unfortunately, HP does not show what updates were made in the subsequent 3 firmware releases for these drives. Nor have I ever received an alert concerning these SAS SFF drives requiring firmware updates.

What shocks me most is that this server was already handling I/O load before the reconfiguration and the tasks I performed this weekend. Why did this setup crap out on me like this now?

I've opened a case with HP and sent in a few ADU reports taken at different times during these failures. Hopefully they can come up with something other than "You need to update your firmware." I need to know why this failure happened, and some assurance that firmware HPD7 will fix it for good...