MSA50s failed with high I/O load


Had a very strange failure with some MSA50s this weekend.

Here's the configuration:

2x P600
3x MSA50, each w/ 10x 72GB 10k SAS SFF drives

Changed the arrays from three 10-disk RAID 10 sets to one 20-disk RAID 10, leaving the remaining 10-disk RAID 10 as it was.

Started a 40GB file copy from another server (10-disk RAID 10 w/ 15k SAS drives) over a 1Gb network to the 20-disk RAID 10. During that transfer, which only took a few minutes, two disks in the 20-disk array on the MSA50s tripped offline. One failed with failure code 7, the other with code 32.

I reseated those drives and they came back online with no problem and rebuilt from their mirrored pair.

Then I started restoring a database from that 40GB file to those arrays on the MSA50s. At that point, the drives started lighting up like a Christmas tree. I lost ten drives in the 20-disk RAID 10 (one from each of the mirrored pairs), and three on the 10-disk array.
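
To put the ten-drive loss in perspective: a RAID 10 set stays online as long as every mirrored pair keeps at least one healthy member, so losing one drive from each pair is the most a 20-disk set can lose without going down. Here is a rough sketch of that check in Python. The disk numbering and the pairing of adjacent disks are assumptions for illustration only, not how the P600 actually maps drives.

```python
def raid10_survives(mirror_pairs, failed):
    """Return True if every mirrored pair still has at least one healthy disk."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)

# Hypothetical layout: 20 disks numbered 0-19, paired as (0,1), (2,3), ...
pairs = [(i, i + 1) for i in range(0, 20, 2)]

# Ten failures, exactly one in each mirrored pair (what happened here).
one_per_pair = [a for a, _ in pairs]

print(raid10_survives(pairs, one_per_pair))        # True: array is still up
print(raid10_survives(pairs, one_per_pair + [1]))  # False: pair (0,1) is gone
```

In other words, after those ten failures the array had no redundancy left at all, and one more failure on any surviving drive would have taken it offline.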

All of these failures were failure code 7, 20 or 32. Every drive in the arrays shows 4 SCSI bus faults, and the drives that show as failed also have a count of 3 under "Other Failures".

All of the disks are model# DG072A8B54 and are running an older version of firmware, HPD4.

Unfortunately, HP does not show what updates were made in the subsequent 3 firmware releases for these drives. Nor have I ever received an alert concerning these SAS SFF drives requiring firmware updates.

The thing that shocks me the most is that this server had an I/O load on it before the reconfigure and the tasks I performed this weekend. Why did this setup crap out on me like this now?

I've created a case with HP and sent in a few ADU reports from different times during these failures. Hopefully they can come up with something other than, "You need to update your firmware." I need to know why this failure happened, and I need assurance that firmware HPD7 will fix it for good...