MSA2324SA Raid-10 drive failure caused problem for overlaying fs

 
Nicolai Rasmussen
Regular Advisor

I have a C3000 with three BL460c blades, dual SAS BL switches, and one MSA2024 (MSA2000 G2) attached to it.

On the MSA I have a RAID 10 vdisk with 6 drives (and some RAID 5 vdisks, but those are not important here). The vdisk is divided into two volumes.
On each volume (LUN) I have a VMFS3 file system.
On one of the LUNs I run a virtualized SQL Server 2008.

When one of the drives in the RAID 10 failed, my SQL server logged a "Reset to device, \Device\RaidPort0" from its LSI SAS controller (the VMware virtual one). This caused all connections to the SQL server to be lost for 2-3 seconds.

I'm pretty sure that is NOT supposed to happen. Did I configure something wrong, or is this the trade-off with the SAS interconnect version of the MSA?

Anyone?
Prokopets
Respected Contributor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

Nicolai, it's quite strange that the host noticed a disk failure in the MSA's RAID 10. Can you show the MSA's logs?
Nicolai Rasmussen
Regular Advisor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

I've attached the event logs from the time the MSA started logging a disk error.

Thanks for your time :)
Prokopets
Respected Contributor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

There are a lot of "Disk channel error" entries in the MSA's logs, and I think they caused the loss of I/O. AFAIK that's a known issue (damn, I hate it when someone tells me these words :)) with the MSA2000 G2. Have you tried upgrading the firmware, including the disk firmware?
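
To put a number on "a lot", the entries can be counted once the event log is saved as text. The following is only a minimal Python sketch: it assumes the log is plain text with one event per line, which may not match the real MSA export format. The function name and the sample lines are hypothetical and made up purely for illustration; they are not taken from the attached log.

```python
def count_matching_events(log_lines, needle="Disk channel error"):
    """Count the lines that mention `needle`, ignoring case."""
    needle = needle.lower()
    return sum(1 for line in log_lines if needle in line.lower())


# Hypothetical sample lines, NOT real MSA output.
sample = [
    "Disk channel error on channel 0",
    "Vdisk state changed",
    "disk channel error on channel 1",
]

if __name__ == "__main__":
    print(count_matching_events(sample))
```

For a real file, the same function could be fed the lines of the saved log, for example `count_matching_events(open("msa_events.txt"))`, where the file name is again just a placeholder.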
Nicolai Rasmussen
Regular Advisor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

I'm running M110R21 on the controllers, and I see that M110R28-02 is available, but I don't see a fix for this issue in the list of fixes:

• Fixed a timing problem where a super-capacitor failure was erroneously reported during start up.
• Fixed an issue where a power supply failure went undetected by the array. If a failure of the first power supply went undetected and the second power supply subsequently failed, the entire array abruptly powered off.
• Fixed an internal bus timing issue that sometimes led to NMI (non-maskable interrupt), ECC (error checking and correcting), or other controller timing-related failures.
• In sequential read/write environments under maximum load, this fix may cause a performance degradation, but performance will remain within all published specifications.
• Fixed a problem detecting the number of disk drive bays in an expansion enclosure.
• Fixed the issue that all drives spun up concurrently when a single power supply was in use. If only a single power supply is functioning, disk drives now spin up in a staggered manner. (Do not perform system updates or operate the enclosure for an extended period of time on one power supply; failed components must be replaced as soon as possible.)
• Fixed a situation where a SATA disk drive may be dropped from a vdisk during a firmware upgrade.
• Fixed a problem where user credentials were reset to default values when controller A was replaced.
• Fixed an issue where the vdisk number was mis-reported.
• Fixed an issue where a firmware upgrade did not error out when an invalid package was used.
• Fixed an issue that allowed a user to set the independent cache mode to an unsupported configuration.
• Fixed an issue that a reconstructing vdisk would report as completed when just beginning the rebuild.
• Fixed an issue where a failed disk drive might cause a controller to stall.
• Fixed an issue where correctable ECC errors caused a scrub failure.
• Fixed a problem where a scrub error was reported when, in fact, the background scrub was manually stopped.
• Fixed an issue where an offline vdisk might not be appropriately quarantined when an array is powered up.
• Fixed an issue where the array did not correctly report all supported persistent reservation types.
• Fixed an issue where deleting a vdisk failed.

"Fixed an issue where a failed disk drive might cause a controller to stall." is the closest one, but I wouldn't say that the controller stalled...

Anyway, I know that getting any form of assistance from HP WITHOUT having the latest firmware installed is impossible, so I might as well just install it :)

We'll see what happens the next time a drive fails (if ever)...

Thanks for your help :)
Prokopets
Respected Contributor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

Nicolai, if you update the firmware and it solves your problem, please post it here so that search engines can find this (possible) solution.
Nicolai Rasmussen
Regular Advisor

Re: MSA2324SA Raid-10 drive failure caused problem for overlaying fs

I will schedule the firmware update for our next maintenance window. I really dread doing these kinds of firmware upgrades, as it would kill our business if something went wrong...

I won't be able to tell if it has solved anything though. Not until another drive fails - which is hopefully never :)