Power supply failure takes out entire SAN (MSA 2312i)

 
Daniel Kleeman
Advisor

Our fully redundant MSA 2312i SAN suffered a catastrophic failure last night, triggered by a single PSU failure. No host traffic was possible at all. After many failed attempts to restart the two controllers through the web and command-line interfaces, a cold reboot recovered service (the PSU remains failed).

Surely this SAN is designed to keep working in this situation? Why can one FRU take down a "fully redundant" box? This type of service outage is intolerable.

Does anyone else have similar experiences?
TTr
Honored Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

I have had several similar experiences with disk drives. In some cases, when a disk failed it took down the entire bus (enclosure) and none of the other disks were visible.

I think it depends on how the PS failed. It may have failed in such a way that it left the power connector in a state that resulted in a full short, which cut power to the entire storage unit.
Daniel Kleeman
Advisor

Re: Power supply failure takes out entire SAN (MSA 2312i)

The storage unit did not short out the whole enclosure. There was no reboot of the storage or management controllers. However, they went into a very strange mode where they refused to shut down or pass traffic. Not good.
Patrick Terlisten
Honored Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

Hello Daniel,

Save a support log bundle and log a call with HP. They can do a technical deep-dive analysis for you.

Sure, a single PS shouldn't bring down the whole box, but I have seen even bigger storage systems die because of much cheaper parts (an IBM Shark was killed by a 10 € battery...).

Regards,
Patrick
Daniel Kleeman
Advisor

Re: Power supply failure takes out entire SAN (MSA 2312i)

Thank you Patrick. I have raised a case with HP and I am also interested in similar community experiences.
Uwe Zessin
Honored Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

I've seen a power problem on an MSA2312fc this week, too. The system does not start with both power supplies installed. It starts with one of them, but then it turns off rather quickly. According to the console log, the controller was just uncompressing its image, so there is no way to dive into any logfiles :-(
Cass Witkowski
Trusted Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

In the good old days, a power supply's rectifier could fail so that instead of 5 volts DC you got 5 volts with 1 volt of ripple. Because the power supply bus is shared, one power supply could cause problems for everything on it.

So fully redundant is not completely redundant, just as Uninterruptible Power Supplies are really Interruptible Power Supplies.

Patrick Terlisten
Honored Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

Hello Uwe,

Strange... some kind of fear is rising in me; I've got several MSA 2000 G1 and G2 units at customers. :( Fortunately there have been no problems so far, except for some disks.

Regards,
Patrick
Daniel Kleeman
Advisor

Re: Power supply failure takes out entire SAN (MSA 2312i)

After further investigation, it appears that the first failure was a controller failure, followed some time later by a power supply failure.

This still does not explain why the SAN stopped serving requests. HP are looking into it.
Dejan Savic
Occasional Contributor

Re: Power supply failure takes out entire SAN (MSA 2312i)

Unfortunately, I just had exactly the same experience. Controller B seemed to be defective, but it turned out that controller B and the right PSU died at the same time (one probably caused the other), and a few hours later the left PSU died too. Worse, both PSUs died in such a way that they kept resetting the entire box every few seconds, which ultimately destroyed the vdisks and caused the loss of almost all data. Daniel, have you ever received any explanation from HP? The HP technician who came on site mentioned "that there might be a manufacturing/design flaw with those PSU's". I believe he is right, because HP replaced the faulty PSUs with a completely different type of PSU. Significantly, they kept the same PN on the new PSUs, as if the old ones had never been in production. I've also found another unfortunate admin who had the same failure on an MSA 2312i:
http://community.spiceworks.com/topic/111921