HPE EVA Storage

MSA2012i: crashes all the time and is unavailable

 
piet_10
Occasional Advisor

Hi,

My MSA crashes all the time. After a restart I can connect to it for a few minutes, and then it becomes unavailable again.

There are a few events that I am concerned about:

1. Warning: Killed partner controller; reason=5 (Other not present)

2. Critical: FRU type: A/C PSU, Right, problem: encl 0. ..

3. Warning: Drive enclosure event: Critical, enclosure 0, power supply 1, power supply status Under voltage, DC failure, HP SPS Chassis 3210

Message 3 in particular tells me there is something wrong with a power supply. The "Under voltage" warning looks as if something is wrong with the voltage delivered to the MSA. I have swapped the cables and taken power from another rack, but that made no difference.

Does anyone have an idea what to do about this?
Do I have to replace the power supply?

The power supply message also comes from a notification of RAID Controller A. Could there be something wrong with the RAID controller?

If so, where is this controller located? Inside the MSA controller?
Johan Guldmyr
Honored Contributor

Re: MSA2012i: crashes all the time and is unavailable

Hi, piet.

See the quickspecs here:

http://h18000.www1.hp.com/products/quickspecs/13187_emea/13187_emea.HTML#Configuration Information

The controllers are the two horizontal parts in the middle of the enclosure. Do you have dual controllers?

One thing that's always important with these MSA2000 is the firmware - are you up to date?

They can be downloaded from here:
http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?prodNameId=3687132&lang=en&cc=us&taskId=135&prodClassId=-1&prodTypeId=12169&prodSeriesId=3687128

The store.logs might be interesting to look at. This document describes how to capture them:

http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&objectID=c01907232

One thing you could try is running the system without the PSU it is complaining about, or maybe without the controller it is complaining about. I'd start with the firmware though.

As for error 1, it looks like one controller is disappearing for some reason. It could be firmware, a PSU problem (quite likely given errors 2 and 3), or maybe some kind of midplane issue.
piet_10
Occasional Advisor

Re: MSA2012i: crashes all the time and is unavailable

Hi,

Thank you for the information.
The MSA is running with only 1 controller.
I updated to the latest firmware about 2 months ago, and I checked the link, which doesn't have a newer version available.
Good idea to remove the PSU that is causing problems. I will do that now and see if the MSA stays up and running. If it does, I will have to replace the PSU.

I will also check your link on how to capture the store.logs.
I will post an update when that is done.
piet_10
Occasional Advisor

Re: MSA2012i: crashes all the time and is unavailable

The problem PSU has been removed and the MSA keeps on running now.

I have generated the store.logs file. To which email address can I send it for analysis?
Johan Guldmyr
Honored Contributor

Re: MSA2012i: crashes all the time and is unavailable

Too big to attach to this thread?
piet_10
Occasional Advisor

Re: MSA2012i: crashes all the time and is unavailable

No, not a problem at all.

Here it is.
Johan Guldmyr
Honored Contributor

Re: MSA2012i: crashes all the time and is unavailable

Looks like a PSU error to me.

If the MSA keeps on running fine without it, I would suggest replacing that PSU.

To be even more sure you could of course swap the power supplies and see if the problem is then seen in the other PSU slot.


A753 2011-05-25 07:08:05 168 W A Drive enclosure event: Critical, enclosure 0 WWN 500C0FF0D88F633C, power supply 1, power supply status Under voltage, DC failure, HP SPS Chassis 3210
A754 2011-05-25 07:08:05 314 C A FRU type: A/C PSU, Right, problem: encl 0. Product ID: 481320-001, S/N: 3CL930M493 rev: E1. Related event ID: 753, type: 168
A760 2011-05-25 07:14:42 84 W A Killed partner controller; reason=5 (Other not present)
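If you want to pick lines like these out of a larger store.logs dump, here is a minimal Python sketch. It assumes the column layout that the three lines above appear to follow: a controller letter fused with an event serial number, a timestamp, an event code, a one-letter severity, the reporting controller, and the message. That layout is inferred only from these three lines (event 754 refers back to "event ID: 753, type: 168"), not taken from HP documentation, and the names `Event`, `parse_line` and `related_serial` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the three lines quoted in this thread:
# <ctrl letter><serial> <date> <time> <event code> <severity> <controller> <message>
LINE_RE = re.compile(
    r"^[AB](?P<serial>\d+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>\d+)\s+"
    r"(?P<severity>[A-Z])\s+"
    r"(?P<controller>[AB])\s+"
    r"(?P<message>.*)$"
)
RELATED_RE = re.compile(r"Related event ID:\s*(\d+)")


class Event(NamedTuple):
    serial: int       # e.g. 753 from "A753"
    timestamp: str
    code: int         # e.g. 168
    severity: str     # single letter as logged, e.g. "W" or "C"
    controller: str
    message: str


def parse_line(line: str) -> Optional[Event]:
    """Return an Event, or None if the line does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Event(int(m["serial"]), m["timestamp"], int(m["code"]),
                 m["severity"], m["controller"], m["message"])


def related_serial(event: Event) -> Optional[int]:
    """Return the serial a message points back to, if it names one."""
    m = RELATED_RE.search(event.message)
    return int(m.group(1)) if m else None


SAMPLE = """\
A753 2011-05-25 07:08:05 168 W A Drive enclosure event: Critical, enclosure 0 WWN 500C0FF0D88F633C, power supply 1, power supply status Under voltage, DC failure, HP SPS Chassis 3210
A754 2011-05-25 07:08:05 314 C A FRU type: A/C PSU, Right, problem: encl 0. Product ID: 481320-001, S/N: 3CL930M493 rev: E1. Related event ID: 753, type: 168
A760 2011-05-25 07:14:42 84 W A Killed partner controller; reason=5 (Other not present)
"""

events = [e for e in map(parse_line, SAMPLE.splitlines()) if e is not None]
by_serial = {e.serial: e for e in events}

# Show which events point back at an earlier one (here: the FRU event at the PSU event).
for e in events:
    ref = related_serial(e)
    if ref in by_serial:
        print(f"event {e.serial} (severity {e.severity}) refers to event {ref}: "
              f"{by_serial[ref].message[:60]}")
```

Lines that do not match the assumed layout are skipped rather than guessed at, so it is safe to feed it the whole file.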