EVA4400 message

Mauro Livi
Valued Contributor


Hi all,
Sorry if I get long-winded; I just want to give as much info as possible. Anyway, we've been operating a production EVA4400 since January 2009 and all is relatively stable (knocking wood).

However, periodically (once a month or so) I see a message in my Controller Event Log that reads "An HSV300 controller's operation is degraded". It then appears to "correct" itself and operates normally again.

We are running firmware version 09004000, and from what I've read this is a known firmware bug. When I called HP, they recommended upgrading to 09522000. OK, I understand that.

Here's my concern: I've read that recent firmware versions for the EVA4400 have been absolute nightmares for people. Version 0951xxx was an absolute disaster, and many revisions kept getting pulled because of serious bugs. It seems that 09522000 may be a little more stable, but it has been out for only a couple of months and I'm afraid not all of its bugs have been discovered yet. My point is that my EVA is a production environment, and other than the messages mentioned above it has been very stable (knocking wood).

I cannot afford to move to new firmware that may cause serious problems impacting production, so I'm wondering whether I'd be better off sticking with 09004000 a bit longer and seeing how 09522000 holds up over the next few months. Are there any other known risks with 09004000? I mean, if all I am getting is the message above periodically, I'd rather have that than something that could potentially cause downtime. Perhaps I'm being paranoid?
Any feedback would be greatly appreciated.

Thanks
Mauro

7 replies
Víctor Cespón
Honored Contributor

Re: EVA4400 message

Hi Mauro, you had better think about updating: XCS 09004000 will become inactive (out of support) on 31 October 2009.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01850026

The messages "An HSV300 controller's operation is degraded" are due to a misreading of the battery status. The battery charge is read as 0, and the next second it's read as enough for 190 hours.

The same can happen with the temperature, which can be read as 0º for a moment. Those are known algorithm timing bugs, solved in later firmware versions.

XCS 09522000 corrects a couple dozen bugs found up to August 2009. You can see the full list here:

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c01849456/c01849456.pdf

XCS 09522000 is being proactively deployed to all customers, with HP even assuming all the costs.
Mauro Livi
Valued Contributor

Re: EVA4400 message

Hi vcespon,
Thanks for your reply. I saw that about the support issue, but I should still be able to get support on my hardware. However, I do plan to upgrade and will schedule it with HP (I'm thinking closer to the end of the year, since I'll have to shut things down).

So other than the misreading of batteries/temperature, the message does not seem to represent any "real" hardware failure, correct?

I agree that 09522000 seems to correct many things; I just wondered whether there were any reports of critical issues with it at this time.

Thanks
Mauro
Víctor Cespón
Honored Contributor
Solution

Re: EVA4400 message

XCS 09522000 needs a full shutdown of the EVA and a careful inspection of the logs during the following hours. In several cases I/O modules did not reset correctly and we had to reseat them.

Regarding "hardware support": 70% of the issues on the EVA4400 are firmware bugs (leading to components being marked as failed for no real reason), 20% are disk failures (mainly 1 TB FATA), and only 5% are real hardware failures.
Uwe Zessin
Honored Contributor

Re: EVA4400 message

I thought you had to do a full power cycle of the whole system (controller + disk drive enclosures). Was the reseating of the wonderful I/O modules done because the power cycle was omitted?
Víctor Cespón
Honored Contributor

Re: EVA4400 message

Oops, I forgot this:

The remaining 5% are due to incorrect installation.

Incorrect cabling on the disk enclosures (this leads to the controller not booting and to unneeded controller replacements)

Incorrect zoning/switch parameter configuration on Continuous Access (this leads to replication being slow and the replication link being interrupted)
Víctor Cespón
Honored Contributor

Re: EVA4400 message

In response to Uwe above:

No. Even after power-cycling the whole EVA (twice), some I/O modules did not show their firmware version in Command View and kept breaking the loop (you see messages of the loop failing and recovering constantly).

In these cases, the I/O modules were removed and reinserted (with the EVA powered on), and the situation was corrected.

Same situation as this:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01728318
Mauro Livi
Valued Contributor

Re: EVA4400 message

Hi,
Well, I just talked to my HP field rep and actually got this upgrade scheduled on the books (closer to the end of the year, as I said).

Also, as I've mentioned, the messages discussed above are virtually the only problem I've had with the EVA (and it hasn't been much of a problem at that). So you can probably understand why I've been kind of paranoid about upgrading from such a stable environment.

However, my HP field rep will be doing the upgrade so I feel better about that.

Mauro