HPE EVA Storage

EVA4400 Lun's Disappearing

 
edb_1
Advisor

Hi,

We recently purchased four EVA4400s, two of which are currently in production. In the last two days we have lost a LUN on each of the production EVAs. What appears to happen is this:

:- LUN disappears from the server (1 x 2008 SQL cluster and 1 x ESX cluster)
:- Command View performance is degraded; then, should you attempt anything in Command View, you lose control of the EVA
:- After an hour the EVA is contactable again, having rebooted the controller.

The root cause seems to be disk failure. However, we also have 2 x EVA5000s which have had disk failures, and we did not notice one bit. The 4400 seems to be completely unable to cope. Can anyone explain the reasons for this?

We are on XCS code 90000.40000.
13 REPLIES
Víctor Cespón
Honored Contributor

Re: EVA4400 Lun's Disappearing

Latest firmware is 09500000

A disk failure does not cause loss of access to vdisks on an EVA4400. The hardware is not worse than any other model.

You need to have the controller event log checked. I'm betting on "indicated virtual disk has transitioned to stalled too long" kind of messages.
edb_1
Advisor

Re: EVA4400 Lun's Disappearing

Yeah, we get the stalled error. However, where is 09500000 available from? All I can find is 09000.6000, nothing later...
Víctor Cespón
Honored Contributor

Re: EVA4400 Lun's Disappearing

You should get that log checked before making any changes; the most common cause for "stalled too long" is a bad SAN configuration.
Curt Fortenbery
New Member

Re: EVA4400 Lun's Disappearing

We too have seen this. Out of 5 EVAs, it has happened once on EVA #1 and EVA #3, and 4 times on EVA #5 (which is our busiest EVA by far). Each time, a different LUN is affected at first. Once we try to make any presentation changes to the EVA, the controller THEN becomes unavailable, and after an hour the controller resets. The controller affected is also (at the time) the "master" controller. Resetting the controller through the management module after it gets into its weird state does not work. We usually also get a predictive drive failure, or a definite drive failure, either before the problem or right after the controller comes back online. It also seems to have something to do with leveling at the time the controller starts experiencing issues.

We are also on firmware version 09004000.

Have you tried updating to 09006000, and if so, did the problem recur? I have been told by HP that there is no fix for this specific issue, as they cannot reproduce it in the lab. I have also been told that there is no way to force the controller to reset quicker, and that removing/reseating the controller while it is in that state is NOT the optimum solution.
edb_1
Advisor

Re: EVA4400 Lun's Disappearing

Hmm, interesting. We've upgraded to XCS 09501000 and it seems to have abated a bit. HP have been helpful by replacing drives which were "faulty", but cannot see any issues in the logs apart from the symptoms you described. We've done a code load on everything (drives, controllers, management unit) and it does seem to be more stable. We were getting loop failures on individual disks, which we think were fairly spurious. It does seem to have "recovered" a bit since the code load, but I'm a little wary of trusting it. That's a bit difficult when we need to move from an EVA 5000 to the 4400s.

I'm not really sure how we could have configured the SAN badly. You plug it in and create disk groups, that's about it...
edb_1
Advisor

Re: EVA4400 Lun's Disappearing

"A disk failure does not generate loss of access to vdisks on a EVA4400. The hardware is not worse than any other model."

It seems to on this model...

"You should get that log checked before doing any changes, the most common cause for "stalled too long" is a bad SAN configuration"

Not really sure how to configure it beyond turning it on and creating disk groups...

The whole point of the 4400 was that it is user-installed, user-configured and user-updated...
bmadhav
Advisor

Re: EVA4400 Lun's Disappearing

Hi,

Has this problem been fixed with the latest firmware? We too are facing the same issue with the EVA4400, and we are on XCS 90000.60000.

We also see the LUN disappear whenever disks are ungrouped or any disk goes into the 'about to fail' state. It becomes visible again once I reboot the controller or the entire storage system.

This EVA is connected to multiple ESX hosts, and we have multipathing enabled for load balancing at both the EVA and ESX levels.

Also, we never faced this issue on the EVA5000, which is still running with the same setup as the EVA4400. The only difference is that the EVA4400 uses FATA disks and the EVA5000 uses FC disks (but we have also had FC disk failures in the EVA5000, which didn't result in any LUN disappearing)...

Regards
Bindumadhava.
Uwe Zessin
Honored Contributor

Re: EVA4400 Lun's Disappearing

The EVA-5000 V1.0 had lots of problems with its I/O modules, too.
.
Kevin Beaumont
New Member

Re: EVA4400 Lun's Disappearing

All

Just so you know, here's the official HP support verdict on this one:

"CSM stands for Cell State Manager. This is essentially the process scheduler and monitor for the array software.

A CSM reset is caused when the CSM has discovered a hung process, and the process makes no progress (remains hung) for 60 minutes.

This is an expected behavior when there are hung processes within the array software.



The above events have been noticed with the XCS code 09004000 and 09006000 on the EVA 4400, and a formal fix of this issue has been included in XCS 09501100. A suggested action plan for a permanent resolution of the issue is given below.



Corrective Action Plan:

· The controller, if down, may be restarted using console input or by reseating it in the enclosure.

· Update the XCS Code to 09522000 and Command View to 9.0.1 to prevent the recurrence.

"
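
For anyone wanting to check a list of arrays against the statement quoted above, here is a minimal Python sketch. It is not an HP tool. It assumes XCS versions are written as 8-digit strings (as in "09004000") and that they order numerically, which matches how they are used in this thread but is not confirmed by HP. The names FIXED_IN, parse_xcs and predates_fix are made up for illustration, and versions written in other forms (for example with dots) are rejected rather than guessed at.

```python
# Hypothetical helper, not an HP utility.
# Assumption: XCS versions are 8-digit strings that compare numerically.

FIXED_IN = "09501100"  # version HP's quoted statement says includes the formal fix


def parse_xcs(version: str) -> int:
    """Convert an 8-digit XCS version string to an integer for comparison."""
    digits = version.strip()
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected an 8-digit XCS version, got {version!r}")
    return int(digits)


def predates_fix(version: str, fixed_in: str = FIXED_IN) -> bool:
    """Return True if the given version is older than the version with the fix."""
    return parse_xcs(version) < parse_xcs(fixed_in)


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for v in ("09004000", "09006000", "09501000", "09522000"):
        status = "predates the fix" if predates_fix(v) else "at or past the fix"
        print(v, status)
```

Whether a version that predates the fix will actually show the problem is a separate question. This only compares version numbers.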