

 
Sajeev2007
Frequent Advisor

EVA8000 Crashed !

We recently had all our systems lose their LUNs because the Controllers were hung.

The vendor claimed that the root cause was a failed disk that was still sitting in its slot and had not been removed in time.

This is an EVA8000.

6 REPLIES
Sajeev2007
Frequent Advisor

Re: EVA8000 Crashed !

Can this happen? Isn't the EVA fully redundant? How can one disk failure bring an entire array crashing down?
mmax
Valued Contributor

Re: EVA8000 Crashed !

Hi.
What is the configuration of your EVA? How many disk groups and disks does it have?
Víctor Cespón
Honored Contributor

Re: EVA8000 Crashed !

Rather than a failed disk, it would have been a failing disk. This can happen when a disk fails in a way that disrupts communication on the loops, and the EVA marks other disks as failed because it can't communicate with them.
This leads to several disks being marked failed at the same time, and the disk group becomes inoperable. To solve this, an HP engineer has to replace the failing disk and send a command to the controller to clear the DSL (Drive Suspect List).
With the new EVA 4100/6100/8100 models this is less likely to happen, because the I/O modules now do a better job of isolating each disk from the others.

Re: EVA8000 Crashed !

This can happen with a failing drive, as posted above, but it is extremely rare with current VCS code and disk firmware. An EVA8100 runs the new code out of the box, so a lot of people believe the new I/O modules are the main reason, but the increase in reliability is mainly due to the disk firmware upgrades. Granted, the new I/O modules help by further isolating the drives from one another, but the possibility of an FC-AL LIP storm still exists. This problem is not limited to the EVA; it affects any array that uses FC-AL hard drives. If a rogue drive initiates a massive number of LIPs, it can take down the entire array by delaying communication with the other drives. I have seen it in multiple vendors' arrays, not just HP's.
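To illustrate the LIP storm idea, here is a minimal sketch in Python of how one might flag a rogue drive from a list of LIP events. It is only an illustration: the event format, the function name `find_lip_storm_sources`, the one-second window and the threshold of 5 are all assumptions of mine, not anything the EVA controllers or VCS actually expose.

```python
from collections import defaultdict, deque


def find_lip_storm_sources(events, window_s=1.0, threshold=5):
    """Return the drives that issued more than `threshold` LIPs
    within any `window_s`-second window.

    `events` is a hypothetical list of (timestamp_seconds, drive_id)
    tuples sorted by timestamp; no real array log has this exact shape.
    """
    recent = defaultdict(deque)  # per-drive timestamps inside the window
    suspects = set()
    for ts, drive in events:
        window = recent[drive]
        window.append(ts)
        # Drop LIPs that are older than the sliding window.
        while window and ts - window[0] > window_s:
            window.popleft()
        if len(window) > threshold:
            suspects.add(drive)
    return suspects


# Made-up data: bay7 issues ten LIPs in under a second, bay1 only two.
events = sorted(
    [(0.1 * i, "bay7") for i in range(10)] + [(0.0, "bay1"), (5.0, "bay1")]
)
suspects = find_lip_storm_sources(events)
```

With this made-up data only "bay7" ends up in `suspects`, which is the kind of drive you would want pulled before it disturbs the rest of the loop.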
IBaltay
Honored Contributor

Re: EVA8000 Crashed !

Hi,
There can be many reasons for that:
a) A firmware bug (5031 was deactivated in March 07, 6000 in December 07). The latest supported versions are 6100/6110, and a new one is going to be released in 04-05/08.
b) The combination of the buggy, deactivated firmware and SAN switches configured incompatibly with the EVA (especially the aptpolicy, iod and dls settings).
c) In EVAs with 4 or fewer enclosures, the fact that the RSSs cannot be spread vertically.
the pain is one part of the reality
IBaltay
Honored Contributor

Re: EVA8000 Crashed !

Hi,
One additional note on this situation.
A precise analysis of the controller logs should be done to determine whether any of the known bugs in the deactivated firmware occurred, such as:
a) A controller stalled, waiting too long for the resync.
b) During the ungroup caused by the disk failure, there was critically little free space.
c) Especially in configurations with at most 4 enclosures, where the RSS members cannot each be isolated in their own enclosure (as with 8 enclosures), in some circumstances this can lead to the DG being set to the inoperable state.
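The arithmetic behind point c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an RSS of 8 members, spread as evenly as possible across the enclosures. The function name is hypothetical and this is not HP's actual placement algorithm.

```python
import math


def max_rss_members_per_enclosure(rss_size, enclosures):
    """Worst-case number of members of one RSS that share an enclosure,
    assuming the members are spread as evenly as possible (an assumption)."""
    return math.ceil(rss_size / enclosures)


# Assumed RSS size of 8 members, compared on 4 and on 8 enclosures.
with_4_enclosures = max_rss_members_per_enclosure(8, 4)
with_8_enclosures = max_rss_members_per_enclosure(8, 8)
print(with_4_enclosures, with_8_enclosures)
```

Under these assumptions, 8 enclosures allow one member per enclosure, while 4 enclosures force two members of the same RSS into one enclosure, so a problem in a single enclosure can hit the same RSS twice.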