How to survive EVA4000 enclosure failures?

 
Occasional Advisor

Hi all,

I'm a bit concerned regarding potential enclosure failures in our EVA4000 system.

Have any of you experienced problems with this, and is it common? I'm not talking about the active replaceable components such as PSUs and fans, but the passive box itself, the backplane and so on.

This leads to the obvious question: how can I minimize downtime in such an event without replication? Should I keep a spare enclosure, swap it in for the broken unit and move all the physical disks across, or something else?

Any ideas?

Running on 2C2D today.
8 REPLIES
Honored Contributor

Re: How to survive EVA4000 enclosure failures?

Hi,
For the EVA3000/4000/4100 this is a problem, because the maximum number of enclosures is 4 and an RSS needs at least 6 physical disks...
You need a 2C8D configuration to get a fully vertical layout; with that you can even lose a whole enclosure without production downtime.
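
The arithmetic behind this can be sketched roughly as follows. This is only an illustration, not anything the EVA firmware exposes: the function names are made up, the RSS size of 8 used for the 2C8D case is an assumption, and so is the idea that an RSS tolerates the loss of exactly one member.

```python
from math import ceil


def min_rss_members_per_enclosure(rss_size: int, enclosures: int) -> int:
    """Best case: fewest RSS members that must share the fullest enclosure."""
    return ceil(rss_size / enclosures)


def survives_enclosure_loss(rss_size: int, enclosures: int,
                            tolerated_losses: int = 1) -> bool:
    """True if losing one enclosure costs each RSS no more members than it
    can tolerate (tolerated_losses=1 is an assumption, see lead-in)."""
    return min_rss_members_per_enclosure(rss_size, enclosures) <= tolerated_losses


# 4 enclosures, minimum RSS of 6 disks: some enclosure holds at least 2 members.
print(min_rss_members_per_enclosure(6, 4), survives_enclosure_loss(6, 4))
# 8 enclosures, assumed RSS of 8 disks: one member per enclosure is possible.
print(min_rss_members_per_enclosure(8, 8), survives_enclosure_loss(8, 8))
```

With only four enclosures, at least two members of a six-disk RSS have to sit in the same enclosure, which is why the smaller models cannot ride out a whole-enclosure failure.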
the pain is one part of the reality
Honored Contributor

Re: How to survive EVA4000 enclosure failures?

An enclosure failure need not mean loss of data, but it obviously means that the disk group (DG) is inoperative and the data is not accessible.

Your idea of having a spare enclosure does not work, because in EVAs the disks are virtual and all the data is distributed across all physical disks, so a whole-enclosure failure (one enclosure holds up to 14 disks) means multiple disk failures in a disk group.
So the only good, safe and quick solution is remote replication (two EVA4000/4100s) with either CLX (automatic failover) or CA (manual failover).
the pain is one part of the reality
Honored Contributor

Re: How to survive EVA4000 enclosure failures?

Hi,

I've managed a few EVAs and don't recall ever seeing a complete enclosure failure.

Yes I've had failure of PSUs, I/O modules and the like, but never the actual enclosure.

The closest I've ever seen was a self-induced failure of a shelf in an HSG80 system, which uses essentially the same units, just SCSI rather than fibre. In that case I removed a failed PSU and didn't replace it straight away. After a few minutes the whole shelf shut down without the second PSU in place. Unsurprisingly, I never tested whether an EVA shelf would do the same, but I guess it would...

Hope this helps,

Regards,

Rob
Occasional Advisor

Re: How to survive EVA4000 enclosure failures?

Thanks for the quick replies.

Yes, the EVA has that PSU time thing.
You have x minutes before it shuts down to prevent overheating.

Having a spare "offline" enclosure does not prevent downtime, of course. What I meant was: will it work to just swap the enclosure and move the physical disks, PSUs and everything else over to get it up and running again (cold swap), or do the HSVs detect that the enclosure is not the same one and refuse to bring the disk groups online again? There is no intelligence in the box itself, right?

Honored Contributor
Solution

Re: How to survive EVA4000 enclosure failures?

Hi,

No, there's no intelligence in the enclosure at all, so having one as a cold swap would be a reasonable plan.

Cheers,

Rob

P.S. Don't forget to assign points.
Occasional Advisor

Re: How to survive EVA4000 enclosure failures?

Thanks, ordering one extra.

Points on the way.

/R
Honored Contributor

Re: How to survive EVA4000 enclosure failures?

> do the HSVs detect the enclosure not being the same
> and refuse to bring the groups online again?

Not at all - that would be a great design error - how could you ever deal with a defective disk drive enclosure?

On an older model, I once removed both controllers and replaced them with other modules, and everything went fine.

On a different system I have removed all disk drives and stored them away. Later I put them into yet another controller/disk enclosure setup and all data was still present.
Honored Contributor

Re: How to survive EVA4000 enclosure failures?

Hi,

Three things can potentially bring a shelf down after 7 minutes:

1: physical removal of a PSU - NOT the fact that it has failed [unless it is causing other problems as well]

2: BOTH fans fail

3: two out of three temp sensor groups go over temp
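
As a rough illustration of the three conditions above (not actual EVA firmware logic), they could be modelled like this. The `ShelfState` class, its field names and `shutdown_reasons` are all hypothetical; only the 7-minute figure and the three triggers come from this post.

```python
from dataclasses import dataclass
from typing import List

SHUTDOWN_DELAY_MINUTES = 7  # delay quoted in this post


@dataclass
class ShelfState:
    psus_installed: int = 2           # PSUs physically seated, healthy or not
    fans_failed: int = 0              # out of 2 fans
    sensor_groups_over_temp: int = 0  # out of 3 temperature sensor groups


def shutdown_reasons(state: ShelfState) -> List[str]:
    """Return the conditions that would start the shutdown timer."""
    reasons = []
    if state.psus_installed < 2:
        # A failed PSU that is still seated does not count, only removal.
        reasons.append("PSU physically removed")
    if state.fans_failed >= 2:
        reasons.append("both fans failed")
    if state.sensor_groups_over_temp >= 2:
        reasons.append("two of three temperature sensor groups over temperature")
    return reasons


print(shutdown_reasons(ShelfState(psus_installed=1)))
print(shutdown_reasons(ShelfState(fans_failed=1)))
```

In practice this means a failed PSU should stay in its slot until the replacement is on hand.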

Mark...
if you have nothing useful to say, say nothing...