HPE EVA Storage

Resiliance on the EVA (losing more than 1 disk)

 
Simon P.
New Member


A colleague described a scenario in which he lost one HDD in his EVA5000 and, while he was waiting for the replacement part, a second HDD failed. This took his SAN down. How should you approach this so that you don't end up in the same situation?
Ivan Ferreira
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Always set the disk protection level to Double (spare capacity equal to 4 disks), and keep at least 10% of free space.

We have a mix of Vraid5 and Vraid1. We lost up to four disks at the same time (a disk firmware problem), and the storage and the vdisks remained up.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

There is no 100% safeguard against any possible double disk failure on the EVA. You would need to mirror the data against a second storage array.

Defining a protection level is good practice, because it makes sense to set disk space aside for the event of a disk failure. The EVA can use this space to restore redundancy immediately, but, again: whether data loss occurs depends on the location of the second failed disk.
.
SAKET_5
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hi Simon,

From the information you provide, it seems your colleague had configured just one "Disk Group" on the EVA without double sparing - and hence two simultaneous disk failures took the entire storage array down?

In my opinion, another way of looking at it is this: although configuring one large Disk Group on the EVA has the potential to improve performance (again, subject to the I/O profile of the application), it also has exactly the downside you describe. A failure of 2 disks (with a sparing value < 2) in an RSS brings the entire disk group down. This was one of the main reasons why we chose to configure 3 disk groups on each of our EVAs - to contain such unfortunate simultaneous disk failures to a single disk group rather than the entire storage array.

Hope it helps.

Regards,
Saket.
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

No matter how many disk groups you define and what 'protection level' you set: if the *wrong* pair of disks fails at the same time, you have data loss!

The 'protection level' only defines how much space is set aside for recovery - it has nothing to do with RAID and does not protect any data (bad & confusing naming).


That problem is not specific to EVA - all RAID implementations suffer from this if they can only tolerate the loss of a single disk.
.
SAKET_5
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hi Uwe,

Totally agreed, but my point was that having multiple disk groups as opposed to just one does give you *potentially* higher availability. Having said that, yes - it's a matter of which combination of disks fails. You could have simultaneous disk failures in multiple RSSs across multiple disk groups, and in that case you would still lose your entire EVA. But if 2 disks fail in the same RSS in one disk group on an EVA with more than one disk group, the impact would be the loss of that one disk group rather than your entire EVA.

Regards,
Saket.
Mahesh Kumar Malik
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hi Simon

It is recommended to set the disk protection level to at least 2 drive modules per bay if the bay is full.

Regards
Mahesh
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hello SAKET,
I was confused by this sentence:
> 2 disks failure (with sparing value < 2)
> in an RSS brings the entire disk group down.

I'm sure you agree that the 'protection level' has nothing to do with it.

--

Sorry, Mahesh, no offense intended, but you have not understood how disk protection works on the EVA (and neither have those who told you this). The setting is per disk group, the capacity is distributed over all disk drives within the group, and it has *nothing* to do with how populated a single disk drive enclosure is.

Following your arguments, how do you configure an EVA with 18 fully loaded disk drive enclosures? [extra points if you detect the error in this question ;-) ] Do you really intend to reserve 36 disks for sparing ?!
.
Donald Kok
Respected Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hi Uwe,
The extra points question: you can not have 18 enclosures in an EVA.
Greetzz
Donald
My systems are 100% Murphy Compliant. Guaranteed!!!
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Greetings, Donald. Unfortunately, the answer is wrong:

http://h18006.www1.hp.com/products/quickspecs/11006_div/11006_div.html
EVA 5000 - 2C18D

It means 1.5 cabinets ;-)

2 more tries left...
.
Jefferson Humber
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Uwe,

You can only have up to 240 HDDs per EVA controller pair. 18 enclosures x 14 HDDs would give you 252 spindles, which is too many. I assume that in the 2C12D + 0C6D config you can't fully populate them all?
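The arithmetic can be checked with a short sketch. The figures (14 bays per enclosure, 240 drives maximum) are the ones quoted in this post; the constant and function names are made up for illustration.

```python
# Figures quoted in the post above; the names are illustrative only.
BAYS_PER_ENCLOSURE = 14
MAX_DRIVES_PER_CONTROLLER_PAIR = 240


def bays_left_empty(enclosures: int) -> int:
    """Return how many bays must stay empty to respect the drive limit."""
    total_bays = enclosures * BAYS_PER_ENCLOSURE
    return max(0, total_bays - MAX_DRIVES_PER_CONTROLLER_PAIR)


# 18 enclosures give 18 * 14 = 252 bays, so 12 of them cannot be populated.
print(bays_left_empty(18))
```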

Do I get the prize ?

Jeff
I like a clean bowl & Never go with the zero
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

Hi Jeff, the prize (fame for being able to decipher my cryptographic responses ;-) is yours!!

Indeed I wrote "with 18 fully loaded", but then realized that one cannot load all, because the loops would run out of AL_PAs.
.
Jefferson Humber
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

I hope that question comes up in my HP0-690 exam on Friday..... At least I know I'll get one right now. ;-)
I like a clean bowl & Never go with the zero

Re: Resiliance on the EVA (losing more than 1 disk)

Simon, the best explanation I have found of EVA redundancy concepts such as disk groups, RSSs, RAID and protection levels is in the "EVA Configuration Practices - White Paper" located at:
http://h200005.www2.hp.com/bc/docs/support/SupportManual/lpg29448/lpg29448.pdf
This document is only 22 pages long, and it's the most concise yet thorough document you'll find on the subject. It will help you understand how to optimize your configuration to suit your needs.
We are powerful creations of Love. To Love we are destined, and in this reality lies our joy.

Re: Resiliance on the EVA (losing more than 1 disk)

Simon, one short follow-up. Disk groups are divided into RSSs (Redundant Storage Sets). If a disk group contains a large number of disks, it is subdivided into RSSs of approximately 8 disks each, so a disk group of 64 disks would probably consist of 8 RSSs. Each RSS can tolerate the loss of a single disk at any given time without causing data loss to vdisks with a RAID level of 1 or higher (Vraid1 or Vraid5). (RAID 0 can never tolerate a disk failure - never.) Each RSS functions as a subgroup (with RAID protection) of the larger disk group.

Any time a second disk failure occurs within a SINGLE RSS before the first failed disk's data has been fully reconstructed onto spare (free) disk space, the entire disk group is lost, not just the offending RSS. This potential exists no matter how the "Disk Protection Level" is set. On the other hand, a disk group of 64 disks with 8 RSSs could tolerate as many as 8 simultaneous disk failures without any loss of data, as long as only one disk failure occurs in each RSS; again, regardless of the "Disk Protection Level".

As Uwe said, the "Disk Protection Level" has no direct effect on the number of simultaneous disk failures a disk group can tolerate. It only determines the amount of spare disk space that is held in reserve for fully reconstructing data from failed disks. I hope this helps.
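The rule described above can be sketched as a toy model. This is only an illustration of the rule as stated in this post, not of how the EVA firmware works: it assumes RSSs are filled in simple index order (the real array assigns RSS membership itself), and the function names are hypothetical. The protection level is deliberately not a parameter, since it does not change the outcome.

```python
from collections import Counter


def build_rss_map(disk_count: int, rss_size: int = 8) -> dict[int, int]:
    """Map each disk index to an RSS index (assumption: filled in index order)."""
    return {disk: disk // rss_size for disk in range(disk_count)}


def redundancy_survives(failed_disks: set[int], rss_map: dict[int, int]) -> bool:
    """True if no RSS contains more than one simultaneously failed disk."""
    failures_per_rss = Counter(rss_map[disk] for disk in failed_disks)
    return all(count <= 1 for count in failures_per_rss.values())


rss_map = build_rss_map(64)  # 64 disks, roughly 8 RSSs of 8 disks each

# Eight simultaneous failures, one in each RSS: redundancy holds.
print(redundancy_survives({0, 8, 16, 24, 32, 40, 48, 56}, rss_map))

# Only two failures, but both in the same RSS: the disk group is at risk.
print(redundancy_survives({0, 1}, rss_map))
```

The model only covers failures that overlap in time. Once a failed disk's data has been rebuilt into free space, that disk no longer counts as failed for this check.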
We are powerful creations of Love. To Love we are destined, and in this reality lies our joy.
Jefferson Humber
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

The more I read this thread, the more confused I become. Is the following correct?


64 HDD in 1 disk group - 8 RSS's (estimate)

You can lose 1 HDD per RSS and the Vdisks will be OK (assuming Vraid0 is not used), with data redundancy achieved through the Vraid characteristics of the Vdisks?

However, you can never lose 2 HDDs simultaneously from the same RSS?

The protection level (None, Single, Double) is simply an amount of disk space assigned to rebuild the failed HDD into. Once reconstruction is complete another failure could occur (assuming Double was chosen) ?
I like a clean bowl & Never go with the zero
Simon P.
New Member

Re: Resiliance on the EVA (losing more than 1 disk)

Before I close this thread, I just wanted to thank everyone for the knowledge and support they have provided. Where applicable, scoring has been applied.

Thanks
Simon P.
New Member

Re: Resiliance on the EVA (losing more than 1 disk)

Thanks for all the information provided.
Uwe Zessin
Honored Contributor

Re: Resiliance on the EVA (losing more than 1 disk)

The redundancy (VRAID-1, VRAID-5) is stored _within_ the RSS. Using VRAID-1, for example, the data is stored on a pair of disks within an RSS. If you lose such a pair, you have lost user data. Using VRAID-5 (4D+1P), the data is 'multiplexed' in the RSS. If you lose 2 disks out of this chunk, you have lost user data as well.
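To get a feel for how much the location of the second failed disk matters, here is a rough sketch. It assumes the second failure hits a uniformly random surviving disk, independent of the first (correlated failures such as the firmware problem mentioned earlier in the thread break this assumption), and it uses the 64-disk group with 8-disk RSSs from the earlier example. The function names are hypothetical.

```python
from fractions import Fraction


def same_rss_probability(group_disks: int, rss_size: int) -> Fraction:
    """Chance that a second random failure lands in the first failure's RSS."""
    return Fraction(rss_size - 1, group_disks - 1)


def mirror_partner_probability(group_disks: int) -> Fraction:
    """Chance that a second random failure hits one specific partner disk."""
    return Fraction(1, group_disks - 1)


# 64 disks in the group, 8 disks per RSS (assumed layout).
print(same_rss_probability(64, 8))     # 7 of the 63 surviving disks
print(mirror_partner_probability(64))  # 1 of the 63 surviving disks
```

These numbers only apply while the first failure has not yet been rebuilt, which is why restoring redundancy quickly matters.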

You can safely run an EVA with protection level None (0). As long as you have enough unconfigured capacity it will draw from this to restore the redundancy.
.

Re: Resiliance on the EVA (losing more than 1 disk)

Jefferson Humber,

You are correct on all counts. Again, the EVA Configuration Best Practices Document is the best place to read more on this subject.
We are powerful creations of Love. To Love we are destined, and in this reality lies our joy.