
Re: EVA Releveling

 
Bill Mace
Occasional Advisor

EVA Releveling

When a disk fails, the EVA relevels and redistributes the data across the remaining disks. In a frame with 168 disks in one group, configured for maximum performance and 4 disks for double sparing (protection level), the rebuild is taking multiple days. The EVAs are running at 95%+ occupancy. The latest Best Practice document states that only 5 GB of free space is required for maintenance. Free space has nothing to do with the rebuild process, other than there being less to rebuild. Host I/O hitting the EVA while the array is rebuilding slows the process down. Is there anything other than limiting host I/O that will speed this process up? I cannot find any documentation on how the rebuild process works. I am concerned that when an EVA is at 98% and is rebuilding, another failure will cause it to corrupt data.
30 REPLIES
Uwe Zessin
Honored Contributor
Solution

Re: EVA Releveling

A disk failure is followed by two tasks:

- the recovery
it does the minimum work to restore redundancy

- the leveling
it makes sure that user data is equally distributed over the disks


If another disk fails during recovery, it depends on the 'location' of the disk if data loss is the result or not (same as a traditional RAID system). The EVA divides its disk groups into different failure domains called RSS (Redundant Storage Set). It can tolerate the loss of multiple disks as long as each disk belonged to a different RSS.
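
To illustrate the RSS rule above, here is a minimal sketch in Python. It is only a toy model, not how the EVA firmware decides anything: the function names, the disk names, and the layout of 8 disks per RSS are assumptions made up for the example, and it ignores the Vraid level, the failure type, and whether a previous reconstruction has already completed.

```python
from collections import Counter

def rss_with_multiple_failures(disk_to_rss, failed_disks):
    """Return the ids of every RSS that contains more than one failed disk."""
    counts = Counter(disk_to_rss[disk] for disk in failed_disks)
    return sorted(rss for rss, n in counts.items() if n > 1)

def redundancy_survives(disk_to_rss, failed_disks):
    """True if no RSS has lost more than one disk (toy model of the rule)."""
    return not rss_with_multiple_failures(disk_to_rss, failed_disks)

# Hypothetical layout: 16 disks, 8 disks per RSS (illustrative numbers only).
layout = {f"disk{i:02d}": i // 8 for i in range(16)}

# Two failures in different RSSs versus two failures in the same RSS.
print(redundancy_survives(layout, {"disk00", "disk08"}))
print(redundancy_survives(layout, {"disk00", "disk01"}))
```

In this model the first case (one failed disk in each of two RSSs) is survivable, while the second case (two failed disks in the same RSS) is the one that puts data at risk.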
Bill Mace
Occasional Advisor

Re: EVA Releveling

Uwe,
I totally agree, but the EVA is at risk during the reconstruct of the RSSs. Once the RSS reconstruct is complete, the EVA can handle another failure and restart a new releveling process. I understand that it reconstructs the RSS, then relevels Vraid5, then relevels Vraid1. The only time that is critical is during the reconstruct of the RSSs. Outside of suspending I/O (not possible in production), is there any other way to shorten the releveling time while still using a 168-disk group? The EVAs are all running at 98%+ occupancy.
Thanks, Bill
Uwe Zessin
Honored Contributor

Re: EVA Releveling

Bill,

this is no different from any other storage array that is not running some kind of ADG/RAID-6/RAID-5DP or whatever the vendor calls its implementation of 'double protection'. If too many disks fail, the data is gone :-(

I am not aware that there is any way to 'tune' the leveling process, make it faster, slower, suspend it...
Mario_66
Valued Contributor

Re: EVA Releveling

Hi,

can you clarify your first statement: Rebuild takes multiple days?

What do you mean by that? Reconstruction or migration with or without leveling?

Can you point me to the Best Practice document that you mentioned here?

If I understand you correctly, the most problematic situation would be the following (Vraid0 is problematic by design ;o):

Vraid5, one failed disk. During the reconstruction phase, another disk in the same RSS fails. But that does not necessarily mean that you lose your data. It depends on the failure type.

I would say that this is not an EVA-specific issue. It is a limitation of RAID5 by design, if we are talking about RAID5 by definition.

BTW, I would not rule out that more free space speeds up the rebuild process.

Regards,
M.

David Ell
Advisor

Re: EVA Releveling

I am having the same problem... The question here is not one of redundancy but of the time it takes to relevel an entire disk group.
I have pinged HP on this and was told there is no way to "prioritize" the releveling process. What makes it worse is that the more host I/O there is, the longer leveling takes....
Mario_66
Valued Contributor

Re: EVA Releveling

Hi,

that's true, but what is the problem if the leveling runs longer than expected?

I am not aware of any impact on data availability or data integrity. Leveling is an optimization process, and most storage arrays have some kind of it.

Maybe I missed something?

Regards,
M.
David Ell
Advisor

Re: EVA Releveling

It is the management overhead... Best practices state that you should wait for leveling to complete before replacing a drive. In addition, if you are experiencing drive failures often, you would never "catch up".
Mario_66
Valued Contributor

Re: EVA Releveling

Hi,

"The Best Practice" says that you should wait for the reconstruction to complete, not the leveling.

http://h200005.www2.hp.com/bc/docs/support/SupportManual/lpg29448/lpg29448.pdf, page 8

Regards,
Mario.
David Ell
Advisor

Re: EVA Releveling

Our Gold TAM recommends waiting for leveling to complete. This is interesting though