
Re: How to view RSS on EVA! + Expansion

 
IBaltay
Honored Contributor


small errata in my wording:

I'll try to give you an example for RAID1:

1. Single disk failure (rebuild)
If one disk carrying RAID1 stripes of the RSS fails, it is only a single-disk failure of the RAID1 mirror set. All of its mirror stripes are replicated to other RSS members and mirrored to adjacent RSS members in new locations, so that a complete mirrored set exists again.

2. Double disk failure (rebuild)
If one disk of the RSS carrying RAID1 stripes fails and, at the same time, a second disk in a different RSS fails that does not hold any of the mirror stripes of the first failed disk, then the rebuild starts as described in variant 1.

3. Double disk failure (data loss)
If one disk of the RSS carrying RAID1 stripes fails and, at the same time, a second disk in a different RSS that holds the "secondary" mirror stripes also fails, then it is a double disk failure for the whole disk group.
the pain is one part of the reality
sam bell
Regular Advisor

Re: How to view RSS on EVA! + Expansion

@IBaltay: If I understood your corrections properly, you mean that the mirrors of a RAID 1 disk reside in different RSSes. Are you sure about that?

My understanding was that an RSS forms a complete RAID protection domain, as stated by Ulrich Zessin in the following thread:

--snip--
A disk group is divided into multiple RSSes (Redundant Storage Sets). Each RSS forms a complete RAID protection domain.

Disks are 'married' to pairs to define the VRAID-1 mirror members and both members are always contained in the same RSS. If one of the members fails, yes, the EVA picks up the data of the remaining member and stores it on a different member pair. Then it releases the data from the 'widow'.
--snap--

http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1257422203270+28353475&threadId=1111467

UselessUser
Frequent Advisor

Re: How to view RSS on EVA! + Expansion

WOW...

I was not expecting so many replies so quickly...

Reading other forum messages I am 99% sure of the following:

Single level protection

--Can not lose more than 1 disk in an RSS if using VRAID 5 full stop.

--Can lose more than 1 disk in an RSS if using VRAID 1 as long as they are not from the same pair.

Double level protection

--Can not lose more than 1 disk in an RSS if using VRAID 5 full stop.

--Can lose more than 1 disk in an RSS if using VRAID 1 as long as they are not from the same pair.

--"Protection Level" is just spare space that allows the EVA to rebuild the redundancy set somewhere TEMPORARILY until the physical disk is replaced. I assume that if I had a maxed-out EVA with no protection level set and lost a disk in a VRAID 5 or a mirror pair member in VRAID 1, the EVA would continue to function, but the RAID controller would have to calculate the data "on the fly" as opposed to simply reading it off a disk?

--Single protection provides enough space on the EVA to rebuild the data of 2 failed disks somewhere else

--Double protection provides enough space on the EVA to rebuild the data of 4 failed disks somewhere else

The reserve comes in multiples of 2 and 4 because of VRAID 1: a failure is fixed by moving the data off the remaining disk of the pair.

Bottom line - level of protection does not influence data protection, just availability of data!

So the recommendation of double for groups larger than 100 disks etc. is simply a mathematical calculation: the probability of multiple disk failures is significantly higher in a group of 100 disks than in, say, a group of 32.

And as nobody has corrected me otherwise regarding the whole shelf redundancy issue I am sticking to VRAID 1 and hoping RSS has separated each disk pair out onto separate shelves.

I am hoping Uwe will respond at some point!
IBaltay
Honored Contributor

Re: How to view RSS on EVA! + Expansion

Sorry, another erratum in my wording; it is quite hard to explain in words (hopefully this will be better :-)):

2. small errata in my wording:

I'll try to give you an example for RAID1:

1. Single disk failure (rebuild)
If one disk carrying RAID1 stripes of the RSS fails, it is only a single-disk failure of the RAID1 mirror set. All of its mirror stripes are replicated to other RSS members and mirrored to adjacent RSS members in new locations, so that a complete mirrored set exists again.

2. Double disk failure (rebuild)
If one disk of the RSS carrying RAID1 stripes fails and, at the same time, a second disk in the same RSS fails that does not hold any of the mirror stripes of the first failed disk, then the rebuild starts as described in variant 1.

3. Double disk failure (data loss)
If one disk of the RSS carrying RAID1 stripes fails and, at the same time, a second disk in the same RSS that holds the "secondary" mirror stripes also fails, then it is a double disk failure for the whole disk group.
the pain is one part of the reality
sam bell
Regular Advisor

Re: How to view RSS on EVA! + Expansion

Hi,

> Single level protection
> --Can not lose more than 1 disk in an
> RSS if using VRAID 5 full stop.
>
> Double level protection
> --Can not lose more than 1 disk in an
> RSS if using VRAID 5 full stop.

What do you mean by "full stop" with regard to Vraid5? Regarding the failure of disks, please note that even with Double Level protection you cannot lose more than one disk simultaneously without losing all your Vraid 5 LUNs.

As I understand it, with Single Level Protection you can lose one disk per RSS simultaneously without affecting any Vraid5 LUNs. Once the failed disk has been reconstructed to the free space or to the space reserved for the Protection Level, another disk from the same RSS as the first failed disk can fail. That disk, however, cannot be reconstructed as long as there is no free space available (either available space or space reserved for the Protection Level).

> --Single protection provides enough space
> on the EVA to rebuild 2 failed disk's data
> somewhere else
> --Double protection provides enough space
> on the EVA to rebuild 4 failed disk's data
> somewhere else

I think Single protection provides only enough space for one failed disk and Double protection for two failed disks. This is because if you are using Vraid1 in your disk group, each disk in that group contains Vraid1 information (as the EVA stripes all Vdisks across all disks). And if I understood correctly, the EVA always reconstructs *both* the disk that contains the Vraid1 data *and* the disk that contains its mirror!

I am not sure what happens if you are only using Vraid5 in your disk group - maybe you can reconstruct two failed disks even with a Single protection level?

UselessUser
Frequent Advisor

Re: How to view RSS on EVA! + Expansion

Hi,

OK I think we are getting somewhere...

Taking this:

As I understand it, with Single Level Protection you can lose one disk per RSS simultaneously without affecting any Vraid5 LUNs. Once the failed disk has been reconstructed to the free space or to the space reserved for the Protection Level, another disk from the same RSS as the first failed disk can fail. That disk, however, cannot be reconstructed as long as there is no free space available (either available space or space reserved for the Protection Level).

I would tend to agree with this statement. When a disk fails, the EVA spreads the recovered data across all disks in the disk group, so my scenario of a single disk from the original RSS containing both data and parity chunks from the same stripe would never happen. This would also match the two stages of recovery which I believe the EVA performs: first it rebuilds the data within the RSS, and once this is done it performs levelling, which disperses the data equally to the other RSSes.

The second statement:

I think Single protection provides only enough space for one failed disk and Double protection for two failed disks. This is because if you are using Vraid1 in your disk group, each disk in that group contains Vraid1 information (as the EVA stripes all Vdisks across all disks). And if I understood correctly, the EVA always reconstructs *both* the disk that contains the Vraid1 data *and* the disk that contains its mirror!

I also agree with this, but I don't think I wrote it well. Technically speaking, as the EVA reserves 2 x the largest disk size in single mode, it can protect 2 disks. However, with the EVA you never get to use a single disk on its own; it's always in a VRAID, and as you pointed out, the failure of a disk with VRAID 1 would move both copies onto the space, so it's only protection for 1.

The only reason I doubt this is this:

As an example, a Protection Level of 1 provides continued operation in the event of two disk failures, assuming the restore from the first failure completes before the second disk fails.

But this quote does not explain what kind of operation this is... and I think this ties in with what you are saying: a protection level of single provides enough space for 1 disk to fail and be recovered within the EVA. If a second disk were to fail after the rebuild, the EVA would still be able to operate, as enough data would exist on the remaining disks for the EVA controllers to work out the missing data chunks on the fly.
sam bell
Regular Advisor

Re: How to view RSS on EVA! + Expansion

> The only reason I doubt this is this:
>
> As an example, a Protection Level of 1
> provides continued operation in the event
> of two disk failures, assuming the restore
> from the first failure completes before
> the second disk fails.

Hmm, I currently cannot see where this contradicts what I have written.

BTW - what do you think about the example I posted a few posts above? Would you see it the same way?

Given on an EVA4400:

* 24 disks
* 3 RSS with 8 disks each
* LUNs: 3x Vraid 5, 2x Vraid1, 2x Vraid6
* Protection Level 1

If I understand correctly, then:

- I can lose one disk in each RSS without affecting any of my Vdisks. I can lose a second disk in one of those RSSes if the disk that failed first has been reconstructed to the free/protection level reserved space.

- If two disks in one RSS fail simultaneously, I'll definitely lose *all* my Vraid5 disks (since the Vdisks are striped over the whole disk group) and possibly also all of my Vraid 1 LUNs (only if the failed disks were married as one pair). So I think this really means that with real bad luck you could lose all your data just because of two simultaneous disk failures in one RSS (when only using Vraid5 disks, for example).

- The Vraid 6 LUNs can withstand a simultaneous failure of two disks in each RSS, which means a total of six drives (two per RSS).

- The Vraid 1 LUNs can withstand a simultaneous failure of four disks in each RSS as long as no married pair is affected, which means a total of 12 drives (very unlikely but theoretically possible).
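
The arithmetic behind those totals can be sketched in a few lines of Python. This only encodes my understanding as written above (one tolerated failure per RSS for Vraid5, two for Vraid6, one disk of each married pair for Vraid1); the function name and the rule table are made up for illustration and are not anything the EVA exposes.

```python
def max_simultaneous_failures(rss_sizes, vraid):
    """Best-case number of disks that can fail at once without data loss.

    Assumption (from the post, not from HP documentation): the limit is
    counted per RSS and the failures are spread as favourably as possible.
    """
    per_rss = {
        "vraid5": lambda n: 1,       # one failure per RSS
        "vraid6": lambda n: 2,       # two failures per RSS
        "vraid1": lambda n: n // 2,  # one disk out of every married pair
    }[vraid]
    return sum(per_rss(n) for n in rss_sizes)


# The EVA4400 example: 24 disks in 3 RSS of 8 disks each.
rss_sizes = [8, 8, 8]
for level in ("vraid5", "vraid6", "vraid1"):
    print(level, max_simultaneous_failures(rss_sizes, level))
# prints: vraid5 3, vraid6 6, vraid1 12 (one per line)
```

These are upper bounds for the luckiest possible distribution of failures; two failures in the wrong place are already enough to lose Vraid5 data.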
Víctor Cespón
Honored Contributor

Re: How to view RSS on EVA! + Expansion

Let me try to explain it in short sentences:

1 disk failed = No problem at all
2 disks failed, different RSS = No problem at all
2 disks failed, same RSS = RAID 5 fails, RAID 1 depends on whether the disks had the same information

The parameter "disk failure protection" confuses many people. It does NOT add any further protection against disk failures. 2 disks failed on the same RSS will invalidate all RAID 5 vdisks, whether you have None, Single or Double.
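
As a rough illustration of the three rules above, here is a minimal Python sketch. The disk IDs, the `rss_of` map and the `partner_of` map are hypothetical inputs invented for the example; the EVA does not expose its pairing in this form, and the checks only cover simultaneous failures.

```python
from collections import Counter


def vraid5_survives(failed_disks, rss_of):
    """Vraid5 survives if no RSS has more than one failed disk."""
    per_rss = Counter(rss_of[d] for d in failed_disks)
    return all(count <= 1 for count in per_rss.values())


def vraid1_survives(failed_disks, partner_of):
    """Vraid1 survives if no disk failed together with its mirror partner."""
    failed = set(failed_disks)
    return not any(partner_of[d] in failed for d in failed)


# Hypothetical layout: 8 disks, RSS "A" holds 0-3, RSS "B" holds 4-7,
# mirror pairs are (0,1), (2,3), (4,5), (6,7).
rss_of = {d: ("A" if d < 4 else "B") for d in range(8)}
partner_of = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4, 6: 7, 7: 6}

for failed in ([0], [0, 4], [0, 2], [0, 1]):
    print(failed,
          "Vraid5 ok" if vraid5_survives(failed, rss_of) else "Vraid5 lost",
          "Vraid1 ok" if vraid1_survives(failed, partner_of) else "Vraid1 lost")
```

Note that the protection level does not appear anywhere in these checks, which is the point: it plays no role in whether a simultaneous double failure is survivable.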

It's a parameter intended to reserve spare space, so that when a disk fails, the EVA can rebuild the RAIDs using that space and go back to a redundant state as soon as possible. In the extreme case where there is no free space anywhere, the RAID will remain in a degraded state until you physically replace the failed disk.

Why 2 x (biggest drive on disk group)?
When a disk fails, the EVA has to rebuild the RAIDs. RAID 5 info is regenerated from the remaining 4 data stripes and written to a disk on the RSS. RAID 1 data has to be moved to another PAIR of disks. The disk that failed had a partner, with the other copy of the data. We need to have two copies of this data, so we need to move it to the other disks, making two copies of each stripe.

That's why the spare space is 2 x biggest disk. It's to make sure we can preserve data redundancy in the worst case: one of the biggest disks fails and it's full of RAID 1 data. You're going to need twice its size to store all the RAID 1 data.

Disk failure protection = double means the EVA can have a disk failure, a rebuild, and another disk failure and still be fully redundant, before you replace the failed disks.
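
To put numbers on it, here is a small Python sketch of the reserve calculation. The 2 x biggest disk figure for single is as described above; treating double as twice that (4 x biggest disk) is an assumption taken from this thread, and the function name and the example disk sizes are made up.

```python
# Number of failure/rebuild cycles each protection level is sized for.
REBUILDS_COVERED = {"none": 0, "single": 1, "double": 2}


def reserved_spare_gb(disk_sizes_gb, level):
    """Spare space reserved in a disk group, in GB.

    Each covered rebuild is sized for the worst case: the biggest disk
    fails while full of RAID 1 data, so two copies must be rewritten.
    Assumption: double reserves exactly twice the single amount.
    """
    return 2 * max(disk_sizes_gb) * REBUILDS_COVERED[level]


# Hypothetical mixed disk group of 300 GB and 450 GB drives.
group = [300, 300, 450, 450]
print(reserved_spare_gb(group, "single"))  # 900
print(reserved_spare_gb(group, "double"))  # 1800
```

The reserve depends only on the largest drive, so mixing one bigger disk into a group raises the reserve for the whole group.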
sam bell
Regular Advisor

Re: How to view RSS on EVA! + Expansion

Thanks for your explanations!

> That's why the spare space is 2 x biggest
> disk. It's to make sure we can preserve
> data redundancy in the worst case: one of
> the biggest disks fails and it's full of
> RAID 1 data. You're going to need twice
> its size to store all the RAID 1 data.

Am I correct in assuming that once you are using at least one Vraid1 disk, the reconstruction process will always restore two disks? Even if there is only one Vraid1 disk, it is striped across all disks in the disk group, so every disk holds Vraid1 data and either has a mirror or acts as a mirror for another disk.
UselessUser
Frequent Advisor

Re: How to view RSS on EVA! + Expansion

vcespon... can you help me with this:

2 disks failed, same RSS = RAID 5 fails, RAID 1 depends on whether the disks had the same information

Now we know that this is the case if 2 disks fail in a single RSS AT THE SAME TIME.

However is this the case if:

Protection Level is single
A single disk in an RSS has failed and reconstruction has completed
A second disk in the same RSS as the first has just failed...

I think this is the real question being posed here...

I assumed that, as the protection level allocates space across all disks, the recovered data must be placed across all disks (and is thus split across RSSes if there are more than 11 disks). Therefore, as an RSS is a failure domain, you could in theory lose another disk in the same RSS without losing VRAID 5 data.

Any thoughts?