- Re: EVA Releveling
08-24-2005 01:07 PM
08-24-2005 06:44 PM
Solution
- the recovery: it does the minimum work to restore redundancy
- the leveling: it makes sure that user data is equally distributed over the disks
If another disk fails during recovery, whether data loss results depends on the 'location' of that disk (the same as in a traditional RAID system). The EVA divides its disk groups into different failure domains called RSSs (Redundant Storage Sets). It can tolerate the loss of multiple disks as long as each failed disk belongs to a different RSS.
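The RSS rule described above can be sketched as a small check. This is only an illustrative model, not EVA firmware logic: the disk names, the `rss_of` mapping, the RSS sizes and the function name are all hypothetical, and it assumes each RSS survives at most one failed member.

```python
from collections import Counter


def survives_failures(rss_of, failed_disks):
    """Return True if no RSS has lost more than one disk.

    rss_of: dict mapping disk name -> RSS id (hypothetical layout).
    failed_disks: iterable of disk names that have failed.
    """
    failures_per_rss = Counter(rss_of[disk] for disk in failed_disks)
    return all(count <= 1 for count in failures_per_rss.values())


# Hypothetical layout: two RSSs of three disks each, sized only for illustration.
layout = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1, "d6": 1}

print(survives_failures(layout, ["d1", "d4"]))  # failures in different RSSs
print(survives_failures(layout, ["d1", "d2"]))  # two failures in the same RSS
```

The first call models two failures in different failure domains, the second models the critical case of a second failure inside an RSS that is still recovering.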
08-24-2005 08:44 PM
Re: EVA Releveling
I totally agree, but the EVA is at risk during the reconstruction of the RSSs. Once the RSS reconstruction is complete, the EVA can handle another failure and restart a new releveling process. As I understand it, it reconstructs the RSS, then relevels Vraid5, then relevels Vraid1. The only critical time is during the reconstruction of the RSSs. Short of suspending I/O (not possible in production), is there any other way to shorten the releveling time while still using a 168-disk group? The EVAs are all running at 98%+ occupancy.
Thanks Bill
08-24-2005 09:38 PM
Re: EVA Releveling
This is no different from any other storage array that is not running some kind of ADG/RAID-6/RAID-5DP, or whatever the vendor calls its implementation of 'double protection'. If too many disks fail, the data is gone :-(
I am not aware of any way to 'tune' the leveling process: make it faster, slower, suspend it...
08-24-2005 09:59 PM
Re: EVA Releveling
Can you clarify your first statement: rebuild takes multiple days?
What do you mean by that? Reconstruction, or migration with or without leveling?
Can you point me to the best practice document that you mentioned here?
If I understand you correctly, then the most problematic situation could be (Vraid0 is problematic by design ;o):
Vraid5, one failed disk. During the reconstruction phase another disk in the same RSS fails. But that does not necessarily mean that you lose your data. It depends on the failure type.
I would say that it is not an EVA-specific issue. It is a limitation of RAID5 by design, if we are talking about RAID5 by definition.
BTW, I would not say that more free space will not speed up the rebuild process.
Regards,
M.
08-24-2005 10:17 PM
Re: EVA Releveling
I have pinged HP on this and was told there is no way to "prioritize" the releveling process. What makes it worse is that the more host I/O there is, the longer leveling takes....
08-24-2005 11:33 PM
Re: EVA Releveling
That's true, but what is the problem if the leveling runs longer than expected?
I am not aware of any impact on data availability or data integrity. Leveling is an optimization process, and most storage arrays have some form of it.
Maybe I missed something?
Regards,
M.
08-25-2005 12:05 AM
Re: EVA Releveling
08-25-2005 12:21 AM
Re: EVA Releveling
The best practice document says that you should wait for the reconstruction to complete, not the leveling.
http://h200005.www2.hp.com/bc/docs/support/SupportManual/lpg29448/lpg29448.pdf, page 8
Regards,
Mario.
08-25-2005 12:23 AM
Re: EVA Releveling
08-25-2005 01:19 AM
Re: EVA Releveling
Also, the disk state should be ungrouped and not migrating.
But what I want to know is where in the best practices you see that only 5 GB is needed for maintenance. Our recommendation was to leave 10% of total space unused.
Can you post the link?
Thanks
08-25-2005 01:29 AM
Re: EVA Releveling
Thank you!
08-25-2005 01:42 AM
Re: EVA Releveling
Mario, the EVA is not like traditional RAID 5. The sparing level (0, 1, 2) determines the number of drives to use for releveling.
The more free space you have, the less data there is to move, so it will be faster. But at 98%+ utilization it takes a long time (multiple days). As David pointed out, sometimes a failure occurs during this process; then that drive has to be migrated out, then recovery of the RSS happens again, then the rebuild process starts.
Host I/O must continue, and this additionally slows the rebuild.
The data is in jeopardy during the recovery of the RSS and no one seems to know how long this takes.
Running at 99%+ occupancy, I have seen the sparing level drop from Double requested to Single available. Going one step further, I have seen Double requested and None available. I believe that this is an error and attribute it to 3.014 VCS. We have arrays running at 3.014, 3.020 and 3.025.
With None (sparing level) available I feel that this is a time where another failure will lead to data construction, and I cannot get a definite answer on how long the recovery phase takes.
David, as you pointed out, I have had failures during the releveling and it has started all over; so far no data lost.
Bill
08-25-2005 01:48 AM
Re: EVA Releveling
Consider the new 300 GB drives coming out. An EVA with 168 drives at 10% means that 5 TB raw is used as free space. HP stated in the best practice that only 5 GB is required for maintenance. That tells me that I could run at 99.99 percent occupancy as long as I have 5 GB available for VCS code loads.
Bill
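The arithmetic behind the 5 TB figure can be checked with a few lines. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes decimal units (1 TB = 1000 GB) and ignores any formatting or Vraid overhead.

```python
def reserved_raw_gb(disk_count, disk_size_gb, free_percent):
    """Raw capacity (GB) left unused when free_percent of the group is kept free."""
    return disk_count * disk_size_gb * free_percent / 100


total_raw_gb = 168 * 300                      # 168 drives of 300 GB each
kept_free_gb = reserved_raw_gb(168, 300, 10)  # the 10% recommendation

print(total_raw_gb)      # 50400
print(kept_free_gb)      # 5040.0, i.e. roughly 5 TB
print(kept_free_gb / 5)  # 1008.0 times the 5 GB figure
```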
08-25-2005 01:56 AM
Re: EVA Releveling
I will raise the occupancy level to 95%. I will keep enough free space to ensure leveling and data reconstruction, as free space is used as sparing space.
08-25-2005 02:32 AM
Re: EVA Releveling
I've never heard that claim before and I've been working for 3.5+ years with EVA...
It sounds like you're talking about the 'protection space', which is set aside in place of any dedicated spare disks. It has nothing to do with leveling.
0 = no reservation
1 = 2x size of largest disk drive in group
2 = 4x size of largest disk drive in group
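The reservation rule listed above can be written as a one-line formula. This is a minimal sketch of that rule only; the function name is hypothetical and the 146 GB drive size is just an example value.

```python
def protection_space_gb(level, largest_disk_gb):
    """Space set aside for disk failure protection.

    level 0 -> no reservation
    level 1 -> 2x the largest disk in the group
    level 2 -> 4x the largest disk in the group
    """
    if level not in (0, 1, 2):
        raise ValueError("protection level must be 0, 1 or 2")
    return 2 * level * largest_disk_gb


for level in (0, 1, 2):
    print(level, protection_space_gb(level, 146))
```

With 146 GB drives this gives 0, 292 and 584 GB for levels 0, 1 and 2.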
The 'leveling' will always go over all disk drives in the group.
> The data is in jeopardy during the recovery of the RSS and
> no one seems to know how long this takes.
Of course not. It depends on the size of the RSS (number of disk drives), the speed of the disks, the amount of data to recover, the VRAID level of the data, and the concurrency with host I/O.
> Running at 99%+ occupancy, I have seen the sparing level
> drop From Double requested to Single available.
After a disk failure I'd say. Add back a disk, do not create additional Vdisks and it should go back to double.
> Going one step further I have seen Double requested
> and NONE Available.
Lost two unmarried disks?
> With None (sparing level) available I feel that this is
> a time where another failure will lead to data construction
ANY disk failure will trigger a recovery attempt. Whether it succeeds depends on whether there is enough free space. That space can come from the protection space OR from the free space for Vdisks; preference is given to the latter.
> Consider the new 300 GB drives coming out.
I've already installed an EVA with those disks, so they are there.
-----
Bill, you post from two different accounts:
http://forums1.itrc.hp.com/service/forums/publicProfile.do?userId=CA1315413&forumId=1
http://forums1.itrc.hp.com/service/forums/publicProfile.do?userId=CA1317758&forumId=1
-----
Ivan,
> I will raise the occupancy level to 95%.
> I will keep enough free space to ensure leveling and
> data reconstruction, as free space is used as sparing space.
Define a protection level > 0 and you have set aside space for reconstruction - that's its purpose. The occupancy level is just a warning highwater mark.
08-25-2005 02:44 AM
Re: EVA Releveling
08-25-2005 02:58 AM
Re: EVA Releveling
Uwe, you should be faster next time. I had almost finished my reply and now it is worthless :)).
BTW, I think that if the EVA does not have enough unallocated space, it will temporarily use the space dedicated to protection for the leveling process. I am not sure whether that would be visible as a decreased protection level. I have never tried.
Regards,
M.
08-25-2005 03:24 AM
Re: EVA Releveling
08-25-2005 05:03 AM
Re: EVA Releveling
"Disk failure protection" is the correct term; I referred to it as sparing level.
The number of drives is 168 in the default disk group and they are 146 GB on most arrays. The 99% occupancy is true for about 85% of the EVAs currently installed.
I have two user IDs because when I tried to respond this morning it would not let me in and said I was not registered. I did that stupid thing of logging in automatically on my home machine, and they could not find me, hence another ID to access this forum in the hope that I could get an answer to my posts.
You are also correct that the disk failure protection level determines the amount of reserved disk space: 2 x largest drive size x protection level selected.
I still haven't found out how much time elapses from the start of reconstruction/recovery of the RSS to the time it starts the releveling. That is the time I am concerned with, as that is when I believe the EVA is susceptible to corruption. I do not know whether it is 20 microseconds or 20 minutes. Another factor I know affects the releveling is the size of the Vdisks. Smaller Vdisks level more quickly, but I did not want to go there. The disk group is also Vraid5 only, so it does not have to do leveling of Vraid1.
Now, to make it interesting, say that over a period of 2 years the group has had multiple disk failures. Say 20 disks have been replaced (please don't quote the 3% industry-standard failure rate); then it is very possible that the RSSs have been compromised. Is that not correct?
Is the RSS Disk state (None, Parity, Mirrored) an indication of the health state of the RSSs?
Bill
08-25-2005 05:11 AM
Re: EVA Releveling
You should be able to check the controller logs to find out the duration of the rebuild _after_ it has finished, but I think you agree that it is impossible to predict it.
What do you mean by "RSS's have been compromised"?
The RSS Disk state is a _very_ important indicator of health state if you want to protect yourself against what is called a 'shelf/enclosure meltdown'.
none: the EVA cannot tolerate it
mirror: the EVA can tolerate it if you only have VRAID-1 Vdisks
parity: the EVA can tolerate it
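The three states above amount to a small lookup. The following is only a sketch of that table as a function; the name, the string values and the way Vraid levels are passed in are assumptions made for illustration, not an EVA or SSSU interface.

```python
def tolerates_enclosure_loss(rss_disk_state, vraid_levels):
    """Model the none/mirror/parity table for a shelf/enclosure failure.

    rss_disk_state: "none", "mirror" (or "mirrored"), or "parity".
    vraid_levels: set of Vraid levels used in the disk group, e.g. {"vraid5"}.
    """
    state = rss_disk_state.lower()
    if state == "parity":
        return True
    if state in ("mirror", "mirrored"):
        # Tolerated only if every Vdisk in the group is Vraid1.
        return set(vraid_levels) <= {"vraid1"}
    if state == "none":
        return False
    raise ValueError(f"unknown RSS disk state: {rss_disk_state!r}")


print(tolerates_enclosure_loss("parity", {"vraid5"}))
print(tolerates_enclosure_loss("mirror", {"vraid1", "vraid5"}))
print(tolerates_enclosure_loss("none", {"vraid5"}))
```

For a Vraid5-only disk group, only the parity state returns True in this model.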
-----
Mario,
I hope that was better now...
08-25-2005 06:05 AM
Re: EVA Releveling
The log ran to more than 300 pages; I have been looking through it, trying to find the start/stop events.
"What do you mean by "RSS's have been compromised"?"
I believe that after the RSSs have been rebuilt many times, they may no longer be correctly laid out. The state of some is "None",
which you have answered below. I don't have protection in the event of a shelf failure. Not likely, but I have seen a shelf failure with a bad drive; that was on power-up and first-time initialization.
" The RSS Disk state is a _very_ important indicator of health state if you want to protect yourself against what is called a 'shelf/ enclosure meltdown'."
" none: the EVA cannot tolerate it
mirror: the EVA can tolerate it if you only have VRAID-1 Vdisks
parity: the EVA can tolerate it"
I only have Vraid5, so it looks like the EVAs with an RSS state of None are the ones that I should pay attention to first.
Thanks Bill
08-25-2005 06:17 AM
Re: EVA Releveling
That definition of 'compromised' makes sense to me, thanks.
I'm always looking for some test data...
I'll attach a series of SSSU commands. If you can run them against 2 or 3 EVAs, put the output in a .ZIP file and attach it, I'll see if I can process them and send you back some reports.
08-25-2005 07:40 AM
Re: EVA Releveling
Uwe, this time you have been very fast :).
Bill, if you have an HP contract for each EVA (and I assume that you do), then you should also have some kind of reporting tool. In the past it was EVE/ISEE and now it is WEBES/ISEE. Those tools are almost a must-have. If for any reason they are not installed, ask HP to install them. EVE has some very nice features, one of which is a web page where you can browse through the EVA log. EVE also adds some additional information about events, and it is really easy to find out what is going on. If you spend some time with it, you will very quickly be able to find out how much time is consumed by any recovery process on your EVA.
Regarding RSS states, None is not the best thing you can have, but it is also supported by HP. There is no supported way to manage this except ungrouping/moving/grouping disks one by one and hoping for the best.
If your main concern is data corruption/loss, then the most critical thing is the reconstruction time. But I think that the RSS concept offers you better protection against data loss than traditional RAID5.
Even if you lose one shelf (which is highly unlikely), it does not always mean that you have lost your data.
Also, a good backup strategy is very important.
From my experience, if your loops are stable I would not worry so much. Every storage system in this world has its own problems, limitations and traps. The EVA is just a storage system, and I have to say not a bad one.
There are two things which come to mind. One is the Monty Python song: Always Look On The Bright Side Of Life. The other one is: whatever action you take, there is another M.P. thing called the Spanish Inquisition. Do not worry too much.
Enjoy,
M.
08-25-2005 08:55 AM
Re: EVA Releveling
I have been following this thread and it seems very interesting. However, I still do not understand how the RSS groups can become compromised. Do you have any more information in regard to that?
Tim