StoreVirtual Storage

Re: P4500G2 - Rebuild Time

 
Sean McMorrow
Occasional Visitor

P4500G2 - Rebuild Time

Can anyone give a rough estimate of how long we can expect it to take to restripe a P4500G2 after (re-)adding it to an existing cluster? The current provisioned space on each node in our cluster is 3.5 TB.

 

We read that the HP guideline is an estimate of 15 minutes per GB of data. That calculation would mean an estimate of around 36 days!

 

Does anyone have real-world experience of this? How long does it really take?

 

Thanks..........

 

 

P.S. This thread has been moved from Storage > General to HP StoreVirtual Storage / LeftHand. - HP Forum Moderator

oikjn
Honored Contributor

Re: P4500G2 - Rebuild Time

It all depends on how much demand you have on your SAN. You can set the rebuild rate to a specific MB/s value. I think the default is 16 MB/s, but you can adjust it faster or slower depending on how it affects your LUN access. Generally 16 MB/s is fine for most normally loaded SANs.

Sean McMorrow
Occasional Visitor

Re: P4500G2 - Rebuild Time

Thanks for replying oikjn....

 

We appreciate that we can adjust the rebuild rate of the restripe. The other factor is that this cluster currently hosts a large percentage of our live VM environment, and we are concerned that the restripe will degrade SAN and VM performance during the rebuild.

 

Are we looking at weeks of slow performance from our environment?

 

I appreciate the answer is a guesstimate, but is it going to be longer than a few days?

 

Thanks again.....

oikjn
Honored Contributor

Re: P4500G2 - Rebuild Time

It all depends, but it is definitely something that can be done on a live environment with almost no impact. The catch is that the less impact you want, the slower the rebuild. If my math is right, 1 GB per 15 min is about 1 MB/s. The default rate is 16 MB/s, but you can adjust it from 0.25 to 40 MB/s, and the next option after that is "Max". The setting is found in CMC: right-click the management group and select Edit; the value to adjust is "local bandwidth priority", which controls the impact. I would suggest you leave it at the default, start monitoring your SAN performance, start the rebuild, and then speed up or slow down the bandwidth depending on what you determine to be best for you.
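
To make the rate comparison concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, binary units (1 GB = 1024 MB) are an assumption, and it assumes the full 3.5 TB per node from the original question has to move, which may overstate the real amount.

```python
# Rough sketch only: converts the "15 minutes per GB" rule of thumb into MB/s
# and compares it with the bandwidth settings mentioned in this thread.
# Assumption: binary units (1 GB = 1024 MB, 1 TB = 1024 GB).

def minutes_per_gb_to_mb_per_s(minutes_per_gb, mb_per_gb=1024):
    """Turn a 'minutes per GB' rule of thumb into an MB/s rate."""
    return mb_per_gb / (minutes_per_gb * 60)

def days_at_rate(data_gb, rate_mb_per_s, mb_per_gb=1024):
    """Days needed to move data_gb at a sustained rate_mb_per_s."""
    seconds = data_gb * mb_per_gb / rate_mb_per_s
    return seconds / 86400

rule_rate = minutes_per_gb_to_mb_per_s(15)  # about 1.14 MB/s
data_gb = 3.5 * 1024                        # 3.5 TB from the original post

for label, rate in (("15 min/GB rule", rule_rate), ("default", 16), ("top numeric setting", 40)):
    print(f"{label}: {rate:.2f} MB/s -> {days_at_rate(data_gb, rate):.1f} days")
```

The point is only that the 15 min/GB figure corresponds to a rate far below the default setting, so the multi-week estimate looks like a worst case rather than the expected one.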

 

Sean McMorrow
Occasional Visitor

Re: P4500G2 - Rebuild Time

Thanks oikjn,

 

It really does seem to be one of those situations where you won't know until you try. Unfortunately this is going to impact a huge percentage of our live environment, and once we start the process there's no going back!

 

The latest calculation I read was to divide the overall data by 5 MB/s (which would be a "moderate" balance of rebuild speed vs. user accessibility?). This gives us a figure of around 8 days overall rebuild time.

 

OK, thanks for your help, we now have some figures to work with.

 

oikjn
Honored Contributor

Re: P4500G2 - Rebuild Time

Yeah, it's one of those things that you can't stop once it starts, and you won't know until you try. The one thing you can always do is slow it down so it has virtually zero impact on performance. Then it's only a matter of how long you can wait for the new space to become available and how much impact you find acceptable for your live production.

Gediminas Vilutis
Frequent Advisor

Re: P4500G2 - Rebuild Time


@Sean McMorrow wrote:

Thanks oikjn,

 

It really does seem to be one of those situations where you won't know until you try. Unfortunately this is going to impact a huge percentage of our live environment, and once we start the process there's no going back!

 

The latest calculation I read was to divide the overall data by 5 MB/s (which would be a "moderate" balance of rebuild speed vs. user accessibility?). This gives us a figure of around 8 days overall rebuild time.

 

OK, thanks for your help, we now have some figures to work with.

 


Observations from my practical experience with cluster edit/restripe operations (I have done this roughly 10 times over the last 4 years, all on live environments).

 

Rebuild time depends on the number of nodes in the existing cluster and on how much actual data resides on each node. E.g. if you have 3 nodes with 4 TB of data on each (12 TB total in the cluster) and add 1 node, then after the restripe 3 TB of data will sit on each node. During the restripe some data on the 'old' nodes won't be moved (or will be 'restriped' internally), but the full 3 TB has to be transferred to the newly added node.
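
The redistribution example above can be sketched in a few lines of Python. This is only an illustration: `restripe_layout` is a hypothetical helper, and it assumes data ends up spread evenly across all nodes, which is a simplification of how the cluster actually places data.

```python
# Illustration only: assumes an even spread of data across all nodes after
# the restripe. restripe_layout is a made-up helper, not a vendor tool.

def restripe_layout(existing_nodes, tb_per_node, added_nodes=1):
    """Return (total TB in cluster, TB per node after restripe, TB moved to new nodes)."""
    total_tb = existing_nodes * tb_per_node
    per_node_after = total_tb / (existing_nodes + added_nodes)
    moved_to_new = per_node_after * added_nodes
    return total_tb, per_node_after, moved_to_new

total, per_node, to_new = restripe_layout(existing_nodes=3, tb_per_node=4)
print(f"total {total} TB, {per_node} TB per node afterwards, {to_new} TB sent to the new node")
```

The last value is the one that matters for timing, since that is the data that has to cross the network to the new node.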

 

The network bandwidth limit (the 'Local bandwidth priority' setting for the management group) for a restripe is enforced on a per-node basis. I.e. when estimating restripe time, take the amount of data that needs to be transferred to the newly added node (3 TB in our example) and calculate how many hours it takes to transfer that amount at the enforced bandwidth setting. E.g. 16 MB/s = 56 GB/hour = 55 hours to restripe; at 40 MB/s it would take 22 hours.
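
For anyone who wants to plug in their own numbers, a minimal sketch of that estimate in Python follows. `restripe_hours` is a hypothetical name, binary units (1 TB = 1024 * 1024 MB) are assumed, and the result treats the configured limit as the sustained transfer rate, which a loaded cluster may not reach.

```python
# Minimal sketch: time to push a given amount of data to the new node at the
# configured per-node bandwidth limit.
# Assumptions: binary units, and the limit is actually sustained throughout.

def restripe_hours(data_tb, rate_mb_per_s):
    """Hours needed to transfer data_tb at rate_mb_per_s."""
    data_mb = data_tb * 1024 * 1024
    return data_mb / rate_mb_per_s / 3600

for rate in (16, 40):
    print(f"3 TB at {rate} MB/s: about {restripe_hours(3, rate):.0f} hours")
```

Rounded to whole hours, this gives the 55 and 22 hour figures quoted above for the 3 TB example.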

 

Regarding impact on the live environment: it depends greatly on the network bandwidth of the interfaces (1 Gbps or 10 Gbps) and the total load on the cluster. During off-peak hours (by off-peak I mean when cluster load is below 10-20% of the total theoretical maximum), if the nodes are on a 10G network, I sometimes play with the 'Max' restripe speed without noticeable impact on initiators (i.e. read/write latency remains in an acceptable range). On a 1G network it is safe to stay at the 40 MB/s setting during off-peak. During normal hours (from my experience and in my environment) 40 MB/s is a safe setting for a 10G network, and ~25-30 MB/s for a 1G network. I drop a few additional MB/s if the disks are 7.2k rpm.

 

David_Tocker
Regular Advisor

Re: P4500G2 - Rebuild Time

I would look at the actual IOPS you are pushing through, then spin around in circles on the spot for a moment until you are dizzy, then crank the setting to 40.

It doesn't seem to affect the performance that much as far as I have seen.

If you are running mission-critical applications and a heavy IO load, you may see issues, but most implementations of StoreVirtual I have seen have not suffered overly from restriping, and it has surprised me how fast it can happen.
It seems that if you set it to 100% it does not stop node access or management. Rather, I think the rebuild and the workload go round robin and fight against each other, resulting in lower speed for the workload, but it doesn't break anything (SQL servers may report storage latency, however).

Regards.

David Tocker