StoreVirtual Storage

Procedure for safely powering down and moving equipment HP4500's

Adski
Occasional Visitor

Procedure for safely powering down and moving equipment HP4500's

Hi,

 

I'm running a LeftHand environment with three ESX hosts and four HP P4500s. Currently the whole virtual environment sits in our primary office, and I need to move one of the ESX hosts and two of the P4500s to our DR site. I've not needed to power off any of the equipment until now and want to make sure I do it properly so that currently running VMs are unaffected.

 

I'm looking for confirmation that the procedure below is safe, or any further information I may be missing.

 

This is how I plan to move the equipment:

 

  1. Migrate all VMs to another ESX host (1 or 2) that will stay live during the move
  2. Switch ESX Host 3 to maintenance mode in the VMware vSphere Client
  3. Power off/shut down the ESX server in vSphere
  4. Power off Storage 3 & 4 (P4500s) through HP CMC
  5. Move the equipment and set up cabling in the new location
  6. Power on Storage 3 & 4
  7. Power on ESX Host 3
  8. Migrate the VMs back to ESX Host 3

Further info - We have a Failover Manager (FOM) VM running on ESX Host 1, which should provide quorum (3 of 5 managers) for the environment whilst the two storage units are powered off and moved.
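For anyone following along, SAN/iQ quorum is a simple majority of the managers in the management group. A quick illustrative sketch of that arithmetic (hypothetical Python, not an HP tool):

```python
# Illustrative quorum arithmetic for a SAN/iQ management group.
# Quorum requires a strict majority of the configured managers.

def quorum_needed(configured_managers: int) -> int:
    """Minimum number of running managers needed to hold quorum."""
    return configured_managers // 2 + 1

def has_quorum(configured: int, running: int) -> bool:
    return running >= quorum_needed(configured)

# Five managers: four storage nodes + FOMLive.
print(quorum_needed(5))                # 3
# With S3 and S4 powered off, three managers remain (S1, S2, FOMLive):
print(has_quorum(5, running=3))        # True: quorum holds during the move
# ...but one more failure while the nodes are in transit would break it:
print(has_quorum(5, running=2))        # False
```

So the plan holds quorum, but with zero margin for a further failure while the two nodes are off.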

5 REPLIES
oikjn
Honored Contributor

Re: Procedure for safely powering down and moving equipment HP4500's

Are you going to run a multi-site cluster with the VSA systems, or is it just going to be asynchronous? Have you made sure you meet the bandwidth and latency requirements for a multi-site cluster? Most deployments don't get that right.

 

As for the VSAs, you should change the logical sites for the cluster NOW. That will let the cluster restripe so your redundancy is correct (mirror across sites, stripe within a site, for Network RAID-10). Is your move close enough that you or someone from your company will be driving the servers to the new location and installing them the same day they go down? If the storage nodes are going to be off for days, I would consider changing the volume protection level to NR10+1 for any mission-critical data, just in case one of your remaining nodes decides to take a dump during the migration.

 

Before turning off the nodes that are getting moved to the new site, I would shut down their managers so they aren't counted toward quorum while in transit (you can always turn them back on when you power the nodes on at the new location). After that, you are safe to simply shut them off through CMC and hit the road (assuming you have them secured for transit).
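The reason stopping the managers first helps: quorum is a majority of the *configured* managers, so a powered-off manager still counts against you. A hedged sketch of the difference (illustrative Python, not a CMC command):

```python
def quorum_needed(configured: int) -> int:
    # SAN/iQ quorum is a strict majority of configured managers
    return configured // 2 + 1

def survives_one_more_failure(configured: int, running: int) -> bool:
    """Could we lose one more running manager and still hold quorum?"""
    return (running - 1) >= quorum_needed(configured)

# Option A: leave all five managers configured and just power off the
# two moving nodes. Three of five are running -- quorum (3) holds,
# but with zero margin:
print(survives_one_more_failure(configured=5, running=3))  # False

# Option B: stop the managers on the moving nodes first.
# Three managers configured, three running; quorum drops to 2:
print(survives_one_more_failure(configured=3, running=3))  # True
```

Two minutes of clicking buys you tolerance for one more node failure during the move.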

 

Assuming CMC shows your nodes listed in the order S1, S2, S3, S4, I would suggest you take S2 and S4 to the new site. Taking S3 and S4 would require a restripe of your LUNs when you switch to multi-site, whereas S2 and S4 are already the correct combination for a stripe set.

 

Also, you say ESX1 has your FOM on it. This WILL NOT WORK if you actually want a seamless DR failover to the second site. You MUST find an independent third site for your FOM, or the second site can never get quorum if the primary goes down. Bandwidth requirements are minimal, so simply renting some space in a colo would work if your company doesn't have a third site.

 

 

Adski
Occasional Visitor

Re: Procedure for safely powering down and moving equipment HP4500's

We are going to run the multi-site cluster with our VSAs. Our bandwidth meets the recommended requirements from VMware, so I'm not worried about that at the moment; plus, I will be carrying out this work at a time when I can test throughput and communication.

I'm not quite sure what you mean by changing the logical sites - could you explain further?

Between power down at main site and power up at DR, I would expect about 1 hour.

The CMC cluster shows two storage system sites - Live and DR. Live has S1 & S2, and DR has S3 & S4. The storage systems section under the cluster lists S1, S3, S2, S4, so I believe it is already set up correctly for S3 & S4 to go.

I failed to mention that we have two FOMs, Live and DR. FOMDR runs on ESX3 and would be used if the live site goes down. However, it is not currently showing as one of the managers in the management group, and I presume I would have to add it manually as a manager in a DR situation. Currently, all four storage units and FOMLive are managers. I will need to check with my colleague about that configuration unless you can shed further light?

oikjn
Honored Contributor

Re: Procedure for safely powering down and moving equipment HP4500's

Since you set up the nodes with two sites in CMC, you are actually running a multi-site cluster already (just at a single physical site). You are correct, then, that you can just move the servers to the new site and you are good to go.

 

I don't know about that FOM plan you have. If that is a totally separate FOM not included in the management group, you will not be able to add it to the management group during a DR event... think about it... if you don't have quorum, you can't do anything until quorum is regained, and that definitely includes adding a new node to a cluster.

 

I would still stop the managers on the nodes you are moving, just because it takes two minutes and ensures that if a third node goes offline during the hour of the move, you don't lose quorum.

 

You can always figure out how you will handle the FOM later, but you should be OK to move, assuming the bandwidth/LATENCY is what you need. The only other thing to think about is where you want your redundancy: with four nodes in a multi-site cluster, you get site redundancy, but if your backup site goes down AND another node goes down, you have just lost access to your LUN... if you want site AND local node redundancy, you need NR10+1 for the LUN... you might want to run that for your most critical LUNs.
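That redundancy point can be seen by counting surviving copies. Network RAID-10 keeps two copies of each block (one per site); NR10+1 keeps three. A hypothetical sketch (the node placements below are illustrative, not the actual SAN/iQ layout algorithm):

```python
def surviving_copies(replica_nodes: set, failed_nodes: set) -> int:
    """How many copies of a block remain readable after node failures."""
    return len(replica_nodes - failed_nodes)

# Hypothetical placement of one block in a 4-node, 2-site cluster.
# NR10: two copies, mirrored across sites.
nr10 = {"S1", "S3"}
# NR10+1: three copies (exact placement varies; this is illustrative).
nr10_plus1 = {"S1", "S2", "S3"}

# Backup site (S3, S4) down AND one primary node (S1) down:
failed = {"S3", "S4", "S1"}
print(surviving_copies(nr10, failed))        # 0 -> LUN offline
print(surviving_copies(nr10_plus1, failed))  # 1 -> data still available
```

With NR10 the double failure can leave zero copies of some blocks; NR10+1 still has one, at the cost of a third copy's worth of capacity.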

Adski
Occasional Visitor

Re: Procedure for safely powering down and moving equipment HP4500's

Thanks for your time so far - I really appreciate it.

 

Re the FOM plan - yes, it seems we are not certain exactly what our recovery method would be if Site A went down (S1 & S2 plus FOMLive running on ESX1). Obviously quorum would be lost, with only two managers remaining at Site B. I've been looking at the HP P4000 Multi-Site HA/DR Guide (http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c03041871/c03041871.pdf); page 18 describes the 'recover quorum' operation, which I believe is the step we would have to take in this Site A failure situation. We would then add the FOMDR VM running on ESX3 as the third manager in CMC.

 

I'll check our current volume protection level and whether it is already NR10+1 or not.

oikjn
Honored Contributor

Re: Procedure for safely powering down and moving equipment HP4500's

You would have to test this, but it's probably better to just back up the FOM and replicate that backup to the DR site; then, if the primary site dies, you can recover the FOM at the secondary site and get quorum back without having to worry about a quorum recovery through CMC, as that generally involves help from support for most people. That will definitely mean a few hours of downtime, so if that isn't something your company wants, the simple solution is to host the FOM at a third site - which could be in "the cloud"(TM) if you can't find a third site.