Occasional Visitor

array based Storage migration with minimal downtime

We are cloning LUNs via EMC SRDF to a new array and want to migrate with minimal downtime.  I'm wondering if the following basic approach will let us do this with downtime limited to roughly the 30 seconds it takes to fail over the cluster.  The approach is:

  • Take down the cluster while leaving the package resources online.  That would allow us to migrate the lock LUN.
  • Bring up the cluster with the new lock LUN and “acquire” the package resources that are currently running but not under cluster control.
  • Take down the cluster on the passive node and perform the LUN swap of the VGs in the package to the new array.
  • Split the R1:R2 pair while the DB is frozen.  At this point the package is still running on the active node against the old array.
  • Bring up the cluster on the passive node and fail over.  The package on one node now points at LUNs on one array, and on the other node at the other array.
  • Take down the cluster on the now-passive node and perform the LUN swap there as well.

This will not work if Serviceguard performs some verification and objects to the LUNs on the two cluster nodes being on different storage arrays.
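The lock-LUN portion of the steps above might look roughly like the following on HP-UX Serviceguard (cluster name, file paths, and the lock-disk parameter edit are illustrative; check the exact parameter names for your version, and note that whether the cluster must be halted for this change depends on the version):

```shell
# Capture the current cluster configuration as an ASCII file
# (cluster name "prodcl" is an example)
cmgetconf -c prodcl /etc/cmcluster/prodcl.ascii

# Edit the ASCII file so the cluster lock parameters
# (e.g. FIRST_CLUSTER_LOCK_PV under each node) point at the new lock LUN

# Verify the edited file, then apply it to the cluster
cmcheckconf -C /etc/cmcluster/prodcl.ascii
cmapplyconf -C /etc/cmcluster/prodcl.ascii

# Confirm the cluster state and the new lock disk
cmviewcl -v
```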



Re: array based Storage migration with minimal downtime

You don't give the details of the configuration, so we don't know whether this is HP-UX or Linux, but generally speaking, changing the cluster lock can be an online operation with only a very small window for a problem during the actual cmapplyconf operation. If you want to take the lock disk out of the picture entirely, you could switch to a quorum server for the duration of the migration and back to the lock disk once everything is stable. The QS software is free.
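A sketch of temporarily switching to a quorum server, assuming the QS software is already installed and running on a separate host (hostname and timing values are examples; verify parameter names against your Serviceguard version):

```shell
# In the cluster ASCII file, remove/comment the lock-disk parameters
# (FIRST_CLUSTER_LOCK_PV lines under each node) and add the quorum server:
#
#   QS_HOST                 qshost        # example hostname
#   QS_POLLING_INTERVAL     120000000     # microseconds
#   QS_TIMEOUT_EXTENSION    2000000       # microseconds

# Then verify and apply the change
cmcheckconf -C /etc/cmcluster/prodcl.ascii
cmapplyconf -C /etc/cmcluster/prodcl.ascii
```

Reversing it after the migration is the same edit in the other direction: put the lock-disk parameters back, remove the QS_* lines, and re-run cmcheckconf/cmapplyconf.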

Serviceguard itself does not do any particular checking of LUN IDs, but it does use the underlying volume manager commands (VxVM, LVM, etc.) to activate and manage the storage, so depending on how your volume manager is configured you may have to tweak your procedure. Typically when you do an R1/R2 split, you end up with a second set of LUNs that have the same physical volume IDs as the primary set, and this can sometimes cause problems during VG activation. You would probably have to import the VG on the passive node using the new device files (assuming they changed). I'm not sure why you would need to halt the node out of the cluster on the side where you are changing the LUNs, unless you need to reboot the node for some reason, but I am not an EMC storage expert by any means.
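For the HP-UX LVM case, the re-import on the passive node might look something like this sketch (VG name, device files, and minor number are all made up; see the vgexport/vgimport/vgchgid man pages for your release):

```shell
# Preview the export and write a map file, then remove the VG from /etc/lvmtab
vgexport -p -m /tmp/vgdb.map vgdb
vgexport vgdb

# If the R2 copies carry duplicate volume-group/PV IDs, stamp new ones
# (HP-UX only; run against ALL raw devices of the VG in one command)
vgchgid /dev/rdsk/c5t0d1 /dev/rdsk/c5t0d2

# Recreate the group file and import using the NEW array's device files
mkdir /dev/vgdb
mknod /dev/vgdb/group c 64 0x030000     # minor number must be unique on the node
vgimport -m /tmp/vgdb.map vgdb /dev/dsk/c5t0d1 /dev/dsk/c5t0d2

# Test activation, then deactivate so the package can take it over
# (in a cluster the package normally activates exclusively, e.g. vgchange -a e)
vgchange -a y vgdb
vgchange -a n vgdb
```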

There are simply too many unanswered questions to say exactly what you can expect here. Generally speaking the idea is probably sound, but the details are going to be very, very important. You should definitely test this before trying it on a production package. It should be easy enough to set up a package with a simple VG using only one or two test PVs and a couple of mount points, and run through the package migration part. Make sure you are up to date on Serviceguard and operating system patches, especially those related to I/O and the volume manager, before you start down this path. You should also review the Managing Serviceguard manual for the specific version you are running to see which configuration changes can be made online versus offline.
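Building that throwaway test setup is quick; something along these lines (all device files, sizes, and names are examples):

```shell
# One small test VG on a spare LUN
pvcreate /dev/rdsk/c6t0d1
mkdir /dev/vgtest
mknod /dev/vgtest/group c 64 0x040000   # unique minor number
vgcreate vgtest /dev/dsk/c6t0d1
lvcreate -L 100 -n lvtest vgtest
newfs -F vxfs /dev/vgtest/rlvtest

# Generate a package configuration template to wrap around it,
# then edit, check, and apply it as usual
cmmakepkg -p /etc/cmcluster/testpkg/testpkg.conf
```

Once the test package fails over cleanly, you can rehearse the entire LUN-swap sequence against it before touching the production package.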