Operating System - OpenVMS

Re: Volume Shadowing Copy After Reboot

 
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

I have found somewhat of a solution ...

As Jon stated above, the issue is related to the location of the bitmap. I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node. Maybe this is something that can be put on a wish list ;-)

Anyway, the solution we came up with was to add coding to the SYSHUTDWN.COM procedure to:

1) Make sure another node is currently up and running SMISERVER.
2) Look for all disks that are currently shadow set members on the local node.
3) Make sure those disks are not currently in a COPY or MERGE state.

The procedure then uses SYSMAN SET ENVIRONMENT/NODE=node to execute a DISMOUNT/POLICY=MINICOPY=OPTIONAL of each shadow set member disk on the local node.
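
A minimal sketch of that SYSHUTDWN.COM addition, assuming OLDMOE is the remote node, $3$DKA100: is the member being handled, and the standard F$GETDVI shadowing item codes (verify them against your VMS version); step 1 is site specific and only indicated by a comment:

$! Sketch only -- OLDMOE, $3$DKA100: and the .SYSMANINI file name are examples
$! Step 1: verify the other node is up and SMISERVER is running (site specific, omitted)
$ other_node = "OLDMOE"
$ member = "$3$DKA100:"
$! Steps 2 and 3: only act on a device that is a shadow set member and is
$! not currently the target of a copy or merge
$ if .not. f$getdvi(member,"SHDW_MEMBER") then goto done
$ if f$getdvi(member,"SHDW_CATCHUP_COPYING") then goto done
$ if f$getdvi(member,"SHDW_MERGE_COPYING") then goto done
$! Dismount the member on the remote node so the master bitmap is created there
$ open/write ini sys$scratch:shadow_dmt.sysmanini
$ write ini "SET ENVIRONMENT/NODE=''other_node'"
$ close ini
$ define/user sysmanini sys$scratch:shadow_dmt.sysmanini
$ mcr sysman do dismount 'member'/policy=minicopy
$ done: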

Assuming the remote node remains up, while the local system reboot is in progress it has the remote node re-MOUNT the disks. The local shadow set member can then be mounted using the bitmap that the dismount created on the remote node, which allows a mini-copy to be performed rather than a full copy.

If the node crashes, we are just out of luck. A full copy will always happen.

Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

The important thing is where the master bitmap is created, as the master has to exist on a node that will be around for the duration of the time the member is out of the shadowset.

It is not a requirement that the mount is done from the node that has the master bitmap. In fact, a node that isn't the bitmap master can do the copy.

You can verify this by dismounting the $3$DKA100: member on OLDMOE with /policy=minicopy, making some changes to the DSA0: VU and then on RADIA64, issuing the mount command to add $3$DKA100: back into DSA0:. The actual copy doesn't necessarily have to occur on the node where the master bitmap was, or the node where the mount command executed. See this thread: http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1176993
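
In DCL terms, that test would be something like the following (assuming the volume label is USERS, as in the mount example further down this thread):

$! On OLDMOE: remove the member; the master bitmap stays on this node
$ dismount $3$dka100: /policy=minicopy
$! ... make some changes to files on DSA0: ...
$! On RADIA64: add the member back; the existing bitmap is found automatically
$ mount/system dsa0: /shadow=($3$dka100:) users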

If a valid master bitmap exists in the cluster, then when the member is added back into the shadowset, it will use a minicopy; you don't even need to ask mount to use the bitmap. So your wish "I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node." is already true.

What isn't as obvious is that you have to do the dismount on a node that will be up for the duration of the time that the member will be absent from the shadow set. Also, minicopy bitmaps created by a dismount are single mastership, so you have to choose where you want the master to be.

If you know you are going to reboot, and you are in a two-node cluster, then the bitmap should be created on the "other" node; but if you are removing a member for a static snapshot for backup, it isn't as clear where it should be. In that case you would want to dismount the member that is being locally served by the node with the backup tape, and from a performance point of view it may be better to have the master bitmap on the node that has the other member of the shadowset (the one still in the DSA VU), so you may also want the master on the node that isn't performing the backup. Better still would be to have an additional disk drive available on the node with the backup tape drive and to dismount that drive for the backup. But if you are going to have three-member shadowsets, you should have at least two of the members on the "master" node that maintains quorum.
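
After such a dismount you can check where the master bitmap ended up; SHOW DEVICE/BITMAP lists the bitmaps for the virtual unit, including their type and master node:

$ show device/bitmap dsa0:
$ show device/bitmap/full dsa0:    ! more detail about each bitmap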

Your last paragraph "If the node crashes, we are just out of luck. A full copy will always happen." does not have to be true if you are at 8.3 or above.

Try this:

$ set shadow dsa0:/policy=hbmm=( -
_$ (master_list=(oldmoe), count=1, multiuse=1), -
_$ (master_list=(radia64), count=1, multiuse=1))
$ show device/bitmap/full dsa0:

Then force a crash on RADIA64. This will cause DSA0: to stall for SHADOW_MBT_TMO seconds, after which the $3$DKA100: member being MSCP served by RADIA64 will be ejected from the DSA0: shadowset. AMCVP (Automatic MiniCopy on Volume Processing) will take effect, and the minimerge bitmap on OLDMOE will be converted to a multiuse bitmap (it acts like a minicopy bitmap, but it knows about all 127-block segments on DSA0: that have been written to since the last time it was zeroed, which means some time before RADIA64 crashed). It will probably have to copy more blocks than if the member had been dismounted, but it will still be preferable to a full copy.
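
If you want to watch this happen, SHOW SHADOW (in addition to the SHOW DEVICE/BITMAP command above) shows the member states and any copy or merge in progress:

$ show shadow dsa0:     ! member states and copy/merge progress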

This is all discussed in http://h71000.www7.hp.com/openvms/journal/v11/HBMM_AMCVP_OpenVMS_shadowing.pdf which I referenced at the end of my comment posted Aug 31, 2009 19:59:10 GMT. The last page of the article has a state diagram showing the events that trigger the conversion of the HBMM bitmap to a multiuse bitmap and back to HBMM once the VU has completed the minicopy.

I haven't played with the multiuse feature, and we have SAN storage (directly visible to our nodes), so it isn't as beneficial to us as it would be for you. I would still force the member to be dismounted when you are in a normal shutdown, as you then don't have to experience the stall for SHADOW_MBT_TMO. But you should be able to simultaneously use both multiuse bitmaps and minicopy bitmaps created by explicitly dismounting the member that will no longer be MSCP served when the node reboots.

Jon
it depends
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Jon:

Thanks for the reply. I have indeed discovered what you said.

As for :

"If a valid master bitmap exists in the cluster, then when the member is added back into the shadowset, it will use a minicopy; you don't even need to ask mount to use the bitmap. So your wish "I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node." is already true."

What I meant was that I had hoped the node that was rebooting would be able to "see" the bitmap on the other node. This would allow it to mount its own disk(s), and the rebuild would happen on that node. In my current scenario, the node must send a command to another node to do the mount.

It would just make life a little more simple.

Thanks again for taking the time to respond.

Bill Hall
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

The real solution to this problem is adding inexpensive shared storage, assuming the two servers are within a few meters of each other: the addition of about $6500 of hardware (rough list price). At the age of your rx2620s, used or HP refurbished parts should cut that cost in half. You'll then have a cluster that is more than a lab exercise and can actually provide a level of higher availability, with probably one "single point" of failure, that being the four-port I/O module of the MSA30.

One 359645-B21 MSA30 MI multi-initiator dual-bus shelf and two A7131A PCI-X dual-channel U320 SCSI HBAs for the shared SCSI buses. You would need a minimum of three universal drives to move the quorum disk and the shadowed USERS volume to the MSA30. With two more disks, shadowed, you could move to a shared system/boot disk and get closer to a real homogeneous cluster.

Bill

Bill Hall
Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

I will try to make this as explicit as possible, since I was not clear enough:

The mount can occur on the newly booted node and it will find the bitmap that was created on OLDMOE. You don't need to get the SMISERVER involved to force the mount to happen on OLDMOE where the master bitmap was created. The mounting part really works like you want it to.

This should work and result in a minicopy:

~~~~~~~~~~~~~~~ On node RADIA64 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ define/user sysmanini exe_on_oldmoe.sysmanini
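$! (exe_on_oldmoe.sysmanini is assumed to contain the single line
$!  SET ENVIRONMENT/NODE=OLDMOE, so the DO command on the next line runs on OLDMOE)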
$ mcr sysman do dismount $3$dka100:/policy=minicopy
$ @SYS$SYSTEM:SHUTDOWN

(with automatic reboot)


~~~~~~~~~~~~~~~ After reboot on node RADIA64 ~~~~~~~~~~~~~~~~~~~~~~~~~

$ mount/system dsa0:/shad=($3$dka100) users

Once this mount happens (and the mount can happen anywhere in the cluster, including on the just-booted node), a minicopy will start somewhere in the cluster (controlled by the SHADOW* system parameters, specifically SHADOW_MAX_COPY and possibly SHADOW_SITE_ID). $3$DKA100: will become a member of every node's DSA0: VU; the use of /cluster is not relevant when adding a member to a DSA VU that is currently mounted somewhere in the cluster. If you have more than two nodes (for example, a third satellite node that was being served the shadowset members), then the use of /cluster would be significant: if the mount was instantiating the DSA0: VU, the satellite would need to be notified to mount the new DSA0: virtual unit.
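
To see how those parameters are set on a given node, SYSGEN will display them, for example:

$ mcr sysgen
SYSGEN> show shadow_max_copy
SYSGEN> show shadow_site_id
SYSGEN> exit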

The minicopy write bitmap that gets created by dismount $3$dka100:/policy=minicopy is the only part where you have to force it to happen on a node other than the one that is shutting down.

Try it.

Jon

P.S. I have to agree with Bill Hall about getting some shared storage. It makes the systems much more robust and allows for rebooting either system without lengthy cluster stalls. See http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1018008 for some ideas.
it depends