
Re: Volume Shadowing Copy After Reboot

 
SOLVED
Bill Hall
Honored Contributor


Allan,

Change the startup procedure on each of your systems that mount DSA0 to:

$ mount/system dsa0:/shad=($2$dka100,$3$dka100)/policy=minicopy=optional users

Change the syshutdwn.com procedure to dismount the shadowset on the local system; do not dismount the local member of the shadowset:

$ dism DSA0:/policy=minicopy=optional

That should help. But as someone else mentioned, you really should invest in shared storage. Even a low-end SCSI or SAS storage shelf will help tremendously. There is at least one StorageWorks MSA shelf that should work for you.

Bill
Bill Hall
Zeni B. Schleter
Regular Advisor


I use Dismount/Policy=Minicopy=Optional. Look at the HELP on that subject and see if that is what you need.
Allan Large
Frequent Advisor


Bill:

Shutting down one node without dismounting the local disk causes the shadow set to go into a mount verification which renders it useless for quite a long time.

Allan Large
Frequent Advisor


Zeni:

If you will look at the attachment in my earlier post, you will see that I am doing:

$ dism/policy=minicopy=optional

Jon Pinkley
Honored Contributor


Allan,

Let's reduce the test to the simpler case of an orderly shutdown of the essentially non-voting cluster member. I will assume that RADI64 is the node that is going to be rebooted, and that its votes are not needed to maintain quorum. (Side note: in a two-node cluster, if your quorum disk is not on shared storage, there is no point in having one. Just set your required node's votes to 1, the other node's votes to 0, and set expected votes to 1 on both nodes.)
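As a rough sketch of the side note above, the vote settings could be expressed in each node's MODPARAMS.DAT. This assumes OLDMOE is the required node and RADI64 is the node that comes and goes. VOTES and EXPECTED_VOTES are the standard system parameter names; removing an existing quorum disk definition is not shown here.

```dcl
! SYS$SYSTEM:MODPARAMS.DAT on OLDMOE (assumed to be the required node)
VOTES = 1
EXPECTED_VOTES = 1

! SYS$SYSTEM:MODPARAMS.DAT on RADI64 (assumed to be the non-voting node)
VOTES = 0
EXPECTED_VOTES = 1

$ ! Then, on each node, apply the new values; they take effect at the next reboot.
$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
```

With these values OLDMOE alone holds quorum, so RADI64 can leave and rejoin without the cluster hanging, while a shutdown of OLDMOE still stops the cluster.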

See chapter 7 of the 7.3-2 Shadowing manual, section "Minicopy Restrictions". In the PDF version of the manual, this starts on page 117. On page 118, a bit more than halfway down the page, see the bullet:

"If a node with one or more master bitmaps shuts down or crashes, the bitmaps on the node are deleted. Therefore, the shadow sets whose master bitmaps were deleted will not be able to use a minicopy operation. Instead, a full copy will be performed."

Clue #1: Make sure the master bitmap is on the primary node, not the node that will reboot.

See chapter 7, section "Master and Local Write Bitmaps". In the pdf version of the manual, this is on page 120.

"In an OpenVMS Cluster system, a master write bitmap is created on the node that issues the DISMOUNT or MOUNT command that creates the write bitmap. When a master write bitmap is created, a local write bitmap is automatically created on all other nodes in the cluster on which the shadow set is mounted, provided the nodes have sufficient memory."

Clue #2: The dismount must be issued on a node that will not reboot. In your case, that is OLDMOE.

Bitmaps created at mount time are used to add back, at a later time, members that were present when the shadowset VU (DSAxxx) was dismounted. You would use them after a cluster shutdown and the subsequent reboot of OLDMOE. That is not the case you are interested in, where RADI64 is the node that reboots.

When RADI64 is going to reboot, you want to remove the member that is being MSCP served by RADI64, and you want to do this dismount from the OLDMOE node, since you want the master bitmap to be on OLDMOE.

The thread "Mounting of HBVS disks in sylogicals.com fails on a node" discusses a technique I have used, starting with the comment dated Oct 29, 2007 15:55:00 GMT

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1172934

Also, since you are running V8.3, you should investigate multiuse bitmaps, which allow the same write bitmaps to be used for both minicopy and minimerge. The documentation is sparse, and what is available is in the V8.3 new features manual. The best documentation I am aware of is the following: http://h71000.www7.hp.com/openvms/journal/v11/hbmm_amcvp_openvms_shadowing.html or in pdf http://h71000.www7.hp.com/openvms/journal/v11/HBMM_AMCVP_OpenVMS_shadowing.pdf

Jon
it depends
Steve Reece_3
Trusted Contributor


I've just flicked through the "Guidelines for OpenVMS Cluster Configurations" manual on the HP website and, specifically, the interconnects section on http://www.openvms.compaq.com/doc/82final/6318/6318pro_002.html#bottom_002

As I suspected, the SCSI interconnect hasn't been carried across to Integrity, and shared SCSI buses aren't supported. The cheapest supported shared-storage solution will be an MSA2000, I guess. Pretty rubbish and pretty expensive for a low-end cluster, but then the clustering licenses on Integrity aren't exactly cheap either, so YMMV.
The Brit
Honored Contributor


Referring back to Bill Hall's last reply,

On the node which is shutting down, you should dismount the VOLUME, i.e. DSA0, not just the local disk. Dismounting the Volume, on the system which is shutting down, will automatically dismount the local and remote member disks **on THAT SYSTEM**, i.e. the "shutting down" system.

When the system reboots, it will remount the volume (using both units), and it should not require a copy since the volume is already in a consistent state (having been maintained that way by the system which stayed up).

Dave
Jon Pinkley
Honored Contributor


Dave,

Please read Allan's attachment showing the configuration of the shadowset (posted Aug 31, 2009 13:10:45 GMT).

The problem is that dismounting the DSA0 VU on the system that is shutting down (RADI64) does not stop RADI64 from MSCP serving the member device $3$DKA100: (RADI64) to the cluster, so $3$DKA100: remains in the shadowset and continues to be modified until RADI64 completes the shutdown. At that point, $3$DKA100: goes into mount verify, the shadowing software stalls all I/O to DSA0, and after SHADOW_MBR_TMO expires, $3$DKA100: is ejected from the shadowset. If multiuse bitmaps are in effect, $3$DKA100: will not have to go through a full copy when it is reintroduced into the shadowset, but the cost is that all I/O to the VU is stalled for SHADOW_MBR_TMO seconds whenever RADI64 shuts down.
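To get a feel for how long that stall would last, the current timeout can be read with SYSGEN on the node that stays up. This is only a minimal sketch; the value shown on any given system depends on its own configuration.

```dcl
$ ! Run on OLDMOE (the node that stays up) from a suitably privileged account.
$ mcr sysgen
SYSGEN> show shadow_mbr_tmo
SYSGEN> exit
```

The value is in seconds, and it is the length of the I/O stall on DSA0 described above when the serving node disappears without its member having been removed first.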

Allan states that effect in his post from Aug 31, 2009 18:19:42 GMT. "Shutting down one node without dismounting the local disk causes the shadow set to go into a mount verification which renders it useless for quite a long time."

Jon
it depends
Jon Pinkley
Honored Contributor
Solution


Allan,

From your attachment:

--------------------------------------------------------------------------
~~~~~~~~~~~~~~~ On node RADI64 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ dism/cluster $3$dka100/policy=minicopy=optional
$ @SYS$SYSTEM:SHUTDOWN

(with automatic reboot)


~~~~~~~~~~~~~~~ After reboot on node RAD164 ~~~~~~~~~~~~~~~~~~~~~~~~~

$ mount/cluster dsa0:/shad=($2$dka100,$3$dka100)/policy=minicopy=optional users
%MOUNT-I-MOUNTED, USERS mounted on _DSA0:
%MOUNT-I-ISAMBR, _$2$DKA100: (OLDMOE) is a member of the shadow set
%MOUNT-I-SHDWNOMCPY, _$3$DKA100: (RADI64) added to the shadow set with a copy operation (unable to use minicopy)
--------------------------------------------------------------------------

If you want to see why minicopy is not being used, do the following interactively on RADI64 from a privileged account.

$ set prompt="RADI64$ "
RADI64$ mcr sysman set environment/cluster
SYSMAN> do show device/bitmap DSA0: ! this will show the minimerge bitmaps if they exist
SYSMAN> do show device DSA0: ! this should show 2 members
SYSMAN> exit
RADI64$ dismount $3$dka100:/policy=minicopy ! note: /cluster does nothing in this case, you should remove it.
RADI64$ mcr sysman set environment/cluster
SYSMAN> do show device/bitmap DSA0: ! others + minicopy bitmap with mastership on RADI64
SYSMAN> do show device DSA0: ! this should show only the member from OLDMOE
SYSMAN> exit
RADI64$ reply/enable=disk ! so you can see the shadowcopy
RADI64$ mount/system DSA0: /shadow=$3$dka100: users ! we don't need to specify /cluster or /policy here

This will result in the $3$dka100: member being added back into the shadowset with a minicopy.

So why does it work here, but not after RADI64 reboots?

It works here because the node mastering the minicopy bitmap is still up. However, the master bitmap is on RADI64, and as the shadowing documentation states, when a node crashes or reboots, any write bitmaps that it is mastering are deleted. Once the bitmap is created, there is currently no way (that I am aware of) to move the bitmap master role to another node.

However, you can control where the master bitmap is created.

Did you try my suggestion? For the people that don't want to follow links, here it is in a nutshell:

On the node that is shutting down, use SYSMAN to execute the dismount on the node that stays up, removing the member that is served only by the node being shut down.

For your case, something like this:

Contents of exe_on_oldmoe.sysmanini
------------------------
set environment/node=oldmoe
set profile/privilege=log_io ! needed to create bitmaps
------------------------

Now on node RADI64

$ define/user sysmanini exe_on_oldmoe.sysmanini
$ mcr sysman do dismount $3$dka100:/policy=minicopy
$ @SYS$SYSTEM:SHUTDOWN

The advantage of this technique is that your DSA0 VU will not stall when the MSCP serving node stops.
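A minimal sketch of how the result might be checked, reusing only commands already shown in this thread. It assumes the SYSMAN dismount above has completed and SHUTDOWN has not yet been started. The expectations in the comments follow from the explanation above and are not captured output.

```dcl
$ ! On RADI64, after the SYSMAN dismount and before @SYS$SYSTEM:SHUTDOWN
$ mcr sysman
SYSMAN> set environment/cluster
SYSMAN> do show device/bitmap DSA0: ! the minicopy master bitmap should now be on OLDMOE
SYSMAN> do show device DSA0: ! this should show only the member from OLDMOE
SYSMAN> exit
$ !
$ ! After RADI64 has rebooted, the startup mount from Bill's reply adds the member back
$ reply/enable=disk ! so you can see the shadowcopy
$ mount/system dsa0:/shad=($2$dka100,$3$dka100)/policy=minicopy=optional users
```

If the bitmap is still listed with mastership on RADI64, the dismount was executed locally rather than on OLDMOE, and the reboot will again end in a full copy.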

The combination of this technique plus multiuse bitmaps (to handle the case of crashes) is about as close to what you are looking for as is (currently) possible with your configuration.

When all member devices can be accessed by all nodes without going through the node that is being shut down, these problems don't exist, since no members need to be ejected from the shadowset.

If you are not going to have a quorum disk on a shared bus, you may as well put all your member devices on OLDMOE and MSCP serve them to RADI64. Then RADI64 can come and go as it pleases without affecting the state of the shadowset. When OLDMOE shuts down, it is going to stall the cluster anyway.

Jon
it depends
Allan Large
Frequent Advisor


Jon:

Your answer was dead on. The situation I described here was a test case. I had sole access to these systems where I could reboot at will and I wouldn't mess anything up.

I have already come to the same conclusion as you. I will post more on my final outcome later.

Thanks !!!