Cor Mom
Advisor

How to remove shadowset member to be used as backup

Hi,

I have an OpenVMS cluster of three nodes. Two of the nodes each have an extra disk drive installed, and these two drives together form a shadow set. The application, which needs to be modified, runs on one of those two nodes. I would like to split the shadow set (remove one member) and keep the removed member as a backup, while the other member is updated with the new software. If everything goes OK, the removed member will be mounted back into the shadow set. If the upgrade fails, on the other hand, I would like to use the removed member as the new active and valid member; the member with the updated software may then be overwritten.

What are the exact steps to be taken?

How do I make sure that the removed member stays valid and can be used again if the upgrade fails?

Do I need to shut down a node, and if so, which shutdown options do I use?

Thanks in advance.

Cor Mom
OpenVMS - Nothing stops it
18 REPLIES
Mobeen_1
Esteemed Contributor

Re: How to remove shadowset member to be used as backup

Cor,
The following command can be used to dismount the member:

dismount/nounload

followed by the device name of the member/extra disk in the shadowset that you would like to use later if your update fails.

If the update has been successful, use the following to add the disk back into the shadowset:

mount/sys/noass/policy=minicopy=optional /shadow=(Physical Device) LABEL

If the update fails, you can then make use of that disk that you dismounted for recovery
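
As a minimal sketch of the sequence above, assuming a virtual unit DSA1:, a member DKB100: and a volume label DATA (all hypothetical names, not from this thread). For the minicopy policy on the later MOUNT to have any effect, the member generally has to be dismounted with a minicopy policy as well, so that a write bitmap is kept; with =OPTIONAL the mount should fall back to a full copy if no bitmap exists. Check HELP DISMOUNT /POLICY on your version before relying on this.

```dcl
$! Hypothetical names: virtual unit DSA1:, member DKB100:, volume label DATA.
$! Remove the member and ask for a write bitmap to be kept if possible.
$ DISMOUNT/NOUNLOAD/POLICY=MINICOPY=OPTIONAL DKB100:
$!
$! ... perform the update on the reduced shadow set ...
$!
$! Update successful: return the member; only changed blocks are copied
$! when a bitmap is available, otherwise a full copy is done.
$ MOUNT/SYSTEM/NOASSIST DSA1: /SHADOW=(DKB100:) DATA /POLICY=MINICOPY=OPTIONAL
```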

Hope this helps.

Let me know if you have any specific questions

rgds
Mobeen
Kris Clippeleyr
Honored Contributor

Re: How to remove shadowset member to be used as backup

Hi Cor,

Let me assume that the shadowset is NOT the system disk.
If so, simply dismount one of the members, e.g. shadowset DSA1 (DKA100,DKB100), DISMOUNT DKB100:
Install the necessary software on DSA1 (DKA100 only).
If ev'rything's OK, simply mount the DKB100 back in.
If something's not OK, make sure that no files are left open on DSA1. Then DISMOUNT DSA1 (on all cluster members). After that, MOUNT/SYS/CLUSTER DSA1/SHADOW=(DKB100:), and you're back to the original situation.
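
Put together, the steps above might look roughly like this. DSA1, DKA100 and DKB100 are the example names from this post; the volume label DATA is an assumption. Only one of the two closing branches would be run.

```dcl
$! Sketch only. DATA is a hypothetical volume label.
$ SHOW DEVICE/FILES DSA1:          ! confirm nothing is still open
$ DISMOUNT DKB100:                 ! DKB100 becomes the fallback copy
$!
$! ... install the software on DSA1 (now DKA100 only) ...
$!
$! Branch 1, upgrade OK: bring DKB100 back in.
$! It is overwritten by a shadow copy from DKA100.
$ MOUNT/SYSTEM DSA1: /SHADOW=(DKB100:) DATA
$!
$! Branch 2, upgrade NOT OK: dismount the whole set cluster-wide,
$! then rebuild it from DKB100 alone.
$ DISMOUNT/CLUSTER DSA1:
$ MOUNT/SYSTEM/CLUSTER DSA1: /SHADOW=(DKB100:) DATA
```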

For a system disk, it's another ballgame.

Regards,
Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
Ian Miller.
Honored Contributor

Re: How to remove shadowset member to be used as backup

In addition, ensure no files are open on the shadow set (SHOW DEV/FILES DSAx) before dismounting the disk. This will ensure a consistent backup.
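
For illustration, with a hypothetical virtual unit DSA1: the check could be run in two forms. The /NOSYSTEM variant is meant to hide files that the system itself holds open (such as INDEXF.SYS), which makes it easier to spot files still held by application processes.

```dcl
$! DSA1: is a hypothetical virtual unit name.
$ SHOW DEVICE/FILES DSA1:             ! all open files, including system-held ones
$ SHOW DEVICE/FILES/NOSYSTEM DSA1:    ! only files opened by processes
```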
____________________
Purely Personal Opinion
Jan van den Ende
Honored Contributor

Re: How to remove shadowset member to be used as backup

Hi Cor,

follow the scenario given by Kris.

One final step in the back-out sequence:

After dismounting the shadow set and recreating it with ONLY the saved (backup) member: if, and only if, you are prepared to completely give up the work on the failed upgrade, ADD the failed-upgrade disk as an extra member to the newly created set.
(Even if you specify minicopy, this will start a FULL copy, because the now-new member is no longer consistent with the already-existing set.)
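
A sketch of that last step, with hypothetical names: DSA1: currently holds only the saved member, DKA100: is the disk carrying the failed upgrade, and DATA is the volume label. As I understand it, /CONFIRM makes MOUNT ask before it starts a copy onto the named disk, which is a useful last check that the right disk is about to be overwritten.

```dcl
$! Hypothetical names; run only once the failed upgrade is truly abandoned.
$! DKA100: is completely overwritten by a full copy from the saved member.
$ MOUNT/SYSTEM/CONFIRM DSA1: /SHADOW=(DKA100:) DATA
```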

hth.

Cheers.

Have one on me (and then another one because today is my birthday!)

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: How to remove shadowset member to be used as backup

Cor,

it may also be a good idea to temporarily change the MOUNT command in SYSTARTUP_VMS.COM to NOT include the 'backup' member - so in case something happens with one of the systems unexpectedly (crash), the startup won't clobber your 'backup' member.
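
As an illustration only, the temporary change might look like the excerpt below. The device names and the volume label DATA are hypothetical, and the real MOUNT line in your SYSTARTUP_VMS.COM will differ.

```dcl
$! SYS$MANAGER:SYSTARTUP_VMS.COM (excerpt, hypothetical names)
$!
$! Normal line, commented out for the duration of the upgrade:
$! MOUNT/SYSTEM DSA1: /SHADOW=(DKA100:,DKB100:) DATA
$!
$! Temporary line: names only the member being upgraded, so a reboot
$! cannot pull the 'backup' member DKB100: back in and overwrite it.
$ MOUNT/SYSTEM DSA1: /SHADOW=(DKA100:) DATA
```

Remember to restore the original line once the upgrade has been accepted or backed out.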

Volker.
Cor Mom
Advisor

Re: How to remove shadowset member to be used as backup

Hi everybody,

Thanks for your quick responses. I think I can use the supplied info. There is only one problem: in the past the system was set up with the queue manager files on the shadow set disk, and the application runs in a batch process. So besides stopping the application, I will also have to stop the queue manager before dismounting one shadowset member. After that I am going to use a separate queue manager on each node, because I don't need a clusterwide queue file.

Thanks.

Cor Mom

PS: Jan, congratulations!! 10 days after mine!! I did indeed have a drink to it last night.
OpenVMS - Nothing stops it
Ian Miller.
Honored Contributor

Re: How to remove shadowset member to be used as backup

Cor,
to thank the people who have helped you see
http://forums1.itrc.hp.com/service/forums/helptips.do?#33
____________________
Purely Personal Opinion
Anton van Ruitenbeek
Trusted Contributor
Solution

Re: How to remove shadowset member to be used as backup

Cor,

I think it would be very important to check your environment. Would it not be nice to install both of the versions concurrently? In this case you need to modify the environment.
Otherwise, if you update in place, the whole application will be down during the upgrade.
If you can create, e.g., a top-level directory [_V] for one version and another [_V] for the other version, and point, e.g., a _ROOT logical at :[_V.] with /translation=concealed in the system/cluster logical name table, then during the upgrade you can point a JOB, GROUP or PROCESS logical (or even a SYSTEM logical, if one node has been quiesced for users) at the other :[_V.] instead.
Now you can install the environment to _ROOT: and if anything goes wrong you just remove the logical and the directory tree. If everything goes OK, you stop the production version, change the logical ...
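
A minimal sketch of this idea. Every name here is hypothetical (logical APP_ROOT, device DSA1:, directories [APP_V1] and [APP_V2]); the names in the post above were lost. It relies on the default logical name search order, in which a process-level definition is found before the system-level one.

```dcl
$! Hypothetical names throughout.
$ CREATE/DIRECTORY DSA1:[APP_V2]
$!
$! Production: system-wide concealed root pointing at the current version.
$ DEFINE/SYSTEM/EXECUTIVE_MODE/TRANSLATION_ATTRIBUTES=CONCEALED -
    APP_ROOT DSA1:[APP_V1.]
$!
$! Upgrade: override the root for the installing process only.
$ DEFINE/PROCESS/TRANSLATION_ATTRIBUTES=CONCEALED APP_ROOT DSA1:[APP_V2.]
$ DIRECTORY APP_ROOT:[000000]      ! now lists the V2 tree for this process
$!
$! Cut over: stop production, then repoint the system-wide logical.
$ DEASSIGN/PROCESS APP_ROOT
$ DEFINE/SYSTEM/EXECUTIVE_MODE/TRANSLATION_ATTRIBUTES=CONCEALED -
    APP_ROOT DSA1:[APP_V2.]
```

Backing out is then a matter of deassigning the process logical and deleting the [APP_V2] tree, without touching the production version.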

NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
Jan van den Ende
Honored Contributor

Re: How to remove shadowset member to be used as backup

Cor,

as you might have expected, I fully agree with Anton on the multi-version setup.
It even allows you to run both versions for some period for evaluation and/or re-training.

One remark on your Queue manager issue:
Absolutely NO need to do ANYTHING for the one-member dismount!
Only _IF_ you have to change back to the old member, _THEN_ you might decide to copy the 'current' files from the upgraded disk onto the disk that will be coming back in (depending on other jobs that have already run, or been submitted, during your upgrade procedure).

But really, have another good look at the parallel-version scenario!

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Cor Mom
Advisor

Re: How to remove shadowset member to be used as backup

Anton,
That's a good point, because it reduces the risks. In fact we do have a structure like you described and I think I am gonna use it.

Ian,
I did not forget, did not have time yet.

Regards,
Cor
OpenVMS - Nothing stops it
John Gillings
Honored Contributor

Re: How to remove shadowset member to be used as backup

Cor,
(and everyone else)

HOLD ON! This is not a good plan. The fundamental purpose of shadowing is to protect you from bad blocks. As soon as you reduce a shadowset to less than TWO members you are actually DOUBLING your exposure to bad blocks.

Remember shadowing is NOT a tool for making backups easy, it's NOT a tool for making I/O faster. You paid the big bucks for shadowing licenses to eliminate bad blocks. So, to use it in a way that *increases* your exposure makes no sense.

A simple rule to follow... don't ever reduce a shadowset to less than TWO members.

In this case, take a spare disk, add it to the shadowset and let it copy, then remove the newest disk as your backup.
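
Sketched with hypothetical names (shadow set DSA1: with members DKA100: and DKB100:, spare disk DKC100:, volume label DATA), the third-member approach could look like this. The set never drops below two members.

```dcl
$! Hypothetical names throughout.
$ MOUNT/SYSTEM DSA1: /SHADOW=(DKC100:) DATA   ! add the spare; a full copy starts
$ SHOW DEVICE DSA1:                           ! repeat until DKC100 is no longer copying
$!
$! Once DKC100 is a full member and the application is quiet:
$ SHOW DEVICE/FILES DSA1:                     ! confirm no files are open
$ DISMOUNT DKC100:                            ! DKC100 is now the backup copy
```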

As long as you're on V7.3 or higher, and there are no open files on the disk, a member dismounted from a shadowset will be in a consistent state.

As of V7.3-2, if you really can't find a 3rd member, then you can minimise your exposure to bad blocks by forcing a full merge on the shadowset:

$ SET SHADOW/DEMAND_MERGE DSA1

Note that this doesn't *eliminate* the risk, but it makes sure that all existing bad blocks are detected and corrected.
A crucible of informative mistakes
Robert_Boyd
Respected Contributor

Re: How to remove shadowset member to be used as backup

John,

I agree with you that one of the purposes of using shadow sets is protection against bad blocks. However, the primary purpose they have served in my experience is redundancy. The probability of having more than a single drive failure in a shadowset is much, much lower than that for a single drive. This makes it very unlikely that the users would ever be subjected to a loss of access to the drive due to single drive failures. The primary risk for any period of time where the shadowset is reduced to 1 member is that any failure of that drive will result in loss of access and possible loss of data that has not been backed up.

I agree completely with the preferability of adding in a 3rd member to copy the disk rather than reducing the shadowset to 1 member if it is going to be a prolonged situation. As always, these risk management decisions involve tradeoffs: time, money, exposure to loss, efficiency, etc....

Robert
Master you were right about 1 thing -- the negotiations were SHORT!
Jan van den Ende
Honored Contributor

Re: How to remove shadowset member to be used as backup

John,

thanks for pointing out the thing I completely overlooked, and which others may therefore overlook as well, although from the other side.

From _MY_ perspective, I did not even think of reducing to a single member. We reduce 3-member sets to 2-member.
At the current prices of disk drives, any single issue is sooooo much more expensive, in repair time alone already, and if you factor in the production hours lost...

We are on record as requesting that the 3-member limit be expanded. We would much prefer to have 2 members at each site, plus the ability to ADD the backup member, instead of having to take it out!

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Andy Bustamante
Honored Contributor

Re: How to remove shadowset member to be used as backup


One more important point. You need to know that your application has a consistent database on the shadowsets. You need either a way to hold or force out outstanding I/O, or to shut the application down. A "flying dismount" may lead to problems with data consistency. The shadowing manual has many cautions about ensuring data consistency when dismounting a shadow set member, mostly saying "you should know your application and take appropriate steps."

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
John Gillings
Honored Contributor

Re: How to remove shadowset member to be used as backup

Robert,

>The primary risk for any period of time where the shadowset is reduced to 1 member is that any failure of that drive will result in loss of access and possible loss of data that has not been backed up.

Not so!

At any time there is a small, but finite probability that any individual block on a disk will go bad. We won't know it's bad until we read it. With very large numbers of blocks, even very small probabilities start to turn into certainties. In any case the probability of ANY block on the disk going bad within a specific time period is many orders of magnitude higher than that of the whole disk failing.

So we need to accept bad blocks WILL happen, guaranteed! However, provided the corresponding block on all other shadow members doesn't also go bad before we read the bad block, shadowing will be able to repair the damage from known, good data. (note that bad blocks behave like Schrodinger's cat in the classic physics thought experiment).

A shadow set with two members may have a number of bad blocks on one member, and a (hopefully different) set on the other member. As long as those sets don't intersect, then we don't care - they effectively don't exist. But, if we break the set, we've exposed all the bad blocks on both members.

Before minicopy, breaking the set was the equivalent of doubling the probability of exposure to bad blocks because a latent bad block on the surviving member would be realised when it was read, and the ones on the removed member would be propagated to the backup and realised when it was restored.
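
A rough way to see the "doubling" argument, under simplifying assumptions that are not part of the original post: each member has N blocks, and each block independently goes bad with a small probability p during the period of interest. The approximations hold when Np is much smaller than 1.

```latex
% Illustrative assumptions: N blocks per member, independent per-block
% failure probability p, with Np << 1.
\begin{align*}
P(\text{a latent bad block on one given member}) &= 1 - (1-p)^{N} \approx Np \\
P(\text{same block bad on both members of an intact set}) &= 1 - (1-p^{2})^{N} \approx Np^{2} \\
P(\text{a bad block on either member, exposed by splitting}) &= 1 - (1-p)^{2N} \approx 2Np
\end{align*}
```

In this simple model the intact two-member set loses data with probability of order Np^2, while splitting it exposes roughly 2Np, which is twice the figure for a single unshadowed disk.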

Mini copy helps a lot, because we might get the shadow set back together before reading one of the bad blocks, so our exposure is reduced to the set of blocks read while the shadow set is reduced.

Forcing a merge before breaking the set will make sure every block on every member is read, thereby finding any bad blocks and fixing them from a good member.

Even with mini copy, reducing a shadow set to a single member represents an easily avoided risk, especially considering the low cost of storage.
A crucible of informative mistakes
Lawrence Czlapinski
Trusted Contributor

Re: How to remove shadowset member to be used as backup

John, thanks for the bad block info. In practice, two member shadow sets are broken a lot to keep one as a fallback for various reasons (upgrades, standalone backup and restore to other member, etc.). Thanks for the tip about merging before breaking the shadow sets. I know we haven't been doing that. So many of us do care how many bad blocks are being generated per some unit of time and on which disks. If bad blocks are detected, they should be reported in the error log perhaps on a selectable time frame (daily,hourly). It would be useful to have a utility that could be started by a batch file off-peak for reading all the blocks of a physical disk and reporting the number of bad blocks detected. It could reduce the time till the bad blocks are detected significantly.
Lawrence
David B Sneddon
Honored Contributor

Re: How to remove shadowset member to be used as backup

With regard to forcing merges etc. Would ANALYZE/DISK/SHADOW
do a similar task given that it reads all blocks on all
members and compares them? Obviously it won't do any
writing, which a merge will do.

Dave
Ian Miller.
Honored Contributor

Re: How to remove shadowset member to be used as backup

Lawrence, bad blocks are reported to the error log when they are detected. The problem is that they are not detected until VMS tries to read or write the block. The commands mentioned could be run periodically to ensure all blocks on a disk are accessed and therefore bad blocks are detected sooner. The error count on the disk could be checked before and after; if it increases, then check the error log.
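
A small sketch of the before/after check, assuming a shadow set DSA1: with members DKA100: and DKB100: (hypothetical names) and a version that provides ANALYZE/DISK_STRUCTURE/SHADOW. It could be placed in a command procedure and submitted to a batch queue off-peak.

```dcl
$! Hypothetical names: shadow set DSA1: with members DKA100: and DKB100:.
$ before_a = F$GETDVI("DKA100:","ERRCNT")   ! error counts before the scan
$ before_b = F$GETDVI("DKB100:","ERRCNT")
$!
$ ANALYZE/DISK_STRUCTURE/SHADOW DSA1:       ! reads and compares all members
$!
$ after_a = F$GETDVI("DKA100:","ERRCNT")    ! error counts after the scan
$ after_b = F$GETDVI("DKB100:","ERRCNT")
$ IF (after_a .GT. before_a) .OR. (after_b .GT. before_b) -
    THEN WRITE SYS$OUTPUT "Error count increased; check the error log"
```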
____________________
Purely Personal Opinion