
general question on volume shadowing

SOLVED
Kirk Reindl
Frequent Advisor

general question on volume shadowing

HW: ES40
OS: OpenVMS 7.3-1

Hi,
Short question is:

How does my OpenVMS system know how to correctly configure and mount the volume shadowed system disk on boot up?

I'm clear on the command used to create the volume set:

MOUNT DSA23: /SHADOW=$4$DUA9: volume-label logical-name

but.....
I'd just like to know the process the OS uses at boot up.

Here is my current config:

OSIJX1> show dev dsa100

Device                 Device           Error    Volume        Free   Trans Mnt
 Name                  Status           Count     Label       Blocks  Count Cnt
DSA100:                Mounted              0  ALPHASYS     27501915    751   2
$1$DGA351:   (OSIJX1)  ShadowSetMember      0  (member of DSA100:)
OSIJX1>

I'm also aware of the following two settings in my common_modparms.dat file:

shadow_sys_unit = 100 ! system volshad disk=dsa100
shadow_sys_disk = 1 ! system disk volshad enabled

Both these settings make sense to me.
Where/how does the OS do the mount/shadow creation??

Thanks for the responses.

Kirk



16 REPLIES
Neil Ashworth_1
Occasional Advisor
Solution

Re: general question on volume shadowing

Hi Kirk,

After a shadow set is formed, the current membership of the set is stored in the volume header. So in the case of the system disk, VMS does the following at boot time:

1. Boot VMS from the disk specified in BOOTDEF_DEV.

2. Load the SYSGEN parameters and check the SHADOW_SYS_* parameters.

3. If SHADOW_SYS_DISK is set to "1", VMS reads the volume header to determine all the shadow members mounted at the time VMS was shut down (that is important, as I will describe later).

4. VMS then mounts the system disk as a shadow set, giving it the SYSGEN unit number.

5. Boot resumes.

To me, it is very important that you NEVER put a MOUNT statement for the system disk anywhere in your boot procedures. VMS will always mount the system disk with all the members that were present at the time of shutdown. So if you purposely dismount a system disk member prior to shutdown, that member will still be dismounted when VMS reboots. A system disk MOUNT statement in your boot procedures would negate that attempt to keep a member offline.
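For example, a member split off before shutdown stays out of the set across the reboot (device name hypothetical, just to illustrate):

$ DISMOUNT $1$DGA352:   ! drop one member out of the system disk shadow set
$ @SYS$SYSTEM:SHUTDOWN  ! on reboot, VMS re-forms DSA100: without $1$DGA352:

As long as no MOUNT of the system disk appears in the startup procedures, $1$DGA352: remains a detached copy until you explicitly MOUNT it back into the set.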

Hope that helps.

Neil
Martin P.J. Zinser
Honored Contributor

Re: general question on volume shadowing

Hello,

The behaviour Neil described comes in handy if, for example, you split a shadow set to do an upgrade (i.e. keep one half with the old version so you can roll back in case of problems). If the system just mounted all shadow set members regardless, your backup copy would be overwritten.

Greetings, Martin
John Eerenberg
Valued Contributor

Re: general question on volume shadowing

Another thing that happens at boot time is that VMS determines where to write the crash dump file in the event of a bugcheck.

By default, the crash dump file SYSDUMP.DMP resides on the system disk (SYS$SYSTEM). This file is written by the primitive driver, not the shadowing driver, and the primitive driver writes to only one member of the shadow set. That causes a problem when reading the dump back from the shadowed system disk.

You have two choices in this situation. 1) Permanently move the dump file off the system disk to a non-shadowed disk. 2) Capture the console output of the crash via some logging mechanism (i.e., paper, PC, ConsoleWorks, PCM, etc.) and, when a dump file is written, note which physical disk the dump was written to (it will be one and only one member of DSA100:, for example). Use the SET DEVICE command to lower the read cost of that physical disk to 1, use SDA to copy the crash dump file, and finally use SET DEVICE to change the read cost back to the default (or your specified value).

Without knowing that the primitive driver writes the dump, you'll see unpredictable read behavior in SDA.
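A rough sketch of option 2 (member name hypothetical; SET DEVICE/READ_COST and the SDA COPY command are described in the shadowing and SDA documentation):

$ SET DEVICE $1$DGA351: /READ_COST=1        ! steer shadow set reads to the member holding the dump
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP ! invoke SDA on the system dump
SDA> COPY SYS$SYSTEM:SAVEDUMP.DMP           ! save the dump contents to a file
SDA> EXIT
$ SET DEVICE DSA100: /READ_COST=1           ! reset the set's members to their default read costs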

john
It is better to STQ then LDQ
Uwe Zessin
Honored Contributor

Re: general question on volume shadowing

Shadow membership is recorded in the Storage Control Block (SCB) of the volume. There is a 64-bit field (SCB$Q_GENERNUM, or something like that; I cannot check right now) that usually holds the date and time of the last change to the shadow set.

The algorithm is a bit more complex, because it has to deal with the situation where somebody sets back the system time, but there are provisions for this.

During system boot, all disks are scanned and their volume labels and generation numbers are compared. If they match those of the boot member, the disk is automatically added to the shadow set.
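If you want to look at this yourself, the SHOW SHADOW command (available from V7.3 on) displays the generation number along with each member's status:

$ SHOW SHADOW DSA100:   ! lists the virtual unit, its members, and the generation number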
.
Keith Parris
Trusted Contributor

Re: general question on volume shadowing

One more thing: Set the console environment variable BOOTDEF_DEV to be a list containing each of your system disk shadowset members, so that if one fails, your system can still reboot.
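For example, at the SRM console of an Alpha (device names hypothetical):

>>> SET BOOTDEF_DEV DGA351.1001.0.1.0,DGA352.1002.0.1.0

If the first device in the list is unavailable at boot time, the console simply tries the next one.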
Uwe Zessin
Honored Contributor

Re: general question on volume shadowing

Keith,
does that really work these days? I recall that in earlier versions one had to boot all systems from the same physical member, but I haven't checked this on recent versions.
.
Keith Parris
Trusted Contributor

Re: general question on volume shadowing

All that's required is that the boot member be a current member of the shadowset (and not something like a full-copy target). According to the HP Volume Shadowing for OpenVMS manual at http://h71000.www7.hp.com/doc/732FINAL/DOCUMENTATION/PDF/aa-pvxmj-te.PDF:

"When multiple nodes boot from a common system disk shadow set, ensure that all nodes specify a physical disk that is a source member of the system disk shadow set."

"You cannot reboot the system unless the boot device is a current member of the shadow set."

"The shadowing software detects boot attempts from a physical disk that is inconsistent with currently active shadow set members. In this case, the boot attempt detects the existence of the other shadow set members and determines (using the information in the SCB) that the boot device is not a valid member of the shadow set. When this occurs, the boot attempt fails with a SHADBOOTFAIL bugcheck message on the system console, and a dump file is written to the boot device. The system bugchecks because it can boot only from a currently valid member of the system disk shadow set. If the boot device fails out of or is otherwise removed from the system disk shadow set, you must either mount the boot device back into the shadow set (and wait for the copy operation to complete) or modify the boot command file to boot from a current shadow set member."

With regard to a search list of devices/paths in BOOTDEF_DEV, it says "On some systems, you can stipulate that multiple devices be members of the same system disk shadow set. Please refer to the system-specific manual for further details."
Uwe Zessin
Honored Contributor

Re: general question on volume shadowing

Thank you, that's good to hear. I might get a chance to create such a cluster in the future, and then this will come in handy.
.
Jan van den Ende
Honored Contributor

Re: general question on volume shadowing

It is also possible to avoid this confusion entirely:

Define EACH cluster member as a boot server for ALL the others.
Then specify the default boot device for ALL members to be a (search list of) network device(s). Whatever the status of the shadow set members during a boot, you are then NOT booting from a physical device, but from the shadow set!
And as soon as the newly booted node finds out there is a more direct path to the disk than the MSCP-served primary path, the current path fails over to the more direct path.
If multi-site, especially with a multi-site SAN as well, be sure to study and implement the SITE parameter issues in the shadowing manual!
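A minimal console sketch of that idea (adapter names hypothetical; EWA0/EWB0 are typical LAN adapters):

>>> SET BOOTDEF_DEV EWA0,EWB0   ! boot over the LAN via whichever boot server answers

The node is downline-loaded by a boot server and reaches the system disk through the MSCP-served path, so it always sees the whole shadow set rather than one physical member.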


fwiw,


Jan
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: general question on volume shadowing

Jan,
excuse me for being a sceptical one ;-)

Have you had practical experience with this? If I recall correctly, the capability to fail over between MSCP-served Fibre paths and local paths isn't that old (V7.3?).
.
Jan van den Ende
Honored Contributor

Re: general question on volume shadowing

Uwe:

Yes, we started doing this when we moved from HSZ40s to HSG80s.
Our (I guess famed by now) cluster has worked this way since last June.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: general question on volume shadowing

Oh, and yes Uwe, it was introduced in 7.2-2 (which we skipped; we are at 7.3-1 now and evaluating 7.3-2)

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: general question on volume shadowing

Ah, thank you. Quite some years ago I was in a project with VMS V6.2-1H3 where the customer had two HS211(?)-based FDDI storage servers, and the missing direct paths to all devices made the handling 'quite interesting'.

It was the second cluster implementation for the software vendor and the customer had even less experience. The hardware was already ordered, but it was still an interesting project.
.
Keith Parris
Trusted Contributor

Re: general question on volume shadowing

The ability to fail over between direct Fibre Channel (and SCSI) paths and MSCP-served paths was introduced in 7.3-1, and is NOT available in 7.2-2. (A number of features from 7.3 were included in 7.2-2, but 7.3-1 came out later.)

References:

7.2-2 New Features Manual: http://h71000.www7.hp.com/doc/722final/6650/6650pro.html

7.3-1 New Features Manual: http://h71000.www7.hp.com/doc/731FINAL/6657/6657PRO.HTML

"Multipath failover to an MSCP-served path for disks is implemented in SCSI and Fibre Channel configurations."
Jan van den Ende
Honored Contributor

Re: general question on volume shadowing

Keith, I stand corrected.

I know I read SOMETHING about this kind of thing in the new shadowing manual that came with 7.2-2, but since we never used it, I never went into the details.
7.2-2 only introduced Fibre Channel storage, and I got the two confused.

Sorry for the misinformation.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Keith Parris
Trusted Contributor

Re: general question on volume shadowing

Jan, you were probably thinking of all the new DCL commands added to support shadowing in Fibre Channel-based disaster-tolerant (DT) clusters, so folks could do their best to limp along until failover to/from MSCP-served paths was delivered in 7.3-1. There was a big white paper written on the topic at the time -- you can still find it via http://h71000.www7.hp.com/openvms/fibre/index.html

From the 7.2-2 release notes:
"Disaster-tolerant support for host-based volume shadowing in an OpenVMS Cluster configuration using shared Fibre Channel storage
- This support gives system managers greater control for managing failover.
- This feature is included in OpenVMS Version 7.3 and is also provided in Volume Shadowing update kits for OpenVMS Alpha Version 7.2-1H1 and for OpenVMS Alpha Version 7.2-1.
- The support provided in this release is identical to that provided in OpenVMS Version 7.3."

In retrospect, failover with these commands turned out to be so complicated, messy, and error-prone that I ended up recommending that folks simply avoid connecting their inter-site Fibre Channel link (thus forcing the use of only an MSCP-served path to remote disks, and obviating any failover issues) until they could get to 7.3-1.