
Re: SHADOW-F-NOACCMBREX - After spliting a cluster

 
Edmundo T Rodriguez
Frequent Advisor

SHADOW-F-NOACCMBREX - After spliting a cluster

We had a cluster of two AlphaServer 4100 systems running OpenVMS V6.2-1H3 and sharing the same system disk (DSA0:), a two-member shadow set, plus a quorum disk.

Both systems have a direct-attached StorageWorks disk array, but they run different applications. The second system needed to be upgraded (both OS and application), while the OS on the first could not be upgraded.

In order to upgrade the second, I decided to make each node boot from a different system disk while both stay members of the cluster,
one with VMS V6.2-1H3 and the other with VMS V7.3-2.

So I shut down the first system, ran ANALYZE/DISK/REPAIR on the system disk (it went fine), dismounted the secondary member of DSA0, and performed a BACKUP/IMAGE to another disk
in the same StorageWorks array, attached in redundant mode to the system.

I then shut down the second system and booted the first system from the original shadow set as always.
On the second system I changed the console environment variable BOOTDEF_DEV to point
to the new disk (the primary of the new shadow set), booted conversationally, changed the parameter SHADOW_SYS_UNIT from 0 to 1, and entered CONTINUE to boot.

At this point I got a BUGCHECK and the system crashed.

--------------------------------------------
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%VMScluster-I-LOADSECDB, loading the cluster security database
%EWA0, Fast mode set by console
%CNXMAN, Sending VMScluster membership request to system ALPHA1
%CNXMAN, Now a VMScluster member -- system ALPHA2
%EWA0, Link state: UP
%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset
**** OpenVMS (TM) Alpha Operating System V6.2-1H3 - BUGCHECK ****
----------------------------------------------

I tried more than once, varying a couple of things, but nothing worked and I had to go back and boot the second system from the original DSA0:

Does anyone have any idea that could help us resolve the problem?

Thank you.

24 REPLIES
Hoff
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

This looks to be an unsupported cluster version span, per the Cluster SPD.

http://h18000.www1.hp.com/info/SP2978/SP2978PF.PDF

I don't know that this span is the trigger for the shadowing issues. But it could be.

The error itself is indicating errors with connectivity; with the volumes involved in the shadowset, or potentially with the quorum disk.

Here's one of the very few previous discussions of this HBVS error:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1172594

I'd also enable full-on boot-time diagnostics, and see if anything interesting gets displayed before the crash.

boot -fl x,30000 ddcu

where x is the system root, and ddcu is the boot device.

But this could well be the version span. Which would leave you with the decision to upgrade, downgrade, or split the cluster. (As a related test, see if the box boots correctly without the other lobe around; with the other lobe shut down.)

Mandatory ECOs to current, et al., too.

Stephen Hoffman
HoffmanLabs LLC
Martin Hughes
Regular Advisor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Is it possible that this bugcheck is being triggered by trying to form DSA1 with the same label as DSA0?
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
Karl Rohwedder
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Perhaps the 1st member has remounted its former shadow set member? VMS tries to mount all members of a shadowset, if available and valid.

regards kalle
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Edmundo,

What exactly was done after the image backup of the original shadowset member to a new disk?

Did you mount the new disk/over=(id,shadow) and reset the volume label?

Doing the mount/over=shadow will reinit the shadow related portion of the SCB on the disk, so it will no longer remember the prior members of the shadowset. When you reboot (with SHADOW_SYS_DISK 1), the SCB will be updated to make it be a new shadowset.

If you have done that, you are going to need to get more info from the crash dump.

Hoff's warning about too much disparity between 6.2-1H3 and 7.3-2 is valid, but I don't think it has anything to do with this crash, since unless you did an upgrade you aren't telling us about after you did the image backup, both systems will still be running 6.2-1H3.

Jon
it depends
John Travell
Valued Contributor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

This looks very much like you need to do more to differentiate those shadow sets. VMS appears to think the new boot disk should be a member of the original shadow set.
You must have been booting from different roots on the original disk; I presume you did not change the root selection in the boot command.
JT:
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Thank you All!

---> Reply to Hoff:
I don't perceive any pertinent information about the possibility of an unsupported cluster version span. I didn't enable the boot-time diagnostics because of the rush and the short window for the split.

---> Reply to Jon Pinkley:
You have a good point, possibly critical!

In the hurry of implementing and finishing the Change-Control (2 Hrs.) for the "split", I did not go on to mount the new disk and its companion as a shadow set (DSA).

We were able to see both new disks at the >>> prompt, so I moved on. My idea was that I could force the reinit of the shadow related portion of the SCB at startup, but it seems that is not possible.

I am not sure whether a crash dump analysis will work and/or provide the pertinent information to debug this problem.

===> My next step is to reinit the shadow related portion of the SCB, unless somebody proves me wrong.

Inputs welcome.

Thank you again.
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Edmundo,

After doing a test with LD devices, I see I gave you bad information.

A backup/image shadow_mem: new_disk: does not copy the shadow related info from the SCB, so the output disk is already in a state similar to what you will get after a mount/ov=(id,shadow).

A backup/physical shadow_mem: new_disk: does create a "shadow member", and the new disk after being dismounted, will write lock itself when it is mounted (unless you specify /over=shadow).

This makes sense, as an image backup doesn't place files in their original locations, so from a shadowing point of view, the new disk must have its generation number reset.

Using the freeware diskblock program, you can see what is in the SCB, and that's what I used to look.

So if you were indeed using the target of the image backup as the system disk for the "second system", I don't think the lack of doing a mou/ov=(shadow) can explain the crash.

What is less clear to me is

%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset

Specifically, what shadowset, and if this was the system disk, what were the other members that could not be accessed?

Again, please tell us exactly what you did.

You do need to have unique volume labels on every logical device VMS mounts in a shared fashion. Specifically, DSA0: and DSA1: cannot both have the same label.
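
A quick way to compare labels before attempting a shared mount might look like the following. This is only a sketch: the procedure name CHECK_LABELS.COM is hypothetical, P1 is assumed to be the new disk already mounted privately on this node, and DSA0: is assumed to be the existing system shadow set.

```dcl
$! CHECK_LABELS.COM (hypothetical name): compare volume labels before a shared mount.
$! P1 = device name of the new disk, which must already be mounted privately.
$ old_label = F$EDIT(F$GETDVI("DSA0:", "VOLNAM"), "TRIM")
$ new_label = F$EDIT(F$GETDVI(P1, "VOLNAM"), "TRIM")
$ WRITE SYS$OUTPUT "DSA0: label is ", old_label
$ WRITE SYS$OUTPUT P1, " label is ", new_label
$! Identical labels mean the new disk needs a new label first.
$ IF old_label .EQS. new_label THEN -
     WRITE SYS$OUTPUT "Labels match; relabel the new disk before mounting it shared"
$ EXIT
```

It would be invoked with the new disk's device name as the only parameter, for example @CHECK_LABELS ddcu: with ddcu replaced by the actual device.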

Also look at this ITRC thread:

http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1116912

See also this thread on Google Groups, specifically Andy Goldstein's response, which discusses the checks VMS makes to ensure consistency in volume processing:

http://groups.google.com/group/comp.os.vms/browse_thread/thread/ff54db0336c8d1b3/c4e6149b3c4639a8?lnk=st&q=&rnum=2#c4e6149b3c4639a8

>> We did a major rework of the device / volume correspondence logic in V7.1
>> (and backported to V6.2 in the "MOUNT / Shadowing Compatibility kit"),
>> and the latter case was split out to a separate error message,
>> "MOUNT-F-DIFVOLMNT, different volume mounted on same device".

From your original description:

>>>"So, I shutted down the first of the systems, analyze/disk/repair the system disk (went fine),"

I will assume the analyze/disk was done from the second system? (if the first system is down, it seems the only possibility)

>>>"dismounted the secondary member of DSA0 and performed a BACKUP/image to another disk
in the same StorageWorks attached in redundant mode to the system."

Can you give us the commands used to do this backup, including the mounting of the devices involved? And after the backup, what, if anything was done with the new volume? Specifically, did it still have the original volume label? Did you mount it as part of a shadowset?

>>>"Shutdown the second system. Boot the first system from the original shadow-set as always,
then modified the hardware environment parameter BOOTDEF-DEV in the second to point
to the new disk (primary of new shadow-set) and boot conversational and modified the parameter SHADOW_SYS_UNIT from 0 to 1 and enter CONTINUE to boot."

When you say "primary of new shadow-set" what do you mean? Are there unspecified actions prior to the shutdown, or was the shadowset formed as a result of the booting with SHADOW_SYS_DISK set to 1?

Note: The change you made to SHADOW_SYS_UNIT during the conversational boot was probably not written back to the current params of the second system's boot disk, since the boot never completed.
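
Once the second system does come up from the new disk, something like the following should show whether the values were actually saved. It is a sketch only: USE CURRENT reads the parameter file on the disk the system booted from, so it has to be run on the node in question.

```dcl
$! Sketch: inspect the shadowing parameters stored in the current parameter file.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW SHADOW_SYS_UNIT
SYSGEN> SHOW SHADOW_SYS_DISK
SYSGEN> EXIT
```

If the values shown are not the ones you set at the SYSBOOT> prompt, they would need to be set again (and normally also recorded in MODPARAMS.DAT so that AUTOGEN keeps them).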

Jon
it depends
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

Jon,

When I said "My next step is to reinit the shadow related portion of the SCB", I meant mounting, from the SYSTEM account (at the process level only), both disks that will form the new DSA, including the one I used as the boot disk during the split.

First: Both 4100 systems (Alpha1 & Alpha2) are members of a cluster sharing DSA0
as the system disk.

--- Example ---
DSA0: Mounted 0 ALPHASYS
$1$DKC0: (ALPHA2) ShadowSetMember 0 (member of DSA0:)
$1$DKF100: (ALPHA2) ShadowSetMember 0 (member of DSA0:)

I did the following, step by step. (I now know I made a couple of mistakes ...)

1. $ shutdown < ALPHA1 system wait transition
2. $ analyze/disk/record/repair DSA0:
3. $ del DSA0:[SYSLOST] .DAT; < garbage ...
4. $ analyze/disk/record/repair DSA0: < everything fine
5. $ dismount $1$DKF100:
6. $ mount/forei $1$DKB106:
7. $ mount/noassist $1$DKF100:
8. $ backup/image $1$DKF100: $1$DKB106:
9. $ dismount $1$DKB106:
10. $ dismount $1$DKF100:
11. $ mount/noassist $1$DKB106:
12. $ dir/size=all/gran $1$DKB106: < compare to DSA0: - OK
13. $ dismount $1$DKB106:
14. $ shutdown < ALPHA2 system
15. >>> b < ALPHA1 system
16. >>> set BOOTDEF_DEV dkb106.1.0.2.1 < new boot (primary)
17. >>> b -fl 0,1 < new disk path
18. SYSBOOT> set SHADOW_SYS_UNIT 1
19. SYSBOOT> con
-------- Here got the BUGCHECK

SYSPAGSWPFILES.COM mounts the secondary member of DSA0: if it is not
already mounted (on both systems), and DSA0: then starts shadowing.

I did not change the LABEL of the new system disk because I thought it would work
independently, as each system has its own system disk.

What I understand must be wrong is that I did not establish any relationship
between the primary and secondary members of the new shadow set: I did not mount them
with a new label (as a precaution) and let shadowing establish the relationship, like this:


$ mount/noassist DSA01: -
/shadow=($1$DKB106:,$1$DKD206:) OPENVMS-621 OPENVMS-621

Then let the shadow copy finish, and ...

$ dismount/nounload DSA01:

This time I will do that, unless there is something wrong with the above.

Please, let me know if I am still wrong.
Martin Hughes
Regular Advisor

Re: SHADOW-F-NOACCMBREX - After spliting a cluster

>> $ mount/noassist DSA01: -
/shadow=($1$DKB106:,$1$DKD206:) OPENVMS-621 OPENVMS-621
<<

That won't work. $1$DKB106: still has the volume label ALPHASYS. I'd expect you will get this error -

%MOUNT-F-INCVOLLABEL, incorrect volume label

I'd do this instead:

$ mount/over=(ident,shadow) $1$DKB106:
$ set volume/label=OPENVMS-621 $1$DKB106:
$ dismount $1$DKB106:

Then shutdown and do your conversational boot from DKB106.