Operating System - OpenVMS

SHADOW-F-NOACCMBREX - After splitting a cluster

 
Edmundo T Rodriguez
Frequent Advisor

SHADOW-F-NOACCMBREX - After splitting a cluster

We had a cluster of two AlphaServer 4100 systems running OpenVMS V6.2-1H3, sharing the same system disk (DSA0:, a shadow set of two members) and a quorum disk.

Both systems have StorageWorks direct-attached disk arrays but run different applications. The second of them needed to be upgraded (both OS and application), and the OS couldn't be upgraded on the first one.

In order to upgrade the second, I decided to make each node boot from a different system disk while both stayed members of the cluster:
one with VMS V6.2-1H3 and the other with VMS V7.3-2.

So I shut down the first of the systems, ran ANALYZE/DISK/REPAIR on the system disk (it went fine), dismounted the secondary member of DSA0:, and performed a BACKUP/IMAGE to another disk in the same StorageWorks, attached in redundant mode to the system.

Then I shut down the second system, booted the first system from the original shadow set as always, modified the console environment variable BOOTDEF_DEV on the second to point to the new disk (primary of the new shadow set), booted conversationally, changed the parameter SHADOW_SYS_UNIT from 0 to 1, and entered CONTINUE to boot.

Here I encountered a BUGCHECK and a system crash.

--------------------------------------------
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%VMScluster-I-LOADSECDB, loading the cluster security database
%EWA0, Fast mode set by console
%CNXMAN, Sending VMScluster membership request to system ALPHA1
%CNXMAN, Now a VMScluster member -- system ALPHA2
%EWA0, Link state: UP
%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset
**** OpenVMS (TM) Alpha Operating System V6.2-1H3 - BUGCHECK ****
----------------------------------------------

I tried a couple of things more than once, but nothing worked, and I had to go back and boot the second system from the original DSA0:.

Does anyone have any idea that could help us resolve the problem?

Thank you.

24 REPLIES
Hoff
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

This looks to be an unsupported cluster version span, per the Cluster SPD.

http://h18000.www1.hp.com/info/SP2978/SP2978PF.PDF

I don't know that this span is the trigger for the shadowing issues. But it could be.

The error itself indicates connectivity problems with the volumes involved in the shadow set, or potentially with the quorum disk.

Here's one of the very few previous discussions of this HBVS error:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1172594

I'd also enable full-on boot-time diagnostics, and see if anything interesting gets displayed before the crash.

boot -fl x,30000 ddcu

where x is the system root, and ddcu is the boot device.

But this could well be the version span. Which would leave you with the decision to upgrade, downgrade, or split the cluster. (As a related test, see if the box boots correctly with the other lobe shut down.)

Mandatory ECOs to current, et al., too.

Stephen Hoffman
HoffmanLabs LLC
Martin Hughes
Regular Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Is it possible that this bugcheck is being triggered by trying to form DSA1 with the same label as DSA0:?
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
Karl Rohwedder
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Perhaps the 1st member has remounted its former shadow set member? VMS tries to mount all members of a shadowset, if available and valid.

regards kalle
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

What exactly was done after the image backup of the original shadowset member to a new disk?

Did you mount the new disk/over=(id,shadow) and reset the volume label?

Doing the mount/over=shadow will reinit the shadow related portion of the SCB on the disk, so it will no longer remember the prior members of the shadowset. When you reboot (with SHADOW_SYS_DISK 1), the SCB will be updated to make it be a new shadowset.
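As a sketch of that sequence (using the device name from this thread; the new label here is just an example, not from your configuration):

$ mount/over=(id,shadow) $1$DKB106: ! clears prior shadow membership info in the SCB
$ set volume/label=NEWSYS $1$DKB106: ! give the copy a label unique clusterwide
$ dismount $1$DKB106:

After this, the disk no longer remembers the old DSA0: members and can form a new shadow set on the next mount or system boot.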

If you have done that, you are going to need to get more info from the crash dump.

Hoff's warning about too much disparity between 6.2-1H3 and 7.3-2 is valid, but I don't think it has anything to do with this crash: unless there was an upgrade you aren't telling us about after the image backup, both systems will still be running 6.2-1H3.

Jon
it depends
John Travell
Valued Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

This looks very much like you need to do more to differentiate those shadow sets. VMS appears to think the new boot disk should be a member of the original shadow set.
You must have been booting from different roots on the original disk; I presume you did not change the root selection in the boot command.
JT:
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Thank you All!

---> Reply to Hoff:
I don't perceive any pertinent information about the possibility of an unsupported cluster version span. I didn't go ahead with enabling the boot-time diagnostics due to the rush and the short window for the split.

---> Reply to John Pinkley:
You have a good point, possibly critical!

In the hurry of implementing and finishing the change control (2 hrs.) for the "split", I didn't go on to mount the new disk and its companion as a shadow-set DSA.

We were able to see both new disks at the >>> prompt, so I moved on. My idea was that I could force the reinit of the shadow-related portion of the SCB at startup, but it seems that is not possible.

I am not sure if a crash dump analysis will work and/or provide the pertinent information to debug this problem.

===> My next step is to reinit the shadow-related portion of the SCB, unless somebody proves me wrong.

Inputs welcome.

Thank you again.
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

After doing a test with LD devices, I see I gave you bad information.

A backup/image shadow_mem: new_disk: does not copy the shadow related info from the SCB, so the output disk is already in a state similar to what you will get after a mount/ov=(id,shadow).

A backup/physical shadow_mem: new_disk: does create a "shadow member", and the new disk after being dismounted, will write lock itself when it is mounted (unless you specify /over=shadow).

This makes sense, as an image backup doesn't place files in their original locations, so from a shadowing point of view, the new disk must have its generation number reset.
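To make the contrast concrete (same device names as elsewhere in this thread; a sketch based on my LD test, not re-verified on V6.2):

$! Image copy: input mounted as a Files-11 volume, output /foreign.
$! Files are placed anew, and the shadow generation data in the output
$! SCB is reset, so the copy stands alone:
$ mount/noassist $1$DKF100:
$ mount/foreign $1$DKB106:
$ backup/image $1$DKF100: $1$DKB106:
$
$! Physical copy: both disks mounted /foreign. The output is block-for-block
$! identical, still looks like a member of the old shadow set, and will
$! write-lock itself when later mounted normally (unless /over=shadow):
$ backup/physical $1$DKF100: $1$DKB106: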

Using the freeware diskblock program, you can see what is in the SCB, and that's what I used to look.

So if you were indeed using the target of the image backup as the system disk for the "second system", I don't think the lack of doing a mou/ov=(shadow) can explain the crash.

What is less clear to me is

%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset

Specifically, what shadowset, and if this was the system disk, what were the other members that could not be accessed?

Again, please tell us exactly what you did.

You do need to have unique volume labels on every logical device VMS mounts in a shared fashion. Specifically, DSA0: and DSA1: cannot both have the same label.

Also look at this ITRC thread:

http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1116912

Also see this comp.os.vms thread on Google (specifically Andy Goldstein's response), which discusses checks VMS makes to ensure consistency in volume processing:

http://groups.google.com/group/comp.os.vms/browse_thread/thread/ff54db0336c8d1b3/c4e6149b3c4639a8?lnk=st&q=&rnum=2#c4e6149b3c4639a8

>> We did a major rework of the device / volume correspondence logic in V7.1
>> (and backported to V6.2 in the "MOUNT / Shadowing Compatibility kit"),
>> and the latter case was split out to a separate error message,
>> "MOUNT-F-DIFVOLMNT, different volume mounted on same device".

From your original description:

>>>"So, I shutted down the first of the systems, analyze/disk/repair the system disk (went fine),"

I will assume the analyze/disk was done from the second system? (if the first system is down, it seems the only possibility)

>>>"dismounted the secondary member of DSA0 and performed a BACKUP/image to another disk
in the same StorageWorks attached in redundant mode to the system.

Can you give us the commands used to do this backup, including the mounting of the devices involved? And after the backup, what, if anything was done with the new volume? Specifically, did it still have the original volume label? Did you mount it as part of a shadowset?

>>>"Shutdown the second system. Boot the first system from the original shadow-set as always,
then modified the hardware environment parameter BOOTDEF-DEV in the second to point
to the new disk (primary of new shadow-set) and boot conversational and modified the parameter SHADOW_SYS_UNIT from 0 to 1 and enter CONTINUE to boot."

When you say "primary of new shadow-set" what do you mean? Are there unspecified actions prior to the shutdown, or was the shadowset formed as a result of the booting with SHADOW_SYS_DISK set to 1?

Note: The change you made to SHADOW_SYS_UNIT during the conversational boot was probably not written back to the current params of the second system's boot disk, since the boot never completed.

Jon
it depends
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Jon,

When I said "my next step is to reinit the shadow-related portion of the SCB," I meant to MOUNT, from the SYSTEM account (at the process level only), both disks that will form the new DSA, including the one I used as the boot disk during the implementation of the split.

First: Both 4100 systems (Alpha1 & Alpha2) are members of a cluster sharing DSA0
as the system disk.

--- Example ---
DSA0: Mounted 0 ALPHASYS
$1$DKC0: (ALPHA2) ShadowSetMember 0 (member of DSA0:)
$1$DKF100: (ALPHA2) ShadowSetMember 0 (member of DSA0:)

I did the following (step by step). Now I know I made a couple of mistakes ...

1. $ shutdown < ALPHA1 system wait transition
2. $ analyze/disk/record/repair DSA0:
3. $ del DSA0:[SYSLOST] .DAT; < garbage ...
4. $ analyze/disk/record/repair DSA0: < everything fine
5. $ dismount $1$DKF100:
6. $ mount/forei $1$DKB106:
7. $ mount/noassist $1$DKF100:
8. $ backup/image $1$DKF100: $1$DKB106:
9. $ dismount $1$DKB106:
10. $ dismount $1$DKF100:
11. $ mount/noassist $1$DKB106:
12. $ dir/size=all/gran $1$DKB106: < compare to DSA0: - OK
13. $ dismount $1$DKB106:
14. $ shutdown < ALPHA2 system
15. >>> b < ALPHA1 system
16. >>> set BOOTDEF_DEV dkb106.1.0.2.1 < new boot (primary)
17. >>> b -fl 0,1 < new disk path
18. SYSBOOT> set SHADOW_SYS_UNIT 1
19. SYSBOOT> con
-------- Here got the BUGCHECK

The SYPAGSWPFILES.COM mounts the secondary member of DSA0: if it is not
mounted (on both systems), and DSA0: enters into shadowing.

I did not change the LABEL of the new system disk because I thought it would work
independently, as both systems have their own system disk.

What I understand has to be wrong is that I did not establish any relationship
between the primary and secondary members of the new shadow set, by not mounting them
with a new label (as a precaution), etc., and letting shadowing establish the relationship.


$ mount/noassist DSA01: -
/shadow=($1$DKB106:,$1$DKD206:) OPENVMS-621 OPENVMS-621

Then let the shadow copy end, and ...

$ dismount/nounload DSA01:

That is what I will do this time, unless there is something wrong with the previous steps.

Please, let me know if I am still wrong.
Martin Hughes
Regular Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

>> $ mount/noassist DSA01: -
/shadow=($1$DKB106:,$1$DKD206:) OPENVMS-621 OPENVMS-621
<<

That won't work. $1$DKB106: still has the volume label ALPHASYS. I'd expect you will get this error -

%MOUNT-F-INCVOLLABEL, incorrect volume label

I'd do this instead:

$ mount/over=(ident,shadow) $1$DKB106:
$ set volume/label=OPENVMS-621 $1$DKB106:
$ dismount $1$DKB106:

Then shutdown and do your conversational boot from DKB106.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

First, volume labels of shared disks must be unique clusterwide. So you will have to change the volume label on one copy if you expect the nodes to be in the same cluster (and as you have shared access to your storageworks disks, I highly recommend that you ensure they are in the same cluster).

14. $ shutdown < ALPHA2 system
15. >>> b < ALPHA1 system
16. >>> set BOOTDEF_DEV dkb106.1.0.2.1 < new boot (primary)
17. >>> b -fl 0,1 < new disk path
18. SYSBOOT> set SHADOW_SYS_UNIT 1
19. SYSBOOT> con
-------- Here got the BUGCHECK

The above implies ALPHA1 crashed (and that ALPHA2 was down at the time)

What you showed us in the original problem statement does not seem to be consistent with the above. Which node crashed?

%CNXMAN, Sending VMScluster membership request to system ALPHA1
%CNXMAN, Now a VMScluster member -- system ALPHA2

This appears to show that ALPHA2 was the one booting (and crashing) after joining an already existing cluster (ALPHA1 was still in the cluster).

Also, at least current versions of the shadowing manual explicitly warn against adding members to a system disk shadow set in startup procedures (if I understand the following correctly, that is what you are doing):

"The SYSPAGSWPFILES.COM mount the secondary member of DSA0: if is not mounted (in both systems) and DSA0: enters into shadowing."

See Chapter 3 (this is from the 7.3-2 Shadowing Manual) section "Booting from a System Disk Shadow Set" pg 44 of the PDF version available here: http://h71000.www7.hp.com/doc/732FINAL/DOCUMENTATION/PDF/aa-pvxmj-te.PDF

And re-read what Hoff wrote. There are many changes in VMS between 6.2 and 7.3-2, and although there were patches to allow co-existence along the way, these patches generally don't span the many versions you are planning to attempt. If this was a hobby system, that's one thing. But don't paint yourself into a corner by assuming this configuration will work.

Even if you can get the system to boot and appear to work, the cluster will have to run in crippled mode to be able to co-exist. Specific examples: shadowing won't be able to use any features related to write bitmaps, so minicopy and HBMM are not possible. XFC won't work. New features of the lock manager, etc.

Will it boot? Perhaps. Will it work? Perhaps. When you run into problems, will you get any support from HP? Probably the recommendation to upgrade to a supported version.

Jon
it depends
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Please try to understand me. I may not be writing down all the steps I took or that I may be taking next, so that may be why you are confused. Sorry! (I have been working with OpenVMS since 1986 but still make mistakes.)

Reply to Martin Hughes:

$ mount/noassist DSA01: -
/shadow=($1$DKB106:,$1$DKD206:) OPENVMS-621 OPENVMS-621

This will work; the only thing is that I didn't mention the step to label them prior to doing this.


Reply to Jon Pinkley:

I do not see how this implies that ALPHA1 had the crash ...

14. $ shutdown < ALPHA2 system
15. >>> b < ALPHA1 system

------------ Here Alpha1 is already up

16. >>> set BOOTDEF_DEV dkb106.1.0.2.1 < new boot Alpha2
17. >>> b -fl 0,1 < new disk path
18. SYSBOOT> set SHADOW_SYS_UNIT 1
19. SYSBOOT> con

-------- Here got the BUGCHECK in Alpha2


I have been using a SYPAGSWPFILES.COM in each system root for years to do this and never had any type of issue. And again, that is a check to see whether the secondary member is not there. In any event, I can eliminate this.
It is not a cause of the problem I had.


"And re-read what Hoff wrote. There are many changes in VMS between 6.2 and 7.3-2,"


Yes, I already thought about that, and it may be that, to avoid any risk in the production environment, we should just get rid of the cluster and boot as independent systems.

This will cause me a lot of headaches due to the configuration of the two StorageWorks
racks, which have shadow sets shared by both systems. It is like beginning years of work again.

I am just aiming to produce a configuration that DEC/Compaq/HP has stated is possible.

Alpha1 is running an environment that needs to stay alive by law for 15 more years
(medical records).

Alpha2 cannot coexist because its application environment needs to be upgraded to comply with new regulations; in any event, it cannot go higher than OpenVMS V7.3.

OpenVMS V6.2-1H3 vs. OpenVMS V7.3-2 and other version mixes may not be supported by HP, but that doesn't mean they cannot work in a cluster. (My understanding.)
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Hello!

Attached you will find my new work plan
step by step.

Regards.
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

RE: Reply to Jon Pinkley:

>>>"I do not see how this implies that ALPHA1 had the crash."

Now that I look again, I see it doesn't. I wasn't reading carefully enough. The addition of alpha2 to step 16 does make it clearer.

The part about not mounting members during system startup is so you don't accidentally overwrite a more up-to-date member of the shadow set. Instead of mounting during startup, I would suggest that near the end of your startup you check, and send yourself a mail message if the system device doesn't have all the members that were expected. Then you can mount the member if that is really what is needed. At boot time, the system will attempt to bring back all the members that were there when the virtual unit (DSA) was dismounted, so you shouldn't need to mount the members every time you boot.
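A minimal sketch of such a check (the DSA0: name and the expected member count of 2 are taken from this thread; SHDW_NEXT_MBR_NAME is the F$GETDVI item for walking a shadow set's member list):

$ cnt = 0
$ mbr = f$getdvi("DSA0:","SHDW_NEXT_MBR_NAME") ! first member of the set
$ mbr_loop:
$ if mbr .eqs. "" then goto count_done
$ cnt = cnt + 1
$ mbr = f$getdvi(mbr,"SHDW_NEXT_MBR_NAME") ! next member, "" at end of list
$ goto mbr_loop
$ count_done:
$ if cnt .lt. 2 then mail/subject="DSA0: is missing a shadow member" nl: system

The point is that the procedure only reports; a human decides whether mounting the missing member is safe.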

Does the medical records environment on Alpha1 only work on 6.x? Can it be made to work on 7.x ?

I understand that you don't want to change your config, but running a mixed-version cluster is going to make things complex too, assuming you can get it to work. The bad thing is that it may appear to work, but we know for sure that some things won't work as well as they could if they were not encumbered by the need to co-exist with a version that supports only a subset of what the new version does. In my opinion, you will be better off if you run into a problem early; then you can spend your energy on creating a supportable configuration.

As far as the StorageWorks, I don't know what you are running on; I do know we just disposed of an HSZ70 with 4 controllers. They aren't worth much on the used market. So you should be able to duplicate what you have for relatively little, but if you are going to do that, I would at least consider something newer. When you are being required to change something to comply with regulations is the time to ask for money.

I looked at your steps, and here is the one thing I don't see any mention of: cluster common files. Almost certainly you will want to share files like SYSUAF, RIGHTSLIST, the queue file, etc. In newer versions of VMS (not sure about 6.2) there is a file, sys$common:[sysmgr]sylogicals.template, that has a list of the files that should normally be shared by all the cluster members. Where there is a single shared system disk, the default values implicitly make these shared. When there are multiple system disks, you need to explicitly make them shared. You can probably see these on the VMS installation CD. Also, Hoff's web site has an article on clusters which has a list of the shared files:

Hoff's introductory information on adding nodes to a cluster, and a cluster divorce:

http://64.223.189.234/node/169


If you aren't defining logical names to redirect these files, you will need to do so. The "easy" thing to do is just have everything reference the files on DSA0: (if they are not currently already defined to be somewhere else).

You may want to do the edit of the startup files to prevent the application startup before splitting the member off, if the change is going to be the same on both.

Adding a second member to the system shadowset isn't a requirement, and you can add it any time you want after you have booted from the single member shadow set. Also when a full copy is expected, I prefer to start with a single member shadowset, and then after I verify that it is indeed the one I want to be the source, then mount the second member of the shadow set using /confirm. I would also not initialize the second member with the same label, I would use something like SCRATCH_DISK so it is obvious that it isn't to be used as the source of the copy operation. Using that label also allows you to use mount DSA /shadow=(newmember) /policy=verify_label

Delaying the addition of the second member with a full copy will minimize the down time before you can do your testing.
it depends
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Thank you Jon!

We barely ever reboot these systems and the others we have, because this is a 24x7 site.

The Alpha1 application is old CERNER, which will be used in archive mode for years; it cannot be touched.

As I said, the process checks whether it was NOT mounted; 99% of the time it goes automatically.

We have 2 StorageWorks racks with 4 HSZ50s each.

I have been using SYLOGICALS.COM for years with very good results.

Many of Hoffman's recommendations on files I already had under consideration; I will need to work on this to filter out what is not needed on each side.

I will try booting with one shadow member to speed things up, if I become convinced there is no other way.

If we need to go as far as divorcing the members of the cluster, then we need to work more on the shared-files separation. No doubt it is cumbersome, but I believe it will be better than starting from scratch, unless an investment is made to replace part of the environment.

PS: Attached a sample of the Alpha2 disk.

Thank you for taking from your time for sharing ideas and bring enlightenment!

Volker Halle
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

did you let the system dump write complete? Can you read the dump file? If so, you could try to find out which device was not visible from the existing DSAx: system disk shadow set, or which read may have failed from any of the members. Might there be a need to increase SHADOW_SYS_WAIT?

Volker.
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

The system wrote the dump, but I did not analyze it because I was running against time, and after the second reboot the system actually cleared the dump area.

We are working with the SHADOW_SYS_WAIT default value of 256 (around 4 min), and we never had any issues with the time to form the shadow set on either of the two systems.

Now, this is an interesting point, and HP may not want to support the plan I have to perform the split if they find the following:

There is the possibility of a problem: we are getting errors from one of the controllers in one of the two StorageWorks racks, which almost certainly needs to be replaced.

I believe that this problem is not a cause for not booting from those disks, but I may be wrong.
Volker Halle
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

if you have SHADOW_SYS_WAIT set to 4 minutes, did the system seem to hang for 4 minutes until the NOACCMBREX error showed up? If not, there may have been a read error, which is NOT retried in V6.2-1H3 (this was fixed later).

OpenVMS does not overwrite the dump, unless another dump is taken. Did you check on the correct member of the shadow set (i.e. the boot member)?

Volker.
Jim_McKinney
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

> Did you check on the correct member of the shadowset (i.e. the boot member) ?

I believe that what Volker is implying here is that you may have been caught by the fact that the dump is written without the assist of the shadow driver. Think about this: when a system crashes, VMS is no longer present, nor is its shadow driver. Dumping is done without the aid or knowledge of VMS. If the node was a member of a still-running cluster, the other nodes are also unaware that a dump is being written. It is written to a single disk, most likely the disk you booted from.

When the system is rebooted into a cluster, since the shadow set is still in a steady state, the booting system has no reason to merge or copy to propagate the dump changes to the other shadow members. So, you want to read this dump. The disk it's on is shadowed; some of your reads may go to one member, some to another. If your reads don't all go to the member that was dumped to, you won't get what you expect. So, you have to direct all your reads to the member that contains the updated dump file. The following should show you the device you booted from; this is likely where your updated dump file is located.

$ write sys$output f$getenv("booted_dev")

If you temporarily dismount the other member(s) of the shadow set, you'll then find all your reads going to this one and should be able to view that last dump. When you remount the disk, the resulting shadow copy will restore the consistency of the shadow members.
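Putting that together (member names taken from earlier in this thread; a sketch only, and the remount step should be checked against the V6.2 shadowing documentation):

$ write sys$output f$getenv("booted_dev") ! the member the dump went to
$ dismount $1$DKC0: ! assuming $1$DKF100: was the booted member
$ analyze/crash_dump sys$system:sysdump.dmp
SDA> copy some_dev:saved.dmp ! save a copy while only one member is mounted
SDA> exit
$ mount/system DSA0:/shadow=($1$DKC0:) ALPHASYS ! remount; the copy restores consistency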
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Hello again!

I was out of the country; my peer followed up with the plan, and we had a CRASH again.

I want to confirm the actual effect of having one system with SHADOW_SYS_UNIT = 0 and another with SHADOW_SYS_UNIT = 2.

Somebody at HP Global Support stated that changing the value to 2 in the second node is the cause of getting a crash while booting the node.

My understanding is that if each machine is booting from a different system disk, having each with a different SHADOW_SYS_UNIT value is the way to go.

We may end up severing the cluster if we cannot find a solution here.

Volker Halle
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

I didn't read all the entries again. There should be no problem at all to boot one member with SHADOW_SYS_UNIT=0 from one disk in the cluster and another one with SHADOW_SYS_UNIT=2 from another disk in the same cluster.

You should try to MOUNT the 2nd disk once with /OVER=SHADOW (cleaning the SCB info), before booting the 2nd node from the upgraded system disk. And make sure you give a unique volume label to that disk.

Volker.
Jan van den Ende
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

maybe _I_ missed something, but

For one of the nodes you changed SHADOW_SYS_UNIT.
DID you also change the VOLUME LABEL??

VMS _DOES NOT_ like two volumes (system disks at that!) with the same label.

If I missed your reporting it, then forget about this ...

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Edmundo T Rodriguez
Frequent Advisor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Hello!

The attachment shows the process, and you can confirm the mount and the disk label rename.

What bothers me is that there seems to be a conflict which is keeping the system from finding/reading the new boot disk.

I have been working with OpenVMS since 1984 in almost all kinds of configurations except the new Itanium, and I did split previous system disks and it worked (years ago).

Now, HP is claiming that SHADOW_SYS_UNIT may be a cause for failure. What's this?

I know this is OpenVMS 6.2-1H3, but how can we recreate the environment in a case like this?

We do not have a test box.

HELP!
Jon Pinkley
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

RE:"The attachment show the process and you can confirm the mount and the disk-label renamed"

Which attachment are you referring to?

Some questions:

1. Do you have a valid crash dump (as asked by Volker Halle)? That can clear up a lot of guessing.

1.a. Is your system dump file on the system disk?

2. Be extra careful about any disk that is used by the primitive file system, as there is very little co-ordination (synchronization) done with other cluster members, since it runs without all VMS facilities available. System dump files fall into this category.

3. Have you made it possible for both system disks to see a common set of cluster personality files? It is easier to do that beforehand by making your startup procedures generic so the same copy will work for either system disk. You can conditionalize based on nodename.

4. After you split a member off your system device, you will need to mount/ov=(id,shad) and change the label. Then that member will become the master for the new system disk shadow set. All disks mounted /system (system disks implicitly fall into this category) or /group must have unique labels cluster-wide. Shadow set members of the same shadow set are considered to be the same disk, so they will have the same label, but the DSAx devices must have labels unique from other DSA devices and from other disks that are not shadow set members and are mounted either /system or /group.

Your original post had the following:

%SHADOW-F-NOACCMBREX, unable to access all mbrs of existing shadowset
**** OpenVMS (TM) Alpha Operating System V6.2-1H3 - BUGCHECK ****
----------------------------------------------

Without more information, we can't be 100% sure the bugcheck happened immediately as result of the SHADOW-F-NOACCMBREX, only that the BUGCHECK occurred after the SHADOW-F-NOACCMBREX error. This is where a crash dump would save a lot of guessing. Was there a valid crash dump the last time the system crashed while you were away?

I think that in 6.2 it was still a requirement that a pagefile had to be mapped before exiting SYPAGSWPFILES.COM. You said you are mounting the second member of the shadow set in your SYPAGSWPFILES.COM. Are those device names hard-coded there? Is it referring to a member that is part of another shadow set? That shouldn't cause a system crash, but you are dealing with an ancient version of VMS, and perhaps that was a bug that has been fixed since then.

Can you please capture the following from each node (while they are booted from the common system disk) and put the output in a text file, and attach the output.

From node ALPHA1

$ mcr sysgen
SYSGEN> USE CURRENT ! this is what will be used at the next boot
SYSGEN> SHOW /SCS
SYSGEN> SHOW /CLUSTER
SYSGEN> SHOW DEVICE_NAMING ! This probably doesn't exist in 6.2 (I think this appeared in 7.1)
SYSGEN> SHOW SHADOW
SYSGEN> SHOW DUMP ! enter only 4 so it matches both DUMPSTYLE and DUMPBUG
SYSGEN> SHOW SAVEDUMP
SYSGEN> EXIT
$ directory/nohead/notrail/file sys$manager:sypagswpfiles.com;
$ type sys$manager:sypagswpfiles.com;

From node ALPHA2

$ mcr sysgen
SYSGEN> USE CURRENT ! this is what will be used at the next boot
SYSGEN> SHOW /SCS
SYSGEN> SHOW /CLUSTER
SYSGEN> SHOW DEVICE_NAMING ! This probably doesn't exist in 6.2 (I think this appeared in 7.1)
SYSGEN> SHOW SHADOW
SYSGEN> SHOW DUMP ! enter only 4 so it matches both DUMPSTYLE and DUMPBUG
SYSGEN> SHOW SAVEDUMP
SYSGEN> EXIT
$ directory/nohead/notrail/file sys$manager:sypagswpfiles.com;
$! if file is same as on ALPHA1, then no need for following line
$ type sys$manager:sypagswpfiles.com;

Before posting, scan it to verify there isn't any secret stuff there. (There normally wouldn't be, but it is easier to check than to try to get the attachment deleted.)

Using that and the sysgen parameters that were changed during the conversational boot of the new system disk, we should be able to determine the sysgen parameters in effect at that time.

RE:"I want to confirm the actual effect of having a system with the SHADOW_SYS_UNIT value = 0 and another with the SHADOW_SYS_UNIT = 2

Somebody at HP Global Support stated that changing the value to 2 in the second node is the cause of getting a crash while booting the node.

My understanding is that if each machine is booting from a different system disk, having each with a different SHADOW_SYS_UNIT value is the way to go.
"
-------

As others have said, as long as the members of your DSA0 and DSA2 devices are non-overlapping, that should not cause a crash.

On the other hand, if this is the case:

DSA0: ($1$DKC0:,$1$DKF100:)

And ALPHA1 has bootdef_dev set to dkc0.x.x.x
and ALPHA2 has bootdef_dev set to dkf100.x.x.x

and you shut down ALPHA2, removed the dkf100 member from DSA0, did nothing else, and then did a conversational boot of ALPHA2 and changed SHADOW_SYS_UNIT to 2, I believe that would cause a crash, since there would already be another disk mounted with the same label. Also, the SCB would still know about the other member of the shadow set ($1$DKC0:), and booting from a shadowed system disk does about the same thing that mount/system/include does, i.e. it attempts to mount all members of the shadow set that were there when the disk was dismounted. (At least this is true in 7.3-2; whether it was true in 6.2 I am not sure.)

So while the answer from HP Global Support probably wasn't true for your situation, I can understand why it was given as the reason, especially if you talked to someone in first-level support. They are essentially a help desk with access to a database of known problems, but with very little actual experience (at least that is the impression I get).

Can you please find out if a crash dump is available?

Jon
it depends
Volker Halle
Honored Contributor

Re: SHADOW-F-NOACCMBREX - After splitting a cluster

Edmundo,

this is the comment from the source code preceding the NOACCMBREX SHADDETINCON crash:

;
; A path to one of the valid and currently mounted shadow set members does not
; yet exist on this system. Since we are booting, some controller might not have
; been seen by STACONFIG yet ... so we will wait for SHADOW_SYS_WAIT seconds.
; All the locks shall be released at this point. Retry from the beginning
; of this routine after a fork and wait. If some shadow set members still
; cannot be seen, when the retry count exhausts, send a message and then
; bugcheck the system.
;

Please read this and think about this. Then let the system write the dump and HALT it on the way up. Then mount the disk from the other node and save the dump:

$ ANAL/CRASH dkb106:
SDA> COPY some_dev:NOACCMBREX.DMP
SDA> EXIT

Now that the dump has been saved, there is lots of time to analyze it...

Please compare the answer from HP Global Services with the knowledge available from participants in this forum. When presented with correct and detailed information (and this includes the system dump file in a crash scenario!), you can expect a proper answer. And never forget, this service is free ;-)

Volker.