Operating System - OpenVMS

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

 
SOLVED
Jon Pinkley
Honored Contributor
Solution

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

Back to your original question about the "%MOUNT-F-NOSUCHDEV" message.

My guess is that this was complaining about the MSCP served member that had not yet been detected.

The SHADOW_SERVER process has to make sure that all members mounted by any node are accessible by the node that is attempting to mount the DSA virtual unit.

I can reproduce the error by doing the following, using the LDDRIVER. LD devices are not MSCP served, but they are loaded on each system, so it is easy to simulate the device not yet being detected.

NodeA:

$ ld connect DISK$ARCHIVE:[000000]DISK1.DSK;1 lda1 /share
$ ld connect DISK$ARCHIVE:[000000]DISK2.DSK;1 lda2 /share
$ show device ld

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$LDA0:      (OMEGA)   Online               0
$4$LDA1:      (OMEGA)   Online               0
$4$LDA2:      (OMEGA)   Online               0
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:,$4$lda2:) itrcshad
%MOUNT-I-MOUNTED, ITRCSHAD mounted on _DSA999:
%MOUNT-I-SHDWMEMSUCC, _$4$LDA1: (OMEGA) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMSUCC, _$4$LDA2: (OMEGA) is now a valid member of the shadow set
$ dism/pol=minicopy lda1


NodeB:

$ ld connect disk1.dsk lda1: /share
$! note we will not connect lda2, this simulates the MSCP served drive not being "seen" yet.

$ show dev ld

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$LDA0:      (SIGMA)   Online               0
$4$LDA1:      (SIGMA)   Online               0
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:) itrcshad ! mount "local" device into shadow.
%MOUNT-F-NOSUCHDEV, no such device available
$ show dev dsa999

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DSA999:                 Mounted              0  (remote mount)                  1
$ ld connect DISK$ARCHIVE:[000000]DISK2.DSK;1 lda2 /share
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:) itrcshad
%MOUNT-I-MOUNTED, ITRCSHAD mounted on _DSA999:
%MOUNT-I-SHDWMEMCOPY, _$4$LDA1: (SIGMA) added to the shadow set with a copy operation
%MOUNT-I-ISAMBR, _$4$LDA2: (SIGMA) is a member of the shadow set
$

It would be nice if the error message indicated which device was not available.

My guess is that you should be checking for the other member of the shadowset, not the locally attached one.

Also, in your MOUNT_COMMON.COM command procedure, in the section where you mount the disk on EMU2 while EMU1 is in the cluster, you may as well specify all shadowset members and use $ MOUNT/POLICY=REQUIRE_MEMBERS if you want to be certain that EMU2 doesn't create a single-member shadowset with a stale member. Of course, in the case where you have overridden the check for EMU1, you would not want this to occur.
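As a rough illustration only (the member names $3$DKA300: and $4$DKC300: and the label COMMON are assumptions based on this thread, not your real configuration), the EMU2 mount would look something like:

$! Sketch, untested: list every member and insist they all be present.
$ MOUNT/SYSTEM/NOASSIST/POLICY=REQUIRE_MEMBERS -
      DSA3: /SHADOW=($3$DKA300:,$4$DKC300:) COMMON

With /POLICY=REQUIRE_MEMBERS the mount fails rather than forming the shadowset from a subset of the listed members, which is what prevents EMU2 from mounting a stale single member.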

Good luck,

Jon
it depends
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

An even more interesting scenario happens when all the devices are available to the system doing the mount, but the new member is not available to a node that already has the shadowset mounted.

i.e. to reproduce:

nodeA:

$ ld connect disk1.dsk lda1 /share /allo=4
$ mount/system dsa999 /shadow=($4$lda1:)

nodeB:

$ ld connect disk1.dsk lda1 /share /allo=4
$ ld connect disk2.dsk lda2 /share
$ mount/system dsa999 /shadow=($4$lda1:,$4$lda2:)

At this point the process mounting the shadowset just hangs. No error message is generated.

However the mount count on NodeA now shows up as 2 in show device dsa999:. The mount will complete when the lda2 device is connected on nodeA:

i.e. on nodeA:
$ ld connect disk2.dsk lda2 /share

At this point the mount of the shadowset completes on NodeB.

This is not a situation that would normally occur with served devices, since MSCP served devices are seen by all cluster members. However, if the configure process was not running on a node that had a locally attached device, it is at least feasible that this situation could occur.
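If you ever wanted to rule that out, a quick sanity check along these lines could be added before the mounts (a sketch using standard DCL lexicals; CONFIGURE is the usual name of the cluster device configuration process):

$! Sketch: look for a process named CONFIGURE on this node.
$ ctx = ""
$ temp = F$CONTEXT("PROCESS", ctx, "PRCNAM", "CONFIGURE", "EQL")
$ pid = F$PID(ctx)
$ IF pid .EQS. "" THEN WRITE SYS$OUTPUT "CONFIGURE process not found"
$ IF pid .NES. "" THEN temp = F$CONTEXT("PROCESS", ctx, "CANCEL")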

I was expecting to get the "%MOUNT-F-NOSUCHDEV" message, although it would have been confusing to someone who saw all the member devices available on the node doing the mount.

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hi Jon,

Sorry for the late reply, I hope you can still remember the discussion.

" My guess is that you should be checking for the other member of the shadowset, not the locally attached one. "

Yes, I have done that. I modified the mount_common.com routine to output more information to the startup.log (both attached).

-----startup.log-bad----------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 13-NOV-2007 14:51:30.29
%DCL-I-SUPERSEDE, previous value of SYS$AUDIT_SERVER_INHIBIT has been superseded
Finish syconfig
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Emu1 is in the cluster
MOUNT_COMMON> Checking device exists: dkc300
MOUNT_COMMON> Device exists, ready to check availability (dkc300)
MOUNT_COMMON> Checking device is available: dkc300

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$DKC0:      (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
MOUNT_COMMON> Device available, ready to mount (dkc300)

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$4$DKC0:      (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
$4$DKC100:    (EMU2)    Online               0
$4$DKC200:    (EMU2)    Online               0
$4$DKC300:    (EMU2)    Online               0
$4$DKC400:    (EMU2)    Online               0
$4$DQA0:      (EMU2)    Online               0
$4$DQA1:      (EMU2)    Offline              1
$4$DQB0:      (EMU2)    Offline              1
$4$DQB1:      (EMU2)    Offline              1
$4$DUA0:      (EMU1)    HostUnavailable      0
$4$DVA0:      (EMU2)    Online               0
%MOUNT-F-NOSUCHDEV, no such device available
MOUNT_COMMON> Checking device exists: dsa3
MOUNT_COMMON> Device exists, ready to check availability (dsa3)
MOUNT_COMMON> Checking device is available: dsa3

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DSA3:                   Mounted              0  (remote mount)                  1
$4$DKC0:      (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
MOUNT_COMMON> Device available, ready to mount (dsa3)
All's good
MOUNT_COMMON.COM> End
SYLOGICALS> Redefining Qman$Master
"QMAN$MASTER" = "DISK$COMMON:[COMMON.SYSTEM]" (LNM$SYSTEM_TABLE)
SYLOGICALS.COM> End
%%%%%%%%%%% OPCOM 13-NOV-2007 14:51:43.94 %%%%%%%%%%%

The operator console and logfile will not be enabled.
Change OPC$OPA0_ENABLE & OPC$LOGFILE_ENABLE in SYLOGICALS.COM
to enable them.
%RUN-S-PROC_ID, identification of created process is 2140010A
%RUN-S-PROC_ID, identification of created process is 2140010B
%STDRV-E-NOSUCHFILE, File SYS$STARTUP:VMS$CONFIG-050_AUDIT_SERVER.COM does not exist.
%LICENSE-F-BADLDBWRIT, error writing to license database DSA3:[COMMON.SYSTEM]LMF$LICENSE.LDB;
-RMS-E-DNR, device not ready, not mounted, or unavailable
Copyright 2003 Hewlett-Packard Development Company, L.P.
%RMS-F-DNR, device not ready, not mounted, or unavailable
%DECdtm-F-NODECnet, the TP_SERVER process was not started because either:

o DECnet-Plus is not started or is not configured, or

o The SYS$NODE_FULLNAME logical name is not defined

This could be because when you installed DECnet-Plus and were prompted
for the system's full name, you specified a local name instead of a
DECdns or Domain name.

If you want to use DECdtm services, make sure that DECnet-Plus is started and
configured and that SYS$NODE_FULLNAME is defined, then use the following
command to start the TP_SERVER process:

$ @SYS$STARTUP:DECDTM$STARTUP.COM

%LICENSE-E-NOAUTH, DEC OPENVMS-ALPHA use is not authorized on this node
-LICENSE-F-NOT_STARTED, License Management Facility is not started
-LICENSE-I-SYSMGR, please see your system manager
%LICENSE-F-NOT_STARTED, License Management Facility is not started
Running SYSTARTUP_VMS.COM

The OpenVMS system is now executing the site-specific startup commands.

%JBC-E-JOBQUEDIS, system job queue manager is not running
%TCPIP-F-NOFILE, cannot find file DSA3:[COMMON.SYSTEM]SYSUAF.DAT;


----END STARTUP.LOG-BAD-----------------------------------------------

I apologise for the formatting.

You can note that the device DSA3: is reporting, via the various f$getdvi calls, that it exists and is available, yet it does not appear in the show device output, nor will it mount.
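For reference, the checks I mean are along these lines (a simplified sketch of what mount_common.com is doing, not the exact code):

$ WRITE SYS$OUTPUT "DSA3: exists    = ", F$GETDVI("DSA3:", "EXISTS")
$ WRITE SYS$OUTPUT "DSA3: available = ", F$GETDVI("DSA3:", "AVL")

Both come back true on EMU2, yet as the rest of the log shows, DSA3: is still not actually usable.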

I am fairly certain, now, that the main server, EMU1, did NOT have DSA3: mounted as /cluster.

If you look in the log for the successful startup (attached), you will see that the remote disks show as (remote mount) in the show device output. When DSA3 cannot be mounted, the remote system's disks are not visible (see the startup.log-bad included above).

None of the remote system's disks (in this case EMU1's) are mounted with /cluster, except prior to the successful startup, when I added /cluster to the dsa3: mount in mount_common and then rebooted the production server.
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

First I am assuming that these logs are from booting EMU2. The "node=EMU1" is being printed in the WAIT_FOR_MASTER/NODE_CHECK section, and it confused me for a while.

Note that in the log of the failed attempt, there are no $3$DKA devices listed in the show device d output, but when it worked, the $3$DKA devices were listed. You just need to wait for the configure process to detect the MSCP served devices before moving on.

In the EMU2 node specific section, right after gosub WAIT_FOR_MASTERS, you wait for device dkc300 to become available. But $4$DKC300 is on the EMU2 node. You need to check for the EMU1 served device, $3$DKA300 instead. See my example using LD devices.
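In other words, something along these lines (an untested sketch; the device name, label names, and retry limit are assumptions you would adjust):

$ retry_count = 0
$WAIT_FOR_SERVED:
$ IF F$GETDVI("$3$DKA300:", "EXISTS") THEN GOTO SERVED_READY
$ retry_count = retry_count + 1
$ IF retry_count .GT. 60 THEN EXIT   ! give up after about five minutes
$ WAIT 00:00:05
$ GOTO WAIT_FOR_SERVED
$SERVED_READY: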

I made changes (no testing done, so read carefully and use DIFFERENCES to see what was changed before attempting to use it) and am attaching the changed version as a .txt file.

In addition to changing which device to wait for, I also replaced the device names and label in the mount commands with symbols that are initialized at the top of the procedure. I also changed the text that is printed so it is less confusing to someone that looks at the console output.
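Roughly, the top of the procedure now has something like this (illustrative only; the real names and label are in the attached file):

$! Device names and volume label defined once, used by every mount command.
$ local_member  = "$4$DKC300:"      ! member local to EMU2
$ served_member = "$3$DKA300:"      ! member served by EMU1
$ shadow_unit   = "DSA3:"
$ shadow_label  = "COMMON"          ! placeholder; substitute the real label

and the mount commands then become, for example:

$ MOUNT/SYSTEM 'shadow_unit' /SHADOW=('served_member','local_member') 'shadow_label'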

You have some code that is cluster related that I would review. Specifically, how are you protecting yourself from a partitioned cluster? Is Wombat a quorum node? I put some of my questions in comments in the attached command procedure; they will pop out in the output from differences.

Good Luck,

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hi Jon,

" First I am assuming that these logs are from booting EMU2. The "node=EMU1" is being printed in the WAIT_FOR_MASTER/NODE_CHECK section, and it confused me for a while."

Yes, you are right. I apologise for the confusion.

" Note that in the log of the failed attempt, there are no $3$DKA devices listed in the show device d output, but when it worked, the $3$DKA devices were listed. You just need to wait for the configure process to detect the MSCP served devices before moving on.
"

Indeed, this is a fault on my part. I should be looking for the $3$dka devices, not the local ones. Thank you for finding that.

" You have some code that is cluster related that I would review. Specifically, how are you protecting yourself from a partitioned cluster? Is Wombat a quorum node? I put some of my questions in comments in the attached command procedure; they will pop out in the output from differences.
"

I would very much appreciate your feedback on it. No, Wombat is not a quorum node, not yet. At present the total votes are 1, with EMU1 having 1 vote and EMU2 having 0 votes. EMU2 is purely used to receive data at a remote location and to be switched over to production should a disaster occur.
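In MODPARAMS.DAT terms that amounts to roughly the following (quoted from memory rather than copied from the systems):

! EMU1's SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT
VOTES = 1
EXPECTED_VOTES = 1
! EMU2's MODPARAMS.DAT
VOTES = 0
EXPECTED_VOTES = 1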


I will be running reboots tomorrow & the weekend, so I can try out your changes and see how it works.

I really appreciate the time you have put into answering my queries, you and Hoff are a God-send.

Regards,
Mark.
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hi Jon (and others),

After reboot testing ad nauseam, and with your considerable input, I have finally got a predictable outcome 100% of the time.

The shadowed volume DSA3: now always starts up, after a short delay.

Jon, as you suggested, I was trying to perform the duty of the cluster voting scheme by checking if Emu1 was up before Emu2 could continue. I have since removed all that rubbish code, so the command file is much more streamlined and compact.

I have attached the mount_common.com command file for others to use in the future.

Many thanks to everyone and especially Jon.

Regards,
Mark
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

See the previous message