Operating System - OpenVMS

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

 
SOLVED
MarkOfAus
Valued Contributor

Mounting of HBVS disks in sylogicals.com fails on a node.

When a node was shut down and subsequently rebooted, SYLOGICALS.COM failed to mount a shadowed disk set.
The disk set is mounted on the other machine as DSA3, consisting of the internal disk DKA300.
It fails with this message "%MOUNT-F-NOSUCHDEV".

This wouldn't be such a big issue if not for the fact it contains the SYSUAF, RIGHTSLIST, LICENSE etc.

The command in sylogicals:
mount/system dsa3:/shad=($4$dkc300) /noassist data3

Any assistance would be greatly appreciated.

Also, how do you stop the cluster messages about a node shutting down from appearing on other nodes (is it controlled centrally?)?

Cheers
Mark
26 REPLIES
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

The message was probably correct, when it was issued. The device was not known.

Stick a time delay in front of the mount or use a retry loop with a delay in the processing; your bootstrap probably got to the MOUNT faster than the device configure process detected the particular device.

What I usually have is an f$getdvi("whatsit","EXISTS") lexical combined in a loop with a WAIT command, and an IF counter .LE. limit THEN GOTO label and related counter processing to avoid an infinite loop.

This logic is then usually wrapped into a subroutine, and the code mounting the volume calls the subroutine for each of the volumes.
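Roughly, something like this (a sketch only; the device name, wait interval, and retry count are whatever suits your configuration):

$! Wait for the shadow member to be configured, then mount; illustrative only
$ device = "$4$DKC300"
$ count = 0
$Check_Device:
$ IF F$GETDVI(device,"EXISTS") THEN GOTO Device_Ready
$ count = count + 1
$ IF count .GT. 12 THEN GOTO Device_Missing   ! give up after about a minute
$ WAIT 00:00:05
$ GOTO Check_Device
$Device_Ready:
$ MOUNT/SYSTEM DSA3: /SHADOW=('device':) /NOASSIST DATA3
$ EXIT
$Device_Missing:
$ WRITE SYS$OUTPUT "Device ''device' never appeared; DSA3: not mounted"
$ EXIT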

I'd probably scrounge up another member for that shadowset, too. A single-volume shadowset does certainly have some uses, but the configurations here are somewhat specialized. The biggest real benefit of RAID-1 HBVS comes only from having multiple spindles...
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hoff,

Thanks for your speedy reply (always appreciated!)


If I understand you correctly, then this is a multiple member shadowset. Node1 has dka300, Node 2 has dkc300.

Would running an io autoconfigure help?

There is already a delay in the routine so that it waits for the main server to be up before continuing, so I will put the check in there.

Regards,
Mark
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

[[[If I understand you correctly, then this is a multiple member shadowset. Node1 has dka300, Node 2 has dkc300.]]]

That this is a multi-member shadowset wasn't obvious to me from what was posted -- on re-reading it, I can infer what was intended. (I don't like inferring these sorts of things, though. Tends to get me in (more) trouble. But I digress.)

Regardless, if this is a multi-member shadowset, I'd specify both devices on the shadowset virtual unit (VU) mount command. But that's me. Something like this:

mount/system -
dsa3:/shad=($4$dkc300:,$whatever$dka300) -
/noassist data3

I'd probably also look to string together the SCSI buses, assuming the (OpenVMS Alpha?) hosts, versions, and SCSI controllers permit it. And to enable port allocation classes.

[[[Would running an io autoconfigure help?]]]

With the timing of the discovery of the device? Probably not. It's already running. Well, explicitly running it might well perturb and/or delay things such that the devices are discovered and configured. But so would a wait-loop.

And as a side-note, do take a look at the SYS$EXAMPLES:MSCPMOUNT.COM example command procedure; that sort of processing can be useful in configurations that have nodes and served disks coming and going. (I don't like tossing MOUNT /CLUSTER around, due to bad experiences with same over the years. I tend to prefer issuing a MOUNT /SYSTEM on each node.)


John Gillings
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

Because of all the ways disks can be connected to an OpenVMS system, you can't necessarily just mount a disk.

Instead of spreading the code to mount a volume across many places, I prefer to move all my MOUNT commands into a module which can be called when necessary. Abstract the idea of a "disk" into a logical entity and hide the detail. So, your SYLOGICALS might do something like:

$ @SYS$STARTUP:GET_DISK CLUSTER_DATA
$ IF .NOT. $STATUS
$ THEN
$ ! handle error
$ ENDIF

When GET_DISK has returned successfully, you know you can access the storage area via its logical name.

Let GET_DISK know the details of where CLUSTER_DATA is stored and how it's mounted.

Use F$GETDVI item "EXISTS" to see if the physical disks exist yet, with a time delay and retry if they're not visible. Then use F$GETDVI "MOUNTED" to check if you need to mount it. Finally you can mount the disk.
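A skeleton of that sort of GET_DISK might look like this (purely a sketch, not a tested procedure; the entity-to-device mapping, retry limits and status values are illustrative):

$! GET_DISK.COM sketch: P1 is the logical entity name, e.g. CLUSTER_DATA
$ IF P1 .NES. "CLUSTER_DATA" THEN EXIT 16     ! only one entity known here; even status = failure
$ member = "$4$DKC300"                        ! where CLUSTER_DATA lives (example only)
$ retry = 0
$Wait_Loop:
$ IF F$GETDVI(member,"EXISTS") THEN GOTO Check_Mounted
$ retry = retry + 1
$ IF retry .GT. 20 THEN EXIT 16               ! device never appeared
$ WAIT 00:00:15
$ GOTO Wait_Loop
$Check_Mounted:
$ IF .NOT. F$GETDVI("DSA3","EXISTS") THEN GOTO Do_Mount
$ IF F$GETDVI("DSA3","MNT") THEN GOTO Define_Logical
$Do_Mount:
$ MOUNT/SYSTEM DSA3: /SHADOW=('member':) /NOASSIST DATA3
$ IF .NOT. $STATUS THEN EXIT 16
$Define_Logical:
$ DEFINE/SYSTEM/EXECUTIVE CLUSTER_DATA DSA3:
$ EXIT 1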

Using this type of mechanism you can make it very easy to move logical entities around, and to change details like physical disk, shadowed or non-shadowed, how many members, and whether they're required to be mounted. In a split site, you can also implement blanket rules for mounting 3-, 2- or 1-member shadow sets via user-defined SYSGEN parameters. My recommendation for mounting shadow sets is to wait for all members to be present and use /POLICY=REQUIRE_MEMBERS. This reduces the chances of mounting shadow sets backwards.
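With your two members, that sort of mount might look roughly like this (a sketch; allocation classes and the label are taken from this thread):

$ MOUNT/SYSTEM DSA3: /SHADOW=($3$DKA300:,$4$DKC300:) -
        /POLICY=REQUIRE_MEMBERS /NOASSIST DATA3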

Regarding the cluster messages, are you talking about OPCOM or connection manager messages? Maybe post a sample, and explain how and/or where you want the message to be written.
A crucible of informative mistakes
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hoff,

I apologise for making you infer; I was tardy in not fully explaining the situation.

"Regardless, if this is a multi-member shadowset, I'd specify both devices on the shadowset virtual unit (VU) mount command. But that's me. Something like this:

mount/system -
dsa3:/shad=($4$dkc300:,$whatever$dka300) -
/noassist data3
"

Why?

I have a common routine, see attached. As per your previous reply, I added a routine to check if the device exists; see the WAIT_FOR_DEVICE "subroutine". The key part applies to EMU2, i.e. if node .eqs. "EMU2"...

Emu2 owns the disk $4$dkc300, emu1 owns the disk $3$dka300. Together they happily form dsa3: (oh the irony!)

This is what happened in the startup.log after the changes were made:

-BEGIN LOG---------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 29-OCT-2007 13:07:19.30
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Device exists, ready to mount (dkc300)
%MOUNT-F-NOSUCHDEV, no such device available
MOUNT_COMMON.COM> End
-END LOG---------------------------------

Then I halted the console, and tried again, and this is the output from the successful startup:

-BEGIN LOG---------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 29-OCT-2007 13:21:08.33
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Device exists, ready to mount (dkc300)
%MOUNT-I-MOUNTED, DATA3 mounted on _DSA3:
%MOUNT-I-SHDWMEMCOPY, _$4$DKC300: (EMU2) added to the shadow set with a copy operation
%MOUNT-I-ISAMBR, _$3$DKA300: (EMU1) is a member of the shadow set
MOUNT_COMMON.COM> End
-END LOG---------------------------------

Is it not curious that it failed the first time but succeeded the second time without any modification to the routine?

Regards,
Mark
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

John,

I have two major routines for disk mounting. One is the one attached in the previous reply to Hoff. The other routine is called by systartup_vms.com to mount the data disks. This works ok (so far...)

The routine under discussion here has the sole purpose, in this circumstance, of mounting the shadowed disk(s) which contain the SYSUAF, RIGHTSLIST, LICENSE, proxy files et al. The cluster is running, and the other node is running and is the master for the DSA3 shadow set.

"Because of all the ways disks can be connected to an OpenVMS system, you can't necessarily just mount a disk.
"

I tried to do this with the routine and, as Hoff also suggested, I took a look at MSCPMOUNT.COM and used its concepts in my own command file. So I am trying to get to your suggested mode of operation, but I seem to have some form of timing issue.


"Use F$GETDVI item "EXISTS" to see if the physical disks exist yet, with a time delay and retry if they're not visible. Then use F$GETDVI "MOUNTED" to check if you need to mount it. Finally you can mount the disk."

I would then be interested in your view of the routine I wrote. Are you saying that I should also check to see if the logical device DSA3 is mounted? That I can do. I have perhaps wrongly assumed that if the primary server is up (in normal day-to-day operation), DSA3 is already active and mounted.

As an aside, how can I prevent dsa3: from going into mount verification if the system shuts down - increase the timeout? Can I test for this in f$getdvi?

"My recommendation for mounting shadow sets is to wait for all members to be present and use /POLICY=REQUIRE_MEMBERS. This reduces the changes of mounting shadow sets backwards."

Oh, I would love to do this, but operational circumstances prevent it. Therefore, I have tried to ensure that the primary node "EMU1" is up, and only via the use of USERD1 parameters will "EMU2" (the secondary node) come up by itself.


"Regarding the cluster messages, are you talking about OPCOM or connection manager messages? Maybe post a sample, and explain how and/or where you want the message to be written."

Sure can post it:

------------------------------------------
SHUTDOWN message on EMU1 from user MARK at _EMU2$OPA0: 08:59:00
EMU2 will shut down in 0 minutes; back up shortly via automatic reboot. Please
log off node EMU2.
Standalone
------------------------------------------

This confuses the users on EMU2, who start logging out (well at least they are well trained to follow operator messages :-) )

Regards,
Mark



MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

John,

Oops, I should have written:

This confuses the users on EMU1, who start logging out (well at least they are well trained to follow operator messages :-) )

I wrote EMU2 instead of EMU1.
The message appears on EMU1 users' terminals, and they don't know to check the specific node name, so they start logging out (and complaining).

Regards,
Mark
Bart Zorn_1
Trusted Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

Regarding the suppression of the shutdown messages on other cluster members: do you use the logical name SHUTDOWN$INFORM_NODES?
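If I recall correctly, SHUTDOWN.COM reads that logical as a list of nodes to be notified, so something like this on EMU2 should keep the announcement local (a sketch; adjust the node list to taste):

$ DEFINE/SYSTEM/EXECUTIVE SHUTDOWN$INFORM_NODES "EMU2"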

HTH,

Bart Zorn
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Ok, so EMU1 is the primary and EMU2 is the secondary. I'd (still) mount the disks as previously stated, specifying all nodes and the full path. And I'd use the wait loop as previously specified. (I tend to combine the whole MOUNT sequence into the subroutine; the test for existence and the wait, a test for having been mounted and the MOUNT, etc.) And I'd look to configure shared SCSI buses (assuming the two systems are co-located within the range of appropriate SCSI cables), as this substantially improves uptime and reduces network load.

As for the disaster-level processing and the usual sorts of situations, I'd simply look to avoid starting the applications on the secondaries, or (better) to code the applications to use locks or such at startup to manage the election of a primary. Or (best) to code the environment to use all of the available cluster member nodes in parallel. I've found that manual switch-over processes tend to fail during disasters; best to have these set up to be as automatic as is reasonably feasible. Humans tend to be the error trigger, particularly for seldom-used sequences.

If you are using humans as key components in the fail-over, you'll want to test the fail-over sequencing periodically.

If you'd like to chat on this topic using larger text windows, feel free to contact me off-line. Then one of us can publish up a summary for folks here, or similar such.

Stephen Hoffman
HoffmanLabs LLC