
Mounting of HBVS disks in sylogicals.com fails on a node.

 
SOLVED
MarkOfAus
Valued Contributor

Mounting of HBVS disks in sylogicals.com fails on a node.

When a node was shut down and subsequently rebooted, it failed in sylogicals.com to mount a shadowed disk set.
The disk set is mounted on the other machine as DSA3, consisting of the internal disk DKA300.
It fails with this message "%MOUNT-F-NOSUCHDEV".

This wouldn't be such a big issue if not for the fact it contains the SYSUAF, RIGHTSLIST, LICENSE etc.

The command in sylogicals:
mount/system dsa3:/shad=($4$dkc300) /noassist data3

Any assistance would be greatly appreciated.

Also, how do you stop the cluster messages about a node shutting down from appearing on other nodes (is it controlled centrally?)?

Cheers
Mark
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

The message was probably correct when it was issued: the device was not known.

Stick a time delay in front of the mount or use a retry loop with a delay in the processing; your bootstrap probably got to the MOUNT faster than the device configure process detected the particular device.

What I usually have is an f$getdvi("whatsit","EXISTS") lexical combined in a loop with a WAIT command, and an IF counter .LE. limit THEN GOTO label with related counter processing to avoid an infinite loop.

This logic is then usually wrapped into a subroutine, and the code mounting the volume calls the subroutine for each of the volumes.
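A minimal DCL sketch of that sort of wait subroutine (the device name, retry limit and delay below are placeholders, not values from the actual procedure under discussion):

$ WAIT_FOR_DEVICE: SUBROUTINE
$ ! P1 = device to wait for (e.g. "$4$DKC300"), P2 = retry limit
$   retry_limit = F$INTEGER(P2)
$   IF retry_limit .EQ. 0 THEN retry_limit = 12
$   counter = 0
$ RETRY:
$   IF F$GETDVI(P1,"EXISTS") THEN GOTO FOUND
$   counter = counter + 1
$   IF counter .GT. retry_limit THEN GOTO GIVE_UP
$   WAIT 00:00:10            ! pause, then poll again
$   GOTO RETRY
$ GIVE_UP:
$   WRITE SYS$OUTPUT "Device ''P1' never appeared; continuing anyway"
$ FOUND:
$ ENDSUBROUTINE

The mounting code would then do something like:

$ CALL WAIT_FOR_DEVICE "$4$DKC300" 12
$ MOUNT/SYSTEM DSA3: /SHADOW=($4$DKC300) /NOASSIST DATA3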

I'd probably scrounge up another member for that shadowset, too. A single-volume shadowset does certainly have some uses, but the configurations here are somewhat specialized. The biggest real benefit of RAID-1 HBVS comes only from having multiple spindles...
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hoff,

Thanks for your speedy reply (always appreciated!)


If I understand you correctly, then this is a multiple member shadowset. Node1 has dka300, Node 2 has dkc300.

Would running an io autoconfigure help?

There is already a delay in that routine, so it waits for the main server to be up before continuing; I will put the check in there.

Regards,
Mark
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

[[[If I understand you correctly, then this is a multiple member shadowset. Node1 has dka300, Node 2 has dkc300.]]]

That this is a multi-member shadowset wasn't obvious to me from what was posted -- on re-reading it, I can infer what was intended. (I don't like inferring these sorts of things, though. Tends to get me in (more) trouble. But I digress.)

Regardless, if this is a multi-member shadowset, I'd specify both devices on the shadowset virtual unit (VU) mount command. But that's me. Something like this:

mount/system -
dsa3:/shad=($4$dkc300:,$whatever$dka300) -
/noassist data3

I'd probably also look to string together the SCSI buses, assuming the (OpenVMS Alpha?) hosts, versions, and SCSI controllers permit it. And to enable port allocation classes.

[[[Would running an io autoconfigure help?]]]

With the timing of the discovery of the device? Probably not. It's already running. Well, explicitly running it might well perturb and/or delay things such that the devices are discovered and configured. But so would a wait-loop.

And as a side-note, do take a look at the SYS$EXAMPLES:MSCPMOUNT.COM example command procedure; that sort of processing can be useful in configurations that have nodes and served disks coming and going. (I don't like tossing MOUNT /CLUSTER around, due to bad experiences with same over the years. I tend to prefer issuing a MOUNT /SYSTEM on each node.)


John Gillings
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

Because of all the ways disks can be connected to an OpenVMS system, you can't necessarily just mount a disk.

Instead of spreading the code to mount a volume across many places, I prefer to collect all my MOUNT commands into a module which can be called when necessary. Abstract the idea of a "disk" into a logical entity and hide the detail. So, your SYLOGICALS might do something like:

$ @SYS$STARTUP:GET_DISK CLUSTER_DATA
$ IF .NOT. $STATUS
$ THEN
$ ! handle error
$ ENDIF

When GET_DISK has returned successfully, you know you can access the storage area via its logical name.

Let GET_DISK know the details of where CLUSTER_DATA is stored and how it's mounted.

Use F$GETDVI item "EXISTS" to see if the physical disks exist yet, with a time delay and retry if they're not visible. Then use F$GETDVI "MOUNTED" to check if you need to mount it. Finally you can mount the disk.

Using this type of mechanism you can make it very easy to move logical entities around, and change details like physical disk, shadowed or non-shadowed, how many members, and whether they're required to be mounted. In a split site, you can also implement blanket rules for mounting 3, 2 or 1 member shadow sets via user-defined SYSGEN parameters. My recommendation for mounting shadow sets is to wait for all members to be present and use /POLICY=REQUIRE_MEMBERS. This reduces the chances of mounting shadow sets backwards.
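As a rough sketch only (the device names, label and retry limits below are illustrative, not Mark's actual configuration), GET_DISK.COM for the CLUSTER_DATA entity might look something like:

$ ! GET_DISK.COM - P1 names the logical entity, e.g. CLUSTER_DATA
$ IF P1 .NES. "CLUSTER_DATA" THEN EXIT 44   ! only one entity in this sketch
$ count = 0
$ WAIT_LOOP:
$ IF F$GETDVI("$3$DKA300:","EXISTS") .AND. F$GETDVI("$4$DKC300:","EXISTS") THEN GOTO READY
$ count = count + 1
$ IF count .GT. 30 THEN EXIT 44             ! give up with an error status
$ WAIT 00:00:10
$ GOTO WAIT_LOOP
$ READY:
$ ! If the virtual unit already exists and is mounted, skip the MOUNT
$ IF F$GETDVI("DSA3:","EXISTS")
$ THEN
$    IF F$GETDVI("DSA3:","MNT") THEN GOTO DEFINE_LOGICAL
$ ENDIF
$ MOUNT/SYSTEM/NOASSIST DSA3: /SHADOW=($3$DKA300:,$4$DKC300:) -
      /POLICY=REQUIRE_MEMBERS DATA3
$ DEFINE_LOGICAL:
$ DEFINE/SYSTEM/EXEC CLUSTER_DATA DSA3:
$ EXIT 1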

Regarding the cluster messages, are you talking about OPCOM or connection manager messages? Maybe post a sample, and explain how and/or where you want the message to be written.
A crucible of informative mistakes
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hoff,

I apologise for making you infer; I was tardy in not fully explaining the situation.

"Regardless, if this is a multi-member shadowset, I'd specify both devices on the shadowset virtual unit (VU) mount command. But that's me. Something like this:

mount/system -
dsa3:/shad=($4$dkc300:,$whatever$dka300) -
/noassist data3
"

Why?

I have a common routine, see attached. As per your previous reply, I added a routine to check if the device exists; see the WAIT_FOR_DEVICE "subroutine". The key part applies to EMU2, i.e. if node .eqs. "EMU2"...

Emu2 owns the disk $4$dkc300, emu1 owns the disk $3$dka300. Together they happily form dsa3: (oh the irony!)

This is what happened in the startup.log after the changes were made:

-BEGIN LOG---------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 29-OCT-2007 13:07:19.30
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Device exists, ready to mount (dkc300)
%MOUNT-F-NOSUCHDEV, no such device available
MOUNT_COMMON.COM> End
-END LOG---------------------------------

Then I halted the console, and tried again, and this is the output from the successful startup:

-BEGIN LOG---------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 29-OCT-2007 13:21:08.33
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Device exists, ready to mount (dkc300)
%MOUNT-I-MOUNTED, DATA3 mounted on _DSA3:
%MOUNT-I-SHDWMEMCOPY, _$4$DKC300: (EMU2) added to the shadow set with a copy operation
%MOUNT-I-ISAMBR, _$3$DKA300: (EMU1) is a member of the shadow set
MOUNT_COMMON.COM> End
-END LOG---------------------------------

Is it not curious that it failed the first time but succeeded the second time without any modification to the routine?

Regards,
Mark
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

John,

I have two major routines for disk mounting. One is the one attached in the previous reply to Hoff. The other routine is called by systartup_vms.com to mount the data disks. This works ok (so far...)

The routine under discussion here has the sole purpose, in this circumstance, of mounting the shadowed disk(s) which contain the sysuaf, rightslist, license, proxy et al. The cluster is running; the other node is running and is the master for the dsa3 shadow set.

"Because of all the ways disks can be connected to an OpenVMS system, you can't necessarily just mount a disk.
"

I tried to do this with the routine; as Hoff also suggested, I took a look at mscp_mount and used its concepts in my own command file. So I am trying to get to your suggested mode of operation, but I seem to have some form of timing issue.


"Use F$GETDVI item "EXISTS" to see if the physical disks exist yet, with a time delay and retry if they're not visible. Then use F$GETDVI "MOUNTED" to check if you need to mount it. Finally you can mount the disk."

I would then be interested in your view of the routine I wrote. Are you saying that I should also check to see if logical device DSA3 is mounted? That I can do. I have perhaps wrongly assumed that if the primary server is up (in normal day-to-day operation), that DSA3 is already active & mounted.

As an aside, how can I prevent dsa3: from going into mount verification if the system shuts down - increase the timeout? Can I test for this in f$getdvi?

"My recommendation for mounting shadow sets is to wait for all members to be present and use /POLICY=REQUIRE_MEMBERS. This reduces the changes of mounting shadow sets backwards."

Oh, I would love to do this, but operational circumstances prevent it. Therefore, I have tried to ensure that the primary node "Emu1" is up, and only via the use of the USERD1 parameter will "Emu2" (the secondary node) come up by itself.


"Regarding the cluster messages, are you talking about OPCOM or connection manager messages? Maybe post a sample, and explain how and/or where you want the message to be written."

Sure can post it:

------------------------------------------
SHUTDOWN message on EMU1 from user MARK at _EMU2$OPA0: 08:59:00
EMU2 will shut down in 0 minutes; back up shortly via automatic reboot. Please
log off node EMU2.
Standalone
------------------------------------------

This confuses the users on EMU2, who start logging out (well at least they are well trained to follow operator messages :-) )

Regards,
Mark



MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

John,

Oops, I should have written:

This confuses the users on EMU1, who start logging out (well at least they are well trained to follow operator messages :-) )

I wrote EMU2 instead of EMU1.
The message appears on EMU1 users' terminals, and they don't know to check the specific node name, so they start logging out (and complaining).

Regards,
Mark
Bart Zorn_1
Trusted Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

regarding the suppression of the shutdown messages on other cluster members, do you use the logical name SHUTDOWN$INFORM_NODES ?
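If memory serves, it takes a comma-separated list of the cluster node names that SHUTDOWN.COM should notify; defining it to name only the node being shut down should keep the broadcast off the other members' terminals. A hedged sketch (I have not verified the exact value format):

$ ! Assumption: SHUTDOWN$INFORM_NODES lists the nodes SHUTDOWN.COM notifies
$ DEFINE/SYSTEM/EXECUTIVE_MODE SHUTDOWN$INFORM_NODES "EMU2"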

HTH,

Bart Zorn
Hoff
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Ok, so EMU1 is the primary and EMU2 is the secondary. I'd (still) mount the disks as previously stated, specifying all nodes and the full path. And I'd use the wait loop as previously specified. (I tend to combine the whole MOUNT sequence into the subroutine; the test for existence and the wait, a test for having been mounted and the MOUNT, etc.) And I'd look to configure shared SCSI buses (assuming the two systems are co-located within the range of appropriate SCSI cables), as this substantially improves uptime and reduces network load.

As for the disaster-level processing and the usual sorts of situations, I'd simply look to avoid starting the applications on the secondaries, or (better) to code the applications to use locks or such at startup to manage the election of a primary. Or (best) to code the environment to use all of the available cluster member nodes in parallel. I've found that manual switch-over processes tend to fail during disasters; best to have these set up as automatically as is reasonably feasible. Humans can tend to be the error trigger, particularly for seldom-used sequences.

If you are using humans as key components in the fail-over, you'll want to test the fail-over sequencing periodically.

If you'd like to chat on this topic using larger text windows, feel free to contact me off-line. Then one of us can publish up a summary for folks here, or similar such.

Stephen Hoffman
HoffmanLabs LLC

Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

If each of the shadow members has only a single system with a direct connection, i.e. if DKA300 is directly attached only to nodeA and DKC300 is directly attached only to nodeB, and you can't share the SCSI bus between the systems, you may be interested in trying to avoid a full copy when the member is reintroduced when a system boots.

If you are running non-VAX VMS 7.3+, you should be able to take advantage of write bitmaps to minimize the time it takes to return a member to steady state.

If a member's only path is via the system that is being shut down, that system can request that the member be dismounted by the other system (using sysman). The command to dismount is

$ dismount /policy=minicopy

The bitmap is created on the node that does the dismount, therefore the dismount must be done on a node that will remain up during the reboot.

I haven't used the method John Gillings recommended, [/policy=require_members], but I just tried it and it works.

So in syshutdwn

$! with LOG_IO priv
$! if member only accessible via this node
$! request other node to dismount member
$! with use of sysmanini this can be done with single dcl command line.

Contents of exe_other_node.sysmanini
set environment /node=
set profile/priv=log_io

$ define/user sysmanini exe_other_node.sysmanini
$ mcr sysman do dismount/policy=minicopy

When disks are mounted, you do not have to specify /policy=minicopy unless you want the mount to fail if the member can't be mounted without a full copy. If a minicopy bitmap exists, it will be used. You can specify /policy=require_members, although this is most important on the initial mount of the virtual unit, to ensure that the most recent member is used as the master.
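For illustration only, a sketch using the DSA3 member names mentioned earlier in this thread (the real MOUNT_COMMON.COM details may differ):

$ ! On reboot: name both members; if the surviving node kept a minicopy
$ ! bitmap from the dismount, only changed blocks are copied back.
$ MOUNT/SYSTEM/NOASSIST DSA3: /SHADOW=($3$DKA300:,$4$DKC300:) -
      /POLICY=REQUIRE_MEMBERS DATA3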

I've attached an example showing commands and their effect on bitmaps and remounting of a member (that was static during the time it was dismounted).

Good Luck,

Jon
it depends
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Here's the attachment I left off.

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Bart,

"do you use the logical name SHUTDOWN$INFORM_NODES"

No, but I must say that was the first thing I checked for.

Regards,
Mark.
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hoff,

" Ok, so EMU1 is the primary and EMU2 is the secondary. I'd (still) mount the disks as previously stated, specifying all nodes and the full path. And I'd use the wait loop as previously specified. (I tend to combine the whole MOUNT sequence into the subroutine; the test for existence and the wait, a test for having been mounted and the MOUNT, etc.) "

Ok, I will rationalise the approach; point taken.

"And I'd look to configure shared SCSI buses (assuming the two systems are co-located within the range of appropriate SCSI cables), as this substantially improves uptime and reduces network load."

The systems are geographically separated, and have their own closed fibre connections.


"As for the disaster-level processing and the usual sorts of situations, I'd simply look to avoid starting the applications on the secondaries, or (better) at coding the "

No problem there as the licensing we have precludes running the application on both servers. So the secondary server is really just idling behind the scenes as a real-time backup, receiving data.


"...Or (best) to code the environment to use all of the available cluster member nodes in parallel. I've found that manual switch-over processes tend to fail during disasters; best to have these set up as automatic as us reasonably feasible. Humans can tend to be the error trigger, particularly for seldom-used sequences."

I guess I don't have an option, given the constraints, so a manual switch-over is the only way I can go, for now.

Regards,
Mark.

MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Jon,

"If each of the shadow members has only a single system with a direct connection, i.e. if DKA300 is directly attached only to nodeA and DKC300 is directly attached only to nodeB, and you can't share the SCSI bus between the systems, you may be interested in trying to avoid a full copy when the member is reintroduced when a system boots."

You are correct. Each system has its own disks, no shared storage (VMS 7.3-2).

You are also astute. I had looked at the minicopy, for future usage, because the disk at issue today is only a 36GB disk, so when it comes back into the shadow set the full copy is fairly quick. When the 300G disks are added in the next few weeks, that "fairly quick" copy will be a "bloody long one".

"So in syshutdwn

$! with LOG_IO priv
$! if member only accessible via this node
$! request other node to dismount member
$! with use of sysmanini this can be done with single dcl command line.

Contents of exe_other_node.sysmanini
set environment /node=
set profile/priv=log_io

$ define/user sysmanini exe_other_node.sysmanini
$ mcr sysman do dismount/policy=minicopy
"

Brilliant! Thank you Jon, I will do this.


MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Bart,

"regarding the suppression of the shutdown messages on other cluster members, do you use the logical name SHUTDOWN$INFORM_NODES ?"


I think you may be onto something, though. I was under the impression that if it was blank it notified no nodes. Perhaps I should revise that assumption to "if it is blank, it will notify all nodes"?

Regards,
Mark.
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

If anyone is aware of a better way to handle the "member with a single connection" case than using sysman, I would be interested.

I have read the "HP Volume Shadowing for OpenVMS Alpha 7.3-2" manual, and it is silent on the subject as far as I know. The main focus of minicopy is for backups. However, from experience I can say that a master minicopy bitmap on a system with only an MSCP served connection is sufficient to avoid a full copy, and that the bitmap survives on the node it is created on, across the removal and reintroduction of the other node with the direct connection.

A nice "enhancement" to dismount/policy=minicopy would be the ability to specify a node for which the dismount should be initiated, and therefore where the master bitmap should be created.

For example:

$ dismount/policy=minicopy=node:omega $4$DKC300: ! not implemented !!!
would tell omega to dismount the member and create the master minicopy bitmap. Perhaps there could be a list of nodes specified, in which case the first node in the list that was currently a cluster member would master the bitmap. The check for LOG_IO privilege would be on the requesting node, so this assumes the security domain is the cluster, i.e. homogeneous privileges on all nodes of the cluster (shared SYSUAF).

Also, if anyone knows of any problems with my suggestion, I would also like to hear about them, as I have never seen this recommended or documented.

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Jon,

"I have read the "HP Volume Shadowing for OpenVMS Alpha 7.3-2" manual, and it is silent on the subject as far as I know. The main focus of minicopy is for backups. However, from experience I can say that a master minicopy bitmap on a system with only an MSCP served connection is sufficient to avoid a full copy, and that the bitmap survives on the node it is created on, across the removal and reintroduction of the other node with the direct connection.
"

The manual is helpful, but as you suggest, it is often one-tracked in its explanations. No alternative scenarios are given, and to me examples mean much more than paragraph after paragraph of explanatory notes. Often the manuals assume a level of OpenVMS knowledge on the reader's part that is not there.

"A nice "enhancement" to dismount/policy=minicopy would be the ability to specify a node for which the dismount should be initiated, and therefore where the master bitmap should be created.
"

This is a brilliant idea, and I can't understand why it isn't available BUT the LOG_IO privilege issue seems a sticking point and is probably why using sysman is the only way to do it.

I am going to use your suggestion today, first manually then in a command file at shutdown.
Martin Hughes
Regular Advisor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

I think we had a similar discussion about 6 months ago. Using SMISERVER to perform the dismount on another node is an option I hadn't considered. That would certainly allow you to create the master write bitmap where it belongs.

I'm still inclined to handle the dismount/mount processes manually though. Mounting and dismounting locally attached shadowset members can be a dangerous business, and I'd argue that you have less control if you automate the process. I tend to just write the mount/dismount scripts and then execute them when and where I choose.
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

I believe the thread Martin is referring to is this one:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1118643

After thinking a bit more about this, there should really be an option to "do the right thing" when dismounting a virtual unit at shutdown. I.e. if the virtual unit has at least one member that has a direct path to another cluster member, but there are some members of the virtual unit that are directly attached only to the system being shutdown, then the dismount should first initiate a dismount of the members that have no direct paths to other cluster members, and this dismount should create a minicopy bitmap on a cluster member that currently has the virtual unit mounted, and has a direct connection to one of the other members.

The purpose of doing this is to avoid full copies when the system being shut down reboots. Also, by dismounting the member, the remaining cluster nodes won't have to time out the connection to the (MSCP served) member that stops responding when the MSCP serving node shuts down. With HBMM, multiple cluster nodes can have master bitmaps; with minicopy this doesn't seem to be possible, as the master bitmap is created on the node that issues the DISMOUNT or MOUNT command.

Since this discussion is not related to "failure to mount disks", perhaps we should start a new topic discussing the use of minicopy during shutdown.

Jon
it depends
Jon Pinkley
Honored Contributor
Solution

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

Back to your original question about the "%MOUNT-F-NOSUCHDEV" message.

My guess is that this was complaining about the MSCP served member that had not yet been detected.

The SHADOW_SERVER process has to make sure that all members mounted by any node are accessible by the node that is attempting to mount the DSA virtual unit.

I can reproduce the error by doing the following, using the LDDRIVER. LD devices are not MSCP served, but they are loaded on each system, so it is easy to simulate the device not yet being detected.

NodeA:

$ ld connect DISK$ARCHIVE:[000000]DISK1.DSK;1 lda1 /share
$ ld connect DISK$ARCHIVE:[000000]DISK2.DSK;1 lda2 /share
$ show device ld

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
$4$LDA0:    (OMEGA)   Online               0
$4$LDA1:    (OMEGA)   Online               0
$4$LDA2:    (OMEGA)   Online               0
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:,$4$lda2:) itrcshad
%MOUNT-I-MOUNTED, ITRCSHAD mounted on _DSA999:
%MOUNT-I-SHDWMEMSUCC, _$4$LDA1: (OMEGA) is now a valid member of the shadow set
%MOUNT-I-SHDWMEMSUCC, _$4$LDA2: (OMEGA) is now a valid member of the shadow set
$ dism/pol=minicopy lda1


NodeB:

$ ld connect disk1.dsk lda1: /share
$! note we will not connect lda2, this simulates the MSCP served drive not being "seen" yet.

$ show dev ld

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
$4$LDA0:    (SIGMA)   Online               0
$4$LDA1:    (SIGMA)   Online               0
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:) itrcshad ! mount "local" device into shadow.
%MOUNT-F-NOSUCHDEV, no such device available
$ show dev dsa999

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
DSA999:               Mounted              0  (remote mount)                 1
$ ld connect DISK$ARCHIVE:[000000]DISK2.DSK;1 lda2 /share
$ mou/system/noassist/rebuild dsa999 /shadow=($4$lda1:) itrcshad
%MOUNT-I-MOUNTED, ITRCSHAD mounted on _DSA999:
%MOUNT-I-SHDWMEMCOPY, _$4$LDA1: (SIGMA) added to the shadow set with a copy operation
%MOUNT-I-ISAMBR, _$4$LDA2: (SIGMA) is a member of the shadow set
$

It would be nice if the error message indicated which device was not available.

My guess is that you should be checking for the other member of the shadowset, not the locally attached one.

Also, in your MOUNT_COMMON.COM command procedure, in the section where you mount the disk on EMU2 and EMU1 is in the cluster, you may as well specify all shadowset members and use $ mount/policy=require_members if you want to be certain that EMU2 doesn't create a single member shadowset with a stale member. Of course in the case where you have overridden the check for EMU1, you would not want this to occur.

Good luck,

Jon
it depends
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

An even more interesting scenario happens when all the devices are available to the system doing the mount, but the new member is not available to a node that already has the shadowset mounted.

i.e. to reproduce:

nodeA:

$ ld connect disk1.dsk lda1 /share /allo=4
$ mount/system dsa999 /shadow=($4$lda1:)

nodeB:

$ ld connect disk1.dsk lda1 /share /allo=4
$ ld connect disk2.dsk lda2 /share
$ mount/system dsa999 /shadow=($4$lda1:,$4$lda2:)

At this point the process mounting the shadowset just hangs. No error message is generated.

However the mount count on NodeA now shows up as 2 in show device dsa999:. The mount will complete when the lda2 device is connected on nodeA:

i.e. on nodeA:
$ ld connect disk2.dsk lda2 /share

At this point mount of shadowset completes on nodeB

This is not a situation that would normally occur with served devices, since MSCP served devices are seen by all cluster members. However, if the configure process was not running on a node that had a locally attached device, it is at least feasible that this situation could occur.

I was expecting to get the "%MOUNT-F-NOSUCHDEV" message, although it would have been confusing to someone that saw all the member devices available on the node doing the mount.

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hi Jon,

Sorry for the late reply, I hope you can still remember the discussion.

" My guess is that you should be checking for the other member of the shadowset, not the locally attached one. "

Yes, I have done that. I modified the mount_common.com routine to output more information to the startup.log (both attached).

-----startup.log-bad----------------------------------
%STDRV-I-STARTUP, OpenVMS startup begun at 13-NOV-2007 14:51:30.29
%DCL-I-SUPERSEDE, previous value of SYS$AUDIT_SERVER_INHIBIT has been superseded
Finish syconfig
SYLOGICALS.COM> Begin
MOUNT_COMMON.COM> Begin
node=EMU1
MOUNT_COMMON> Emu1 is in the cluster
MOUNT_COMMON> Checking device exists: dkc300
MOUNT_COMMON> Device exists, ready to check availability (dkc300)
MOUNT_COMMON> Checking device is available: dkc300

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
$4$DKC0:    (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
MOUNT_COMMON> Device available, ready to mount (dkc300)

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
$4$DKC0:    (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
$4$DKC100:  (EMU2)    Online               0
$4$DKC200:  (EMU2)    Online               0
$4$DKC300:  (EMU2)    Online               0
$4$DKC400:  (EMU2)    Online               0
$4$DQA0:    (EMU2)    Online               0
$4$DQA1:    (EMU2)    Offline              1
$4$DQB0:    (EMU2)    Offline              1
$4$DQB1:    (EMU2)    Offline              1
$4$DUA0:    (EMU1)    HostUnavailable      0
$4$DVA0:    (EMU2)    Online               0
%MOUNT-F-NOSUCHDEV, no such device available
MOUNT_COMMON> Checking device exists: dsa3
MOUNT_COMMON> Device exists, ready to check availability (dsa3)
MOUNT_COMMON> Checking device is available: dsa3

Device                Device           Error    Volume         Free  Trans Mnt
 Name                 Status           Count     Label        Blocks Count Cnt
DSA3:                 Mounted              0  (remote mount)                 1
$4$DKC0:    (EMU2)    Mounted              0  ALPHASYS1      41129727  198   1
MOUNT_COMMON> Device available, ready to mount (dsa3)
All's good
MOUNT_COMMON.COM> End
SYLOGICALS> Redefining Qman$Master
"QMAN$MASTER" = "DISK$COMMON:[COMMON.SYSTEM]" (LNM$SYSTEM_TABLE)
SYLOGICALS.COM> End
%%%%%%%%%%% OPCOM 13-NOV-2007 14:51:43.94 %%%%%%%%%%%

The operator console and logfile will not be enabled.
Change OPC$OPA0_ENABLE & OPC$LOGFILE_ENABLE in SYLOGICALS.COM
to enable them.
%RUN-S-PROC_ID, identification of created process is 2140010A
%RUN-S-PROC_ID, identification of created process is 2140010B
%STDRV-E-NOSUCHFILE, File SYS$STARTUP:VMS$CONFIG-050_AUDIT_SERVER.COM does not exist.
%LICENSE-F-BADLDBWRIT, error writing to license database DSA3:[COMMON.SYSTEM]LMF$LICENSE.LDB;
-RMS-E-DNR, device not ready, not mounted, or unavailable
Copyright 2003 Hewlett-Packard Development Company, L.P.
%RMS-F-DNR, device not ready, not mounted, or unavailable
%DECdtm-F-NODECnet, the TP_SERVER process was not started because either:

o DECnet-Plus is not started or is not configured, or

o The SYS$NODE_FULLNAME logical name is not defined

This could be because when you installed DECnet-Plus and were prompted
for the system's full name, you specified a local name instead of a
DECdns or Domain name.

If you want to use DECdtm services, make sure that DECnet-Plus is started and
configured and that SYS$NODE_FULLNAME is defined, then use the following
command to start the TP_SERVER process:

$ @SYS$STARTUP:DECDTM$STARTUP.COM

%LICENSE-E-NOAUTH, DEC OPENVMS-ALPHA use is not authorized on this node
-LICENSE-F-NOT_STARTED, License Management Facility is not started
-LICENSE-I-SYSMGR, please see your system manager
%LICENSE-F-NOT_STARTED, License Management Facility is not started
Running SYSTARTUP_VMS.COM

The OpenVMS system is now executing the site-specific startup commands.

%JBC-E-JOBQUEDIS, system job queue manager is not running
%TCPIP-F-NOFILE, cannot find file DSA3:[COMMON.SYSTEM]SYSUAF.DAT;


----END STARTUP.LOG-BAD-----------------------------------------------

I apologise for the formatting.

You can note that the device DSA3: is reporting, via the various f$getdvi calls, that it exists and is available, yet it does not appear in the show device output NOR will it mount.

I am fairly certain, now, that the main server, EMU1, did NOT have DSA3: mounted as /cluster.

If you look in the log for the successful startup (attached), you will see that the remote disks are showing as (remote mount) in the show device output. When DSA3 is not able to be mounted, the remote system's disks are not visible (see the above startup log-bad).

None of the remote system's (in this case EMU1's) disks are mounted /cluster, except that prior to the successful startup I added /cluster to the dsa3: mount in mount_common and then rebooted the production server.
Jon Pinkley
Honored Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Mark,

First I am assuming that these logs are from booting EMU2. The "node=EMU1" is being printed in the WAIT_FOR_MASTER/NODE_CHECK section, and it confused me for a while.

Note that in the log of the failed attempt, there are no $3$DKA devices listed in the show device d output, but when it worked, the $3$DKA devices were listed. You just need to wait for the configure process to detect the MSCP served devices before moving on.

In the EMU2 node specific section, right after gosub WAIT_FOR_MASTERS, you wait for device dkc300 to become available. But $4$DKC300 is on the EMU2 node. You need to check for the EMU1 served device, $3$DKA300 instead. See my example using LD devices.
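In other words, something along these lines in the EMU2 branch (a sketch only; the actual symbol names, labels and flow in MOUNT_COMMON.COM may well differ):

$ ! EMU2 branch: wait for the member served by EMU1, not the local one
$ device_to_wait_for = "$3$DKA300"     ! served from EMU1 via MSCP
$ GOSUB WAIT_FOR_DEVICE
$ MOUNT/SYSTEM/NOASSIST DSA3: /SHADOW=($3$DKA300:,$4$DKC300:) DATA3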

I made changes (no testing done, so read carefully and use differences to see what was changed before attempting to use) and am attaching the changed version as a txt file.

In addition to changing which device to wait for, I also replaced the device names and label in the mount commands with symbols that are initialized at the top of the procedure. I also changed the text that is printed so it is less confusing to someone that looks at the console output.

You have some code that is cluster related that I would review. Specifically, how are you protecting yourself from a partitioned cluster? Is Wombat a quorum node? I put some of my questions in comments in the attached command procedure; they will pop out in the output from differences.

Good Luck,

Jon
it depends
MarkOfAus
Valued Contributor

Re: Mounting of HBVS disks in sylogicals.com fails on a node.

Hi Jon,

" First I am assuming that these logs are from booting EMU2. The "node=EMU1" is being printed in the WAIT_FOR_MASTER/NODE_CHECK section, and it confused me for a while."

Yes, you are right. I apologise for the confusion.

" Note that in the log of the failed attempt, there are no $3$DKA devices listed in the show device d output, but when it worked, the $3$DKA devices were listed. You just need to wait for the configure process to detect the MSCP served devices before moving on.
"

Indeed, this is a fault on my part. I should be looking for the $3$dka devices, not the local ones. Thank you for finding that.

" You have some code that is cluster related that I would review. Specifically, how are you protecting yourself from a partitioned cluster? Is Wombat a quorum node? I put some of my questions in comments in the attached command procedure; they will pop out in the output from differences.
"

I would very much appreciate your feedback on it. No, Wombat is not a quorum node, not yet. At present the total is 1 vote, with Emu1 having 1 vote and Emu2 having 0 votes. Emu2 is purely used to receive data in a remote location and to be "switched on to production" should a disaster occur.


I will be running reboots tomorrow and over the weekend, so I can try out your changes and see how they work.

I really appreciate the time you have put into answering my queries, you and Hoff are a God-send.

Regards,
Mark.