Continuation of earlier QMAN thread - questions

The Brit
Honored Contributor

We are now configured with a shared QMAN$MASTER, and everything seemed to be OK.
However...

1. We have not placed any restrictions on which nodes can host the queue manager, i.e. we did not specify any "/ON=,..."

Saturday night I shut down the Itanium cluster members but not the Alpha. When the blades came back up, all of the batch/print queue entries were gone on the Itaniums. In addition, on the Alpha, the couple of batch queues which run on that system were there, but all of their jobs were in a starting state.

My initial thought was that somehow the queue manager had failed over to the Alpha when the Itanium systems shut down, and had not failed back.
Since the Alpha needed booting anyway, I shut it down and rebooted. There were many messages indicating that the jobmanager had failed over to ECOM (one of the blades). However, the jobs which were in a starting state were still in a starting state and, even worse, all of the queues hosted on the Itanium blades were gone.

To clean up, I issued a "stop /queue/manager/cluster", followed by a "start /queue /manager/on=::". I then ran the necessary queue initialization procedures to regenerate all of the batch/print queues, and manually resubmitted all of the batch jobs.

Question 1:

If I issue a "start /queue /manager /On=(Node1::,Node2::)", will this implicitly exclude Node3:: and Node4:: from taking part in any failover? (This is the behaviour I want.)

Question 2:

Should this command be executed every time the systems boot? Should it be executed on all nodes?

Dave.
Volker Halle
Honored Contributor

Re: Continuation of earlier QMAN thread - questions

Dave,

Does SHOW QUE/MANA/FULL show the same Database Location on all nodes in the cluster? And is this location specified as a UNIQUE device and directory? Using SYS$COMMON, or any reference that indirectly points to SYS$SYSDEVICE, is incorrect!

$ START/QUE/MANA/ON=(Node1::,Node2::) excludes any nodes other than Node1 and Node2 from ever running the QUEUE_MANAGER process.

This command only needs to be executed ONCE. The information is stored in QMAN$MASTER.DAT. This file must also be UNIQUE within the cluster if you have a shared QMAN database.

The QUEUE_MANAGER does not automatically fail back to Node1 when Node1 is rebooted. If it is running on Node2 and Node1 is up, you can use the command START/QUE/MANAGER to force failover to Node1.

See the help text for $ HELP START/QUE/MANA/ON
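
As a rough sketch of the points above: NODE1 and NODE2 are placeholders for your real node names, and I am assuming the database location has already been set correctly, since it is not given here. Without a trailing "*" in the list, no other node is eligible to run the queue manager.

```
$ ! One-time command: restrict the queue manager to NODE1 and NODE2,
$ ! in order of preference. The list is recorded in QMAN$MASTER.DAT,
$ ! so this does not need to be repeated at every boot.
$ START/QUEUE/MANAGER/ON=(NODE1,NODE2)
$ !
$ ! Check which node is running the queue manager and which /ON list
$ ! is in effect.
$ SHOW QUEUE/MANAGER/FULL
$ !
$ ! Later, if the queue manager is running on NODE2 and NODE1 is back
$ ! up, this should move it to the first available node in the list.
$ START/QUEUE/MANAGER
```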

Volker.
The Brit
Honored Contributor

Re: Continuation of earlier QMAN thread - questions

Volker,
Here is the output of the "show/queue/manager/full", executed within SYSMAN.

BUD$SYSTEM>> mc sysman set env /clust
%SYSMAN-I-ENV, current command environment:
Clusterwide on local cluster
Username SYSTEM will be used on nonlocal nodes

SYSMAN> do show queu/mana/full
%SYSMAN-I-OUTPUT, command execution on node BUD
Master file: DSA101:[VMS$COMMON.SYSEXE]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on BUD::
/ON=(BUD)
Database location: SYS$COMMON:[SYSEXE]
%SYSMAN-I-OUTPUT, command execution on node CITIUS
Master file: DSA101:[VMS$COMMON.SYSEXE]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on BUD::
/ON=(BUD)
Database location: SYS$COMMON:[SYSEXE]
%SYSMAN-I-OUTPUT, command execution on node ECOM
Master file: DSA101:[VMS$COMMON.SYSEXE]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on BUD::
/ON=(BUD)
Database location: SYS$COMMON:[SYSEXE]
%SYSMAN-I-OUTPUT, command execution on node SPEEDY
Master file: DSA101:[VMS$COMMON.SYSEXE]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on BUD::
/ON=(BUD)
Database location: SYS$COMMON:[SYSEXE]
SYSMAN>

The logical QMAN$MASTER is defined as DSA101:[VMS$COMMON.SYSEXE]. DSA101 is a common cluster disk (and also the Itanium system disk). It is mounted on the Alpha in SYLOGICALS.COM.

I am a little concerned that all systems see the DB location as "SYS$COMMON:[SYSEXE]"

Dave
Hoff
Honored Contributor

Re: Continuation of earlier QMAN thread - questions

The cluster here is unusual, and I'd bring all the nodes up using a staged reboot. Things will tend to get considerably better when you get the core disks out onto the SAN, too.

And yes, use /ON.

For now, look to the restart to resolve this. Once you're on the SAN, or otherwise have common disks (system disk or not) for the cluster core files, look to MSCPMOUNT or similar to keep those disks online on all nodes.
Volker Halle
Honored Contributor
Solution

Re: Continuation of earlier QMAN thread - questions

Dave,


I am a little concerned that all system see the DB location as "SYS$COMMON:[SYSEXE]"


That is EXACTLY your problem! SYS$COMMON on Alpha cannot be the same as SYS$COMMON on Itanium!

You have to stop the queue manager cluster-wide once, then restart it specifying a unique QMAN database location:

$ STOP/QUE/MANA/CLUSTER
$ START/QUE/MANA DISK$itanium:[VMS$COMMON.SYSEXE]

(assuming your *.QMAN* files are to be kept on the Itanium system disk).

Note that ALL queues will be stopped when you issue the first command!
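
For this cluster, the sequence might look like the sketch below. It assumes that DSA101: (the device in the SHOW QUEUE/MANAGER/FULL output earlier in the thread) is mounted on every node, including the Alpha, and that the queue files stay in [VMS$COMMON.SYSEXE]. Adjust both if that is not the case.

```
$ ! Stops the queue manager, and with it all queues, on every node.
$ STOP/QUEUE/MANAGER/CLUSTER
$ !
$ ! Restart with the database location given as a physical device and
$ ! directory instead of letting it default to SYS$COMMON:[SYSEXE].
$ START/QUEUE/MANAGER DSA101:[VMS$COMMON.SYSEXE]
$ !
$ ! Check that "Database location" no longer reads SYS$COMMON:[SYSEXE].
$ SHOW QUEUE/MANAGER/FULL
```

Running the last command through SYSMAN on all nodes, as Dave did above, shows whether every node now reports the same location.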

Volker.
John Gillings
Honored Contributor

Re: Continuation of earlier QMAN thread - questions

Dave,

This is exactly the problem I tried to warn you about in your last thread. Please reread my responses.

Move the queue manager files off all system disks, and make sure the logical names are IDENTICAL on all nodes. If you can see any differences in SHOW LOGICAL QMAN$MASTER executed on any node in the cluster, it's wrong.
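
One way to compare the logical name across the cluster is through SYSMAN, along these lines (a sketch only; it assumes you have the privileges to run SYSMAN clusterwide):

```
$ MCR SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO SHOW LOGICAL/FULL QMAN$MASTER
SYSMAN> EXIT
```

Every node should report the same equivalence string, pointing at a physical device and directory rather than at SYS$COMMON or SYS$SYSDEVICE.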
A crucible of informative mistakes
The Brit
Honored Contributor

Re: Continuation of earlier QMAN thread - questions

Thanks Volker, for your help with this.

John, your responses were gratefully received and fully read. Unfortunately, I was not in a position to do the complete move to a common, NON-SYSTEM disk at this time, so we had to make the more important system disk "authoritative", at least with respect to the queue files.

The logicals were all correctly defined in SYLOGICALS. What I missed, however, was explicitly specifying the location of the queue database when I started the queue manager.

This was what Volker pointed out. Thanks.

As always, every day is a learning experience on this forum.

Dave.