
Re: Two Queuemanagers without shared db on cluster.

 
SDIH1
Frequent Advisor

Re: Two Queuemanagers without shared db on cluster.

We don't want that, I was just curious.
Volker Halle
Honored Contributor
Solution

Re: Two Queuemanagers without shared db on cluster.


Would there be a problem if you made another QMAN$MASTER.DAT in another directory on the same disk?


No problem: the parent resource name includes the file-id of the QMAN$MASTER.DAT file, so each master file gets its own lock tree. Not that this would make sense...
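(As a quick check, DIRECTORY/FILE_ID displays the file-id that ends up in the resource name; a minimal sketch, assuming the default master-file location:)

$ DIRECTORY/FILE_ID SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT   ! shows the (file-id) next to the file name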

Volker.
SDIH1
Frequent Advisor

Re: Two Queuemanagers without shared db on cluster.

OK, I did some testing. I started an independent queue manager on the quorum node, like this (node name obfuscated):

start/que/manager/new/on=(qnode)

show queue/manager/full shows this:

  Master file: SYS$SYSROOT:[SYSEXE]QMAN$MASTER.DAT;

  Queue manager SYS$QUEUE_MANAGER, running, on QNODE::
    /ON=(QNODE)
    Database location: SYS$COMMON:[SYSEXE]
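(For reference, a sketch of how such an independent queue manager can be set up. QMAN$MASTER is the documented logical for relocating the master file; it is only needed if the default SYS$COMMON:[SYSEXE] location won't do, and in this test the quorum node's own system disk made it unnecessary. The directory shown is hypothetical:)

$ DEFINE/SYSTEM/EXECUTIVE_MODE QMAN$MASTER SYS$SPECIFIC:[SYSEXE]   ! optional, non-default master-file location
$ START/QUEUE/MANAGER/NEW_VERSION/ON=(QNODE::)
$ SHOW QUEUE/MANAGER/FULL   ! confirm master file, node list and database location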

On the production nodes (there are four: A, B, C and D), the queue manager was running on node C. A reboot of node C made the 'production' queue manager shift to node A.

A STOP/QUEUE/MANAGER/CLUSTER command on QNODE stopped the queue manager on QNODE, but NOT on the production nodes, as hoped.
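(The scope check itself, as a sketch:)

$ STOP/QUEUE/MANAGER/CLUSTER    ! issued on QNODE: stops only the queue manager using QNODE's database
$ SHOW QUEUE/MANAGER/FULL       ! issued afterwards on a production node: should still show "running"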

A reboot of the quorum node did not affect the queue manager on the production nodes, and in the places I looked (the OPERATOR.LOG files) there was no evidence of any queue manager panicking over what to do.

I could have repeated this test 100 times, and rebooted all cluster nodes 100 times, but there really was no indication this would change the results, so I didn't.

Conclusions:

1. An independent queue manager works without problems, provided the START/QUEUE/MANAGER command has a carefully crafted node list in its /ON qualifier.

2. Although this is expected behaviour given the qualifiers available for starting the queue manager, it is rather challenging to extract this from the documentation. Many people weren't up to this challenge.

Any suggestions for more tests or things to try?
Jan van den Ende
Honored Contributor

Re: Two Queuemanagers without shared db on cluster.

Jose,

STOP/QUEUE/MANAGER is a rather controlled way of terminating a queue manager.
As far as I understood John Gillings' description, the real potential for trouble is when another node notices that a remote queue manager is gone (because the queue manager crashed, the node crashed, or connectivity disappeared).
You did not report on any such "catastrophe" scenario.

I am still very much in doubt about the wisdom of this configuration.

Cheers.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: Two Queuemanagers without shared db on cluster.

Jose,

Try a STOP/ID of the QUEUE_MANAGER process; it will just be restarted.
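(A minimal sketch of that test; the PID is of course hypothetical:)

$ SHOW SYSTEM            ! locate the QUEUE_MANAGER process and note its PID
$ STOP/ID=2040011A       ! hypothetical PID; JOB_CONTROL should simply restart the process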

The major issue is the correct specification of the QMAN$MASTER file location and its contents, i.e. the node(s) to run on and the physical location of the QMAN database files.

I see one lock (QMAN$ORB_LOCK) which is not a child of the Master File Access Lock and therefore is NOT unique for each QMAN$MASTER.DAT file in the cluster. This could be a potential problem, but it is only used if you set ACLs on the queues.
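(For anyone who wants to inspect these locks themselves, a sketch using SDA; the QMAN$ resource names are as Volker describes, not verified here:)

$ ANALYZE/SYSTEM
SDA> SHOW RESOURCE    ! scan the output for QMAN$... resource names and their parent resources
SDA> EXIT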

Volker.
SDIH1
Frequent Advisor

Re: Two Queuemanagers without shared db on cluster.

OK. I submitted 3 jobs (each doing a 20-minute wait) to the batch queue on the quorum node. Then I killed the queue manager process on the quorum node; the process came back immediately, all jobs still running.
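(The wait jobs can be as simple as this; WAIT20.COM and the queue name are made up:)

$! WAIT20.COM - keeps a batch job busy for 20 minutes
$ WAIT 00:20:00
$ EXIT

$ SUBMIT/QUEUE=QNODE_BATCH WAIT20.COM   ! submitted three times, one entry per job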

I killed it a second time and the queue manager process came back, then immediately a third time, after which it did not come back, OPERATOR.LOG stating INTERNALERROR, internal error caused loss of process status. A START/QUEUE/MANAGER got it up again; the jobs were still running.

Killing the queue manager on the production nodes showed comparable behaviour (I also STOP/ID'd the process 3 times); here too, the queue manager did not come back after the third STOP/ID. Furthermore, it favoured the node it had been running on before; I didn't see a switch to another node.

I could reproduce this behaviour of the queue manager process (usually coming back by itself, sometimes refusing to) on a standalone machine, so there seems to be no link between this behaviour and the two-queue-manager scenario.

So, even after a heavy beating, the queue managers do not seem to be affected by each other. I don't feel a crash on a node would introduce much more stress than a STOP/ID of the queue manager process does.

Ian Miller.
Honored Contributor

Re: Two Queuemanagers without shared db on cluster.

You should try crashing a node or two as well, and unplugging cluster connections. The queue manager is restarted when you stop it (by JOB_CONTROL?), which would not happen if the node crashed.
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: Two Queuemanagers without shared db on cluster.

If the node crashes, there is no JOB_CONTROL to restart the QUEUE_MANAGER. There will be no failover attempt, as - in this configuration - the QUEUE_MANAGER is only allowed to run on the local (quorum) node. Once OpenVMS boots, JOB_CONTROL will start the QUEUE_MANAGER on the local node...

I really can't see a problem with this configuration - although it may be 'unsupported' by HP.

Volker.
SDIH1
Frequent Advisor

Re: Two Queuemanagers without shared db on cluster.

OK. I crashed the quorum node, and after the boot the queue manager was started (as expected, since it did so after a normal reboot as well).
No signs of the queue manager wanting to start on other nodes.

I also crashed node A, which was running the queue manager for the production machines. The queue manager failed over to node B. No signs of the queue manager trying to start on PDCC0E.

No jobs missing, apart from the ones executing at the time of the crash without restart and/or retain options.
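(For reference, those are the standard job qualifiers; queue and file names here are hypothetical:)

$ SUBMIT/RESTART/QUEUE=PROD_BATCH JOB.COM   ! /RESTART allows the job to be requeued and rerun after a crash
$ SET QUEUE/RETAIN=ERROR PROD_BATCH         ! retain failed entries in the queue for inspection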

It all looks pretty robust. The technical merits are clear; whether it is wise to implement such a configuration depends not only on those merits but also on other, more arbitrary considerations, like personal preference, operational skills in an organisation and, if you're really unlucky, existing policies and prejudice.

I'm waiting for HP to answer the question of whether this configuration is officially considered supported, which in my case is one of those arbitrary considerations.

Thomas Ritter
Respected Contributor

Re: Two Queuemanagers without shared db on cluster.

We run a 4-node disaster-tolerant cluster: 2 nodes at each site, working over two fibre-optic links about 4 and 8 km in length. We run a single queue manager. Each node offers identical services. A major component of the system administrators' duties is to ensure that no one node is overburdened with work, the idea being that the cluster will only run as fast as the slowest node! We run Oracle/RDB with global buffering enabled, meaning millions of locks generated; lock-tree bouncing and CPU saturation are very real risks in our environment.

We use the queue manager to ensure that the workload is equitably balanced and that like work, with respect to database access, is performed on the same node(s). We achieve our workload spread by having carefully crafted /AUTOSTART_ON=() lists. Almost every queue is autostart-enabled. When a host is shut down, autostart is disabled and the queues fail over gracefully. At VMS startup, all the queues automatically balance back to the first entry in the autostart list.

If we ran separate queue managers, we would lose a lot of the flexibility we currently rely on.
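(A minimal sketch of the autostart pattern described above; node and queue names are hypothetical:)

$ INITIALIZE/QUEUE/BATCH/AUTOSTART_ON=(NODEA::,NODEB::) SITE1_BATCH
$ START/QUEUE SITE1_BATCH        ! queue becomes active on the first available node in the list
$ ENABLE AUTOSTART/QUEUES        ! issued on each node that may host autostart queues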