Re: Manual Que Manager & Print Que Cluster Failover

Jack Trachtman · ‎05-30-2008

I've been reading through the forum but don't seem to have found the exact answer to my question:

As part of a scheduled cluster node shutdown, we'd like to force the queue manager and the telnet symbiont processes to migrate to the remaining node before the actual shutdown, rather than waiting for the automatic failover that the shutdown itself would trigger. (At present the queue manager is configured to run on any node, and the print queues are all defined with /AUTOSTART=(an explicit node list).)

How can I configure for this setup?

What command(s) would I use to initiate the failover?

TIA

Hoff · ‎05-30-2008

The forums? Not where I'd go. There is a chapter or so in the system manager's manual on queues and queue processing.

You could use node-specific queues (and you probably already do), stop the queue(s) of interest, and then merge their contents (e.g. ASSIGN /MERGE) into a queue on another host.
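
A minimal sketch of that stop-and-merge approach, assuming two node-specific print queues with the hypothetical names LPT$TOM (on the node going down) and LPT$JERRY (on the node staying up). Substitute your own queue names. Note that ASSIGN/MERGE takes the target queue first and only moves jobs that are not executing.

```
$ stop/queue/next LPT$TOM            ! hypothetical queue name: finish the current job, start no new ones
$ show queue/full LPT$TOM            ! check that the queue has actually stopped
$ assign/merge LPT$JERRY LPT$TOM     ! target queue first, then source queue
```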

There are a gazillion tools around which skim the queues, and rebuild or reassign or otherwise manage the queues. (The FIXQUE.COM tool on the Freeware is a gonzo DCL tool that extracts all sorts of goodies out of the queue database, useful for rebuilding the queue database or as a launching board for other magicks.)

The direct command is STOP /QUEUE /ON_NODE ..., if you want to go that route. Read the DCL command reference documentation and the aforementioned System Manager's manual for details, and do expect active jobs will end up getting restarted.

Jon Pinkley · ‎05-30-2008

Make sure autostart is enabled on the node that will not shut down, then disable autostart on the node that will shut down.

Assume the queues are defined with /autostart=(tom::,jerry::) and you want to force all autostart queues to jerry:

JERRY$ enable autostart

TOM$ disable autostart

Or from any node:

$ enable autostart/queues/on_node=jerry::
$ disable autostart/queues/on_node=tom::

Examine sys$system:shutdown.com, and search for "disable". You will see that you can control when autostart queues are disabled.
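
For example, something like the following lists the relevant lines. The /window qualifier is only there to show a few lines of context around each match.

```
$ search sys$system:shutdown.com "disable" /window=(3,3)
```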

That will take care of the queues with /autostart. For generic batch queues with execution queues on specific nodes, it is best (my opinion) to use stop/queue/next to allow currently executing jobs to continue, while preventing new jobs from starting on the specific execution queue.

Example:

$ init queue/batch batch$tom /on=tom:: ...
$ init queue/batch batch$jerry /on=jerry:: ...
$ init queue/batch/generic=(batch$tom,batch$jerry)

When ready to shut down tom:

$ stop/queue/next batch$tom

I am not sure exactly what stop/queues/on_node does, but I believe it is more like stop/queue/reset, i.e. currently executing jobs are aborted. If your goal is to let as many batch jobs as possible complete without being aborted, then stop/queues/on_node isn't the appropriate tool.

I don't know of any single command to do a stop/queue/next for all execution queues on a specific node. A command procedure using f$getqui could scan through all the non-autostart execution queues, find the execution node, and issue stop/queue/next commands.
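
A rough, untested sketch of such a procedure follows. The file name STOP_NEXT_ON_NODE.COM and the use of P1 for the node name are illustrative assumptions, and there is no error handling (for instance, for queues that are already stopped).

```
$! STOP_NEXT_ON_NODE.COM (hypothetical name) - untested sketch
$! P1 = SCS node name of the node about to be shut down, e.g. TOM
$ node = f$edit(p1,"UPCASE,TRIM")
$ if node .eqs. "" then exit
$ junk = f$getqui("CANCEL_OPERATION")     ! clear any stale wildcard context
$loop:
$ qname = f$getqui("DISPLAY_QUEUE","QUEUE_NAME","*","WILDCARD")
$ if qname .eqs. "" then goto done        ! no more queues
$ ! FREEZE_CONTEXT asks about the same queue again without advancing
$ generic = f$getqui("DISPLAY_QUEUE","QUEUE_GENERIC","*","WILDCARD,FREEZE_CONTEXT")
$ auto = f$getqui("DISPLAY_QUEUE","QUEUE_AUTOSTART","*","WILDCARD,FREEZE_CONTEXT")
$ qnode = f$getqui("DISPLAY_QUEUE","SCSNODE_NAME","*","WILDCARD,FREEZE_CONTEXT")
$ qnode = f$edit(qnode,"UPCASE,TRIM")
$ if generic .eqs. "TRUE" .or. auto .eqs. "TRUE" then goto loop   ! skip generic and autostart queues
$ if qnode .nes. node then goto loop                              ! skip queues on other nodes
$ write sys$output "Stopping ''qname' after its current job"
$ stop/queue/next 'qname'
$ goto loop
$done:
$ exit
```

It would be invoked as, say, @STOP_NEXT_ON_NODE TOM shortly before the shutdown of tom.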

Jon

it depends

Wim Van den Wyngaert · ‎06-02-2008

We have autostart active on all nodes. It's done as the last action in the boot with enable autostart/que.

When we shut down a node, we first do disable autostart/queue to get the failover. We don't wait for completion of the running jobs (we check that manually before the shutdown). All startup/shutdown is done in non-autostart queues.

When a node crashes, failover is automatic. But we flag as errors all jobs that were executing at the moment of the crash. Of course we have /retain=error on all queues.
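
As an illustration of the /retain=error setting, here is a sketch that reuses the batch$tom queue name from the earlier example as a stand-in. The second command lists jobs that have been retained in that queue.

```
$ set queue/retain=error batch$tom                       ! keep failed jobs in the queue for inspection
$ show queue/all_jobs/by_job_status=retained batch$tom   ! list retained jobs
```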

Wim

Jess Goodman · ‎06-02-2008

To move the queue manager process to a specific node before shutting down the node that it is currently on, just do:

$ start/queue/manager/on=(STAY_UP_NODE,*)

To move the telnet symbiont processes you must move the queues. Since they are all /AUTOSTART then use:

$ disable autostart/queues/on_node=GOING_DOWN_NODE
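
Put together, a pre-shutdown sequence might look like the sketch below. STAY_UP_NODE and GOING_DOWN_NODE are placeholders, and the SHOW commands are only there to confirm where the queue manager is running before and after the move.

```
$ show queue/manager/full                               ! where is the queue manager running now?
$ start/queue/manager/on=(STAY_UP_NODE,*)               ! move it to the preferred node
$ enable autostart/queues/on_node=STAY_UP_NODE          ! make sure the surviving node can take the queues
$ disable autostart/queues/on_node=GOING_DOWN_NODE      ! autostart queues fail over to the other node
$ show queue/manager/full                               ! confirm the new location
```

The /ON node list is stored in the queue database, so it stays in effect until it is changed by a later START/QUEUE/MANAGER/ON command.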

I have one, but it's personal.

Wim Van den Wyngaert · ‎06-02-2008

We let the queue manager fail over by itself without intervention. We don't hit that failover case very often, but so far it has worked without any problems.

Something often forgotten in inter-building clusters is that spool files must be on a common disk (set dev/spool=common_dev, lpd printcap file).
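
For the spooled-device part, the full form of the command looks roughly like this. The queue name PRINT$Q1, the disk COMMON_DISK: and the device LTA101: are hypothetical. The point is only that the intermediate disk should be one that every node can reach.

```
$ set device/spooled=(PRINT$Q1,COMMON_DISK:) LTA101:    ! (queue name, intermediate disk) device
```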

Fwiw

Wim