Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy Begg
Trusted Contributor

Hi,

A customer site has two clustered AlphaServer DS25s running VMS 8.2. On the weekend they had a power outage and when power was restored two batch queues didn't start.

The queues in question are configured with /AUTOSTART_ON=(NODE1::,NODE2::) so that (hopefully) they would fail over from one machine to the other in the event the "current" machine went down for some reason.

The system startup procedure does this:

$ INIT/QUEUE/BATCH/AUTOSTART_ON=(NODE1::,NODE2::) -
PROD$DETACH /BASE=3 /JOB=10
$ INIT/QUEUE/BATCH/AUTOSTART_ON=(NODE1::,NODE2::) -
TEST$DETACH /BASE=2 /JOB=20
$ ENABLE AUTOSTART/QUEUES

Is that incorrect (i.e. is that the wrong way to set up an autostart queue at system boot)? The OpenVMS "Cluster Systems" manual doesn't make it entirely clear (in my opinion).

Or was it a result of the sudden failure of both systems, e.g. because the queue manager got confused when they rebooted?

Thanks,
Jeremy Begg
19 REPLIES
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

did the customer capture the output of the startup procedure or the INIT/QUEUE commands ? Any error messages ?

Did the queues get re-created ? If so, you would be missing the START/QUE command to actually start the queue.

ENABLE AUTOSTART/QUEUES would only start AUTOSTART_ON queues, which had actually been started. If such a queue had manually been stopped, the above command would not have started that queue.

Volker.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Hi Volker,

Logs from both machines show the message
%JBC-I-QUENOTMOD, modifications not made to running queue
but don't indicate which queues the messages refer to. This is before the ENABLE AUTOSTART/QUEUE command. (There are multiple batch queues being started besides the two autostart queues. I had better see why the startup procedure tries to start the "other" node's batch queues.)

What do you mean by "did the queues get re-created"? The startup procedure does an INIT/QUEUE/BATCH command (see my original query, above) but the queues would have existed prior to the INIT/QUEUE commands. The queues would have been running before power was removed from the systems.

(Note that the systems were *not* shut down; someone just turned off the power, and the systems rebooted by themselves when power was restored.)

Thanks,
Jeremy Begg
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

%JBC-I-QUENOTMOD can only refer to queues already running, so this message couldn't have been related to those 2 autostart batch queues.

By 're-created' I meant: if the queues had not existed during startup, the commands shown would have created them but not started them. If there were existing jobs in those queues, that would prove they had existed before.

Did you check OPERATOR.LOG for any QMAN related error messages ?

Does SHOW QUE/MANA/FULL show a cluster-wide UNIQUE location of the queue-manager database ?

Volker.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Hi Volker,

The queues existed during startup and had jobs retained in them. (I know this because the jobs currently in the PROD$DETACH queue were submitted before the reboot, and the queue is not the target of a generic queue.)

No suspicious messages in OPERATOR.LOG on either node.

Queue manager database is same on both nodes:

%SYSMAN-I-OUTPUT, command execution on node NODE1
Master file: DISK2:[SYSTEM.FILES]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on NODE1::
/ON=(*)
Database location: DISK2:[SYSTEM.FILES]
%SYSMAN-I-OUTPUT, command execution on node NODE2
Master file: DISK2:[SYSTEM.FILES]QMAN$MASTER.DAT;
Queue manager SYS$QUEUE_MANAGER, running, on NODE1::
/ON=(*)
Database location: DISK2:[SYSTEM.FILES]

Thanks,
Jeremy Begg
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

did the ENABLE AUTOSTART/QUEUES command actually get executed during startup ? Are these the only autostart queues on the system ?

What did you do to start the queue again ?

Sorry for those questions, but in general I assume that OpenVMS is working and I'm trying to find the error elsewhere.

Volker.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Hi Volker,

The ENABLE AUTOSTART/QUEUES command was run on both nodes during startup. There are a number of autostart printer queues which appear to have started OK; it's only the two autostart batch queues which didn't start.

To get them going I used $ START/QUEUE.

Regards,
Jeremy Begg
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

what was the state of the queues before you started them manually ?

You could add the /START qualifier to the INIT command, but then that would start the autostart-queues, even if they had been manually stopped before (by a system manager).

Do you have a test system, where you can test the various permutations ?

Volker.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

They were "stopped, autostart inactive" before I started them.

I don't currently have a test cluster to reboot at will. (Perhaps I should consider building an emulated one.)

Thanks
Jeremy
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

'stopped, autostart inactive' means that the queue had DEFINITELY been stopped somehow. 'autostart inactive' means that the queue would NOT be started by ENA AUTO/QUEUES.

This is the expected state if you have issued your INIT/QUEUE... command (and the queue had just been created).

Or if you had previously stopped the queue manually.

As a workaround, consider adding a START/QUEUE command to your startup.

You should try to reproduce this with a single test system first. If you can't reproduce it, you might need a cluster. Using an emulator would make a lot of sense here; you now have the choice between PersonalAlpha and FreeAXP.

Volker.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

The "somehow" is the key. I can confirm that jobs were running in the queue PROD$DETACH at 5:15am and there is nothing to indicate they stopped before the system failed at 4pm (approx).

Therefore I suspect the queue was stopped as a side-effect of the unplanned shutdown (removal of power).

Regards,
Jeremy
Graham Burley
Frequent Advisor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

There's no need to INIT queues in the system startup; they already exist and only need to be started.
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

do you have any 'UPS powerfail' software being run when your power is about to get lost ?

If you just remove power from your AlphaServers, there is nothing that 'can happen' from OpenVMS. The state of the queues is stored in SYS$QUEUE_MANAGER.QMAN$QUEUES.

I tried to simulate this problem with an autostart queue on my PersonalAlpha (running E8.4) and - as expected - there were no problems with the queue after HALTing and booting the system.

You may have a hard time trying to reproduce this.

Volker.
Steve Reece_3
Trusted Contributor
Solution

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Hi Jeremy,

There's a lot of confusion here, for sure!

As commented, there's no necessity to reinitialize the queues provided that they are still there. The queue manager is, technically, only paused when the system shuts down. So, the old (pre v5) requirement to reinitialize has gone away.

That said, when you first initialize an /AUTOSTART queue you have to put a /START in there too. Otherwise, the queue doesn't start. Once the queue has been started first time, the autostart will work and the queue will restart when the ENABLE AUTOSTART/QUEUES is processed - unless of course you reinitialize the queue!

The correct fix here is probably to check whether the queues exist and, if they do, just do a SET QUEUE to modify the characteristics as required. Alternatively, put the /START on both the INIT's.

Remember that the ENABLE AUTOSTART/QUEUE command is there so that the queues can be started later than the queue manager (if, for example, your batch jobs are on a different disk to the queue manager, and that disk needs mounting before the queues start).

Steve
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

I think I can reproduce this now. I used the following steps:

- INIT/QUE/BATCH/AUTOSTART_ON=node:: PROD$DETACH/BASE=3/JOB=10
- START/QUE
- verified that queue was IDLE
- ^P and BOOT

- there is no ENA AUTO/QUE in my E8.4 SYSTARTUP_VMS.COM

- the queue comes up as: stopped, autostart
- issued INIT/QUE commands from your startup
- state changes to: stopped, autostart inactive

and that's the problem ! This state requires a START/QUE after ENABLE AUTO/QUEUES ! The previous 'stopped, autostart' would have worked after a simple ENABLE AUTO/QUEUES.

So the solution for your setup would be:

- test whether the queue exists
- if not, re-create it, including /START
- if yes, use SET QUE to reset the attributes if necessary
- ENABLE AUTO/QUEUE

In addition, you could then also discuss with HP whether an INIT/QUE of an existing queue should change the state to 'autostart inactive'.
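
The steps above could be sketched in DCL roughly as follows. This is an untested sketch, not a drop-in replacement: it assumes F$GETQUI returns an empty string for a queue that does not exist, and it reuses the queue name, node names, base priority and job limit from the startup procedure quoted at the top of the thread. The symbol name "queue" is just an illustration.

```dcl
$! Sketch only. PROD$DETACH, NODE1/NODE2 and the limits come from the
$! original startup procedure; the existence test is an assumption
$! (F$GETQUI is expected to return "" when the queue is not defined).
$ queue = "PROD$DETACH"
$ IF F$GETQUI("DISPLAY_QUEUE","QUEUE_NAME",queue) .EQS. ""
$ THEN
$!    Queue is missing: create it and start it in the same command
$     INITIALIZE/QUEUE/BATCH/START -
        /AUTOSTART_ON=(NODE1::,NODE2::) -
        /BASE_PRIORITY=3 /JOB_LIMIT=10 'queue'
$ ELSE
$!    Queue exists: adjust its attributes without re-initializing it
$     SET QUEUE /BASE_PRIORITY=3 /JOB_LIMIT=10 'queue'
$ ENDIF
$ ENABLE AUTOSTART/QUEUES
```

The same block would be repeated (or turned into a small subroutine) for TEST$DETACH with its own values.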

Volker.
Volker Halle
Honored Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Jeremy,

I now also tested your commands with a normal reboot. And they don't work in that case either. Is this a recent change to your startup procedure ? Did you ever check whether this works after a normal reboot ?

Or did the INIT/QUE commands never change anything, because the queue was ACTIVE (on the other node), resulting in a %JBC-I-QUENOTMOD message ? This could explain why you are now seeing this for the first time !

I bet that you would also see this behaviour after a cluster shutdown.

Volker.
Jess Goodman
Esteemed Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

It's easy to see what is happening here just from the following:

$ SHOW QUEUE TESTQ/FULL
Batch queue TESTQ, idle, on AX21::
/AUTOSTART_ON=(AX21::) /BASE_PRIORITY=6 /JOB_LIMIT=1 /OWNER=[VMS,SYSTEM]
/PROTECTION=(S:M,O:D,G:R,W:S)

$ DISABLE AUTOSTART/ON=AX21::
$ SHOW QUEUE TESTQ/FULL
Batch queue TESTQ, stopped, autostart, on AX21::
/AUTOSTART_ON=(AX21::) /BASE_PRIORITY=6 /JOB_LIMIT=1 /OWNER=[VMS,SYSTEM]
/PROTECTION=(S:M,O:D,G:R,W:S)

$ INIT/QUEUE/BATCH/AUTOSTART=AX21:: TESTQ
$ SHOW QUEUE TESTQ/FULL
Batch queue TESTQ, stopped, autostart inactive, on AX21::
/AUTOSTART_ON=(AX21::) /BASE_PRIORITY=6 /JOB_LIMIT=1 /OWNER=[VMS,SYSTEM]
/PROTECTION=(S:M,O:D,G:R,W:S)

$ ENABLE AUTOSTART /ON=AX21::
$ SHOW QUEUE TESTQ/FULL
Batch queue TESTQ, stopped, autostart inactive, on AX21::
/AUTOSTART_ON=(AX21::) /BASE_PRIORITY=6 /JOB_LIMIT=1 /OWNER=[VMS,SYSTEM]
/PROTECTION=(S:M,O:D,G:R,W:S)

$ START/QUEUE TESTQ

So for /AUTOSTART queues, if you re-INIT the queue and do not use /START, the queue becomes autostart inactive, just as if you had first done STOP/RESET on the queue.

This is not really inconsistent. With non-autostart queues if the queue was already stopped and you re-INIT it without /START it obviously remains stopped. And if a non-autostart queue is not stopped and you attempt to re-INIT it the command is ignored:

$ INIT/QUEUE/BATCH TESTQ
%JBC-I-QUENOTMOD, modifications not made to running queue

I always /START all stopped batch queues, autostart or not, that run on a node when it boots. This is easy to do using QUEUE_COMMANDS.COM which is available at:
http://dcl.openvms.org/stories.php?story=08/08/08/4125819

$ @QUEUE_COMMANDS START/QUEUE/BATCH/NODE/NOON *
$ ENABLE AUTOSTART/QUEUES

If I have target queues on a node that I do not want to receive jobs after that node boots, I use SET QUEUE/CLOSE on them instead of STOP/QUEUE.
I have one, but it's personal.
Jeremy Begg
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Hi all,

I have been able to confirm your findings that using INIT/QUEUE after a reboot *before* ENABLE AUTOSTART/QUEUE is the problem.

When the system is freshly booted the queue state is "stopped, autostart". Using INIT/QUEUE changes this to "stopped, autostart inactive" - thus requiring an explicit START/QUEUE to get things going.

The INIT/QUEUE... is in the startup procedure to reset things like job limit and base priority -- a practice I've followed for many years (too many, perhaps?)

I'll see about changing that to SET QUEUE or even removing it altogether.

Thanks for your help!
Jeremy Begg
Steve Reece_3
Trusted Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

>>>The INIT/QUEUE... is in the startup procedure to reset things like job limit and base priority -- a practice I've followed for many years (too many, perhaps?)<<<


I don't think you're alone in having this kind of situation. I think most of the "established" VMS systems that I've taken over responsibility for do exactly the same. I guess it's thanks to VMS Engineering for having a history of not intentionally breaking things!

Steve
Richard W Hunt
Valued Contributor

Re: Correct combination of INIT/QUEUE/BATCH and ENABLE AUTOSTART/QUEUE

Just as a thought, I use F$GETQUI to test whether a queue exists at reboot time before I go to the trouble of INITing it. If it exists, I just start it. If it doesn't exist then I INIT it first and then START it in a separate step outside the IF-block that holds the INIT code.
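
A minimal sketch of that pattern, using the queue from earlier in this thread as a stand-in. It is untested and rests on the assumption that F$GETQUI returns an empty string when the queue does not exist; the queue name and /AUTOSTART_ON list are borrowed from the original question, not from my own procedure.

```dcl
$! Sketch only: queue name and node list are taken from the original
$! question and are placeholders here.
$ queue = "PROD$DETACH"
$ IF F$GETQUI("DISPLAY_QUEUE","QUEUE_NAME",queue) .EQS. ""
$ THEN
$!    Only reached when the queue is not in the queue database
$     INITIALIZE/QUEUE/BATCH/AUTOSTART_ON=(NODE1::,NODE2::) 'queue'
$ ENDIF
$! START is issued outside the IF block, whether or not INIT ran
$ START/QUEUE 'queue'
```

Note that an unconditional START/QUEUE will also restart a queue that someone stopped on purpose, so this only suits sites where that is the intended behaviour.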

As pointed out elsewhere in this thread, the QMAN file holds that information across reboots pretty reliably. I can tell when a re-INIT is needed. Haven't needed one since I've been on VMS 6.x and I'm now on 8.3

Further, the one time I needed the re-INIT that I can remember is when the QMAN file got corrupted across a non-graceful reboot caused by hardware failure, and I can't blame that on the queue manager or OpenVMS.
Sr. Systems Janitor