System Disk Restore & Queue Entries

 
Robert Atkinson
Respected Contributor

On restoring my Alpha 7.2-2 systems during a recent Disaster Recovery test, I noticed that the batch and print queues were empty.

I was a little shocked by this, although the image had been taken from a 'snap' copy of the system, so wasn't truly 'clean'.

Does anyone have any insight into why this happened, and how to correct it, so that the queue manager retains the entries on the rebuilt system?

Many thanks, Rob.
Lokesh_2
Esteemed Contributor

Re: System Disk Restore & Queue Entries

Hi,

How did you take the image backup? Did you include /IGNORE=(INTERLOCK,NOBACKUP)?

Thanks & regards,
Lokesh Jain
What would you do with your life if you knew you could not fail?
Robert Atkinson
Respected Contributor

Re: System Disk Restore & Queue Entries

I didn't need to do this, as the 'snap' copy doesn't have any open files on it.

However, the snap is taken with the queue manager files open, so perhaps this is the problem, although I would hope that it could recover itself.

Rob.
Lokesh_2
Esteemed Contributor

Re: System Disk Restore & Queue Entries

Hi,

In this case you can copy QMAN$MASTER.DAT and SYS$QUEUE_MANAGER.QMAN$QUEUES from your existing environment to your disaster recovery environment. You can use the CONVERT utility for this purpose.

Thanks & regards,
Lokesh Jain
What would you do with your life if you knew you could not fail?
Lokesh_2
Esteemed Contributor

Re: System Disk Restore & Queue Entries

and here are the steps:

Hope this helps:

Best regards,
Lokesh

______________________________________

Saving and Restoring the Queue Database



Each time you want to preserve changes to your queue configuration, save a copy of your queue database files. In this way, if your queue database files are not accessible, you can restore the queue database you have saved; you thus avoid having to redefine forms and characteristics and reinitialize each queue.
Saving Queue Database Files

To save a record-by-record copy of your queue database files while the queuing system is functioning, perform the following steps. This procedure saves definitions of queues, forms, and characteristics. No job information is preserved. (HP recommends not saving the journal file because timed and pending jobs might be reexecuted after the journal file is restored.)

How to Perform This Task



1. To save the master file, enter an OpenVMS Convert utility (CONVERT) command in the following format:

   CONVERT/SHARE QMAN$MASTER.DAT master-filename

   where master-filename is the name of the file to which QMAN$MASTER.DAT is to be copied.

   For more information about CONVERT, refer to the OpenVMS Record Management Utilities Reference Manual.

2. Enter a CONVERT command in the following format to save the queue file:

   CONVERT/SHARE SYS$QUEUE_MANAGER.QMAN$QUEUES queue-filename

   where queue-filename is the name of the file to which SYS$QUEUE_MANAGER.QMAN$QUEUES is to be copied.

3. Use the Backup utility (BACKUP) to save the files created with CONVERT. Use a command in the following format:

   BACKUP/LOG masterfile-name, queue-filename device:saveset-name/LABEL=label

   For more information about the Backup utility, refer to the HP OpenVMS System Management Utilities Reference Manual.

Example



The following example is a simple procedure showing how to save the queue database.

$ SET DEFAULT SYS$COMMON:[SYSEXE]
$ CONVERT/SHARE QMAN$MASTER.DAT MASTERFILE_9SEP.KEEP;
$ CONVERT/SHARE SYS$QUEUE_MANAGER.QMAN$QUEUES QFILE_9SEP.KEEP;
$ INITIALIZE MUA0: QDB
$ MOUNT/FOREIGN MUA0:
%MOUNT-I-MOUNTED, QDB mounted on _LILITH$MUA0:
$ BACKUP/LOG MASTERFILE_9SEP.KEEP,QFILE_9SEP.KEEP MUA0:QDB_9SEP.SAV/LABEL=QDB
%BACKUP-S-COPIED, copied SYS$COMMON:[SYSEXE]MASTERFILE_9SEP.KEEP;
%BACKUP-S-COPIED, copied SYS$COMMON:[SYSEXE]QFILE_9SEP.KEEP;
$ DISMOUNT MUA0:
Restoring Queue Database Files

When you restore queue database files, all queue, form, characteristic, and queue manager information is restored. However, information about jobs in the queues is not restored.

How to Perform This Task



1. If the queue manager is running, stop it by entering the STOP/QUEUE/MANAGER/CLUSTER command.

2. Delete all three queue database files. (You must delete all three files, even if only one or two of them are lost.)

   Note: When starting a queue manager on OpenVMS, the queue manager process always opens version number one of the queue journal file (SYS$QUEUE_MANAGER.QMAN$JOURNAL;1). For this reason, when you restore the queue system files with the Backup utility, you must ensure that the latest version of the queue journal file is version number one.

3. Use the MOUNT command to mount the disk or tape containing the queue database backup.

4. Use the Backup utility (BACKUP) to restore the queue file and master file from the save set you created in step 3 of Saving and Restoring the Queue Database. If the master file or queue file is stored in a location other than the default, make sure you restore it to the correct location or that you specify the new location when you start the queue manager.

   Note: When you restore your queue database, you must always restore both the master and queue files, even if you lost only one of those files.

5. Start the queue manager with the START/QUEUE/MANAGER command. Do not enter the /NEW_VERSION qualifier: a new, empty journal file will be created automatically.

Example



The following example is a simple procedure showing how to restore the queue database from tape.

$ STOP/QUEUE/MANAGER/CLUSTER
$ SET DEFAULT SYS$COMMON:[SYSEXE]
$ DELETE SYS$QUEUE_MANAGER.QMAN$JOURNAL;,SYS$QUEUE_MANAGER.QMAN$QUEUES;, -
_$ QMAN$MASTER.DAT;
$ MOUNT/FOREIGN MUA0:
%MOUNT-I-MOUNTED, QDB mounted on _LILITH$MUA0:
$ BACKUP/LOG MUA0:QDB_9SEP.SAV/SELECT=[SYSEXE]MASTERFILE_9SEP.KEEP; -
_$ QMAN$MASTER.DAT;
%BACKUP-S-CREATED, created SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT;1
$ SET MAGTAPE/REWIND MUA0:
$ BACKUP/LOG MUA0:QDB_9SEP.SAV/SELECT=[SYSEXE]QFILE_9SEP.KEEP; -
_$ SYS$QUEUE_MANAGER.QMAN$QUEUES
%BACKUP-S-CREATED, created SYS$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES;1
$ DISMOUNT MUA0:
$ START/QUEUE/MANAGER
What would you do with your life if you knew you could not fail?
Wim Van den Wyngaert
Honored Contributor

Re: System Disk Restore & Queue Entries

Check also John's http://h71000.www7.hp.com/openvms/journal/v1/backup.html.

But it doesn't explain why your entries were lost. In my opinion they should be on disk, and in my experience I have never lost queue entries after a crash.
Wim
Willem Grooters
Honored Contributor

Re: System Disk Restore & Queue Entries

Robert,
What happens when you start up the systems: does the startup re-create the queues from scratch (some sites do)? That would explain this behaviour.
Then a guess: are the definitions exactly equal? I can think of a scenario where you define a queue with /ON=(,) and submit from node B to this queue. If you then restore node A, the information may be lost if the queue files are not shared for some reason.
Willem Grooters
OpenVMS Developer & System Manager
Robert Atkinson
Respected Contributor

Re: System Disk Restore & Queue Entries

Willem, although we do hold the queue definitions in a command file, we don't run this on startup - it's purely there as a last resort.

I'm not sure why the entries didn't return, but your node comments did make me think.

I wonder if it's something to do with where the Cluster Queue Manager was running? If it was on Beta, but we reboot Alpha first, perhaps the jobs didn't hop over properly?

Anyway, VdeW's (can't say it sorry) previous post basically says you can't save the queue database without shutting down the Manager, so our method is a no-no from the start!

I'm going to resort to a SHOW QUEUE/ALL/FULL and picking up the pieces from there if the entries don't automatically come back.
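
As a rough sketch of that fallback, the listing could be written to a file before each backup so there is something to rebuild from. The output file name here is an assumption, not something from Rob's system:

```dcl
$ ! Sketch only: the output file name is an assumption, pick your own.
$ ! Write a full listing of every queue and every job (all users) to a file
$ ! on a disk that is included in the backup.
$ SHOW QUEUE/ALL_JOBS/FULL/OUTPUT=SYS$MANAGER:QUEUE_ENTRIES.LIS *
```

The /FULL listing includes the file specification and submit settings for each entry, which is what you would need in order to resubmit jobs by hand.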

Rob.
Martin P.J. Zinser
Honored Contributor

Re: System Disk Restore & Queue Entries

Hello Rob,

if the 'snap' of your disk does not contain a valid copy of the queue manager database, there is no way the queue manager can restore itself. Also ask yourself what you really want to restore: the queues, or the entries in the queues. Your safest bet for the queue definitions is to run the DCL script you already have to recreate them ;-). If you really need the queue entries, the only way I can think of is to shadow the disks across the network to your recovery site.

Greetings, Martin
Wim Van den Wyngaert
Honored Contributor

Re: System Disk Restore & Queue Entries

If the cluster crashed, the queue database should be complete. The database can only be corrupt if the queue manager was in the middle of an update at the moment you took your backup (or the cluster crashed).

So my guesses are:
1) Bad luck: the queue manager was busy at that moment. Did you check the system log file for messages during startup?
2) The entries are deleted somewhere during startup. Find out where.
3) The queue database definition shown by SHOW QUEUE/MANAGER/FULL is inconsistent (i.e. different) between cluster nodes, so the nodes are using different databases.
4) The entries executed before you looked (released because of an invalid time?). Initialize your queue with /RETAIN=ALL to find out.
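
A minimal sketch of checks 3 and 4, assuming SYS$BATCH stands in for whichever queue held the missing entries:

```dcl
$ ! Sketch only: SYS$BATCH is a stand-in for the queue you are checking.
$ ! Run this on every node and compare the master file and database locations.
$ SHOW QUEUE/MANAGERS/FULL
$ ! Keep completed jobs visible in the queue. On a queue the keyword is
$ ! ALL (or ERROR); ALWAYS is the keyword used with SUBMIT/RETAIN for a job.
$ SET QUEUE/RETAIN=ALL SYS$BATCH
$ SHOW QUEUE/ALL_JOBS/FULL SYS$BATCH
```

With retention on, a job that ran right after the restore stays in the queue with its completion status instead of silently disappearing.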
Wim
Willem Grooters
Honored Contributor

Re: System Disk Restore & Queue Entries

Another source you may want to check for reasons is accounting - if enabled. It may yield a clue why the jobs were lost.

Robert, like you said:
...
although the image had been taken from a 'snap' copy of the system, so wasn't truly 'clean'
...

If, in addition, the jobs were located on the other node (and finished there), that could mean a discrepancy that caused the jobs to disappear (I think, given the other remarks). In that case they might show up in accounting.
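
A sketch of such an accounting check, assuming accounting was enabled for batch processes and print jobs and that the records are in the default accounting file. The time window is an assumption; use the date of the recovery test:

```dcl
$ ! Sketch only: the time window is an assumption.
$ ! List accounting records for batch processes that ended in the window.
$ ACCOUNTING/PROCESS=BATCH/SINCE=YESTERDAY/FULL
$ ! Print jobs are logged as their own record type.
$ ACCOUNTING/TYPE=PRINT/SINCE=YESTERDAY/FULL
```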
Willem Grooters
OpenVMS Developer & System Manager
Jan van den Ende
Honored Contributor

Re: System Disk Restore & Queue Entries

Robert,

what I did not see above is whether you moved your queue files away from the system disk?
DEFINE (/SYSTEM or Clusterwide) /EXEC QMAN$MASTER .
Of course, use a (shadowed) disk that is ALWAYS available from ALL running nodes.
Moving the queue files is itself quite a cumbersome job. The easiest way is BACKUP/IGNORE=INTERLOCK (AND CHECK that they are copied completely; BACKUP just does the best it can, but definitely does NOT guarantee it!!), then reboot the entire cluster after changing the QMAN$MASTER definition in the bootstrap procedure.
Often, in disaster-tolerant configurations, a cluster reboot is very undesirable. On our move from HSZ disks to SAN we managed to find a time slot (at ugly hours, of course) in which we could block all printing for a few minutes, stop all batch jobs, make the BACKUP of the directory, do the DEFINE on all nodes except the one running the queue manager, and force the queue manager to fail over.
Yes, I know, this sequence is entirely unsupported, but it was the "best" we could concoct.
We DID have the organisation prepared for a cluster reboot in case of serious failure, but just getting those preparations made did not make us more popular! On the other hand, we DID get it right, and that was quite a contrast to the Unix and Citrix environments!
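
For illustration, the QMAN$MASTER definition might look like the sketch below. DISK$SHARED:[QUEUE_DB] is a hypothetical location, not one taken from this thread:

```dcl
$ ! Sketch only: DISK$SHARED:[QUEUE_DB] is a hypothetical location.
$ ! Point the queue manager at a master file directory off the system disk.
$ ! This would go in the site-specific startup (for example SYLOGICALS.COM)
$ ! on every node, so that all nodes resolve the same master file.
$ DEFINE/SYSTEM/EXECUTIVE_MODE QMAN$MASTER DISK$SHARED:[QUEUE_DB]
$ SHOW LOGICAL/SYSTEM QMAN$MASTER
```

As far as I recall, the logical only locates the master file; the location of the queue and journal files is recorded in the master file itself, so check SHOW QUEUE/MANAGERS/FULL afterwards.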

To get to your original question:
what exactly do you mean by "on restoring"?
what by "recovery test"?
Did you take your backups to other machines, or did you deliberately crash your active environment and get it rebooted? Do I understand correctly that your queues WERE still present, just the ENTRIES gone?

"the image taken from a snap copy". Stupid question just to be sure: WERE there entries at the moment of snapping? Before the image was taken, did you do a rebuild of the snap
(MOUNT/rebuild or SET VOLUME/REBUILD) ?

Well, a lot of questions & info, maybe if you fill in some details we can carry this a bit further.

Jan

Don't rust yours pelled jacker to fine doll missed aches.
Dale A. Marcy
Trusted Contributor

Re: System Disk Restore & Queue Entries

Willem's comments reminded me of something I encountered once. When I restored a system to a different processor, the disk names were different and none of the batch entries could run, because they could not locate the command procedures since the disk names changed. It was a while ago and my recollection is not very clear.
Robert Atkinson
Respected Contributor

Re: System Disk Restore & Queue Entries

To answer -A LOT- of your questions.

- All of the Queue Manager files sit on the system disk.

- The backup was taken using BACKUP/IGNORE=INTERLOCK on a live, running system (not from Snap disks as I originally said).

- The backup was then restored to another pair of ES40s, with exactly the same configuration as our live system, as part of a Disaster Recovery test.

I personally feel that the queue manager didn't like the journal file, so dumped it, hence no entries. This isn't ideal, but I can live with it as long as I know it's going to happen.

Many thanks for all of your replies, but please don't feel it's necessary to get to the bottom of this for my benefit. Of course, being VMS admins and consummate professionals, I'm sure you'll all still want to!

Rob.
Antoniov.
Honored Contributor

Re: System Disk Restore & Queue Entries

Hi Rob,
I think your trouble is caused by BACKUP/IGNORE=INTERLOCK.
When you take a backup while the queue manager is running, the queue manager's files are being updated continuously; the /IGNORE=INTERLOCK qualifier tells BACKUP to ignore the error (the error is treated as a warning), but the queue manager files in the save set are inconsistent.
You could issue STOP/QUEUE/MANAGER/CLUSTER before the backup, but that stops all running jobs (not a good idea).
It may be better to issue STOP/QUEUE/NEXT for every queue on the system and then START/QUEUE on the new system.
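
A minimal sketch of that sequence for one queue, assuming SYS$BATCH as an example queue name (repeat for each queue on the system):

```dcl
$ ! Sketch only: SYS$BATCH is an example queue name.
$ ! Let the current job finish, then stop the queue; pending jobs stay queued.
$ STOP/QUEUE/NEXT SYS$BATCH
$ ! ... take the backup here ...
$ ! Afterwards, and again on the restored system, start the queue.
$ START/QUEUE SYS$BATCH
```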

Bye
Antoniov
Antonio Maria Vigliotti