Operating System - OpenVMS

Booting from backup - old batch jobs

 
Duane Sadowski
Frequent Advisor


Hi. I'm writing with a question regarding booting from a restored copy of one's system disk. When our system boots, disks get mounted automatically, and the batch and print queues start automatically. (The system is a standalone system, not part of a cluster.) This makes it tricky to boot from a restored backup of the system disk, which I like to do occasionally to check the integrity of the backups. Batch jobs submitted with the /AFTER qualifier that were in the queue when the backup was created will start executing if their scheduled start time has passed.

Could someone please comment on ways people avoid this problem? (For example, do people have the startup command procedure issue a prompt? Do people manually start the queues after the system is back up?)

(I have a step in the site-specific startup command procedure to skip the mounting of the disks if a particular dummy file is present. If I know before I do a particular backup that I will later want to test its integrity, I can create the dummy file before shutdown, but this doesn't seem good as a general solution.)

- Duane
Robert Gezelter
Honored Contributor

Re: Booting from backup - old batch jobs

Duane,

I would consider using one of the user-specifiable SYSGEN parameters (USERD1, USERD2, USER3, USER4), which the startup procedures can read, together with a conversational boot to resolve this problem.

I would then add DCL to the startup procedure so that it does NOT start the queues when the parameter is set to the value you are using to indicate a backup verification.
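A minimal sketch of this, assuming USERD1 is the parameter chosen and that the value 1 is an arbitrary site convention meaning "backup verification". At an Alpha console the operator would start a conversational boot, for example BOOT -FLAGS 0,1 followed by the boot device, then enter SET USERD1 1 and CONTINUE at the SYSBOOT> prompt. F$GETSYI accepts system parameter names such as USERD1 as item codes; the label names are illustrative, and you should check that nothing else in your startup also enables autostart.

```dcl
$ ! Fragment for SYS$MANAGER:SYSTARTUP_VMS.COM (sketch).
$ ! USERD1 = 1, set by the operator at SYSBOOT, is an assumed convention
$ ! meaning "restored disk being verified: do not start the queues".
$ IF F$GETSYI("USERD1") .EQ. 1 THEN GOTO skip_queues
$ ENABLE AUTOSTART/QUEUES
$ GOTO queues_done
$ skip_queues:
$ WRITE SYS$OUTPUT "USERD1 = 1: batch and print queues were NOT started."
$ WRITE SYS$OUTPUT "Start them by hand with ENABLE AUTOSTART/QUEUES when ready."
$ queues_done:
```

A value entered at SYSBOOT may be kept in the system parameter file, so check USERD1 after the test boot and set it back to 0 (for example with SYSGEN) before the next normal boot.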

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
John Gillings
Honored Contributor

Re: Booting from backup - old batch jobs

Duane,

I've had exactly the opposite question from people who do BACKUP/IGNORE=INTERLOCK. Their queue manager files are never saved properly because they're open. Their complaint is that they WANT old batch jobs to start, so how can they make it happen?

Seems crazy to me, but customers are always right... I guess you can't please everyone all the time.

Robert's idea to use the USERDn SYSGEN parameters is a good one. It requires your startup procedures to check the parameters and to have tested code paths for each startup scenario you want to cover. The human recovery procedures also need to be written down, with all the details of EXACTLY how to restart in each circumstance, and your operations staff need the discipline to follow them precisely, even in the panic of recovering from some kind of problem or disaster. They also need to be tested!

One school of thought, especially for disaster-tolerant systems, is never to let a system boot by itself from power-on (default boot flags set to conversational boot) and never to let disks (especially shadow sets) mount automatically. You define a number of states your systems can be in, and manually let them progress through the states only when you've verified they're ready for the next one.

It may be more prudent to invert the USERDn model and have the system boot by default in FAILSAFE mode. Have a setting to override and specify a "fast boot" if you're sure everything is ready.
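The inverted arrangement might be sketched like this, again assuming USERD1 as the switch: the default value 0 takes the failsafe path, and only an operator who deliberately sets USERD1 to 1 at SYSBOOT gets the full ("fast") startup. MOUNT_DISKS.COM is a hypothetical stand-in for whatever procedure mounts your disks and shadow sets.

```dcl
$ ! Sketch: failsafe by default, full startup only on explicit request.
$ ! USERD1 = 1 as "fast boot" is an assumed site convention.
$ IF F$GETSYI("USERD1") .NE. 1 THEN GOTO failsafe_startup
$ ! Full ("fast") startup: the operator asked for it at SYSBOOT.
$ ! MOUNT_DISKS.COM is a hypothetical site procedure.
$ @SYS$MANAGER:MOUNT_DISKS.COM
$ ENABLE AUTOSTART/QUEUES
$ GOTO startup_done
$ failsafe_startup:
$ ! Default path: leave disks unmounted and queues stopped so that a
$ ! person can check the system before it moves to the next state.
$ WRITE SYS$OUTPUT "Failsafe startup: disks not mounted, queues not started."
$ startup_done:
```

The design choice is that doing nothing leaves the system in the safe state; the price is that any unattended reboot, including one after a power failure, waits for a person before disks are mounted and batch work resumes.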
A crucible of informative mistakes
Phil.Howell
Honored Contributor

Re: Booting from backup - old batch jobs

This is in our startup. If someone is there to answer the question with N, it does a minimal startup; otherwise (no answer within the 10-second timeout) it does a "world" startup.
Phil
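$set_startup:
$ say = "write sys$output"
$ say " "
$ ! Default answer if nobody replies before the timeout.
$ startup_answer = ""
$ open/read oper_console opa0:
$ ! A timeout is an error; /error= lets the procedure carry on.
$ read /error=read_done -
        /time_out = 10 -
        /prompt = "Is this startup a world startup [Y] ? " -
        oper_console startup_answer
$read_done:
$ close oper_console
$ if startup_answer .eqs. "" then startup_answer = "Y"
$ ! Y (or T) is true in DCL; N falls through to the minimal startup.
$ if startup_answer then goto not_stand_alone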
Ian Miller.
Honored Contributor

Re: Booting from backup - old batch jobs

Consider not using BACKUP/IGNORE=INTERLOCK and using DCL to save the state of the queues and jobs and restore it if necessary. There is DCL around from Keith Parris which does this.

Having jobs start when you restore is often a bad thing.

The USER system parameters are a good way to modify startup behaviour. Having systems do a conversational boot every time can work if proper procedures are in place and the operations staff follow them.
____________________
Purely Personal Opinion
Nic Clews
Occasional Advisor

Re: Booting from backup - old batch jobs

A technique I have employed in the past is to delay the start of batch jobs until some time after startup has finished.

The reasons for this were twofold. First, if the system was in a crash-reboot situation, it stopped the batch jobs from slowing the system down while people were first logging in. Second, the batch job in question could itself be the cause of the problem, i.e. system slowdowns or even the crash! (Some batch jobs were run at higher than normal base priority.)

The technique is quite simple. Autostart has to be off, of course, and this assumes a standalone system, not a cluster with a remote queue manager. One of the last lines in the system startup is a RUN/DET LOGINOUT/INPUT= command, and the command file it runs has a WAIT in it; we used 5 minutes. That is plenty of time to log in and stop that process.
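A sketch of this delayed start, assuming non-autostart queues on a standalone node; the file name DELAYED_QUEUE_START.COM, the process name DELAYED_QSTART and the queue name SYS$BATCH are illustrative. The first part goes near the end of SYSTARTUP_VMS.COM, the second is the separate command procedure that the detached process runs.

```dcl
$ ! --- Near the end of SYS$MANAGER:SYSTARTUP_VMS.COM ---
$ ! Run the (hypothetical) procedure below in a detached process so the
$ ! queues start a few minutes after the system is up.
$ RUN/DETACHED SYS$SYSTEM:LOGINOUT.EXE -
        /INPUT=SYS$MANAGER:DELAYED_QUEUE_START.COM -
        /OUTPUT=SYS$MANAGER:DELAYED_QUEUE_START.LOG -
        /PROCESS_NAME="DELAYED_QSTART"
$ !
$ ! --- SYS$MANAGER:DELAYED_QUEUE_START.COM (separate, hypothetical file) ---
$ WAIT 00:05:00
$ START/QUEUE SYS$BATCH
$ EXIT
```

To keep the queues stopped, log in within the five minutes and enter STOP DELAYED_QSTART; the queues then stay down until someone starts them with START/QUEUE.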

In your case of restoring backups, another possibility is to set the system date and time to earlier than the time the backup was taken, so that jobs submitted with /AFTER are not yet due. You may also need to perform a MIN boot first, to reset passwords so you can get at the data you need before the full boot; that may be the point at which to set the clock.

Duane Sadowski
Frequent Advisor

Re: Booting from backup - old batch jobs

Thanks to all of those who provided suggestions. You provided me with a variety of alternatives and presented different philosophies.

I liked the way that two of you emphasized the human element (the potential for panic, the need for staff to follow established procedures).

The approach that seems most appealing is the one with the prompt. It's simple, although it lacks the potential power and flexibility of the USERDn approach.

Reflecting on what one of you wrote about the potential for panic, I'm wondering about the potential for distraction or forgetfulness at the point one would need to kill a detached process that automatically starts the batch jobs.

- Duane
Ian Miller.
Honored Contributor

Re: Booting from backup - old batch jobs

Personally, I prefer an explicit start. That is, someone (operator, system manager) has to do something to start the application. It should be a simple thing, so that it is easy to do and does not unduly delay the start.
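One way to make the explicit start a single simple step, sketched as a procedure the operator runs by hand (START_BATCH.COM is a hypothetical name). It lists waiting batch jobs first, so stale /AFTER jobs from a restored disk can be held (SET ENTRY n/HOLD) or deleted before anything runs, and the default answer does nothing. It assumes autostart queues; for non-autostart queues, START/QUEUE would replace ENABLE AUTOSTART/QUEUES.

```dcl
$ ! START_BATCH.COM (hypothetical): explicit, operator-initiated start.
$ ! Show what is waiting so stale /AFTER jobs can be held or deleted first.
$ SHOW QUEUE/BATCH/ALL_JOBS
$ INQUIRE answer "Start the batch queues now [N]"
$ ! An empty or N answer is false in DCL, so the default is to do nothing.
$ IF .NOT. answer THEN EXIT
$ ENABLE AUTOSTART/QUEUES
$ SHOW QUEUE/BATCH
```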
Duane Sadowski
Frequent Advisor

Re: Booting from backup - old batch jobs

Ian, you advocate an explicit start. Can you please tell me whether you are referring only to a conversational boot with the use of USER parameters, which you discussed earlier in the thread, or could that also include a command procedure that someone needs to run manually after the system has rebooted?

- Duane
Robert Gezelter
Honored Contributor

Re: Booting from backup - old batch jobs

Duane,

I for one prefer a process that is optionally based upon something along the lines of the USERx parameters.

My reasoning is that the normal path can be completely automated, allowing for automatic, unattended restarts. The unusual case can be managed by performing an OPTIONAL conversational boot.

This avoids the problem of being unable to restart the system (or of delaying its startup) in the normal cases, while providing an escape hatch.

- Bob Gezelter, http://www.rlgsc.com
John Gillings
Honored Contributor

Re: Booting from backup - old batch jobs

Both Ian and Bob may be correct, depending on circumstances...

A completely automated startup may be appropriate. Consider what happens if your site loses power: when it comes back, do you want or need all your OpenVMS systems to come back online unattended? If it's a self-contained environment, that may be appropriate.

On the other hand, if you're one lobe of a multi-site data centre, most of the time you DO NOT want systems to boot unattended. If you have shadow set members at multiple sites, booting in the wrong order, or booting without communication to a site that is still live, can corrupt databases. In this case you want a human to decide when and how a system is booted.

Regardless, if you value your data, you should have a procedures manual (on paper) that documents the startup, shutdown and recovery procedures under a variety of scenarios. For "simple" systems it doesn't have to be highly detailed, but the effort taken should at least reflect the value of the system to your corporation.
Duane Sadowski
Frequent Advisor

Re: Booting from backup - old batch jobs

Thanks for the additional comments, Bob and John.

I'm mulling over what John Gillings wrote in his first message, and I'm thinking about the specific scenario that John just presented. Suppose a standalone VMS system is running and a batch job is executing, when the system goes down during a weekend because of an electrical blackout. I'm wondering whether in this scenario a VMS system (in my case, a standalone AlphaServer 2100A) will boot by itself when power comes back up and start running batch jobs - perhaps batch jobs that depend on the successful completion of the job that bombed when the system went down.

I see that there's a console variable called auto_action. If I'm interpreting the documentation correctly, I could set its value to HALT if I'm concerned about this particular scenario. (I just checked the value of this on the running system, by using F$GETENV, and I see that it currently has the value BOOT.)
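The check described above can be scripted; this sketch only reports the setting. F$GETENV is Alpha-specific, the values normally seen are BOOT, HALT and RESTART, and the setting itself is changed at the SRM console (for example SET AUTO_ACTION HALT at the >>> prompt), not from DCL.

```dcl
$ ! Report what the console will do after a power failure (Alpha only).
$ action = F$EDIT(F$GETENV("AUTO_ACTION"), "TRIM,UPCASE")
$ WRITE SYS$OUTPUT "Console AUTO_ACTION is ", action
$ IF action .EQS. "HALT" THEN -
        WRITE SYS$OUTPUT "System will wait at the console prompt after power returns."
$ IF action .NES. "HALT" THEN -
        WRITE SYS$OUTPUT "System will try to come back up unattended."
```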

However, I'm wondering if setting auto_action to HALT would mean that someone would have to travel to the computer site to boot the system. (I recall that some people here told me previously about the ability to connect the AlphaServer console to a DECserver, which if I understand things correctly, would permit remote access if I can telnet into the DECserver.)

- Duane
Ian Miller.
Honored Contributor

Re: Booting from backup - old batch jobs

If you don't start the batch queues, the batch job will not restart. If your batch queues are not autostart-enabled, they won't start until explicitly started with START/QUEUE. This can be a useful control. Autostart queues start when the ENABLE AUTOSTART/QUEUES command is executed (usually in SYSTARTUP_VMS.COM).

Having a DECserver connected to the console port is invaluable. The DECserver 700 and DECserver 90 both allow telnet. A DECserver that can boot from internal flash memory is recommended, as it will not need a load host after a power-down.
Wim Van den Wyngaert
Honored Contributor

Re: Booting from backup - old batch jobs

I wonder whether there are other things to verify besides queues (e.g. restored on another system with different hardware names, whether database recovery has to be applied, ...).

Anyway ...

Our system has a list of everything that will be started and you can indicate where to stop the startup.

E.g. stop after startup of network (but before applications and middleware).

So we would have to do a minimal boot, edit the file, and continue.
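A sketch of the kind of step list described above, assuming a plain text file SYS$MANAGER:STARTUP_STEPS.DAT (hypothetical) with one step name per line and one procedure per step named SYS$MANAGER:STARTUP_<step>.COM (also hypothetical). Inserting a line reading STOP after, say, NETWORK makes startup halt there; after a minimal boot you would edit the file, remove the STOP line, and carry on.

```dcl
$ ! Sketch: run startup steps listed in a (hypothetical) file,
$ ! halting at a line that reads STOP.
$ OPEN/READ steps SYS$MANAGER:STARTUP_STEPS.DAT
$ next_step:
$   READ/END_OF_FILE=steps_done steps step
$   ! Drop comments and blanks, and normalise case.
$   step = F$EDIT(step, "UNCOMMENT,TRIM,UPCASE")
$   IF step .EQS. "" THEN GOTO next_step
$   IF step .EQS. "STOP" THEN GOTO steps_done
$   WRITE SYS$OUTPUT "Startup step: ", step
$   @SYS$MANAGER:STARTUP_'step'.COM
$   GOTO next_step
$ steps_done:
$ CLOSE steps
```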

But we still haven't done the restore test.

Wim
Robert Gezelter
Honored Contributor

Re: Booting from backup - old batch jobs

Duane,

Your reasoning is why I mentioned the USERx parameters.

You can leave the automatic boot enabled, and bring the system up in whatever safe state is desired (e.g., any combination of batch queues stopped, shadow sets unmounted, applications down). But, you DO have a running system at that point.

Unless you have set up secure access to the console LAN (where the server providing telnet access to the console is), you have a whole new area of security vulnerability. My preference is to bring up the system in such a case; then you are able to use all the features of your security system.

In the case of OpenVMS clusters, this does require some care. Partitioned clusters can occur if parameters relating to cluster quorum are not set correctly. This is particularly a danger if parameters are altered during an emergency. Personally, I prefer to have pre-planned alternative boot roots on the system disk to deal with different situations. Most errors occur when changes are made in haste under pressure.

Just my US$ 0.02.

- Bob Gezelter, http://www.rlgsc.com
Jan van den Ende
Honored Contributor

Re: Booting from backup - old batch jobs

Duane,

did I ever agree with Bob!

When things are going well, it is all so easy.

But _WHEN_ things are derailing, it is better to STOP first thing, and go forward from there ever so carefully.

Remember: time lost is a (maybe big) nuisance, but data corrupted takes orders of magnitude more time to detect, sort out, and repair. In that case the words that replace "nuisance" in the above sentence are not fit for this forum!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Duane Sadowski
Frequent Advisor

Re: Booting from backup - old batch jobs

Thanks to those who elaborated on things they and others wrote previously. I will be discussing with the people with whom I work the things that you shared with me, voicing questions about our current philosophy and practices.

- Duane
Duane Sadowski
Frequent Advisor

Re: Booting from backup - old batch jobs

Thanks again to those who answered my questions.