Operating System - OpenVMS

Stopping User processes before Batch Cycle

 
John T. Farmer
Regular Advisor

Hey,

Looking for suggestions on the best approach to stopping user processes prior to starting our batch cycle. We occasionally hit file and record locks when users leave their terminal logged in and the application running overnight.

We have written some DCL scripts to stop processes for a specific job. I'm wondering if there's a better approach than custom DCL. We have basic 7a-5p office hours.

I've been working with the WATCHER utility from MadGoat for stopping idle processes, but haven't been successful with that yet.

Thanks,

John

Systems: Alpha AS800, DS20e, OpenVMS 8.3, PowerTerm VT Emulator.
5 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Stopping User processes before Batch Cycle

In an ideal world, applications would be written with doorbell locks, or mailboxes with attention ASTs, so that they can be told to 'go away', or to close and re-open, or similar. That is rarely available though.

Next best is a lock lister / lock killer. There are a good few OpenVMS lock analysis programs out there that can help you determine which process perhaps should be killed (try FORCE-EXIT first!)

I know I have a few, and attached one I recently worked on for a specific problem, reporting (RMS) locks for a specific PID. Should be easy enough to modify to your needs. For mere money I'll be happy to do so :-).

You may want to consider alternatives.
For example, a customer I work with uses the (V)select tool from EGH (our OpenVMS friend John Santos) which can read indexed file records faster than RMS can, and can blow through locks. That works great for reports and extracts, but might not be the ticket for bulk updates.

Hope this helps a little,

Hein van den Heuvel
HvdH Performance Consulting.
Hoff
Honored Contributor

Re: Stopping User processes before Batch Cycle

Well, if the systems really are 07:00 to 17:00 with no off-hours, you could brute-force the problem with the permitted login hours mechanism in UAF. When 18:00 rolls around, for instance, everybody marked for, say, 06:00 to 18:00 access gets tossed off the box. Automatically.
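
As a rough illustration of the UAF approach, the session below restricts a hypothetical account JSMITH to interactive logins between 06:00 and 17:59. The username and hour range are assumptions. Hours are whole numbers from 0 to 23, and a range includes its final hour. /INTERACTIVE is used here rather than /ACCESS on the assumption that batch access for the same account should stay untouched.

```dcl
$! Sketch only: JSMITH and the 6-17 range are placeholders.
$! AUTHORIZE opens SYSUAF.DAT from the default directory unless the
$! SYSUAF logical name points elsewhere.
$ SET DEFAULT SYS$SYSTEM
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY JSMITH /INTERACTIVE=(PRIMARY, 6-17, SECONDARY, 6-17)
UAF> SHOW JSMITH
UAF> EXIT
```

Primary days are normally weekdays and secondary days weekends, so both are spelled out here. Check the SHOW output to confirm the hours landed where you expect before relying on it.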

Matching environments are quite rare (everybody tends to have a unique system load), and process logout and idle process killer tools (IPK) tend to need customizations to match local expectations. Which processes you can delete via $delprc, which you have to send a $forcex, and which sessions or applications you might have to gracefully exit at the terminal will vary. Widely.

The general approaches available here (beyond the access times in UAF) are the IPKs; coding the applications to behave better when taking and holding locks (e.g., doorbells, code that uses a timer to move off of a record, or code that doesn't take a lock when reading records and/or that handles read-write-modify); or moving the application over to a database with the concurrency features you're likely looking for here. None of these options is simple or easy, nor (without a local application review) entirely "safe", though having a database underneath has various advantages.
John Gillings
Honored Contributor

Re: Stopping User processes before Batch Cycle

John,

For uncooperative code you have no other option than to kill the processes. This has some risk of data corruption, especially if locks are held, or transactions are in flight, so it's best to be as judicious as possible, killing only those processes that are in your way. That pretty much means a procedure customised to your site.

Hein's code to hunt down locks is one possibility, but realise that there are timing windows involved! Another option is to search for open files. Again there are timing windows.

However you select your processes, remember to $FORCEX first to give the process a chance to clean up before you $DELPRC.

Since you're V8.3, you can use the DCL command STOP/IMAGE to perform a $FORCEX from the command line. You can also use the /EXIT=mode qualifier to control execution of exit handlers.

Typically you should scan processes using F$CONTEXT and F$PID, and/or parsing output of SHOW DEVICE/FILES to find your processes. Use STOP/IMAGE/IDENT on the first pass, and record the PIDs as you go. Allow at least 30 to 60 seconds for the processes to settle, then, if any processes still exist, do a second pass using just STOP/IDENT.
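
The two-pass procedure described above could be sketched roughly as follows. This is only an outline under stated assumptions: OpenVMS V8.3 or later (for STOP/IMAGE), enough privilege to see and stop other users' processes (typically WORLD), and a deliberately crude selection rule of "every interactive process except this one". The symbol names (me, count, target_n) are made up for the example, and a real site would narrow the F$CONTEXT criteria.

```dcl
$! Pass 1: STOP/IMAGE (a $FORCEX) so exit handlers can run.
$! Pass 2: STOP (a $DELPRC) for whatever is still there after the wait.
$ SET NOON                          ! a process that already exited must not abort the run
$ me = F$GETJPI("", "PID")
$ count = 0
$ ctx = ""
$ temp = F$CONTEXT("PROCESS", ctx, "MODE", "INTERACTIVE", "EQL")
$scan:
$ pid = F$PID(ctx)
$ IF pid .EQS. "" THEN GOTO scan_done
$ IF pid .EQS. me THEN GOTO scan    ! never stop ourselves
$ STOP/IMAGE/IDENTIFICATION='pid'
$ count = count + 1
$ target_'count' = pid              ! record the PID for the second pass
$ GOTO scan
$scan_done:
$ IF F$TYPE(ctx) .EQS. "PROCESS_CONTEXT" THEN temp = F$CONTEXT("PROCESS", ctx, "CANCEL")
$ IF count .EQ. 0 THEN EXIT
$ WAIT 00:01:00                     ! settle time before the second pass
$ i = 0
$kill:
$ i = i + 1
$ IF i .GT. count THEN EXIT
$ pid = target_'i'
$ STOP/IDENTIFICATION='pid'         ! warns "nonexistent process" if it is already gone
$ GOTO kill
```

Note that a process sitting at the DCL prompt survives the first pass and is deleted in the second, which matches the "if any processes still exist" rule. Check HELP STOP on your version for the exact qualifiers before trusting this on a production system.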

The simplest way I've found to (silently) determine if a process exists is:

$ PIPE pid1=F$GETJPI(pid,"PID") >nl: 2>nl:
$ IF $STATUS
$ THEN
$ ! process exists
$ ELSE
$ ! no such process
$ ENDIF

You can also compare:

$ IF "''pid1'".EQS.pid
$ THEN
$ ! process exists
$ ENDIF

Symbol substitution is necessary because pid1 is not assigned if the process doesn't exist. Also note that since there is only a single command, the PIPE is executed entirely in the context of the current process. It's only used as a simple way of redirecting the error message to null.

When writing application code in the future, make sure you implement some mechanism to remotely request a clean shutdown.
A crucible of informative mistakes
Thomas Ritter
Respected Contributor

Re: Stopping User processes before Batch Cycle

Our general approach was process rights identifier based. About 15 minutes before the nightly processing starts, the usual warnings are sent out. Then we have a job which will $forcex in one iteration, wait 2 minutes, and then $delprc every interactive user who does NOT hold the OPS_SUPPORT identifier. Holders of this identifier are the Operations staff, whose job is to manage the batch streams.
This approach has worked well for us.
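
For anyone wanting to try the identifier test in DCL, here is a report-only sketch of the selection step. It stops nothing; it just lists each interactive process and whether it would be skipped. It assumes the F$GETJPI item PROCESS_RIGHTS returns the process rights list as comma-separated identifier names, which is worth confirming on your own system first. The symbol names are made up for the example.

```dcl
$! Report-only: mark holders of OPS_SUPPORT as "skip", everyone else as "stop".
$ ctx = ""
$ temp = F$CONTEXT("PROCESS", ctx, "MODE", "INTERACTIVE", "EQL")
$loop:
$ pid = F$PID(ctx)
$ IF pid .EQS. "" THEN GOTO done
$! Wrap the list in commas so only a whole identifier name can match.
$ rights = "," + F$EDIT(F$GETJPI(pid, "PROCESS_RIGHTS"), "COLLAPSE,UPCASE") + ","
$ action = "stop"
$ IF F$LOCATE(",OPS_SUPPORT,", rights) .LT. F$LENGTH(rights) THEN action = "skip"
$ WRITE SYS$OUTPUT F$FAO("!AS  !12AS  !AS", pid, F$GETJPI(pid, "USERNAME"), action)
$ GOTO loop
$done:
$ IF F$TYPE(ctx) .EQS. "PROCESS_CONTEXT" THEN temp = F$CONTEXT("PROCESS", ctx, "CANCEL")
```

A process can exit between the F$PID and F$GETJPI calls, so anything beyond a dry run needs error handling around those lexicals.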
Craig A
Valued Contributor

Re: Stopping User processes before Batch Cycle

It looks like you have a number of issues:

1. Terminating inactive processes
2. Terminating active processes whose users are working outside the acceptable window.

Personally, I'd go the UAF route.

If there is user-generated batch work, then do a STOP/QUE/NEXT on those queues, say, 30 minutes before your end-of-workday time, followed by a STOP/QUE/RESET 30 minutes later.
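
As a sketch, with a hypothetical queue name USER$BATCH and assuming the procedure is submitted to start 30 minutes before the end of the workday (OPER privilege or manage access to the queue is needed):

```dcl
$! USER$BATCH is a placeholder; repeat for each user batch queue.
$ STOP/QUEUE/NEXT USER$BATCH      ! running jobs finish, no new jobs start
$ WAIT 00:30:00
$ STOP/QUEUE/RESET USER$BATCH     ! stop the queue now, aborting anything still running
```

A START/QUEUE USER$BATCH the next morning reopens the queue.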

Craig