Operating System - OpenVMS

Processes Mysteriously Being Deleted

 
Robert Atkinson
Respected Contributor

Processes Mysteriously Being Deleted

We have a problem that has now happened for the third year in a row. Over the Christmas period, a couple of our batch jobs have terminated like this:

  Entry  Jobname         Username     Blocks  Status
  -----  -------         --------     ------  ------
    324  GBSINVRUN       BATCHUSER             Retained on error
       %SYSTEM-F-EXITFORCED, forced exit of image or process by SYS$DELPRC
         On available batch queue BETA$OPERATIONS
         Completed 31-DEC-2007 07:45:52.97 on queue BETA$OPERATIONS

We cannot find a reason for this. It only affects a couple of processes each year, and only happens at Christmas.

We Autogen/reboot every month, so it's not a 'build-up' type issue.

What I'd like to know is whether anyone else has seen anything that matches this profile. Could there be a VMS virus that only affects a few systems?

Rob.
57 replies
Hein van den Heuvel
Honored Contributor

Re: Processes Mysteriously Being Deleted

>> Over the Christmas period a couple of our batch jobs have terminated

So this happened over multiple days?
So we cannot blame, say, an idle-process killer that got nervous (miscalculated) at year-end? Anyway, the example shown is too far from New Year, even allowing for a UTC offset (it looks like local time = UTC for you).


>> this. It only affects a couple of processes each year, and only happens at Christmas.

I think someone wants to go home early?!
Sorry, no obvious explanation.


>> We Autogen/reboot every month, so it's not a 'build-up' type issue.

Why would you want to do that? Oh well.
Does that reboot come with process deletes?
Are the targeted processes just running (much) longer than anticipated around Xmas?


>> could there be a VMS virus that only effects a few systems?

Ah, finally a simple, one word, answer: NO.
Two words? NO WAY!

grins,
Hein.


Hoff
Honored Contributor

Re: Processes Mysteriously Being Deleted

You'll want to enable auditing of privileges or auditing of process control system services, so that you can capture the $delprc or $forcex call for next year. Or set yourself a calendar event for mid December 2008, and enable the requisite auditing then. (You might want to have at least some of this auditing enabled as a matter of course.)

If the processes are the approximate same set each year (check last year's accounting, if you still have it on-line or in your BACKUP archives), it is easily possible that the code itself is detecting the end of the year as part of its normal processing (as I'm guessing the code is date-related, based on that name), and it's failing and/or forcing an exit.

Look for patterns.

Look for application code messing with scheduling or with system time, or with time-change or TDF-related events, or other such code. (If you don't have the application code, check with the vendor or whoever has the code.)

Compared with malware, an underlying operating system bug, or almost any other out-of-band external trigger for an error such as this, a site-local tool or procedure or a latent application error is a far more likely cause of this sort of behavior.

The so-called "build-up" issues are generally also signs of application errors, or process errors. OpenVMS is quite capable of running for a decade, barring the standard maintenance and requisite patches. I'd not bother with a monthly reboot and an AUTOGEN pass, save for cases where there are large-scale changes in load.

I would look to cases such as file versions approaching 32K (tools such as DFU, or updates to the DIRECTORY command in recent OpenVMS versions can help here), and toward cruft "building up" in indexed files, and related. There are cases where cruft can build up, but these are generally due to the way the application operates, or operates with OpenVMS. Reboots won't generally cure these cases, either.
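As a rough illustration of the "versions approaching 32K" check, here is a minimal Python sketch. It assumes you have saved a file listing with one `NAME.EXT;VERSION` spec per line (for example, a DIRECTORY listing with headers and trailers suppressed); the file names below are made up. The only OpenVMS fact it relies on is that file version numbers top out at 32767.

```python
import re

# OpenVMS file version numbers top out at 32767, so a file already at
# that version cannot get a newer one.
MAX_VERSION = 32767

# Matches "NAME.EXT;VERSION" with nothing else on the line.
SPEC = re.compile(r"^(?P<name>[^;\s]+);(?P<version>\d+)$")

def versions_near_limit(lines, margin=1000):
    """Return (filespec, version) pairs within `margin` of the version limit."""
    hits = []
    for line in lines:
        m = SPEC.match(line.strip())
        if m:
            version = int(m.group("version"))
            if version >= MAX_VERSION - margin:
                hits.append((m.group("name"), version))
    return hits

# Hypothetical listing; these are not files from the original post.
listing = ["GBSINV.LOG;32001", "DAILY.DAT;12", "YEAREND.TMP;31790"]
print(versions_near_limit(listing))  # [('GBSINV.LOG', 32001), ('YEAREND.TMP', 31790)]
```

DFU or the newer DIRECTORY qualifiers mentioned above do this natively; the sketch only shows the arithmetic behind the warning threshold.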

Malware is certainly quite possible on OpenVMS, but you do need to ask yourself "am I really a target?" and "am I the first site to see malware in the wild on OpenVMS since, well, CCC and WANK?". And you need to ask yourself "how did I get infected?", as transmission is rather difficult using the typical PC vectors of mail and web. If you're handling boatloads of assets, you might well be a specific malware target. But otherwise, hoofbeats usually mean horses, not zebras. That means application and process bugs first, then maybe OpenVMS or layered-product (LP) bugs, and not malware.
Jon Pinkley
Honored Contributor

Re: Processes Mysteriously Being Deleted

Is it possible you have some type of process watcher that is looking for "runaway" processes and it is deleting the processes? Perhaps due to seasonality, the processes are taking more resources than they normally do, and thus appear to be in a loop.
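To make the seasonality point concrete, here is a sketch of the kind of threshold a hypothetical process watcher might use: flag any process that burned more than 90% of a sampling interval in CPU time. The watcher, the sample values, and the 90% cutoff are all assumptions for illustration. A job that is perfectly healthy but busier at year-end trips the same test as a genuine loop.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    pid: str
    cpu_seconds: float  # cumulative CPU time at the moment of sampling

def flag_runaways(before, after, interval_seconds, busy_fraction=0.9):
    """Hypothetical heuristic: flag PIDs whose CPU use over the interval
    exceeded busy_fraction of the elapsed wall-clock time."""
    prior = {s.pid: s.cpu_seconds for s in before}
    flagged = []
    for s in after:
        if s.pid in prior:
            used = s.cpu_seconds - prior[s.pid]
            if used / interval_seconds > busy_fraction:
                flagged.append(s.pid)
    return flagged

# Normal day: 90 CPU seconds in a 300-second window (30% busy).
normal = flag_runaways([Sample("00000101", 1000.0)], [Sample("00000101", 1090.0)], 300)
# Seasonal peak: same job, more data, 285 CPU seconds in the same window (95% busy).
peak = flag_runaways([Sample("00000101", 1000.0)], [Sample("00000101", 1285.0)], 300)
print(normal, peak)  # [] ['00000101']
```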

Have you looked at accounting records for the resources used by the processes that are being deleted? If you have image accounting turned on, this would also be helpful. Even more helpful would be audit records if

$ set audit/audit/enable=(process=all)

is in effect, or for only monitoring $DELPRC events:

$ set audit/audit/enable=(process=delprc)

Before next year's Christmas period, you may want to turn on process auditing and possibly image accounting.

Is there special year end processing that must complete before 1-jan? (Looking at the 31-DEC-2007 date).

A bit more about image accounting:

Image accounting can consume disk space quickly if you have many image rundowns, but disk space is the primary resource it consumes. We run with image accounting on all the time. A side benefit of having image accounting turned on is that the CTL$xx_I* cells in the process control region are populated when images start up, so it is possible, for example, to determine when a currently running image started by using something like this:

SDA> set proc/in=20207802
SDA> exam/time ctl$gq_istart
15-JAN-2008 13:12:07.46
SDA>
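EXAMINE/TIME works because CTL$GQ_ISTART holds an OpenVMS binary time: a 64-bit count of 100-nanosecond ticks since 17-NOV-1858 00:00:00. If you ever pull a raw quadword out of a dump or an accounting record and want to decode it off-system, a minimal Python sketch of the conversion looks like this (the helper names are illustrative, not from any OpenVMS library):

```python
from datetime import datetime, timedelta

# OpenVMS binary time counts 100 ns ticks from 17-NOV-1858 00:00:00.
VMS_EPOCH = datetime(1858, 11, 17)

def vms_to_datetime(quadword):
    """Convert a 64-bit OpenVMS time (100 ns ticks) to a datetime."""
    return VMS_EPOCH + timedelta(microseconds=quadword // 10)

def datetime_to_vms(dt):
    """Convert a datetime back to 100 ns ticks since the OpenVMS epoch."""
    delta = dt - VMS_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10

def vms_format(dt):
    """Render in the DD-MMM-YYYY HH:MM:SS.CC style that SDA prints."""
    return dt.strftime("%d-%b-%Y %H:%M:%S").upper() + f".{dt.microsecond // 10_000:02d}"

istart = datetime(2008, 1, 15, 13, 12, 7, 460_000)
print(vms_format(vms_to_datetime(datetime_to_vms(istart))))  # 15-JAN-2008 13:12:07.46
```

This sketch ignores time zones entirely; on V7.3-2 the system clock is normally local time, so the decoded value is local time too.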

To enable image accounting:

$ SET ACCOUNTING /ENABLE=IMAGE

See HELP SET ACCOUNTING for more info.

Image accounting records can be surprisingly useful for troubleshooting. So can audit records.

Jon
it depends
John Gillings
Honored Contributor

Re: Processes Mysteriously Being Deleted

Rob,

Since you're only getting once a year shots at this, you should turn on more auditing than you think you need to make sure you can follow leads. In addition to PROCESS audits as already suggested, I'd also recommend both LOGIN and LOGOUT audits:

$ SET AUDIT/AUDIT/ENABLE=(LOGIN=ALL,LOGOUT=ALL)

So, when your enable=(process=delprc) audit tells you some random process did it, you can determine when and from where that process was created. Since it's OpenVMS, you should probably enable your audits well in advance (perhaps just BEFORE a reboot prior to the expected event). That way you can be certain you'll have a record of all the processes that might be responsible.
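The follow-the-lead step is basically a join on PID: take the DELPRC record, note which process issued it, then find that process's LOGIN record. The record layout below is entirely made up for illustration; real records come from ANALYZE/AUDIT and look different, so treat this only as a sketch of the correlation logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AuditEvent:
    kind: str                          # hypothetical labels: "LOGIN", "LOGOUT", "DELPRC"
    time: str                          # timestamp kept as plain text in this sketch
    pid: str                           # process that generated the event
    target_pid: Optional[str] = None   # DELPRC only: the process that was deleted
    source: str = ""                   # LOGIN only: where the session came from

def who_deleted(events, victim_pid):
    """Pair each DELPRC aimed at victim_pid with the deleter's LOGIN record."""
    # Later LOGIN records overwrite earlier ones, so with time-ordered input
    # this keeps the most recent login for each PID.
    logins = {e.pid: e for e in events if e.kind == "LOGIN"}
    return [(e, logins.get(e.pid)) for e in events
            if e.kind == "DELPRC" and e.target_pid == victim_pid]

# Hypothetical events with made-up PIDs and sources.
events = [
    AuditEvent("LOGIN", "15-DEC-2007 15:02:10", "00000101", source="network, node TEST01"),
    AuditEvent("LOGIN", "31-DEC-2007 07:00:00", "00000202", source="batch queue BETA$OPERATIONS"),
    AuditEvent("DELPRC", "31-DEC-2007 07:45:52", "00000101", target_pid="00000202"),
]
for delprc, login in who_deleted(events, "00000202"):
    print(delprc.time, "deleted by", delprc.pid, "from", login.source if login else "unknown")
```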

If you have sufficient storage and processing power, you could take the thermonuclear option and enable ALL auditing for the suspect period. (believe it or not, I've seen systems with full auditing enabled, but they were specifically sized with extra CPUs and storage to cope with the auditing load, and if I told you where they were I'd have to shoot you ;-)
A crucible of informative mistakes
Thomas Ritter
Respected Contributor

Re: Processes Mysteriously Being Deleted

Robert, was there a log file associated with this batch job ?
DECxchange
Regular Advisor

Re: Processes Mysteriously Being Deleted

Did you do a:
$ show que/all/full

on this queue? If you leave out the queue name, as above, it will list all of the information for every queue on your system. You can scroll back and see what file was submitted, what time it was submitted, and what qualifiers were used. Then you can work your way backwards and forwards to figure out what the job is doing and why it might be getting stopped. You could also add commands to the command file being run to identify when and where it is getting stopped.
Wim Van den Wyngaert
Honored Contributor

Re: Processes Mysteriously Being Deleted

If you analyze the moment of process deletion with VPA or a similar tool, you could see which programs were also active around that moment, and maybe find the guilty one (just hope it's called christmas_process_del.exe).
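With image accounting records (or VPA data) in hand, "what was active around that moment" is an interval-overlap test against the deletion timestamp. Here is a minimal sketch, assuming the records have already been parsed into start/end times; the image names and PIDs are hypothetical, and the one-minute slack is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ImageRun:
    image: str
    pid: str
    start: datetime
    end: datetime

def active_around(runs, moment, slack=timedelta(minutes=1)):
    """Images whose run interval overlaps [moment - slack, moment + slack]."""
    lo, hi = moment - slack, moment + slack
    # Two intervals overlap when each starts before the other ends.
    return sorted({r.image for r in runs if r.start <= hi and r.end >= lo})

moment = datetime(2007, 12, 31, 7, 45, 52)  # deletion time from the queue entry
runs = [  # hypothetical records
    ImageRun("REPORTGEN.EXE", "00000301", datetime(2007, 12, 31, 7, 40), datetime(2007, 12, 31, 7, 46)),
    ImageRun("BACKUP.EXE", "00000302", datetime(2007, 12, 31, 2, 0), datetime(2007, 12, 31, 3, 0)),
    ImageRun("PURGER.EXE", "00000303", datetime(2007, 12, 31, 7, 46, 30), datetime(2007, 12, 31, 7, 47)),
]
print(active_around(runs, moment))  # ['PURGER.EXE', 'REPORTGEN.EXE']
```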

Also accounting could reveal something.

Bush would say "It's Al-Qaeda. Let's bomb Iran.".

Wim
Wim
Robert Atkinson
Respected Contributor

Re: Processes Mysteriously Being Deleted

Many thanks for all of the responses. If I can deal with some of the simple answers first.....

We don't use a process monitor/killer anywhere on the system, so I think this can be ruled out.

We used to Autogen every week, but have dropped to monthly - 2nd week in the month. There are pros and cons to this frequency, which I don't want to discuss here and detract from the initial problem.

The application we run does the same thing, day in, day out. The same sets of programs are used, running the same data against the same code.

It's very unlikely that the cause lies in the application software, especially as some of the DELPRC events have occurred while the batch job was running DCL WAIT commands.

Completely agree with enabling auditing towards the end of 2008, and we've switched some extra events on already, however, we only see this problem near the end of the year.

I'm doing some extra checking on exactly when this problem has occurred, as it may be that we're running a program during our year-end that affects VMS in some way, and the effect shows up a day or two later.

Sorry for not including this earlier, but this is what DIAG reports:

**** V3.4 ********************* ENTRY 533 ********************************

Logging OS                  1. OpenVMS
System Architecture         2. Alpha
OS version                  V7.3-2
Event sequence number       18983.
Timestamp of occurrence     31-DEC-2007 07:45:52
Time since reboot           15 Day(s) 16:50:36
Host name                   BETA

System Model                AlphaServer ES45 Model 2B

Entry Type                  40. System Bugcheck

Bugcheck Minor class        2. System Bugcheck

Bugcheck Msg                RMSBUG, RMS has detected an invalid condition
Process ID                  x007E0097
Process Name                GBSINVRUN
KSP                         x000000007FF88000
ESP                         x000000007FF8BC74
SSP                         x000000007FF9CD70
USP                         x000000007AD51A70
R0                          x0000000000000001
R1                          x0000000000000001
R2                          xFFFFFFFFFFFFFFFA
R3                          xFFFFFFFF8A9022A0
R4                          xFFFFFFFF8A9025B0
R5                          x0000000000000000
R6                          x0000000000018001
R7                          x0000000000000000
R8                          x000000007FFCEE24
R9                          x000000007FFD0218
R10                         x0000000000000001
R11                         x000000007FFD00B8
R12                         x000000007FFCDA60
R13                         xFFFFFFFF8A90DEA8
R14                         xFFFFFFFF819E6080
R15                         x000000007AE4DE20
R16                         x00000000000003A0
R17                         xFFFFFFFFFFFFFFFA
R18                         x0000000000000004
R19                         x000000007FFD0010
R20                         x000000007FFD0010
R21                         xFFFFFFFF8A914360
R22                         x00000000FFFFFFFF
R23                         x0000000000000064
R24                         x0000000000000001
R25                         x0000000000000001
R26                         xFFFFFFFF8049A200
R27                         xFFFFFFFF8A90DEA8
R28                         xFFFFFFFF8049B354
FP                          x000000007FF8BC80
SP                          x0041474D24415445
PC                          xFFFFFFFF804A3550
PS                          x0000000000000009
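One quick sanity check on the entry above: subtracting "Time since reboot" from "Timestamp of occurrence" gives the last boot time, which can be cross-checked against the monthly AUTOGEN/reboot schedule. A minimal Python sketch of that arithmetic, using only the two values from the dump:

```python
from datetime import datetime, timedelta

occurred = datetime(2007, 12, 31, 7, 45, 52)                   # Timestamp of occurrence
uptime = timedelta(days=15, hours=16, minutes=50, seconds=36)  # Time since reboot

booted = occurred - uptime
print(booted.strftime("%d-%b-%Y %H:%M:%S").upper())  # 15-DEC-2007 14:55:16
```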


Last year, our support company suggested this may have been caused by PIOPAGES being set too low, so we upped it to 2000. As you can see, it hasn't helped so far.

Rob.
Hein van den Heuvel
Honored Contributor

Re: Processes Mysteriously Being Deleted


Ah, but that's an RMS Bugcheck.

Specifically

FFFFFFFA == ASBALLFAIL == couldn't allocate an asb (rm0stall)

RMS ran out of stack more or less.
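The link between the 32-bit status FFFFFFFA and the 64-bit R2/R17 values in the dump (xFFFFFFFFFFFFFFFA) is sign extension: on Alpha, 32-bit values are normally held in 64-bit registers in sign-extended form. A minimal Python sketch showing that the two are the same number (-6), assuming that convention applies here:

```python
def sign_extend_32(value):
    """Sign-extend a 32-bit value to 64 bits, as Alpha keeps longwords in registers."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def as_signed_64(value):
    """Interpret a 64-bit unsigned pattern as a two's-complement integer."""
    return value - (1 << 64) if value & (1 << 63) else value

R2 = 0xFFFFFFFFFFFFFFFA  # from the DIAG dump; R17 holds the same value
print(hex(R2 & 0xFFFFFFFF), as_signed_64(R2))  # 0xfffffffa -6
```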

Are you up to date on OpenVMS and patches?

RMS crashed, and that zapped the process.
For a full analysis you may have to run with the SYSGEN parameter BUGCHECKFATAL = 1.
This will crash the system on an RMS bugcheck.


more later... gotta run

Hein.