Operating System - OpenVMS

Any DCL script to monitor System OR Application EXE fails

 
SOLVED
Navipa
Frequent Advisor


Hi All,
It has been a long time since I was last here.
I have a very old CharonVAX/OpenVMS V6.1 and TCPware V5.5-3 system running our PASCAL/Rdb based application.
Occasionally an application EXE crashes on an unhandled exception, and either OpenVMS crashes or the system reboots automatically.

I don't have the original source, but from the few sources I do have I understand that this happens because of an invalid string length. (Sometimes I also see the system hang when accessing a TCPware NFS mount point; the NFS client is AIX.)

I would like to know if there is a DCL script to check whether any (system or application) EXE has crashed.

Thanks
Navipa. 

Navipa
Frequent Advisor

Re: Any DCL script to monitor System OR Application EXE fails

Hi, sorry, the NFS server is AIX and the client is TCPware OpenVMS.

Steven Schweda
Honored Contributor

Re: Any DCL script to monitor System OR Application EXE fails

> I like to know any DCL script to check if there are any (sys &
> application) EXE has crashed.

   What would you like to happen when something bad happens?

   If the system crashes, and then restarts, it runs
sys$manager:systartup_vms.com.

   How do you start "our PASCAL/Rdb based application"?  If you can
record its process ID when you run it, then you can use SHOW PROCESS or
F$GETJPI() to get information about that process.  (Or, if you can
assign its process a distinctive name (SET PROCESS /NAME = proc_name),
then you can use SHOW PROCESS or F$PID()+F$GETJPI() to get the PID
from the process name.)  Similarly, if it's a batch job, SHOW QUEUE
/BATCH or F$GETQUI().

   If you can find a way to determine if the process is running
properly, then you can create a batch job which performs the test
periodically, and does whatever you want if the test fails (or
succeeds).

Navipa
Frequent Advisor

Re: Any DCL script to monitor System OR Application EXE fails

Hi Steven,
Thanks for your response. This is almost what I want: a script that I can run at a scheduled interval to find out whether any EXE has crashed in my absence. Anyway, we can't do anything if the system crashes.
But there are situations where an application EXE crashes and nothing happens to the system, which keeps running normally. I would like to track any such EXE crashes, and the reason from the exception codes if any.

Thanks
Navipa 

Volker Halle
Honored Contributor

Re: Any DCL script to monitor System OR Application EXE fails

Hi Navipa,

First of all, you could send yourself a mail when the system boots (from somewhere in SYSTARTUP_VMS.COM). OpenVMS VAX V6.1 contains the CLUE utility, so you could check the contents of the file CLUE$OUTPUT:CLUE$LAST_<nodename>.LIS to find out whether the system is rebooting after a system crash. If this file does NOT contain the string 'Operator Shutdown', then it was a boot after a system crash.
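A minimal sketch of the crash-boot test described above, written in Python only to illustrate the logic (Python is not assumed to be available on the VAX itself; in DCL the same check would be a SEARCH of the listing file). It assumes the listing text comes from CLUE$OUTPUT:CLUE$LAST_<nodename>.LIS and that, as described above, the absence of 'Operator Shutdown' means a boot after a crash. The function names are hypothetical.

```python
from pathlib import Path

# Marker string taken from the thread: an orderly shutdown leaves it in the
# CLUE$LAST_<nodename>.LIS listing, a crash boot does not.
SHUTDOWN_MARKER = "operator shutdown"


def booted_after_crash(listing_text: str) -> bool:
    """Hypothetical helper: True if the listing lacks 'Operator Shutdown'.

    The comparison is case-insensitive, like the DCL SEARCH command's default.
    """
    return SHUTDOWN_MARKER not in listing_text.lower()


def check_clue_listing(path: str) -> bool:
    """Read a (copied) CLUE listing file and apply the heuristic above."""
    text = Path(path).read_text(errors="replace")
    return booted_after_crash(text)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        verdict = "crash boot" if check_clue_listing(name) else "orderly shutdown"
        print(f"{name}: {verdict}")
```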

Depending on how 'your application' is running, you could do the following.

- if your application consists of a detached process, you may be able to start this process inside a DCL command procedure (run LOGINOUT as a detached process and use /INPUT=dcl-procedure). Use SET NOON in that procedure and RUN your application image. Check $STATUS after the application image exits and send mail accordingly.

- use ACCOUNTING to find out whether there are typical $STATUS values after an application error exit, and check for those values from time to time with ACCOUNTING/SINCE="-0-1:0:0"/STATUS=xxx in a batch job every hour or so.
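To decide whether an exit status counts as a failure, note that an OpenVMS condition value carries its severity in the low three bits (0 = W, 1 = S, 2 = E, 3 = I, 4 = F), and odd values mean success. The Python sketch below only illustrates that decoding (in DCL, $SEVERITY already holds those low three bits of $STATUS). The helper names are hypothetical, and treating only E and F as failures is an assumed policy that mirrors the "-E-"/"-F-" log search discussed in this thread.

```python
# Severity codes held in bits 0-2 of an OpenVMS condition value.
SEVERITY_LETTERS = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def parse_dcl_status(text: str) -> int:
    """Accept DCL hex notation ('%X2C') or a plain decimal string ('44')."""
    t = text.strip().upper()
    if t.startswith("%X"):
        return int(t[2:], 16)
    return int(t, 10)


def severity_letter(status: int) -> str:
    """Return W/S/E/I/F, or '?' for the reserved codes 5-7."""
    return SEVERITY_LETTERS.get(status & 0x7, "?")


def is_success(status: int) -> bool:
    """Odd condition values (S and I) indicate success."""
    return status & 1 == 1


def is_error_or_fatal(status: int) -> bool:
    """True only for E and F, i.e. what a '-E-'/'-F-' log search would flag."""
    return severity_letter(status) in ("E", "F")


def facility_number(status: int) -> int:
    """Bits 16-27 hold the facility number (0 = SYSTEM)."""
    return (status >> 16) & 0xFFF


def message_number(status: int) -> int:
    """Bits 3-15 hold the message number within the facility."""
    return (status >> 3) & 0x1FFF
```

For example, SS$_ACCVIO (%X0000000C) decodes to severity F, while SS$_NORMAL (%X00000001) decodes to S.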

Volker. 

David R. Lennon
Valued Contributor

Re: Any DCL script to monitor System OR Application EXE fails

Hi,

  With my weak psychic powers... oh wait, that's someone else's line. You really did not define exactly what defines an "application crash" in your environment. Is it a certain detached process that is no longer there, is it a certain batch job that ends with a certain status, etc? Or, from what I've imagined based on what I read, is it simply a bad error status being returned from a user process running a certain .EXE program? If so, you could install that image with /accounting and then use ACCOUNTING/FULL/TYPE=IMAGE possibly with /STATUS, periodically to catch those bad image states...

Regards,

Dave

Navipa
Frequent Advisor

Re: Any DCL script to monitor System OR Application EXE fails

Thanks Volker/David,
Our application has around 300 to 400 EXEs (PASCAL), which we call from around 15 jobs (SYS$BATCH). Each job calls 100 of these EXEs, and I monitor all those jobs manually using the usual VMS utilities/commands. There were instances where a few of our EXEs got stuck and caused the system to hang with processes in RWAST (we then reboot), and rarely a few of the EXEs failed with unhandled exceptions and caused the system to reboot automatically. I have come here a few times in the past, and IanMiller/Volker/Steven/JGillings/Jan/Bob Gez/Wim/Hoff/etc. (thanks to my VMS friends) helped to analyze the DUMP.

Here I would just like to investigate (postmortem) twice a day whether any EXE has failed when it was called. Most likely it fails on one of the input files (invalid field value) out of the thousands of input files we process on a daily basis. As our scheduler is busy all the time with many batch jobs, I don't want to interrupt any batch job with the time-consuming ACCOUNTING command. Is there any DCL procedure to find failures of images (EXEs) easily (using F$GETJPI, F$GETQUI, F$GETSYI, F$GETDVI, etc.)?
All our batch jobs create nice job logs; I usually search those logs for strings like "-E-" or "-F-", but going through all of them is very time-consuming. Any assistance would be great.
Thanks

     

Volker Halle
Honored Contributor
Solution

Re: Any DCL script to monitor System OR Application EXE fails

Navipa,

if an .EXE file fails during execution in batch, it will most likely display an error message in the batch .LOG file. It will also probably record the error in its exit status in the accounting file, if image accounting has been enabled.

There is no way to find those failures using lexical functions (F$xxx).

So if you want to check for failures of your images, there are these alternatives:

- change all your .COM files to somehow check the exit status after the invocation of each image and report failures by sending mail or writing to some central log file.

- use ACCOUNTING/STATUS=(list-of-unexpected-exit-status-values)/SINCE="-0-12:0:0" and maybe use SET ACCOUNTING/NEW_FILE every 12 hours to reduce the size of the accounting file.

- scan all your batch .LOG files (use /SINCE="-0-12:0:0") for strings containing "-E-" or "-F-". This method would provide the most detailed information about each of the errors.
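The log-scan option could be sketched as follows, again in Python purely to illustrate the logic (on the VAX itself, SEARCH with /SINCE, as suggested above, would be the natural tool). It assumes the batch .LOG files are reachable as ordinary files in one directory and keeps only those modified in the last 12 hours. Instead of a plain "-E-"/"-F-" substring match, it uses a regular expression for the %FACILITY-E-IDENT / -FACILITY-F-IDENT message shape to reduce false hits. The function names and directory layout are hypothetical.

```python
import re
import time
from pathlib import Path

# Matches OpenVMS message text such as "%SYSTEM-F-ACCVIO," or "-RMS-E-FNF,":
# a '%' or '-' prefix, a facility name, the severity letter E or F, an ident.
MESSAGE_RE = re.compile(r"[%-][A-Z0-9_$]+-([EF])-[A-Z0-9_$]+")


def scan_lines(lines):
    """Return (line_number, severity, text) for each E/F message line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = MESSAGE_RE.search(line)
        if match:
            hits.append((lineno, match.group(1), line.rstrip()))
    return hits


def is_log_file(path: Path) -> bool:
    """Accept NAME.LOG and NAME.LOG;n (a VMS version suffix), any case."""
    return path.is_file() and path.name.upper().split(";")[0].endswith(".LOG")


def scan_logs(directory, since_hours=12.0):
    """Map log file name -> list of hits, for logs modified in the window."""
    cutoff = time.time() - since_hours * 3600
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if not is_log_file(path) or path.stat().st_mtime < cutoff:
            continue
        with path.open(errors="replace") as handle:
            hits = scan_lines(handle)
        if hits:
            report[path.name] = hits
    return report


if __name__ == "__main__":
    import sys

    for name, hits in scan_logs(sys.argv[1]).items():
        for lineno, severity, text in hits:
            print(f"{name}:{lineno}: [{severity}] {text}")
```

Warnings (-W-) are deliberately ignored here; widening the character class to [WEF] would include them.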

Volker.

Navipa
Frequent Advisor

Re: Any DCL script to monitor System OR Application EXE fails

Hi Volker,
Thanks for the information. I see thousands of postings related to OpenVMS system management on the net but, surprisingly, nothing related to tracking image failures.
Now I have confirmed that we can't do anything for this using lexicals. OK, as per your suggestions, I will create a new ACCOUNTING.DAT every week to keep the search short, check it for the unexpected STATUS values, combine that with a string search for "-E-" and "-F-" in the logs, and try to write a short script.

Thanks Steven/Volker/Dave for your time.
Navipa.
   

john Dite
Frequent Advisor

Re: Any DCL script to monitor System OR Application EXE fails

Navipa,

if you are serious about application monitoring then you should consider looking at HP OpenVMS Service Control

see http://www.openservicecontrol.org/

It's all for free, bar some effort on your part installing and configuring it.

John