
Invocation order of Default Condition Handlers?

 
Mark_Corcoran
Frequent Advisor

I'm making some code changes for an executable image to deal with a problem of process dumps being generated following an earlier change to include the PRC$_IMGDMP flag when calling $CREPRC to create detached processes.

The original programmers opted to use LIB$SIGNAL in many places to signal undesirable conditions (typically a non-success return status from a call to $QIO, $QIOW or LIB$ routines, or in the case of $QIO or $QIOW the I/O completing but with an IOSB status that indicated failure), ensuring that the severe bit (STS$K_SEVERE) is clear prior to calling it.
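For readers unfamiliar with the layout of a condition value, here is a minimal sketch in Python (a portable model, not VMS code) of how the severity, message number and facility fields are packed, and what clearing the severe bit does. The function names are invented for illustration. SS$_ACCVIO and RMS$_EOF are used only as sample inputs; neither appears in the application described above.

```python
# Field layout of a 32-bit OpenVMS condition value (low bits first):
#   bits 0-2   severity
#   bits 3-15  message number
#   bits 16-27 facility number
#   bits 28-31 control bits
SEVERITY_LETTERS = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}
STS_K_SEVERE = 4  # severity code for a severe (fatal) error


def decode_condition(value):
    """Split a condition value into (facility, message_number, severity)."""
    severity = value & 0x7
    message_number = (value >> 3) & 0x1FFF
    facility = (value >> 16) & 0xFFF
    return facility, message_number, severity


def clear_severe_bit(value):
    """Clear bit 2 of the severity field (hypothetical helper name)."""
    return value & ~STS_K_SEVERE & 0xFFFFFFFF


SS_ACCVIO = 0x0000000C  # sample input: %SYSTEM-F-ACCVIO
RMS_EOF = 0x0001827A    # sample input: %RMS-E-EOF

print(decode_condition(SS_ACCVIO))
print(decode_condition(clear_severe_bit(SS_ACCVIO)))
print(decode_condition(RMS_EOF))
```

In this model, clearing bit 2 of a severe (F) condition leaves a severity of 0, which is a warning (W). The facility and message number are unchanged.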

The only condition handler that our code establishes is set up prior to most PASCAL RTL I/O procedure calls (because they are in wrapper procedures, though some higher level procedures that call the wrappers also make PASCAL RTL I/O procedure calls without the condition handler being established).

Our condition handler returns SS$_RESIGNAL (after writing a message to a mailbox device so that an application-wide logger process can record the message).

What ends up appearing in the file to which SYS$ERROR is directed for the executable I'm changing is:

  • The "%FACILITY-SEVERITY-IDENT, text" message
  • Optionally a secondary RMS message (e.g. "-RMS-W-TMO, timeout period expired")
  • Traceback information (the executable is linked /TRACEBACK)


I've never previously had to look in depth at how error messages and traceback were generated, and had naively assumed that the Pascal RTL was generating the messages that preceded the traceback (as a result of the ERROR := MESSAGE parameter on calls to the I/O procedures).

Having looked into it, it appears that the ERROR := MESSAGE is simply a trigger to signal a condition that eventually gets dealt with by the last chance condition handler (which is generating the primary and secondary messages and also triggering the process dump).


In VMS Internals and Data Structures v5.2 (EY-C171E-DP, 1990), it discusses default condition handlers in section 5.7 (starting on page 93).

In section 5.7.1 it says that "If the severity level is other than the three listed [warning, error or severe error], the traceback condition handler resignals the condition, which usually means that the condition is being passed on to the catch-all condition handler."

In section 25.3.2 (page 735) - paraphrasing - it says that EXE$CATCH_ALL calls SYS$PUTMSG to output the error message, potentially calls EXE$EXCMSG to write an exception summary (signal array, stack & register dump), then dispatches to EXE$IMGDMP_MERGE to generate the process dump, and finally either calls $EXIT or continues image execution.


The process dumping only ended up happening because the EXE$CATCH_ALL condition handler was getting invoked as a result of either no condition handler being established, or our established condition handler returning SS$_RESIGNAL.

I've changed all the LIB$SIGNAL calls to record information in a log file that the process records certain events in.

By changing our condition handler so that it returns SS$_CONTINUE rather than SS$_RESIGNAL, process dumps are no longer generated when Pascal RTL I/O procedures encounter an error, nor are error messages or traceback output - all because EXE$CATCH_ALL is no longer invoked.
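The effect of that change can be illustrated with a small Python model of the handler search. This is only a sketch under simplifying assumptions: the chain contains just the three handlers discussed in this thread, the traceback handler is assumed to always resignal, and the catch-all is treated as the end of the search. The names dispatch and chain are invented for the example.

```python
RESIGNAL = "SS$_RESIGNAL"
CONTINUE = "SS$_CONTINUE"


def dispatch(handlers):
    """Offer a condition to each handler in turn, newest frame first.

    `handlers` is a list of (name, callable) pairs. Returns the names of
    the handlers that were invoked; the search stops at the first
    handler that does not resignal.
    """
    invoked = []
    for name, handler in handlers:
        invoked.append(name)
        if handler() != RESIGNAL:
            break
    return invoked


def chain(user_handler_status):
    """Build a simplified handler chain (an assumption, not the real one)."""
    return [
        ("user handler", lambda: user_handler_status),
        ("traceback handler", lambda: RESIGNAL),  # assumed to resignal
        ("catch-all handler", lambda: CONTINUE),  # end of the search
    ]


print(dispatch(chain(RESIGNAL)))
print(dispatch(chain(CONTINUE)))
```

With the user handler returning SS$_RESIGNAL, all three handlers are invoked in the model, so the messages, traceback and dump all appear. With SS$_CONTINUE the search ends at the user handler and none of the default handlers runs.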

I'm not concerned about loss of the traceback information - I only want it to appear if it is a truly unhandled condition (e.g. an access violation).

As our condition handler already sends a message to a mailbox for a logger process to record in an application-wide log file, information is still logged, but having the error messages still written to SYS$ERROR would be useful.


The error messages were being output by EXE$CATCH_ALL - so if it is no longer being invoked, then I have to call $PUTMSG myself (no problems, though it's taken me a while to work out that EXE$CATCH_ALL (on VAX) must be looking at the 4th word in the mechanism array to get R0's value at the time of the exception, to determine that an RMS message also needs to be output).

[Page 9-59 of the HP OpenVMS Programming Concepts Manual Volume I (AA-RNSHD-TE) indicates that OpenVMS RMS system services (called by Pascal RTL I/O procedures) return two related completion values – the completion code and the associated status value, with the completion code being returned in R0 using the function value mechanism]


The one thing that I don't understand is that in the file to which SYS$ERROR is redirected, the error messages appear before the traceback information, whereas the order in which the traceback and catch-all condition handlers are detailed in the VMS Internals and Data Structures book implies it should be traceback first then the error messages.

Further, under the traceback condition handler, it says that "the traceback condition handler resignals the condition, which usually means that the condition is passed on to the catch-all condition handler."

Based on what I'm observing (error message then traceback), I'd have to infer that the book is wrong. Unless anyone knows better?

Mark

 

Update 23-JUL-2019

After further review of manuals and a bit of reading-between-the-lines, it looks like the error messages are being output by the traceback handler (analysing a detached process using SDA with SHOW CALL and SHOW CALL /NEXT reveals that there are 3 call frame condition handlers):

3rd Condition Handler 7FF4FD5C 001217AC
2nd Condition Handler 7FF4FDB8 86E72577 IMAGE_MANAGEMENT+02B77
1st Condition Handler 7FF4FDEC 86E8B821 PROCESS_MANAGEMENT+07621

Further, it appears that the traceback handler is probably resignalling with SS$_RESIGNAL, resulting in the catch-all condition handler triggering the process dump.

The R0 I'd referred to as being in the 4th element of the mechanism array doesn't appear to contain the RMS status. Attempting to use it as a message descriptor for an RMS-type message in a call to $PUTMSG does nothing. Using it as a parameter to LIB$STOP within the condition handler (simply so I could determine its value) results in an exception being caught & reported by the catch-all handler, and the same value being recorded as the final exit status in the accounting record, but that value is far removed from both the PAS$_ERRDURFIN I am deliberately triggering during testing and the underlying RMS$_TMO that my test provokes (and which causes FINDK to barf with PAS$_ERRDURFIN).

For the life of me, I can't see how the secondary RMS status message is being output - unless various bits of documentation are wrong, and it's actually the Pascal RTL that's generating the message (AFAICT, the only way of getting the RMS status is to access the FAB or RAB;  that would probably require the file to be declared VOLATILE and for the condition handler to somehow know which file the condition was being signalled for).

I think I'm just going to have to lose the reporting of the RMS status.

[Formerly appearing as woeisme]
Ian Miller.
Honored Contributor

Re: Invocation order of Default Condition Handlers?

isn't the RMS secondary status in the signal array not the mechanism array?

____________________
Purely Personal Opinion
Mark_Corcoran
Frequent Advisor

Re: Invocation order of Default Condition Handlers?

>isn't the RMS secondary status in the signal array not the mechanism array?
You are - as usual - right, Ian.

I copied some existing code that called $GETMSG for the condition value, then walked the signal array, looking for the optional longwords that are "Optional Additional Arguments Making Up One or More Message Sequences" (as per figure 9-6 on page 9-36 of the HP OpenVMS Programming Concepts Manual (AA-RNSHD-TE, JAN-2005)), to build an argument list for passing in a call to $FAO.

This meant that the message that was output by $PUTMSG was this:
%PAS-F-ERRDURFIN, error during FIND or FINDK
File "PascalFileVariableName" Filename "FullyQualifiedFilename"

rather than:
%PAS-F-ERRDURFIN, error during FIND or FINDK
File "!AC" Filename "!AS"

 

I misread the original routine that I copied the code from...

The code that did the $GETMSG/signal array walking/$FAO call was in a subroutine which was called twice by a higher level routine.

The first call extracted the first condition value & optional additional arguments in the signal array in its entirety, requesting that the subroutine call $GETMSG with a flags parameter value of 15 (equivalent to SET MESSAGE /FACILITY /SEVERITY /ID /TEXT).

The second call requested that the subroutine call $GETMSG with a flags parameter value of 1 (equivalent to SET MESSAGE /NOFACILITY /NOSEVERITY /NOID /TEXT), but I hadn't noticed that the position it was looking at in the signal array was left modified by the first call (and thus was picking up the second condition value (if present)) - I had (wrongly) presumed that for some reason the original programmers were wanting to get the full message into one buffer, then just the /TEXT into a second buffer.
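As an illustration of the two flags values, the following Python sketch models how the four low bits of the $GETMSG flags argument select message components (bit 0 text, bit 1 identifier, bit 2 severity, bit 3 facility). It is a model only: format_message and its prefix parameter are invented for the example, and the separators used when only some components are selected are an assumption rather than documented behaviour.

```python
FLAG_TEXT = 1      # bit 0
FLAG_IDENT = 2     # bit 1
FLAG_SEVERITY = 4  # bit 2
FLAG_FACILITY = 8  # bit 3


def format_message(facility, severity, ident, text, flags, prefix="%"):
    """Assemble a message string from the components selected by flags."""
    parts = []
    if flags & FLAG_FACILITY:
        parts.append(facility)
    if flags & FLAG_SEVERITY:
        parts.append(severity)
    if flags & FLAG_IDENT:
        parts.append(ident)
    head = prefix + "-".join(parts) if parts else ""
    if flags & FLAG_TEXT:
        return head + ", " + text if head else text
    return head


msg = ("RMS", "W", "TMO", "timeout period expired")
print(format_message(*msg, flags=15))
print(format_message(*msg, flags=1))
print(format_message(*msg, flags=15, prefix="-"))
```

The third call uses a "-" prefix, which, as far as I know, is what $PUTMSG applies to the second and subsequent messages in a message vector; the first message gets "%".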

After my original post (and follow-up), I built a version of the executable that was LINKed /DEBUG but which by necessity had some code changes (it only starts processing once it gets an indication from a watchdog process that the system is in a master state;  the watchdog process only communicates with it if it (the watchdog) started the process as a detached process, so running it interactively in debug mode without code changes wouldn't get it to the point where I could induce the RMS failure).

I had built it /DEBUG because I was trying to get a handle (pun intended) on the condition handlers, to see what/where they were.

In my original post, I'd indicated that there were three active condition handlers when examining the process in SDA:
7FF4FD5C 001217AC
7FF4FDB8 86E72577 IMAGE_MANAGEMENT+02B77
7FF4FDEC 86E8B821 PROCESS_MANAGEMENT+07621

This may not be strictly true - at the time I happened to do SHOW CALL and repeated SHOW CALL /NEXT, these were the condition handlers that were established, but that was when the process was in a HIB state, waiting for input from a VT Terminal or a PLC, and not at the point where it was calling a Pascal RTL I/O routine (where a condition handler would be briefly established), nor any ad-hoc locations where the Pascal RTL itself temporarily established a condition handler.

I had hoped to look at my session log from the time when I ran the executable interactively /DEBUG (to avoid having to rebuild it LINKed /DEBUG and reintroducing the code changes to "defeat interlocks"), but there were issues accessing the system initially, then trying to view the session log (it is a poor-man's session log derived from SET HOST /LOG, so VT escape sequences are displayed rather than interpreted when you EDIT /TPU them).

I ended up rebuilding a debug version, and running it, and it was only when I decided to evaluate the elements of the signal array that I noticed that the first element indicated there were 9 additional longwords, and subsequent examination revealed that indeed, the RMS condition value was also present in the array.
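A rough Python model of walking a signal array to find each message sequence is sketched below. It rests on assumptions that should be checked against the $PUTMSG documentation: the array ends with the PC and PSL, an RMS condition (facility 1) is followed by a single STV longword, and conditions from other non-system facilities are followed by an explicit FAO argument count. System (facility 0) conditions are not handled. The array and the PRIMARY value are invented placeholders, not the values seen in the debugger session.

```python
def facility_of(condition):
    """Extract the facility number (bits 16-27) from a condition value."""
    return (condition >> 16) & 0xFFF


def split_signal_array(signal_array):
    """Split a VAX-style signal array into message sequences.

    signal_array[0] is the count of longwords that follow; the last two
    of those are taken to be the PC and PSL and are dropped.
    Returns a list of (condition_value, [arguments]) tuples.
    """
    count = signal_array[0]
    body = signal_array[1:1 + count - 2]  # drop trailing PC and PSL
    sequences = []
    i = 0
    while i < len(body):
        condition = body[i]
        i += 1
        facility = facility_of(condition)
        if facility == 1:
            nargs = 1  # RMS: a single STV longword follows
        elif facility == 0:
            raise ValueError("system conditions are not handled in this sketch")
        else:
            nargs = body[i]  # explicit FAO argument count
            i += 1
        sequences.append((condition, body[i:i + nargs]))
        i += nargs
    return sequences


PRIMARY = 0x00210004    # hypothetical placeholder, NOT the real PAS$_ERRDURFIN
SECONDARY = 0x0001827A  # RMS$_EOF, standing in for the RMS status
example = [8, PRIMARY, 2, 0x1000, 0x2000, SECONDARY, 0, 0x12345, 0x03C00000]

for condition, args in split_signal_array(example):
    print(hex(condition), [hex(a) for a in args])
```

The point of the model is that a walker which keeps its position between calls naturally lands on the second condition value after consuming the first sequence, which is what the copied subroutine was relying on.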

Running the executable in the debugger does somewhat skew the analysis of established condition handlers - a SHOW STACK revealed that there were a number of condition handlers established at SHARE$DECCSHR+value or at SHARE$PASRTL+value.

However, I did determine that the condition handler at 001217AC was established by the DEC C RTL (itself - not a user-established one with VAXC$ESTABLISH or sigvec), and by LINKing the executable /NOTRACEBACK I found that the condition handler at 86E72577 (IMAGE_MANAGEMENT+02B77) was the Traceback Condition Handler (the condition handler no longer appeared when examining the process in SDA with SHOW CALL and SHOW CALL /NEXT).

The executable is (as you might surmise from condition handlers in SHARE$DECCSHR and SHARE$PASRTL) a mixed-language executable - C and Pascal, with the entry point being main() in C.

This seems to somewhat obfuscate the first (last-chance/catch-all) condition handler defined for the process...

Using SDA with SHOW CALL & SHOW CALL /NEXT, a plain-vanilla C-language program shows the first condition handler as being EXE$CATCH_ALL, but for the mixed-language executable, it shows as 86E8B821 (PROCESS_MANAGEMENT+07621) - it appears to eventually call (or at least, perform the same actions as) the last-chance/catch-all condition handler.

I had originally thought that since the user condition handler established with the Pascal ESTABLISH procedure was resignalling with SS$_RESIGNAL, it must have been the Pascal RTL itself that was handling the condition (as it had been called with FINDK, it would therefore know which file was being accessed, and therefore have access to the RAB to check the RAB$L_STS field, and then use RAB$L_STS and RAB$L_STV in a call to $PUTMSG).

However, based on what the VMS Internals and Data Structures 5.2 book (EY-C171E-DP, 1991) says in section 5.7.1 on page 93, it was the traceback condition handler that output the messages pertaining to the condition, then output the traceback interpretation of the call frames before resignalling with SS$_RESIGNAL, causing the last-chance/catch-all condition handler to eventually dispatch to EXE$IMGDMP_MERGE.

Although the code mostly calls the Pascal ESTABLISH procedure before making calls to Pascal RTL I/O routines (I've not checked all of them in the single source module that contains library routines/wrappers for accessing the database (RMS indexed files), but there are at least a few places where it doesn't), it does seem to always call another wrapper routine to check the status returned by the Pascal RTL I/O routine.

As the checker wrapper routine knows which file it is dealing with and has access to the RAB for the file, I modified that routine to check the RAB$L_STS field, and then use RAB$L_STS and RAB$L_STV in a call to $PUTMSG if necessary.

Now that I've correctly interpreted the original code that I copied, I'll add a second call to the subroutine to get the secondary RMS status output, after modifying the flags parameter value (in my test case, the existing value would have resulted in a message of "timeout period expired" rather than "%RMS-W-TMO, timeout period expired", though it's curious to note that the Traceback handler was outputting "-RMS-W-TMO, timeout period expired" rather than "%RMS-W-TMO..."), then remove the changes I'd made to the checker wrapper routine.

Mark

[Formerly appearing as woeisme]