Folks,
Just a quick question regarding the exception information available from ANALYZE/SYSTEM. I have a system consisting of a number of processes, the majority written in Pascal with some modules in C and C++.
We're using OpenVMS 8.4, patched to the hilt, running on RX2800s.
One of the processes was taking an inordinate amount of time to perform its processing, and most of the CPU time was spent outside of user mode. ANALYZE/SYSTEM revealed that the process seemed to be doing nothing but exception processing. Finding a user-mode PC address was quite a challenge (I used PCS, EXC and FLT under ANALYZE/SYSTEM), but it seemed to point to an old function using LIB$MOVC3 to perform string conversions. The code was reorganised to reduce the reliance on the problematic function, and performance improved by 95%: the process spent 1 second in COM rather than 20 seconds.
The process is still showing a large number of exceptions. The questions I have are:
1) What is an acceptable exception rate?
2) How do I find out the root cause of the exception?
I've attached some sample output from ANALYZE/SYSTEM. If the output from the PRF tool is to be believed, the process spends most of its time processing exceptions.
Anyway, hope it makes sense.
Cheers
Brian Reiter
Hmmm,
The output from EXC was less than helpful to be honest. How do I map an exception to a given process ID? Where is the root cause?
As far as I know, our developers didn't rely on exception handlers; my best guess is that these exceptions are in the Pascal RTL or elsewhere.
cheers
Brian
>>> The output from EXC was less than helpful to be honest. How do I map an exception to a given process ID? Where is the root cause?
OK, that helps a lot. In the space of 1 second (or less) for the process I'm interested in, I got 2294 exceptions, all with the same basic arguments.
All I need to do now is map the sig (address I assume) into the process address space and then find it in the source. Running the offending exe in debug and examining the value that came back gives me an address in a common library, presumably data, as the debugger didn't show anything resembling source, just an offset into one of our libraries. The offset is SHARE$NMCS2_APP_RWDATA0+0E477C, which looks to be a linker construct.
So the question now becomes: how do I find out which module caused the aggravation?
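For anyone following along, the lookup I have in mind can be sketched roughly as below. This assumes the linker map gives a base and a length for each module's contribution to the section; the module names and address ranges here are made up for illustration, and only the 0E477C offset comes from the debugger output above.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Contribution(NamedTuple):
    """One module's slice of a linker section (hypothetical layout)."""
    module: str
    base: int   # offset of the slice from the start of the section
    size: int   # length of the slice in bytes


def locate(offset: int, contributions: List[Contribution]) -> Optional[Tuple[str, int]]:
    """Return (module, offset within that module's slice), or None if unmapped."""
    ordered = sorted(contributions, key=lambda c: c.base)
    bases = [c.base for c in ordered]
    # Find the last contribution whose base is <= offset.
    i = bisect_right(bases, offset) - 1
    if i < 0:
        return None
    hit = ordered[i]
    delta = offset - hit.base
    if delta >= hit.size:
        return None  # falls in a gap after the candidate slice
    return hit.module, delta


# Made-up contributions; real values would be read from the linker map.
CONTRIBUTIONS = [
    Contribution("MODULE_A", 0x000000, 0x0A0000),
    Contribution("MODULE_B", 0x0A0000, 0x080000),
]

result = locate(0x0E477C, CONTRIBUTIONS)
```

With the made-up layout above, the offset lands in the second slice; with a real map the same lookup should name the module that owns the data.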
OK, so a bit of digging around seems to point the blame at the Pascal builtin READV. The function calling READV is being called incorrectly anyway, so it's kind of blind luck that this has worked at all. At this point in the application READV is being passed a binary array (it's expecting a delimited numeric string), which causes the Pascal exception handler to fire. We're using ERROR:=CONTINUE, which forces continuation after the event.
Something to try tomorrow.
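The general idea is to check that the buffer really is a delimited numeric string before handing it to the parser, so that bad input is rejected cheaply instead of going through the exception path on every call. A minimal sketch of that idea follows, in Python rather than Pascal for brevity; the function name and the choice of spaces and commas as delimiters are assumptions for illustration, not what our code does.

```python
import re
from typing import List, Optional

# One or more signed integers separated by spaces and/or commas.
# The delimiter set is an assumption for this sketch.
_NUMERIC_FIELDS = re.compile(rb"\s*[+-]?\d+(?:[ ,]+[+-]?\d+)*\s*")


def parse_numeric_fields(buf: bytes) -> Optional[List[int]]:
    """Return the integers in buf, or None if buf is not a numeric string.

    The shape of the input is validated first, so binary data is
    rejected without raising (and then swallowing) an exception.
    """
    if _NUMERIC_FIELDS.fullmatch(buf) is None:
        return None
    return [int(token) for token in re.split(rb"[ ,]+", buf.strip())]


good = parse_numeric_fields(b"12, 34 -56")
bad = parse_numeric_fields(b"\x00\x01\xff\x10")
```

Here `good` holds the parsed integers and `bad` is `None`, so the caller can skip the conversion entirely when it is handed binary data.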
>>> All I need to do now is map the sig (address I assume) into the process address space and then find it in the source.
Hi,
Thanks for the clarifications and your help; it has been much appreciated.
Anyway, I've corrected the calls around the READV. Ultimately they weren't required any more: binary data was being passed in, which caused an exception in the Pascal READV. This binary data had already been correctly converted earlier.
The problem had probably been in existence for over a decade, possibly two. We had noticed on the Alphas that this process spent too much time in COM but assumed it was normal behaviour. When we ported to Itanium the process spent even more time in COM, which set the alarm bells ringing. We have plans for various extensions to the system which this problem would have badly affected. However, we now have the processing down to well under a second, and we're in a position to consider the extensions.
cheers
Brian