Operating System - OpenVMS

Tivis Mobberley
New Member

ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

When developing software, I have used the debugger for 30 years with Macro-32, FORTRAN and C. I am now in a support role and need to debug process dumps, and I find virtually no guidance in the Debugger Manual or elsewhere. Does anyone know of a "guide to debugging process dumps"? Thank you in advance.


%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=FFFFFFFF84298508, PS=0000001B
break on unhandled exception at SHARE$DECC$SHR_EV56_CODE0+1288 in THREAD 1
%DEBUG-I-SOURCESCOPE, source lines not available for %PC in scope number 0
Displaying source for 4\%PC
DBG>
Hein van den Heuvel
Honored Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...


It's going to be tricky.

With the failure PC in DECC$SHR, you typically need the OpenVMS Listing and the EXACT version of DECC$SHR.

The offset being so low (+1288 decimal is 0x508), you may be in luck.
I suspect the problem is a NULL pointer as second argument for a call to STRCMP.

For DECC$SHR_EV56 "V8.3-01" the function strcmp starts at 0x500, and this may well be the case for some other versions.
At 0x508 the code loads through the second argument, the address in R17.

Check this with:

DBG> SET RAD HEX
DBG> EXA/INST 0FFFFFFFF84298500:0FFFFFFFF84298510

Does it read ...

SHARE$DECC$SHR_EV56_CODE0+500: LDQ_U R27,(R16)
SHARE$DECC$SHR_EV56_CODE0+504: AND R16,#X07,R21
SHARE$DECC$SHR_EV56_CODE0+508: LDQ_U R18,(R17)
SHARE$DECC$SHR_EV56_CODE0+50C: AND R17,#X07,R20
SHARE$DECC$SHR_EV56_CODE0+510: SUBQ R20,R21,R0
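On Alpha, integer arguments are passed in R16 through R21, so for strcmp(s1, s2) the first string's address is in R16 and the second's in R17. LDQ_U fetches the aligned quadword containing the given address (the low three bits are ignored), and AND Rn,#X07 keeps the byte offset within that quadword. The C sketch below models only that address arithmetic; the helper names are hypothetical and not part of any OpenVMS API, and nothing here touches memory.

```c
#include <stdint.h>

/* Hypothetical model of LDQ_U Rx,(Ry): the quadword is fetched from the
 * address in Ry with its low three bits cleared. */
uint64_t ldq_u_address(uint64_t addr)
{
    return addr & ~(uint64_t)7;
}

/* Hypothetical model of AND Ry,#X07,Rz: byte offset of the string start
 * inside that quadword. */
uint64_t quadword_byte_offset(uint64_t addr)
{
    return addr & 7;
}
```

With R17 = 0, ldq_u_address(0) is 0, which matches virtual address=0000000000000000 in the ACCVIO message.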

Here is a dummy program to create this failure:

$ type tmp.c
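#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    printf ("Hello world %d\n", argc);
    /* Deliberate bug: with no arguments, argv[1] is NULL, so strcmp() */
    /* gets a NULL second argument (R17 = 0) and ACCVIOs.              */
    printf ("Goodbye %d", strcmp("test", argv[1]));
    return 0;
}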
$ cc tmp
$ link tmp
$ mcr sys$login:tmp
Hello world 1
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=FFFFFFFF80A98508, PS=0000001B

And....

DBG> ex r16,r17
0\%R16: 65608
0\%R17: 0
DBG> ex /asciz @r16
65608: "test"

Good luck... You'll need some.
Hein





Volker Halle
Honored Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

Tivis,

there is nothing 'special' about analyzing process dumps. You use the same debugger commands and techniques; you only need to keep in mind that you are looking at a STATIC situation.

If an ACCVIO occurs inside OpenVMS-supplied images and you do not have their source code, consider looking at the call stack to find out from which source line in your program that routine was called. This may help you figure out what went wrong.

Volker.
GuentherF
Trusted Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

DBG> SHOW CALLS

...should show which of your routines called the CRTL function.

/Guenther
Tivis Mobberley
New Member

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

Thank you for your replies. The debugger manual does not say which commands do not work, or work differently, when analyzing a process dump versus a live interactive debugging session. Of course I do not expect the GO command to work, but what else does not work?

I am not asking anyone to actually debug for me; I was just asking for guidance on the differences, so I did not include a tremendous amount of detail. However, for those who may be interested, I have attached an MS Word document with the SHOW CALLS output and a small excerpt from the listing file. The ACCVIO is deep in the getenv() function in the HP C RTL. From a programmer's point of view, I should be able to pass getenv() anything from a NULL or other garbage to a legitimate string, and it should return either NULL if the logical name is not defined or a pointer to the equivalence string. Am I mistaken in my belief?

Finally, the dumps occur purely at random. The code executes tens of thousands of times a month with only 8 to 15 failures!
Dennis Handly
Acclaimed Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

>I do not expect the Go command to work but what else does not work?

Anything that "moves" the PC or changes any registers or memory.

>The ACCVIO is deep in the getenv() function in the HP C RTL. I should be able to pass getenv() anything from a NULL or any other garbage to a legitimate string and it should return either a NULL if the logical name is not defined or a pointer to the equivalence string. Am I mistaken in my belief?

You should NOT expect extensive error checking in C runtime functions beyond what the Standard requires or what security-related functions need. Such checks are expensive and slow things down for the good guys.

So check what you are passing to getenv(3).
What results do you get for the getenv calls before line 22124?
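Since the CRTL is not required to validate its arguments, any check has to live in the caller. A minimal sketch, assuming a hypothetical wrapper named checked_getenv that the application would call instead of getenv() directly; it can only catch a NULL or empty name, not a garbage non-NULL pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: reject the cases that can be detected cheaply
 * (NULL or empty name) before handing the name to getenv(). */
const char *checked_getenv(const char *name)
{
    if (name == NULL || name[0] == '\0') {
        fprintf(stderr, "checked_getenv: invalid name argument\n");
        return NULL;
    }
    return getenv(name);   /* NULL if not defined, else the value */
}
```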
Volker Halle
Honored Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

Tivis,

first of all, note the AIMGMISMATCH message and make sure that you're debugging the dump with the SAME version of the image that was in use when the dump file was created.

Your source code listing extract (line 22124) and the line number shown in the DBG output (ECARS_UTILS validate_address line 22107) do not seem to match.

It also looks like the ACCVIO did NOT occur in the C RTL routine your code called, but in some other CRTL routine called by that routine.

Look at the code stream of the ACCVIO:

DBG> SET LAN MAC
DBG> EXA/INS 84298508-30:84298508

Then look at the instruction stream inside your calling routine:

DBG> EXA/INS 0496C8C-40:0496C8C

and match that instruction stream with your source code machine code listing (compiled with /LIS/MACHINE).

Once you've correctly identified the instruction stream inside the machine code, look at the source code again and try to determine, where you may be passing an invalid address.

Volker.
Hein van den Heuvel
Honored Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

Hadn't noticed the attachment before.
What Volker said...

And you want to look at what R17 and R18 point to. If the instructions match, they would be the addresses of the strings to compare, one of them being zero. But the other should match up with the getenv call, or with an entry in a CRTL-maintained cache?

Is the entire input form described with environment variables? Why? To be 'flexible'? At what price? Is the program 'simply' running out of VM space and failing to report a failing malloc?
Can you check the high VM address (GETJPI FREP0VA) and/or the pagefile quota on a working process? Are they approaching the limits (0x3fff0000)?
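The "failing to report a failing malloc" case is cheap to rule out: check every allocation and log the failure before the pointer is used. A minimal sketch; checked_malloc is a hypothetical name, and logging to stderr is an assumption about where such messages would go.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocation wrapper: logs the failing size so exhausted
 * VM or pagefile quota shows up in the log rather than as a later ACCVIO. */
void *checked_malloc(size_t size, const char *where)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "%s: malloc(%lu) failed\n",
                where != NULL ? where : "?", (unsigned long)size);
    }
    return p;
}
```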

Or is it just a nasty memory corruption, with some long-gone routine stomping over CRTL memory structures?

Good luck!
Hein
Tivis Mobberley
New Member

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

Thank you to everyone who responded. All responses have been helpful in planning on how to proceed from here.

Dennis,

Again, it fails 1 to 3 times in 1x10**5 calls. The prototype is 'char *getenv (const char *name);' and the call is 'acp = (char *) getenv("FORM_FLD_ACP");' where acp is declared as 'char *acp;' and "FORM_FLD_ACP" is obviously a character string. Even if my assumptions are incorrect, this is not a malformed call. The getenv() calls before and after line 22124 successfully return either NULL or a pointer to a null-terminated string.


Volker,

The exact version of DECC$SHR is:

Image Identification Information

image name: "DECC$SHR_EV56"
image file identification: "V8.3-01"
image file build identification: "XBCA-0080070047"
link date/time: 6-JAN-2011 12:22:41.66
linker identification: "A13-03"

We copy dumps to a test system along with the .EXE and .MAP files. When the image and map were copied, their version numbers changed, hence the AIMGMISMATCH message. Once the image and map were renamed to the correct versions, the message went away.

When I said "deep in the HP C RTL", I was again being brief. As the SHOW CALLS output shows, the ACCVIO happens after the C RTL calls another C RTL routine, which calls a LIB$ routine, which in turn calls the C RTL routine where the ACCVIO actually happens. While I have programmed in MACRO-32 on VAXen (and MACRO-11 on PDP-11s), I am learning the Alpha RISC instruction set as I go, so it is a little slow. Your suggestions have been most helpful and I am making progress.


Hein,

I am dealing with a threaded web application that was originally designed and written over 10 years ago. Redesigning it at this point is not in the budget, so I have to work with what I have (don't we always!). Your point concerning memory corruption is a very real possibility. Either the logical name "FORM_FLD_ACP" will be undefined and NULL should be returned, or it will equate to a phone number such as "202-555-1212" (the input is validated to ensure the field contains only digit characters ("0" to "9") or optional hyphen characters ("-")). It is possible for the "user" to enter the same digit 10 times (7777777777) or to enter fewer than 10 digits (4444), but none of these should result in an ACCVIO!

Again, thank you to everyone.
Dennis Handly
Acclaimed Contributor

Re: ANALYZE/PROCESS_DUMP/IMAGE_PATH=...

>it fails 1 to 3 times in 1x10**5 calls.

Then it isn't apt to be bad input.

>the call is acp=(char*)getenv("FORM_FLD_ACP");

I would suggest you remove the redundant cast because it could hide problems with a missing prototype.
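A sketch of the same call with the prototype in scope and no cast. With <stdlib.h> included, getenv() already returns char *, so the cast adds nothing; without the prototype, an old-style compiler assumes getenv returns int, and the (char *) cast silences the diagnostic for that int-to-pointer mismatch. The function name lookup_form_fld_acp is hypothetical.

```c
#include <stdlib.h>   /* declares char *getenv(const char *name); */

/* Hypothetical helper mirroring the call from the thread, minus the cast. */
char *lookup_form_fld_acp(void)
{
    char *acp = getenv("FORM_FLD_ACP");   /* no (char *) cast needed */
    return acp;                            /* NULL if the name is undefined */
}
```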

>this is not a malformed call. The results to getenv() calls before and after line 22124 are the successful return of either NULL or a pointer to a NULL terminated string.

It sure looks proper. (And you can't tell after because it aborts. :-)
Though any NULL return before the abort means getenv walked the whole list at least once without faulting.
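The point about walking the whole list can be illustrated with a simplified model: a linear scan of a NULL-terminated array of "NAME=value" strings, where a miss means every entry was compared. This is a hypothetical sketch of the general technique only. The OpenVMS CRTL getenv() also consults other sources, such as logical names (which is how FORM_FLD_ACP is defined here), and its internals are not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an environment lookup.  Returns a pointer to the
 * value, or NULL after every entry has been compared.  A corrupted entry
 * pointer in this array would fault inside the scan, far from the caller. */
const char *find_env(char *const *envp, const char *name)
{
    size_t len = strlen(name);

    for (; *envp != NULL; envp++) {
        /* match "NAME=" exactly, so "FORM" does not match "FORM_FLD_ACP" */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}
```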

>I am dealing with a threaded web application

Is one thread trying to change the environment when you are calling getenv? Is getenv documented as thread safe?
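If other threads can modify the environment, one common mitigation is to route every environment read and write through a single mutex and copy the value out while the lock is held, rather than keeping the pointer getenv() returned. A sketch under that assumption; the thread does not establish that this application modifies its environment, getenv_copy and env_lock are hypothetical names, and the lock only helps if every writer takes it too.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock shared by every environment reader and writer. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy NAME's value into buf (always NUL-terminated, truncated if needed).
 * Returns 1 if NAME was found, 0 if not found or arguments are bad. */
int getenv_copy(const char *name, char *buf, size_t bufsz)
{
    const char *value;
    int found = 0;

    if (name == NULL || buf == NULL || bufsz == 0)
        return 0;

    pthread_mutex_lock(&env_lock);
    value = getenv(name);
    if (value != NULL) {
        size_t n = strlen(value);
        if (n >= bufsz)
            n = bufsz - 1;          /* truncate to fit */
        memcpy(buf, value, n);
        buf[n] = '\0';
        found = 1;
    }
    pthread_mutex_unlock(&env_lock);

    if (!found)
        buf[0] = '\0';
    return found;
}
```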

>Your point concerning memory corruption is a very real possibility.

What are other threads doing at this time?

>but none of these should result in an ACCVIO!

At least not in getenv.