Operating System - OpenVMS

Sebastian Bazley
Regular Advisor

Access violation where VA = PC

I've been given details of an access violation.

The reason code is 00 (data read).

What is a bit odd is that the virtual address is the same as the PC.

I've tried to reproduce this condition (by hacking function pointers and stack stomping) but so far failed - the PC is always different from the virtual address.

Any idea how this situation might come about?

i.e. the sort of bugs to look for.

Details below.

%SYSTEM-F-ACCVIO, access violation,
reason mask=00
virtual address=005D99A8005D98C0,
PC=005D99A8005D98C0
PS=0000001B

Improperly handled condition, image exit forced.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000010000
005D99A8005D98C0
005D99A8005D98C0
000000000000001B

I've asked for the MAP file which may give some more clues.
Hein van den Heuvel
Honored Contributor

Re: Access violation where VA = PC

Looks like a 32-bit / 64-bit confusion
Like a RETURN address on a stack being corrupted.
Or a badly passed AST address?

Is this new code? Language? Recent modifications to working code? 'Nothing changed'?

>> What is a bit odd is that the virtual address is the same as the PC.

Well, that PC would be a bad virtual address in all likelihood, no?

The real target address is probably 0x5D98C0

Check that address (debugger? MAP?) what is expected there for clues.
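
A minimal sketch of the split suggested above: treat the reported 64-bit value as two 32-bit longwords and look at each half on its own. The type and function names are hypothetical, chosen only for illustration; nothing here is an OpenVMS API.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit halves of a 64-bit address. */
typedef struct {
    uint32_t high;  /* upper 32 bits of the reported address */
    uint32_t low;   /* lower 32 bits of the reported address */
} addr_halves;

/* Split a 64-bit value into its upper and lower longwords. */
static addr_halves split_halves(uint64_t va)
{
    addr_halves h;
    h.high = (uint32_t)(va >> 32);
    h.low  = (uint32_t)(va & 0xFFFFFFFFu);
    return h;
}
```

Applied to 005D99A8005D98C0 this gives 0x005D99A8 and 0x005D98C0, two plausible 32-bit addresses only 0xE8 bytes apart, which is what you would expect from two neighbouring 32-bit values being read as one 64-bit quantity.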

hth,
Hein.
Sebastian Bazley
Regular Advisor

Re: Access violation where VA = PC

It's C-code - unfortunately the crash does not always happen...

We are asking the customer to enable crash dumps in case it happens again.
labadie_1
Honored Contributor

Re: Access violation where VA = PC

You have a very good document on that subject at

http://www.eight-cubed.com/articles/traceback.html
Hein van den Heuvel
Honored Contributor

Re: Access violation where VA = PC



Consider this:

$ mcr sys$login:tmp -11 -12
%SYSTEM-F-ACCVIO, access violation,
reason mask=00,
virtual address=000201B8000201B8, PC=000201B8000201B8, PS=0000001B

Code?


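#include <stdlib.h>   /* atoi */

/* Copies one int over another. Out-of-range indices touch the stack
   outside x[], which is the point of the demonstration. */
void b (int i, int j, int *x)
{
    x[i] = x[j];
}

void a (int i, int j, int *x)
{
    b(i, j, x);
}

int main(int argc, char *argv[])
{
    int i, j, x[100];
    i = atoi(argv[1]);
    j = atoi(argv[2]);
    a(i, j, x);
    return 0;
}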
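
What the program above probably does, sketched portably: with negative indices, x[i] = x[j] copies one 32-bit int over its neighbour below x[]. If those two ints happen to be the two halves of a saved 64-bit return address (an assumption about the frame layout, which depends on compiler and version), the low longword ends up duplicated in the high longword. The function name is hypothetical.

```c
#include <stdint.h>

/* Model of a 32-bit store landing in the upper half of a 64-bit slot:
   the low longword is copied over the high longword. */
static uint64_t copy_low_over_high(uint64_t slot)
{
    uint64_t low = slot & 0xFFFFFFFFu;
    return (low << 32) | low;
}
```

A saved return address of 00000000000201B8 would become 000201B8000201B8 under this model, matching the VA and PC in the ACCVIO shown above.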


Cheers,
Hein.
Hoff
Honored Contributor

Re: Access violation where VA = PC

Something probably romped on the stack or on the heap or on a queue header, by the look of it. (Yep, that really narrows it down. There's a corruption. :-) A hammerlock grasp of the obvious, eh?) This footprint could be a stack corruption or stack-stomp causing a return heading off into hyperspace; the error here is usually secondary to the actual failure.

Do post the full stack trace.

If that's the full stack trace, then you'll probably end up adding at least debugging information and potentially instrumenting the code. Unfortunately, adding debug or instrumentation can perturb the bug, and might well cause it to enter remission.

Stephen Hoffman
HoffmanLabs


John Gillings
Honored Contributor

Re: Access violation where VA = PC

Sebastian,

I'd be enabling PROCESS dumps - SET PROCESS/DUMP

You don't mention a version - make sure you're at V7.3-2 at least. Use DEBUG or SDA to analyse the resulting dump.

VA=PC is a fairly specific footprint. Since there are few modern language constructs which allow jumping to an arbitrary address, the most likely cause is loading a PC from the stack. So, you're looking for something that overwrites the return address in a call frame. The "improperly handled condition" suggests that other pieces of the call frame have also been corrupted.

The biggest problem with tracking this type of error down is the "distance" between cause and effect. The corruption could be caused by the first executable line in the routine, but not actually seen until you attempt to exit. In DEBUG you can step a few lines, then check the integrity of the stack with SHOW CALL. From within a program, you can do something similar with LIB$GET_INVO* routines (potentially non-destructive). Another option would be to declare a condition handler at top level which can SS$_CONTINUE some specific condition. You can then check the integrity of the stack by LIB$SIGNALling that condition. If it's good, you continue; if not, you'll get an improperly handled condition.
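
The same "check integrity at intervals" idea can be sketched portably with guard words. This is not the LIB$GET_INVO* or LIB$SIGNAL approach described above, only a simpler analogue: put known sentinel values on either side of a suspect buffer and test them at checkpoints, so the gap between cause and effect shrinks to the interval between two checks. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_VALUE 0xA5A5A5A5u   /* arbitrary sentinel pattern */

/* Hypothetical guarded buffer: sentinels sit on both sides of the data. */
typedef struct {
    uint32_t front_guard;
    char     data[16];
    uint32_t back_guard;
} guarded_buf;

/* Set both sentinels and clear the payload. */
static void guarded_init(guarded_buf *g)
{
    g->front_guard = GUARD_VALUE;
    memset(g->data, 0, sizeof g->data);
    g->back_guard = GUARD_VALUE;
}

/* Returns 1 while both sentinels are intact, 0 once either is overwritten. */
static int guards_intact(const guarded_buf *g)
{
    return g->front_guard == GUARD_VALUE && g->back_guard == GUARD_VALUE;
}
```

Calling guards_intact() after each suspect routine narrows down which call did the overwrite, at the usual risk that the extra code moves the bug.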

If your magic number "005D99A8005D98C0" is always the same, have a look at the 32 bit addresses in a live process from SDA. I'd guess they're pointers within a data structure which has a size mismatch between the caller (too small) and called routine.
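
A minimal sketch of that kind of size mismatch, with hypothetical names: the caller lays out two adjacent 32-bit fields, while the called routine believes it was handed room for a 64-bit value and stores eight bytes.

```c
#include <stdint.h>
#include <string.h>

/* Caller's view: two adjacent 32-bit fields. */
typedef struct {
    uint32_t result;     /* caller expects a 32-bit value here */
    uint32_t neighbour;  /* unrelated data that happens to follow */
} caller_view;

/* Called routine's (mistaken) view: it stores a full 64-bit value at the
   address it was given, spilling 4 bytes into whatever follows. */
static void callee_store64(void *dest, uint64_t value)
{
    memcpy(dest, &value, sizeof value);
}
```

After callee_store64(&c, ...) on a caller_view c, the neighbour field no longer holds what the caller put there. On a little-endian machine such as Alpha, result receives the low longword and neighbour the high longword. In the real failure the spill would land on stack or heap data rather than inside one struct.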
A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: Access violation where VA = PC

Sebastian,

First, a question. Does enabling DEBUG and using the debugger preserve the problem?

If so, then you can use the more advanced facilities in the DEBUGGER to isolate the region in the program where the stack is being corrupted. Most importantly, is there a test case that consistently produces the failure, and does it still fail when the DEBUGGER is enabled?

In nearly 30 years of debugging programs under OpenVMS, for myself and clients, I have had a few cases where stack corruption bugs manifested themselves at great distances from the actual corruption. Tracking them down can be difficult. Some of them have even appeared/disappeared depending on the presence of the DEBUGGER, which can be particularly aggravating.

-Bob Gezelter, http://www.rlgsc.com

John Travell
Valued Contributor

Re: Access violation where VA = PC

In my experience, VA = PC means that code tried to use the bogus VA as the destination of a JSR. If you have the register contents, have a look at R26.
This *may* contain the address the supposed JSR should have returned to.
If you have what looks like a valid address there, check to see if it is code, and if so, is that address preceded by a JSR R26(R26) ?
If all this works out, you should be able to work out where your bogus VA came from.

Sebastian, should you wish to do so, you are welcome to call me, I think I gave you my card at one of the London seminars.
JT:
Sebastian Bazley
Regular Advisor

Re: Access violation where VA = PC

Thanks for all the replies.

I've suggested that the customer enable process dumps for the application. It is a batch process so that's easy to do. Using DEBUG on the live process is not an option.

Hein: I tried your crash program on VMS 7.3-1 (Compaq C V6.4-008) and it does not crash using -11 -12 as parameters - nor any others I tried. (I do get an access violation if the parameters are omitted). Also fails to crash using Compaq C V6.5-001 on OpenVMS Alpha V7.3-2.

Not yet had a chance to look at the other suggestions.