Operating System - OpenVMS

ASTs corrupting stack frames in DECC 6.5 /optimize

 
David Jones_21
Trusted Contributor

I made a tweak to some code I've been running on my Alpha for 6 years, and it no longer works correctly with the new compiler unless it is compiled /noopt or /opt=level=1. What seems to be happening is that an AST delivered during the initial execution of afi_exchange_pending() deposits 0x040001 in the return address of that routine's stack frame. Using /reentrancy=AST makes no difference. Is this a bug in the compiler?

The attachment is a test program I assembled to better isolate the problem. When compiled with /optim=level=1, the AST frame is 48 bytes higher on the stack than with level=2 optimization. Note that the program uses SYS$SNDJBC() to muck with the accounting file, so I would caution against actually running it.
I'm looking for marbles all day long.
John Gillings
Honored Contributor

Re: ASTs corrupting stack frames in DECC 6.5 /optimize

David,

Unfortunately "it's been working for years" doesn't really mean much. Although there's an outside chance of a compiler bug, this is almost certainly an application error. They're far too easy to make in C, and can lurk benignly for decades.

The value 0x040001 looks suspiciously like an IOSB for the successful completion of a 4-byte $QIO to me. Running under DEBUG, I find that the corruption goes away with tracing on, which suggests a timing issue.
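
To illustrate that reading of the value, here is a minimal sketch in portable C that splits a longword the way a $QIO-style IOSB is usually laid out: completion status in the low-order word, transfer count in the high-order word. The type and function names are hypothetical, invented for this illustration; only the word layout and the "low bit set means success" rule for VMS condition values are assumed from the platform conventions.

```c
#include <stdint.h>

/* Hypothetical view of the first longword of a $QIO-style IOSB. */
typedef struct {
    uint16_t status;  /* low-order word: completion status */
    uint16_t count;   /* high-order word: bytes transferred */
} iosb_first_longword;

/* Split a raw 32-bit value into its two 16-bit halves. */
static iosb_first_longword split_longword(uint32_t value)
{
    iosb_first_longword r;
    r.status = (uint16_t)(value & 0xFFFFu);
    r.count  = (uint16_t)(value >> 16);
    return r;
}

/* VMS condition values indicate success when the low bit is set. */
static int status_is_success(uint32_t status)
{
    return (status & 1u) != 0;
}
```

Applied to 0x00040001, split_longword() yields status 1 (a success value) and count 4, which is why the number resembles a completed 4-byte transfer rather than a code address.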

So, take a very careful look at the IOSBs of ALL asynchronous system services. You're looking for one with context "above" your problem routine.
A crucible of informative mistakes
John Gillings
Honored Contributor
Solution

Re: ASTs corrupting stack frames in DECC 6.5 /optimize

David,
Aha! I think I've found it...

The SYS$SNDJBC in "afi_initialize" is async, the IOSB is allocated on the stack within the routine, and it's the last call in the routine. So, if we return and call another routine, which establishes a stack frame before the $SNDJBC completes, the IOSB could overwrite part of that frame.

Change the code to SYS$SNDJBCW or move the IOSB to static storage.

This is a VERY general principle. You must make sure that IOSBs are allocated in storage with a lifetime that extends to at least the completion of the service.
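
The lifetime rule can be sketched in portable C with a simulated service. Everything named fake_* below is a hypothetical stand-in, not the OpenVMS API: the "service" only remembers the IOSB address when the request is queued and writes the result through that pointer later, which is the behaviour that makes a stack-allocated IOSB dangerous once the calling routine has returned. The sketch shows only the safe variant, with the IOSB in static storage.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-longword status block; field names are illustrative only. */
typedef struct {
    uint32_t status;
    uint32_t reserved;
} iosb_t;

/* --- Hypothetical stand-in for an asynchronous service (NOT the VMS API) --- */
static iosb_t *pending_iosb = NULL;

/* Queue a request: remember where the result goes, then return at once. */
static uint32_t fake_async_request(iosb_t *iosb)
{
    iosb->status = 0;        /* 0 = not yet complete */
    pending_iosb = iosb;
    return 1;                /* request accepted */
}

/* Runs "later": the result is written through the saved pointer. */
static void fake_async_complete(uint32_t final_status)
{
    if (pending_iosb != NULL) {
        pending_iosb->status = final_status;
        pending_iosb = NULL;
    }
}

/* Safe pattern: the IOSB has static storage duration, so it is still valid
 * when the completion arrives after initialize_safely() has returned.
 * Declaring it as a local variable inside initialize_safely() instead would
 * leave pending_iosb pointing into a dead stack frame. */
static iosb_t init_iosb;

static uint32_t initialize_safely(void)
{
    return fake_async_request(&init_iosb);
}
```

In the simulation, calling initialize_safely() and then fake_async_complete(0x00040001) leaves the value in init_iosb.status and nowhere else. The other fix mentioned above, using the synchronous "W" form of the service, avoids the problem differently: the routine does not return until the IOSB has been written, so a stack-allocated IOSB is still live at completion.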
A crucible of informative mistakes
David Jones_21
Trusted Contributor

Re: ASTs corrupting stack frames in DECC 6.5 /optimize

Thanks, John, for spotting that. It should have been coded SYS$SNDJBCW(). I should have noted that the OS version also jumped from 7.2-2 to 8.2. Didn't $SNDJBC start life not 'really' asynchronous, so R0 always contained the value in the first word of the iosb?

I'm aware that compiler bugs are extremely rare, but they are not unknown.
I'm looking for marbles all day long.
John Gillings
Honored Contributor

Re: ASTs corrupting stack frames in DECC 6.5 /optimize

David,

> Didn't $SNDJBC start life not 'really' asynchronous

No. Indeed, it's always been one of the "least synchronous" of the asynch services. Some requests require a message to the local JOB_CONTROL process, which in turn has to talk to the QUEUE_MANAGER, possibly on another node. That can mean numerous I/Os before the result makes its way back along the return path.

This kind of bug can lie dormant for a long time, because the trigger requires both the timing to be right (err... make that "wrong" ;-), and the corrupted stack location has to matter enough to cause trouble.

>R0 always contained the value in the first word of the iosb?

For all the asynch services, R0 really only tells you that the request was syntactically correct. The IOSB tells you the result. In a debugged program R0 should always be "success", but the IOSB can vary as a result of external influences.
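
A minimal sketch of that two-stage check, in portable C with a simulated service. The fake_service_w() function and the FAKE_* status values are hypothetical, made up for illustration; the only platform convention assumed is that a condition value with the low bit set means success. The return value plays the role of R0 (was the request accepted?), and the IOSB carries the actual outcome.

```c
#include <stdint.h>

typedef struct {
    uint32_t status;
    uint32_t reserved;
} iosb_t;

/* Arbitrary even (failure) values, invented for this sketch. */
#define FAKE_REJECTED 0x14u   /* request was malformed */
#define FAKE_FAILED   0x2Cu   /* request was accepted but did not succeed */

/* VMS condition values indicate success when the low bit is set. */
static int vms_success(uint32_t status)
{
    return (status & 1u) != 0;
}

/* Hypothetical synchronous service: the return value says only whether the
 * request was accepted; the outcome is delivered in the IOSB. */
static uint32_t fake_service_w(int valid_request, uint32_t outcome, iosb_t *iosb)
{
    if (!valid_request)
        return FAKE_REJECTED;  /* IOSB is left untouched */
    iosb->status = outcome;
    return 1;                  /* accepted */
}

/* Check the return value first, then the IOSB; report whichever failed. */
static uint32_t checked_call(int valid_request, uint32_t outcome)
{
    iosb_t iosb = {0, 0};
    uint32_t r0 = fake_service_w(valid_request, outcome, &iosb);
    if (!vms_success(r0))
        return r0;
    return iosb.status;
}
```

Testing only the return value would treat the FAKE_FAILED case as a success, because the request itself was well formed.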

>I'm aware that compiler bugs are extremely rare, but they are not unknown.

I'd say "rare" rather than "extremely rare" (but then I get a very selected sample). Long experience has taught me to always start by looking at the application, rather than assuming a compiler bug.

On the other hand, in just the last month or so, an ITRC report has uncovered a day 1 bug in the Basic RTL.
A crucible of informative mistakes