12-15-2007 10:36 AM
According to the traceback, the line where the error occurs is this one:
return (unsigned int) d;
However, if I remove the preceding if block (which does not modify d in any way), the crash doesn't happen, so the traceback information is suspect. Running under the debugger also prevents the error from occurring, so I'm a bit stumped.
Whether it makes sense to cast a double holding an IEEE NaN into an unsigned int is an interesting question, but this code is found, not made (i.e., I didn't write it), and I'm stuck with it whether what it's doing makes sense or not. I'm open to suggestions about whether the compiler is doing something wrong, the code is doing something wrong, or some combination thereof.
$ cc/vers
HP C V7.3-009 on OpenVMS Alpha V8.3
$ cc/float=ieee/ieee=denorm/list/show=expansion/machine nan
$ link/trace nan
$ run nan
%SYSTEM-F-ILLEGAL_SHADOW, illegal formed trap shadow, Imask=00000000, Fmask=00008000, summary=03, PC=0000000000020098, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
NAN NAN cast_uv 1824 0000000000000098 0000000000020098
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000100000000, PC=0000000100000000, PS=0000001B
Improperly handled condition, image exit forced.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000010000
0000000100000000
0000000100000000
000000000000001B
Register dump:
R0 = 0000000000000001 R1 = 0000000000000001 R2 = 000000007BF7C590
R3 = 000000007ADDF2F0 R4 = 000000007ADDF2E0 R5 = 000000007ADDF2C8
R6 = 000000007ADDF360 R7 = FFFFFFFF81D4CD20 R8 = 000000007FF9CDE8
R9 = 000000007FF9DDF0 R10 = 000000007FFA4F28 R11 = 000000007FFCDC18
R12 = 000000007FFCDA98 R13 = FFFFFFFF81D4D1F0 R14 = 0000000000000000
R15 = 000000007AEE2670 R16 = 0000000000000EE0 R17 = FFFFFFFF77773700
R18 = 0000000100044D18 R19 = 000000007ADDF030 R20 = 0000000000000729
R21 = 000000007B67C848 R22 = 0000000100044CD8 R23 = 000000007ADDF020
R24 = 0000000000000000 R25 = 0000000000000001 R26 = 0000000100000002
R27 = 000000007B63F590 R28 = 000000007BF90438 R29 = 000000007ADDEFF0
SP = 000000007ADDEFF0 PC = 0000000100000000 PS = 300000000000001B
12-15-2007 11:22 AM
Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int
I'm wondering if the attempt to cast the NaN is what's nailing the sequence. (The error that's signaled certainly points this way; trying to use a NaN...)
The other slightly odd part is the end of the main function: falling off the end tends to return whatever value was last in R0 as the final status.
12-15-2007 02:44 PM
$ gdiff -pu0 nan.c;-2 nan.c
--- nan.c;-2 Sat Dec 15 10:40:33 2007
+++ nan.c Sat Dec 15 15:15:47 2007
@@ -9,0 +10,2 @@ cast_uv(double d)
+ unsigned int u;
+
@@ -11 +13 @@ cast_uv(double d)
- return d < IV_MIN
+ u = d < IV_MIN
@@ -17 +19 @@ cast_uv(double d)
- return (unsigned int) d;
+ u = (unsigned int) d;
@@ -18,0 +21 @@ cast_uv(double d)
+ return u;
[end of diff]
The shadow error is not triggered for what should be code with identical behavior. We still cast a NaN to an unsigned int, the only difference being the result of the cast is now stored in a local variable and that variable is returned rather than the result of an expression being returned directly. Whether it can be said to "work" is an open question, since what it means to give the following value as the unsigned int representation of a NaN is difficult to say:
$ run nan
2079679152
As far as the main() function not having an explicit exit(), that shouldn't be necessary, and looking at the machine listing you can see that it moves 1 into R0. The case that blows up does so before it gets anywhere near that far anyway.
BTW, I forgot to mention before that the ACCVIO occurs during the traceback. If I link /NOTRACE, the ACCVIO does not occur.
12-16-2007 01:24 PM
Debugging this type of error is the art of not looking where you think you should be looking.
The "trap shadow" has to do with pipelining instructions. Floating point instructions, in particular, can take many cycles to complete, so between issuing the instruction and detecting an error, other instructions may have been issued or completed. The "TRAP_SHADOW" is the range of instructions between the failing instruction and the current one. To assist debugging, there are structures which (hopefully) point to the real culprit.
My guess is the cast is generating instructions that are dealing with the same object as both floating point and as integer. The processor thinks they can be executed in parallel, but they can't. Somehow that's messing up the trap shadow structures.
Look at the instruction stream around the reported error, or at least where you think it's happening. Work backwards, looking for floating point operations that might fail.
This won't happen on Itanium because there the compiler does any pipelining, not the processor. There is no trap shadow, so it can't be illegally formed (that's the fundamental architectural difference between Alpha, a RISC design, and Itanium, an EPIC design).
12-16-2007 07:33 PM
I do have one other observation. In the if block that looks like this:
if (d < UV_MAX_P1) {
return (unsigned int) d;
}
we should never hit the cast-and-return line when d is a NaN, because I think any comparison with a NaN is supposed to be false. Stepping through the non-optimized version with the debugger confirms that we don't get there; it only seems to happen when the code is compiled with the default optimizations turned on.
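The expectation that a NaN never passes the `d < UV_MAX_P1` test can be checked in isolation. The following is a minimal sketch, not the code from the thread: the names `reaches_cast`, `is_ordered` and `UV_MAX_P1_SKETCH` are made up for illustration, and the value 2^32 for the limit is an assumption based on a 32-bit unsigned int.

```c
#include <assert.h>   /* assert, for the checks in the usage note */
#include <math.h>     /* NAN macro (C99) */
#include <stdbool.h>

/* Assumed stand-in for UV_MAX_P1: 2^32 as a double (32-bit unsigned int). */
#define UV_MAX_P1_SKETCH 4294967296.0

/* Hypothetical helper: true when the guarded cast-and-return would be reached. */
bool reaches_cast(double d)
{
    return d < UV_MAX_P1_SKETCH;   /* an ordered comparison is false for a NaN */
}

/* Hypothetical helper: true when d compares equal to itself, i.e. is not a NaN. */
bool is_ordered(double d)
{
    return d == d;                 /* a NaN is the only value that fails this */
}
```

With IEEE semantics in effect, `assert(!reaches_cast(NAN))` and `assert(reaches_cast(42.0))` should both hold. If an optimized build behaves as though the first check failed, that would point toward the generated code rather than the source.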
12-17-2007 01:36 AM
This is a nice problem ;-)
I will try to share what I know about ILLEGAL_SHADOW traps, as I once worked an IPMT involving such a thing and also found a problem causing an ILLEGAL_SHADOW trap in an earlier version of the PersonalAlpha emulator.
If your code incurs an ILLEGAL_SHADOW exception, you need to work backwards in the Alpha instruction stream from the TRAPB (Trap Barrier) instruction (pointed to by the exception PC) to the first instruction found using the /S qualifier (requesting software completion).
CMPTLT/SU F16, F14, F15
FCMOVNE F18, F16, F18
TRAPB <- exception PC points here
The Imask (integer registers) and Fmask (floating-point registers) each provide one bit per register; a bit is set if that register was the target of an instruction issued inside the 'trap shadow'.
The exception summary bits in summary=03 indicate:
bit 0 = SWC (Software Completion)
bit 1 = INV (Invalid Operation)
In this case, Fmask=00008000 points to F15 being a target register and therefore identifies the CMPTLT/SU F16, F14, F15 instruction as the one causing the trap.
summary=03 indicates INV; this bit is set when one of the operands has an illegal value.
The CMPTLT (IEEE Floating Compare) instruction will trap if one of the input operands (F16 or F14) is a NaN. In this case it's F16.
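The decoding described above can be sketched in a few lines of C. This is only an illustration of the bit arithmetic: the function and macro names are hypothetical, and only the two summary bits named in this post (SWC and INV) are modelled.

```c
#include <assert.h>   /* assert, for the checks in the usage note */
#include <stdint.h>

/* Bit positions as listed in the post; other summary bits are not covered here. */
#define SUMMARY_SWC (UINT32_C(1) << 0)  /* software completion */
#define SUMMARY_INV (UINT32_C(1) << 1)  /* invalid operation */

/* Hypothetical helper: number of the lowest register flagged in a 32-bit
 * Imask or Fmask value, or -1 if no bit is set. */
int first_flagged_register(uint32_t mask)
{
    for (int reg = 0; reg < 32; reg++) {
        if (mask & (UINT32_C(1) << reg))
            return reg;
    }
    return -1;
}

/* Hypothetical helpers: test the two summary bits named in the post. */
int has_swc(uint32_t summary) { return (summary & SUMMARY_SWC) != 0; }
int has_inv(uint32_t summary) { return (summary & SUMMARY_INV) != 0; }
```

Applied to the values in the error message, `first_flagged_register(0x00008000)` gives 15 (register F15), `first_flagged_register(0x00000000)` gives -1 for the empty Imask, and both `has_swc(0x03)` and `has_inv(0x03)` are true.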
The software completion is handled by the operating system, in this case by [SYS]IEEE_INST. If this handler believes there is an inconsistency in the instruction stream preceding the TRAPB instruction that declared the exception (there are lots of rules for a trap shadow to be valid), or if it incurs any other error while checking this, it will signal the ILLEGAL_SHADOW trap. So this is all done by software!
I also get 'interesting' results if I run the SAME NAN.EXE on different versions of OpenVMS and on real (or emulated) Alphas. As the ILLEGAL_SHADOW is reported from the exec, you also need to include the version of EXCEPTION.EXE as an additional 'parameter' to this problem.
Running NAN.EXE on an AlphaServer 1000A with OpenVMS V8.2, I get:
AXPVMS $ run nan
%SYSTEM-F-ILLEGAL_SHADOW, illegal formed trap shadow, Imask=00000000, Fmask=00008000, summary=03, PC=0000000000020098, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
NAN NAN cast_uv 1824 0000000000000098 0000000000020098
NAN NAN main 1833 00000000000001DC 00000000000201DC
NAN NAN __main 1829 0000000000000174 0000000000020174
0 FFFFFFFF8031DF94 FFFFFFFF8031DF94
Note: no ACCVIO during traceback handling !
On a PersonalAlpha (V1.2.2) OpenVMS V8.3 with VMS83A_UPDATE-V0400, I get:
CHAALP $ run nan
2079916720
This seems to be an interesting corner case, and no one except maybe HP OpenVMS engineering has a good chance of solving this mystery. You may also need to read up in the Alpha Architecture Reference Manual to at least get an idea of what may be happening.
Volker.
12-17-2007 09:18 AM
There is a path through your cast_uv routine that does NOT return a value:
...
if (d < UV_MAX_P1) {
return (unsigned int) d;
}
return (unsigned int) d; /* also return a value here */
}
...
With this 'fix' added, I get reliable results and no more ILLEGAL_SHADOW or ACCVIO.
When analyzing the machine code flow through cast_uv, I found a path that does not load R0 and so returns a bogus value in R0. This explains why I got different printed values from the printf when run on different machines.
Volker.
12-17-2007 01:48 PM
Thanks for your replies and for taking the time to do your own testing. You are quite right about the cast_uv function in my example not returning a valid value when either of the two if blocks in it evaluates to false. I now think this is what Hoff meant by stepping off the end of the main function (I had thought he meant the function main(), but now think he just meant the primary function in the example, which is cast_uv).
The reason there is no else clause or fallback return statement in the example is that I deleted it in trying to reduce the example to the smallest possible reproducer; code you've deleted can't be causing the problem. It's a red herring in this case, though I did allow it to confuse me.
It is true that if I put your fallback return statement:
return (unsigned int) d;
as the last statement in the function, the illegal shadow problem goes away. However, if instead of your fallback statement I restore the original one I deleted (for which you'd need to include limits.h):
return d > 0 ? UINT_MAX : 0;
the exact same problem is still there as in my original example.
There appear to be any number of ways to rewrite the function such that it dodges the illegal shadow problem, but then how to know that the next innocent edit won't trigger it again? I'm more convinced than ever that the function as written is legal (if a bit strange) yet triggers pathological behavior when optimized.
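One way to sidestep the question at the source level is to keep a NaN away from both the comparisons and the cast. The sketch below is not the Perl source and has not been tried on Alpha, so whether it avoids the optimizer behaviour discussed here is unknown. The name `cast_uv_sketch` is hypothetical, returning 0 for a NaN is an assumed policy, and the clamping of negative values is only a guess at the deleted lines. It does return a value on every path and avoids converting an out-of-range double to an integer type, which standard C leaves undefined.

```c
#include <assert.h>   /* assert, for the checks in the usage note */
#include <limits.h>   /* INT_MIN, UINT_MAX */
#include <math.h>     /* isnan, NAN (C99) */

/* Hypothetical NaN-safe variant: every path returns a value. */
unsigned int cast_uv_sketch(double d)
{
    /* One past UINT_MAX, computed in double so it is exact for 32-bit unsigned. */
    const double uv_max_p1 = (double) UINT_MAX + 1.0;

    if (isnan(d))
        return 0u;                        /* assumed policy for a NaN */
    if (d < 0.0) {
        if (d < (double) INT_MIN)
            return (unsigned int) INT_MIN;   /* clamp values below INT_MIN */
        return (unsigned int) (int) d;       /* in range: wraps like an int cast */
    }
    if (d < uv_max_p1)
        return (unsigned int) d;          /* in range: truncates toward zero */
    return UINT_MAX;                      /* too large, including +infinity */
}
```

For example, `cast_uv_sketch(NAN)` gives 0, `cast_uv_sketch(3.9)` gives 3, and `cast_uv_sketch(1e20)` gives `UINT_MAX`.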
For the curious the original comes from the Perl sources and can be seen by hunting for "Perl_cast_uv" here:
http://public.activestate.com/cgi-bin/perlbrowse/f/numeric.c
12-17-2007 04:24 PM
>The reason there is no else clause ...
>
>code you've deleted can't be causing the
>problem. It's a red herring in this case,
>though I did allow it to confuse me.
Don't be so sure! In the world of heavily optimised and pipelined processors there's a concept of "speculative execution". That is, the pipeline may be busily processing BOTH sides of a conditional before (or while) the test is evaluated. When the test result is known, the result for the other branch is discarded. This can avoid a true branch operation (which tends to slow down the pipe). Obvious things like function side effects cannot be handled this way, and the optimiser will know that.
One of the potential consequences is handling exceptions for non-taken branches! As well as creating some interesting cases when debugging.
I'm not sure how much this is used by which Alpha processor versions, but it might explain the differences between different systems Volker observed. Note that you may see even more of this type of thing on Itanium.
As processors grow more threads, cores, pipelines and execution units, your code no longer can be seen as a strict linear sequence of operations. Compilers and processors get more dependent on the "complete" correctness of the code.
12-17-2007 11:40 PM
WG
OpenVMS Developer & System Manager