Operating System - OpenVMS

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

 
Craig A Berry
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

John, I perhaps wasn't clear in expressing that I still see the error after deleting the final fallback return, which is why I omitted it from my original reproducer. But your point is well taken since if I omit either of the remaining if blocks, things don't blow up, but with both of them there together, the illegal trap shadow happens.

Willem, casting a NaN to an int is a bit weird, but it works ok by itself, which I think has as much to do with IEEE floating-point semantics as it does with C. There is a nice article on IEEE floating point here for the mathematically inclined:

http://docs.sun.com/source/806-3568/ncg_goldberg.html

The context of the code that generates the error is that Perl is a dynamic language and whether a variable is a string or a number and whether a number is an integer or floating point are things that get determined on the fly. This involves asking a lot of questions of the form, "Can this chunk of memory be treated as ...?" Even if the answer is no, the question still has to be asked, which leads to some interesting conversion attempts such as the one that triggered the error.
Dennis Handly
Acclaimed Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

The results on PA and HP-UX IPF indicate that the function cast_uv is missing a return.

In fact it seems the OVMS compiler may be broken because no compares with NaN should be true and it should fall off cast_uv as Volker points out.

>Volker: When analyzing the machine code flow through cast_uv,

No analysis is needed. A real compiler should have told you that. :-)
warning #2940-D: missing return statement at end of non-void function "cast_uv"

>when either of the two if-blocks in it evaluates to false.

These should always evaluate to false if d is a NaN.
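As a quick check of that claim, here is a minimal C sketch. The helper names any_compare_true and check_nan_compares are made up for illustration and do not come from the reproducer; the sketch assumes a C99 compiler with IEEE 754 doubles and no fast-math style options.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: nonzero if any ordered comparison or equality
   test between d and bound comes out true. */
int any_compare_true(double d, double bound)
{
    return (d < bound) || (d <= bound) || (d > bound) ||
           (d >= bound) || (d == bound);
}

/* Hypothetical self-check: call this from main(). */
void check_nan_compares(void)
{
    double d = NAN;                      /* quiet NaN from <math.h> */

    assert(!any_compare_true(d, 0.0));   /* every compare with NaN is false */
    assert(d != d);                      /* only != is true for a NaN */
    assert(any_compare_true(1.0, 0.0));  /* sanity check with a real number */
}
```

If the asserts pass, both if blocks in a function like cast_uv are skipped for a NaN argument and control reaches whatever follows them.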

>WG: the cause is casting a "NaN" - in other words: do something with an uninitialized variable.

For IPF, the hardware says the result is a long long 0x8000000000000000LL. PA-RISC handles it in a kernel trap handler but with a completely different value.

>casting a NaN to an int is a bit weird, but it works ok by itself

On HP-UX, it gets truncated to 0 (IPF) or UINT_MAX (PA).
Craig A Berry
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

Dennis,

Thanks for the reply.

>The results on PA and HP-UX IPF indicate that the function cast_uv is missing a return.

As we've already discussed at some length, the return is only missing from the pared-down reproducer, and its presence or absence makes no difference to the trap shadow error. The function never returns at all when the error is triggered, so the return value, bogus though it may be, is not of particular interest to the problem at hand and is not visible in an environment that exercises the bug. I've attached a revised reproducer with the return statement restored just so we can stop confusing ourselves about it.
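For anyone reading without the attachment, a function of the shape under discussion (two if blocks followed by a fallback return) might look roughly like the sketch below. This is not the attached reproducer or the Perl source: the name cast_uv_sketch, the double and unsigned int types, and the exact bounds are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* Hypothetical stand-in for the cast_uv discussed in this thread. */
unsigned int cast_uv_sketch(double d)
{
    /* First if block: negative values. The compare is false for a NaN. */
    if (d < 0.0)
        return d < (double)INT_MIN ? (unsigned int)INT_MIN
                                   : (unsigned int)(int)d;

    /* Second if block: values that fit in an unsigned int.
       The compare is false for a NaN as well. */
    if (d < (double)UINT_MAX + 1.0)
        return (unsigned int)d;

    /* Fallback return: reached for values that are too large and for NaN. */
    return d > 0.0 ? UINT_MAX : 0u;
}
```

With a NaN argument neither cast is executed in this sketch, so only the two comparisons and the fallback are involved.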

>In fact it seems the OVMS compiler may be broken
>because no compares with NaN should be true and it
>should fall off cast_uv as Volker points out.

They aren't true on OVMS either. There does appear to be a gotcha in the Alpha compiler as far as one path of parallel execution not defending itself quite enough from what another path might be doing at the same time in this rather odd corner case.

>>when either of the two if-blocks in it evaluates to false.

>These should always evaluate to false if d is a NaN.

They do -- except when they blow up and neither is evaluated.


>>casting a NaN to an int is a bit weird, but it works ok by itself

>On HP-UX, it gets truncated to 0 (IPF) or UINT_MAX (PA).

On VMS, it also gets truncated to 0, but that's irrelevant. It would have been better if I never said anything about "casting NaN" in my subject line; "parallel comparison operations with NaN" is more to the point except I did not yet know that was the problem at the time I posted.

Volker Halle
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

Craig,

I've built your new example and run it on a V8.2 AlphaServer 1000A 5/400 (EV56) and it fails with ILLEGAL_SHADOW. I've copied the same image to eisner (.decuserve.org) (DS20 V7.2-1) and it runs there without a failure.

As I said before, the ILLEGAL_SHADOW is a condition detected and reported by the IEEE handler in EXCEPTION.EXE. There are about a dozen checks, which may report this condition.

If John talks about 'speculative execution', this only seems to apply to Alpha 21264 (EV6 or higher) CPUs, so apparently can be ruled out on my EV56.

Volker.
Volker Halle
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

Craig,

I've reduced the instruction stream causing the ILLEGAL_SHADOW to a simple MACRO-64 program (see attached). Changing the instruction that follows the CMPTLT/SU F16,F14,F15 instruction either causes various types of failures or makes the ILLEGAL_SHADOW disappear. On the other hand, the same code runs on some Alphas without a problem.

There is something wrong here, so please - if you can - log a call with HP.

Volker.
Craig A Berry
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

Volker,

Thanks and double thanks. One thing I never mentioned is that I was seeing this on a DPW 500au, thus EV56, which confirms your experiments. I think there's plenty of info in this thread for someone with access to the compiler sources to dig in and fix the problem. For me this is hobbyist work done on my own time, and the only way to report it is to post here and at the C compiler feedback link off the OpenVMS home page, which I have now done.

Folks following this thread may be interested to know Hoff has written a nice background article on the Alpha trap shadow here:

http://64.223.189.234/node/690



Happy New Year.
Craig


John Reagan
Respected Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

il CMPTLT/SU F16,F14,F15
FCMOVNE F18,F16,F18 ; inserted, causing ILLEGAL_SHADOW !
;; FCMOVNE F18,F17,F18 ; inserted, causing ILLEGAL_SHADOW !
;; FBNE F18, out ; inserted, causes HPARITH
;; LDA R0,nan ; inserted - no problem
;; CPYSE F18,F19,F20 ; inserted - no problem

Of course you can get illegal shadows if you code in Macro-64 since you are responsible for following (or ignoring) the rules.

The Trap Shadow Rules are in the Alpha Architecture Manual, section 4.7.7.3.1.

The FCMOV instructions violate rule #4. You used F18 as both an input and output register inside the trap shadow.

The FBNE violates rule #2. No branches or jumps allowed in a shadow. I would have expected an illegal shadow message here as well.

The rules allow the OS to re-execute the faulting instruction and all the other instructions up to the TRAPB.

I haven't been reading all the C examples. If somebody has a short C example where the compiler violates the rules, email it to me please.
Volker Halle
Honored Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

John,

I've mailed you the C code and instructions for reproducing this problem.

I had written the MACRO-64 example strictly based on the I-stream generated by the C compiler.

This 'little example' seems to show a couple of different problems in various components of OpenVMS.

Volker.
John Reagan
Respected Contributor
Solution

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

Well, I don't see any OpenVMS issues other than the HPARITH vs ILLEGAL_SHADOW for the branch inside the shadow. The other behaviours look correct.

Now, the C compiler shouldn't be generating the

FCMOVNE F18, d, F18

inside the shadow. I'll see if I can reproduce it with the latest compiler.
John Reagan
Respected Contributor

Re: ILLEGAL_SHADOW error in C, casting NaN to unsigned int

With /NOOPT, the suspect FCMOV is below the TRAPB instruction. The peepholer is trying to move instructions into the trap shadow as an optimization. It moves the FCMOV into the shadow intentionally.

For the case where the same register appears in more than one operand, I found this comment in the peepholer:

"If Rb==Rc, there is no real move occuring at all. If Ra==Rc, and the move didn't occur the first time, then Ra/Rc will be unchanged. If it does happen the first time, Ra/Rc will get the new value from Rb, but then it doesn't matter if the move happens again or not."

Only when all three register operands are disjoint will the instruction not be moved into any trap shadow.
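The reasoning in that comment can be modelled in a few lines of C. This is only a sketch and not compiler or OS code: the register-file array, the helper names, and the ADDT contrast case are assumptions added for illustration. It models FCMOVNE Fa,Fb,Fc as "if Fa is non-zero, copy Fb to Fc" and checks whether executing the instruction twice (as the trap handler may do) leaves the same result as executing it once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NREGS = 32 };

/* Model of FCMOVNE Fa,Fb,Fc: if Fa != 0 then Fc = Fb. */
static void fcmovne(double f[NREGS], int ra, int rb, int rc)
{
    if (f[ra] != 0.0)
        f[rc] = f[rb];
}

/* Model of ADDT Fa,Fb,Fc: Fc = Fa + Fb. */
static void addt(double f[NREGS], int ra, int rb, int rc)
{
    f[rc] = f[ra] + f[rb];
}

/* Run FCMOVNE F18,F16,F18 once and twice; true if F18 ends up identical. */
bool fcmovne_reexec_matches(double f16, double f18)
{
    double once[NREGS] = {0}, twice[NREGS] = {0};

    once[16] = twice[16] = f16;
    once[18] = twice[18] = f18;
    fcmovne(once, 18, 16, 18);
    fcmovne(twice, 18, 16, 18);
    fcmovne(twice, 18, 16, 18);   /* re-execution after a trap */
    return memcmp(&once[18], &twice[18], sizeof(double)) == 0;
}

/* Same experiment with ADDT F18,F16,F18, which is not safe to repeat. */
bool addt_reexec_matches(double f16, double f18)
{
    double once[NREGS] = {0}, twice[NREGS] = {0};

    once[16] = twice[16] = f16;
    once[18] = twice[18] = f18;
    addt(once, 18, 16, 18);
    addt(twice, 18, 16, 18);
    addt(twice, 18, 16, 18);      /* re-execution after a trap */
    return memcmp(&once[18], &twice[18], sizeof(double)) == 0;
}
```

In this model the conditional move with Ra==Rc gives the same F18 whether it runs once or twice, which matches the comment's argument, while an arithmetic instruction that overwrites one of its own inputs does not.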

I also found some other comments about an ECO to the Alpha Architecture that adds some more wording to the trap shadow rules. Perhaps the OS's trap shadow checking code didn't catch up. I'll check that case.