Operating System - OpenVMS

Re: IA64 dec$ForRTL exp(-z) 2000x too slow for large z?

 
Jon Pinkley
Honored Contributor


More precisely:

There exists a value L for which exp(L) does not underflow, but for which all x<L, exp(x) will underflow.
There exists a value U for which exp(U) does not overflow, but for which all x>U, exp(x) will overflow.

Therefore if L<=x<=U then exp(x) will not cause an underflow or overflow exception.
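
A minimal sketch of that range check in C, for IEEE doubles. The function names are made up for illustration and this is not what the RTL does internally. It assumes "underflow" means a result below DBL_MIN (the smallest normal double), and the bounds are only good to within rounding of log(), so a real routine would more likely use precomputed constants.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Assumption: "underflow" means a result below DBL_MIN, the smallest
 * normal double, so subnormal results count as underflow here. */
double exp_lower_bound(void) { return log(DBL_MIN); }  /* L, about -708.4 */
double exp_upper_bound(void) { return log(DBL_MAX); }  /* U, about  709.8 */

/* True when L <= x <= U, so exp(x) should neither underflow nor overflow.
 * NaN compares false against both bounds and is therefore rejected. */
bool exp_in_safe_range(double x)
{
    return x >= exp_lower_bound() && x <= exp_upper_bound();
}
```

A caller could test exp_in_safe_range(x) first and only take the slow exception-handling path when it returns false.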
it depends
John Reagan
Respected Contributor


Yes, the code seems to have some input checking. However, the algorithm (which I don't completely understand) seems to think there are some numbers which might or might not underflow, since you can get smaller results if denorms are enabled. We're still looking at the underlying cause.
John Reagan
Respected Contributor


For those of you keeping score at home, more info...

The Math RTL routines need to check with their callers to see if they should raise an error or do something else.

On Alpha, that information is inside the procedure descriptor (PDSC$V_EXCEPTION_MODE) with the default being PDSC$K_EXC_MODE_SIGNAL : Raise exceptions for all error conditions except for underflows producing a 0 result.

So even when the RTL checks the input arguments, it has to walk back up the stack to find the calling routine to see if that routine wanted underflow checking or not (F90 has a /CHECK=UNDERFLOW).

I64, which doesn't have a single data structure like a procedure descriptor, has that information buried down in the unwind descriptors.

Part of the slowdown is the stack walk via the LIB$I64 calling standard routines to see if the caller wants the underflow raised as an error or just mapped to 0.
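
As a toy model only, the lookup described above could be pictured like this in C. Every name here is hypothetical: these are not the PDSC$ fields or the LIB$I64 routines, and the is_rtl flag is an assumption made so the sketch has something to walk over. The point is just that the answer lives with the caller, so the math routine has to go and find it on every underflow.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the caller's exception mode. */
enum exc_mode {
    MODE_SIGNAL,      /* default: underflow producing 0 is not raised */
    MODE_SIGNAL_ALL   /* caller asked for underflow to be raised too  */
};

/* Hypothetical frame record: one per active routine. */
struct frame {
    const struct frame *caller;  /* next frame up the stack, NULL at the top */
    enum exc_mode mode;          /* mode requested by this routine */
    int is_rtl;                  /* nonzero for RTL-internal frames (assumed) */
};

/* Walk up from the math routine's frame to the first non-RTL caller. */
enum exc_mode find_caller_mode(const struct frame *f)
{
    for (f = f->caller; f != NULL; f = f->caller)
        if (!f->is_rtl)
            return f->mode;
    return MODE_SIGNAL;  /* assumed default if no user frame is found */
}

/* Returns 1 if an underflow to zero should be raised as an error. */
int underflow_should_signal(const struct frame *rtl_frame)
{
    return find_caller_mode(rtl_frame) == MODE_SIGNAL_ALL;
}
```

In this model the walk is a few pointer hops. On real I64 each hop means decoding unwind information, which is far more expensive than reading a field out of a descriptor.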
Dennis Handly
Acclaimed Contributor


Ok, my previous numbers were for doubles, which don't underflow. When I call expf, it is about 40 times slower, unless I use flush denorms.

$ time a.out 80
exp(-80.000000) = 1.80485e-35
real 0m02.64s user 0m02.63s sys 0m00.00s

$ time a.out 120
exp(-120.000000) = 0
real 1m44.51s user 0m28.71s sys 1m15.54s

The timing info is real interesting in that it clearly is blaming the kernel and the sloppy hardware that hates denorms.