silly $GETQUI bug and work-around

Jess Goodman
Esteemed Contributor


It's Friday quitting time, so on the lighter side I'm going to report a bug in the $GETQUI service that has been there forever, but won't cause many worries.

The bug shows up if you SUBMIT a batch job with a certain range of CPU limit that's not infinity but is ridiculously large:

$ SUBMIT JUNK /QUEUE=JUNKQ -
/CPUTIME=289-10:34:56.78

$ SHOW ENTRY /FULL '$ENTRY
Entry Jobname Username Blocks Status
----- ------- -------- ------ ------
2005533 JUNK SYSTEM Pending (queue stopped)
On stopped batch queue JUNKQ
Submitted 3-NOV-2006 22:19:12 /CPU=12-J-N-1859 15:5 /PRIORITY=100
File: _$1$DGA111:[DSKA.COM.SYSTEM]JUNK.COM;3

$ CPU_LIMIT = F$GETQUI("DISPLAY_ENTRY","CPU_LIMIT",$ENTRY)
$ SHOW SYMBOL CPU_LIMIT
12-JUN-1859 15:52:56.17

Now to make this more interesting I will award 8 points to the first responder who can tell me why the following date-time can be used as a work-around for this bug.

$ MAGIC_TIME := 28-MAR-1860 02:27:52.95
$ SAY F$DELTA(CPU_LIMIT,MAGIC_TIME)
289 10:34:56.78

Have fun.
I have one, but it's personal.
15 REPLIES
Robert Brooks_1
Honored Contributor

Re: silly $GETQUI bug and work-around

> It's Friday quitting time, so on the lighter side I'm going to report a bug in the $GETQUI service that has been there forever, but won't cause many worries.

----
Have you formally reported this to us through your support centre?

Posting it here is not likely to lead to it getting fixed.


-- Rob (VMS Engineering)
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

lots of magic numbers ;-)

I can answer the first part of your question:

CPULIM is supposed to be stored (and handled) as an unsigned longword equivalent to the number of 10 ms CPU soft ticks.

SUBMIT/CPU=... and SET ENT/CPU=... require a DELTA TIME value to be entered. The time value is stored internally in a quadword as the no. of 100 ns intervals since 17-NOV-1858. A delta time value is stored as a negative quadword value.

The delta time value entered needs to be converted to 10 ms CPU soft ticks to be stored in the CPULIM longword in the various data structures.

The conversion routine takes into account the unsigned definition of CPULIM.

Everything goes well if the CPU (delta) time specified is below 248-13:13:56.47. A value above 497-02:27:52.95 is rejected outright (%QUEMAN-F-INVQUAVAL, value '497-02:27:52.96' invalid for /CPUTIME qualifier).

Any values in between cause CPULIM to become 'negative' and this seems to confuse the display routines, so it's a display problem.
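The two boundary values can be checked with a short Python sketch. It assumes only what is stated above: CPULIM counts 10 ms ticks in a 32-bit longword. The helper names (delta_to_ticks, as_signed_longword, ticks_to_delta) are made up for illustration and are not OpenVMS routines.

```python
TICKS_PER_SECOND = 100  # one CPU soft tick is 10 ms


def delta_to_ticks(days, hours, minutes, seconds, hundredths):
    """Convert a delta time d-hh:mm:ss.cc to a count of 10 ms ticks."""
    whole_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return whole_seconds * TICKS_PER_SECOND + hundredths


def as_signed_longword(value):
    """Reinterpret the low 32 bits of value as a signed longword."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def ticks_to_delta(ticks):
    """Format a count of 10 ms ticks as d-hh:mm:ss.cc."""
    days, rest = divmod(ticks, 24 * 3600 * TICKS_PER_SECOND)
    hours, rest = divmod(rest, 3600 * TICKS_PER_SECOND)
    minutes, rest = divmod(rest, 60 * TICKS_PER_SECOND)
    seconds, hundredths = divmod(rest, TICKS_PER_SECOND)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


print(ticks_to_delta(0x7FFFFFFF))  # 248-13:13:56.47 (largest positive signed longword)
print(ticks_to_delta(0xFFFFFFFF))  # 497-02:27:52.95 (largest unsigned longword)

ticks = delta_to_ticks(289, 10, 34, 56, 78)  # the /CPUTIME value from the first post
print(ticks, as_signed_longword(ticks))      # 2500769678 -1794197618
```

The job from the first post falls between the two limits, so its tick count is negative once it is read as a signed longword.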

During display, the CPULIM value is supposed to be converted to a delta time again. This conversion seems to fail to take into account that CPULIM is supposed to be an UNSIGNED longword. So it ends up multiplying the 'negative' CPULIM by '-100000' (= no. of 100 ns intervals in 10 ms) to convert the 10 ms ticks to 100 ns intervals as a delta time, but gets a positive result instead and therefore displays the /CPU time as an absolute date & time.

SHOW QUE/FULL or SHO ENT/FULL then try to be 'clever' and reformat the expected delta time string, making it look even worse. F$GETQUI just takes the value as it is and converts it to an ASCII time string.

So the bug is not in $GETQUI, but in the various display routines, which try to display the CPULIM time as a delta time.

Now whether specifying a CPU time limit in excess of 248 days makes sense is a question which you would NOT expect to be asked in other operating systems, but for OpenVMS, it may be legitimate ;-)

Volker.
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

after a good night's sleep over this problem, I now tend to agree, that this is probably a problem of some 'consumers' of the unsigned CPULIM longword, when the code tries to convert the 10 ms units back into the quadword OpenVMS time representation.

AUTHORIZE seems to have got the coding right. You can MOD user/CPU=497-0 and it's being displayed correctly.

So it may just be a matter of finding all CPULIM conversions and examining/correcting the code. But I hear the 'bean counters' asking: how many additional OpenVMS licenses are we going to sell after fixing this bug?

I remember a similar problem when the GS160 came around. After you logged in and issued a SHOW PROC/ACC, you would see:

...
Elapsed CPU time: 17-NOV-1858 00:00:00.00
...

The CPU was too fast and finished the login processing using less than 10 ms of CPU time. Trying to convert 0 CPU seconds into a delta time produced zero, which is an absolute and not a delta time. This has been fixed in V7.3-1 after I had raised a PTR (internal problem report).

Volker.
Volker Halle
Honored Contributor
Solution

Re: silly $GETQUI bug and work-around

Jess,

MAGIC_TIME is the equivalent of 0xFFFFFFFF 10 ms CPU intervals when converted to the quadword time representation with the wrong sign.

When converting the maximum possible unsigned CPULIM value to the VMS quadword time format, you should get 497-02:27:52.95, but if you get the sign wrong, this represents 28-MAR-1860 02:27:52.95

Volker.
Ian Miller.
Honored Contributor

Re: silly $GETQUI bug and work-around

I would expect this is the sort of basically cosmetic bug that can be recorded in PTR and fixed the next time someone is doing something in that code. However, as Rob Brooks said, you have to report it to HP to get it recorded.
____________________
Purely Personal Opinion
Jess Goodman
Esteemed Contributor

Re: silly $GETQUI bug and work-around

> When converting the maximum possible unsigned CPULIM value to the VMS quadword time format, you should get 497-02:27:52.95, but if you get the sign wrong, this represents 28-MAR-1860 02:27:52.95

We have a winner! I calculated the magic date this way:

$ WRITE SYS$OUTPUT -
F$CVTIME("17-NOV-1858+497-02:27:52.95",-
"ABSOLUTE")
28-MAR-1860 02:27:52.95

where 17-NOV-1858 is the VMS zero date and the delta time is the equivalent of 0xFFFFFFFF in 10 ms units.
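The same calculation, and the subtraction behind the work-around, can be reproduced outside DCL. This is a minimal Python sketch that uses only the values quoted in this thread. The name ticks_as_absolute is hypothetical, and the sketch checks the arithmetic only; it does not model the actual display code.

```python
from datetime import datetime, timedelta

VMS_EPOCH = datetime(1858, 11, 17)  # VMS zero date


def ticks_as_absolute(ticks):
    """Read a count of 10 ms ticks as an offset from the VMS zero date."""
    return VMS_EPOCH + timedelta(milliseconds=10 * ticks)


magic_time = ticks_as_absolute(0xFFFFFFFF)
print(magic_time)  # 1860-03-28 02:27:52.950000

# Value F$GETQUI returned for the job submitted with /CPUTIME=289-10:34:56.78
displayed = datetime(1859, 6, 12, 15, 52, 56, 170000)

# Same subtraction as F$DELTA(CPU_LIMIT,MAGIC_TIME) in the first post
recovered = magic_time - displayed
print(recovered)  # 289 days, 10:34:56.780000
```

Going by these numbers, the bogus date is 0xFFFFFFFF minus the real tick count, read as an offset from the zero date, which is why subtracting it from MAGIC_TIME gives back the original limit.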

Volker is also correct in that the bug is not in the SYS$GETQUI service, which is just returning the cpu tick limit in an unsigned longword. It is a display bug in both the F$GETQUI lexical function and the SHOW QUEUE/SHOW ENTRY commands.

I don't have software support anymore so I don't know how to officially report this. Anyway I don't really care if it gets fixed, as the bug is only apparent with ridiculous values that I can't imagine would ever be used in the real world. But if someone has the code open anyway, why not?

But then again I have written a robust command file that takes an entry number, job name, or batch queue (last two can be wildcard strings) and uses F$GETQUI to output matching SUBMIT command(s). I couldn't resist checking for these bad CPU limits and putting in the above work around for it. So this bug has cost me some time and code. :)
I have one, but it's personal.
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

I will try to remember this thread and at least report it as a PTR during V8.4 fieldtest time, if it's not fixed by then...

Volker.
Ian Miller.
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,
since it's a display bug in DCL and the utilities, you could email the DCL maintainer at
dcl at hp dot com

so that they can record this bug.
____________________
Purely Personal Opinion
Jess Goodman
Esteemed Contributor

Re: silly $GETQUI bug and work-around

Ok, thanks. I think I will send an email to that address. I will also mention another display bug in SHOW ENTRY/QUEUE that I just discovered. I cannot figure out for the life of me why this occurs but...

$ SUBMIT JUNK /RETAIN=UNTIL="20-NOV-2006 22:02:14.40"
Job JUNK (queue SYS$BATCH, entry 3004552) pending
pending status caused by queue stopped state
$ SHOW ENTRY/FULL '$ENTRY
Entry Jobname Username Blocks Status
----- ------- -------- ------ ------
3004552 JUNK SYSTEM Pending (queue stopped)
On stopped batch queue SYS$BATCH
Submitted 7-NOV-2006 18:12:33.75 /PRIORITY=100
File: _$1$DGA111:[DSKA.COM.SYSTEM]JUNK.COM;3

Notice that the job retention time is not displayed. F$GETQUI is not affected by this bug:

$ SAY F$GETQUI("DISPLAY_ENTRY","JOB_RETENTION_TIME",$ENTRY)
20-NOV-2006 22:02:14.40

Change the retention time by as little as .01 second either way and SHOW ENTRY works. But add to it a delta time of 15-12:49:37.28 or any multiple of that and the problem recurs. It also occurs if you use this delta time or any multiple of it as your /RETAIN=UNTIL= value.

The only thing special about that delta time value is that if you convert the binary value to .01 sec. units by dividing it by -100000 you get 0x08000000 (2**27). But why 2**27 is special or why .01 sec units applies here is beyond me.
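That arithmetic can be confirmed with a few lines of Python. This is a sketch only: the helper delta_to_100ns is hypothetical and works with the magnitude of the delta time, ignoring the negative sign VMS uses to mark a delta.

```python
HUNDRED_NS_PER_SECOND = 10_000_000
HUNDRED_NS_PER_TICK = 100_000  # one .01 sec unit


def delta_to_100ns(days, hours, minutes, seconds, hundredths):
    """Magnitude of a delta time d-hh:mm:ss.cc in 100 ns units."""
    whole_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return whole_seconds * HUNDRED_NS_PER_SECOND + hundredths * HUNDRED_NS_PER_TICK


interval = delta_to_100ns(15, 12, 49, 37, 28)
print(interval == 2**32 * 3125)              # True
print(hex(interval // HUNDRED_NS_PER_TICK))  # 0x8000000
print(interval % 2**32)                      # 0
```

The last line shows that the low-order longword of this interval is zero, so adding any multiple of it to a time leaves that time's low-order longword unchanged.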
I have one, but it's personal.
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

what kind of contest is this? Are we going to do quality testing for HP OpenVMS engineering here? Believe me, I'm up to that ;-)

When converting 20-NOV-2006 22:02:14.40 to a quadword time value, you might note, that the low order longword becomes all zero.

The code in [CLIUTL]QUEMANSHO tests whether a RETENTION_TIME is specified, but only tests the low order longword of the quadword time value.

You've found the symptom of this bug, I found the actual bug in the source code. So 90% of the work is done. Now it would just take someone with a support contract or access to PTR to report this problem officially and wait for OpenVMS engineering to fix it.

Volker.
John Abbott_2
Esteemed Contributor

Re: silly $GETQUI bug and work-around

> Now it would just take someone with a support contract or access to PTR to report this problem officially and wait for OpenVMS engineering to fix it.

If you *REALLY* think it's worth it (maybe because most of the work is already done by Volker) then I'm sure someone will log a support call (maybe me) or engineering might take note (ye olde Mr DCL did!)

Just post exactly what needs to be said...

J.
Don't do what Donny Dont does
Jess Goodman
Esteemed Contributor

Re: silly $GETQUI bug and work-around

I did send an email mentioning these two bugs and got a polite reply from David Sweeney. I hope fixing them doesn't take time away from anything important.

Volker, no this second bug is not that simple. It does not occur for all binary time values with a zero low-order long word:

DBG> exam/hex r0
%R0: 00A5F20000000000
DBG> exam/date r0
%R0: 22-NOV-2006 20:48:17.11

$ SUBMIT JUNK /RETAIN=UNTIL="22-NOV-2006 20:48:17.11"
Job JUNK (queue SYS$BATCH, entry 5009500) pending

$ SHOW ENTRY/FULL '$ENTRY
Entry Jobname Username Blocks Status
----- ------- -------- ------ ------
5009500 JUNK SYSTEM Pending (queue stopped)
On stopped batch queue SYS$BATCH
Submitted 8-NOV-2006 19:29:14.17 /PRIORITY=100
/RETAIN=UNTIL="22-NOV-2006 20:48"
File: _$1$DGA111:[DSKA.COM.SYSTEM]JUNK.COM;3

If it were based on a zero low-order longword, the bug would occur for any value that is a multiple of the delta time interval 0-00:07:09.49 (binary time value of negative 2**32). As stated above, the delta interval for this problem is actually 15-12:49:37.28, whose binary value is negative (2**32)x3125.
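The two retention times can be compared numerically. This Python sketch converts between absolute times and 100 ns quadword counts since the VMS zero date; the function names are hypothetical, and the only inputs are the date and the hex value quoted in this thread.

```python
from datetime import datetime, timedelta

VMS_EPOCH = datetime(1858, 11, 17)  # VMS zero date
BUG_INTERVAL = 2**32 * 3125         # 15-12:49:37.28 in 100 ns units


def to_quadword(when):
    """Absolute time as a count of 100 ns units since the VMS zero date."""
    delta = when - VMS_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def from_quadword(quadword):
    """Inverse of to_quadword, truncated to whole microseconds."""
    return VMS_EPOCH + timedelta(microseconds=quadword // 10)


failing = to_quadword(datetime(2006, 11, 20, 22, 2, 14, 400000))
working = 0x00A5F20000000000  # value examined in the debugger above

print(hex(failing))             # 0xa5f07800000000
print(from_quadword(working))   # 2006-11-22 20:48:17.118003

for quadword in (failing, working):
    low_longword_zero = quadword & 0xFFFFFFFF == 0
    interval_multiple = quadword % BUG_INTERVAL == 0
    print(low_longword_zero, interval_multiple)
# True True
# True False
```

Both values have a zero low-order longword, but only the one that fails to display is a whole multiple of (2**32)x3125.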
I have one, but it's personal.
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

I give up for now. To troubleshoot this effectively, one would need to run the code with the debugger. Let's leave this to OpenVMS engineering...

Volker.
Volker Halle
Honored Contributor

Re: silly $GETQUI bug and work-around

Jess,

both problems have now been logged via PTR (official OpenVMS problem tracking tool) at HP:

75-13-1804 for the /retain=until problem
75-13-1805 for the cpu_limit problems

You may consider closing this thread.

Volker.
Jess Goodman
Esteemed Contributor

Re: silly $GETQUI bug and work-around

thread closed
I have one, but it's personal.