Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Dan R Farrell · ‎01-13-2009

We are using a SYS$SCHDWK call to run a process 50 times a second. It works, except that every six hours from boot time it misses a few cycles. This is very repeatable. It does not matter whether the process has been running for hours or for a few minutes: six hours after boot time, and every six hours after that (almost to the millisecond), it misses cycles. I think the system time may be getting reset, but we have shut down almost everything on the system except for VMS and the network and it still happens. NTP is not running. Has anyone seen this issue before?

Robert Gezelter · ‎01-13-2009

Dan,

Have you verified that the computation of wake times does not have some slip in it?

- Bob Gezelter, http://www.rlgsc.com

Richard Whalen · ‎01-13-2009

50 times a second is very often!

I know that VMS on ia64 (not necessarily 8.3-1H1) stores the value of the TOY clock on disk approximately every 6 hours. I suspect that your problem is related to this.

Jon Pinkley · ‎01-13-2009

Dan,

Anything that can block scheduling has the potential to cause a wake up to be delayed.

Or, as Bob suggested, depending on how you are computing the next wakeup, you may be getting rounding errors. For example, the LIB$CVTF_TO_INTERNAL_TIME routine may be subject to rounding errors with small intervals. If you are using integer math, then I doubt that is the cause of the problem.
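To illustrate the kind of slip Bob is asking about, here is a minimal sketch in plain C (not VMS-specific). It assumes a hypothetical fixed wake-up latency and compares computing the next wake time from "the time I actually woke" against computing it from the previous deadline. The function names and the fixed latency are illustrative assumptions, not anything taken from Dan's code.

```c
#include <assert.h>
#include <stdint.h>

/* All times are in 100 ns units, the resolution of OpenVMS system time. */
#define INTERVAL_100NS 200000LL   /* 20 ms, i.e. 50 wakeups per second */

/* Relative scheduling: next deadline = time we actually woke + interval.
 * Any wake-up latency is carried forward into every later deadline. */
int64_t final_deadline_relative(int cycles, int64_t latency)
{
    assert(cycles >= 0);
    int64_t deadline = 0;
    for (int i = 0; i < cycles; i++) {
        int64_t woke_at = deadline + latency;  /* hypothetical fixed latency */
        deadline = woke_at + INTERVAL_100NS;
    }
    return deadline;
}

/* Absolute scheduling: next deadline = previous deadline + interval.
 * Latency delays an individual wakeup but does not accumulate. */
int64_t final_deadline_absolute(int cycles, int64_t latency)
{
    assert(cycles >= 0);
    (void)latency;                 /* latency never enters the calculation */
    int64_t deadline = 0;
    for (int i = 0; i < cycles; i++)
        deadline += INTERVAL_100NS;
    return deadline;
}
```

With 50 cycles and 100 microseconds (1000 units) of latency per wakeup, the relative scheme ends 5 ms late after one second while the absolute scheme stays on the grid. A repeating $SCHDWK with a fixed reptim should not be exposed to this particular slip, since the interval is not recomputed by the caller each cycle.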

There are several possibilities, either higher priority processes could be preventing the process from being scheduled or some high IPL code could be blocking process scheduling or even the hardware clock interrupt.

I am not sure how often the BBW (battery backed-up watch) gets updated, but hopefully that couldn't cause cycles to be lost, and I wouldn't expect any disk I/O to be blocking scheduling.

If it is that repeatable, I would fire up the PRF SDA extension to collect samples starting 10 seconds or so prior to a 6-hour epoch and see what is happening. I'm reasonably sure it isn't driven off the HWCLK interrupt, so I believe it has a chance of seeing code executing at HWCLK IPL. If PRF is using EXE$GQ_SYSTIME in its time stamp calculations, it may lead to false conclusions about "when" something happened.

Do you have other timer based code running, or some performance data collector that could be running something at high IPL periodically?

Jon

it depends

John Gillings · ‎01-13-2009

Dan,
Just to clarify...

I'm assuming you're calling $SCHDWK with a "reptim" value of 20msec, as opposed to calling it every cycle with a "daytim" of 20msec?

Could you show us the actual code?

A few things I'd worry about...

1) first 20msec is only 2 quanta. I'd want my value to be as close to the real value as possible. If I cared about it a lot, I'm not sure I'd trust a $BINTIM conversion to do that for me. I'd be checking the bits in the time value.

2) The timing of the $WAKE has no influence on when the target process responds (ie: actually wakes up). How can you distinguish between the $WAKE being late, and the process "sleeping in"?

3) $HIBER/$WAKE seems like a rather blunt instrument to use if you require high precision ticks. Maybe you should consider other possibilities? For really accurate, high frequency timing, you pretty much have to dedicate a CPU and busy wait.
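On point 1, "checking the bits in the time value" can be done without $BINTIM at all. An OpenVMS delta time is a negative 64-bit count of 100 ns units, so a 20 msec interval is simply -200000. The sketch below is plain C with hypothetical helper names; it builds the quadword with integer math and splits it into two longwords so the bits can be inspected.

```c
#include <assert.h>
#include <stdint.h>

#define UNITS_PER_MS 10000LL   /* 100 ns units in one millisecond */

/* Build a delta time (negative count of 100 ns units) from milliseconds. */
int64_t delta_from_ms(int64_t ms)
{
    assert(ms >= 0);
    return -(ms * UNITS_PER_MS);
}

/* Split a quadword into low and high longwords for inspection. */
void split_quadword(int64_t q, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits = (uint64_t)q;           /* two's complement bit pattern */
    *lo = (uint32_t)(bits & 0xFFFFFFFFu);
    *hi = (uint32_t)(bits >> 32);
}
```

For 20 msec this gives a low longword of 0xFFFCF2C0 and a high longword of 0xFFFFFFFF, which is what you would expect to see in the quadword passed as reptim.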

Things to try...

What happens if you double the frequency, i.e. use a 10msec interval?

If you haven't done so already, build an absolutely minimal test program. On waking, don't do anything other than sample the time and put the results in a ring buffer.

Are you running with multiple CPUs? Have you tried using affinity?
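The minimal test program suggested above needs little more than a fixed-size ring buffer of timestamps. Here is one possible sketch in portable C. The type and function names are hypothetical, and the timestamp is assumed to be a 64-bit value such as the quadword returned by $GETTIM; the system service call itself is left out so the sketch stays platform neutral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024          /* number of timestamps retained */

typedef struct {
    int64_t samples[RING_SIZE];
    size_t  head;               /* index where the next sample is written */
    size_t  count;              /* number of valid samples, at most RING_SIZE */
} ring_t;

void ring_init(ring_t *r)
{
    r->head = 0;
    r->count = 0;
}

/* Store one timestamp, overwriting the oldest once the buffer is full.
 * No I/O and no allocation, so the wake-up path stays as short as possible. */
void ring_put(ring_t *r, int64_t timestamp)
{
    r->samples[r->head] = timestamp;
    r->head = (r->head + 1) % RING_SIZE;
    if (r->count < RING_SIZE)
        r->count++;
}

/* Return the i-th oldest retained sample (0 = oldest). */
int64_t ring_get(const ring_t *r, size_t i)
{
    assert(i < r->count);
    size_t start = (r->head + RING_SIZE - r->count) % RING_SIZE;
    return r->samples[(start + i) % RING_SIZE];
}
```

The idea is to call ring_put() on every wakeup and only dump or analyse the buffer after the interesting moment has passed, so the measurement does not disturb the timing being measured.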

A crucible of informative mistakes

Hoff · ‎01-13-2009

I'm actually somewhat surprised this works as well as this does and you're only seeing a few cycles every six hours; this looks to be a polling-based design, though somewhat cloaked in the garb of a multiprocessing application. And I'd expect to see a few cycles going to other tasks here and there.

I might well look to abscond with a core here and go to full-on polling, rather than a 50 Hz (60 Hz in the US?) solution. That, or (depending on what is going on) I'd look to start dealing with the cruft in an out-board processor here, as those are cheap. There are also ways to release the processor through the scheduler interface, too.

Do call HP, as they're the arbiters of this sort of thing and (if you're doing 50 process activations a second) you probably have a support contract.

Jon Pinkley · ‎01-13-2009

I hope by "run a process" Dan didn't mean an image activation. I assumed his process was scheduling a wakeup and hibernating. For that, 50 times a second shouldn't be taxing things (on average), as long as his process (kernel thread) software priority is in realtime range. I don't think VMS claims to be REALTIME, at least in the general case, and if this node is part of a cluster, then all bets are off.

Dan, if you really need something hard scheduled 50 times a second, I would be looking at a dedicated collection box that can weather the peak demands, cluster transitions etc.

John, I wasn't aware that the VMS scheduler waited until quantum end to reschedule a sufficiently higher priority process. If it does, then either things have changed, or my memory is incorrect.

Jon

it depends

John Gillings · ‎01-13-2009

re: Jon, "or my memory is incorrect."

Sorry, maybe I wasn't clear enough. My remark "20msec is only 2 quanta" wasn't referring to the SYSTEM parameter QUANTUM. I was referring to the limit of the "reptim" parameter:

(from docs) "The time interval specified cannot be less than 10 milliseconds; if it is, $SCHDWK automatically increases it to 10 milliseconds."

The issue is potentially one of granularity. When you're down at that level, even small absolute errors in calculating time intervals can be large percentage errors.

It's also unclear from the documentation whether 10 msec is just a lower limit, or a granularity. Would a request for (say) 14msec be rounded up to 20msec or down to 10msec?
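To make the three readings concrete, here is a small sketch in C. Only the first function reflects the documented behaviour quoted above (a floor of 10 msec); the other two are hypothetical interpretations of "granularity" and are not claims about what $SCHDWK actually does.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_REPTIM_MS 10   /* documented minimum repeat interval */

/* Documented behaviour: anything below 10 ms is raised to 10 ms. */
int64_t clamp_reptim_ms(int64_t ms)
{
    assert(ms > 0);
    return ms < MIN_REPTIM_MS ? MIN_REPTIM_MS : ms;
}

/* Hypothesis A: 10 ms is a granularity and requests round up. */
int64_t round_up_reptim_ms(int64_t ms)
{
    assert(ms > 0);
    return ((ms + MIN_REPTIM_MS - 1) / MIN_REPTIM_MS) * MIN_REPTIM_MS;
}

/* Hypothesis B: 10 ms is a granularity and requests round down,
 * but never below the documented minimum. */
int64_t round_down_reptim_ms(int64_t ms)
{
    assert(ms > 0);
    int64_t rounded = (ms / MIN_REPTIM_MS) * MIN_REPTIM_MS;
    return rounded < MIN_REPTIM_MS ? MIN_REPTIM_MS : rounded;
}
```

A 14 msec request stays 14 under the pure lower-limit reading, becomes 20 under hypothesis A and 10 under hypothesis B. Measuring the actual wake interval for a 14 msec reptim would show which reading, if any, applies.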

When you're this close to the documented limits, and you care enough about the exact behaviour to ask a question like this one, I'd be strongly recommending having a look at the sources to see exactly how $SCHDWK uses its parameters and calculates the time intervals to generate the $WAKEs.

Always remember, a computer is NOT a chronometer. You cannot rely on one for high precision or fine grained time, other than spending big bucks on purpose built, real time systems.

A crucible of informative mistakes

Dan R Farrell · ‎01-14-2009

Thanks for the responses. We did create a test program and are now running it at 10 ms in order to push things a bit, and it is running at priority 55. We are using $SCHDWK with a repeat time value and are not calling it every cycle. It is now the only thing running on the Itanium box except for VMS, DECnet and TCP/IP. It is not part of a cluster. I guess my question is mainly that it does seem to work fine 99.99% of the time except for those 6 hour intervals. The synchronous nature of the event seems to indicate something else happening. I would expect more randomness from the event if it was related to any OS scheduling issue or something else also running at an elevated priority. We also created another test program using SETIMR and it does the same thing. I agree that if we really want a guaranteed fixed 20 ms response we should probably use a hardware solution, but we thought this would be good enough (and it seemed to be in preliminary tests).

Robert Gezelter · ‎01-14-2009

Dan,

If I may put on my architecture hat and make a few observations.

I would not necessarily rush out for a hardware solution, but I would consider something in the nature of an IO driver for this type of task. OpenVMS time handling is subject to some imprecision, as John and others have noted. If something must be monitored precisely at a resolution that close to the precision of the system services, they are not appropriate.

I have seen this general genre of problem throughout my career, starting with second-generation PDP-11 systems. The answer is almost invariably the same: For high precision timing, get an external oscillator running at a significantly higher frequency, and have it interrupt every time the counter gets to zero. At that point, use a device driver to perform the immediate actions and forward the summarized information to a process/task for more complete processing.

Since the time-critical portions of this code are in the driver's interrupt handling, little is likely to interfere with it.

For completeness, I note that just because one has not noticed an overhead operation lasting .02 second or so does not mean that it is not there. While cluster transitions and similar activities are well known, I would assume that there are other activities that can create similar situations. Jeff Schreisheim (formerly of the DECnet-11/RSX team) did a very nice article in Computer Design many years ago on why DECnet-RSX ended up implementing COMMEXEC, a special executive supplement to provide services needed by DECnet protocol modules. It makes very good reading even today.

- Bob Gezelter, http://www.rlgsc.com

Hoff · ‎01-14-2009

I was once more solidly in the camp with Bob G. here around writing a driver for these cases, but (with the cost of Arduino and other such solutions) I've variously found tossing hardware at the problem cheaper than tossing a driver at it.

In years past, PLC-like approaches were both hairy and expensive, but that's changed.

There are also PCI-based PLCs around, though these do tend to require a driver.

Whether Arduino or another PLC-like solution is appropriate here does depend on what your responsiveness and timing and bandwidth and connection requirements might be, of course.

If you have tight requirements (and can't loosen those requirements through added hardware), then moving to what amounts to an application-dedicated core (such as the dedicated lock manager) could also be an option.

Jon Pinkley · ‎01-14-2009

Dan,

What does your test program do, and how do you detect a "missed cycle"?

I don't have an Itanium to test on, but on Alphas the granularity of reptim is the HWCLK interrupt interval, not 10ms as stated in the SSREF documentation.

Attached is a C program and example logs from an ES40 and ES47 both running VMS 7.3-2

It sets the reptim to the smallest delta time possible: -1 (1 clunk or 100 nanoseconds). The right hand column is the number of "clunks" (100ns VMS time clock units) since previous contents of EXE$GQ_SYSTIME (via $gettim). Note these are not 100000 (10ms), instead they are a minimum of EXE$TICK_WIDTH.
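For readers without the attachment, the right-hand column can be reproduced from the hex quadwords alone. This is a minimal sketch of that subtraction in portable C, not the attached program; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two system time quadwords, in 100 ns units. */
int64_t units_between(uint64_t previous, uint64_t current)
{
    assert(current >= previous);   /* samples are taken in time order */
    return (int64_t)(current - previous);
}

/* Convert a count of 100 ns units to milliseconds. */
double units_to_ms(int64_t units)
{
    return (double)units / 10000.0;
}
```

Feeding it two adjacent timestamps from the ES40 log in this post, 0x00A85A316AD27D2F and 0x00A85A316AD2A354, gives 9765 units, or just under 1 ms.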

Several anomalies (these were all running at normal, interactive priority 4).

ES40

884 15-JAN-2009 00:33:57.94 0x00A85A316AD27D2F 9765
885 15-JAN-2009 00:33:57.94 0x00A85A316AD2A354 9765
886 15-JAN-2009 00:33:57.94 0x00A85A316AD2C979 9765
887 15-JAN-2009 00:33:57.94 0x00A85A316AD2EF9E 9765
888 15-JAN-2009 00:33:57.95 0x00A85A316AD33BE8 19530
889 15-JAN-2009 00:33:57.95 0x00A85A316AD33BE8 0
890 15-JAN-2009 00:33:57.95 0x00A85A316AD3620D 9765
891 15-JAN-2009 00:33:57.95 0x00A85A316AD38832 9765
892 15-JAN-2009 00:33:57.95 0x00A85A316AD3AE57 9765

ES47

203 15-JAN-2009 00:44:43.70 0x00A85A32EBB98637 10257
204 15-JAN-2009 00:44:43.70 0x00A85A32EBB9AE48 10257
205 15-JAN-2009 00:44:43.70 0x00A85A32EBB9D659 10257
206 15-JAN-2009 00:44:43.72 0x00A85A32EBBCCF9C 194883 ! JLP this is 19 * 10257
207 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 10257
208 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
209 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
210 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
211 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
212 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
213 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
214 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
215 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
216 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
217 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
218 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
219 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
220 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
221 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
222 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
223 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
224 15-JAN-2009 00:44:43.72 0x00A85A32EBBD1FBE 10257
225 15-JAN-2009 00:44:43.73 0x00A85A32EBBD47CF 10257

1331 15-JAN-2009 00:44:44.86 0x00A85A32EC6A6141 10257
1332 15-JAN-2009 00:44:44.86 0x00A85A32EC6A8952 10257
1333 15-JAN-2009 00:44:44.86 0x00A85A32EC6AF0C0 26478 JLP 26478 = (2 * 10257) + 5964 (accuracy bonus??)
1334 15-JAN-2009 00:44:44.86 0x00A85A32EC6AF0C0 0
1335 15-JAN-2009 00:44:44.86 0x00A85A32EC6B18D1 10257
1336 15-JAN-2009 00:44:44.87 0x00A85A32EC6B40E2 10257
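The annotations in the ES47 log (19 * 10257, and 2 * 10257 + 5964) amount to splitting each delta into whole clock ticks plus a remainder. A small sketch of that arithmetic in C follows; the struct and function names are hypothetical, and the tick width is simply taken from the log rather than read from EXE$TICK_WIDTH.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t whole_ticks;   /* number of complete ticks in the delta */
    int64_t remainder;     /* leftover 100 ns units, e.g. a clock adjustment */
} tick_split_t;

/* Split a measured delta into whole ticks of tick_width plus a remainder. */
tick_split_t split_into_ticks(int64_t delta, int64_t tick_width)
{
    assert(tick_width > 0);
    assert(delta >= 0);
    tick_split_t result;
    result.whole_ticks = delta / tick_width;
    result.remainder   = delta % tick_width;
    return result;
}
```

A delta of one whole tick with no remainder is a normal sample, more than one whole tick means wakeups arrived late or bunched up, and a nonzero remainder suggests the clock itself was adjusted between samples.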

Jon

it depends

Jon Pinkley · ‎01-14-2009

Sorry, I accidentally clicked submit when I meant to click browse for the attachment.

Here's the C program and log files.

Jon

it depends

John Gillings · ‎01-15-2009

Running Jon's program on

OpenVMS V8.3-1H1 HP BL860c (1.67GHz/9.0MB), with 2 cores active

I increased the samples to 20000 at normal priority and all came back as 10000. There were no anomalies.

I then started 4 batch jobs at interactive priority running a dcl loop to saturate the CPU:

$ loop: goto loop

This resulted in only a few skips:

0 16-JAN-2009 08:12:12.17 0x00A85B3A991227BA
828 16-JAN-2009 08:12:13.00 0x00A85B3A9990A68A 20000
3558 16-JAN-2009 08:12:15.73 0x00A85B3A9B31FA7A 60000
5035 16-JAN-2009 08:12:17.21 0x00A85B3A9C141D1A 60000
6657 16-JAN-2009 08:12:18.84 0x00A85B3A9D0C5FCA 60000
8804 16-JAN-2009 08:12:20.99 0x00A85B3A9E54BE4A 60000
10293 16-JAN-2009 08:12:22.49 0x00A85B3A9F388E9A 50000
11899 16-JAN-2009 08:12:24.10 0x00A85B3AA02E393A 50000
14048 16-JAN-2009 08:12:26.25 0x00A85B3AA176499A 20000
15554 16-JAN-2009 08:12:27.76 0x00A85B3AA25CB1FA 50000
17160 16-JAN-2009 08:12:29.37 0x00A85B3AA3525C9A 50000
19305 16-JAN-2009 08:12:31.52 0x00A85B3AA49A6CFA 60000

Increasing the count to 60000, and running as a batch job on a quiet system resulted in only one skip:

28782 16-JAN-2009 08:20:03.56 0x00A85B3BB20ADDBA 20000

the run time was just over one minute. Boot time was 15-DEC-2008 09:28:36.00, so I'll schedule the 60000 sample run for 09:28 and see if anything interesting happens at 09:28:36
(or maybe 09:28:37, given the leap second over new year? ;-)
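Working out when the next 6 hour multiple from boot falls is simple integer arithmetic. A sketch in C, with times expressed as plain seconds on any common timeline (the function names are illustrative, and leap seconds are ignored):

```c
#include <assert.h>
#include <stdint.h>

#define SIX_HOURS_S (6LL * 60 * 60)   /* 21600 seconds */

/* Next instant strictly after now_s that is a whole multiple of six hours
 * from boot_s. Both arguments are seconds on the same timeline. */
int64_t next_six_hour_epoch(int64_t boot_s, int64_t now_s)
{
    assert(now_s >= boot_s);
    int64_t elapsed = now_s - boot_s;
    int64_t periods = elapsed / SIX_HOURS_S + 1;
    return boot_s + periods * SIX_HOURS_S;
}

/* Seconds to wait from now_s until that epoch. */
int64_t seconds_until_epoch(int64_t boot_s, int64_t now_s)
{
    return next_six_hour_epoch(boot_s, now_s) - now_s;
}
```

With boot at 15-DEC-2008 09:28:36 taken as second 0, 16-JAN-2009 08:20:00 is second 2760684, and the next epoch comes out at second 2764800, which is exactly 32 days after boot, i.e. 16-JAN-2009 09:28:36.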

report back in a couple of hours...

A crucible of informative mistakes

John Gillings · ‎01-15-2009

I'm not sure what this means!

The job started on time at 09:28:00 and completed at 09:29:01.44. The samples that weren't exactly 10000 apart were:

0 16-JAN-2009 09:28:00.03 0x00A85B452FCE39FA
991 16-JAN-2009 09:28:01.02 0x00A85B453065E61A 40000
51609 16-JAN-2009 09:28:51.64 0x00A85B454E91BECA 20000
51610 16-JAN-2009 09:28:51.64 0x00A85B454E91BECA 0

So nothing suspicious at the 6 hour multiple from boot time. Maybe clock drift for whatever event is causing Dan's anomaly?

I've scheduled the job to run at 6 hour intervals for the next day or so to see if any pattern emerges.

A crucible of informative mistakes

David Jones_21 · ‎01-16-2009

When I wrote real-time code, long ago, the unexpected thing that tripped me up was that image rundowns would take relatively long times to clean up large address spaces, and scheduling was blocked while they did so.

I'm looking for marbles all day long.

Hoff · ‎01-16-2009

Big global section flushes were a trigger for pauses at various sites.

Jon Pinkley · ‎01-16-2009

Dan,

If you are still reading this thread...

I had suggested the PRF tool, but looking at some notes I had from bootcamp, it does not store any timestamps, so it would probably not let you see cause and effect.

Probably the best SDA extension is SPL (Spinlock tracing)

I would suggest submitting sys$examples:spl.com about 10 seconds prior to when you expect the missed cycle.

Then look at the section in the analysis file that has the following heading:

Long Spinlock Hold Times (> 1000 microseconds)

My guess is it will give you a good clue. For example, when I ran my program on an ES40 with 21000 samples, and looked for long ticks, here is what I found:

(18:53:18) $ run fast_schdwk
0 16-JAN-2009 18:53:18.60 0x00A85B9428D9D0A8
5099 16-JAN-2009 18:53:23.58 0x00A85B942BD25258 58590
6073 16-JAN-2009 18:53:24.54 0x00A85B942C63A3F2 22265
13297 16-JAN-2009 18:53:31.59 0x00A85B943098C6C3 58590

The 22265 is due to the accuracy bonus which shouldn't affect you as long as the HWCLK int is 1000Hz.

Here's what was in the SPL analysis file.

Long Spinlock Wait Times (> 1000 microseconds)

Timestamp CPU Spinlock | Forklock Calling PC | Forking PC EPID Wait (us)
---------------------- --- --------------------- -------------------------------------- -------- ---------
16-JAN 18:53:31.593694 03 8C5BA800 LCKMGR 801D7400 EXE$DEQ_C+000F0 202004E9 5322
16-JAN 18:53:39.605956 03 8C5BA800 LCKMGR 801D29D0 EXE$ENQ_C+00900 20257DBC 5311
16-JAN 18:53:31.593498 01 8C5BA800 LCKMGR 801D29D0 EXE$ENQ_C+00900 20257DBC 5278
16-JAN 18:53:23.583059 02 8C5BA800 LCKMGR 801D29D0 EXE$ENQ_C+00900 20257DBC 5208

Other than being interesting, I am not sure that knowing what the cause is will help you. Unless your process remains CUR, it will be subject to the whims of the scheduler and that can be blocked (for short periods) by many things.

What is the process doing? I.e. does it need full process context? If it does not, then you may be able to hook the HWCLK Interrupt and store stuff in a ring buffer in Non-Paged pool. Perhaps an SDA extension like PCS. The HWCLK interrupt runs at sufficiently high IPL that it won't normally get blocked, but you don't want to do any substantial processing at that IPL either.

A dedicated Itanium core is quite expensive if what it is doing can be done by a dedicated micro controller like Hoff mentioned.

Hoff, the Arduino looks interesting. Thanks for the reference. I assume you meant this http://www.arduino.cc/ and http://en.wikipedia.org/wiki/Arduino

Jon

it depends

Hoff · ‎01-17-2009

Jon; yes, that's the widget. One of various. There are all manner of similar options that can be used to off-load host boxes for various of these tasks; to move the timing-critical activities from timing-adverse platforms. This whether the option is out-board or bus-based or USB-based or LAN-based. Proper choice here depends on how high the bandwidth and how low the latency; on the application requirements.

John Gillings · ‎01-18-2009

Some more samples at 6 hour intervals:

0 16-JAN-2009 15:28:00.03 0x00A85B777A68CFDE
988 16-JAN-2009 15:28:01.02 0x00A85B777B0054EE 60000
52147 16-JAN-2009 15:28:52.18 0x00A85B77997EBA6E 20000

0 16-JAN-2009 21:28:00.05 0x00A85BA9C506548A
11949 16-JAN-2009 21:28:12.00 0x00A85BA9CC25C16A 20000

0 17-JAN-2009 03:28:00.07 0x00A85BDC0FA51346
14210 17-JAN-2009 03:28:14.29 0x00A85BDC181D8076 20000

0 17-JAN-2009 09:28:00.05 0x00A85C0E5A3BF136
55670 17-JAN-2009 09:28:55.72 0x00A85C0E7B6AA9A6 20000

0 17-JAN-2009 15:28:00.05 0x00A85C40A4D60312
4098 17-JAN-2009 15:28:04.15 0x00A85C40A7477842 20000
49629 17-JAN-2009 15:28:49.68 0x00A85C40C26B1A02 20000

Looks fairly random to me. Indeed, what surprises me most about this experiment is just how FEW wakeups are missed. Just one or two out of 60000 for each run.

A crucible of informative mistakes

Dan R Farrell · ‎01-07-2010

I contacted HP support, and OpenVMS Engineering investigated and were able to reproduce the problem. They determined that the delay is in the firmware used to update the system clock. Unfortunately there is nothing they can do to fix the problem, so we have to live with it.
