Operating System - OpenVMS

OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

 
Hoff
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

I was once more solidly in the camp with Bob G. here around writing a driver for these cases, but (with the cost of Arduino and other such solutions) I've variously found tossing hardware at the problem cheaper than tossing a driver at it.

In years past, PLC-like approaches were both hairy and expensive, but that's changed.

There are also PCI-based PLCs around, though these do tend to require a driver.

Whether Arduino or another PLC-like solution is appropriate here does depend on what your responsiveness and timing and bandwidth and connection requirements might be, of course.

If you have tight requirements (and can't loosen those requirements through added hardware), then moving to what amounts to an application-dedicated core (such as the dedicated lock manager) could also be an option.

Jon Pinkley
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Dan,

What does your test program do, and how do you detect a "missed cycle"?

I don't have an Itanium to test on, but on Alphas the granularity of reptim is the HWCLK interrupt interval, not 10ms as stated in the SSREF documentation.

Attached is a C program and example logs from an ES40 and an ES47, both running VMS 7.3-2.

It sets the reptim to the smallest delta time possible: -1 (1 clunk or 100 nanoseconds). The right hand column is the number of "clunks" (100ns VMS time clock units) elapsed since the previous contents of EXE$GQ_SYSTIME (read via $gettim). Note these are not 100000 (10ms); instead they are a minimum of EXE$TICK_WIDTH.
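To make the reptim encoding concrete, here is a minimal sketch (not the attached program) of how an interval maps to a VMS delta time: a negative 64-bit count of 100ns units. The helper name ms_to_vms_delta and the constants are made up for illustration. On VMS the resulting quadword would be passed by reference as the reptim argument of SYS$SCHDWK; that call is left out so the sketch stays portable.

```c
#include <stdint.h>

/* One VMS clock unit ("clunk") is 100 ns, so 1 ms is 10,000 units. */
#define CLUNKS_PER_MS INT64_C(10000)

/* The smallest possible delta time: one clunk, as used in the test above. */
#define SMALLEST_DELTA INT64_C(-1)

/* Hypothetical helper: encode a millisecond interval as a VMS delta time.
 * Delta times are negative counts of 100 ns units; absolute times are
 * positive counts. */
int64_t ms_to_vms_delta(int64_t ms)
{
    return -(ms * CLUNKS_PER_MS);
}
```

With this encoding, the documented 10ms granularity corresponds to -100000, while the test deliberately asks for -1 and lets the system round up to whatever the clock can actually deliver.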

Several anomalies show up (all of these runs were at normal interactive priority 4).

ES40

884 15-JAN-2009 00:33:57.94 0x00A85A316AD27D2F 9765
885 15-JAN-2009 00:33:57.94 0x00A85A316AD2A354 9765
886 15-JAN-2009 00:33:57.94 0x00A85A316AD2C979 9765
887 15-JAN-2009 00:33:57.94 0x00A85A316AD2EF9E 9765
888 15-JAN-2009 00:33:57.95 0x00A85A316AD33BE8 19530
889 15-JAN-2009 00:33:57.95 0x00A85A316AD33BE8 0
890 15-JAN-2009 00:33:57.95 0x00A85A316AD3620D 9765
891 15-JAN-2009 00:33:57.95 0x00A85A316AD38832 9765
892 15-JAN-2009 00:33:57.95 0x00A85A316AD3AE57 9765
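
The right hand column can be reproduced from the hex quadwords by plain subtraction. The following is a sketch under that assumption, not the attached program; the function name quadword_deltas is hypothetical, and the sample values are copied from lines 887 to 890 of the ES40 log above.

```c
#include <stdint.h>
#include <stddef.h>

/* Fill deltas[i] with times[i + 1] - times[i], in 100 ns units.
 * Returns the number of deltas written (n - 1, or 0 when n < 2). */
size_t quadword_deltas(const uint64_t *times, size_t n, uint64_t *deltas)
{
    if (n < 2)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        deltas[i] = times[i + 1] - times[i];
    return n - 1;
}

/* System time quadwords for samples 887-890 of the ES40 log. */
const uint64_t es40_samples[] = {
    UINT64_C(0x00A85A316AD2EF9E),
    UINT64_C(0x00A85A316AD33BE8),
    UINT64_C(0x00A85A316AD33BE8),
    UINT64_C(0x00A85A316AD3620D),
};
```

For these four samples the deltas are 19530, 0 and 9765, matching the log: one interval of two ticks, followed by a wakeup that saw the same system time, then a return to the normal 9765.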

ES47

203 15-JAN-2009 00:44:43.70 0x00A85A32EBB98637 10257
204 15-JAN-2009 00:44:43.70 0x00A85A32EBB9AE48 10257
205 15-JAN-2009 00:44:43.70 0x00A85A32EBB9D659 10257
206 15-JAN-2009 00:44:43.72 0x00A85A32EBBCCF9C 194883 ! JLP this is 19 * 10257
207 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 10257
208 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
209 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
210 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
211 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
212 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
213 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
214 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
215 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
216 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
217 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
218 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
219 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
220 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
221 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
222 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
223 15-JAN-2009 00:44:43.72 0x00A85A32EBBCF7AD 0
224 15-JAN-2009 00:44:43.72 0x00A85A32EBBD1FBE 10257
225 15-JAN-2009 00:44:43.73 0x00A85A32EBBD47CF 10257


1331 15-JAN-2009 00:44:44.86 0x00A85A32EC6A6141 10257
1332 15-JAN-2009 00:44:44.86 0x00A85A32EC6A8952 10257
1333 15-JAN-2009 00:44:44.86 0x00A85A32EC6AF0C0 26478 ! JLP 26478 = (2 * 10257) + 5964 (accuracy bonus??)
1334 15-JAN-2009 00:44:44.86 0x00A85A32EC6AF0C0 0
1335 15-JAN-2009 00:44:44.86 0x00A85A32EC6B18D1 10257
1336 15-JAN-2009 00:44:44.87 0x00A85A32EC6B40E2 10257
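
The JLP annotations above split a long delta into whole ticks plus a remainder. A small sketch of that arithmetic follows; the struct and function names are hypothetical, and the tick width of 10257 is simply the steady-state delta observed in the ES47 log, assumed here to be the tick width on that machine.

```c
#include <stdint.h>

typedef struct {
    uint64_t whole_ticks;  /* complete tick widths contained in the delta */
    uint64_t remainder;    /* leftover 100 ns units */
} tick_split;

/* Split a delta (100 ns units) into whole ticks and a remainder.
 * A zero tick width leaves the whole delta in the remainder. */
tick_split split_delta(uint64_t delta, uint64_t tick_width)
{
    tick_split s = { 0, delta };
    if (tick_width != 0) {
        s.whole_ticks = delta / tick_width;
        s.remainder = delta % tick_width;
    }
    return s;
}
```

With a tick width of 10257, the delta 194883 splits into 19 ticks with no remainder, and 26478 splits into 2 ticks with a remainder of 5964, which is the part tentatively attributed to the accuracy bonus.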

Jon
it depends
Jon Pinkley
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Sorry, I accidentally clicked submit when I meant to click browse for the attachment.

Here's the C program and log files.

Jon
it depends
John Gillings
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Running Jon's program on

OpenVMS V8.3-1H1 HP BL860c (1.67GHz/9.0MB), with 2 cores active

I increased the samples to 20000 at normal priority and all came back as 10000. There were no anomalies.

I then started 4 batch jobs at interactive priority running a DCL loop to saturate the CPU:

$ loop: goto loop

This resulted in only a few skips:

0 16-JAN-2009 08:12:12.17 0x00A85B3A991227BA
828 16-JAN-2009 08:12:13.00 0x00A85B3A9990A68A 20000
3558 16-JAN-2009 08:12:15.73 0x00A85B3A9B31FA7A 60000
5035 16-JAN-2009 08:12:17.21 0x00A85B3A9C141D1A 60000
6657 16-JAN-2009 08:12:18.84 0x00A85B3A9D0C5FCA 60000
8804 16-JAN-2009 08:12:20.99 0x00A85B3A9E54BE4A 60000
10293 16-JAN-2009 08:12:22.49 0x00A85B3A9F388E9A 50000
11899 16-JAN-2009 08:12:24.10 0x00A85B3AA02E393A 50000
14048 16-JAN-2009 08:12:26.25 0x00A85B3AA176499A 20000
15554 16-JAN-2009 08:12:27.76 0x00A85B3AA25CB1FA 50000
17160 16-JAN-2009 08:12:29.37 0x00A85B3AA3525C9A 50000
19305 16-JAN-2009 08:12:31.52 0x00A85B3AA49A6CFA 60000

Increasing the count to 60000, and running as a batch job on a quiet system resulted in only one skip:

28782 16-JAN-2009 08:20:03.56 0x00A85B3BB20ADDBA 20000

The run time was just over one minute. Boot time was 15-DEC-2008 09:28:36.00, so I'll schedule the 60000 sample run for 09:28 and see if anything interesting happens at 09:28:36
(or maybe 09:28:37, given the leap second over new year? ;-)

I'll report back in a couple of hours...
A crucible of informative mistakes
John Gillings
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

I'm not sure what this means!

The job started on time at 09:28:00 and completed at 09:29:01.44. The samples that weren't exactly 10000 apart were:

0 16-JAN-2009 09:28:00.03 0x00A85B452FCE39FA
991 16-JAN-2009 09:28:01.02 0x00A85B453065E61A 40000
51609 16-JAN-2009 09:28:51.64 0x00A85B454E91BECA 20000
51610 16-JAN-2009 09:28:51.64 0x00A85B454E91BECA 0

So nothing suspicious at the 6-hour multiple from boot time. Maybe there is clock drift in whatever event is causing Dan's anomaly?

I've scheduled the job to run at 6-hour intervals for the next day or so to see if any pattern emerges.
A crucible of informative mistakes
David Jones_21
Trusted Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

When I wrote real-time code, long ago, the unexpected thing that tripped me up was that image rundowns would take relatively long times to clean up large address spaces, and scheduling was blocked while they did so.

I'm looking for marbles all day long.
Hoff
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Big global section flushes were a trigger for pauses at various sites.
Jon Pinkley
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Dan,

If you are still reading this thread...

I had suggested the PRF tool, but looking at some notes I had from bootcamp, it does not store any timestamps, so it would probably not let you see cause and effect.

Probably the best SDA extension is SPL (Spinlock tracing).

I would suggest submitting sys$examples:spl.com about 10 seconds prior to when you expect the missed cycle.

Then look at the section in the analysis file that has the following heading:

Long Spinlock Hold Times (> 1000 microseconds)

My guess is it will give you a good clue. For example, when I ran my program on an ES40 with 21000 samples, and looked for long ticks, here is what I found:

(18:53:18) $ run fast_schdwk
0 16-JAN-2009 18:53:18.60 0x00A85B9428D9D0A8
5099 16-JAN-2009 18:53:23.58 0x00A85B942BD25258 58590
6073 16-JAN-2009 18:53:24.54 0x00A85B942C63A3F2 22265
13297 16-JAN-2009 18:53:31.59 0x00A85B943098C6C3 58590


The 22265 is due to the accuracy bonus, which shouldn't affect you as long as the HWCLK interrupt is 1000Hz.

Here's what was in the SPL analysis file.

Long Spinlock Wait Times (> 1000 microseconds)

Timestamp              CPU Spinlock | Forklock   Calling PC | Forking PC                EPID     Wait (us)
---------------------- --- --------------------- -------------------------------------- -------- ---------
16-JAN 18:53:31.593694 03  8C5BA800 LCKMGR       801D7400 EXE$DEQ_C+000F0               202004E9      5322
16-JAN 18:53:39.605956 03  8C5BA800 LCKMGR       801D29D0 EXE$ENQ_C+00900               20257DBC      5311
16-JAN 18:53:31.593498 01  8C5BA800 LCKMGR       801D29D0 EXE$ENQ_C+00900               20257DBC      5278
16-JAN 18:53:23.583059 02  8C5BA800 LCKMGR       801D29D0 EXE$ENQ_C+00900               20257DBC      5208

Other than being interesting, I am not sure that knowing what the cause is will help you. Unless your process remains CUR, it will be subject to the whims of the scheduler and that can be blocked (for short periods) by many things.

What is the process doing? I.e. does it need full process context? If it does not, then you may be able to hook the HWCLK Interrupt and store stuff in a ring buffer in Non-Paged pool. Perhaps an SDA extension like PCS. The HWCLK interrupt runs at sufficiently high IPL that it won't normally get blocked, but you don't want to do any substantial processing at that IPL either.
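
As a rough illustration of the ring buffer idea only: the sketch below is ordinary portable C, not VMS interrupt-level code. The names tick_ring, ring_put and ring_snapshot are hypothetical. A real version would have to allocate from non-paged pool, hook the clock interrupt, and deal with memory ordering between the interrupt and the reader, none of which is shown here.

```c
#include <stdint.h>
#include <stddef.h>

#define RING_SLOTS 1024u   /* must be a power of two */

typedef struct {
    uint64_t slot[RING_SLOTS];
    uint64_t head;         /* total number of records ever written */
} tick_ring;

/* Producer side: constant time, no allocation, no locking.
 * When the buffer is full the oldest record is overwritten. */
void ring_put(tick_ring *r, uint64_t stamp)
{
    r->slot[r->head & (RING_SLOTS - 1u)] = stamp;
    r->head++;
}

/* Consumer side: copy up to max of the most recent records into out,
 * oldest first. Returns the number of records copied. */
size_t ring_snapshot(const tick_ring *r, uint64_t *out, size_t max)
{
    uint64_t head = r->head;
    size_t avail = head < RING_SLOTS ? (size_t)head : RING_SLOTS;
    size_t n = avail < max ? avail : max;
    for (size_t i = 0; i < n; i++)
        out[i] = r->slot[(head - n + i) & (RING_SLOTS - 1u)];
    return n;
}
```

The point of the design is that the producer does almost nothing: one store and one increment, which fits the advice about not doing substantial processing at elevated IPL. Anything expensive, such as formatting or computing deltas, happens later on the consumer side in normal process context.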

A dedicated Itanium core is quite expensive if what it is doing can be done by a dedicated microcontroller like the one Hoff mentioned.

Hoff, the Arduino looks interesting. Thanks for the reference. I assume you meant this http://www.arduino.cc/ and http://en.wikipedia.org/wiki/Arduino

Jon
it depends
Hoff
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Jon; yes, that's the widget. It's one of various. There are all manner of similar options that can be used to off-load host boxes for these tasks; to move the timing-critical activities off timing-adverse platforms. This applies whether the option is out-board or bus-based or USB-based or LAN-based. The proper choice here depends on how high the bandwidth and how low the latency need to be; on the application requirements.
John Gillings
Honored Contributor

Re: OpenVMS 8.3-1H1 Itanium SYS$SCHDWK call

Some more samples at 6-hour intervals:

0 16-JAN-2009 15:28:00.03 0x00A85B777A68CFDE
988 16-JAN-2009 15:28:01.02 0x00A85B777B0054EE 60000
52147 16-JAN-2009 15:28:52.18 0x00A85B77997EBA6E 20000

0 16-JAN-2009 21:28:00.05 0x00A85BA9C506548A
11949 16-JAN-2009 21:28:12.00 0x00A85BA9CC25C16A 20000

0 17-JAN-2009 03:28:00.07 0x00A85BDC0FA51346
14210 17-JAN-2009 03:28:14.29 0x00A85BDC181D8076 20000

0 17-JAN-2009 09:28:00.05 0x00A85C0E5A3BF136
55670 17-JAN-2009 09:28:55.72 0x00A85C0E7B6AA9A6 20000

0 17-JAN-2009 15:28:00.05 0x00A85C40A4D60312
4098 17-JAN-2009 15:28:04.15 0x00A85C40A7477842 20000
49629 17-JAN-2009 15:28:49.68 0x00A85C40C26B1A02 20000

Looks fairly random to me. Indeed, what surprises me most about this experiment is just how FEW wakeups are missed: just one or two out of 60000 for each run.

A crucible of informative mistakes