Re: $WAITFR behaviour

Barry Alford · ‎10-28-2007

I am trying to synchronize two processes using event flags. One process (the Master) runs a wait loop using one event flag in a timer ($SETIMR). Once the timer event flag is set, it is cleared and another event flag is set at the end of the loop. This second flag is then cleared at the top of the loop and the timer is reset:

repeat
$SETIMR(timerFlag, cycleTime)
$CLREF(eventFlag)
$WAITFR(timerFlag)
$CLREF(timerFlag)
$SETEF(eventFlag)

This is intended to form an "escape mechanism" - the two flags are never both true.

A slave process implements this loop:

repeat
$WAITFR(timerFlag)
$WAITFR(eventFlag)

Which is intended to keep time with the master process. This relies on $WAITFR having this (documented) behaviour:
"$WAITFR - Tests a specific event flag and returns immediately if the flag is set; otherwise, the process is placed in a wait state until the event flag is set."

I took that to mean that when the flag is set, the process will definitely be woken up and run and see the flag as set. However, I do not see this happening; it seems that because the flags are only set for a short period that these events are "lost" and the slave processes run very erratically (>1 in 10 master cycles) as if the flags are being polled by the processes and not being triggered by an event from the OS.

Have I missed something here?

Richard Whalen · ‎10-29-2007

Assuming that timerFlag and eventFlag are in a common efn cluster, then I would expect this to work. If you find that it is erratic, then I suspect there is a bug in the version of VMS that you are using.

What version of VMS are you using?
Are all the patches installed?
Is this a multiprocessor system?

Hein van den Heuvel · ‎10-29-2007

May we assume these are some sort of common flags?
Is this a new design? Testing on a multi-cpu system?

> Have I missed something here?

Yes, you missed a glaring timing window.
Once the timer flag is set, the slave is officially runnable.
The waitfr timer is done, and the waitfr other is about to be requested.
But in the meantime the master sets the other, arms the timer, clears the other and waits for the timer.
The scheduler now starts working on the slave.
It finally actually executes the waitfr other, which at this point is cleared. So in effect it waits for the next timer cycle.
One cycle missed!
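
The window can be replayed as a toy model. The sketch below is Python, not OpenVMS code, and it assumes that a waiter only sees a flag's value at the instant it actually gets the CPU, so a set-then-clear pulse in between is invisible to it. The `run` function, the 'M'/'S' schedule strings and the step tables are made up for illustration.

```python
# Toy model of the timing window; not OpenVMS code.
# Assumption: the slave only tests a flag when it is actually scheduled.

def run(schedule):
    """Replay a schedule of 'M' (one master step) and 'S' (one slave turn).

    Returns (master_cycles, slave_cycles) completed.
    """
    flags = {"timer": False, "event": False}
    # Master loop from the original post, one entry per step. The timer
    # expiry is modelled as a master-side step that sets timerFlag.
    master_prog = [
        ("clear", "event"),  # $CLREF(eventFlag)
        ("set", "timer"),    # timer armed by $SETIMR expires
        ("clear", "timer"),  # $CLREF(timerFlag)
        ("set", "event"),    # $SETEF(eventFlag)
    ]
    slave_prog = ["timer", "event"]  # $WAITFR(timerFlag); $WAITFR(eventFlag)
    m_pc = s_pc = 0
    master_cycles = slave_cycles = 0
    for turn in schedule:
        if turn == "M":
            op, name = master_prog[m_pc]
            flags[name] = (op == "set")
            m_pc = (m_pc + 1) % len(master_prog)
            if m_pc == 0:
                master_cycles += 1
        else:
            # Modelled $WAITFR: proceed only if the flag is set right now.
            if flags[slave_prog[s_pc]]:
                s_pc = (s_pc + 1) % len(slave_prog)
                if s_pc == 0:
                    slave_cycles += 1
    return master_cycles, slave_cycles
```

In this model `run("MS" * 40)` gives the slave a turn after every master step and returns `(10, 10)`. `run("MMS" + "MMMS" * 40)` lets the master take three steps per slave turn and returns `(30, 10)`: the slave completes one cycle for every three of the master's.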

KISS!

One event flag is generally too hard to deal with already.
Be sure to check out $HIBER / $WAKE instead.
Unlike event flags, the pending wakes are remembered.
Much easier!

Good luck,
Hein.

Barry Alford · ‎10-29-2007

I have tried it on v7.2-1 and v7.3-2 (both on DS10 single processor) so far and am about to try v8.1 (PersonalAlpha). All will be pretty much unpatched :-o. (I am reluctant to believe that something as fundamental as this would be broken!)

I am using a common event cluster:
[in Fortran]
$ASCEFC(%VAL(iEflag), %DESCR("TIMER"), %VAL(0), %VAL(0))
...for both flags (69 & 70 in fact)

Hein, I see your point but I think that would make the slave miss at most one firing of the eventFlag. Consider the flags are doors into and out of a room - once the slave enters the room (on the timerFlag) it's waiting for the exit to open (on the eventFlag). Meanwhile, the master has opened the exit many times but the slave doesn't come out!

The problem with using $HIBER/$WAKE is that slaves will have to register with the master to get woken up; I wanted to keep things more ad hoc...

(The processes will, in fact, map to a shared region of memory, but I wanted to find a general algorithm. Back to my old college text books and refresh my hazy memories of p & v and co-routines?)

Hein van den Heuvel · ‎10-29-2007

Yabut... the one missed 'other wait' is but the beginning. The slave will also miss a timer event, while waiting for the other flag. So that's 2!

Be happy that you tested this on a single CPU system. You might not have found the design problem on a multi-cpu system until way too late, but it would have been equally broken!

>> (I am reluctant to believe that something as fundamental as this would be broken!)

Ah, give yourself 8 points!
That would have been 10 points if you had written 'refuse to believe'.
[Yeah, I know you can not give points to yourself]

When reading the base topic, I half expected to read 'waitfr' is broken, and was pleased to see that was not mentioned but replaced by a 'have I missed something'. Excellent.
Now I see it was fully intentional and it pleases me.
There are too many daft individuals out there that think their first dabbles must have uncovered a major flaw in fundamental stuff. Not!

Cheers,
Hein.

Barry Alford · ‎10-29-2007

Ah Hein! I'm not awarding any points just yet!

Well, well! When I monitor the slave process with:
$ SHOW/PROC/CONT/ID=
.. it all works _perfectly_! Stop monitoring, and it all goes sticky again.

How d'you like them apples? :-)

Hoff · ‎10-29-2007

Have you missed something? Yes, you've missed that event flags are an OpenVMS analog of "die Lorelei", or of Homer's Sirens. A construct that serves to lure unsuspecting programmers onto the rocks of pain and suffering. By sheer coincidence, I posted up a similar statement to this one -- and a description of why you're headed for the rocks -- just last night.

http://64.223.189.234/node/613

Event flags only look simple. They can get to be very nasty, this in terms of spurious triggers, problems with scaling, limits on the numbers of parallel events, and otherwise.

With no details on the application, I might tend to use locks and potentially lock value blocks here. Mayhap shared memory. I'd work to keep time with the master, whatever that means here -- and some details and some background on the application synchronization requirements would be useful.

Stephen Hoffman
HoffmanLabs LLC

Barry Alford · ‎10-29-2007

Thanks, Hoff, for the warning. I will look up that link tonight (restricted access here).

All I asked was for clarification of how $WAITFR worked; it seems from your words and me scraping my ship on the rocks that the Land of Event Flags is not the place for me!

The application does simulation of various machines; currently each machine is processed serially in a time step. This makes changes a problem in that the whole app has to be rebuilt. We have toyed with shared libraries and late binding, and my aim with this exercise is to experiment with multiple processes, each simulating one machine, but running in step with each other in time. (Did someone say "threads" out there?)

Anyhow, I will now try Plan B: use the master timer to wake up processes, then a cycle number in shared memory to ensure only one processing step per master cycle.
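
A rough sketch of how that Plan B check could look, in Python rather than anything VMS-specific. `SharedRegion`, `Slave` and `on_wake` are hypothetical names; a plain object stands in for the mapped shared section, and the sketch assumes the master is the only writer of the cycle number and that it can be read in one piece.

```python
# Sketch of Plan B: a master-owned cycle number guards each wakeup.
# All names are hypothetical; a plain object stands in for shared memory.

class SharedRegion:
    """Stand-in for the shared section; only the master updates `cycle`."""

    def __init__(self):
        self.cycle = 0


class Slave:
    def __init__(self, shared):
        self.shared = shared
        self.last_done = shared.cycle  # last master cycle this slave processed
        self.steps = 0                 # simulation steps actually run
        self.missed = 0                # master cycles slept through

    def on_wake(self):
        """Call on every wakeup, wanted or not. Returns True if a step ran."""
        current = self.shared.cycle
        if current == self.last_done:
            return False               # duplicate or spurious wake: do nothing
        self.missed += current - self.last_done - 1
        self.last_done = current
        self.steps += 1                # exactly one step per observed cycle
        return True
```

The point of the design is that the wakeup itself carries no information: the slave decides from the cycle number alone, so an extra wake is harmless and a lost one shows up in `missed` instead of going unnoticed.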

John Gillings · ‎10-29-2007

Barry,

I agree with Hein and Hoff. Event flags are very hard to get right. They tend to have nasty timing windows, and because there are so few of them they often get overloaded, so you have to deal with spurious wakeups.

Consider using a pair of locks, maybe called TICK and TOCK. You can lock step your processes by cycling the locks converting to EX then NL in sequence. Put your cycle number in the lock value block.

Now, since your locks are exclusive to the specific pair of processes you avoid any logic for spurious "wakes", and you're guaranteed handshaking. Moreover, since they're locks, the mechanism will work across a cluster (and it can be scaled up to multiple processes fairly easily - just add another TICK for each slave). With some extra logic on the lock value block, you could also build in a way to monitor the presence of the other process.

Master
repeat
$ENQ TOCK CVT->EX ; wait for slave
$ENQ TICK CVT->EX ; block slave
$SETIMR
wait for timer
$ENQ TICK CVT->NL ; release timer
prepare for slave to be released
$ENQ TOCK CVT->NL ; release slave
slave is now executing
next

Slave
repeat
$ENQ TICK CVT->EX ; wait for timer
timer complete
$ENQ TOCK CVT->EX ; wait for master
do something
$ENQ TOCK CVT->NL ; signal complete
$ENQ TICK CVT->NL ;
next

A crucible of informative mistakes

Robert Gezelter · ‎10-29-2007

Barry,

I would agree with John, except it is not clear to me that you actually need two locks to accomplish this.

Personally, I would do this with locking and ASTs to make things maximally safe.

Doing this with event flags is tricky, as has been commented on.

- Bob Gezelter, http://www.rlgsc.com
