Operating System - OpenVMS
John McL
Trusted Contributor

sys$deq documentation

I question whether the sys$deq documentation is correct, and as detailed as it could be, when it says of using the LCK$M_CANCEL flag on a lock being converted that the status SS$_CANCEL is stored in the lock status block (the one specified by the conversion request) ONLY when a completion AST is specified.

 

This does not appear to be correct.  I don't use a completion AST but I still get SS$_CANCEL in the lock status block.  (FWIW, I tested this by taking an exclusive lock via one VMS process, then in another VMS process attempting to take a CW lock on the same resource using sys$enqw. I had a "time-out" AST that tripped and used sys$deq to cancel the request.)
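For the record, the converting side of that test looks roughly like the sketch below. This is illustrative only: the resource name, timer interval and event-flag numbers aren't my actual code, and it assumes the 12-argument Alpha/I64 starlet.h prototype for sys$enqw.

/* Sketch of the converting side of the test.  A second process in
 * the same UIC group holds an EX lock on DEMO_RESOURCE, so the
 * NL -> CW conversion stalls and the time-out AST fires first. */
#include <stdio.h>
#include <descrip.h>
#include <ssdef.h>
#include <lckdef.h>
#include <starlet.h>

/* Minimal lock status block layout: word status, word reserved,
 * longword lock id, 16-byte lock value block. */
typedef struct {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lkid;
    unsigned char  valblk[16];
} LKSB;

static volatile int timed_out = 0;
static LKSB lksb;                 /* static, so it outlives every frame */

/* Timer AST: cancel the still-pending conversion. */
static void timeout_ast(int unused)
{
    timed_out = 1;
    (void) sys$deq(lksb.lkid, 0, 0, LCK$M_CANCEL);
}

int main(void)
{
    $DESCRIPTOR(resnam, "DEMO_RESOURCE");
    static long long delta = -50000000;  /* 5 s delta, 100-ns units */
    unsigned int sts;

    /* Take an NL lock so there is something to convert. */
    sts = sys$enqw(0, LCK$K_NLMODE, (void *)&lksb, 0, &resnam,
                   0, 0, 0, 0, 0, 0, 0);
    if (!(sts & 1)) return sts;

    /* Arm the time-out AST, then request the NL -> CW conversion;
     * the EX holder blocks it, so the timer fires first. */
    sys$setimr(1, (void *)&delta, timeout_ast, 0, 0);
    sts = sys$enqw(0, LCK$K_CWMODE, (void *)&lksb, LCK$M_CONVERT,
                   &resnam, 0, 0, 0, 0, 0, 0, 0);

    /* Here lksb.status shows SS$_CANCEL (2096), with no completion
     * AST anywhere in sight. */
    printf("sys$enqw sts=%u  lksb.status=%u  timed_out=%d\n",
           sts, (unsigned) lksb.status, timed_out);
    return SS$_NORMAL;
}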

 

In the documentation for $ENQ, under the subhead "Condition values returned in the lock status block", we find that condition SS$_CANCEL is returned when ...

 

"The lock conversion request has been canceled and the lock has been regranted at its previous lock mode. This condition value is returned when $ENQ queues a lock conversion request, the request has not been granted yet (it is in the conversion queue), and, in the interim, the $DEQ service is called (with the LCK$M_CANCEL flag specified) to cancel this lock conversion request. If the lock is granted before $DEQ can cancel the conversion request, the call to $DEQ returns the condition value SS$_CANCELGRANT, and the call to $ENQ returns SS$_NORMAL."

 

This seems to be rather different from the $DEQ documentation.

 

The implication of SS$_CANCEL being written is that the lock status block must still be available in memory at the time of the sys$deq, and that we can't simply identify the lock by its lock ID (the first parameter in the call to $DEQ) and forget about the LKSB.
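To make the point concrete, here's a hedged sketch (the helper name is mine). $DEQ itself takes no LKSB argument, so the only LKSB the service can write through is the address it captured at $ENQ time:

#include <ssdef.h>
#include <lckdef.h>
#include <starlet.h>

/* Cancel a pending conversion identified only by its lock ID. */
unsigned int cancel_conversion(unsigned int lkid)
{
    unsigned int sts = sys$deq(lkid, 0, 0, LCK$M_CANCEL);
    /* SS$_NORMAL      -> cancel accepted; the LKSB registered by the
     *                    $ENQ will be written with SS$_CANCEL, so
     *                    that memory had better still exist.
     * SS$_CANCELGRANT -> too late, the conversion was already
     *                    granted. */
    return sts;
}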

 

I ask whether anyone else can confirm my findings about sys$deq.  I tried searching this forum but ...

 

9 REPLIES
John McL
Trusted Contributor

Re: sys$deq documentation

Sorry ... I forgot to say that we're running V8.3. I checked the 8.3 documentation first, and the quotes above come from the 8.4 documentation (which I think is identical on this matter).
John Gillings
Honored Contributor

Re: sys$deq documentation

John,

 

  I think I follow what you're saying, but a code example would help.

 

Regardless of the documentation, if I $ENQ(W) a lock and it doesn't get granted because of a $DEQ/LCK$M_CANCEL, I'd definitely expect an explanation in my LKSB.

 

My SSRM says this (is this the passage to which you refer?):

 

•If a completion AST was specified by the conversion request, the 
completion AST is queued for delivery with SS$_CANCEL status stored in 
the lock status block that was specified by the conversion request.

I don't see that this implies the LKSB is not written when no AST was specified (though I agree it could have been explained more clearly). The IDSM doesn't help either, but your observation seems to confirm my expectation.

 

Another way to think about it: not specifying an AST is pretty much equivalent to specifying an AST that does nothing. The same things happen with the event flags and status blocks; it's just the AST call that gets skipped.
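Something like this sketch (the names are mine, and it assumes lksb.lkid already identifies the granted lock): with or without compl_ast, the DLM fills in the LKSB the same way.

#include <lckdef.h>
#include <starlet.h>
#include <descrip.h>

typedef struct { unsigned short status, reserved;
                 unsigned int lkid; unsigned char valblk[16]; } LKSB;

static LKSB lksb;   /* lksb.lkid assumed to hold the granted lock's id */

/* A completion AST that "does nothing": by the time it's delivered,
 * lksb.status already holds the completion code (SS$_CANCEL, etc.).
 * Dropping the AST changes nothing about the LKSB. */
static void compl_ast(int unused)
{
}

unsigned int convert_cw_async(struct dsc$descriptor_s *resnam)
{
    return sys$enq(0, LCK$K_CWMODE, (void *)&lksb, LCK$M_CONVERT,
                   resnam, 0, compl_ast, 0, 0, 0, 0, 0);
}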

 

In terms of worrying about the LKSB being in memory... I don't see why this is an issue. If you believe the lock still exists, then surely the LKSB must still exist? 

A crucible of informative mistakes
Hoff
Honored Contributor

Re: sys$deq documentation

I'd expect the cancellation status to land in the LKSB if the $enq is cancelled in flight; the rest of the material in the documentation around the ASTs describes what happens when those are specified, and how they are fired.

 

Please post the test code. 

 

In general (and outside of the cancellation case), I'd also suggest only using $deq on a null-mode lock, after a conversion from another mode to null via a call to $enq or $enqw.  There's a race condition latent here, given that the $deq isn't synchronous.  It is very easy to end up in odd states with this system service, particularly in a cluster.
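Roughly this shape (a sketch, with my own LKSB layout and helper name):

#include <ssdef.h>
#include <lckdef.h>
#include <starlet.h>

typedef struct { unsigned short status, reserved;
                 unsigned int lkid; unsigned char valblk[16]; } LKSB;

/* Demote to NL first -- $ENQW waits for the (immediately grantable)
 * down-conversion -- and only then dequeue, so the $deq never acts
 * on a lock with a pending state. */
unsigned int release_lock(LKSB *lksb)
{
    unsigned int sts = sys$enqw(0, LCK$K_NLMODE, (void *)lksb,
                                LCK$M_CONVERT, 0, 0, 0, 0, 0, 0, 0, 0);
    if (!(sts & 1)) return sts;
    return sys$deq(lksb->lkid, 0, 0, 0);
}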

John McL
Trusted Contributor

Re: sys$deq documentation

Providing an example isn't easy because it's a bunch of functions within a C program (which is probably why I hit this problem in the first place).

 

The sequence is

 

1 - set up the requested lock conversion

2 - call SETIMR to run a "time-out" routine (which will set an event flag as well as a simple global flag variable)

3 - record the ID of the lock in a "rollback list"

4 - call sys$enqw, passing the event flag (and a lock status block defined here on the C stack)

5 - wait for the event flag to be set

If the lock is granted, the event flag will be set; if the AST times out, both the event flag and the global variable flag will be set.

6 - Wait for event flag ...

7 - Having established that the enq timed out, call the routine that does the rollback

8 - When I use the debugger in the rollback routine I can see on the stack the address that I arrived from

9 - After the sys$deq call (with LCK$M_CANCEL, done in EXEC mode via a user-written system service) the same address on the stack has been changed to the value SS$_CANCEL (2096). 

 

That change to the stack meant either that my local variables were corrupted or that the unwinding at the end of the function was failing.  In the latter case, it was only when I manually deposited the old value back onto the stack that the functions would unwind correctly.

 

Ultimately I modified the code to move the lock status block into a global variable, and that was successful.
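In miniature, the before-and-after looks like this (a sketch only; the function and resource names aren't the real ones, and it assumes the 12-argument starlet.h prototype):

#include <lckdef.h>
#include <starlet.h>
#include <descrip.h>

typedef struct { unsigned short status, reserved;
                 unsigned int lkid; unsigned char valblk[16]; } LKSB;

static LKSB g_lksb;      /* the fix: storage that outlives every frame */

unsigned int request_conversion(struct dsc$descriptor_s *resnam,
                                unsigned int efn)
{
    /* BEFORE:  LKSB lksb;  ...passed &lksb, and the frame was gone by
     * the time the cancelled conversion completed, so the DLM's write
     * of SS$_CANCEL (2096) landed on whatever reused that stack slot.
     * AFTER: pass &g_lksb (or any storage freed only once the LKSB
     * status goes nonzero). */
    return sys$enq(efn, LCK$K_CWMODE, (void *)&g_lksb, LCK$M_CONVERT,
                   resnam, 0, 0, 0, 0, 0, 0, 0);
}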

 

 

GuentherF
Trusted Contributor

Re: sys$deq documentation

Ahem, is this a problem? I mean, SS$_CANCEL says what happened, AST or not.

 

/Guenther

Hoff
Honored Contributor

Re: sys$deq documentation

>Providing an example isn't easy because it's a bunch of functions within a C program (which is probably why I hit this problem in the first place).

 

That does appear to be a reasonable supposition.  There likely won't be a way to determine the cause here, short of rummaging through the (unavailable) source code.

 

In this case we have event flags, "simple" global flags, user-written system services and exec-mode cancellations (and are there inner-mode ASTs flying around?), with apparently no state machine nor easily-extracted logic core, all in what looks to be a transactional environment.

 

Is this server possibly a multiprocessor? Multiprocessor systems are particularly good at exposing synchronization and timing bugs.

 

If it's typical of these sorts of applications, it's in an environment that's probably seen incremental fixes and updates over the eon or two it's been in use, too.  I'm going to guess you're not permitted to refactor or replace this implementation with a proper state machine, either.

 

Stack variable corruptions can be an IOSB or a buffer that's written into after its stack frame has vaporized.   In this context, the reference to "unwind" is particularly interesting; are you calling the $UNWIND service or something analogous here?

John Gillings
Honored Contributor

Re: sys$deq documentation

John,

 

For async system services, and their W equivalents, the status block WILL be written eventually; that's the definition of the completion of the request. It doesn't matter how the completion is achieved. If the status is 0, the event is still pending; otherwise it gives the disposition of the event. So, if your LKSB is deallocated before the lock request completes, you'll be corrupting something.

 

For what you're describing, I'd recommend that instead of recording IDs in your rollback list, you make the rollback list itself out of the LKSBs (allocated dynamically, or from a static pool). Use a structure with links, a timestamp and an LKSB, as sketched below. I'd avoid event flags; they're way too limited and have lots of potential for timing issues.

 

Now the routine responsible for the rollback can look at the status in the LKSB to see whether an entry needs to be rolled back, and it has the lock ID and lock value block available for inspection.
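For instance (field names are mine):

#include <time.h>

typedef struct { unsigned short status, reserved;
                 unsigned int lkid; unsigned char valblk[16]; } LKSB;

/* One rollback-list entry per lock request.  Because the node owns
 * the LKSB, the storage necessarily lives until the request is done:
 * lksb.status == 0 means still pending, anything else is the
 * disposition, and lkid/valblk are right there for the rollback. */
typedef struct rollback_node {
    struct rollback_node *next;      /* list links */
    time_t                queued;    /* when the request was queued */
    LKSB                  lksb;
} ROLLBACK_NODE;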

 

A crucible of informative mistakes
John McL
Trusted Contributor

Re: sys$deq documentation

The problem is solved by simply keeping in global memory the LKSB for the lock that I'm trying to convert.

 

Nothing else can be added to the rollback list (of lock IDs) before I either (a) change the rollback action for that entry (which would happen if I get the lock) or (b) process the rollback sequence because of a timeout.

 

Why do I need the rollback?  To cater for the user hitting Ctrl/Y, at which point I need to reinstate them to where they started, which might be with locks on some other resource.  (I'm using "get new before releasing old".)

 

As I said, the problem is solved.

 

I'm just looking for confirmation that my thinking is correct: despite what the $DEQ documentation says, SS$_CANCEL is returned in the LKSB condition field whenever a conversion is cancelled, not only when there's a completion AST, which is all that the documentation specifically mentions.

Hoff
Honored Contributor
Solution

Re: sys$deq documentation

If a global static allocation provides a workaround, then the original LKSB was likely allocated in storage that was volatile over the lifetime of the request; that would be typical behavior for this failure, as would run-time corruption.