Thanks to the others for their suggestions of software that would help analyse this; unfortunately, Sarbanes-Oxley controls would initially prevent it being installed (and I would need to go to another team to get approval for it anyway).
John, in answer to your question:
>I'm a bit confused. Is the blocking lock your application lock, or an RMS lock? If RMS is it a record lock or the whole file?
It's an application lock. Let's for argument's sake call the resource name MIRROR_ON_THE_WALL.
One process issues a $ENQW requesting this resource in exclusive mode and, for whatever reason, never releases it. Maybe it is stuck looping around doing nothing, or waiting for something that will never happen, or there is a bug in the code where $DEQ is never called, or it is called but under some circumstances the logic path skips that bit of code.
Each process supposedly does a $ENQW for MIRROR_ON_THE_WALL, then when the lock is granted, calls LIB$GET_LUN, Fortran OPEN, Fortran WRITE, Fortran CLOSE, LIB$FREE_LUN and $DEQ.
I would presume there will be RMS locks associated with the Fortran OPEN, but the code in hanging processes doesn't (read: shouldn't) get that far because it is still waiting on the $ENQW for MIRROR_ON_THE_WALL.
[The developers have indicated that there is a generic function that does this (it is an error handler), although looking through the CMS libraries, there seem to be umpteen copies of the handler in different modules.
I'm not sure whether they are all still in use or whether they all behave exactly the same.
Therein lies the problem with copying useful code into different modules/projects, rather than keeping it in one place...]
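To illustrate the take-the-lock, write, release sequence described above, here is a rough sketch in Python rather than Fortran. It is only an analogy: threading.Lock stands in for the $ENQW/$DEQ pair on MIRROR_ON_THE_WALL (it is an in-process lock, not a system-wide one), and the function name append_log_line is made up for illustration. The point is that the release sits in a finally clause, so no logic path can skip it.

```python
import threading

# Stand-in for the application lock on resource MIRROR_ON_THE_WALL.
# threading.Lock is only an analogy for $ENQW/$DEQ, not VMS code.
mirror_on_the_wall = threading.Lock()


def append_log_line(path, text):
    """Take the lock, append one line to the log, and always release the lock."""
    mirror_on_the_wall.acquire()          # analogous to $ENQW in exclusive mode
    try:
        with open(path, "a") as log:      # analogous to OPEN / WRITE / CLOSE
            log.write(text + "\n")
    finally:
        mirror_on_the_wall.release()      # analogous to $DEQ; runs on every exit path
```

If the write fails (bad path, disk full), the exception still propagates to the caller, but the lock is released first, so other writers are not left waiting forever.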
>The application lock case should be fairly simple, just $ENQ yourself against the lock, then $GETLKI to find the blocking lock.
As I mentioned, I haven't previously had to look at locking at this kind of level, but I have been doing development on VMS for almost 20 years, so this won't be a problem once I've read the details of these two system services (I have the manuals in hard and soft copy form, don't worry!).
In all honesty, any solution I implement as an automatic "workaround" will have to go through Sarbanes-Oxley audit controls, but developing it myself might be easier than the pain of first getting another team to approve third-party software (third-party from the company's point of view, rather than that team's).
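As a way of thinking through the "queue for the lock yourself, then ask who is blocking you" idea before writing the real thing, here is a rough model in Python. It is not VMS code and makes no claim about how $ENQ or $GETLKI behave in detail: TrackedLock and find_blocker are hypothetical names, and the holder attribute merely plays the part of the information one would hope to get back from $GETLKI.

```python
import threading


class TrackedLock:
    """Exclusive lock that remembers who holds it (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None                 # name of the current holder, if any

    def acquire(self, who, timeout=-1):
        """Try to take the lock; timeout=-1 means wait indefinitely."""
        got = self._lock.acquire(timeout=timeout)
        if got:
            self.holder = who
        return got

    def release(self):
        self.holder = None
        self._lock.release()


def find_blocker(lock, probe_name="probe", timeout=0.1):
    """Wait briefly for the lock; return the holder's name if blocked, else None."""
    if lock.acquire(probe_name, timeout=timeout):
        lock.release()                     # nobody was blocking; back out at once
        return None
    return lock.holder                     # the part $GETLKI would play on VMS
```

A monitor could call find_blocker periodically and raise an alarm when the same holder is reported several times in a row. Note that the holder can change between the timeout and the read of lock.holder, so a single report is only a hint.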
>For RMS, you may be able to do something similar with a ROP=WAT option, perhaps even from DCL with READ/WAIT?
I've not seen the /WAIT qualifier for READ before, and it's not listed in the help library on our system. Is this only available from a particular version?
>If you can go back a step to the application design and work on the locking mechanism, perhaps implement a blocking AST?
Alas, development of the application was outsourced a long time ago, and ££££ is payable for any change, which may take a long time to be delivered (I'm not sure that the developers at the outsourcing company are hardcore VMSers; they are probably quite adept at the various languages that the application is written in, but it's not a place one would tend to associate with years of VMS experience before winning an outsourcing contract).
They have given more of an update on the bug that they thought they had found; they have seen it cause "a problem", but not the same problem as we are seeing.
Thinking about it from their description (a new version of the log file is created, rather than the existing one being appended to), it sounds to me like some processes are actually not using the application lock at all, and thus:
a) create a new version of the file if the file is already locked open by an offending process
b) hang on getting access to the existing file if it is already locked open.
Mark