Operating System - OpenVMS

Re: WCB (Window Control Block) structure documented anywhere?

 
Mark Corcoran
Frequent Advisor

WCB (Window Control Block) structure documented anywhere?

Hi, I'm trying to do a post mortem on a problem that happened a few times last week, where a process was stuck in a HIB state, with a lock on a log file, preventing other processes from accessing the file.

On two occasions, I had time to get the output of SHOW PROCESS /ALL in SDA for the offending process.

In both cases, the CCB (Channel Control Block) points to a WCB that shows a WRITES field (well, this is what SDA calls it - whether or not this is actually what it is called in the structure, I don't know) with a value of 0.

I'm trying to find information on what this field actually indicates, and the circumstances under which it gets updated.

e.g. does it indicate write /attempts/, or only /successful/ write attempts?

I found (and there's probably a good reason that someone will explain to me) that files opened in DCL (e.g. OPEN /WRITE) don't appear to have CCBs or WCBs associated with them - until recently, I've never had to delve to this level, so there's probably a good reason for this, that I've just never encountered before.

Creating a test C program, I found that an fopen() and an fprintf() on their own did not result in the WRITES count changing from zero.

If I then added an fsync() - and hence flushed RMS buffers - this caused the WRITES count to increase from 0 to 3.

A second fprintf(), and the WRITES count was still 3. A second fsync(), and it had increased to 5.

I've only written to the file twice (although I appreciate that each fprintf() may result in two underlying SYS$WRITE or SYS$PUT calls), so does anyone have any idea what exactly the WRITES counter is counting?
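For readers who want to repeat this kind of test, here is a minimal sketch in C, assuming a POSIX-style C library. The function names and the file name are invented for illustration, and this is not the program used above. It only shows the stdio half of the story: a short fprintf() stays in the library buffer until something flushes it. It cannot show WCB counters, and on VMS the RMS buffers add another layer that this sketch does not model.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the size the file system reports for `path`, or -1 on error. */
static long size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/*
 * Write one short record and report the file size seen before and
 * after the flush. Returns 0 on success, -1 on any failure.
 */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering explicitly; must happen before any I/O. */
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);

    /* The record is far smaller than BUFSIZ, so it stays in the buffer. */
    if (fprintf(fp, "first record\n") < 0) {
        fclose(fp);
        return -1;
    }
    *before = size_on_disk(path);

    /* fflush() hands the buffer to the OS; fsync() asks the OS to commit it. */
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0) {
        fclose(fp);
        return -1;
    }
    *after = size_on_disk(path);

    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling buffered_write_demo("wcb_demo.tmp", &before, &after) should report a size of zero before the flush and the 13 bytes of the record afterwards, which mirrors the "nothing counted until fsync()" behaviour described above.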

Is the WCB formally documented anywhere? Maybe in VMS File System Internals, which I don't have a copy of. I do have the Alpha Internals & Data Structures and the Alpha Internals Scheduling & Process Control books, but there are only passing references to WCBs in them, with no indication as to what all the fields are, what values they may take, or what they're used for.

[The offending code is Fortran, and supposedly does a LIB$GET_LUN, a Fortran OPEN, a Fortran WRITE, some more WRITEs within a loop, a Fortran CLOSE and a LIB$FREE_LUN.

I'm trying to establish whether or not it "hung"/entered an idle loop "by mistake", or even got as far as attempting the WRITE, let alone whether or not the WRITE failed or succeeded.

Knowing if the WRITE had been attempted, would help push the developers in the right direction (since the file is open, I would guess that LIB$GET_LUN and the Fortran OPEN had both succeeded).

Any help much appreciated!]


Mark
12 REPLIES
Volker Halle
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

you'll find the WCB field definitions in

SYS$LIBRARY:LIB.REQ

All symbols starting with WCB$ with a module header name of WCBDEF$

Volker.
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Volker, thanks for the reply - this does help a bit (in giving me more details of what the other fields are).

The comment for WCB$L_WRITES indicates "count of writes performed" - unfortunately, this doesn't indicate if this is write attempts or successful write attempts - but I don't want to make an incorrect assumption!

Mark
Volker Halle
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

the WCB counts the actual IOs to the blocks on the disk - as you've shown with your fsync example.

If something like this happens again, consider forcing a process dump of the 'hanging' process with $ SET PROC/DUMP=NOW/ID= before you STOP/IMAGE/ID=xxx that process.

You can analyze the process dump (imagename.DMP file) with ANAL/PROC later and you have all of process memory at your disposal for analysis.

Volker.
Ian Miller.
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

If I recall correctly
PROCIO$SDA displays those WCB fields.

The IDSM chapter on IO system services talks about the WCB and its use in mapping file extents.

Do get a process dump next time

What version of VMS?

If WRITES was 0 then no writes to disk were done, but that does not mean that no FORTRAN WRITEs were done. I guess a FORTRAN WRITE would result in an RMS $PUT, which would show up in the RMS data structures.
____________________
Purely Personal Opinion
Hein van den Heuvel
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?


Mark>> The offending code is Fortran,

If that's the case, then be sure NOT to test with DCL or C because, as you saw, the Run-Time Libraries (RTLs) can play games.

If you want to really see what is happening to an RMS-accessed file, I would recommend:
SDA> SHOW PROC/RMS=(RAB,BDBSUM)
For a specific file, for example with IFI=2 make that (NOIFB:2,RAB,BDBSUM)
For a DCL file make that (PIO,RAB,BDBSUM)

But I would start with PROCIO!

There is a somewhat similar discussion

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1153392

It refers to a very handy tool which may well be all you really want: PROCIO
Volker published that on EISNER, but that system has been down for weeks as of this writing.
So I took the liberty of appending it to this reply. (If I recall correctly, this version still had a minor problem with allocation class 0. Volker can reply with a better version if he feels like it.)
Original location:
http://eisner.encompasserve.org/~halle/

>> files opened in DCL (e.g. OPEN /WRITE) don't appear to have CCBs or WCBs

They have to have one, apparently you can not find them

Please elaborate. If I do SDA> SHOW PROC/CHAN then I nicely see a CCB and WCB address for DCL opened files.

>> Knowing if the WRITE had been attempted

That WRITE may just have been a SYS$PUT which, for an unshared file or a shared file with DFW (deferred write), need not cause an IO.
For EXISTING files, you want to use SET FILE/STAT and ANAL/SYS.. SHO PROC/RMS=FSB

For desperate cases check out SET PROC/SSLOG

Hope this helps

Hein van den Heuvel
HvdH Performance Consulting
Ian Miller.
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

A version of PROCIO$SDA is available at

http://www.pi-net.dyndns.org/jfp/english/ProcIO.html

However the author may have a newer one.
____________________
Purely Personal Opinion
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Ian & Volker, thanks for your replies.

Ian: Version=v7.3-2

Up until a year ago, my dev system was VAX/VMS v5.5-2 (it was always due to be retiring "soon", and there were so many issues to consider if upgrading VMS version on it).

I moved around within the company, and now have slightly more modern systems (the live systems are now v7.3-2, whereas in the old role, they were I think v7.2-1; the dev systems now are v7.2-1).

In short - I wasn't aware of SET PROCESS /DUMP - please tell me that it hasn't been around since v5.5-2 and I just haven't noticed! :-(

[Fortunately, most of the software in these parts normally behaves, so it would be very rare that I would have an occasion to forcibly dump a process.

I've taken this on board, but I created a .COM file that would allow me to get details of an offending process, and optionally kill it.

The kill is using an executable which does CORBA trader stuff first, and then does "something" - I believe it did attempt to force a dump on the 2 occasions I had time to do this.

However, it was only a PTHREADS dump, and the only PCs were in Fortran RTL, PTHREADS itself, and SYSTEM_MANAGEMENT]

Mark
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Hein, thanks also for your reply...

>But I would start with PROCIO!
This may be something for the future, but as is normally the case, Sarbanes-Oxley audits prohibit any .EXEs being copied on without due process (no pun intended).


>They have to have one, apparently you can not find them
>Please elaborate. If I do SDA> SHOW PROC/CHAN then I nicely see a CCB and WCB address for DCL opened files.

Mea culpa.

My recollection from yesterday, was that no CCB was shown.

I'm guessing that there must have been a number of channels open, and that I simply missed seeing the file I'd opened.

Consider me suitably embarrassed and chastened!


>For EXISTING files, you want to use SET FILE/STAT and ANAL/SYS.. SHO PROC/RMS=FSB

Will endeavour to use this for the next time (if there is one!).

The developers have however apparently spotted something in the code that they think might be related (tho I haven't been given specifics yet).

Another different RMS query coming into the general pool shortly...

Mark
John Gillings
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

SET PROCESS /DUMP=NOW is new in V7.3-2 and Alpha only (or Alpha & IA64 post V8).

The new style process dumps can be analyzed on a system other than the one it was dumped on, and they can be examined either with ANALYZE/PROCESS, which puts you into DEBUG like older style process dumps, or ANALYZE/CRASH, which puts you into SDA as if the crashed process was the only one on the system. Very useful if you want to look at things like WCBs!

If you're stuck in HIB state, I'd be guessing at a timing issue with $HIBER/$WAKE revealed by moving from VAX to Alpha. Faster systems can make open timing windows much wider, often breaking code that "has been working for years" on VAX.

Do you have any idea where the code was asleep?

Writing bullet proof $HIBER/$WAKE synchronization code is non-trivial. You need to consider both lost and spurious wakeups. Defensive $HIBER/$WAKE looks something like this:

sleeping side:

flag=0
do while (flag.EQ.0) $HIBER
$WAKE

waking side:

flag=1
$WAKE

Where flag is a global variable. It's used to confirm the wake is intended (and in the above code there are still potential timing windows). The $WAKE after waking is to prevent your code from swallowing a $WAKE intended for some other thread. Using code like this you're less likely to be broken by someone else's code, but you may break other code that isn't written properly.
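The same "only trust a wakeup if the flag agrees" idea can be sketched in portable C with POSIX threads. This is an analogue, not VMS code: it does not call $HIBER or $WAKE, and the names wake_gate, gate_wait, gate_wake and gate_demo are invented for illustration. One difference worth noting is that the mutex here protects the flag test and the wait together, which is what closes the timing window; the bare $HIBER/$WAKE pseudocode above has no such lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: `ready` plays the role of the flag above. */
struct wake_gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

#define WAKE_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Sleeping side: a wakeup only counts if the flag says it was intended. */
void gate_wait(struct wake_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->ready)                    /* loop absorbs spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    g->ready = false;                    /* consume the event */
    pthread_mutex_unlock(&g->lock);
}

/* Waking side: set the flag first, then wake the sleeper. */
void gate_wake(struct wake_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ready = true;
    pthread_mutex_unlock(&g->lock);
    pthread_cond_signal(&g->cond);
}

/* Thread body for the demo: just delivers one wakeup. */
static void *waker_thread(void *arg)
{
    gate_wake(arg);
    return NULL;
}

/* Returns 0 if the waiter was released, whether the wake came early or late. */
int gate_demo(void)
{
    struct wake_gate g = WAKE_GATE_INIT;
    pthread_t waker;

    if (pthread_create(&waker, NULL, waker_thread, &g) != 0)
        return -1;
    gate_wait(&g);
    return pthread_join(waker, NULL) == 0 ? 0 : -1;
}
```

Because the flag is set before the signal is sent, a wake that arrives before the sleeper reaches gate_wait() is not lost: the sleeper sees ready already set and never blocks.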

Something to check... Are there any LIB$WAITs? Remember that by default LIB$WAIT is expecting F_FLOAT, but the new default floating type for FORTRAN is /FLOAT=IEEE. Passing an IEEE float as the argument for LIB$WAIT won't wait the correct period (but I can't remember off hand if it's longer or shorter). See the new optional parameter for LIB$WAIT to specify floating type.
A crucible of informative mistakes
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

John, thanks for your reply.


>If you're stuck in HIB state, I'd be guessing at a timing issue with $HIBER/$WAKE revealed by moving from VAX to Alpha.
>Do you have any idea where the code was asleep?

I looked at the PC in SDA with EXA/INST, and it was reported as being in PROCESS_MANAGEMENT.

I've got a feeling that it's not really "stuck" in HIB, it's just that it really is hibernating, because it's doing nothing.

i.e. the application code simply omits to do a $DEQ in some circumstances, so basically, it got some input, processed it, encountered an error, got the exclusive lock, wrote details of the error to the error log file, didn't release the lock, and is now waiting for more input to process.


>Something to check... Are there any LIB$WAITs? Remember that by default LIB$WAIT is expecting F_FLOAT, but the new default floating type for FORTRAN is /FLOAT=IEEE. Passing an IEEE float as the argument for LIB$WAIT won't wait the correct period.

Useful information...

As indicated elsewhere, the development has long since been outsourced, but we do have access to the sources - the problem is trying to work out which module(s) in which CMS library/ies is/are used to build the .EXE, so I can't readily tell if they are using LIB$WAITs.

Last time I touched Fortran was at least 10yrs ago, and that was looking at someone else's code, and understanding the basics to see where something needed to be changed (i.e. I'm not a Fortran programmer by any means; C is my (main) bag)

Mark
John Gillings
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

>the PC in SDA with EXA/INST, and it was
>reported as being in PROCESS_MANAGEMENT.

I would hope so. By definition, a process in HIB state must be in $HIBER which is inside module PROCESS_MANAGEMENT. QED!

It's not where it is NOW that matters, it's where it came FROM that you need to know. From SDA use SHOW CALL, then SHOW CALL/NEXT, examining the return addresses to trace back up the call stack. SHOW IMAGE will give you the base addresses of any shareable images. You should be able to determine the routines, match the address to an image by comparing the ranges, subtract the base address to get the image offset and find the routine in the link map.
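The range comparison and subtraction can be sketched in a few lines of C. The structure, the function name and every address in the usage note are made up for illustration; real base and end addresses come from SHOW IMAGE, and the sketch assumes the end address is inclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical image list: name plus address range. */
struct image_range {
    const char *name;
    uint64_t    base;   /* first address of the image */
    uint64_t    end;    /* last address of the image (assumed inclusive) */
};

/*
 * Find the image whose range contains `addr` and compute the offset of
 * `addr` from that image's base. Returns NULL if no image matches, in
 * which case *offset is left untouched.
 */
const struct image_range *locate(const struct image_range *imgs, size_t n,
                                 uint64_t addr, uint64_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= imgs[i].base && addr <= imgs[i].end) {
            *offset = addr - imgs[i].base;   /* value to look up in the map */
            return &imgs[i];
        }
    }
    return NULL;
}
```

For example, with an invented image spanning 0x00030000 to 0x0004FFFF, a return address of 0x00031A40 falls inside it at offset 0x1A40, and that offset is what you would then look for in the link map for that image.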

(You do have link maps, yes? Always use LINK/MAP/FULL/CROSS. Back when disk space was costed in dollars per KB there was a reasonable excuse not to generate link maps, but today it's cents per GB, so there's no excuse not to.)

Even better, use LINK/DSF and SET PROCESS/DUMP=NOW. When you ANALYZE/PROCESS tell it where the DSF file is and use SHOW CALL to give you the module, routine and line numbers.

>I can't readily tell if they are
>using LIB$WAITs.

That's what link maps are for! With /MAP/CROSS it will show all references to LIB$WAIT.
A crucible of informative mistakes
Robert Gezelter
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

Processes stuck in HIB are often sitting in HIB waiting for an event (pardon the anthropomorphism).

Your last posting indicated that the code may have taken an error path. The question is: Which error path?

Infinite HIBernation is often caused because the program "thinks" it scheduled an event, but because of an error, did not actually issue the request successfully. This is the equivalent of a device driver waiting for an interrupt for an operation that it never actually issued.

If this is the case, one is looking for a code path that thinks it issued an IO, timer, or other request, but did not in fact do so.
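One defensive shape for such a code path, sketched in C: never go on to wait unless the request that is supposed to wake you reported success. Everything here is hypothetical, including the names and the convention that 0 means success (VMS condition values use a different convention, so real code would test the status the way the service documents it).

```c
#include <stddef.h>

/* Hypothetical request hook: returns 0 when the request was really queued. */
typedef int (*request_fn)(void *ctx);

enum next_step { SAFE_TO_WAIT, MUST_NOT_WAIT };

/* Issue the request and only allow a wait if it was actually accepted. */
enum next_step issue_before_wait(request_fn issue, void *ctx, int *status_out)
{
    int status = issue(ctx);

    if (status_out != NULL)
        *status_out = status;        /* keep the status for a trace log */
    return status == 0 ? SAFE_TO_WAIT : MUST_NOT_WAIT;
}

/* Two stand-in requests, for illustration only. */
int request_ok(void *ctx)     { (void)ctx; return 0; }
int request_failed(void *ctx) { (void)ctx; return -1; }
```

A caller that gets MUST_NOT_WAIT back should log the saved status and take its error path instead of hibernating, since nothing is outstanding that could ever deliver the wakeup.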

Debugging traces of processing are invaluable. A dump might be useful, but if the information was in a stack frame which has since been overwritten, it is gone (readers can guess why I mention that particular possibility -- hint: C local variables are on the stack).

- Bob Gezelter, http://www.rlgsc.com