Mark Corcoran
Frequent Advisor

WCB (Window Control Block) structure documented anywhere?

Hi, I'm trying to do a post mortem on a problem that happened a few times last week, where a process was stuck in a HIB state, with a lock on a log file, preventing other processes from accessing the file.

On two occasions, I had time to get the output of SHOW PROCESS /ALL in SDA for the offending process.

In both cases, the CCB (Channel Control Block) points to a WCB that shows a WRITES field (well, this is what SDA calls it - whether or not this is actually what it is called in the structure, I don't know) with a value of 0.

I'm trying to find information on what this field actually indicates, and the circumstances under which it gets updated.

e.g. does it indicate write /attempts/, or only /successful/ write attempts?

I found (and there's probably a good reason that someone will explain to me) that files opened in DCL (e.g. OPEN /WRITE) don't appear to have CCBs or WCBs associated with them - until recently, I've never had to delve to this level, so there's probably a good reason for this, that I've just never encountered before.

Creating a test C program, I found that an fopen() and an fprintf() on their own did not result in the WRITES count changing from zero.

If I then added an fsync() - and hence flushed RMS buffers - this caused the WRITES count to increase from 0 to 3.

A second fprintf(), and the WRITES count was still 3. A second fsync(), and it had increased to 5.

I've only written to the file twice (although I appreciate that each fprintf() may result in two underlying SYS$WRITE or SYS$PUT calls), so does anyone have any idea what exactly the WRITES counter is counting?

Is the WCB formally documented anywhere (maybe VMS File System Internals which I don't have a copy of, though I have the Alpha Internals & Data Structures, and the Alpha Internals Scheduling & Process Control books, and there's only passing references to WCBs - no indication as to what all the fields are, values they may take, or what they're used for)?

[The offending code is Fortran, and supposedly does a LIB$GET_LUN, a Fortran OPEN, a Fortran WRITE, some more WRITEs within a loop, a Fortran CLOSE and a LIB$FREE_LUN.

I'm trying to establish whether or not it "hung"/entered an idle loop "by mistake", or even got as far as attempting the WRITE, let alone whether or not the WRITE failed or succeeded.

Knowing if the WRITE had been attempted, would help push the developers in the right direction (since the file is open, I would guess that LIB$GET_LUN and the Fortran OPEN had both succeeded).

Any help much appreciated!]


Mark
Volker Halle
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

you'll find the WCB field definitions in

SYS$LIBRARY:LIB.REQ

All symbols starting with WCB$ with a module header name of WCBDEF$

Volker.
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Volker, thanks for the reply - this does help a bit (in giving me more details of what the other fields are).

The comment for WCB$L_WRITES indicates "count of writes performed" - unfortunately, this doesn't indicate if this is write attempts or successful write attempts - but I don't want to make an incorrect assumption!

Mark
Volker Halle
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

the WCB counts the actual IOs to the blocks on the disk - as you've shown with your fsync example.

If something like this happens again, consider forcing a process dump of the 'hanging' process: $ SET PROC/DUMP=NOW/ID= before you STOP/IMAGE/ID=xxx that process.

You can analyze the process dump (imagename.DMP file) with ANAL/PROC later and you have all of process memory at your disposal for analysis.

Volker.
Ian Miller.
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

If I recall correctly
PROCIO$SDA displays those WCB fields.

The IDSM chapter on IO system services talks about the WCB and its use in mapping file extents.

Do get a process dump next time

What version of VMS?

If WRITES was 0 then no writes to disk were done, but that does not mean that no FORTRAN WRITEs were done. I guess a FORTRAN WRITE would result in an RMS $PUT, which would show up in the RMS data structures.
____________________
Purely Personal Opinion
Hein van den Heuvel
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?


Mark>> The offending code is Fortran,

If that's the case, then be sure NOT to test with DCL or C because, as you saw, the Run-Time Libraries (RTLs) can play games.

If you want to really see what is happening to a file accessed through RMS, I would recommend:
SDA> SHOW PROC/RMS=(RAB,BDBSUM)
For a specific file, for example with IFI=2 make that (NOIFB:2,RAB,BDBSUM)
For a DCL file make that (PIO,RAB,BDBSUM)

But I would start with PROCIO!

There is a somewhat similar discussion

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1153392

It refers to a very handy tool which may well be all you really want: PROCIO
Volker published that on EISNER, but that system has been down for weeks as of this writing.
So I took the liberty of appending it to this reply. (If I recall correctly, this version still had a minor problem with allocation class 0. Volker can reply with a better version if he feels like it.)
Original location:
http://eisner.encompasserve.org/~halle/

>> files opened in DCL (e.g. OPEN /WRITE) don't appear to have CCBs or WCBs

They have to have one, apparently you can not find them

Please elaborate. If I do SDA> SHOW PROC/CHAN then I nicely see a CCB and WCB address for DCL opened files.

>> Knowing if the WRITE had been attempted

That WRITE may just have been a SYS$PUT, which for an unshared file, or a shared one with DFW, need not cause an IO.
For EXISTING files, you want to use SET FILE/STAT and ANAL/SYS.. SHO PROC/RMS=FSB

For desperate cases check out SET PROC/SSLOG

Hope this helps

Hein van den Heuvel
HvdH Performance Consulting
Ian Miller.
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

A version of PROCIO$SDA is available at

http://www.pi-net.dyndns.org/jfp/english/ProcIO.html

However the author may have a newer one.
____________________
Purely Personal Opinion
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Ian & Volker, thanks for your replies.

Ian: Version=v7.3-2

Up until a year ago, my dev system was VAX/VMS v5.5-2 (it was always due to be retiring "soon", and there were so many issues to consider if upgrading VMS version on it).

I moved around within the company, and now have slightly more modern systems (the live systems are now v7.3-2, whereas in the old role, they were I think v7.2-1; the dev systems now are v7.2-1).

In short - I wasn't aware of SET PROCESS /DUMP - please tell me that it hasn't been around since v5.5-2 and I just haven't noticed! :-(

[Fortunately, most of the software in these parts normally behaves, so it would be very rare that I would have an occasion to forcibly dump a process.

I've taken this on board, but I created a .COM file that would allow me to get details of an offending process, and optionally kill it.

The kill is using an executable which does CORBA trader stuff first, and then does "something" - I believe it did attempt to force a dump on the 2 occasions I had time to do this.

However, it was only a PTHREADS dump, and the only PCs were in Fortran RTL, PTHREADS itself, and SYSTEM_MANAGEMENT]

Mark
Mark Corcoran
Frequent Advisor

Re: WCB (Window Control Block) structure documented anywhere?

Hein, thanks also for your reply...

>But I would start with PROCIO!
This may be something for the future, but as is normally the case, Sarbanes-Oxley audits prohibit any .EXEs being copied on without due process (no pun intended).


>They have to have one, apparently you can not find them
>Please elaborate. If I do SDA> SHOW PROC/CHAN then I nicely see a CCB and WCB address for DCL opened files.

Mea culpa.

My recollection from yesterday, was that no CCB was shown.

I'm guessing that there must have been a number of channels open, and that I simply missed seeing the file I'd opened.

Consider me suitably embarrassed and chastened!


>For EXISTING files, you want to use SET FILE/STAT and ANAL/SYS.. SHO PROC/RMS=FSB

Will endeavour to use this for the next time (if there is one!).

The developers have, however, apparently spotted something in the code that they think might be related (though I haven't been given specifics yet).

Another different RMS query coming into the general pool shortly...

Mark
John Gillings
Honored Contributor

Re: WCB (Window Control Block) structure documented anywhere?

Mark,

SET PROCESS /DUMP=NOW is new in V7.3-2 and Alpha only (or Alpha & IA64 post V8).

The new style process dumps can be analyzed on a system other than the one they were dumped on, and they can be examined either with ANALYZE/PROCESS, which puts you into DEBUG like older style process dumps, or ANALYZE/CRASH, which puts you into SDA as if the crashed process was the only one on the system. Very useful if you want to look at things like WCBs!

If you're stuck in HIB state, I'd be guessing at a timing issue with $HIBER/$WAKE revealed by moving from VAX to Alpha. Faster systems can make open timing windows much wider, often breaking code that "has been working for years" on VAX.

Do you have any idea where the code was asleep?

Writing bullet proof $HIBER/$WAKE synchronization code is non-trivial. You need to consider both lost and spurious wakeups. Defensive $HIBER/$WAKE looks something like this:

sleeping side:

flag=0
do while (flag.EQ.0)
   $HIBER
end do
$WAKE

waking side:

flag=1
$WAKE

Where flag is a global variable. It's used to confirm the wake is intended (and in the above code there are still potential timing windows). The $WAKE after waking is to prevent your code from swallowing a $WAKE intended for some other thread. Using code like this you're less likely to be broken by someone else's code, but you may break other code that isn't written properly.

Something to check... Are there any LIB$WAITs? Remember that by default LIB$WAIT is expecting F_FLOAT, but the new default floating type for FORTRAN is /FLOAT=IEEE. Passing an IEEE float as the argument for LIB$WAIT won't wait the correct period (but I can't remember off hand if it's longer or shorter). See the new optional parameter for LIB$WAIT to specify floating type.
A crucible of informative mistakes