Operating System - OpenVMS

Disk IO without doing IO's

 
Richard White_5
Advisor

Re: Disk IO without doing IO's

Good Morning Wim...

This problem is "killing-me". For something that is relatively repeatable, I feel as though I should be able to come up with something that can help us further isolate.

As for the 1 ms or 10 ms cpu-time, I believe that the field in the "$sho proc/cont" display shows days, hrs, min, sec, and "jiffies", which is a term from the old TOPS-10/20 days. At any rate, each "jiffy", while incrementing by 1, actually represents 10 milliseconds.
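To make the tick arithmetic concrete, here is a minimal Python sketch. It assumes, as stated above, that one displayed tick equals 10 ms; the function name and the output layout are made up for illustration and are not the actual format of the "$sho proc/cont" display.

```python
TICK_MS = 10                       # assumption from the post: one tick = 10 ms
TICKS_PER_SECOND = 1000 // TICK_MS # 100 ticks per second


def ticks_to_cpu_time(ticks: int) -> str:
    """Format a count of 10 ms ticks as 'D HH:MM:SS.cc' (illustrative layout)."""
    if ticks < 0:
        raise ValueError("tick count cannot be negative")
    days, rem = divmod(ticks, 24 * 60 * 60 * TICKS_PER_SECOND)
    hours, rem = divmod(rem, 60 * 60 * TICKS_PER_SECOND)
    minutes, rem = divmod(rem, 60 * TICKS_PER_SECOND)
    seconds, hundredths = divmod(rem, TICKS_PER_SECOND)
    return f"{days} {hours:02d}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


if __name__ == "__main__":
    # One tick is one hundredth of a second of CPU time.
    print(ticks_to_cpu_time(1))
    # 6000 ticks is one minute of CPU time.
    print(ticks_to_cpu_time(6000))
```

So a cpu-time field that advances by 1 between two refreshes means the process consumed roughly 10 ms of CPU, not 1 ms.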

Typically, a user-process will receive a priority boost anywhere from one to six, after some sort of RSE, such as coming out of some sort of Wait-State, like HIB, LEF, etc. Since the "$sho proc/cont/id=xxxx" is executing "$getjpi" to that process on our behalf, the priority boost of 1, seems ok.

Sorry for not mentioning the SDA commands for the process-stacks or call-frames.
SDA> show proc/in=xxxx
SDA> show stack/kern
SDA> show stack/use
SDA> show call
SDA> show call/next (repeat 10 to 15 times)
Hoping that we might be able to see the system-service calls for QIO...

Is it at all possible that another user-process on the system is also executing the program "ipcdaemon.exe"? Perhaps you can execute "SDA> show summary/imag/thread" for us, after the stacks and call-frames.

Lastly, sorry about the "Good Morning" confusion, but that is my standard greeting regardless of the time-of-day or time-zone.

Thanx,
whynot3k


Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

I ran vtdpy on the hsg80 and mon dis in parallel.

vtdpy reports no thruput whatsoever on dsa4 while mon dis reports an average of 8 IO/s.

Richard : good afternoon (almost good weekend for most of us but I have to stay for another 4 hours). I'll launch sda ...

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

No other IPC programs active. I already checked that.

Here the SDA output.

Wim
Willem Grooters
Honored Contributor

Re: Disk IO without doing IO's

"Once a minute", "freezing CPU", rings a few bells.
Could it involve some sort of "garbage collection"? I once ran Java on my (too small) box, and quite regularly the system seemed to freeze for i/o and my system disk made weird grinding noises for a short time, after which the system continued as if nothing was wrong. Well, I know this IPCDAEMON.EXE is not a Java program, but doesn't C++ (or C) have a similar construction to control the heap?
Are any timers active (Timer Queue Elements in queue), since it seems to occur every minute - by the clock, so to speak - without intermediate CPU usage (HIB state)? (I guess 2, or 14? are in queue: given the other quota that are 500..). Worth looking at, probably.

Just another wild guess; I'm not sure whether it's possible on VMS, but I've seen it elsewhere: a process allocating a number of disk blocks and using them as temporary space by direct access, connecting directly to the driver. I guess IO's are then counted, but you won't see where they go unless you know where to look in the executable.

Can you find out where these BG devices lead to? (TCPIP SHO DEV/FULL BGxxxx, where xxxx is any of the active devices). As is specified that TCPIP uses DMA as well, it _might_ be TCPIP-io is mistakenly presented as Disk IO. And since these devices are "busy" it might be the process is waiting for a signal. It could be helpful to know about these....

It might be a matter of commands, but I wonder about the open channels and RMS data.
On channel 40, there's IPCDAEMON.ERR;40, for which the RMS data is shown. But for the two (adjacent) versions of IPCDAEMON.OUT on channels 20 and 70, I see no RMS data. Nor for what's opened on channels 00 and 80 (directories?).
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

Willem,

The first BG is a listener.

The second is an active line to the same host and there is traffic when the application is doing something.

The third is an active line to the other node of the cluster and there is traffic when the application is used on that other node.

The fourth is not a tcp socket and has 0 operations completed.

There is no data for the .OUT because everything is written to .ERR (but not that much). They opened sys$output twice.

On the subject of functionality : don't know but it's not that high tech.

Wim

Galen Tackett
Valued Contributor

Re: Disk IO without doing IO's

I found this text on the user stack. Perhaps it will be helpful.


00000000.7ACDB5D0 00028B50.010E0009 ....Q...
00000000.7ACDB5D8 5453554C.001D8738 ....LUST
00000000.7ACDB5E0 50435422.22003BA1 i;"" TCP
00000000.7ACDB5E8 444F4E22.20225049 IP" "NOD
00000000.7ACDB5F0 22202254.454E4345 ECNET" "
00000000.7ACDB5F8 2022454D.414E4F4E NONAME"
00000000.7ACDB600 4C414B43.45484322 "CHECKAL
00000000.7ACDB608 20202000.22455649 IVE...."
00000000.7ACDB610 20202020.20202020
repeated "n" times until:
00000000.7ACDB9E8 00000000.000300B4 SYS$K_VERSION_01+000B4


Can anyone make sense of this? Perhaps it's just a chunk of stack used by a random routine for C-style automatic variables?
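One way to read such a dump is to decode each quadword by hand. The Python sketch below assumes the dump comes from a little-endian system and that each value is printed as high longword, a dot, then low longword, so the text in memory runs from the rightmost byte to the leftmost. The function name is made up for illustration; the hex values are copied from the dump above.

```python
def quadword_to_text(dump_field: str) -> str:
    """Decode one 'HHHHHHHH.LLLLLLLL' quadword into the text it holds in memory."""
    high, low = dump_field.split(".")
    value = int(high + low, 16)
    raw = value.to_bytes(8, "little")  # lowest memory address first
    # Show printable ASCII as-is and everything else as a dot.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Quadwords copied from the stack dump above (addresses 7ACDB5E0 to 7ACDB600).
fields = [
    "50435422.22003BA1",
    "444F4E22.20225049",
    "22202254.454E4345",
    "2022454D.414E4F4E",
    "4C414B43.45484322",
]

text = "".join(quadword_to_text(f) for f in fields)

if __name__ == "__main__":
    print(text)
```

Read in memory order, the quadwords join up into a run of quoted words ("TCPIP", "NODECNET", "NONAME", ...), which looks more like a single character string than unrelated stack values.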
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

Galen,

These are parts of the startup parameters of the program. In full, the command is
ipcdaemon "NOCLUSTER" "TCPIP" "NODECNET" "NONAME" "CHECKALIVE" "MONCHECKALIVE" "DAECHECKALIVE".

What worries me is that the IO's are not really done on the disk controller.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

Willem,

No driver is used.

Wim
Richard White_5
Advisor

Re: Disk IO without doing IO's

Good Morning Wim...

I was hoping that the "ipcdaemon.out" might shed a little light on this dilemma. But it sounds like the "ipcdaemon.err" is the only file that typically gets written to, and this sounds like only errors and/or events.

As for the Channel #00, Willem is correct about that pointing to a Directory. Typically the first Disk_Volume Channel is the "Default-Directory" for the process; such as "sys$login:" or whatever. The process will NOT have a Window-Ctl-Block address (00000000) for this default directory. You can test this on your own process by switching to a different disk-volume and directory from your "login" disk and checking your process-channels.

The channel at #80, with FID (781,57353,0) is very likely an actual File, as there is a WCB-Address (820AF6C0). This WCB will contain the address of the UCB to DSA10, as well as the address of the FCB for the file that is opened, and the retrieval-pointers for the LBN's on the disk for the file.

I have attached a small DCL-Command procedure that you can use to convert the File_ID_Number to a File_Name for the file on DSA10 on Channel #80. This short utility has been around for a few years, and I believe that, starting with one of the later versions of OVMS (v8.x ?), the function is built into DCL. You will need to rename it from *.txt to *.com, as there have been a couple of times in the past when older versions of kermit-32 and ftp have not treated the *.com as an ascii file, even after prompting for ascii.

I have seen in the past where a Server or Control Process will create or spawn "daughter-processes or threads" for a couple of minutes, to perform housekeeping tasks, and then these detached and/or sub-processes will terminate. If this were the case here, the thread or sub-process would share the KTB/FRED/JIB data-structures, and it may be that this master-process/job (ipcdaemon) is being charged for the thread or sub-process activity.

Do you have Accounting turned on? Perhaps there is Job-Termination info in the Accounting.Dat file that may coincide with when the Disk-Activity ends...

Don't know if the file-id on channel #80 will shed any more light, or if you will be able to get an opportunity to execute the SDA> show summ/imag/thread ; or if the Accounting_Info will add more insight...

Thanx,
whynot3k
Richard White_5
Advisor

Re: Disk IO without doing IO's

Good Morning Wim...

I have attached an example of a process (Acme_Server) that is multi-threaded, with unique PID's, but obviously the same image as the Master-Job-Process. Of course, I am cheating here, since I know when to execute the SDA> command, since I have the Job running when I want it to.

You have the daunting task of not knowing if this job even creates sub-processes or was linked as a multi-threaded image. Hope this helps.

Thanx,
whynot3k
Galen Tackett
Valued Contributor

Re: Disk IO without doing IO's

Just curious:

Does reading or writing file system metadata (directories, file headers, etc.) get charged to the responsible process as direct i/o?

Perhaps it will not have any bearing on this problem but maybe I and others might learn something.

Galen
comarow
Trusted Contributor

Re: Disk IO without doing IO's

When an IO is satisfied from cache, it is still counted as a direct io.

If you have xfc on, run
Show mem/cache/full
and see what percentage of your reads are satisfied without even hitting the disk.
Richard White_5
Advisor

Re: Disk IO without doing IO's

Good Morning Wim, Galen and "comarow,"

I believe "kudos" are in order for "comarow" and Galen, as they may have hit upon a couple of interesting points. I have supplied a short DCL-Command procedure to create, open, and delete sub-directories...

When I invoke this routine (takes only 3 or 4 minutes), on one terminal; monitor the top_dio process on another terminal; and vtdpy on the HSG80; the numbers do not appear to match up -- initially...

While the creating and deleting of Directories is charged to the process as Direct-I/O, as evidenced by the "$mon proc/topd" (100-200 DIO's per second), the vtdpy output was only in the 20 to 40 I/O's per second. And some of these HSG I/O's may have been from other jobs on the system.

But when I look at the XFC-Cache before and after this DCL-Procedure, I can see that over 90% of the I/O's are Cache-Hits, and do not make it to the Adapter/Controller. I believe that the FDT/DDT routines do not know or care about VIOC or XFC, so the process is still charged for the DIO. The QIO is simply satisfied very quickly in cache, and never makes it out to the controller or adapter or disk-volume.

While this VIOC/XFC may account for the discrepancy between what the performance monitoring tools in the cpu will see, compared to the vt-terminal on the HSGxx controller, we still do not know exactly how the IPCDAEMON process is executing the QIO's (whether in cache or on disk).
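The gap between the two rates can be turned into a rough estimate. The Python sketch below computes a lower bound on the cache-hit ratio from the DIO rate charged to the process and the I/O rate seen at the controller. The function name is made up, and the sample figures are simply midpoints of the ranges quoted above, not measured values.

```python
def minimum_cache_hit_ratio(process_dio_per_sec: float,
                            controller_io_per_sec: float) -> float:
    """Lower bound on the fraction of a process's DIOs satisfied from cache.

    Every DIO that misses the cache must show up at the controller, so the
    process can have at most `controller_io_per_sec` misses. Other jobs may
    also contribute controller I/O, which is why this is only a lower bound.
    """
    if process_dio_per_sec <= 0:
        raise ValueError("process DIO rate must be positive")
    hits = max(process_dio_per_sec - controller_io_per_sec, 0.0)
    return hits / process_dio_per_sec


if __name__ == "__main__":
    # Illustrative midpoints: 150 DIO/s charged to the process, 30 I/O/s at the HSG.
    ratio = minimum_cache_hit_ratio(150, 30)
    print(f"at least {ratio:.0%} of the DIOs never reached the controller")
```

Because some of the controller I/O may belong to other jobs, the true hit ratio can be higher than this bound, which is consistent with the cache statistics showing over 90%.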

Thanx,
whynot3k


Garry Fruth
Trusted Contributor

Re: Disk IO without doing IO's

Do you know what the documentation for VPA says about DISKIO? How do they compute it?
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

All,

I already noticed that IO's satisfied by XFC were counted as direct IO's.

PA manual : Disk IO is the number of io operations per second for the device. Tallied at the physical device driver layer.

But my IO's do not show up as direct IO; they show up only as disk IO, and they are not seen on the HSG.

Btw : I have VIOC, not XFC.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

Forgot to mention : nothing in accounting.

I think there is no multi-threading. The program was made somewhere around 1992.

Wim
Ian Miller.
Honored Contributor

Re: Disk IO without doing IO's

"does 'SHOW PROCESS/CONTINUOUS' send any SKASTs to the target process to collect some data? "

It has to for some things, i.e. to access items in the target process's P1 area. However, for the items shown on that display, IIRC, it does not need to.
____________________
Purely Personal Opinion
Hein van den Heuvel
Honored Contributor

Re: Disk IO without doing IO's


I have not followed this thread, so please excuse me if the following comment is out of place, out of context, or redundant, but something struck me as I scanned over it:

Wim: "Btw : I have VIOC, not XFC."

VIOC resolved IOs are NOT counted as real IOs, but XFC resolved ones would be. This was a change in thinking as VMS engineering rolled out XFC.

fwiw,
Hein.

Wim Van den Wyngaert
Honored Contributor

Re: Disk IO without doing IO's

Hein,

I compared XFC and VIOC data after I did a search on the same file in a loop for 5 minutes. The VPA data is identical, i.e. I see direct IO and no disk IO.

So, what is your real IO ?

Wim