Operating System - OpenVMS

Re: High Interrupt and buffer I/O

 
SOLVED
EWONG
Advisor

High Interrupt and buffer I/O

Hi,

We performed an application test and observed that interrupt time and buffered I/O were quite high. Could you tell us what the problem is and how we can resolve it? The configuration is as follows:

Servers: 4 x AlphaServer DS20E, each with dual 833 MHz CPUs, forming a cluster (2 at the primary site, named A and B, and 2 at the secondary site, named C and D)

Storage: Each server has an MSA30 storage enclosure directly attached; one shadowed volume is formed between servers A and C, and another between B and D

Attached are the statistics extracted from T4.

Many thanks,

Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

you're showing us 3 nice and steady graphs: CPU busy 180%, 7500 BUFIOs/sec and 45% interrupt stack time.

Performance analysis requires much more information and knowledge about what's being run at that time than can be expressed in 3 T4 graphs.

What kind of load did you apply to the system?

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Hi Volker,

In fact, we just ran the baseline test of the new release of the application on the system, and we observed that interrupt time is very high compared to the existing version. We expected the dual-CPU resources to be fully utilized, but wondered why the interrupt time is so high, consuming 45% of the 200% total system CPU. What is the interrupt time being spent on, and what other information can I provide for the analysis?
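The "45% out of 200%" figure can be normalized in two ways: against total CPU capacity, or against busy (non-idle) time. The sketch below assumes, as the 180% and 200% figures in this thread imply, that mode percentages are summed across both CPUs; the function names are hypothetical helpers, not part of T4 or MONITOR.

```python
def share_of_capacity(mode_pct: float, n_cpus: int) -> float:
    """Fraction of total CPU capacity used by one mode.

    Assumes mode_pct is summed across CPUs, so its maximum is 100 * n_cpus.
    """
    return mode_pct / (100.0 * n_cpus)


def share_of_busy(mode_pct: float, busy_pct: float) -> float:
    """Fraction of busy (non-idle) CPU time spent in one mode."""
    return mode_pct / busy_pct


# Figures quoted in this thread: 2 CPUs, 180% busy, 45% interrupt stack.
print(f"{share_of_capacity(45, 2):.1%} of total capacity")  # 22.5% of total capacity
print(f"{share_of_busy(45, 180):.1%} of busy time")         # 25.0% of busy time
```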

Many thanks
Volker Halle
Honored Contributor
Solution

Re: High Interrupt and buffer I/O

Ewong,

so you have run the same kind of test with the 'prior version' of the application and observed much less load? Did you capture T4 data from that test? Did you obtain the same 'application throughput'? Can you measure, express or compare application throughput?

What kind of load is generating the high buffered IO load? DECnet? TCPIP? Processing those buffered IOs certainly requires CPU resources and interrupt time.

Try DTSEND (DECnet performance test) between the 2 systems and observe what kind of load you can generate using that tool (packets/sec and interrupt load).

Is MSCP serving (MONI MSCP or MONI SCS) contributing to the load? Or distributed locking (MONI DLOCK)?

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Volker,

From memory, we didn't see such high interrupt and buffered I/O with the prior version under similar system load. Unfortunately, the figures are no longer available, as the development system has been upgraded to VMS V8.2 (the prior version ran on V7.3-2). I don't know what kind of load is triggering such high buffered I/O, but I suspect it may be related to RTR, as I observed high disk I/O on the RTR journal disk. I'm not sure whether there is any issue related to MSCP or distributed locking, so I extracted some graphs and attached them for your review.

Many thanks.
Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

to finally rule out distributed locking, look for any significant (> 1000) incoming or outgoing rates in the [MON.DLOC] data.
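The "> 1000" rule of thumb can be applied mechanically to exported samples. This is a minimal sketch: the sample values are hypothetical, and extracting them from T4's [MON.DLOC] data is not shown.

```python
# Rule of thumb from this thread: DLOCK rates above 1000/sec are significant.
THRESHOLD = 1000.0


def significant_samples(rates, threshold=THRESHOLD):
    """Return (sample_index, rate) pairs whose rate exceeds the threshold."""
    return [(i, r) for i, r in enumerate(rates) if r > threshold]


# Hypothetical incoming/outgoing DLOCK rates per second, for illustration only.
incoming = [120.0, 340.5, 95.0, 1250.0]
outgoing = [80.0, 60.0, 410.0, 300.0]
print(significant_samples(incoming))  # [(3, 1250.0)]
print(significant_samples(outgoing))  # []
```

An empty result for both directions would support ruling out distributed locking.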

To find out which load may be generating the BUFIOs, look at the [NET..] and [TCP...] T4 data and graph them in the same display together with the BUFIO rates.

You could also try to 'Correlate' against the [MON.SYST]Buffered I/O Rate.
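To illustrate what correlating two T4 series means, the sketch below ranks candidate series by their Pearson correlation coefficient against the buffered I/O rate. It is an assumption that T4's 'Correlate' feature uses exactly this measure; all series names and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / (sx * sy)


def rank_by_correlation(target, candidates):
    """Sort candidate series by |r| against the target, strongest first."""
    scored = {name: pearson(target, s) for name, s in candidates.items()}
    return sorted(scored.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical samples: buffered I/O rate and two candidate sources.
bufio = [7400, 7550, 7480, 7600, 7520]
candidates = {
    "tcp_packets": [5300, 5450, 5380, 5500, 5420],
    "fcp_calls": [40, 42, 39, 41, 40],
}
for name, r in rank_by_correlation(bufio, candidates):
    print(f"{name}: r = {r:+.2f}")
```

A series that tracks the BUFIO rate closely (|r| near 1) is a candidate source of the load; correlation alone does not prove causation.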

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Hi Volker,

Thanks for your advice. As I am not familiar with T4, I only generated a few of the graphs. Could you help figure out the cause of the high interrupt and buffered I/O?

Many thanks.
Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

no problems with distributed locking.

The main load seems to be in the network:

LLA 2 million bytes sent/sec (16 Mbit/sec)
LLA 1.5 million bytes received/sec (12 Mbit/sec)

TCP 3400 received packets/sec
TCP 2000 transmitted packets/sec

This would account for about 5400 of the 7500 BUFIOs/sec.
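The figures above can be reproduced with simple arithmetic. This sketch assumes, as the reply does, that each TCP packet sent or received corresponds to roughly one buffered I/O; that is an approximation, not an exact one-to-one mapping. The helper name is hypothetical.

```python
def bytes_to_mbit(bytes_per_sec: float) -> float:
    """Convert bytes/sec to megabits/sec (1 Mbit = 10**6 bits)."""
    return bytes_per_sec * 8 / 1_000_000


# LLA throughput figures quoted in this thread.
print(bytes_to_mbit(2_000_000))  # 16.0
print(bytes_to_mbit(1_500_000))  # 12.0

# Approximation: roughly one buffered I/O per TCP packet sent or received.
tcp_rx, tcp_tx, total_bufio = 3400, 2000, 7500
network_bufio = tcp_rx + tcp_tx
print(network_bufio, f"{network_bufio / total_bufio:.0%}")  # 5400 72%
print("unexplained:", total_bufio - network_bufio)          # unexplained: 2100
```

The remaining ~2100 BUFIOs/sec are what the next step (file system and disk activity) tries to explain.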

BUFIOs can also be created by the file system (XQP). Try to graph:

[MON.DISK]OpRate, [MON.FCP]FCP Calls and [MON.SYST]Buffered I/O Rate

To find out how network traffic contributes to interrupt stack time, run DTSEND between the nodes. This test sends packets as fast as possible between the 2 nodes, so you can find out how many packets/sec it can send and what the CPU load (interrupt stack) looks like.

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Volker,

Many thanks for your analysis.
I don't understand why the TCP total (i.e. 5400 packets/sec) can account for 5400 of the 7500 BUFIOs/sec; what is the relationship between them? I attached another graph showing the OpRate and FCP calls for your review. Could you tell me what DTSEND is and how I can run it?

Many thanks,
Hoff
Honored Contributor

Re: High Interrupt and buffer I/O

For the DECnet bandwidth test:

RUN SYS$SYSTEM:DTSEND
HELP

You'll see a couple of commands listed in the help, and will be able to connect and measure...
Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

network IOs are typically buffered IOs, so adding up the TCP packets provides an approximation of the BUFIO load generated by TCPIP.

The disk I/O rate seems to be quite steady at 1300 I/Os per second. No problem with FCP calls (XQP).

DTSEND is the DECnet test sender. You invoke it with:

$ MC DTSEND
_Test: DATA/NODE=other-node

By default, it will send 128-byte packets as fast as possible to the other node for 30 seconds and will report some statistics afterwards. You should run MONI MODE at the same time to measure the interrupt stack, kernel mode and user mode time. This will tell you how much interrupt stack time and CPU load a given rate of packets sent/sec generates.

DTSEND uses the DTR object (or session control application) on the remote node. By default this is not set up with a valid username, so you have to provide one (plus a password for DECnet Phase IV). I typically use the same one as for the MIRROR object.

DTSEND is documented in Chapter 4 of the DECnet for OpenVMS Network Management Utilities manual:

http://h71000.www7.hp.com/doc/73final/documentation/pdf/DECNET_OVMS_NET_UTIL.PDF

Another tool to use while running the test would be the PCS$SDA extension (PC sampling). This will tell you where (in which modules/routines) the CPUs are spending their time:

$ ANAL/SYS
SDA> PCS LOAD
SDA> PCS START TRACE
... (let it run for a couple of minutes)
SDA> PCS STOP TRACE
SDA> PCS SHOW TRACE/STATIS
SDA> PCS UNLOAD
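PC sampling records, at intervals, where each CPU's program counter is, and then counts how often each module or routine shows up. The sketch below shows only that counting step over already-resolved routine names; the names are hypothetical, and it does not reproduce the PCS$SDA output format.

```python
from collections import Counter


def profile(samples, top=3):
    """Return the `top` most frequent routines with their share of all samples."""
    counts = Counter(samples)
    total = len(samples)
    return [(name, n / total) for name, n in counts.most_common(top)]


# Hypothetical resolved PC samples (routine names are illustrative only).
samples = (
    ["net_driver_isr"] * 45
    + ["tcp_input"] * 30
    + ["app_compute"] * 20
    + ["idle_loop"] * 5
)
for name, share in profile(samples):
    print(f"{name:16s} {share:.0%}")
```

The routines with the largest share of samples are where to look first; with interrupt-heavy load they would typically be driver and protocol code.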

Volker.
Jur van der Burg
Respected Contributor

Re: High Interrupt and buffer I/O

Make sure that there's no shadow copy going on while you are measuring. The latest SCSI patch kits contain one of my fixes in DKDRIVER that may significantly lower interrupt stack time during a copy.

Jur.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Hi Jur, thanks for the reminder.

Volker,

I have run DTSEND from this machine to another cluster member; the results and the MONITOR MODES summary are attached. Is the DTSEND test traffic purely DECnet, or does it include TCP/IP traffic? DECnet over IP is currently enabled on all cluster members. Based on the summary, can we conclude that the interrupt load is normal and expected behavior?

Many thanks,
Wim Van den Wyngaert
Honored Contributor

Re: High Interrupt and buffer I/O

Execute the following procedure on each node to find out what IP is doing.

To find the heavy stuff (doing over 100K traffic per minute):
@xx 100

Wim
Hein van den Heuvel
Honored Contributor

Re: High Interrupt and buffer I/O

E wrote>> Based on the summary, can we conclude that the interrupt load is normal and expected behavior?

Yes, it could be, although Jur's comment about the improved DKDRIVER is worth pursuing. But then, you have not even identified which VMS version you are running.

Volker wrote much earlier>> so you have run the same kind of test with the 'prior version' of the application and observed much less load ? Did you capture T4 data from that test ?

E, this is a critical point. If you can, you really should pursue this. You may find that the old version had a very similar resource usage pattern, in which case there is nothing specific to worry about, although improvements may well be possible. Or you may find that the new application version does 50% more write I/O, causing 50% more shadowing activity and a proportional increase in interrupt stack time. Again, all would be well with the OS, but the changes in the application may need a review. If this is a database-ish application, and judging by the DIO count it is, then you may want to use database statistics (RMU for Rdb, STATSPACK for Oracle, or MONITOR for RMS) to see whether the increased system resource use matches a particular increase in database use. You then need to judge whether that increase is justifiable or erroneous.
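The "proportional increase" argument can be written as a simple linear model in which interrupt stack time scales with the I/O rate. This is a simplification for illustration (the per-I/O interrupt cost need not be constant), and the baseline numbers below are hypothetical, not measurements from this thread.

```python
def scaled_int_stack(old_int_pct: float, old_io_rate: float, new_io_rate: float) -> float:
    """Predict interrupt stack % assuming it scales linearly with the I/O rate."""
    if old_io_rate <= 0:
        raise ValueError("old_io_rate must be positive")
    return old_int_pct * (new_io_rate / old_io_rate)


# Hypothetical baseline: 20% interrupt stack at 1000 write I/Os per second.
# Under this model, 50% more write I/O predicts 30% interrupt stack.
print(scaled_int_stack(20.0, 1000.0, 1500.0))  # 30.0
```

If the measured interrupt stack time on the new version is well above what such a model predicts from the I/O increase, something other than the extra I/O is consuming it.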

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting




Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

your DTSEND tests achieved about 11200-11600 messages/second (these are BUFIOs/sec) with an INT stack load of about 60%-80% and about 50% kernel mode on the remote node.

Your application test achieved about 7500 BUFIOs/sec with 45% INT stack load, so I believe the high BUFIO rate explains the interrupt stack load.
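The comparison behind this conclusion is interrupt cost per unit of buffered I/O. The sketch below normalizes the thread's figures to interrupt stack percent per 1000 BUFIOs/sec. Since the thread does not say which DTSEND rate went with which load, it uses the two extreme pairings as bounds, and it assumes interrupt cost is roughly linear in the BUFIO rate. The helper name is hypothetical.

```python
def int_pct_per_kbufio(int_pct: float, bufio_per_sec: float) -> float:
    """Interrupt stack percentage per 1000 BUFIOs/sec."""
    return int_pct / (bufio_per_sec / 1000.0)


# DTSEND figures from this thread: 11200-11600 msgs/sec at 60%-80% int stack.
low = int_pct_per_kbufio(60, 11600)   # most favourable pairing
high = int_pct_per_kbufio(80, 11200)  # least favourable pairing
app = int_pct_per_kbufio(45, 7500)    # application test

print(f"DTSEND: {low:.2f} to {high:.2f} %/kBUFIO, application: {app:.2f}")
# DTSEND: 5.17 to 7.14 %/kBUFIO, application: 6.00
print("application within DTSEND range:", low <= app <= high)
# application within DTSEND range: True
```

An application figure inside the DTSEND range suggests the application's network traffic costs about the same interrupt time per BUFIO as a pure network test.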

Whether the DTSEND test is running via DECnet or via DECnet-over-IP is hard to tell; it depends on your local DECnet configuration. But it doesn't matter much, as both protocols would generate BUFIOs.

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Volker,

In that case, can I say that the network I/O accounts for the buffered I/O and triggers the interrupts? If so, are we encountering any resource bottleneck?

Many thanks,
Volker Halle
Honored Contributor

Re: High Interrupt and buffer I/O

Ewong,

the data presented so far indicates that the high INT stack load is from the BUFIOs, which represent network load (either TCPIP or DECnet).

Whether you hit any resource bottleneck depends upon what this application test is supposed to do and whether it achieves the expected or promised throughput.

DTSEND is also a kind of application test: it sends network packets 'as fast as possible' between the 2 systems and certainly hits 'some bottleneck'; otherwise it would be able to send more packets per second.

If the same 'application test' with the old version of your application had achieved the same throughput with half the INT stack usage, you could certainly start asking questions about the increase in load, but as you don't seem to have the old data available, there is not much you can do.

One of the golden rules of T4 is: start collecting data now, so that you have it available when you need it.

Volker.
EWONG
Advisor

Re: High Interrupt and buffer I/O

Volker and all,

Many thanks for your help and valuable information.

Best Regards,