Operating System - OpenVMS

TCPIP performance problems

 
Willem Grooters
Honored Contributor

My dear colleagues,

Someone please give us a hint where to look...

Environment: VMS and TRU64. I'm not certain about versions, but please read on; I don't think this is relevant.
Systems have been connected over 100Mb LAN.

From the TRU64 machine, a string (800 to 8000 bytes in size) is sent over IP to the VMS machine, where a service reads it in chunks of arbitrary length, processes each part after everything has been read, and finally sends an acknowledgement back. This whole processing must be done within 30 seconds.
An idea of this message is in the attachment.

For years, this has worked without problems at a number of sites. The whole transaction can be done in a few seconds. This is independent of VMS version (all 7.x+).

A few weeks ago, two sites using this software had their LAN replaced by a WAN (5Mb/s). One of these ran into serious problems after that.
The process _may_ run fine, but suddenly the whole transaction may take minutes to finish, for no apparent reason. The problem disappears as suddenly as it occurs.
Because this is the only site having problems, and it did run flawlessly before, we suspect the network to be the cause.
However, the network was sniffed this afternoon, and on the IP level there seems to be no problem: less than a second elapsed between sending a stream of 1800 bytes and receipt of the IP acknowledgement. But processing took over 40 seconds.
So the trouble must lie in VMS's socket handling. But this hasn't changed either...
Indeed, I have found that reading by the VMS program could take tens of seconds (20 seconds to read just over 6100 bytes). But another time it took just 7...
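
For scale, here is a rough back-of-the-envelope check of the figures above. It is only a sketch: it ignores protocol headers, latency and retransmissions, and it assumes the WAN really delivers its nominal 5Mb/s. The function names are made up for illustration.

```python
def wire_time_seconds(payload_bytes: int, link_bits_per_second: float) -> float:
    """Time to clock the payload onto the link, ignoring headers and latency."""
    return payload_bytes * 8 / link_bits_per_second


def observed_rate(payload_bytes: int, elapsed_seconds: float) -> float:
    """Bytes per second actually delivered to the application."""
    return payload_bytes / elapsed_seconds


WAN_BPS = 5_000_000  # nominal speed of the 5Mb/s WAN (assumed to be achieved)

if __name__ == "__main__":
    for size in (800, 1800, 6100, 8000):
        ms = wire_time_seconds(size, WAN_BPS) * 1000
        print(f"{size:5d} bytes: {ms:.2f} ms on the wire")
    # The slow case from the post: just over 6100 bytes read in 20 seconds.
    print(f"observed: {observed_rate(6100, 20):.0f} bytes/s")
```

Even the largest message needs only milliseconds of raw line time at that speed, so the bandwidth of the WAN by itself does not account for reads that take tens of seconds.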

My most urgent questions:
HOW to observe the _live_ system to determine the cause of delay, preferably without rebuilding the software. If explicitly required, including some extra logging IS a possibility but I already found out it may make things even worse.
WHAT could cause this behaviour - and in what way can we monitor this?
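
If extra logging does turn out to be needed, one way to keep it from disturbing the timing is to collect timestamps in memory and write them out only after the transaction has finished. The sketch below shows the idea in Python purely for illustration; the real service is not shown in this thread, and the class and label names are hypothetical.

```python
import time


class TimingLog:
    """Collects (label, timestamp) pairs in memory; nothing is written while measuring."""

    def __init__(self):
        self._events = []

    def mark(self, label: str) -> None:
        # Appending to a list is cheap compared with writing a log record.
        self._events.append((label, time.monotonic()))

    def intervals(self):
        """Return (label, seconds since the previous mark) for every mark after the first."""
        return [
            (label, stamp - self._events[i][1])
            for i, (label, stamp) in enumerate(self._events[1:])
        ]

    def report(self) -> str:
        """Format the intervals; call this only once the transaction is over."""
        return "\n".join(f"{label}: {delta:.3f} s" for label, delta in self.intervals())


if __name__ == "__main__":
    log = TimingLog()
    log.mark("accepted")
    time.sleep(0.05)          # stands in for reading the message
    log.mark("message read")
    time.sleep(0.02)          # stands in for processing
    log.mark("ack sent")
    print(log.report())
```

The report then shows which phase (reading, processing, acknowledging) ate the time, without any file I/O happening inside the measured window.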

BTW: It's well possible we'll contact HP for support but we'd like to do some measurements ourselves.

(Why that limit? I don't know. Changing it is said not to be an option at the moment, since other sites don't have this problem.)
Willem Grooters
OpenVMS Developer & System Manager
15 REPLIES
Mobeen_1
Esteemed Contributor

Re: TCPIP performance problems

Willem,
Can we rule out any hardware errors on your VMS machines? It looks like the slow processing is intermittent... why don't we look at the following and rule them out:

1. Use DECevent and show errors

2. Use ANALYZE/SYSTEM and use some of the SHOW commands

3. Review your operator log and look out for any errors.

I bet you would have been observing your TCPIP packet movement using the MONITOR command.

I suspect that we may have some kind of intermittent issue on your Ethernet interface.

I know this is not a solution, but this is all I could think of. I am sure our other colleagues will chime in with their thoughts.

regards
Mobeen
Wim Van den Wyngaert
Honored Contributor

Re: TCPIP performance problems

After that, check the IP and TCP counters.
Do sysconfig -zp tcp to zero the counters,
run the transaction and do sysconfig -p tcp afterwards. Post the counters. Repeat this for ip too.

If possible, run tcptrace/prot=tcp/fu/pack=10000 for the connection and post that too.
Wim
Antoniov.
Honored Contributor

Re: TCPIP performance problems

It sounds like the NIC slows down to 10 Mb/s and then returns to 100 Mb/s.
You said the site has replaced its LAN by a WAN, and this may not be just a coincidence; in some way the host then adapts its speed. Yes, there isn't any logic in this, but I think the original cause is the WAN.
Check what updates that site made before changing any bit of software; this can help you.

@Antoniov
Antonio Maria Vigliotti
labadie_1
Honored Contributor

Re: TCPIP performance problems

A friend of mine had a brute-force solution for this type of problem: take a crash when you see the problem, then you have plenty of time to analyze. Of course, maybe it is not possible to do it on your site :-)

As you said the LAN has been replaced by a WAN, it would be interesting to do a traceroute on both sides (VMS -> Tru64, Tru64 -> VMS) to see the path, and to check that when you have the problem, you still use the "correct" path.

ana/sys
tcpip sh dev bg /various_qualifiers may help

I am afraid this type of problem is better solved with some expert on-site.

Regards

Gerard
Willem Grooters
Honored Contributor

Re: TCPIP performance problems

Thanks for your help so far.
I have been granted access to that VMS system to investigate.

VMS 7.1-2
TCPIP 5.0A ECO 1
member of a 2-node cluster.

ANA/SYS: problem is that the process must be spotted first, and data to be retrieved within the minute it is active. So far, I missed it. No other data found.
MONITOR: Running (for analysis)
OPERATOR.LOG: No errors found
DIAGNOSE: there is no license for running /ANALYZE, /TRANSLATE didn't show anything on the NIC.
TCPIP: see attachment and remarks at the end
Hardware/system parameters: I'll have to check that. It wouldn't surprise me if some system parameters were changed, but I don't see why this would introduce this intermittent problem.

I've seen a number of weird things:
* route for 0.0.0.0 is over two distinct gateways. Since the UNIX machine's network is specified in ROUTE that should not be a problem. But I did have trouble within a similar configuration using FTP.
* although service is not active, there still is a BG device.
* although the service has been activated several times, there is no entry in accounting...
* I tested (on a different port) with the same activity, and this finished within 10 seconds (and the activity showed up in accounting).

Next: I'll check the output of monitor. Hopefully the service will be activated....
Willem Grooters
OpenVMS Developer & System Manager
labadie_1
Honored Contributor

Re: TCPIP performance problems

You say

Problems started end of 2002....

Maybe the problem appears only when some load is reached?

You should anyway apply the latest ECO for TCPIP 5.0A if you plan to call HP :-)

Maybe not related to your problem, but

tcp_recvspace = 32768
tcp_sendspace = 32768

these may safely be raised.

Good hunt

Gerard
Wim Van den Wyngaert
Honored Contributor

Re: TCPIP performance problems

tcp_mssdflt = 536
is the default maximum segment size. Seems small. As I understand it, it is only used when Path MTU Discovery [RFC-1191] is not supported somewhere on the route to your destination.
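
To put a number on "seems small": a quick sketch of how many TCP segments one message would need at the 536-byte default compared with the 1460-byte packets mentioned elsewhere in this thread. It counts payload only and assumes every segment is filled to the MSS; the helper name is made up for illustration.

```python
def segments_needed(message_bytes: int, mss: int) -> int:
    """Number of full-or-partial segments needed to carry the message payload."""
    return -(-message_bytes // mss)  # integer ceiling division


if __name__ == "__main__":
    for size in (800, 8000):          # message size range given in the first post
        for mss in (536, 1460):
            print(f"{size} bytes at MSS {mss}: {segments_needed(size, mss)} segments")
```

More segments means more packets per message, but on its own that is a modest overhead, not a reason for a transfer to take tens of seconds.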

To see what they changed in the config, consult the file sys$specific:[tcpip$etc]sysconfigtab.dat.
Wim
Eric Dittman
Frequent Advisor

Re: TCPIP performance problems

When the LAN was replaced by the WAN the problems started, so one thing to check is that the connection speeds are all hard-coded rather than using autonegotiate. Autonegotiation can be problematic, and if there's a mismatch you can experience problems like you are seeing.
Mike Naime
Honored Contributor

Re: TCPIP performance problems

I'm not sure if this is relevant, but when we had a SLOW connection in establishing a telnet session to a VMS system, we ultimately found that it was the "reverse telnet lookup" that was causing our slowdown. This was evidenced by a route in the routing table that was the same as the IP address of the system involved.
(example: AH 192.168.1.112 192.168.1.112)
When we blew away that route, the delay disappeared. Since this is a dynamic route that is added by the system, it can re-appear on its own if the system has network problems.

Possibly changing from LAN to WAN made the extra routes appear on the routing table. Remove the extra routes and see if your problems go away.

Mike Naime
VMS SAN mechanic
Willem Grooters
Honored Contributor

Re: TCPIP performance problems

Someone pointed out a possible cause to me - but I'm not convinced:
The message is built up and sent to the (TCP) socket in one call (that happens on TRU64). Of course, 8K is not transferred in one window - but in packets of 1460 bytes.
TCP will guarantee that all packets will be delivered in the right order.
So, it was suggested that, on the VMS-side, the application would need to wait until all data is received before reading the socket.
First, I wouldn't know how to accomplish this since TCP/IP isn't asynchronous (unless I could force INETACP to launch the service only when all data is received)
Second, I don't think this is true. I wouldn't see the delay in the application - but in TCP it would still exist, so it would NOT speed up the whole transfer (and it would be much harder to measure (if possible at all)).
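
For reference, TCP hands the application a byte stream, so a receiver does not have to wait for the whole message before its first read; it can simply keep reading until it has everything. A minimal sketch of such a loop follows, with a timestamp per read so that a stall shows up. It assumes the receiver knows the total length in advance, which this thread does not confirm (the message layout is only in the attachment). A local socket pair stands in for the Tru64-to-VMS connection, and all names are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, total):
    """Read until `total` bytes have arrived; return (data, [(bytes_read, seconds_since_start)])."""
    chunks, timings = [], []
    remaining = total
    start = time.monotonic()
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))  # may return fewer bytes than asked for
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        timings.append((len(chunk), time.monotonic() - start))
        remaining -= len(chunk)
    return b"".join(chunks), timings


def demo(total=8000, piece=1460):
    """Send `total` bytes in `piece`-sized writes over a local socket pair and read them back."""
    sender_sock, receiver_sock = socket.socketpair()
    payload = (bytes(range(256)) * (total // 256 + 1))[:total]

    def sender():
        for offset in range(0, total, piece):
            sender_sock.sendall(payload[offset:offset + piece])

    thread = threading.Thread(target=sender)
    thread.start()
    data, timings = recv_exact(receiver_sock, total)
    thread.join()
    sender_sock.close()
    receiver_sock.close()
    return data == payload, timings


if __name__ == "__main__":
    ok, timings = demo()
    print("payload intact:", ok)
    for size, elapsed in timings:
        print(f"read {size} bytes at {elapsed * 1000:.2f} ms")
```

The sizes of the individual reads need not match the sizes of the individual writes; only the total and the order are guaranteed. A long gap between two consecutive timestamps would point at the place where the receiver was kept waiting.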

Anything in that direction?
Willem Grooters
OpenVMS Developer & System Manager
Mobeen_1
Esteemed Contributor

Re: TCPIP performance problems

Willem,
Can you please confirm whether your Ethernet is set to 100Base-T instead of auto-negotiate?

regards
Mobeen
Wim Van den Wyngaert
Honored Contributor

Re: TCPIP performance problems

Please post tcptrace/prot=tcp/fu/pack=10000 for the connection. And the counters.
This way we can see what is happening.
Wim
Willem Grooters
Honored Contributor

Re: TCPIP performance problems

Thanks to all.
As investigations progressed, it soon turned out that the main problem for the application doesn't lie within TCPIP after all: it's IO that caused the problem for the application.....
Nevertheless, I'm not comfortable at all. Sniffing the network revealed that the network is not causing a problem: on the TCP level, a message of 4000 bytes was acknowledged within a second, and a full roundtrip (PC to PC (into emulation program) and back) was done within seconds. We didn't do measurements on VMS and within the application, but from the logs so far it is calculated that it takes over 5 seconds to read about 2500 bytes into the application, which seems quite slow compared with the speed of TCP.
Alas, TCPTRACE didn't work properly yet (BUFFERFULL warnings, data not saved), but as it seems that data has been received (as said: ACK within a second) I guess it's an application matter....
Will try to get more.
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: TCPIP performance problems

Just as an exercise, could you post the (incomplete) output of
tcptrace/prot=tcp/fu/pack=10000 xxxx
where xxxx is the name of the remote machine.

Wim
Michael Stephan
New Member

Re: TCPIP performance problems

Hi Willem

If TCPTRACE warns BUFFERFULL, then add more buffers, e.g. /BUFFERS=500.

Michael