Operating System - OpenVMS

VMS 7.3-2 Copy slow on Alpha

 
John_TT
Advisor

VMS 7.3-2 Copy slow on Alpha

Hi, we seem to have developed a problem on our Alpha systems (38 DS10s and DS15s). A number of the systems are set up in master/slave pairs, with the master updating the slave by frequently copying files over the network. The application was written more than 10 years ago and has been running OK ever since, initially on VAXen. Recently, however, one of the application developers noticed that some file copies were taking much longer than usual, i.e. from 3 minutes to 12 minutes. We can reproduce the problem on any two of the systems. Sometimes the copy works as expected in a few seconds, but most of the time it is slow.

$ copy fileA.txt (50000 blocks) nodeB::*.*;
- Should take about 6 seconds (as seen in the past) but takes around 23-24 minutes to complete. The copy does eventually complete, so no error messages are seen. I have also tried nodeB"system password":: with the same result.

$ Copy nodeB::fileA.txt (same file 50000 blocks) *.*;
-This takes around 6 seconds...

I have looked for duplex problems, switch problems and NIC errors but can't find the cause. I can reproduce this problem on 2 standalone systems with 1 switch or with a crossed network cable. The NICs are set to auto-negotiate, and so are the Cisco Catalyst switches. I have also tried FastFD settings on all NICs (with 100Mb full duplex on the switches), with no change.

The Alphas are DS10/DS15's with VMS v7.3-2 V0900 patch update, although I have tried with Update V1600 on 2 systems. There have been no significant changes to the systems in about 2 years.

Any suggestions on where to look next, or how to trace the problem?
40 REPLIES
Volker Halle
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

John,

welcome to the OpenVMS ITRC forum.

Check with $ MC LANCP SHOW DEV/INT for LAN error messages (bottom of page). Also check the LAN counters (LANCP> SHOW DEV/COUNT) on both systems.

Which version of DECnet are you using (DECnet-IV or DECnet-OSI) ? Or DECnet-over-IP ?

If there is ANY packet loss when using DECnet NSP or OSI transport on a LAN, the retransmit timeout is normally what causes the perceived 'bad performance'.

You can 'see' the transmit timeouts by watching the NODE (or link) counters between the 2 systems involved:

DECnet-IV:

$ MC NCP SHOW NODE nodeB COUNTER (from NodeA and vice-versa)

Look for 'RESPONSE TIMEOUTS' increasing while your 'slow' copy is active.

DECnet-OSI:

$ MC NCL SHOW {NSP | OSI TRANSPORT } LOCAL NSAP xxx REMOTE NSAP yyy ALL COUNTER

and look for 'Duplicate PDUs Received' and 'Retransmitted PDUs' increasing.

The timeout for a lost packet is typically quite high (many seconds) and will cause the traffic on the link between the 2 processes to momentarily stop.
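To get a feel for how quickly such stalls add up, here is a rough back-of-envelope sketch in Python. It is only a model, not a description of how NSP actually schedules retransmissions: it assumes every lost segment costs one full timeout, and the segment size, timeout value and loss rates are all illustrative assumptions rather than measurements from these systems.

```python
# Back-of-envelope model: each lost segment stalls the link for one
# retransmit timeout. All constants below are illustrative assumptions.

def estimated_copy_seconds(file_bytes, segment_bytes, loss_rate,
                           timeout_s, clean_seconds):
    """Clean transfer time plus one timeout stall per (expected) lost segment."""
    segments = -(-file_bytes // segment_bytes)   # ceiling division
    expected_losses = segments * loss_rate
    return clean_seconds + expected_losses * timeout_s

FILE_BYTES = 50_000 * 512   # 50000 blocks of 512 bytes, as in the original post
SEGMENT_BYTES = 1_400       # assumed payload per segment (hypothetical)
TIMEOUT_S = 5.0             # assumed retransmit timeout ("many seconds")
CLEAN_SECONDS = 6.0         # copy time reported when everything works

for loss_rate in (0.0, 0.001, 0.01):
    t = estimated_copy_seconds(FILE_BYTES, SEGMENT_BYTES, loss_rate,
                               TIMEOUT_S, CLEAN_SECONDS)
    print(f"loss {loss_rate:.1%}: about {t / 60:.1f} minutes")
```

Under these assumptions, a loss rate of only 1% already turns a 6-second copy into one of roughly a quarter of an hour, which is why even "no visible errors" deserves a look at the retransmit counters.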

Try running MC DTSEND data/NODE=NodeB tests as well; this will rule out all possible disk-I/O-related delays. You need to set up the DTR (object or session control application) on the remote node before running this test.

Volker.
John_TT
Advisor

Re: VMS 7.3-2 Copy slow on Alpha

Hi Volker,
We are using DECnet-Plus.
NodeA: LANCP shows no errors on EWA0 (DE500-BA), just 4 unrecognised multicast packets. EWB0 (not configured for DECnet!) shows its last error at 18:07 today and 3105 carrier check failures?
NodeB: LANCP shows no errors on EWA0 (DE500-BA). EWB0 (not configured for DECnet!) shows its last error at 18:03 today and 300 carrier check failures?

Sorry, what are the xxx and yyy you refer to in the MC NCL check?

I have not used DTSEND before. What do I have to set up?

Our network guys say there are no lost packets on the net.

Thanks in advance
Volker Halle
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

John,

to be able to run DTSEND to a remote node, you have to set up the DTR session control application with a valid username. Use

$ MC NCL SET SESSION CONTROL APPLICATION DTR USER NAME = "MIRRO$SERVER"

This assumes that the username exists in the UAF. It should, because it's the default for the MIRROR session control application.

Then invoke:

$ MC DTSEND
_Test: data/node=

DTSEND will send as many DECnet packets as possible to the remote node for 30 seconds and then report:

%NET-S-NORMAL, normal successful completion

Test Parameters:
Test duration (sec) 30
Target node "I64VMS"
Line speed (baud) 1000000
Message size (bytes) 128

Summary statistics:
Total messages XMIT 141600 RECV 0
Total bytes XMIT 18124800
Messages per second 4720.00
Bytes per second 604160
Line thruput (baud) 4833280
%Line Utilization 483.328
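For anyone reading such a report for the first time, the summary lines are all derived from the message count, message size and test duration. The short Python sketch below reproduces the figures from the sample output above; the function name is made up for illustration. The utilization exceeds 100% because it is computed against the "Line speed (baud)" figure shown in the test parameters, which appears to be a configured value rather than a measurement of the actual link.

```python
# Reproduce the DTSEND summary statistics from the raw test figures.
# dtsend_summary is a hypothetical helper name, not part of DTSEND.

def dtsend_summary(messages_sent, message_bytes, duration_s, line_speed_baud):
    msgs_per_s = messages_sent / duration_s
    bytes_per_s = msgs_per_s * message_bytes
    thruput_baud = bytes_per_s * 8               # 8 bits per byte
    utilization_pct = 100.0 * thruput_baud / line_speed_baud
    return {
        "total_bytes": messages_sent * message_bytes,
        "messages_per_second": msgs_per_s,
        "bytes_per_second": bytes_per_s,
        "line_thruput_baud": thruput_baud,
        "line_utilization_pct": utilization_pct,
    }

# Figures taken from the sample report: 141600 messages of 128 bytes in 30 s,
# against a stated line speed of 1000000 baud.
for name, value in dtsend_summary(141_600, 128, 30, 1_000_000).items():
    print(f"{name}: {value}")
```

The values that matter for comparing runs are messages per second and bytes per second; the utilization percentage is only as meaningful as the line speed it is measured against.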

This is the easiest way to test DECnet performance between 2 nodes in the network. You can also increase the packet size and other parameters, see the documentation.

If you get inconsistent results for messages per second between multiple runs, like 4720 and then 50 the next time, it's time to look deeper...

In DECnet-IV you could easily see the node counters with MC NCP SHOW NODE COUNTERS. In DECnet-OSI, it is harder. DTSEND could be using NSP or OSI transport (or even DECnet-over-IP). To look at the 'node counters', you need to find out the local NSAP and remote NSAP addresses and the transport protocol being used (NSAP = Network Service Access Point).

The 'easy way out' would be to use $ MCR NET$MGMT (requires an X11 display), then click on Tasks -> Show Known Node Counters.

Finding the local and remote NSAP addresses goes like this:

$ MC NCL SHOW NSP LOCAL NSAP * ALL
$ MC NCL SHOW OSI TRANSPORT LOCAL NSAP *

Repeat these commands on the remote node, then check the node counters:

$ MC NCL SHOW NSP LOCAL NSAP REMOTE NSAP ALL COUNTERS

and the same for OSI TRANSPORT instead of NSP.
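Since what matters is whether a counter is increasing during the slow copy rather than its absolute value, it helps to take two snapshots and compare them. The Python sketch below does just that. The counter names are the ones mentioned in this thread; the numbers are invented for illustration, and the snapshots are assumed to be typed in by hand (no attempt is made to parse NCL output).

```python
# Compare two hand-entered counter snapshots and report what increased.

def counters_that_increased(before, after):
    """Return {counter name: increase} for every counter that grew."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }

# Hypothetical values, taken before and during a slow copy.
before = {
    "PDUs Sent": 120_000,
    "PDUs Received": 118_500,
    "Retransmitted PDUs": 4,
    "Duplicate PDUs Received": 0,
}
after = {
    "PDUs Sent": 138_300,
    "PDUs Received": 136_700,
    "Retransmitted PDUs": 61,
    "Duplicate PDUs Received": 0,
}

# Counters that should stay flat on a healthy LAN.
SUSPECT = ("Retransmitted PDUs", "Duplicate PDUs Received")

for name, delta in counters_that_increased(before, after).items():
    flag = "  <-- investigate" if name in SUSPECT else ""
    print(f"{name}: +{delta}{flag}")
```

Growth in the sent and received counters is expected while traffic flows; growth in the retransmit or duplicate counters during the slow copy is the thing to look for.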

Volker.
Robert Gezelter
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

John,

As has happened in another thread, please check the DUPLEX SETTINGS on ALL equipment. Severe slowdowns are frequently caused by a duplex mismatch somewhere in the network path (carrier check errors are one common indicator).

- Bob Gezelter, http://www.rlgsc.com
marsh_1
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

john,

have you tried hard-setting the NICs' speed? i know that auto-neg is supposed to have fewer issues these days but ...


marsh_1
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

personal choice - would start with 100 half duplex.

fwiw

John_TT
Advisor

Re: VMS 7.3-2 Copy slow on Alpha

Hi,

These 2 commands work fine on all systems:
$ MC NCL SHOW NSP LOCAL NSAP * ALL
$ MC NCL SHOW OSI TRANSPORT LOCAL NSAP *

This command works OK for NSP but fails with a "command failed no such object instance" message for OSI TRANSPORT:
$ MC NCL SHOW NSP LOCAL NSAP REMOTE NSAP ALL COUNTERS

The counters look OK, with no errors. DTSEND also came back with consistent results on both test nodes over several tries.

As I mentioned, I looked for duplex problems at first. I have tried setting the NICs, at the console level, to FastFD and the switch to 100Mb full duplex, with no change. The test systems are connected to an HP ProCurve switch, all set to auto-negotiate; the results are the same with a crossover cable and no switch.
Volker Halle
Honored Contributor

Re: VMS 7.3-2 Copy slow on Alpha

John,

so you apparently have not configured OSI TRANSPORT, that's fine.

Are you saying that the 'Duplicate PDUs Received' and 'Retransmitted PDUs' counters are ZERO for all remote NSAPs listed? Can you identify the REMOTE NSAP for NodeB (use NCL SHOW NSP LOCAL NSAP * on NodeB) and just watch those counters? The PDUs Received and PDUs Sent counters do increase while you run your DTSEND test between NodeA and NodeB, right?

If your 25 MB file took 6 seconds to be transferred in the past, this would be a good 33 Mbit/sec utilization of a 100 Mbit network. You cannot argue that the network is that much 'slower' now, especially if DTSEND also shows good throughput. You could use /SIZE=512 to obtain a good approximation of the possible network bandwidth for a file transfer. You could also add /TYPE=ECHO, so that massive amounts of data flow in BOTH directions.
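As a quick check of the throughput figure, the Python snippet below works it out from the numbers in the original post (50000 blocks, assuming the usual 512 bytes per block) and compares the fast case with the slow durations that were reported. With exact block arithmetic the fast case comes out at about 34 Mbit/sec, in line with the rounded 33 Mbit/sec quoted above.

```python
# Effective throughput of the copy for the durations mentioned in the thread.

BLOCK_BYTES = 512  # assumed disk block size

def mbit_per_second(blocks, seconds):
    """Effective payload throughput in Mbit/s (1 Mbit = 1,000,000 bits)."""
    return blocks * BLOCK_BYTES * 8 / seconds / 1_000_000

BLOCKS = 50_000  # file size given in the original post

for label, seconds in (("6 seconds", 6),
                       ("3 minutes", 3 * 60),
                       ("12 minutes", 12 * 60),
                       ("24 minutes", 24 * 60)):
    print(f"{label:>10}: {mbit_per_second(BLOCKS, seconds):7.3f} Mbit/s")
```

The slow cases work out to well under 2 Mbit/sec on a 100 Mbit link, which points to the link sitting idle most of the time rather than to a link that is merely running slowly.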

Use MONITOR DECNET on the local and remote node while testing with DTSEND, so you can get an estimate of the achievable packet rate. What's the rate during file copy ?

If all the DTSEND tests show acceptable performance and copy doesn't, you have to think about where the bottleneck may be. Can you test a $ copy local_file.txt nodeB::*.* on NodeB itself? In that case, the data would not travel over the physical network at all. What performance do you get then?

Volker.