Operating System - OpenVMS

Very bad Performance over native DECnet

 
Heinz W Genhart
Honored Contributor

Very bad Performance over native DECnet

Hi community

We found some very strange behaviour during DECnet copy operations.
I have attached an Excel sheet with the results of my DTSEND measurements.

We generally use DECnet over IP.
The two nodes ABC001 and ABC002 are cluster members, so if we copy something from one of those nodes to the other, native DECnet is used. In all other cases DECnet over IP is used.

If I use DTSEND from ABC001 to ABC002, the performance is very, very bad. If I do the same test using the SCSSYSTEMID instead of the node name, the performance improves, but it is still not good enough.
If I do the same test from ABC002 to ABC001, the performance is much better.

The nodes ABC001, ABC002 and DEF002 are in the same IP subnet (so all tests marked with N/A are not possible, because we do not have any DECnet routers).

Does anybody have an idea where the problem is?

We have already checked the tower information with MC DECNET_REGISTER, and we flushed the cache (mc ncl flush sess contr nam cache entr "*").

The measurement results are not really reproducible, because we have a lot of DECnet traffic during working hours. If I run the same test multiple times, the results differ for every measurement.
I will try to measure again tonight, and I hope that we will then have more and better reproducible figures.
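
One way to compare noisy runs is to repeat each test several times and look at the median and the spread rather than at single values. A minimal Python sketch follows; the throughput figures are invented placeholders, not values from the attached sheet, and the function name is hypothetical.

```python
import statistics


def summarize(samples):
    """Return (median, min, max, relative spread) for repeated throughput runs."""
    med = statistics.median(samples)
    lo, hi = min(samples), max(samples)
    # Spread relative to the median; a large value means the runs disagree.
    spread = (hi - lo) / med if med else float("inf")
    return med, lo, hi, spread


# Hypothetical throughput figures in kB/s for one direction, five runs.
runs = [410.0, 95.0, 388.0, 402.0, 120.0]
med, lo, hi, spread = summarize(runs)
print(f"median={med:.0f} kB/s  min={lo:.0f}  max={hi:.0f}  spread={spread:.0%}")
```

A large spread during working hours and a small one at night would suggest that competing traffic, rather than the configuration, explains the variation.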

Thanks in advance

Heinz
22 REPLIES
Robert Gezelter
Honored Contributor

Re: Very bad Performance over native DECnet

Heinz,

I have had many situations where the underlying problem was a duplex mismatch somewhere in the network.

What made it seem to appear randomly was the question of other traffic on the network.

Another possibility is collisions somewhere in the network.

As a start, I would review the error counters in the path that the DECnet traffic is taking (note that this may be different than the path that it is taking when it is routed as DECnet over IP).

- Bob Gezelter, http://www.rlgsc.com
Andy Bustamante
Honored Contributor

Re: Very bad Performance over native DECnet


You can check the counters and the speed/duplex settings in LANCP:

$ MC LANCP
> show dev /counter
> show dev /char

Recent versions of VMS have reportedly improved autonegotiation, however I still hard set both the server and the switch.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Volker Halle
Honored Contributor

Re: Very bad Performance over native DECnet

Heinz,

my first guess is: lost packets.

The DECnet re-transmission timing is very poor when compared with other network protocols. Once a DECnet packet is lost, it takes a looong time until that packet gets re-transmitted. For COPY or DTSEND, this looks like 'bad performance', but in reality, nothing is being transmitted during the timeout period.
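
To see why even a small loss rate can look like very bad throughput, here is a rough Python model of the effect described above. It assumes each lost packet stalls the sender for one full timeout and then succeeds on retry. The 3-second timeout, the 1400-byte packet size and the transfer size are illustrative assumptions only, not actual NSP or OSI transport timer values.

```python
def effective_throughput(total_bytes, packet_bytes, link_bytes_per_s,
                         loss_rate, retransmit_timeout_s):
    """Estimate throughput when every lost packet stalls the sender for one timeout.

    Simplified model: packets are sent back to back, and each loss adds one
    idle timeout before the packet is resent (assumed to succeed on retry).
    """
    packets = -(-total_bytes // packet_bytes)  # ceiling division
    send_time = total_bytes / link_bytes_per_s
    stall_time = packets * loss_rate * retransmit_timeout_s
    return total_bytes / (send_time + stall_time)


# Assumed values for illustration only: 10 MB transfer, 1400-byte packets,
# 12.5 MB/s link (100 Mbit/s), 3-second timeout.
for loss in (0.0, 0.001, 0.01):
    rate = effective_throughput(10_000_000, 1400, 12_500_000, loss, 3.0)
    print(f"loss={loss:.1%}  ~{rate / 1000:.0f} kB/s")
```

In this model the stall time dominates as soon as any packets are lost, which matches the observation that nothing is transmitted during the timeout period.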

With DECnet Phase IV, you could easily do MC NCP SHOW NODE destination-node COUNTERS from the source node and you would look for 'Response Timeouts'.

With DECnet-Plus, it gets a little tricky. The easiest way is to use MCR NET$MGMT (needs DECwindows display) and look at Tasks -> Show Known Node Counters and look for Retransmitted PDUs to the destination node of your DTSEND test.

You can also drill down on the (NSP or OSI) transport -> local NSAP -> remote NSAP -> NSAP address of dest - then Actions -> Zoom will show you the counters. Look for 'Retransmitted PDUs' and/or 'Duplicate PDUs received'.

Another way to verify whether you are losing packets would be to run MONI DECNET or MONI PROC/TOPBIO while DTSEND is running. If you do not see a constant rate of IOs, the chances are high that you're losing packets in the network path and have to wait for re-transmissions.

Once you confirm that this is the reason for the perceived 'bad performance', then comes the interesting part of trying to find out where the packets get lost.

Volker.
Heinz W Genhart
Honored Contributor

Re: Very bad Performance over native DECnet

Hi Robert and Andy

The first thing we did was to check the counters in LANCP. There are no problems such as collisions, frame check errors or anything like that. All the counters are 0, except the counters for packets/bytes received/sent.

We are using Cisco switches, and the ports our machines are connected to are set to 100M full duplex. The interfaces are also set to 100M full duplex, done in console mode.

Each of the two problem machines has 2 dual NICs. We configured FailSafe IP, and all 4 lines are configured for DECnet.

I think this is not a hardware issue because, as you can see in the Excel sheet, some connections from remote machines using DECnet over IP are as fast as expected. So we (the Swiss OpenVMS Ambassador and I) think that this is a problem with name resolution, lost packets, towers or something like that. (But what?)

I think we will follow the instructions of Volker.

I compared the NCL scripts in sys$specific of the two machines. They are identical, except for the addresses.

Our good luck is that these are two test machines (GS1280). But in our case, test machine means that there is a test team (40 people) and a development crew (80 people). For us in system management, those machines are like production machines, because we have to announce changes many days before we make them. Even at night, we can't do something like a reboot there without announcing it in advance.

This afternoon we started to look at the CDI caches and tried to use sys$update_decnet_migrate (show path to local:.nodename), but we don't have any useful results yet.
I think we will start to follow Volker's instructions, but I can't do it before tomorrow.

... but still any input is very welcome.

Heinz


Volker Halle
Honored Contributor

Re: Very bad Performance over native DECnet

Heinz,

what you can do now, is:

NCL> SHOW NSP LOCAL NSAP *

note local NSAP address

NCL> SHOW NSP LOCAL NSAP local_nsap REMOTE NSAP * retransmitted pdus, duplicate pdus received

Repeat the same for OSI TRANSPORT ...

If all those counters are 0, you can forget about my theory. If not, we'll see...

Volker.
Cass Witkowski
Trusted Contributor

Re: Very bad Performance over native DECnet

To test for packet loss, try using PING with a large packet size, say 10,000 bytes, and a count of 100 packets.

With a duplex mismatch you can see anywhere from 7% to 20% packet loss.
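
As a small illustration of how to read such a PING result, the Python sketch below turns sent and received counts into a loss percentage and checks it against the 7% to 20% range mentioned above. The counts are made up, the function names are hypothetical, and the range is only a rule of thumb, not a diagnostic.

```python
def packet_loss_percent(sent, received):
    """Percentage of echo requests that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def looks_like_duplex_mismatch(loss_percent, low=7.0, high=20.0):
    """True if the loss falls in the 7%-20% range quoted above (rule of thumb only)."""
    return low <= loss_percent <= high


# Hypothetical result: 100 packets sent, 88 replies received.
loss = packet_loss_percent(100, 88)
print(f"{loss:.0f}% loss, duplex mismatch suspected: {looks_like_duplex_mismatch(loss)}")
```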

If you have both NSP and OSI transports enabled on a node but only DECnet over IP working between the nodes, then you will get a 30-second delay at the beginning as DECnet tries NSP first, times out, and then tries DECnet over IP. We have a 6-node cluster with 3 nodes on one subnet and 3 on the other. Within a subnet NSP works, but between the subnets only DECnet over IP works. We had to remove the nodes on the other subnet from the DECnet_Register so that it would only try DECnet over IP.
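
A fixed connection delay like this hurts small transfers far more than large ones. The Python sketch below shows the arithmetic; the file sizes and the 10 MB/s link rate are assumptions chosen for illustration, and only the 30-second figure comes from the description above.

```python
def apparent_rate_kb_per_s(file_bytes, link_bytes_per_s, connect_delay_s=30.0):
    """Rate seen by the user when connection setup first waits out a fallback delay."""
    total_s = connect_delay_s + file_bytes / link_bytes_per_s
    return file_bytes / total_s / 1000


# Assumed link rate of 10 MB/s; file sizes are arbitrary examples.
for size in (1_000_000, 100_000_000, 1_000_000_000):
    rate = apparent_rate_kb_per_s(size, 10_000_000)
    print(f"{size / 1_000_000:>6.0f} MB file: ~{rate:.0f} kB/s apparent")
```

If a test sends only a small amount of data per connection, a delay of this kind could account for most of the measured time.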

You may need to check the DECnet_Register to make sure the address data is correct.
Robert Walker_8
Valued Contributor

Re: Very bad Performance over native DECnet

Hi,

We had similar problems, but with Tru64. Setting 100M full duplex at the console doesn't work for Tru64, and may still be an issue with OpenVMS.

Try FTPing a very large file from each machine to NLA0:[000000]. If the network is set up at 100M full duplex across the switches and hosts, then TCP/IP will transfer at over 9MB/sec. However, if you find that only one system achieves that and the other gets much, much less, then you know that one system is probably running in half duplex.
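
For reference, the raw line rate of a 100 Mbit/s link is 12.5 MB/s, so 9 MB/s is roughly what is left after protocol overhead. The Python sketch below compares measured FTP rates from two hosts against that figure. The host names, timings and the 50% cut-off are assumptions for illustration, not documented thresholds.

```python
LINK_MBIT_PER_S = 100  # link speed from the thread


def raw_ceiling_mb_per_s(link_mbit_per_s):
    """Raw line rate in MB/s (1 MB = 10**6 bytes), before protocol overhead."""
    return link_mbit_per_s / 8


def transfer_rate_mb_per_s(file_bytes, seconds):
    """Measured rate of one transfer in MB/s."""
    return file_bytes / seconds / 1_000_000


def suspect_half_duplex(rate_mb_per_s, expected_mb_per_s=9.0, fraction=0.5):
    """Flag a host whose rate is far below the ~9 MB/s quoted above.

    The 50% cut-off is an arbitrary assumption, not a documented threshold.
    """
    return rate_mb_per_s < expected_mb_per_s * fraction


print(f"raw ceiling: {raw_ceiling_mb_per_s(LINK_MBIT_PER_S)} MB/s")

# Hypothetical timings for the same 500 MB file pushed from two hosts.
for host, secs in (("HOSTA", 52.0), ("HOSTB", 310.0)):
    rate = transfer_rate_mb_per_s(500_000_000, secs)
    print(host, f"{rate:.1f} MB/s", "suspect" if suspect_half_duplex(rate) else "ok")
```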

In T64 you have to force it at the OS level as the console setting of the duplex is ignored.

Robert.
Volker Halle
Honored Contributor

Re: Very bad Performance over native DECnet

Heinz,


"If I do same test by using the scssystemid instead of the nodename the performance increases, but is still not good enough."


What do you mean by this? What is the difference between using SCSSYSTEMID or nodename for the DTSEND test? Is it selecting NSP via OSI transport? Do the node name and the SCSSYSTEMID differ?

What do the numbers mean in your spreadsheet?

Your DTSEND is sending large packets in one direction and small ones in the other. This may make a difference. You can specify /TYPE=ECHO to have DTR send back the whole packet.

Volker.
Tim Hughes_3
Advisor

Re: Very bad Performance over native DECnet

There are a few differences between DECnet over IP and straight DECnet V that can cause this:

- By default DECnet uses larger packets, which can lead to packet loss on a flaky network.
- DECnet uses an OSI IS-IS router. This can be an old router (DECxyz) buried in the network somewhere that may only have a 10Mb link.
- The DNS name lookup is different. This can cause slow link establishment as entries in the DNS lookup list time out.

I found the easiest way out is to force DECnet over IP by fiddling with the address towers with DECNETREG or just removing the node from DECNETREG. I think there are better ways using NCL if you have got the time. Don't forget to do an NCL FLUSH CACHE... Also, the back translation for proxy access may change.

Tim