Operating System - OpenVMS

Stephen Daddona
Frequent Advisor

Failed/slow DECnet copy

We copy our database backup files to our backup/development AlphaServer nightly, and it started failing Monday with the following errors:

%COPY-E-READERR, error reading ES40::DISK$DGA205:[STUDENT_DB_BU]STUDENT_DB_FULL_BACKUP.RBF;1109
-RMS-F-SYS, QIO system service request failed
-SYSTEM-F-LINKEXIT, network partner exited
%COPY-W-NOTCMPLT, ES40::DISK$DGA205:[STUDENT_DB_BU]STUDENT_DB_FULL_BACKUP.RBF;1109 not completely copied
%COPY-E-CLOSEIN, error closing ES40::DISK$DGA205:[STUDENT_DB_BU]STUDENT_DB_FULL_BACKUP.RBF;1109 as input
-RMS-F-WBE, error on write behind
-SYSTEM-F-LINKABORT, network partner aborted logical link

The copy command looks like this (run on the test Alpha):
$ copy/log es40::student_db_backup_dir:student_db_full_backup.rbf; student_db_backup_dir:*

I tried a copy of a smaller file (4685 blocks) and it took about 3 minutes to complete, where it should take just seconds.
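A quick back-of-the-envelope check of that last point, assuming the standard 512-byte VMS disk block and ignoring DECnet protocol overhead, shows how far off the observed rate is:

$$
4685~\text{blocks} \times 512~\tfrac{\text{bytes}}{\text{block}} = 2\,398\,720~\text{bytes}
$$

$$
\frac{2\,398\,720~\text{bytes}}{180~\text{s}} \approx 13.3~\text{kB/s} \approx 107~\text{kbit/s}
$$

$$
\frac{2\,398\,720 \times 8~\text{bits}}{100 \times 10^{6}~\text{bit/s}} \approx 0.19~\text{s}
$$

So the three-minute copy ran roughly 900 times slower than the raw 100 Mbit/s link rate would allow; even generous protocol overhead cannot account for that.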

Neither of the Alphas crash when the copy fails.

We've had the slow copy problem before, and the solution was to make sure the switch ports were set the same as the NICs on the Alphas (100 Mbps/full duplex), which they are now. But I don't remember ever seeing the copy fail before.

NETSERVER.LOG on the "remote" node doesn't show any errors.

Any thoughts on this?


Thanks in advance!
8 replies
John Gillings
Honored Contributor

Re: Failed/slow DECnet copy

Craig,

A duplex mismatch can cause all kinds of problems, including slow transfers and failures.

Use $ MCR LANCP SHOW DEVICE/INTERNAL to see the state of your adapter and any duplex mismatch warnings (depending on what version you're running).

I'd STRONGLY recommend you set all your adapters on all systems to AUTONEGOTIATE, and make sure all your switch ports are set the same. Any rumours you hear about autonegotiate not working are ancient history and long since fixed. If you want reliable network connections, set everything to auto.

A crucible of informative mistakes
Robert Brooks_1
Honored Contributor

Re: Failed/slow DECnet copy

John's advice is spot on, as always.

The chap from VMS Engineering who writes the Ethernet device drivers has advocated the above for quite some time. Assuming an adapter of DE500-BA or newer, you absolutely, without question, should be able to set the relevant console variable to autonegotiate. Even older adapters like the DE500-AA and -XA should work, but the -BA is known to be much more robust.

There may be some very old switches that are problematic, but I wouldn't assume that to be the case here.

Forget any device settings via LANCP, at least initially. While there are certainly valid reasons for setting permanent characteristics via LANCP, for the purpose of resolving physical-layer issues (which is likely the case here), start with something simple.

For completeness, what's the version of VMS and the type of ethernet adapter(s) involved?

Are both nodes on the same switch?

-- Rob
labadie_1
Honored Contributor

Re: Failed/slow DECnet copy

Apart from the pertinent info previously posted, have a look at the NCP counters

$ mc ncp show known nodes counters
(or, from one node, for just the other node:
$ mc ncp show node 'other' counters)
$ mc ncp show executor counters
and see whether you have any "Response timeouts", which roughly correspond to lost packets.
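To make those counters reflect a single transfer, you can zero them first and then run one large copy. A minimal sketch, assuming the remote node is ES40 (the name in the error messages above), that your account has sufficient privileges to zero NCP counters, and that BIG_TEST_FILE.DAT is a hypothetical large test file:

```dcl
$ ! Zero the counters for the remote node and for the local executor
$ mcr ncp zero node es40 counters
$ mcr ncp zero executor counters
$ ! Run one large copy so the counters cover only this transfer
$ ! (BIG_TEST_FILE.DAT is a hypothetical file name)
$ copy/log es40::student_db_backup_dir:big_test_file.dat; []
$ ! Look for non-zero "Response timeouts" in both displays
$ mcr ncp show node es40 counters
$ mcr ncp show executor counters
```

If the counters stay clean while the copy is still slow, the problem is more likely below DECnet (duplex, cabling, or the switch itself).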

It is (marginally) more efficient to push the file from Alpha1:
$ copy file alpha2"user pass"::

than to pull it from Alpha2:
$ copy alpha1"user pass"::file *.*

You can monitor the "quality" of your link with a basic procedure:

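$ count = 0
$ loop:
$ show time                   ! time before the copy
$ copy file remote::
$ show time                   ! time after the copy
$ delete remote::file;*       ! DELETE requires a version number
$ count = count + 1
$ if count .lt. 10 then goto loop   ! stop after 10 passes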

and compare how long each copy of the same file takes.

You can then pass that information on to your network colleagues.

By the way: VAX, Alpha, or Itanium? Which VMS version? Are all relevant patches applied?
labadie_1
Honored Contributor

Re: Failed/slow DECnet copy

Is any DECnet routing involved?

Are both nodes in the same DECnet area, e.g. 12.100 and 12.101?

A slow or saturated DECnet router may be involved.
Stephen Daddona
Frequent Advisor

Re: Failed/slow DECnet copy

The network guy and I set the Alphas and their switch ports to auto-negotiate, which didn't help. Just for fun (!) we set everything back to try other things, and that's when I noticed something. We narrowed it down to one of the ES40s having the "issues" (the production machine, naturally), and its "Communication Medium" setting is different: "CSMA/CD", versus "Ethernet" on the Alpha that's OK. Also, the "good" Alpha shows "Link Up Link state", while the "bad" Alpha doesn't show anything there.

Can that be a clue?

I also looked at "Response timeouts" in NCP after zeroing the counters and trying a large file copy, but it stayed at zero.

Also: VMS V8.2, TCP/IP V5.5 ECO 1, DECnet Phase IV

Thanks again!

Robert Gezelter
Honored Contributor

Re: Failed/slow DECnet copy

Craig,

Interpreting the resulting error counts from incorrect settings can be interesting. I recently had a case where the error counts did not show anything, and the only symptom was extremely slow transfers [the problem was a duplex mismatch].

Problems can also be caused by loading on the other system, or network contention on other hops in the network.

More data is always helpful.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Failed/slow DECnet copy

Could it be that the bad Alpha has an unsupported, different, or broken-but-not-completely-broken network card?

Wim
Stephen Daddona
Frequent Advisor

Re: Failed/slow DECnet copy

Thanks for all the replies.

It turns out that the problem was the switch (not the switch port). The switch was set up (if I remember this right) to be included in a "spanning tree". Removing it from the spanning tree fixed the problem.

Also, the network guy I talked to from HP support told me that any NIC slower than 1 Gb should not be set to auto-negotiate.