Operating System - OpenVMS

brief loss of cluster connection: SET KNOWN LINES ALL

 
Jess Goodman
Esteemed Contributor

Environment: Alpha ES45s/ES40s VMS 7.3-2 (UPDATE-V2000) with DEGXAs at full-duplex gigabit. Also AlphaServer 4100s VMS 7.2-2 with gigabit DEGPAs.

All cluster members are on a single blade of a Cisco switch. The switch ports are set to No Auto-negotiate.

When I reboot any of my systems that use a gigabit controller, it joins the cluster without a problem. But when @STARTNET executes NCP SET KNOWN LINES ALL, the node loses its connection to all cluster members for about 10 seconds.

I know that the NCP command sets the controller's physical MAC address, but this problem does not occur with the nodes that still use 100Mb NICs (DE500s/DE602s).

Anyone know why, or how to stop it?
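
For anyone wondering why the MAC address changes at all: DECnet Phase IV derives the adapter's physical address from the node's DECnet address, so starting the line replaces the hardware address with one beginning AA-00-04-00. Here is a rough sketch of that mapping in Python. The function name and the node addresses are only illustrations (this thread does not give the real addresses), so treat it as a way to predict what the switch will see rather than anything NCP itself runs.

```python
def decnet_phase4_mac(area: int, node: int) -> str:
    """Return the DECnet Phase IV physical address for area.node.

    The 16-bit DECnet address is area * 1024 + node. It is appended,
    low byte first, to the fixed prefix AA-00-04-00.
    """
    if not 1 <= area <= 63:
        raise ValueError("area must be in 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("node must be in 1..1023")
    addr = area * 1024 + node
    low, high = addr & 0xFF, addr >> 8
    return "AA-00-04-00-%02X-%02X" % (low, high)


# Hypothetical node 1.1, not an address taken from this thread.
print(decnet_phase4_mac(1, 1))  # AA-00-04-00-01-04
```

Comparing the result with the address the switch has learned for the port, before and after STARTNET runs, would show whether the outage lines up with the address change.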
I have one, but it's personal.
John Gillings
Honored Contributor

Re: brief loss of cluster connection: SET KNOWN LINES ALL

Jess,

I'm not sure about your SET KNOWN LINES issue, but "No Auto-negotiate" is a bad idea in general, and especially bad with gigabit. My understanding is that you MUST have autonegotiate enabled for gigabit. Without it you don't get any flow control.

Unlike 10 or 100MBit, that means for gigabit you can get things failing even if you have identical speed and duplex settings between host and switch.

We had a cluster with hard set gigabit adapters. We found that when a node crashed (deliberate testing), the connections between all the other nodes bounced up and down for anything up to 20 seconds. 100% reproducible. Setting autonegotiate all round fixed the issue.

I reached the point in customer support where the first thing I checked on ANY performance-related report was autonegotiate on all adapters. It never ceased to amaze me what kinds of performance issues duplex mismatches could cause, including several that didn't seem to be network related at all.

I would therefore strongly recommend that the first thing you do is enable autonegotiate everywhere. All adapters, all speeds, all switches. Even if it doesn't resolve your immediate problem (but I'd lay fairly good odds that it will), it will prevent you from getting any of the other myriad possible problems.

My guess is the switching on the DECnet lines is doing something similar to our node crashes, and you're getting some kind of flow control issue bouncing the link.


There was a time, many years ago, that some Digital branded network adapters had trouble with some non-Digital branded switches in autonegotiating. This has long since been fixed, but for some reason the myth that "OpenVMS doesn't autonegotiate" has stuck.

I believe the reality was a typical Digital story - Digital engineers followed the *published* standard of the time to the letter. More than one other manufacturer did not, and Digital got blamed as "non-standard". Weight of numbers eventually caused the real non-standard implementation to be adopted as a new standard, resulting in the flawed autonegotiation mechanism we have today (how else could we end up with duplex mismatches?).

I suppose it keeps some support engineers employed. Even today, duplex mismatches are still a very common issue.
A crucible of informative mistakes
Hoff
Honored Contributor

Re: brief loss of cluster connection: SET KNOWN LINES ALL

Swap the Cisco out for testing purposes; replace it with a commodity gigabit switch. Cisco and some of the other managed switches can sometimes be too clever by half, and the networking gear is routinely derailed by MAC swaps.
Jess Goodman
Esteemed Contributor

Re: brief loss of cluster connection: SET KNOWN LINES ALL

John,

When we put in our very first gigabit adapter, we did have it and the switch port set to auto-negotiate. When it briefly lost connection during boot, I guessed that the MAC address change was making the switch port re-negotiate, so we decided to try no auto-negotiate instead. Obviously that did not help.

I was unaware that it was needed for flow control. Thanks for the tip.

Our network guy is talking to Cisco support about this. If they don't have a clue we will certainly try another switch (thanks Hoff), but I'm puzzled no other sites have seen this issue.
I have one, but it's personal.