
NIC Teaming problem

 
Dave Behler
Frequent Advisor

NIC Teaming problem

We regularly receive the following warning messages when our DL380G4 servers start up. Several seconds after the first message, we receive another message stating that a previously failed Network Link's receive status has been restored. Occasionally we also receive a 5719 Netlogon error.

I swapped out network cables, tried different ports, and so on, all with the same result. This is happening on all of our DL380G4s where teaming is installed and configured. Teaming is configured for NFT only. These two warnings occur after every reboot. When I open the teaming utility, both NICs are connected, but NIC #1 (the primary NIC) is in standby.

Our configuration is DL380G4s running Windows Server 2003 (no SP1) with PSP 7.40a (we have also tried versions 8.15 and 8.20 of the NIC Teaming Utility), connected to two Cisco 6509 switches. The switch ports are configured as Auto/Auto (we also tried forcing 100/Full, with no change), and we made sure all ports are in the same VLAN across both switches.

The problems I am experiencing sound very similar to the following advisory; however, upgrading to HP Network Configuration Utility version 8.15 or 8.20 does not appear to help.
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c00573469

Type: Warning
Source: CPQTeamMP
Event ID: 434
Event Time: 2/6/2006 3:32:22 PM
User: n/a
Computer: abc123
Description:
HP Network Team #1: PROBLEM: A non-Primary Network Link is not receiving. Receive-path validation has been enabled for this Team by selecting the Enable receive-path validation Heartbeat Setting.
ACTION: Please check your cabling to the link partner. Check the switch port status, including verifying that the switch port is not configured as a Switch-assist Channel. Generate Broadcast traffic on the network to test whether these are being received. Also make sure all teamed NICs are on the same broadcast domain. Run diagnostics to test card. Drop the NIC from the team, determine whether it is receiving broadcast traffic in that configuration.

Type: Warning
Source: CPQTeamMP
Event ID: 386
Event Time: 2/6/2006 3:32:23 PM
User: n/a
Computer: abc123
Description:
HP Network Team #1: PROBLEM: A Failover occurred: The Primary Network Link is not receiving. Receive-path validation has been enabled for this Team by selecting the Enable receive-path validation Heartbeat Setting.
ACTION: Please check your cabling to the link partner. Check the switch port status, including verifying that the switch port is not configured as a Switch-assist Channel. Generate Broadcast traffic on the network to test whether these are being received. Also make sure all teamed NICs are on the same broadcast domain. Run diagnostics to test card. Drop the NIC from the team, determine whether it is receiving broadcast traffic in an unteamed configuration.
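
When comparing reboots before and after a change, it can help to tally which event IDs actually appear. The following is a minimal sketch, assuming the events have been copied out as plain text in the same "Key: Value" layout as the two warnings above. The function names and the abbreviated sample text are illustrative only and are not part of any HP or Windows tool.

```python
from collections import Counter

# Header fields that appear in the copied event text above.
FIELDS = {"Source", "Event ID", "Event Time", "User", "Computer"}


def parse_events(text):
    """Split copied event text into dicts; each 'Type:' line starts a new event."""
    events, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key == "Type":
            current = {"Type": value.strip()}
            events.append(current)
        elif sep and current is not None and key in FIELDS:
            current[key] = value.strip()
        # Description text (PROBLEM/ACTION lines) is ignored in this sketch.
    return events


def tally_events(events):
    """Count events per (Source, Event ID) pair."""
    return Counter((e.get("Source"), e.get("Event ID")) for e in events)


# Abbreviated sample in the same layout as the warnings in this post.
SAMPLE = """\
Type: Warning
Source: CPQTeamMP
Event ID: 434
Event Time: 2/6/2006 3:32:22 PM
Description:
HP Network Team #1: PROBLEM: A non-Primary Network Link is not receiving.

Type: Warning
Source: CPQTeamMP
Event ID: 386
Event Time: 2/6/2006 3:32:23 PM
Description:
HP Network Team #1: PROBLEM: A Failover occurred.
"""

for (source, event_id), count in tally_events(parse_events(SAMPLE)).items():
    print(source, event_id, count)
```

Running the same tally on the text captured after each reboot gives a quick way to see whether a switch or teaming change removed an event or only made it less frequent.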
12 REPLIES
Connery
Trusted Contributor

Re: NIC Teaming problem

Hi David,
Here are a couple of things to try:

1. Try turning on PortFast on the switch ports that these NICs are connected to. The NICs exchange heartbeats, and without PortFast the switch ports may be blocked by STP for 30 to 50 seconds after link-up, which prevents the heartbeats from getting through.

2. Try turning off heartbeats during a boot to see if the messages stop. If so, then the focus should be on determining what's preventing the heartbeats from succeeding during boot.

3. Can you attach the switch config for the ports connected to these NICs? I'll look over it to see if I see anything out of the norm.
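
The 30 to 50 second figure in item 1 comes from the spanning tree timers. As a rough sketch, assuming the standard 802.1D defaults (forward delay 15 s, max age 20 s; the switches in question may be configured differently), the arithmetic looks like this:

```python
def stp_forwarding_delay(forward_delay_s=15, max_age_s=20):
    """Return (typical, worst_case) seconds before a port forwards traffic.

    Defaults are the standard 802.1D timer values, used here as an assumption.
    """
    # A port without PortFast spends one forward-delay interval in the
    # listening state and another in the learning state.
    listening_and_learning = 2 * forward_delay_s
    # If stale topology information must age out first, max age is added.
    worst_case = max_age_s + listening_and_learning
    return listening_and_learning, worst_case


typical, worst = stp_forwarding_delay()
print(f"typical: {typical} s, worst case: {worst} s")
```

PortFast skips the listening and learning states on an edge port, which is why it is the first thing to check when heartbeats fail only during boot.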

Best regards,
-sean
Dave Behler
Frequent Advisor

Re: NIC Teaming problem

1. I forgot to mention in my initial post that PortFast was already enabled.

2. After upgrading to 8.15, the messages continued to appear on reboot. I turned off the heartbeat settings (both transmit and receive path validation), rebooted, and the messages stopped. Great to see that they stopped, but where do we go next?

3. Switch configs attached as provided by my network team. If this is not what you are looking for please let me know and I'll see what I can do.

Thanks,
Dave
Connery
Trusted Contributor

Re: NIC Teaming problem

Hi David,
I looked over the configs and everything looks fine.

Here are a couple more troubleshooting steps:
1. Have the networking group set the ports to the equivalent of the command "set port host". This command on CatOS disables trunking and channeling autonegotiation and turns on PortFast. You have already turned on PortFast, but I have seen trunking and/or channeling autonegotiation (DTP and PAgP, respectively) cause port startup delays that produce the symptom you are seeing.

2. Try increasing the heartbeat timer in the Teaming GUI to a higher value - say, double. See if this changes the behavior.

3. You can also try plugging both ports into the same Cisco switch to see if that changes the behavior. I know you don't want to run like that (because you want switch redundancy) but it would provide troubleshooting information.
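
Item 1 amounts to checking three per-port settings against the state that "set port host" leaves them in (PortFast on, trunking off, channeling off). A minimal sketch of that check follows; the dictionary keys and values are hypothetical labels chosen for illustration, not CatOS syntax or output.

```python
# Target state described in item 1: PortFast on, trunk and channel
# autonegotiation off. Keys and values are illustrative labels only.
HOST_PORT_PROFILE = {"portfast": "enable", "trunk": "off", "channel": "off"}


def settings_to_change(port):
    """Return {setting: (current, wanted)} for settings that differ from the profile."""
    return {
        name: (port.get(name), wanted)
        for name, wanted in HOST_PORT_PROFILE.items()
        if port.get(name) != wanted
    }


# Example: PortFast already enabled, but trunking and channeling left at auto.
example_port = {"portfast": "enable", "trunk": "auto", "channel": "auto"}
for name, (current, wanted) in settings_to_change(example_port).items():
    print(f"{name}: {current} -> {wanted}")
```

A port that has PortFast enabled can still pass this check only partially, which is the situation item 1 is meant to catch.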

Regards,
-sean
Matthijs Wijers_1
Trusted Contributor

Re: NIC Teaming problem

One simple thing to try:

Edit the TCP/IP properties of your network card(s), go to Advanced, choose the WINS tab,
and untick "Enable LMHOSTS lookup".

Regards,
Matthijs
Connery
Trusted Contributor

Re: NIC Teaming problem

Use (or non-use) of LMHOSTS lookup should not have an effect on NIC Teaming heartbeat error messages.

I'd be very interested in an explanation if someone disagrees.
Matthijs Wijers_1
Trusted Contributor

Re: NIC Teaming problem

I've seen netlogon (5719) errors during boot being solved by disabling the LMHOSTS lookup.

In this case the netlogon error makes sense if there's a hardware error causing failovers and lost links.
Is spanning tree disabled?

Regards,
Matthijs
Connery
Trusted Contributor

Re: NIC Teaming problem

Disabling LMHOSTS lookup may address his occasional 5719, but it won't have an effect on his more persistent problem of Teaming heartbeat event log entries.

Spanning tree was already addressed in my first reply (PortFast). He replied that PortFast has already been implemented, and his attached config confirms it.

-sean
Matthijs Wijers_1
Trusted Contributor

Re: NIC Teaming problem

Team members can be split across more than one switch in order to achieve switch redundancy. However,
all switch ports that are attached to members of the same team must comprise a single broadcast domain
(in other words, same VLAN). Additionally, if problems exist after deploying a team across more than one
switch, all team members should be reattached to the same switch. If the problems disappear, then the
cause of the problem resides in the configuration of the switches and not in the configuration of the team. If
switch redundancy is required (in other words, team members are attached to two different switches), then
HP recommends that the switches be deployed with redundant links between them and Spanning Tree be
enabled (or other Layer 2 redundancy mechanisms) on the ports that connect the switches. This helps
prevent switch uplink failure scenarios that leave team members in separate broadcast domains.

More information:
"HP ProLiant Network Adapter Teaming White Paper"
ftp://ftp.compaq.com/pub/products/servers/networking/TeamingWP.pdf

Regards,
Matthijs
Connery
Trusted Contributor

Re: NIC Teaming problem

I wrote that paper. Glad to see it's being used! :-)

David has already verified that all ports are in the same VLAN. I also verified it by looking at his configs.

I also already took my own advice from that paper and recommended that David connect both Team members to the same switch to see if it solves the problem. I'm waiting on him to respond with the result.

Thanks for your ideas on the possible causes!

-sean
Matthijs Wijers_1
Trusted Contributor

Re: NIC Teaming problem

Nice read! ;-)

Regards,
Matthijs
Dave Behler
Frequent Advisor

Re: NIC Teaming problem

Hi Sean,

Here are the results of the last troubleshooting steps that you asked me to try:

1. I had my network team double-check the channeling and trunking autonegotiation settings, and they were set to auto. To make a long story short, it turns out that they are set inconsistently across our server switches. The network team changed these two settings to off for the ports that I am testing against, and now the majority of the errors are gone. I still occasionally get a 5719 Netlogon error, but all of the subsequent errors for services dependent on Netlogon are gone. I would say that the networking "startup" process seems to be occurring faster, but it is still not 100% perfect.

2. I increased the heartbeat timer to 9 and still had the issue; when I changed it to 15, the errors were gone. Note that all of this was prior to the channeling and trunking autonegotiation changes on the switch.

3. I had tried plugging both ports into the same Cisco switch last week as part of my troubleshooting, prior to starting this thread, and received similar errors on boot.

Based on the results of step #1 (the channeling and trunking autonegotiation changes), it definitely looks like we're onto something here; however, there still appears to be a conflict with teaming. I say this because when I drop teaming, all errors go away. I'm not sure if we just need to "adjust" a parameter, but I'm struggling with this, as now the only error is the following:

Type: Error
Source: NETLOGON
Event ID: 5719
Event Time: 2/8/2006 9:01:18 AM
User: n/a
Computer: abc123
Description:
This computer was not able to set up a secure session with a domain
controller in domain AD3 due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this
computer is connected to the network. If the problem persists,
please contact your domain administrator.
ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it
sets up the secure session to the primary domain controller emulator in the specified
domain. Otherwise, this computer sets up the secure session to any domain controller
in the specified domain.
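
One way to quantify how long the network takes to become usable after boot, with and without teaming, is to time how long it takes before a domain controller answers. The sketch below is an assumption-laden illustration, not an HP or Microsoft tool: it probes TCP port 389 (LDAP), and the host name in the usage note is a placeholder.

```python
import socket
import time


def dc_reachable(host, port=389, timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def seconds_until(probe, limit_s=120.0, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly; return elapsed seconds at first success.

    Returns None if probe() has not succeeded once limit_s has elapsed.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= limit_s:
            return None
        sleep(interval_s)
```

For example, calling `seconds_until(lambda: dc_reachable("dc01.example.com"))` from a startup script (with the placeholder replaced by a real domain controller) would give a number to compare across the teamed and unteamed configurations.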

Thanks for all of your help and I think that I'm starting to see the light at the end of the tunnel.

Dave
Matthijs Wijers_1
Trusted Contributor

Re: NIC Teaming problem

Regarding the 5719 Netlogon error, try my suggestion:

Edit the TCP/IP properties of your network card(s), go to Advanced, choose the WINS tab,
and untick "Enable LMHOSTS lookup".

Regards,
Matthijs