Anders_35
Regular Advisor

Transmit Load Balancing fails on Bl20p G3

Not sure whether this is a server or Windows question, but ...

We have an odd problem with our BL20s. If we configure teaming out of the box, it seems to work all right, but on closer inspection, a lot of network traffic just disappears.

After some testing, we see that when using Transmit Load Balancing with the method "TCP Connection" or "Destination MAC", we lose a lot of outbound TCP traffic. It just doesn't seem to leave the server, even though no errors are reported anywhere.

Anyone seen this before?

HP just says "latest firmware, latest drivers", but looking through the release notes there is no mention of any such error being fixed. We have had far too much trouble and downtime from unnecessary upgrades for me to be willing to upgrade just like that.
Connery
Trusted Contributor

Re: Transmit Load Balancing fails on Bl20p G3

Anders,
Can you provide more information about how you know that "we lose a lot of outbound TCP traffic. It just doesn't seem to leave the server"?

That will help diagnose the issue.

Have you verified that all NIC ports in the team are connected to switch ports in the same VLAN? Are both blade switches on the same broadcast domain/subnet?

We have a Teaming Whitepaper for reference also:
ftp://ftp.compaq.com/pub/products/servers/networking/TeamingWP.pdf

Best regards,
-sean
Anders_35
Regular Advisor

Re: Transmit Load Balancing fails on Bl20p G3

The servers are set up with teaming on NIC 1 and 3, i.e. one NIC on each GbE2 switch.
There is only one link out of the blade enclosure, on switch A, so all traffic passes through that switch.

When sniffing the network on switch A, we should see the traffic coming through on the local server port, or on the port 17/18 interconnect.

But, when it doesn't work, we do not see any traffic from the server on these ports.

If I switch to NFT, I can switch between both NICs, and it works fine, i.e. we have a working connection to the network through both GbE2s.

My first tests indicate that the problem varies with each TLB method. For instance, with "TCP Connection" it works on and off, in uneven periods of 30 to 50 minutes.
Dest. MAC: Doesn't work most of the time.
Dest. IP: Seems to work.
Round-robin: Seems to work.
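
To show why the outcome can differ per method, here is a minimal sketch of how a transmit load balancer might pick a team member for each outbound frame. It is an illustrative model only: the function name `pick_nic`, the CRC32 hash and the key layout are assumptions made for this example, not HP's actual teaming driver logic.

```python
from zlib import crc32


def pick_nic(method, n_nics, dst_mac="", dst_ip="", tcp_tuple=(), counter=0):
    """Return the index of the team member used for one outbound frame.

    Hypothetical model: the real driver's hashing is not documented here.
    """
    if method == "round_robin":
        # Frames simply alternate between team members.
        return counter % n_nics
    if method == "dest_mac":
        key = dst_mac
    elif method == "dest_ip":
        key = dst_ip
    elif method == "tcp_connection":
        # One key per connection, e.g. (src_ip, src_port, dst_ip, dst_port).
        key = "|".join(str(part) for part in tcp_tuple)
    else:
        raise ValueError(f"unknown method: {method}")
    # The same key always maps to the same team member.
    return crc32(key.encode("ascii")) % n_nics
```

In a model like this, "Destination MAC" sends every frame for a given next hop (for example a single default gateway) out of the same NIC, while "TCP Connection" can move each new connection to a different NIC. That would be consistent with failures that depend on which team member a flow happens to land on, but it is only a hypothesis.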

But after upgrading the firmware and installing ProLiant Support Pack 7.40B (was 7.20), Dest. IP fails occasionally, too.
I'm now going to try PSP 7.51.

The test we perform is simple:
We use an SMTP client to send a small email.
When it doesn't work, the client doesn't get a TCP connection (and we never see packets on the network, not a single SYN).

Since the web servers running on these machines are all OK, I believe only outbound connections (initiated on the server) are affected.
Carsten Reinhard
Frequent Advisor

Re: Transmit Load Balancing fails on Bl20p G3

Anders,

I read an HP advisory this morning:

Description: Advisory: ProLiant Servers May Become Unresponsive in Configurations Running HP Network Teaming Software and Running Microsoft Windows Server 2003 SP1 with the Scalable Networking Pack (SNP) and TCP/IP Offload Engine (TOE) (c00747687)

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00747687&jumpid=em_EL_Alerts/US/Sep06_ALL/Alerts

Maybe you have to install the MS patch mentioned there.


Greetings Carsten
Anders_35
Regular Advisor

Re: Transmit Load Balancing fails on Bl20p G3

Thanks, I'll have a look at that.
Anders_35
Regular Advisor

Re: Transmit Load Balancing fails on Bl20p G3

Some additional info:

I've now updated all the firmware and every driver I can, but still no change.

Connery
Trusted Contributor

Re: Transmit Load Balancing fails on Bl20p G3

Troubleshooting steps:
1. Disable NIC 1 in the Team via the Microsoft UI (Network Connections). Run tests again.

2. Enable NIC 1 and disable NIC 2. Run tests again.

What's the behavior change? If one NIC works fine and another doesn't, you need to look at the switch configs for that port.

Also, are you running version 8.37 of the Teaming driver?
Anders_35
Regular Advisor

Re: Transmit Load Balancing fails on Bl20p G3

Thanks Sean, but the switch configs are identical, at least within each enclosure; I checked.
The problem also occurs on at least three different blade enclosures (I didn't test the rest we have),
with two different switch firmware versions and two different switch configs (the switches in one enclosure are different from the switches in the other two).

Also, when using NFT, both NICs work like a charm, so I would think a failure on one NIC and not the other would point to an error in teaming more than in the switch.
But I am testing it, just to be sure.

Tomorrow I'm going to do some network sniffing on all these six switches, just to confirm that I'm seeing the same everywhere.

>Also, are you running version 8.37 of the
>Teaming driver?

Yes, I am now, after upgrading to support pack 7.51. Still no luck, though...
Anders_35
Regular Advisor

Re: Transmit Load Balancing fails on Bl20p G3

I ran the suggested test, with one NIC disabled and one enabled. It works just fine in both configurations.

As soon as I switch back to using both NICs it's failing again...
Connery
Trusted Contributor

Re: Transmit Load Balancing fails on Bl20p G3

What are you using to test connectivity? PING or something else?

Some apps/devices (e.g. IP interfaces on some printers) don't like receiving data frames from the non-Primary NIC in the team because the source MAC address doesn't match the MAC address in its ARP cache for the Team's IP address. A properly implemented TCP/IP stack shouldn't care, but it doesn't always get implemented properly on all devices.

That being said, make sure you are testing connectivity with a Windows system using a Windows utility (like PING). I know they work.
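
To make the source-MAC mismatch described above concrete, here is a toy model of a receiving device. Everything in it is hypothetical: the function `accepts_frame`, the `strict` flag and the example addresses are invented for illustration and do not describe any specific device's stack.

```python
def accepts_frame(arp_cache, src_ip, src_mac, strict=True):
    """Model a receiver deciding whether to accept an incoming frame.

    strict=True mimics a device that drops frames whose source MAC differs
    from its ARP cache entry for the source IP. strict=False mimics a
    stack that does not compare the two.
    """
    if not strict:
        return True
    cached = arp_cache.get(src_ip)
    return cached is None or cached == src_mac


# Hypothetical addresses: the team answers ARP with the Primary NIC's MAC.
TEAM_IP = "10.0.0.5"
PRIMARY_MAC = "00:11:22:33:44:01"
SECONDARY_MAC = "00:11:22:33:44:02"
peer_arp_cache = {TEAM_IP: PRIMARY_MAC}

# Frame sent by the Primary NIC: source MAC matches the cache entry.
print(accepts_frame(peer_arp_cache, TEAM_IP, PRIMARY_MAC))
# Frame sent by the non-Primary NIC: a strict device rejects it.
print(accepts_frame(peer_arp_cache, TEAM_IP, SECONDARY_MAC))
# A stack that does not compare the addresses accepts it.
print(accepts_frame(peer_arp_cache, TEAM_IP, SECONDARY_MAC, strict=False))
```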

Another alternative is to use the Dual Channel team type. It does require an Intelligent Networking Pack license, though.
http://h18004.www1.hp.com/products/servers/proliantessentials/inp/index.html

Dual Channel uses separate ARP replies for each channel. Therefore, the other device always sees a data frame's source address that matches the ARP cache entry it received from the team.

If TLB is still causing you a problem and you don't want to use Dual Channel, I'd recommend opening a case with our support team. The team can be reached by calling 800-354-9000 and asking for the call to be sent to the USS_SC NETWORK queue.

Best regards,
-sean