Operating System - HP-UX

Abhik Sarkar_2
Advisor

Too many TCP errors or OK?

Hi Everyone,

We have two RP3440 machines running HP-UX 11.11. Both machines are connected to two Cisco switches (2950's) using HP-APA. The switches are further connected to Cisco Content Switches.

When taking some tcpdumps on these boxes to analyze some application issues, I discovered that anywhere between 1% and 3% of the packets coming to the boxes are out-of-order packets, duplicate ACKs or retransmissions.

Some netstat stats are attached.

After a bit more investigation, I found that several of these "incorrect" or duplicate packets are due to what seem to be broadcasts from the switches, which result in both servers seeing packets that were destined for only one server.

Do I have a network problem on my hands, or is this amount of errors normal... can someone say from experience?

Thanks,
Abhik.
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: Too many TCP errors or OK?

Shalom Abhik,

You may have a networking issue.

I would concentrate on that. You may wish to use cstm or mstm or xstm to make sure the networking hardware is good.

lanscan

lanadmin -x 0

for lan0

repeat for other interfaces in use.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Abhik Sarkar_2
Advisor

Re: Too many TCP errors or OK?

Salam Steven,

Thanks for the reply.

Let me try out the commands you have given. Will keep you posted.

Best regards,
Abhik.
rick jones
Honored Contributor
Solution

Re: Too many TCP errors or OK?

As the netstat stats are since boot, and since some of them may wrap fairly quickly (notably the byte counters, which on 11.11 are still a measly 32 bit) you should take snapshots and run them through something like beforeafter - ftp://ftp.cup.hp.com/dist/networking/tools/
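The snapshot-and-subtract idea can be sketched in a few lines of Python. This is only an illustration, not the beforeafter tool itself: it assumes each netstat snapshot has already been parsed into a dict of counter name to value, that the counters are unsigned 32-bit, and that no counter wraps more than once between snapshots. The counter names and values below are made up.

```python
COUNTER_BITS = 32  # assumption: unsigned 32-bit counters, as described for 11.11


def counter_delta(before, after, bits=COUNTER_BITS):
    """Return after - before, allowing for a single wrap of an unsigned counter."""
    return (after - before) % (1 << bits)


def snapshot_delta(before, after, bits=COUNTER_BITS):
    """Per-counter deltas for every counter present in both snapshots."""
    return {
        name: counter_delta(before[name], after[name], bits)
        for name in before
        if name in after
    }


# Hypothetical snapshots; "packets_received" wraps between the two readings.
before = {"packets_received": 4294967000, "retransmitted": 120}
after = {"packets_received": 704, "retransmitted": 150}

deltas = snapshot_delta(before, after)
retransmit_pct = 100.0 * deltas["retransmitted"] / deltas["packets_received"]
print(deltas, retransmit_pct)
```

Working from deltas over a known interval, rather than from the since-boot totals, is what makes a figure like "1% to 3%" meaningful.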

As for duplex issues, I've attached some of my standard boilerplate on the topic.

As for duplicate frames in tcpdump output, remember that tcpdump puts the interface into promiscuous mode, which means you will see all traffic arriving at the NIC, regardless of whether or not the NIC would have otherwise sent it into the host. And switches, while providing decent traffic isolation, do NOT provide 100% traffic isolation. When a switch does not know, or has deliberately forgotten, which MAC was on which port, it will "flood forward" (term?) even unicast traffic for that destination MAC until it sees that MAC as a source address. That can also happen if the switch's mapping tables overflow.
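To make the flooding behaviour concrete, here is a toy model of a learning switch in Python. It is a simplified sketch, not how any particular Cisco switch is implemented: the class name, the port numbers, the MAC labels and the "stop learning when the table is full" policy are all assumptions chosen for illustration.

```python
class LearningSwitch:
    """Toy MAC-learning switch: floods unicast frames for unknown destinations."""

    def __init__(self, ports, table_size=1024):
        self.ports = list(ports)
        self.table_size = table_size  # assumption: a full table stops learning new MACs
        self.mac_table = {}           # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the ports the frame is sent out of."""
        if src_mac in self.mac_table or len(self.mac_table) < self.table_size:
            self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment; nothing to forward
        return [out_port]

    def age_out(self, mac):
        """Forget a MAC, as an aging timer or a topology change would."""
        self.mac_table.pop(mac, None)


sw = LearningSwitch(ports=[1, 2, 3])
first = sw.receive(1, "client", "server_a")    # server_a unknown: flooded
reply = sw.receive(2, "server_a", "client")    # server_a learned on port 2
second = sw.receive(1, "client", "server_a")   # now sent to port 2 only
sw.age_out("server_a")
third = sw.receive(1, "client", "server_a")    # forgotten: flooded again
print(first, reply, second, third)
```

In this model a host on port 3 running tcpdump in promiscuous mode would see the first and third frames even though they were addressed to server_a.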
there is no rest for the wicked yet the virtuous have no pillows
Abhik Sarkar_2
Advisor

Re: Too many TCP errors or OK?

Steven, thanks. The lanadmin command helped determine that there was a duplex mismatch between the hosts and the switches and I fixed that. Attached is the graph showing the collisions decline (actually vanish).

Rick, just saw your suggestions when I was going to reply to Steven now, thanks.

Your "boilerplate" on duplex settings is the first bookmark in my ITRC folder :-) I checked this and found the switch to be 100FD Fixed, while the systems were on 100HD Auto-Neg. Results already mentioned.

I also understand your point on how switches work and understand why I see the occasional floods. However, I could not find a pattern to this, and hence could not link it to some timers on the switches. I have also been monitoring the switch logs and turned on some debug on ARP, MAC and STP events, but could not catch anything unusual.

I also found your reference to "beforeafter" in another thread and tried that out as well... I have tons of stats from the system that I collected using a script and will analyze them in the lab in our office using that tool.

I am still not able to pinpoint the problem though. Various clients connecting to these machines keep complaining of throughput issues and frequent disconnections. Maybe the problem is somewhere else and will need investigation somewhere in the network.

Thanks for your help anyway. At least I caught something which might have been affecting throughput.

Best regards,
Abhik.

Steven E. Protter
Exalted Contributor

Re: Too many TCP errors or OK?

Salam,

I cut and pasted that so I would not mis-spell it.

Cisco switches are problematic with APA. They require special configuration that some Cisco administrators simply don't know how to do.

This problem can be overridden with
lanadmin -X 100FD 0

It can also be overridden in the hpbtlanconf file in /etc/rc.config.d/

You should not, however, have to do any of that. This problem should be fixed by the network admin. You may do better plugging both cards into the same Cisco switch. That move has alleviated my problems doing this with the Linux equivalent of APA.

SEP
rick jones
Honored Contributor

Re: Too many TCP errors or OK?

Why on earth do Cisco switches require the links of an aggregate be 100FD?
Abhik Sarkar_2
Advisor

Re: Too many TCP errors or OK?

Steven, Thanks for the reply. I have already taken both actions. The graph I attached yesterday shows the drop in collisions after I made the change.

In this case, unfortunately, I am also responsible for the switches :-(

I suspect the cause is that APA has load balancing set to MAC address based (LB_MAC) while it is connected to two switches. I had similar problems with a bonding driver on a Linux machine connected to two switches, and after I set the bonding driver to standby instead of load balance, the problems went away.

If you have any reference to the Cisco problems with APA, I would appreciate a link.

Rick, the aim of the aggregation (in this case) is merely redundancy of network cards and switches. It's not really meant for throughput and in fact, the aggregate is not even configured on the switches because from the servers, the cables connect to two different switches.

Thanks for your replies!

Abhik