
Nightmare of a network problem

 
Philip Kime
Regular Advisor


Three of us have now spent days trying to fix this ... HPUX 10.20 running on two K-class servers with the usual 100Mb GSC cards (btlan4 driver) and APA. Incoming traffic to these cards is around 60-80 Kb/s maximum *for some hosts*. Traffic between them is terrible - about 40Kb/s for FTP. We have checked switches, cables etc. Packet tracking with tcpdump, nettl and ethereal shows the same thing - horribly mangled packet reassembly problems - lots of things arriving in the wrong order hence a lot of retransmits.
APA is not to blame - it happens on the raw interfaces too.

It looks like a broken router but we have checked all that. It can't be the IP stack because this problem isn't seen from some hosts (nor from localhost to localhost which still goes through the IP stack). This is driving us mad and these machines have to be in production in two days. It only seems to be *incoming* connections that are incredibly slow, not outgoing (FTP PUT of 8M from one of these machines is 0.5 seconds or so. FTP GET of the same file is 34 seconds ...). Normally this is an MTU problem but that has been checked multiple times - it is the default 1500 everywhere. As far as I can tell, both machines have all the latest network related patches.
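To put the asymmetry in numbers, here is a small Python sketch that turns the two FTP timings quoted above into throughput figures. It assumes "8M" means 8 MiB and takes the quoted times at face value; the function name is only illustrative.

```python
def throughput_kib_per_s(size_bytes: int, seconds: float) -> float:
    """Average transfer rate in KiB/s for a transfer of size_bytes."""
    return size_bytes / seconds / 1024


FILE_SIZE = 8 * 1024 * 1024  # assumption: "8M" in the post means 8 MiB

put_rate = throughput_kib_per_s(FILE_SIZE, 0.5)   # outgoing from the K-class box
get_rate = throughput_kib_per_s(FILE_SIZE, 34.0)  # incoming to the K-class box

print(f"PUT: {put_rate:.0f} KiB/s")
print(f"GET: {get_rate:.0f} KiB/s")
print(f"GET is about {put_rate / get_rate:.0f}x slower than PUT")
```

The PUT figure works out above what a single 100Mb link can carry (roughly 12 MiB/s), so the 0.5 seconds is probably a rough reading. The useful part is the gap between the two directions, not the absolute numbers.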

Any ideas about where to look much appreciated.
Patrick Wallek
Honored Contributor

Re: Nightmare of a network problem

Verify your speed and duplex settings on the cards and the switches they are plugged into. Make sure you manually set everything, do *NOT* use auto-negotiate.
Robert-Jan Goossens
Honored Contributor

Re: Nightmare of a network problem

Hi,

Just a shot: do you use full duplex on both the switch and the servers?

/etc/rc.config.d/hpgsc100conf

Robert-Jan
James A. Donovan
Honored Contributor

Re: Nightmare of a network problem

I seem to remember having similar problems. It was fixed by setting the PMTU strategy to a particular value, in order to always set the Don't Fragment bit. We just got rid of our last 10.20 box, so I can't look up the correct ndd parameter, but under 11/11i it's called ip_pmtu_strategy.
Remember, wherever you go, there you are...
Robert-Jan Goossens
Honored Contributor

Re: Nightmare of a network problem

Jim,

Would that be

HP-UX 10.20 nettune manpage
nettune -l udp_pmtu
nettune -h udp_pmtu
nettune -l pmtu_defaulttime
nettune -h pmtu_defaulttime

HP-UX 11.X ndd manpage:
ndd -get /dev/ip ip_pmtu_strategy
ndd -h ip_pmtu_strategy

Robert-Jan
Steven E. Protter
Exalted Contributor

Re: Nightmare of a network problem

With those cards you need to hard code speed and duplex settings in the file /etc/rc.config.d/hpbtlanconf

Attaching an example.

Switch settings must be explicit, not auto as Patrick notes.

SEP
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

All cards and ports are hard-coded to 100HD (don't ask why) - we've been hit by the old auto negotiate problems too many times ... I'll check the PMTU settings, thanks.
Ron Kinner
Honored Contributor

Re: Nightmare of a network problem

Your symptoms of slow FTP in one direction seem to indicate a duplex mismatch somewhere along the way. Have you looked in

lanadmin
lan
display

to see if you are getting errors? Doesn't have to be the HPUX. Could be at the router.
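
As a rough aid for reading those counters, the sketch below flags the error patterns commonly associated with a duplex mismatch: late collisions on the half-duplex end, FCS and alignment errors on the full-duplex end. The dictionary keys are made-up names for illustration, not the exact labels lanadmin or a router prints, so map them to whatever your output actually shows.

```python
def duplex_mismatch_hints(counters: dict) -> list:
    """Return human-readable hints for counters that suggest a duplex mismatch.

    `counters` maps hypothetical counter names to integer values; missing
    counters are treated as zero.
    """
    hints = []
    if counters.get("late_collisions", 0) > 0:
        hints.append("late collisions: typical of the half-duplex end of a mismatch")
    framing = counters.get("fcs_errors", 0) + counters.get("alignment_errors", 0)
    if framing > 0:
        hints.append("FCS/alignment errors: typical of the full-duplex end of a mismatch")
    return hints


# Hypothetical values, not taken from the machines in this thread.
for hint in duplex_mismatch_hints({"late_collisions": 12, "fcs_errors": 0}):
    print(hint)
```

An empty result does not prove the link is clean; it only means these particular counters are zero on the interface you looked at.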

Have you tried connecting the two up with crossover cables to see if they can talk OK.

Do local machines have the same problem?

Does traffic in one direction follow a different path than in the other direction?

Did you put in more than one default route?

Be advised that the TCP/IP stack is smart enough to recognize its own address and thus saves time by looping it back. It does not send traffic out on the wire.

Ron
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

PMTU settings are all default and look ok - they are the same as our other 10.20 boxes which are fine.

Network people have checked the routers and say they are all ok.

We can't link these machines by a crossover cable because they are about 60 miles apart ...

Some local machines have the same problem transferring files to them, some don't.

It's hard to tell the paths at present because they are on VLANS - I'll get networks to check this.

Even though local->local traffic doesn't go on the wire, it still goes through the IP stack and there seems to be no problem. This all sounds so much like a bad router to me ...
Jean-Louis Phelix
Honored Contributor

Re: Nightmare of a network problem

Hi,

Did you try a "ping -o -n 1" on both hosts that have problems communicating, to check whether they really use the same routes?

Regards.
It works for me (© Bill McNAMARA ...)
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

They are on the same VLAN so this doesn't give you any useful information - they look like they are on the same segment to ping ...
Elmar P. Kolkman
Honored Contributor

Re: Nightmare of a network problem

First take a look at the collisions on your network. They could have something to do with your problem. Same for the input/output errors.

What you could try is to disable APA for the time being (it doesn't give any advantage now anyway) and configure each card with a separate IP address. Then try the cards one by one. Perhaps one or more of the cards is generating a lot of rubbish on your Half-Duplex (HD) network, resulting in a lot of collisions.

Good luck.
Elmar
Every problem has at least one solution. Only some solutions are harder to find.
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

There are some errors but no noticeably high collision counts. We've tried looking at all cards separately, with and without APA. Our conclusion was that it was the cables. They were replaced yesterday - no difference ...
U.SivaKumar_2
Honored Contributor

Re: Nightmare of a network problem

Hi,

Please paste the output of

#netstat -p ip

What is the value of the netmemmax kernel parameter in your kernel?

Have you applied patch PHNE_28923 ?

regards,

U.SivaKumar

Innovations are made when conventions are broken
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

ip:
1024356 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with header length < data size
0 with data length < header length
0 illegal ip source address
0 ip version unsupported
560 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets forwarded
3554 packets not forwardable
0 redirects sent


This is a 10.20 machine, not 11.11 so that patch doesn't apply. However, I do have PHNE_28536 which is the latest 10.20 LAN patch.

That kernel param is 11.11 too, I think. Networking fragmentation memory is large (from dmesg):

Networking memory for fragment reassembly is restricted to 348389376 bytes
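
For a quick read of those counters, this Python sketch parses a few of the "count label" lines pasted above and works out what share of received packets were fragments. The parsing rule (a leading integer followed by a label) is an assumption based only on the output shown in this post.

```python
import re

# Lines copied from the netstat -p ip output in this post.
NETSTAT_IP = """\
1024356 total packets received
0 bad header checksums
560 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
3554 packets not forwardable
"""


def parse_counters(text: str) -> dict:
    """Map each label to its integer count for lines shaped like '<count> <label>'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


stats = parse_counters(NETSTAT_IP)
frag_pct = 100.0 * stats["fragments received"] / stats["total packets received"]
dropped = (stats["fragments dropped (dup or out of space)"]
           + stats["fragments dropped after timeout"])
print(f"fragments: {frag_pct:.3f}% of received packets, {dropped} dropped")
```

Fragments come to roughly 0.05% of received packets with none dropped, which suggests IP-level fragment reassembly is not where the packets are being lost.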
clar_1
Advisor

Re: Nightmare of a network problem

Hi

Please have a look at auto-negotiation and try keeping it OFF. Additionally, try changing the settings below from the command line and see whether link speed improves noticeably; if it does, the same settings can be put in your start-up scripts later.
1. tcp_xmit_hiwater_def
2. tcp_xmit_hiwater_lfp
3. tcp_xmit_hiwater_lnp
4. tcp_xmit_lowater_def
5. tcp_xmit_lowater_lfp
6. tcp_xmit_lowater_lnp

Regards
Jai

Nothing is impossible
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

See above - Auto neg is off everywhere. This is a 10.20 machine, not 11.x so those parameters do not exist on these machines.
Ron Kinner
Honored Contributor

Re: Nightmare of a network problem

If you have another HPUX machine on the same local subnet see if a linkloop will work. If 10.20 supports this you can then rule out the Ethernet switch, cables and NICs. See man linkloop for details. You will probably need to tell it which NIC to use since it sounds like you have more than one.

Try and get the network guys to check
show interface
on each end of the WAN and look for dropped packets or errors on the serial ports and also on the Ethernet. This could be a case of the input queue being too full in one direction. You might try running your FTP late at night to see if it runs better then. That is a good sign of congestion problems.

Except you say that you have local machines which have the same problem? Could you not check these with a crossover cable?

Ask the network guys to run an extended ping to your IP address and to have it Sweep Range of Sizes using the default start, stop and step. (You are using Cisco routers I hope.) This is a great test of the connection from the router to the HPUX. It should come up all !'s but if you see some .'s then there is something wrong.
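
If the network people send back a long run of marks, a few lines of Python will turn it into a loss figure. This is a minimal sketch that assumes the convention described above ("!" for a reply, "." for a timeout); the sample string is invented for illustration and is not output from this network.

```python
def ping_loss(marks: str) -> float:
    """Fraction of probes that timed out, given a string of '!' and '.' marks.

    Any other characters (spaces, newlines) are ignored.
    """
    replies = marks.count("!")
    timeouts = marks.count(".")
    total = replies + timeouts
    if total == 0:
        raise ValueError("no ping marks found")
    return timeouts / total


sample = "!!!!!.!!!!!!!!.!!!!!"  # hypothetical output
print(f"{ping_loss(sample):.0%} of probes lost")
```

On a healthy local path the result should be zero, so any nonzero figure is worth chasing.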

Ron
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

Unfortunately, we can't run crossover cables because all of the machines in question are critical production machines. The network people are not being very helpful in tracing this, to be honest. I have installed an enhanced ping on the hosts (one that does the "flooding" ping with "." and "!"). We get lots of "."s ....

doug mielke
Respected Contributor

Re: Nightmare of a network problem

A less than adequate response from those practicing the Networking Religion seems common.

One of the few tools available is your ability to configure your own routes. Is there more than one way to this server? If so, you could try creating some static routes or use high metrics on suspect routes, and find/bypass an offending router somewhere in the path. Tedious, but your toolbox is limited if the network folk aren't helpful.
Ron Kinner
Honored Contributor

Re: Nightmare of a network problem

OK. At least run the linkloop test on each HPUX. I looked in the manual and 10.20 does support it.

http://docs.hp.com/hpux/onlinedocs/B2355-90129/B2355-90129.html

You need to know the ppa number of the interface you want to test and a MAC address of a device on the local LAN. Let's say the ppa is 1 (lanscan or lanscan -p will show you the available ppa's.) and MAC is the MAC of a device on the same LAN (preferably another HP product since not all manufacturers support this). (To get the MAC of a device, ping it and then run arp -a). We will let it do 10 packets (-n 10) and ask for verbose output (-v)

Then it is simply

linkloop -i 1 -n 10 -v MAC

This uses a small packet size. To use a bigger 1400 byte (+overhead) packet you can add the -s 1400 option.

If this works then you can rule out layer two problems and concentrate on IP and routing.
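
One practical snag is turning the address that arp -a prints into something linkloop will accept. The Python sketch below assumes arp shows colon-separated octets that may lack leading zeros, and that linkloop wants a single 0x-prefixed hex string; both are assumptions to check against the man pages on your release. The MAC address is made up, and the flags are only the ones mentioned above.

```python
def arp_mac_to_hex(mac: str) -> str:
    """Convert e.g. '0:10:83:f1:a3:22' to '0x001083F1A322'."""
    octets = mac.strip().split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    values = [int(octet, 16) for octet in octets]
    if any(not 0 <= value <= 0xFF for value in values):
        raise ValueError(f"octet out of range in {mac!r}")
    return "0x" + "".join(f"{value:02X}" for value in values)


def linkloop_command(ppa: int, mac: str, count: int = 10, size=None) -> str:
    """Build a linkloop command line using the -i, -n, -v and -s options."""
    parts = ["linkloop", "-i", str(ppa), "-n", str(count), "-v"]
    if size is not None:
        parts += ["-s", str(size)]
    parts.append(arp_mac_to_hex(mac))
    return " ".join(parts)


print(linkloop_command(1, "0:10:83:f1:a3:22"))             # hypothetical MAC
print(linkloop_command(1, "0:10:83:f1:a3:22", size=1400))  # larger frames
```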

Ron
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

linkloop tests work fine. I have connectivity - it's just that it's very very slow ...

Ron Kinner
Honored Contributor

Re: Nightmare of a network problem

It's slow because you are dropping packets somewhere. If
lanadmin
lan
display

looks good for each NIC and you can run full speed to a host on the same subnet then the problem has to be in the network. Probably the WAN or the router. Do you see a difference in ping response vs packet size? Do larger pings have a problem getting through? What is the largest ping you can send? Do traceroutes show any particular step which is unreliable about replying? That might give you a clue as to where the problem is.



Ron
Dwyane Everts_1
Honored Contributor

Re: Nightmare of a network problem

Philip,

Have you tried doing a traceroute at any point? I know it sounds simple, but it gives the best indication of where the latency is (I prefer Visual Traceroute).

That would be step one...
Step two would be to get a sniffer. Check the packets coming from the server, if they look ok...move to the next hop and check there [both in and out].

The best troubleshooting method here is the "half-split" method. Trace out the entire path. Sniff the packets out of the server, then check halfway.

Example:

Your server -> switch1 -> router1 -> {frame-relay/ISP cloud} -> router2 -> switch2 -> other server

Check "Your server" outbound; if OK, then check router2 inbound; if OK, check switch2 inbound; and so on. By sniffing the packets, you will see who is fragmenting them.
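
The half-split idea is just a binary search over the path. Here is a minimal Python sketch of it, assuming that once traffic is damaged at some hop it stays damaged at every hop after it. The hop names follow the example path above, and the check function stands in for a manual sniffer reading at that hop.

```python
def first_bad_hop(hops, looks_ok):
    """Return the first hop where looks_ok(hop) is False, or None if all pass.

    Assumes good hops come first and bad hops follow, so each check halves
    the part of the path still under suspicion.
    """
    lo, hi = 0, len(hops)
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_ok(hops[mid]):
            lo = mid + 1   # damage starts somewhere after mid
        else:
            hi = mid       # damage starts at mid or earlier
    return hops[lo] if lo < len(hops) else None


path = ["your server", "switch1", "router1", "cloud",
        "router2", "switch2", "other server"]

damaged_from = "router2"  # hypothetical fault location for the demo


def check(hop):
    """Pretend sniffer result: traffic is clean before the faulty hop."""
    return path.index(hop) < path.index(damaged_from)


print(first_bad_hop(path, check))
```

With seven hops this needs about three sniffer placements instead of up to seven, which matters when each one means a trip to a different site.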

D
Philip Kime
Regular Advisor

Re: Nightmare of a network problem

Strange - I thought I'd replied to this thread to say what the problem was but now I can't see the reply ...

I installed a decent ping (because the default HPUX one is very basic) from the Porting archive and ran a flooding ping (ping -f -v). Instant success ... lots of dropped packets with ICMP returns from two incorrect machines. You don't see these ICMP "wrong ident" returns with normal ping. So, it looks like a netmask problem on some router somewhere ...
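
To illustrate why a wrong netmask can cause this kind of misdirected traffic, here is a small sketch using Python's standard ipaddress module. The addresses are invented, since the thread does not give the real ones; the point is only that the same device reaches a different "is this host on my own segment?" answer depending on its mask.

```python
import ipaddress


def on_link(interface: str, destination: str) -> bool:
    """True if `destination` falls inside the network implied by `interface`.

    `interface` is an address with prefix length, e.g. '10.20.4.1/24'.
    """
    network = ipaddress.ip_interface(interface).network
    return ipaddress.ip_address(destination) in network


dest = "10.20.5.9"  # hypothetical host
print(on_link("10.20.4.1/24", dest))  # correct mask: not local, goes via a router
print(on_link("10.20.4.1/16", dest))  # mask too wide: treated as local
```

A device with the wider mask would try to handle traffic for hosts that are not actually on its segment, which is one plausible way for replies to come back from the wrong machines.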

Thanks to all who provided advice! The flooding ping is a superb utility for tracing this sort of thing, especially on a VLAN where a traceroute is useless ...