
File Transfer Starts fast, slows to a crawl

 
SOLVED
Russ Park
Frequent Advisor

First of all, I realize there are so many potential variables in this kind of problem that I'm expecting a lot of follow-up questions asking for details. For the moment, can we try to take this at face value? I'm just looking for leads on what the likely culprit might be.

OK - I have an HP rp5430 with one 750MHz CPU and 2GB of RAM, running HP-UX 11.11 (11i), March 2003 build. I am connected to my LAN via 100BT Ethernet, full duplex, to a switched port. Supposedly the two ends are correctly matched (but I'm still suspicious).

Problem: after a clean reboot, I initiate an ftp session to another system on the same LAN with a like connection and "get" an 840MB file. The first time, I get a transfer rate of over 10,000K/sec, which is pretty good and about what you'd hope for over 100Mbit full duplex. If I disconnect and reconnect my ftp session and restart the transfer, then after anywhere from 160 to 310MB the transfer comes to a halt and eventually times out, with an average transfer rate of <800K/sec. Right after the time-out I can still communicate with the host. Then, anywhere from 30 minutes to 3 hours later, even with no other activity, the host stops responding to any network calls, including ping and telnet. The host, except on one occasion, remains alive, and I can still interact with it at the console, but I'm unable to use any network connection. The only solution has been to reboot.
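
For reference, the arithmetic behind those two rates can be sketched as below. This is only an illustration: it assumes ftp's "K/sec" means kilobytes per second with 1 Kbyte = 1000 bytes, and the function name kbytes_to_mbit is made up for the example.

```shell
# Convert an ftp-style rate in Kbytes/sec to Mbit/s.
# Assumption: 1 Kbyte = 1000 bytes (using 1024 shifts the result by about 2%).
kbytes_to_mbit() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k * 1000 * 8 / 1000000 }'
}

kbytes_to_mbit 10000   # rate seen on the first transfer
kbytes_to_mbit 800     # upper bound on the rate seen on the second transfer
```

On those assumptions the first run used roughly 80% of a 100Mbit link, while the second averaged under 7% of it.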

I ran a netstat -i to capture some simple stats and I can clearly see a progressive decrease in packets transferred on my 2nd run (these were very steady on the 1st).
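
A minimal sketch of how such a decline could be made visible, assuming the packet counts are noted from repeated netstat -i runs at a fixed interval and fed in one per line. The function name counter_deltas is hypothetical, and parsing netstat itself is left out because the column layout differs between releases.

```shell
# Print the change between successive counter samples read from stdin,
# one cumulative packet count per line.
counter_deltas() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

Steady deltas would correspond to the first run; deltas that shrink sample after sample would correspond to the second.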

We replaced the NIC, no change. Swapped LAN cables, no change. Pulled the hard drives out of system A and put them in (Like) system B, no change. This last test makes me a believer that it must be something in the configuration and setup of this root disk/OS.

With this in mind, my local support guy suggested modifying the buffer cache setting, which defaults to 50(%). We set it to 20(%), but this made no difference.

Does ANYONE have an idea of what the most likely causes of this type of LAN I/O freeze-up are?

Thanks!
26 REPLIES
Michael Steele_2
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

One possibility involves a rogue network utility, which can be tracked through increasing ICMP packets. The ICMP traffic could come from a ping (* denial of service *) or from normal error reports from a network utility.

netstat -p icmp | grep -i drop

Search for:

xxxxxx ICMP messages dropped

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x3f46c4c76f92d611abdb0090277a778c,00.html

Get a copy of 'tcpdump' to see where the culprit lies on the network:

http://hpux.cict.fr/hppd/hpux/Networking/Admin/tcpdump-3.6.2/

tcpdump | grep -i icmp (* or something like this *)

Also note 'source quench':

http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xa05268c57f64d4118fee0090279cd0f9,00.html

Support Fatherhood - Stop Family Law
A. Clay Stephenson
Acclaimed Contributor

Re: File Transfer Starts fast, slows to a crawl


Rule Number 1) In any network problem involving HP-UX, make absolutely certain that auto-negotiation is OFF. You should hard set both the host's NIC AND each corresponding switch port to 100FD. Without knowing your exact card I can't be certain, but this is set in a file in /etc/rc.config.d - probably "hpbtlanconf". Set it there and reboot. lanadmin -X can also be used, but a change made that way will not survive a reboot. Until this is done and verified, do nothing else.
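
As a rough illustration of the kind of entry meant here, assuming the card uses the btlan driver and shows up as lan0 (both are assumptions, and the variable names below are given from memory and differ per driver, so check them against the comments in the file itself):

```shell
# /etc/rc.config.d/hpbtlanconf (sketch only; names assume the btlan driver)
HP_BTLAN_INTERFACE_NAME[0]=lan0
HP_BTLAN_STATION_ADDRESS[0]=
# Hard set 100 Mbit full duplex instead of auto-negotiating.
HP_BTLAN_SPEED[0]=100FD
HP_BTLAN_INIT_ARGS="HP_BTLAN_STATION_ADDRESS HP_BTLAN_SPEED"
```

The switch port has to be forced to the same speed and duplex, otherwise this creates the very mismatch it is meant to prevent.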

I would next install the March 03 HWE and GoldQPK and then search the patch database for "LAN" and apply any of those patches - many of which will have been covered by HWE and/or GOLDQPK.
If it ain't broke, I can fix that.
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

Thanks to both my respondents, I have checked out both sets of ideas. I found that doubling my check on the network configuration and port settings was very important as it has narrowed my focus. Here's what I know now:

1. The LAN0 port and the switch port are both set correctly to 100FD Manual. During another test run, my network folks monitored the port and saw only a decrease in utilization, but NO errors.

2. The problem is also NOT the source host. I had the network guy monitor the port for this host and it was also under-utilized. In fact, during an attempted transfer that had petered down to about 100K/sec, I started another transfer from a known good Tru64 host and it galloped away at 10500K/sec and completed the transfer. Hence, the problem is not the source host or its network port.

I tried some of Michael's command suggestions, but so far I have not been able to determine anything useful. I will do more soon.

-Russ
Bill Hassell
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

As with all network communication, the remote end is just as important as the local end. ftp runs over TCP, which adapts to the network speed and allows multiple packets to be in flight with deferred ACKs in order to reduce the overhead on WANs. The source machine must eventually wait for ACKs to return before continuing, and these come from the remote system. Since the Tru64 box seems to work OK, you may want to time transfers in both directions as well as verify the settings at the remote end. The buffer cache only helps when the request rate for filesystem records is very high (100Mbit LANs are very slow compared to disks).

Check lanadmin on both ends after zeroing the stats, then transferring the file. Check syslog to see if anything is getting reported.


Bill Hassell, sysadmin
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

MORE new info:

I have now tried this on my other LIKE box under similar conditions and am duplicating the problem. YEA! At least this means it's something in my configuration soup, 'cause I built both of these boxes up the same way.
Michael Steele_2
Honored Contributor
Solution

Re: File Transfer Starts fast, slows to a crawl

Sounds like a flaky router. Try pinging with different packet size and count and look for packet loss variations:

ping ip 500 100

ping ip 1000 100

ping ip 1500 100

ping ip 2000 100

If you see loss, run traceroute and re-route. Eliminate the router with a crossover cable, another port, or another router.
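
The size sweep above could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up, the ping invocation copies the HP-UX "ping host size count" form used above (other systems take different arguments), and the summary is assumed to contain text of the form "N% packet loss".

```shell
# Extract the packet-loss percentage from a ping summary read on stdin.
# Assumption: the summary contains "N% packet loss".
loss_pct() {
    sed -n 's/.*[, ]\([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Run the sweep against the host given as $1 (not invoked here).
# Uses the HP-UX argument order shown above: ping host size count.
ping_sweep() {
    for size in 500 1000 1500 2000; do
        printf '%s bytes: %s%% loss\n' "$size" \
            "$(ping "$1" "$size" 100 | loss_pct)"
    done
}
```

Loss that grows with packet size would point at something along the path rather than at either host.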

Using VLAN tagging? Cisco router?
Steven E. Protter
Exalted Contributor

Re: File Transfer Starts fast, slows to a crawl

There is a known problem between HP NIC cards and Cisco switches.

Besides following Clay's advice above, have the admin of the switch set the port for 100 BaseT Full Duplex, Manual.

As Clay says, auto-negotiate and HP-9000 servers don't mix. I have been a victim of this on at least two occasions, one of which cost me weeks of diagnosis.

netstat -I lan0 3

This will give you more statistics, which at this point may just add to the frustration.

In my situation, without /etc/rc.config.d/hpbtlanconf set up, after setting the switch to manual, and booting my box, it came up half duplex.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: File Transfer Starts fast, slows to a crawl

If A. Clay and I are correct about the cause of this problem, you should see the same problem with large NFS and Samba transfers.

I may be having the same problem right now with my primary production environment. Our switch admin keeps wanting to go auto-negotiate and we seem to get a new core switch environment every fiscal quarter.

SEP
Ron Kinner
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

Do you have PHNE_22727?

http://www1.itrc.hp.com/service/patch/patchDetail.do?patchid=PHNE_22727&context=hpux:800:11:11

The notes say that the 100BT driver randomly drops packets which might explain the slowness you are seeing.

Ron
rick jones
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

I'm going to disagree with Rule Number 1. I would not hardcode until I had determined that there was indeed an autoneg problem and that it could not be resolved by patches to the host or updates to the switch. I myself have had virtually no problems with autoneg thus far, using a mix of ProCurve and Alteon switches (Alteon in 100BT). Perhaps my life is charmed.

How to guess from the host if there is a duplex mismatch? If the lanadmin -g mibstats stats show the link to be full duplex and there are FCS errors, there may be a duplex mismatch. If the lanadmin description shows the link to be half-duplex and there are late-collisions, there may be a duplex mismatch.
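
That rule of thumb can be written down as a small check. This is a sketch only: the function name guess_mismatch is hypothetical, and the three values are assumed to be read by hand from the lanadmin output rather than parsed from it.

```shell
# Apply the rule of thumb above.
# Arguments: duplex ("full" or "half"), FCS error count, late-collision count.
guess_mismatch() {
    duplex=$1 fcs=$2 late=$3
    if [ "$duplex" = full ] && [ "$fcs" -gt 0 ]; then
        echo "possible duplex mismatch (full duplex with FCS errors)"
    elif [ "$duplex" = half ] && [ "$late" -gt 0 ]; then
        echo "possible duplex mismatch (half duplex with late collisions)"
    else
        echo "no mismatch indicated by these counters"
    fi
}
```

Note the word "possible": FCS errors can also come from a bad cable, port or card, so this only narrows the search.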

It would be good to look at the netstat statistics (preferably on both sides) and see what the retransmission rates are. You may find ftp://ftp.cup.hp.com/dist/networking/briefs/annotated_netstat.txt helpful, and "beforeafter" from ftp://ftp.cup.hp.com/dist/networking/tools/

It might also be good to eliminate any possibility of filesystem issues by using something other than FTP to test the link - say netperf - http://www.netperf.org/
there is no rest for the wicked yet the virtuous have no pillows
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

First of all, THANKS to all who have taken time to give feedback to my problem, I truly appreciate this!

I am working on the suggestions from M. Steele, who suggested that it might be router trouble. As to the physical LAN configuration, I am connected to a 10/100 switch in the same switch 'fabric' that the source host is in, which eliminates the likelihood that a router is creating the problem; it is just not involved, as the switch SHOULD be forwarding all packets to the other known host directly.

My network support guy tried a series of pings with progressive packet sizes as Mike suggested and was able to essentially duplicate the problem. He is looking into this but agrees that the next step is to isolate the two systems, if possible, by putting a crossover cable between them and rerunning the test. I am working on that at this time and will let you all know what I find...

-Russ
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

ALSO:
The system is connected to an ENTERASYS Matrix switch. Beyond that, I don't know the router, but as mentioned in the previous thread, I don't think we're touching the router. I'll try to get that data nonetheless.

-Russ
James A. Donovan
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

I've had similar problems in the past with FTP, and it turned out to be related to the setting of the ip_pmtu_strategy value in the ip stack.

Use ndd to see what your value is.

# ndd -get /dev/ip ip_pmtu_strategy

If it is set to 0 then you should change it to 1.

# ndd -set /dev/ip ip_pmtu_strategy 1

You can "permanently" change this by adding an entry for it in your /etc/rc.config.d/nddconf file.
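
A sketch of what that nddconf entry might look like, assuming the file has no earlier entries (so index 0 is free; use the next unused index otherwise):

```shell
# /etc/rc.config.d/nddconf (sketch; index 0 assumes no earlier entries)
TRANSPORT_NAME[0]=ip
NDD_NAME[0]=ip_pmtu_strategy
NDD_VALUE[0]=1
```

After the next boot, the ndd -get command above should report 1 if the entry was picked up.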
Remember, wherever you go, there you are...
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

Here's the latest:
Connecting the two systems via a crossover cable, I had good results: I ran the transfer 5 times with no degradation. The moment I connected back to my LAN, the problem returned. My network guys are going to isolate me down to a single blade and see if the problem disappears. I'm guessing it might. As I'm leaving for the day, I won't know until I try tomorrow.

Thanks all!
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

 
rick jones
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

certainly the error count is large enough to suggest that _something_ is not completely well with the setup. it could be that there is indeed a duplex mismatch, it could be a bad cable, or bad port or bad card even.

it might be nice to post the netstat -p tcp output - I suspect you will have seen places with packets being received out of order - it would be nice to see if that count matches the FCS error count or is larger
Shannon Petry
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

99% of the time I see this type of error, the switch is auto-negotiating the port rate.

I too run Ciscos, and am actually looking at a matrix of 256 ports of them ;)

My Solaris box has no problems auto-negotiating with a Cisco switch, but every one of my HPs has had its port manually set in the switch.

The symptom you are seeing is quite common. You start with a good connection, then slowly it degrades. The more you utilize your line, the faster the degradation.

Most systems only check at start-up how to set the NIC (full/half, 100/10), but for some reason the HPs do this frequently and change. For this reason, on a 100FD link I have (note the word HAVE, HP?) to force the settings in /etc/rc.config.d/hpbase100conf.

I am not sure why the HP then starts changing the speed it tells the switch, but it does. I have seen port speed/duplex flip flop all over when the HP is utilizing the lan. Your network people should be able to monitor that specific port and see the speeds changing as well as the duplex when the HP is trying to run full bore.

Anyway, my long-winded explanations lead to 2 things.

1. Force the duplex and speed in /etc/rc.config.d/hpbase100conf

2. Force the port speed and duplex on the Cisco

Regards,
Shannon
Microsoft. When do you want a virus today?
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

I checked and found that the counts don't match, the FCS is at 256, the out of order count is over 23,000. Does this tell you more?

-Russ
rick jones
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

It tells me that a boatload of frames are being lost places other than the local NIC. A check of the switch stats for the port would be in order, and then work your way back to the traffic source(s).
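
The comparison behind that conclusion can be sketched as below. It is a rough lower bound only, assuming each FCS error on the local NIC accounts for at most one out-of-order segment; the function name is hypothetical, and 23000 stands in for the reported "over 23,000".

```shell
# Estimate how many out-of-order segments cannot be explained by frames
# the local NIC discarded for FCS errors.
# Arguments: FCS error count, out-of-order segment count.
unexplained_out_of_order() {
    fcs=$1 ooo=$2
    if [ "$ooo" -gt "$fcs" ]; then
        echo $((ooo - fcs))
    else
        echo 0
    fi
}

unexplained_out_of_order 256 23000
```

With the counts posted above, well over 22,000 out-of-order segments are left unaccounted for by local FCS errors.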
Shannon Petry
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

To second what Rick stated, and to fall back on my earlier statements: the switch is the most likely culprit here. Your NIC is already forced into 100FD mode; it looks like your switch port is not.


Try to ping the IP of the switch, and see if you still get the dropped frames. This limits real traffic between the HP and the switch.

Regards,
Shannon
rick jones
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

And the pinging of the IP of the switch should happen while other traffic is flowing between the switch and the host on that link. A ping by itself on an otherwise idle link will not have packet losses when there is a duplex mismatch. (Insidious isn't it ?-)

Of course, if the switch is set to Auto-neg, you could also try putting the HP box back to autoneg and see if that fixes the problem - just make sure your networking guys (man I just _love_ that division of labor in these situations...) aren't changing the config on the switch at the same time...

auto, auto => happy
auto, full => unhappy
auto, half => lucky
full, auto => unhappy
full, full => happy
full, half => unhappy
half, auto => lucky
half, full => unhappy
half, half => happy
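
The table above can also be expressed as a lookup, which may be handy when checking a list of host and switch port settings. A minimal sketch with a made-up function name; the two arguments are the settings on the two ends of the link, and since the table is symmetric their order does not matter.

```shell
# Look up the expected outcome for a pair of duplex settings,
# following the table above. Each argument is auto, full or half.
duplex_outcome() {
    case "$1,$2" in
        auto,auto|full,full|half,half)           echo happy ;;
        auto,half|half,auto)                     echo lucky ;;
        auto,full|full,auto|full,half|half,full) echo unhappy ;;
        *) echo "unknown setting" >&2; return 1 ;;
    esac
}
```

For example, duplex_outcome full auto prints unhappy, which is the forced-host, auto-switch case under suspicion in this thread.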

"lucky" because the autoneg will fail and a failed autoneg means the auto side goes to half-duplex (if it adheres to the spec)
Russ Park
Frequent Advisor

Re: File Transfer Starts fast, slows to a crawl

Rick/Shannon:
If frames are being lost, none of my other stats are showing this. I'm waiting on my LAN guys to tell me I can reboot, as soon as I can, I'll try pinging the switch.
-Russ
John Bolene
Honored Contributor

Re: File Transfer Starts fast, slows to a crawl

We had a problem like this when the superdome was first brought up.

The network guys had the port set at auto and we set the card at full like we always do. Something did not agree: short transfers were fast, but long transfers got slower and slower. Running landiag showed a bunch of alignment and FCS errors amounting to 3-10% of the packets transferred.
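
A percentage like that can be worked out from two counters with a one-liner. This is a sketch with a hypothetical function name; both counts are assumed to be taken over the same interval.

```shell
# Error rate as a percentage of packets transferred.
# Arguments: error count, packet count.
err_pct() {
    awk -v e="$1" -v p="$2" \
        'BEGIN { if (p > 0) printf "%.1f\n", 100 * e / p; else print "n/a" }'
}
```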

We finally set the card at auto and it took off like a bat-out-of-heck, finally being fast with all transfers and no errors.
It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com
A. Clay Stephenson
Acclaimed Contributor

Re: File Transfer Starts fast, slows to a crawl

I want to re-emphasize that it is a very good idea to hard set the speed/duplex on BOTH ends unless using Gigabit - which must be set to Auto-negotiate.

I suggest that you do a search on "between 35 and 41 meters" and "100BaseT".

At least 1 HP patch references the problem, PHNE_21883, and I've also found 3com references. This is a problem that has nothing to do with hardware but rather with the negotiation protocol standard itself. Cable lengths between 35 and 41 meters literally become a crapshoot. At those cable lengths, the connections only work well by accident.

If I get a chance, I may look at the IEEE site and get a better understanding of the physics.

Once the settings are hard set to known matching values, that part of the problem is eliminated and you can then move on to other possible problem areas.