Re: tcp: lost connection

 
Carlo Henrico_1
Regular Advisor

tcp: lost connection

I have two L1000 HP-UX servers between which I rcp files frequently. Lately, when I rcp files (about half a MB each), I get the message "tcp: Lost connection" after about 40 or so files. I have a second session open which pings the other server the whole time and shows a steady 2ms response time with no interruption at all. There is nothing in either server's system log.

Any ideas please?

Thanks

Carlo
Live fast, die young - enjoy a good looking corpse!
9 REPLIES
Rammig Claus
Frequent Advisor

Re: tcp: lost connection

Hi Carlo,

perhaps you should run the command netstat -p tcp to look at dropped connections.

Best regards ...
Claus
No risc no fun
Ross Zubritski
Trusted Contributor

Re: tcp: lost connection

Carlo,

What type of interfaces are you using? If they are 100BT, ensure that the cards are set to full duplex as well as the switch port.

RZ
James A. Donovan
Honored Contributor

Re: tcp: lost connection

mmmm...how long has the rcp job been running when it dies? Is it a single rcp job or multiple? If single, then it may be as simple as the TMOUT environment variable being set, and the terminal session then expires because of lack of "feedback".

Try setting TMOUT=0, and see what happens...
Remember, wherever you go, there you are...
Carlo Henrico_1
Regular Advisor

Re: tcp: lost connection

Jim

I tried TMOUT=0 and that allowed about 300 files through, then it stopped.

Rammig - the netstat -p tcp output is attached. Any guidance on what I can determine from that?

Currently the status is as follows. After the 300 files were copied, the two machines involved could not ping one another anymore. I can however access both of them from a third machine. This normally takes a few minutes to "resolve itself" and then they can ping one another again.

Thank you so far.

Carlo
Live fast, die young - enjoy a good looking corpse!
Ron Kinner
Honored Contributor

Re: tcp: lost connection

Carlo,

The only thing I see in your netstat -p tcp files is that the second box is seeing a very high rate of out of order packets compared to the first box. I would look for a duplex mismatch causing a high rate of collisions.

Look at

lanadmin
lan
display

(if you have more than one NIC you may need to change the ppa number so you are looking at the correct NIC)

Look for high rates of collisions or for FCS errors (on the second page of the report), which would indicate a duplex mismatch or other problem with the circuit. If you manually set the duplex, make sure you set the switch duplex at the same time. Also, if you are using the btlan3 or btlan4 drivers, make sure you have the latest patches. The earlier versions had some really odd problems, one of which was a tendency to get full of packets and stop working.

Ron
rick jones
Honored Contributor

Re: tcp: lost connection

To check the duplex issue, use lanadmin -g mibstats - if the interface shows half-duplex, duplex mismatch can be indicated by the presence of _late_ collisions. If the interface reports full-duplex, then you look for FCS errors.

Both ends of the cable have to be set the same way - both auto, both half, or both full. You cannot have auto on one side and half|full on the other.
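
The two rules of thumb above can be written down as a small check. This is only a sketch in Python: the function names, the setting strings and the idea of typing the counters in by hand are assumptions for illustration, not anything lanadmin itself provides.

```python
def settings_compatible(host_setting, switch_setting):
    """Both ends must match: both auto, both half, or both full."""
    valid = {"auto", "half", "full"}
    if host_setting not in valid or switch_setting not in valid:
        raise ValueError("setting must be 'auto', 'half' or 'full'")
    return host_setting == switch_setting


def mismatch_suspected(reported_duplex, late_collisions, fcs_errors):
    """Late collisions point to a mismatch on half duplex, FCS errors on full.

    Assumption: any non-zero counter is flagged. On a real link you would
    weigh the counter against the amount of traffic before concluding anything.
    """
    if reported_duplex == "half":
        return late_collisions > 0
    if reported_duplex == "full":
        return fcs_errors > 0
    raise ValueError("reported_duplex must be 'half' or 'full'")


# Auto on one side and a forced setting on the other is not a valid pairing.
print(settings_compatible("auto", "full"))
# Half duplex with no late collisions gives no sign of a mismatch.
print(mismatch_suspected("half", late_collisions=0, fcs_errors=0))
```

The counters themselves still have to be read from the lanadmin statistics for the right ppa and passed in manually.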

Also, the netstat on the one system shows a bunch of connections failing with no listener. Either a client or three are misconfigured, or someone is portscanning the box, or perhaps there is a duplicate IP on the network.
there is no rest for the wicked yet the virtuous have no pillows
Steven E. Protter
Exalted Contributor

Re: tcp: lost connection

lanadmin -x 0

0 for lan0
1 for lan1

etc

It appears you may have a physical problem with the network.

ping the target box and the source box and see if the times are consistent.

Then
traceroute back and forth and see if there is a long latency on any of the hops.

Any issues, see network administration.

You might also want to consider replacing rcp with scp from the secure shell packages.

I've included a link and a cookbook for you, in case you want to use something secure for the file transfer. scp will work without passwords if you follow the document attached.

https://payment.ecommerce.hp.com/cgi-bin/swdepot_parser.cgi/cgi/try.pl?productNumber=T1471AA&date=

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Carlo Henrico_1
Regular Advisor

Re: tcp: lost connection

Thanks for the advice so far.

The FCS errors and late collisions on both are 0 or < 3. However, I do get the following on the one machine:

Single Collision Frames = 149015
Multiple Collision Frames = 354794
Deferred Transmissions = 15502

and I see it is running half-duplex.

Any ideas please?

Thanks

Carlo
Live fast, die young - enjoy a good looking corpse!
Ron Kinner
Honored Contributor

Re: tcp: lost connection

Collisions are normal on half duplex circuits. It's hard to make a judgement on whether yours are excessive without knowing how many packets were sent. Cisco's official line says to keep collisions under 0.1% of total packets sent, but I have seen many links which operate at 1% without any real problems.
http://www.cisco.com/en/US/products/hw/voiceapp/ps967/products_administration_guide_chapter09186a0080080bb0.html
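
To put a number on that, the collision rate can be worked out from the counters once the outbound packet count is known. A minimal Python sketch follows. The function names and the `packets_sent` figure are made up for illustration (the real count was never posted in this thread); only the two collision counters come from Carlo's output.

```python
def collision_rate_percent(single, multiple, packets_sent):
    """Return collided frames as a percentage of packets sent.

    Simplification: each single- or multiple-collision frame is counted once.
    """
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return 100.0 * (single + multiple) / packets_sent


def classify(rate_percent, target=0.1, tolerable=1.0):
    """Compare a rate with the 0.1% guideline and the 1% often seen in practice."""
    if rate_percent <= target:
        return "within guideline"
    if rate_percent <= tolerable:
        return "above guideline, often tolerable"
    return "excessive"


# Counters posted earlier in the thread.
single_collisions = 149015
multiple_collisions = 354794

# HYPOTHETICAL value: substitute the outbound packet count from the same report.
packets_sent = 100_000_000

rate = collision_rate_percent(single_collisions, multiple_collisions, packets_sent)
print(f"{rate:.3f}% -> {classify(rate)}")
```

Both numbers need to come from the same interface and the same counting period, otherwise the percentage means nothing.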

Wouldn't hurt to ask the switch to verify that it is also running at half duplex on that port but since you are not getting late collisions this is probably not a problem. Sometimes the switch can tell you about other errors it is seeing on that port.

There is an HPUX command called linkloop

http://www.doc.ic.ac.uk/~mac/manuals/hpux-manual-pages/hpux/usr/man/man1m/linkloop.1m.html

which will do a good test on the LAN if you set -n to a high value like 1000. Note this only works with systems which are on the same LAN. If they are on different LANs then use the MAC address of the router on that LAN. If you find errors then look for a new driver, replace the card, cable, switch.

However, the odd thing you reported where a third party can ping them both but they can't ping each other shows a higher layer problem.

I'd first verify that there is not a duplicate IP address. Disconnect the Network cable from one of the boxes and then ping its IP address from another box on the same LAN. A response indicates a duplicate IP address. Connect up the cable and repeat with the other box.

Next if the two boxes are not on the same subnet check the routing table.

netstat -rnv

Verify that the default gateway is present in the table. 11.0 has this cute dead gateway detection which can bite you if the router decides not to reply to pings or is just too busy to respond. You can turn this off with ndd but you may need a patch. I know our 11.0 system does not show the option in ndd -h, but I don't know that it still wouldn't work. ndd -set /dev/ip ip_ire_gw_probe 0 will turn it off. Also check that the masks are correct.

Also are you running any sort of routing protocol? Perhaps something is changing there.



Ron