Operating System - HP-UX

Re: FTP timeout/performance problems

 
Jim Griffiths
Advisor

FTP timeout/performance problems

Hi,

We have a real-time system running on a 6-node N-Class. A large number of files, 20,000+ per day, are ftp'd continuously from a 3rd party supplier to this box down a dedicated 1Mb link. The files range in size from 20 bytes to 1Mb, and tend to arrive every few minutes in batches that also vary in size. The files are sent from an HP L-class, via a shell script, but I don't have access to this system. So that the receiving system at our end (N-Class) doesn't try to process partially sent files, each file in the batch has "tmp" at the front of the file name; this is removed via a rename when the whole batch has been sent.

The problem is that occasionally the sending s/w "hangs" and is automatically reset, leaving a large number of tmpxxxx... files around that are never renamed, causing us major problems.

I appreciate that not having access to the sending box makes diagnosis somewhat difficult, to say the least, but has anyone got any ideas of things I might try and/or suggest to the supplier?

Pinging the remote server while a batch is being received results in round trip times of 500-1000ms, so the link does appear to saturate; there are no errors in syslog.log. Note that other customers don't seem to have quite the same problem as we have.

I was wondering whether it could be to do with kernel parameters, because a colleague of mine says that Oracle recommend upping certain parameters where there is very heavy traffic, e.g. web servers and the like, but I don't see any errors?

Any help much appreciated,

Thanks,

Jim
If you need a miracle, play for it (BRIDGE)
17 REPLIES 17
Ron Kinner
Honored Contributor

Re: FTP timeout/performance problems

We have had problems with ftp sessions never completing on one end while the other end thought they were done. Turned out to be an EMI-sensitive NIC.

Also check
lanadmin
lan
display

Look for errors and a large percentage of collisions - don't skip the second page of the report. Maybe it's something simple like a bad cable or a duplex mismatch.

netstat -p tcp

will show you errors and also some information about disconnects and their causes.

netstat -a | grep ftp
will show you the state of the ftp connections. Do you have lots of them stuck in FIN_WAIT_2 or in ESTABLISHED with data in the queues?
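
To tally the connection states from that netstat output, a minimal Python sketch follows. It assumes the usual column order (protocol, Recv-Q, Send-Q, local address, foreign address, state) and that ftp ports are shown by service name; the function name is made up for illustration and the parsing may need adjusting for your netstat.

```python
from collections import Counter


def summarize_ftp_connections(netstat_text):
    """Count TCP states for ftp connections and list those with queued data.

    Assumes lines shaped like:
        proto recv-q send-q local-address foreign-address state
    which is an assumption about the netstat output format.
    """
    states = Counter()
    queued = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers, UDP lines and listeners without a state
        recv_q, send_q, local, remote, state = fields[1:6]
        if ".ftp" not in local and ".ftp" not in remote:
            continue  # not an ftp control or ftp-data connection
        states[state] += 1
        if recv_q.isdigit() and send_q.isdigit():
            if int(recv_q) > 0 or int(send_q) > 0:
                queued.append((local, remote, state, int(recv_q), int(send_q)))
    return states, queued
```

Feed it the text of `netstat -a` captured during a batch; many entries in `queued` that stay put between runs would suggest stalled transfers rather than idle sessions.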

Get your network guy involved. He should look for errors on the WAN link and drops at the input queues at each end of the WAN. Sounds like the link is filling up and is being hogged by large data transfers. He may need to add custom queueing (at the entrance to the WAN link on each end of the WAN) to give your traffic guaranteed bandwidth. Get rid of priority queueing if used. Unless ftp is the highest priority, it can easily happen that a higher priority transmission wipes you out.

You can tune ftp a little through the ftpd entry in /etc/inetd.conf; see the man page.

Also ndd can be used to tune a few tcp parameters. ndd -h |grep tcp will give you a list of them. ndd -h "parametername" will give you a little help writeup on each one. Dangerous to play with these tho unless you understand tcp.

You can always just write a script which deletes any "tmp" file older than x minutes.
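
A minimal sketch of such a script in Python, assuming the files carry the "tmp" prefix described in the original post and that "older" means the modification time. The function name and parameters are illustrative only. It only lists candidates; whether to delete or rename them is left to the caller.

```python
import os
import time


def stale_tmp_files(directory, max_age_minutes, prefix="tmp", now=None):
    """Return paths of regular files named prefix* that have not been
    modified for more than max_age_minutes."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (name.startswith(prefix)
                and os.path.isfile(path)
                and os.path.getmtime(path) < cutoff):
            stale.append(path)
    return stale
```

Calling `stale_tmp_files("/path/to/incoming", 30)` from cron would give the list to act on; the directory and the 30 minute threshold are placeholders, and the threshold should be comfortably longer than the slowest normal batch.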

Ron
Telia BackOffice
Valued Contributor

Re: FTP timeout/performance problems

Maybe a full duplex / half duplex problem. I have seen a defective switch creating the same problem.

Use lanadmin to check the FD/HD setting.

BR,
Jannik
rick jones
Honored Contributor

Re: FTP timeout/performance problems

indeed most of the interesting TCP stats will be on the sender.

the bit about checking for errors is a good one. i'm not as convinced that high collision rates would necessarily be a problem - unless they are "late collisions" which are really mis-named errors.

short of finding the reason for the dropped connections (probably a rexmit limit reached on the sender) you might consider a clean-up heuristic.

some binary tcpdump traces (-w filename) could be interesting to examine with tcptrace/xplot

assuming the receiving FTP sessions eventually timeout, you could (probably) detect a file from such a failed session with fuser. ostensibly, if the ftpd has gone away, an fuser on one of those "tmp" files would have no process id listed and so could be considered an orphan.
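
a rough Python sketch of that fuser check, assuming fuser is on the PATH and prints the process ids on standard output with everything else on standard error. The function names and the injectable `run` parameter are illustrative, and fuser may need root privileges to see another user's ftpd.

```python
import re
import subprocess


def pids_holding(path, run=subprocess.run):
    """Return the process ids that fuser reports for path.

    Assumes fuser writes PIDs to stdout and the file name and access
    letters to stderr, so stderr is discarded.
    """
    result = run(["fuser", path],
                 stdout=subprocess.PIPE,
                 stderr=subprocess.DEVNULL,
                 universal_newlines=True)
    return [int(pid) for pid in re.findall(r"\d+", result.stdout)]


def is_orphan(path, run=subprocess.run):
    """True when no process has the file open, i.e. the ftpd is gone."""
    return not pids_holding(path, run)
```

Combining `is_orphan` with an age check avoids flagging a file in the short window between the end of a transfer and the rename.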
there is no rest for the wicked yet the virtuous have no pillows
Jim Griffiths
Advisor

Re: FTP timeout/performance problems

I don't think it's the NIC, cos there are other apps on the box and I would expect to see problems elsewhere; I think it is related to the transient periods, 5-10mins+, of very high traffic. Actually I'm almost guaranteed to get this problem when the 3rd party server has been down for any length of time and is brought back online; it then attempts to send a backlog and "catch up". This also takes an inordinate length of time, so for that reason alone I am trying to justify the cost of upgrading the link to, say, 2Mb. Hopefully this will also help the "timeout" problem, but it would be nice if I had a bit more than a gut feeling! But thanks for your suggestions, I've now got a few things to look at when it next occurs. And thanks Rick for the fuser suggestion, I think I can use that.

Regards,

Jim
If you need a miracle, play for it (BRIDGE)
Klaus Crusius
Trusted Contributor

Re: FTP timeout/performance problems

You could consider a replacement for your ftpd.
For example wu-ftpd, or the newer proftpd (www.proftpd.org/).

Best regards,
Klaus
There is a live before death!
Richard Darling
Trusted Contributor

Re: FTP timeout/performance problems

Jim,
I had similar problems on an L1000 11.0, and they were resolved when I applied patch PHNE_23949.
RD
Jim Griffiths
Advisor

Re: FTP timeout/performance problems

Richard,

I've just checked and unfortunately(!) that patch is already installed, but thanks, that would have been a quick hit.

Jim
If you need a miracle, play for it (BRIDGE)
rick jones
Honored Contributor

Re: FTP timeout/performance problems

10 minutes is something of a magic number to the transport - it is the value of tcp_ip_abort_interval, which is the maximum amount of time the TCP stack will wait for an ACK of successive retransmissions of the same TCP sequence number.

now, that is only when the HP TCP is actively transmitting something - which would likely only be the control connections. and even then, the only transmission I can think of on the FTP control connection that would have an abort with a file in place would be the transfer completion message prior to the receipt of the rename command.

by any chance are these orphaned files actually complete but just not renamed? I suppose I've missed a few other places where the HP side could have initiated the abort.

still, you might try increasing the value of tcp_ip_abort_interval as an experiment - and check what its analog is on the remote.

also make sure that the remote TCP can back-off its retransmission timers to something reasonable - say 60+ seconds.
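
to see how the ten minute limit and the back-off cap interact, here is a simplified Python model of a retransmission timer that doubles up to a cap. the starting timeout, the cap and the function name are assumptions for illustration, not measured values from either stack, and real TCP timers are driven by measured round trip times.

```python
def retransmit_times(initial_rto, max_rto, abort_interval):
    """Return the times (seconds after the first send) at which a segment
    is retransmitted, with the timeout doubling up to max_rto, until the
    next retransmission would fall at or beyond abort_interval."""
    times = []
    elapsed = 0
    rto = initial_rto
    while elapsed + rto < abort_interval:
        elapsed += rto
        times.append(elapsed)
        rto = min(rto * 2, max_rto)
    return times
```

with a 1 second starting timeout capped at 60 seconds, `retransmit_times(1, 60, 600)` gives 14 retransmissions before a 600 second abort interval expires; doubling the interval to 1200 seconds gives 24. each of those has to get through the congested link for the connection to survive.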

also, you mention that the ping times get into the 1000+ millisecond range. setting tcp_smothed_rtt_enabled to 1 might be interesting.

also, with the wu-ftpd there are ways to limit the max number of sessions of a given defined class at a given time - you might consider putting the brakes on the remote system by limiting its max number of simultaneous sessions accepted by the HP box.

capping the max number of simultaneous sessions may help make the transfers go faster overall - I know that goes against getting the link upgraded to 2Mbit :) but showing the boss you can fix things and still save money is sometimes good :) :) :)
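
a back-of-the-envelope Python sketch of why the cap can help, assuming the "1Mb" link carries 1,000,000 bits per second, is shared equally between sessions, and has no protocol overhead or loss. all of that is idealised, and the function name is made up.

```python
LINK_BITS_PER_SECOND = 1000000  # assumption: "1Mb link" = 1,000,000 bit/s


def transfer_seconds(file_bytes, sessions, link_bps=LINK_BITS_PER_SECOND):
    """Best-case seconds to move one file when `sessions` transfers
    share the link equally."""
    per_session_bps = link_bps / sessions
    return file_bytes * 8 / per_session_bps
```

under these assumptions a 1,000,000 byte file takes 8 seconds on its own and 160 seconds when 20 sessions run at once. the total work is the same either way, but each session is alive, and exposed to drops and timeouts, for far longer.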
there is no rest for the wicked yet the virtuous have no pillows
Jim Griffiths
Advisor

Re: FTP timeout/performance problems

Rick,

Absolutely right, the orphaned files look as if they are complete. It looks like the mput works but the rename fails for some reason. What do you have in mind?

Many Thanks,

Jim
If you need a miracle, play for it (BRIDGE)
rick jones
Honored Contributor

Re: FTP timeout/performance problems

i'm guessing that the overload situation is causing some packets to be dropped consistently for ten minutes and so the local TCP drops back and punts the connection

you could try increasing tcp_ip_abort_interval - that will make the local TCP wait longer before giving-up on the TCP connection.

however, I think I prefer the route of setting-up the ftpaccess file to limit the number of sessions outstanding at any one time, so you do not have quite the same size of thundering herd beating on the poor 1 Mbps link.
there is no rest for the wicked yet the virtuous have no pillows
Anil C. Sedha
Trusted Contributor

Re: FTP timeout/performance problems

Hi Jim,

You may increase the TCP window size so that more data can be in flight before an acknowledgement is needed.

Also, someone mentioned above asking your network guy to allocate guaranteed bandwidth instead of shared bandwidth. This will help your data flow in a continuous stream instead of being affected the way it is right now. You may run tcpdump to view this. Another good idea is to run glance or webadmin for this. Webadmin can be had for free.

Regarding the style of FTP, to remove the hassle of renaming, try this. Use FTP software like CUTEFTP on the supplier side and ask him to configure it for resend. This will be a great help. I believe it would be a great thing for you if he automates the sending of data. If he is running unix, he can still use samba to enable a windows based file transfer.

Regards,
Anil
If you need to learn, now is the best opportunity
Jim Griffiths
Advisor

Re: FTP timeout/performance problems

Thanks Chaps,

Rick, I don't think the ftpaccess file will help in this case, cos virtually all ftp activity is coming from this one source, i.e. the one username, and as I understand it ftpaccess works on a username or group basis?

So think I'll try adjusting the tcp_ip_abort_interval value, probably doubling it and see what happens.

Thanks,

Jim
If you need a miracle, play for it (BRIDGE)
Ron Kinner
Honored Contributor

Re: FTP timeout/performance problems

If your files are complete but not being renamed perhaps the problem is just in the renaming script and not in the data transfer? What exactly does the script look like? Can you capture any error messages the script generates?

Ron
David_246
Trusted Contributor

Re: FTP timeout/performance problems

Hi Jim,

I've seen this problem more often when a firewall was used. For some reason, if the config is not set up correctly, it times out after a specific time without leaving messages in the regular files. It only leaves them in the firewall messages file :)
Of course this firewall does not have to be installed on your side of the ftp session; it can be on the other side as well.
I have seen the exact problem with Sunscreen. If the remote server is using this exact product, let me know and I can give you the details about how to solve this issue.

Regs David
@yourservice
Sandip Ghosh
Honored Contributor

Re: FTP timeout/performance problems

Hi Jim,

You can look at the router level also. In our case, connections to the server were getting dropped, and nothing was logged anywhere.

What actually happened was that the router was set to weighted fair queueing, so whenever it was overloaded it used to drop packets. We then changed it to first-in-first-out and now it is not dropping a single connection. If you have a CISCO router, I think you can also set the priority for ftp, since you are facing the problem with ftp only.

Sandip
Good Luck!!!
rick jones
Honored Contributor

Re: FTP timeout/performance problems

iirc, the ftpaccess stuff allows you to define "groups" of sessions, and limit the number of sessions of any one group type. so, you would define a group based on that one user name and say that only 5 or however many sessions could be going at one time.
there is no rest for the wicked yet the virtuous have no pillows
Jim Griffiths
Advisor

Re: FTP timeout/performance problems

Thanks guys,

Rick, thanks for the clarification I'll look at that.
It is going through a firewall somewhere so I'll check that as well David.
Ron, the problem in all this is I don't have access to the sending box; the only thing I know is it's not some proprietary product but some simple scripts they've written themselves. Anil, data is also sent to their other customers, so I don't think there's much scope in suggesting another product; having said that, I'll see if they'll agree to at least send me a copy of the scripts.

Any further thoughts gratefully received, but thanks for all your efforts. I've got quite a few things to look at now, and if I do get a definitive answer I'll let you know.

Regards,

Jim
If you need a miracle, play for it (BRIDGE)