
John Waller
Esteemed Contributor

ftp failures

Hi,
I have an interesting problem: an automatic FTP script occasionally (12 times out of 50,000 over the last 2 days) appears to lose the data portion of an FTP transfer.
We have an application which creates small files of between 400 and 500 bytes and FTPs them to a remote machine across a WAN. Occasionally these files are received with a size of 0, though we can verify that they were created with a non-zero size on the local machine. Has anybody seen this before, or can anyone explain how it can occur? I have tried everything I know to manually re-create the situation, without success. We use individual "put" commands within ftp to place the files onto the remote system, and we are not getting any indication of failure in syslog.log.
Michael Schulte zur Sur
Honored Contributor

Re: ftp failures

Hi,

can you rule out that the files were transmitted before they were completely written, i.e. picked up just after creation while they were still size zero?

greetings,

Michael
John Waller
Esteemed Contributor

Re: ftp failures

The files are created in a temporary directory, then we use mv to move them to the transfer directory before initiating the transfer. As we do not issue the mv in the script until the file creation has completed, we can be certain the file was non-zero before transfer.
Jean-Louis Phelix
Honored Contributor

Re: ftp failures

Hi,

You are sure that the file has been created before the 'mv', but are you sure that the transfer is not initiated *before* the 'mv' finishes? The files are small, but if your I/Os are frozen for a very short time ... Of course this can only occur if the source and target directories are not in the same filesystem (in that case, mv = cp + rm). You could transfer the file by mv to the target directory using a leading dot in the name, then mv the file *within the same filesystem* to remove the leading dot, which would be immediate in that case.
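A minimal sketch of that dot-rename pattern, using a scratch directory in place of the thread's /u/transfer/outbound (the file name and producer step are made up):

```shell
# Stand-in for the transfer directory; the real script would use
# /u/transfer/outbound instead of a scratch directory.
outbound=$(mktemp -d)
mkdir "$outbound/tmp"

# Hypothetical producer step: write the file somewhere the ftp
# script never looks.
printf 'payload' > "$outbound/tmp/msg001.dat"

# Step 1: move into the transfer directory under a dotted name.
# This may degrade to cp + rm if the filesystems differ, but the
# dotted name keeps the half-written file out of sight.
mv "$outbound/tmp/msg001.dat" "$outbound/.msg001.dat"

# Step 2: rename within the same filesystem -- an atomic rename,
# so the ftp script only ever sees a complete file.
mv "$outbound/.msg001.dat" "$outbound/msg001.dat"
```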

Regards.
It works for me (© Bill McNAMARA ...)
John Waller
Esteemed Contributor

Re: ftp failures

The source and target for the mv are in the same filesystem. We move from /u/transfer/outbound/tmp to ../ (i.e. /u/transfer/outbound).
John Palmer
Honored Contributor

Re: ftp failures

Does the ftp process itself display any sort of error message?
John Waller
Esteemed Contributor

Re: ftp failures

Looking at the script, it appears that, due to the large number of transfers, the person who wrote the script directed 2> (i.e. standard error) to /dev/null. I have put in a request to have this directed to a file so we can see any errors. That's one of the problems of having programmers who aren't sysadmin-aware writing ftp scripts.
Bill Hassell
Honored Contributor

Re: ftp failures

You'll also want to turn on detailed ftp logging in /etc/inetd.conf but beware that syslog will grow rapidly with all the options turned on:

-l logs all sessions
-L logs all commands

and in the /var/adm/syslog/xferlog file:


-i logs all files received
-o logs all files sent

xferlog may be the most useful (see man xferlog) along with adding -v to the ftp command in your script and saving both stdout as well as stderr from ftp.
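For reference, a hypothetical /etc/inetd.conf entry with all four options enabled (the HP-UX ftpd path is assumed; check your existing entry before editing):

```
# /etc/inetd.conf -- ftpd with session (-l), command (-L) and
# transfer (-i/-o) logging enabled:
ftp  stream tcp nowait root /usr/lbin/ftpd  ftpd -l -L -i -o
```

After editing, `inetd -c` makes inetd reread its configuration.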


Bill Hassell, sysadmin
rick jones
Honored Contributor

Re: ftp failures

If there are transport- or network-level problems with the file transfers, you might be able to establish a correlation between things like connection aborts reported in netstat -p tcp and your "failed" FTP transfers.

As for getting rid of the 2> stuff in the script, you could direct the stderr output to a tempfile that is deleted if the transfer is determined to have completed successfully.
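A sketch of that idea (the wrapper name is made up): run the transfer with stderr captured to a tempfile, and only delete the file when the command succeeded:

```shell
# run_with_errlog CMD [ARGS...]: run CMD with its stderr captured in a
# tempfile; delete the tempfile on success, report its path on failure.
run_with_errlog() {
    errlog=$(mktemp /tmp/ftp_err.XXXXXX) || return 1
    if "$@" 2> "$errlog"; then
        rm -f "$errlog"           # clean run: nothing worth keeping
    else
        echo "stderr kept in $errlog" >&2
        return 1
    fi
}

# e.g.  run_with_errlog ftp -inv remotehost < batch.ftp
```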
there is no rest for the wicked yet the virtuous have no pillows
Brian Watkins
Frequent Advisor

Re: ftp failures

We have seen this happen randomly on several systems under very similar circumstances to your own.

After several weeks of continuous packet sniffing and combing through VERY LARGE sniffer logs, we could find no root cause or technical explanation for the 0-byte files occurring.

FTP is TCP-based, and TCP is (in theory, anyway) connection-oriented, so packet losses should never occur. In the real world, they do happen from time to time. Since FTP is essentially a "dumb" application and has no built-in error detection/correction routines of its own, packets do occasionally get dropped, resulting in 0-byte files on the target host.

We found that the best way to prevent 0-byte files is to build a couple of layers of error detection and correction into the FTP scripts themselves, verifying that the target and source files are the same size before moving on to the next transfer. If they don't match, resend the files.
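One way to sketch that check in the shell. The host, login, file name and the availability of a SIZE command on the remote ftpd are all assumptions, so the remote half is left as a commented outline:

```shell
# sizes_match LOCAL REMOTE: true only if both sizes are equal and non-zero.
sizes_match() {
    [ "${2:-0}" -gt 0 ] && [ "$1" -eq "$2" ]
}

# Stand-in for the real 400-500 byte transfer file:
printf 'demo payload\n' > msg001.dat
local_size=$(wc -c < msg001.dat)

# Hypothetical remote-size lookup (requires an ftpd that answers SIZE
# with a "213 <bytes>" reply):
# remote_size=$(ftp -inv remotehost <<EOF | awk '/^213 /{print $2}'
# user ftpuser secret
# size msg001.dat
# quit
# EOF
# )
# sizes_match "$local_size" "$remote_size" || resend msg001.dat
```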

This probably isn't the answer you were looking for, but after pulling our hair out for two and a half weeks, our networking and Unix teams decided that error checking and correction was the fastest and easiest solution.

Good luck and Happy New Year!

Brian
Dave Johnson_1
Super Advisor

Re: ftp failures

Brian,
It is true that TCP is connection-oriented; however, that does not guarantee packet delivery. There is nothing in the TCP spec to provide this. It is up to the application to deal with lost packets in whatever way makes sense to the application.
FTP will re-transmit a packet that does not get acknowledged, but if that happens before data transfer starts, you get an empty file.

That's my 2 cents worth:
-Dave
rick jones
Honored Contributor
Solution

Re: ftp failures

A couple of TCP and FTP clarifications:

*) indeed, being "connection oriented" has nothing to do with the "reliability" of TCP

*) "reliability" in TCP is often misunderstood - many people think it means TCP guarantees data delivery. that is simply not true. what it means is that TCP guarantees notification of perceived delivery failure. a rather different thing indeed.

*) TCP increases the likelihood of data being delivered by using an ACKnowledgement and retransmission mechanism. however, TCP will not retransmit forever, and if a given data packet (aka a TCP segment) is lost often enough, TCP will abort the connection and notify the user that something was amiss

*) FTP does not do retransmissions of packets. FTP has no concept of packets. All it knows about is data into and out of a TCP connection, and the messages it exchanges on the control connection

Soooo, if you run a lot of FTP sessions, that means you run a lot of TCP connections. If there is some packet loss on the network, there is some measurable probability (we could probably work that out - roughly - based on netstat statistics from the sender(s) ) that you will have a given TCP segment dropped often enough in a row to convince TCP that it cannot get the data across. That will fail the TCP connection, which will "fail" the FTP transfer.

Since we are talking about files of only 400 to 500 bytes, those are files that only require one TCP data segment (typically) to transfer, which means that one would be left with a zero byte file. We would never get a "partial" file - unless the TCP MSS happened to be < 400-500 bytes, which is generally not going to be the case.

In HP-UX at least, some of the parameters that affect when TCP will give-up on a connection or how often it will retransmit include:

*) tcp_ip_abort_interval
*) tcp_ip_abort_cinterval
*) tcp_rexmit_interval_max

ftp://ftp.cup.hp.com/dist/networking/briefs/annotated_ndd.txt

parms will vary on other OSes...

if you are still reading and are particularly curious... :) once you determine the average packet loss probability for the network, you then need to work out how many times TCP will retransmit a segment before giving up. Once you have that figure, you raise the packet loss probability - let's call it 'p' - to that power, and that is the probability a TCP segment could be dropped that many times in a row and lead to a connection failure. basically, it is just like figuring the probability of flipping a coin and getting "tails" N times in a row. the probability of getting tails - p - is 0.5, so getting tails N times in a row will happen with a probability of p^N, or 0.5^N

This means that the probability - this time let's call it 'P' - that a segment will get through without the connection aborting is P = 1-(p^N)

Now, the probability of a given connection avoiding that fate will depend on the number of segments it must transfer. Call that number of segments M; the likelihood is then P^M, substitute, and it becomes (1-(p^N))^M
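As a worked example of that arithmetic (the loss rate and retransmission limit below are assumptions, not measurements): with a per-segment loss probability p = 0.5 and N = 12 retransmissions before TCP gives up, a one-segment transfer fails with probability p^N = 0.5^12, roughly 2.4e-4 - about 12 connections in 50,000, the same ballpark as the failure count John reported, though the real loss rate here is unknown:

```shell
# p: assumed per-segment loss probability; N: assumed retransmission
# limit before abort; M=1 segment per 400-500 byte file, so every
# connection failure shows up as a zero-byte file.
awk -v p=0.5 -v N=12 -v M=1 -v transfers=50000 'BEGIN {
    fail = 1 - (1 - p^N)^M        # per-connection failure probability
    printf "per-connection failure probability: %.8f\n", fail
    printf "expected failures in %d transfers: %.1f\n", transfers, fail * transfers
}'
```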
there is no rest for the wicked yet the virtuous have no pillows
Bruce Link
Occasional Contributor

Re: ftp failures

I don't know if it's feasible, but here's a suggestion. If you have the SSH packages installed on your servers, you could configure SFTP to conduct the transfers in lieu of FTP. It'll take a little more overhead and a little more configuration, but it may help you isolate whether FTP is indeed causing the problem, or the OS or some other mechanism. Encryption's generally not a bad thing either.

Cheers.

Bruce
Eric_287
New Member

Re: ftp failures

John,

Did you ever find a solution to your 0-byte file being written by FTP? I'm experiencing a similar problem.
John Waller
Esteemed Contributor

Re: ftp failures

Eric,
I'm afraid we never did get to the bottom of this. I can't remember exactly how it was done, but looking at internal emails from the time, I believe we ended up putting some checking into the program which performed the FTP, to check for a 0-byte file and resend. Unfortunately this piece of code is no longer in use, as both systems were consolidated onto a single server.
Eric_287
New Member

Re: ftp failures

Okay, thanks for your reply John. :-)