Monday, August 9, 2010

Transferring Files With FTP


Abstract:
FTP, with its mini macro programming language, has been around since the beginning of internet time. For decades it has been used to move files around the internet without resorting to lower-level socket programming. When transferring files, the question sometimes arises of whether a file has finished arriving on the receiving "ftpd" end. Several best practices mitigate this problem, so that the receiving end knows reliably when a transferred file is ready for batch processing through a scheduling facility such as "cron".

Option 1 - Lock Files:
One could ask the initiator to create a lock file (send-me.cpio.gz.lock), start sending the data file (send-me.cpio.gz), and then remove the lock file once the transfer has completed. The receiver's cron job can then pick up any data file that has no corresponding lock file.
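A minimal sketch of the sending side of this convention, written with Python's standard ftplib module. It assumes the caller has already connected and logged in; the function name send_with_lock is made up for illustration.

```python
import io
import os
from ftplib import FTP


def send_with_lock(ftp: FTP, local_paths: list, lock_name: str) -> None:
    """Upload local_paths while a remote lock file marks the batch as in transit."""
    # Create the empty lock file *before* any data file is sent.
    ftp.storbinary(f"STOR {lock_name}", io.BytesIO(b""))
    # Send the data: a single file, or every piece of a "split" file.
    for path in local_paths:
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {os.path.basename(path)}", fh)
    # Remove the lock only after every transfer has completed. If an upload
    # raises, this line is never reached and the lock stays in place.
    ftp.delete(lock_name)
```

For example, after ftp = FTP("ftp.example.com") and ftp.login("sender", "secret") (host and account are hypothetical), the call would be send_with_lock(ftp, ["send-me.cpio.gz"], "send-me.cpio.gz.lock"). Leaving the lock in place when an upload fails is deliberate: the receiver simply keeps waiting instead of processing a partial batch.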

This is helpful for transferring a single file, as well as multiple files when a single file has been "split" into pieces (send-me.cpio.gz.1, send-me.cpio.gz.2, send-me.cpio.gz.3, etc.). Processing of the pieces will not begin until every file in the batch has been sent and the lock file has been removed.
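On the receiving side, the cron job only needs to filter a directory listing. A minimal sketch, assuming the split pieces follow the numeric ".1", ".2", ... naming used above; the helper name ready_files is hypothetical.

```python
LOCK_SUFFIX = ".lock"


def ready_files(names: list) -> list:
    """Return data files that no lock file is guarding.

    A lock "X.lock" guards "X" itself and every split piece "X.<n>".
    """
    guarded = {n[: -len(LOCK_SUFFIX)] for n in names if n.endswith(LOCK_SUFFIX)}

    def is_guarded(name: str) -> bool:
        for base in guarded:
            if name == base:
                return True
            # A numeric suffix after "base." marks a split piece of the batch.
            if name.startswith(base + ".") and name[len(base) + 1 :].isdigit():
                return True
        return False

    # Never hand the lock files themselves to the processing step.
    return sorted(n for n in names if not n.endswith(LOCK_SUFFIX) and not is_guarded(n))
```

A cron job would typically call this on os.listdir() of its incoming directory and process only what comes back.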

Option 2 - Suffixes:
A second option is for the sender to upload the file (send-me.cpio.gz) under an extra suffix that marks it as in transit (put send-me.cpio.gz send-me.cpio.gz.work) and, once the transfer has finished, rename it within the same ftp session (rename send-me.cpio.gz.work send-me.cpio.gz). The rename is an atomic operation, so cron on the receiving platform can safely pick up any file that does not have a ".work" suffix (or only pick up files that have a ".gz" suffix!).
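Both halves of the suffix convention fit in a few lines of Python. This is a sketch under the same assumptions as before (an already logged-in ftplib connection); send_then_rename and ready_for_cron are illustrative names, not part of any existing tool.

```python
from __future__ import annotations

import os
from ftplib import FTP

WORK_SUFFIX = ".work"


def send_then_rename(ftp: FTP, local_path: str) -> None:
    """Sender: upload under a ".work" name, then rename to the final name."""
    final_name = os.path.basename(local_path)    # e.g. send-me.cpio.gz
    work_name = final_name + WORK_SUFFIX         # e.g. send-me.cpio.gz.work
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {work_name}", fh)  # put send-me.cpio.gz send-me.cpio.gz.work
    ftp.rename(work_name, final_name)            # rename send-me.cpio.gz.work send-me.cpio.gz


def ready_for_cron(names: list[str], wanted_suffix: str | None = None) -> list[str]:
    """Receiver: skip in-transit files, or keep only one final suffix."""
    if wanted_suffix is None:
        return sorted(n for n in names if not n.endswith(WORK_SUFFIX))
    return sorted(n for n in names if n.endswith(wanted_suffix))
```

Filtering on the final suffix (".gz") is the stricter of the two receiver rules, since it also ignores any stray files of other types.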

This option is often very helpful for the occasional transfer of a single large file, where the integrity of the file is important, but people don't want to add too much complexity.

Option 3 - Work Directories:
A third option, if one does not want to change the file names, is to have the initiator place the files in a temporary directory (/temp) and then move them into the production directory (/prod) within the same ftp session (rename /temp/send-me.cpio.gz /prod/send-me.cpio.gz). The cron job looks only in the production directory; anything found there is known to be completely transferred, because the move is an atomic operation as long as both directories are on the same filesystem.
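A sketch of the sending side with ftplib. The "/temp" and "/prod" paths are taken from the text and are assumed to be paths as the FTP session sees them (often relative to the account's home or chroot); send_via_work_dir is a hypothetical name.

```python
import os
from ftplib import FTP


def send_via_work_dir(ftp: FTP, local_path: str,
                      work_dir: str = "/temp", prod_dir: str = "/prod") -> str:
    """Upload into work_dir, then move the finished file into prod_dir."""
    name = os.path.basename(local_path)
    work_path = f"{work_dir}/{name}"
    prod_path = f"{prod_dir}/{name}"
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {work_path}", fh)  # incomplete while it sits here
    # ftplib's rename() issues the same RNFR/RNTO pair as ftp's "rename"
    # command; the server performs the actual move between directories.
    ftp.rename(work_path, prod_path)
    return prod_path
```

The receiver's cron job then needs no filtering at all: every entry in the production directory is complete.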

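This approach is very helpful when large numbers of small files need to be transferred. Be aware, though, that a directory which has held a great many entries can grow aggressively in size (often loosely described as the "inode" growing), slowing down all processing in the temporary or production directory. On many Unix filesystems a directory does not shrink again when its files are removed, so it needs an occasional rebuild once it is empty (rmdir /temp; mkdir /temp) to bring it back down to size.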
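The rebuild can be scripted so that it only ever touches an empty directory. A minimal sketch, assuming it runs as the directory's owner at a quiet time when no transfers are in flight; rebuild_empty_dir is a hypothetical helper, and it restores permission bits but not ownership.

```python
import os
import stat


def rebuild_empty_dir(path: str) -> bool:
    """Recreate an empty directory so its on-disk size starts small again.

    Returns False, and changes nothing, if the directory still has entries.
    """
    if os.listdir(path):
        return False  # never remove a directory that still holds files
    mode = stat.S_IMODE(os.stat(path).st_mode)  # remember the permission bits
    os.rmdir(path)    # raises OSError if a file sneaked in after the check
    os.mkdir(path)
    os.chmod(path, mode)  # mkdir's mode is filtered by the umask, so set it explicitly
    return True
```

Using rmdir rather than a recursive remove is the safety net here: if a file arrives between the check and the removal, rmdir fails instead of deleting it.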

Option 4 - Multiple Files:
A fourth option deals well with transferring many files (mput) from an initiating system when the receiver wants to process them as they arrive. If there is a directory holding a large number of files (file1.Z, file2.Z, file3.Z, file4.Z, ...), the initiator can create an additional empty marker file with a known suffix for each one (file1.Z.CoMpLeTe, file2.Z.CoMpLeTe, file3.Z.CoMpLeTe, file4.Z.CoMpLeTe, ...) and then initiate the "mput". Each marker must arrive only after its data file has been completely sent; when mput sends the files in sorted name order this happens naturally, because file1.Z sorts before file1.Z.CoMpLeTe. The receiver can have "cron" jobs looking for the suffix ("CoMpLeTe"), process the corresponding original file, and, once processing completes, delete the marker file.
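Both ends of the marker convention, sketched in Python. Rather than relying on mput's ordering, the sender below uploads each marker explicitly right after its data file. The function names, and the handle callback standing in for whatever processing the receiver does, are hypothetical.

```python
from __future__ import annotations

import io
import os
from ftplib import FTP
from typing import Callable

MARKER = ".CoMpLeTe"


def send_with_markers(ftp: FTP, local_paths: list[str]) -> None:
    """Sender: upload each file, then an empty marker announcing it is complete."""
    for path in local_paths:
        name = os.path.basename(path)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {name}", fh)
        # The marker goes out only after its data file has been fully sent.
        ftp.storbinary(f"STOR {name}{MARKER}", io.BytesIO(b""))


def process_completed(directory: str, handle: Callable[[str], None]) -> list[str]:
    """Receiver (run from cron): process every file whose marker has arrived."""
    done = []
    for entry in sorted(os.listdir(directory)):
        if not entry.endswith(MARKER):
            continue
        data_name = entry[: -len(MARKER)]
        data_path = os.path.join(directory, data_name)
        if not os.path.isfile(data_path):
            continue  # marker with no data file: leave it for someone to inspect
        handle(data_path)  # hypothetical per-file processing step
        os.remove(os.path.join(directory, entry))  # purge the marker afterwards
        done.append(data_name)
    return done
```

Because the marker is removed only after handle() returns, a file whose processing fails keeps its marker and is simply retried on the next cron run.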

This is especially helpful when transfers of many files from multiple sources may overlap and the receiving end wants to process each file as close to real time as possible.

Advanced Automation:
If the senders are newbies to the internet and have worked very little with FTP on the initiating or sending end, there are ways to help them along.

With "ftp", you can build macros on the sending end (defined with "macdef", typically in the sender's ~/.netrc file, and run with "$ macro-name") so that logging in, renaming, moving files, creating and removing lock files, and logging out can each be reduced to a single macro command, further removing complexity on the sending end.
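This is not ftp's own macro language, but the same idea of bundling login, a fixed sequence of steps and logout into one call can be sketched in Python for senders who script their side. The run_macro name and the step list are illustrative assumptions; ftp_factory exists only so the function can be exercised without a real server.

```python
from __future__ import annotations

from ftplib import FTP
from typing import Callable, Iterable


def run_macro(host: str, user: str, password: str,
              steps: Iterable[Callable[[FTP], None]],
              ftp_factory: Callable[[str], FTP] = FTP) -> None:
    """Log in, run each step in order, then log out, all as one call."""
    with ftp_factory(host) as ftp:  # leaving the block sends QUIT and closes
        ftp.login(user, password)
        for step in steps:
            step(ftp)

# Example (host, account and paths are hypothetical):
# run_macro("ftp.example.com", "sender", "secret", [
#     lambda ftp: ftp.cwd("/incoming"),
#     lambda ftp: ftp.rename("send-me.cpio.gz.work", "send-me.cpio.gz"),
# ])
```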

The receiver can write the macros, send them to the file senders, and maintain the ftp macro code as well. FTP itself can be used to update those remote macro files: upload the new version under a temporary name, use one "rename" to swap out the old macro file, and a second "rename" to swap the new macro file into production.
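A sketch of that two-rename swap with ftplib, assuming the receiver has FTP access to the directory holding the sender's macro file. The ".new" and ".old" names are a hypothetical convention, not something ftp requires.

```python
import io
from ftplib import FTP, error_perm


def swap_remote_file(ftp: FTP, remote_name: str, new_content: bytes) -> None:
    """Replace a remote file (such as a macro file) using two renames."""
    staged = remote_name + ".new"   # hypothetical naming convention
    retired = remote_name + ".old"  # hypothetical naming convention
    # Upload the new version next to the old one; production is untouched so far.
    ftp.storbinary(f"STOR {staged}", io.BytesIO(new_content))
    try:
        ftp.delete(retired)  # clear a leftover copy from an earlier swap
    except error_perm:
        pass  # nothing to clear
    ftp.rename(remote_name, retired)  # first rename: retire the old file
    ftp.rename(staged, remote_name)   # second rename: new file into production
```

Between the two renames the production name briefly does not exist, so a sender starting ftp at that exact moment would find no macros; keeping the old copy as ".old" makes rolling back a single rename.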

Conclusion:
When files need to be sent regularly from a source to a destination, FTP is a good choice, provided the sender cooperates with the receiver by following one of these conventions.
