First of all, I want to make clear that I have always liked FTP, and I still prefer it to uploading my files through a web interface. But I have now found quite a big reason why FTP really, really sucks.
And here it is: FTP was clearly not designed for uploading 7000 files.
Right now I am uploading a Joomla website for a client. The package consists of around 7000 small files, and it has been uploading for around 4 hours so far.
The problem is not really that FTP uses two separate connections, which in itself already causes some problems (although these are quite manageable), but how the second connection, the data connection, is used.
For every single file, a new data connection is opened, and once the file has been transferred it is closed again.
Now, that doesn’t really matter if you transfer only a few files, or big files. But it does matter when you transfer lots of tiny files, because then opening and closing the connection can take just as long as, or even longer than, the actual transfer.
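To make this concrete, here is a minimal Python sketch of how a client typically uploads a batch of files with the standard library's ftplib. The upload_files helper, as well as the host and credentials in the usage comment, are hypothetical; the point it relies on is that ftplib.FTP.storbinary() opens a new data connection for every call and closes it again once that one file has been sent.

```python
import io


def upload_files(ftp, files):
    """Upload each {remote_name: data} entry with its own STOR command.

    ftp is expected to behave like ftplib.FTP. Every storbinary() call
    opens a fresh data connection, sends the bytes and closes it again,
    so uploading N files means N connection setups and teardowns.
    Returns the number of files sent.
    """
    sent = 0
    for remote_name, data in files.items():
        # One STOR = one new data connection (PASV or PORT, connect, send, close).
        ftp.storbinary(f"STOR {remote_name}", io.BytesIO(data))
        sent += 1
    return sent


# Usage sketch (hypothetical host and credentials):
#   from ftplib import FTP
#   with FTP("ftp.example.com") as ftp:
#       ftp.login("user", "password")
#       upload_files(ftp, {"index.html": b"<html></html>"})
```

With 7000 entries in files, that loop sets up and tears down 7000 data connections, no matter how small each file is.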
Looking at my transfer I see a lot of this:
templates/beez/html/com_poll/index.html: 44.00 B 141.87 B/s
templates/beez/html/com_poll/poll/index.html: 44.00 B 149.99 B/s
templates/beez/html/mod_newsflash/index.html: 44.00 B 143.82 B/s
templates/beez/html/mod_search/index.html: 44.00 B 151.15 B/s
templates/beez/html/com_user/index.html: 44.00 B 143.46 B/s
templates/beez/html/com_user/remind/index.html: 44.00 B 156.59 B/s
templates/beez/html/com_user/login/index.html: 44.00 B 139.53 B/s
templates/beez/html/com_user/register/index.html: 44.00 B 155.67 B/s
templates/beez/html/com_user/user/index.html: 44.00 B 130.48 B/s
templates/beez/html/com_user/reset/index.html: 44.00 B 140.22 B/s
templates/beez/html/com_newsfeeds/index.html: 44.00 B 137.64 B/s
Lots of files that are only 44 bytes! The MTU is a lot higher than that, so each of these files easily fits into a single packet. And "a lot higher" is quite an understatement: PPPoE has an MTU of 1492 bytes (Ethernet's 1500 minus the 8-byte PPPoE header).
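Now think about it:
Connecting to the server: 3 packets (the TCP three-way handshake)
Transmission: 1 packet
Closing the connection: 4 packets (a FIN and an ACK in each direction)
Instead of just 1 packet, 8 need to be sent! That’s an overhead of 700%, and that doesn't even count the commands FTP exchanges on the control connection for every file (PASV or PORT, STOR, and the server's replies).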
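To put rough numbers on this for the whole upload, here is a back-of-the-envelope Python sketch. It is only an estimate under stated assumptions: it counts handshake, data and teardown packets only, ignores ACKs for data and all FTP control-connection traffic, assumes a full TCP teardown of four segments, and assumes an MSS of 1452 bytes (the PPPoE MTU of 1492 minus 40 bytes of IPv4 and TCP headers). The function names are hypothetical.

```python
import math


def packets_per_file_connections(n_files, handshake=3, data=1, teardown=4):
    """Rough packet count when every file gets its own TCP connection.

    Assumes each file is small enough to fit into a single data packet.
    """
    return n_files * (handshake + data + teardown)


def packets_single_stream(n_files, file_size, mss=1452, handshake=3, teardown=4):
    """Rough packet count when all files are streamed over one connection.

    Small files are sent back to back, so many of them share one packet.
    Per-file metadata in the stream is ignored here.
    """
    payload = n_files * file_size
    return handshake + math.ceil(payload / mss) + teardown


per_file = packets_per_file_connections(7000)  # 7000 files, one connection each
stream = packets_single_stream(7000, 44)       # 7000 files of 44 bytes, one stream
print(per_file, stream)  # 56000 220
```

Even with these simplifications, opening one connection per file costs roughly 250 times as many packets as a single stream for this upload.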
So how could this overhead be avoided? Easy: stream all the files in one go and let the server (or the client, when downloading) split them up again.
This could be done in two steps: first transfer the file info (file names, file sizes and whatever other metadata you want to send), then send all the files back to back.
That way the connection doesn’t need to be opened and closed thousands of times, which would speed up transfers of many files a lot.
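Here is a minimal Python sketch of that two-step idea. The format is made up for illustration and is not an existing protocol: a 4-byte length, a JSON manifest listing each file's name and size, then all file contents back to back. In practice the packed bytes would be written to one single connection; for simplicity the sketch builds and parses the whole stream in memory, and pack_files and unpack_files are hypothetical names.

```python
import json
import struct


def pack_files(files):
    """Serialize {name: bytes} as: 4-byte manifest length, JSON manifest, bodies.

    Step one is the manifest (names and sizes); step two is the file
    contents, concatenated in the same order with no separators.
    """
    manifest = [{"name": name, "size": len(data)} for name, data in files.items()]
    header = json.dumps(manifest).encode("utf-8")
    return struct.pack("!I", len(header)) + header + b"".join(files.values())


def unpack_files(stream):
    """Split a stream produced by pack_files() back into {name: bytes}."""
    (header_len,) = struct.unpack_from("!I", stream, 0)
    offset = 4
    manifest = json.loads(stream[offset:offset + header_len].decode("utf-8"))
    offset += header_len
    files = {}
    for entry in manifest:
        # The sizes from the manifest tell us exactly where each file ends.
        end = offset + entry["size"]
        files[entry["name"]] = stream[offset:end]
        offset = end
    if offset != len(stream):
        raise ValueError("stream length does not match manifest")
    return files
```

Because the manifest gives every size up front, the receiver needs neither a new connection nor a separator byte to know where one file ends and the next begins. A real receiver should also validate the names (for example, reject ".." components) before writing anything to disk.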
It probably won’t have much of an impact on uploads of big files, or of just a few files, but seeing how open-source packages like Joomla, Zen Cart, … are getting more and more popular, and how they consist of thousands of files, this could actually help people quite a bit.
FTP is old, and an adequate technology for today’s file-transfer needs has to be found.
It may be SCP. I would need to check whether SCP uses only one connection for all the files, but I would say it does. Another obvious advantage of SCP would, of course, be encryption.
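For comparison, here is a rough sketch of how the classic scp protocol frames files, as far as I know it: everything travels through one SSH connection, and each file is preceded by a control line such as "C0644 44 index.html" (mode, size, name), followed by exactly that many bytes and a NUL byte. This is a one-way sketch of the sending side only: the receiver's NUL acknowledgements and the D/E records used for directories are left out, the function name is hypothetical, and recent OpenSSH releases (9.0 and later) run scp over the SFTP protocol by default instead.

```python
def scp_style_records(files, mode=0o644):
    """Frame {base_name: bytes} the way the classic scp sender does.

    Each file becomes a control line "C<mode> <size> <name>\\n", followed by
    exactly <size> bytes of content and a single NUL byte. Names are plain
    base names; directories would need extra D/E records, omitted here.
    """
    out = bytearray()
    for name, data in files.items():
        out += f"C{mode:04o} {len(data)} {name}\n".encode("utf-8")
        out += data
        out += b"\0"
    return bytes(out)
```

Like the manifest idea, the metadata travels in the same stream as the data, so no extra connection is needed per file.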
But maybe we just need something completely new.
Either way, the biggest problem would probably be getting hosting providers to offer it. Seeing how old SSH (and thus SCP) is, and how few hosting providers offer it, it’s unlikely that we’ll get rid of FTP any time soon, no matter what it’s replaced with.