Easily transfer entire local directories to Amazon S3 using s3-parallel-put

A couple of weeks ago I needed to upload a large number of files to Amazon S3. We’re talking about lots of nested directories and ~100 GB. So, after entering panic mode for a couple of seconds, I turned to our trusty Google and kindly filled its input with “transfer from local to amazon s3” (well, I don’t really remember what I searched). I was not feeling very hopeful until I found s3-parallel-put, which seemed to do just what I needed.

Here’s the repo: https://github.com/mishudark/s3-parallel-put

It’s a smart little Python script that does just that: it transfers possibly huge amounts of files to Amazon S3. And yes, it can parallelize the workload, which makes it blazing fast.

It has a couple of dependencies:

# Make sure to have pip updated.
# You may need to use sudo
apt-get update && apt-get -y install python-pip
pip install boto
pip install python-magic

Then, to install it, you just have to download the thing and make it executable:

curl https://raw.githubusercontent.com/mishudark/s3-parallel-put/master/s3-parallel-put > s3-parallel-put
chmod +x ./s3-parallel-put

It needs the AWS credentials as environment variables, which you can easily set:

export AWS_ACCESS_KEY_ID=<blablablablablablabla>
export AWS_SECRET_ACCESS_KEY=<blebleblebleblebleblebleblebeble>
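
Before starting a long upload, it can be worth confirming that both variables are actually set in your current shell. The snippet below is only a minimal sketch: check_aws_env is a hypothetical helper name of my own and is not part of s3-parallel-put. It reports which variable is missing without printing the secret values.

```shell
# Hypothetical helper (not part of s3-parallel-put): fail early if a
# credential variable is missing, without echoing the secret values.
check_aws_env() {
    missing=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Read the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    # Returns 0 when both variables are set, 1 otherwise
    return "$missing"
}

if check_aws_env; then
    echo "AWS credentials are set"
fi
```

Note that this only checks that the variables are non-empty, not that the keys are valid.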

And, finally, you fire it up like this:

# This is considering that the script is in the current directory
./s3-parallel-put --bucket=<enter-destination-bucket-name-here> --bucket_region=us-west-2 --put=update --processes=30 --content-type=guess --log-filename=./s3pp.log /path/to/source/directory

You can do a dry run with --dry-run.

You can speed up the upload using --put=stupid. It won’t check if the object already exists, thus making fewer calls. Use with caution.

You can grant public read access to objects with --grant=public-read.

You may have noticed that you can specify a log file, which is really handy because sometimes stuff happens. But you may also end up with an enormous log file, so here is a quick grep to search for any errors: grep "ERROR" s3pp.log
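
If the log is huge, the raw grep output can be overwhelming too. Here is a small sketch that counts the matching lines and shows only the last few. The function name summarize_errors is hypothetical, and it assumes, like the grep above, that failures are logged on lines containing the word ERROR.

```shell
# Hypothetical helper: summarize lines containing "ERROR" in a log file.
summarize_errors() {
    log_file="$1"
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the count of 0.
    count=$(grep -c "ERROR" "$log_file" || true)
    echo "ERROR lines: $count"
    # Show only the five most recent matches
    grep "ERROR" "$log_file" | tail -n 5 || true
}

# Run it against the log file used in the command above, if it exists
if [ -f ./s3pp.log ]; then
    summarize_errors ./s3pp.log
fi
```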

And that’s all. It has a lot more options that might come in handy depending on your needs, so I encourage you to go and check it out.

Thanks for reading, and I hope you find this as useful as I did.

Let me know in the comments if you have any tips.
