A couple of weeks ago I needed to upload a large number of files to Amazon S3: we’re talking about lots of nested directories and ~100 GB. So, after entering panic mode for a couple of seconds, I turned to our trusty Google and typed in something like “transfer from local to amazon s3” (well, I don’t really remember exactly what I searched). I was not feeling very hopeful until I found s3-parallel-put, which seemed to do just what I needed.
Here’s the repo: https://github.com/mishudark/s3-parallel-put
It’s a smart little Python script that does exactly that: transfer potentially huge numbers of files to Amazon S3. And yes, it can parallelize the workload across multiple processes, which makes it blazing fast.
It has a couple of dependencies:
# Make sure pip is installed.
# You may need to use sudo
apt-get update && apt-get -y install python-pip
pip install boto
pip install python-magic
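Before moving on, it may be worth checking that both modules actually import. The helper below is a hypothetical sketch, not part of s3-parallel-put: `check_modules` and the `PYTHON` variable are names made up for illustration, and it assumes the interpreter is called `python` (override `PYTHON` if yours is named differently). Note that the `python-magic` package is imported as `magic`.

```shell
# Hypothetical helper: try to import each Python module and report failures.
# PYTHON defaults to "python"; set it to whichever interpreter will run
# s3-parallel-put (an assumption, e.g. PYTHON=python2).
check_modules() {
  local missing=0
  for mod in "$@"; do
    if ! "${PYTHON:-python}" -c "import $mod" 2>/dev/null; then
      echo "missing: $mod" >&2
      missing=1
    fi
  done
  return "$missing"
}

# boto is imported as "boto"; python-magic is imported as "magic".
check_modules boto magic && echo "dependencies OK"
```

A nonzero exit status means at least one module failed to import, and the missing names are printed to stderr.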
Then, to install it, you just have to download the thing and make it executable:
curl https://raw.githubusercontent.com/mishudark/s3-parallel-put/master/s3-parallel-put > s3-parallel-put
chmod +x ./s3-parallel-put
It needs the AWS credentials as environment variables, which you can easily set:
export AWS_ACCESS_KEY_ID=<blablablablablablabla>
export AWS_SECRET_ACCESS_KEY=<blebleblebleblebleblebleblebeble>
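Since the script picks the credentials up from the environment, a quick check before starting a long upload can save you a confusing failure later. A minimal sketch, assuming the two variable names above; `require_aws_credentials` is a hypothetical helper, not something the script provides. It uses `printenv` so that only exported variables count, since those are the ones a child process like s3-parallel-put can see.

```shell
# Hypothetical helper: fail early if either AWS variable is unset or empty
# in the exported environment.
require_aws_credentials() {
  local rc=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    if [ -z "$(printenv "$var")" ]; then
      echo "error: $var is not set (did you export it?)" >&2
      rc=1
    fi
  done
  return "$rc"
}

require_aws_credentials && echo "AWS credentials found"
```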
And, finally, you fire it up like this:
# This assumes the script is in the current directory
./s3-parallel-put --bucket=<enter-destination-bucket-name-here> \
  --bucket_region=us-west-2 \
  --put=update \
  --processes=30 \
  --content-type=guess \
  --log-filename=./s3pp.log \
  /path/to/source/directory
You can do a dry run with --dry-run, which goes through the files without actually writing anything to S3.
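You can speed up the upload with --put=stupid. It doesn’t check whether an object already exists, so it makes fewer calls to S3, but it also uploads every file and overwrites any object already stored under the same key. Use with caution.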
You can grant public read access to the uploaded objects with --grant=public-read.
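If you end up running the command several times (a dry run first, then the real upload), a small wrapper keeps the common flags in one place. This is a hypothetical sketch: `s3pp_upload` and the `S3PP` variable are names made up for illustration, and the region, process count, content type, and log file simply mirror the command shown earlier. Any extra arguments, such as `--put=...`, `--dry-run`, or `--grant=public-read`, are passed straight through to the script.

```shell
# Hypothetical wrapper: s3pp_upload BUCKET SOURCE_DIR [extra s3-parallel-put flags...]
# S3PP points at the script; it defaults to ./s3-parallel-put.
s3pp_upload() {
  if [ "$#" -lt 2 ]; then
    echo "usage: s3pp_upload BUCKET SOURCE_DIR [flags...]" >&2
    return 2
  fi
  local bucket src
  bucket=$1
  src=$2
  shift 2
  "${S3PP:-./s3-parallel-put}" \
    --bucket="$bucket" \
    --bucket_region=us-west-2 \
    --processes=30 \
    --content-type=guess \
    --log-filename=./s3pp.log \
    "$@" \
    "$src"
}

# Example usage (bucket name and path are placeholders):
#   s3pp_upload my-bucket /path/to/source/directory --put=update --dry-run
#   s3pp_upload my-bucket /path/to/source/directory --put=update
#   s3pp_upload my-bucket /path/to/source/directory --put=stupid --grant=public-read
```

The wrapper deliberately leaves `--put` out of the fixed flags, so you choose the put mode explicitly on each call instead of relying on one flag overriding another.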
You may have noticed that you can specify a log file, which is really handy because sometimes stuff happens. But you may also end up with an enormous log file, so here is a quick grep to search it for errors:
grep "ERROR" s3pp.log
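With ~100 GB of files the log can get long, so it may help to see how many error lines there are before reading through them. A hypothetical helper along those lines, assuming (as the grep does) that the problems you care about are on lines containing `ERROR`; `summarize_errors` is a name made up for illustration.

```shell
# Hypothetical helper: summarize_errors [LOG_FILE] [MAX_LINES]
# Counts lines containing "ERROR" and prints the first MAX_LINES of them
# (default 20) with their line numbers. LOG_FILE defaults to ./s3pp.log.
summarize_errors() {
  local log max count
  log=${1:-./s3pp.log}
  max=${2:-20}
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps that 0.
  count=$(grep -c "ERROR" "$log" || true)
  echo "ERROR lines: $count"
  grep -n "ERROR" "$log" | head -n "$max"
}
```

For example, `summarize_errors s3pp.log 5` prints the count followed by the first five matching lines, each prefixed with its line number in the log.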
And that’s all. It has a lot more options that might come in handy depending on your needs, so I encourage you to go and check them out.
Thanks for reading, and I hope you find this as useful as I did.
Let me know in the comments if you have any tips.