Easily transfer entire local directories to Amazon S3 using s3-parallel-put

A couple of weeks ago I needed to upload a large number of files to Amazon S3, we're talking about lots of nested directories and ~100 GB. So, after entering panic mode for a couple of seconds, I turned to our trusty Google and kindly filled its input with "transfer from local to amazon s3" (well, I don't really remember exactly what I searched). I wasn't feeling very hopeful until I found s3-parallel-put, which seemed to do just what I needed.

Here’s the repo: https://github.com/mishudark/s3-parallel-put

It's a smart little Python script that does just that: transfer potentially huge numbers of files to Amazon S3. And, yes, it can parallelize the workload, making it blazing fast.

It has a couple of dependencies:

# Make sure you have pip installed and up to date.
# You may need to use sudo
apt-get update && apt-get -y install python-pip
pip install boto
pip install python-magic

Then, to install it, you just have to download the thing and make it executable:

curl https://raw.githubusercontent.com/mishudark/s3-parallel-put/master/s3-parallel-put > s3-parallel-put
chmod +x ./s3-parallel-put

It needs the AWS credentials as environment variables, which you can easily set:

export AWS_ACCESS_KEY_ID=<blablablablablablabla>
export AWS_SECRET_ACCESS_KEY=<blebleblebleblebleblebleblebeble>

And, finally, you fire it up like this:

# This is considering that the script is in the current directory
./s3-parallel-put --bucket=<enter-destination-bucket-name-here> --bucket_region=us-west-2 --put=update --processes=30 --content-type=guess --log-filename=./s3pp.log /path/to/source/directory

You can do a dry run with --dry-run.

You can speed up the upload using --put=stupid. It won’t check if the object already exists, thus making fewer calls. Use with caution.

You can grant public read access to objects with --grant=public-read.
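For example, combining those flags with the command from above, something like this should preview a public-read upload without actually transferring anything (the bucket name and source path are still placeholders):

./s3-parallel-put --dry-run --bucket=<enter-destination-bucket-name-here> --bucket_region=us-west-2 --put=stupid --grant=public-read --processes=30 /path/to/source/directory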

You may have noticed that you can specify a log file, which is really handy because sometimes stuff happens. But you may also end up with an enormous log file, so here is a quick grep to search it for any errors:

grep "ERROR" s3pp.log

And that's all. It has a lot more options that might come in handy depending on your needs, so I encourage you to go and check it out.

Thanks for reading, and I hope you find this as useful as I did.

Let me know in the comments if you have any tips.

Cherry-picking your way out of trouble

I find cherry-pick to be one of those great underutilized features of git. And maybe that’s good, because it’s mainly used to apply hot fixes.

The way it works is very simple: it lets you apply one or more commits from one branch onto another. Awesome, right?

Imagine a situation in which you have two branches, master and payments-refactor.
You're battling your way through a tough refactor and suddenly a bug emerges in production, but you find out that you've already fixed it during the refactor and have an isolated commit containing the changes. You need to replicate those changes in the master branch and redeploy the app. But copy-pasting, or manually redoing the changes, is cumbersome and probably even error prone. Well, cherry-pick comes to the rescue. It lets us replicate that single commit onto the master branch, all while preventing duplicate work and keeping our git history clean.
The only thing we need is the commit hash (abbreviated or full). We move to the branch where we want to incorporate the changes and use it like this:

git checkout master
git cherry-pick 3f75a585
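And since it works with one or more commits, you can also pass several hashes at once, or even a whole range (these hashes are just placeholders, of course):

git cherry-pick 3f75a585 9c2d4e1b
git cherry-pick 3f75a585^..9c2d4e1b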

That’s all!

I hope you have to use this as little as possible, but find it useful when the time comes.

What do you think of cherry-pick, what do you use it for?

How to keep an SSH session alive

* This is a "Has it ever happened to you…" kind of post.

Imagine you're logged in to a server doing some magical stuff. Then you go grab a coffee and when you come back… you're logged out from the server. Yes, it sucks. You have to SSH in again, cd into the same dir you were in before, etc. Ain't nobody got time for that.

What if I told you that you can keep an SSH session alive? 🚀
All you have to do is edit your ~/.ssh/config file and add the following:

Host *
    ServerAliveInterval 60

You can also target a specific host, and choose whatever interval you like. Most servers with which I have this issue have a rather low timeout, so I've chosen to send the keepalive signal every 60 seconds.
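For example, if you only wanted this for one particular server (the hostname here is just made up), the entry would look something like this:

Host myserver.example.com
    ServerAliveInterval 30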

And baam, you’ve freed yourself from this annoyance.

rockymonkey555 over on stackoverflow.com also recommends running chmod 600 ~/.ssh/config, "because the config file must not be world-readable".

I hope this is as useful to you as it’s been for me.

1st Blog Post

This is the first post, and it has a very descriptive title.

I've wanted to start a blog for some time, but always ended up postponing it. Mainly because of a generous dose of impostor syndrome, but also because I've never been good at making the time to build the blog in the first place. And, yes, as a developer I wanted my blog to be just perfect.

So, here we are. This is me forcing myself to start sharing, on a sketchy and rushed blog.

Also, I love WordPress, and this gives me an excuse to tinker with it a little more, something I haven't really done since I stopped working with it daily.

I honestly have no clue if this is going to be a weekly thing, monthly or whatever. Though it won’t be daily for sure. Let’s just see how it feels to share some thoughts, things I learn or anything else.

Talk to you in the next one!

PS 1: I think most of the next posts will be just some meticulously crafted micro tutorials/reminders of some of the code snippets I have stored just about everywhere.

PS 2: As you can see there are no comments available here, but there will be in the upcoming entries.

PS 3: See that line above where I said that "I've wanted to start a blog for some time"? Well, I've had this post drafted for 7 months.