How to recover a lost EC2 instance

28 Oct 2015

If you’re working with servers in the cloud there is a good chance you’re automating everything with a nice orchestration tool. Maybe you’re using something like Ansible, Chef or Puppet, or maybe you’ve embraced containers and are using Docker. Maybe you’re on a PaaS like Heroku.

Good for you!

And yet, you might still have to deal with a machine that is managed manually. Maybe it was set up long ago and people have forgotten about it. Maybe “it just works” and no one has ever taken the time to move it into your sleek automated infrastructure. Maybe the person who set it up left, and no one knows how to access it.

Yes, we’ve all been there.

So, what do you do when something breaks in that tiny obscure box? Or maybe nothing breaks, but you finally decide to have a look and see if something can be improved? What do you do when, suddenly, you realize that no one has access to it? That’s a problem I had to deal with a few days ago.

In my case it was an Amazon EC2 instance which, luckily, was using an EBS volume as its boot device. That made things quite easy, and the process to (re)gain access is not that different from what you would do with a bare metal server.

The step-by-step instructions are below. They assume some familiarity with AWS and skip the low-level technical hand-holding. Some tinkering with the AWS web dashboard is required.

So, here we are:

Stop the locked EC2 instance, but do not terminate it.
Detach the EBS volume.
Launch a new recovery EC2 instance in the same Availability Zone as the EBS volume (a volume can only be attached to an instance in its own zone). We’ll need to SSH into it, so:
- configure it to have a public IP,
- remember to download the private key of its key pair.
Attach the EBS volume to the new recovery instance as a secondary volume (the AWS docs on attaching an EBS volume cover the details).
SSH into the instance.
- Restart the instance if the EBS volume is not immediately available.
Mount the EBS volume. The actual data might live on a partition of the attached device rather than on the device itself (the AWS docs and the inevitable Stack Overflow question cover this). Once mounted, the boot device of the locked EC2 instance is available as a directory in the file system of the recovery instance: you can browse it and create, modify or delete any file (you’ll probably need sudo).
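As a rough sketch of the mounting step: assuming the volume shows up as /dev/xvdf and that /mnt/recovery is a fine mount point (both are placeholders, check lsblk on your own instance), something like this picks the first partition if there is one and falls back to the whole disk otherwise. The function names are made up for this example.

```shell
# Read the output of `lsblk -ln -o NAME,TYPE <disk>` on stdin and print the
# device node to mount: the first partition if there is one, else the disk.
pick_data_device() {
  awk '
    $2 == "part" && part == "" { part = $1 }
    $2 == "disk" && disk == "" { disk = $1 }
    END {
      name = (part != "") ? part : disk
      if (name == "") exit 1
      print "/dev/" name
    }'
}

# Mount the recovered volume.
# $1: disk (assumed /dev/xvdf), $2: mount point (assumed /mnt/recovery).
mount_recovered_volume() {
  disk="${1:-/dev/xvdf}"
  mount_point="${2:-/mnt/recovery}"
  device=$(lsblk -ln -o NAME,TYPE "$disk" | pick_data_device) || return 1
  sudo mkdir -p "$mount_point" &&
  sudo mount "$device" "$mount_point"
}
```

On the recovery instance you would then run `mount_recovered_volume /dev/xvdf /mnt/recovery`. If the volume holds several partitions, or uses LVM, look at the lsblk output yourself and mount the right one by hand.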
Edit the required system files.
- If the reason you’re doing this is that you couldn’t access the locked instance at all, then append a new public key to the .ssh/authorized_keys file in the home directory of the user you want to SSH as (the one on the mounted volume, not the one on the recovery instance).
- If you can SSH into the locked instance, but the problem is that you don’t have root access or sudo privileges, edit the /etc/sudoers file on the mounted volume.
- If you have SSH access to the locked machine via keypair authentication, but don’t have the user’s password and are thus prevented from sudo’ing (trust me, it happens), just add a NOPASSWD option for that user to the sudoers file. This will let you change the password later.
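The edits above can be sketched as two small shell functions, meant to be run from a root shell on the recovery instance. The mount point /mnt/recovery, the user name and the 90-recovery file name are placeholders of mine. The sudoers part also assumes that the locked system’s /etc/sudoers includes the /etc/sudoers.d directory (common, but do check); if it doesn’t, edit etc/sudoers on the volume directly instead.

```shell
# Append a public key to a user's authorized_keys on the mounted volume.
# $1: the user's home directory on the volume, e.g. /mnt/recovery/home/ubuntu
# $2: file containing the new public key
add_authorized_key() {
  ssh_dir="$1/.ssh"
  keys="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  # If the existing file does not end with a newline, add one first so the
  # new key does not get glued onto the last existing key.
  if [ -s "$keys" ] && [ -n "$(tail -c 1 "$keys")" ]; then
    echo >> "$keys" || return 1
  fi
  cat "$2" >> "$keys" && chmod 600 "$keys"
  # Appending keeps the owner of an existing file. If the file was created
  # here, chown it to the user's numeric uid as seen on the volume.
}

# Let a user run sudo without a password via a sudoers drop-in file.
# $1: mount point of the volume, $2: user name on the locked system
allow_passwordless_sudo() {
  dropin="$1/etc/sudoers.d/90-recovery"
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$2" > "$dropin" || return 1
  chmod 440 "$dropin"
}
```

For example: `add_authorized_key /mnt/recovery/home/ubuntu new_key.pub` and `allow_passwordless_sudo /mnt/recovery ubuntu`. It is worth running `visudo -c -f /mnt/recovery/etc/sudoers.d/90-recovery` afterwards, since a syntax error in a sudoers file can lock you out of sudo altogether.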
Unmount the EBS volume, log out of the recovery instance and stop it.
Detach the EBS volume, then terminate the recovery instance.
Re-attach the EBS volume to the stopped, locked EC2 instance as its boot volume, using the instance’s original root device name (for example /dev/sda1 or /dev/xvda).
Restart it.
The public IP address and hostname will usually have changed (and, outside a VPC, the private ones too), so you might need to update your DNS entries.
SSH into the (not so) locked (anymore) instance.
Profit.
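If you’d rather skip the web dashboard, the volume shuffle in the steps above can be sketched with the AWS CLI. Treat this as a sketch under assumptions: every ID and device name is a placeholder, and run is a hypothetical wrapper of mine that only prints each command unless you set RUN=1.

```shell
# Hypothetical dry-run wrapper: print the command, execute it only if RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Stop an instance (never terminate it) and wait until it is really stopped.
# $1: instance id
stop_instance() {
  run aws ec2 stop-instances --instance-ids "$1" &&
  run aws ec2 wait instance-stopped --instance-ids "$1"
}

# Detach a volume from wherever it is and attach it to another instance.
# $1: volume id, $2: target instance id, $3: device name on the target
move_volume() {
  run aws ec2 detach-volume --volume-id "$1" &&
  run aws ec2 wait volume-available --volume-ids "$1" &&
  run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}

# Typical sequence (all IDs and device names are placeholders):
#   stop_instance i-LOCKED
#   move_volume vol-BOOT i-RECOVERY /dev/sdf   # then SSH in, mount, edit, unmount
#   move_volume vol-BOOT i-LOCKED /dev/sda1    # must match the root device name
#   run aws ec2 start-instances --instance-ids i-LOCKED
```

The root device name in the third step is an assumption. You can look it up with `aws ec2 describe-instances --instance-ids i-LOCKED --query 'Reservations[0].Instances[0].RootDeviceName' --output text`. Remember to unmount the volume on the recovery instance before moving it back.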

tags: code Linux EC2 AWS server SSH cloud