Tuesday, January 02, 2007

The fast track to Amazon EC2

NOTE: The instructions below are now completely superseded. rBuilder Online can generate Amazon Machine Images directly, upload them to S3, and register them with EC2 as public AMIs, without you having to do anything. Cook. Build. Boot.

Over the holiday break I found some time to get hands-on with the Amazon EC2 service. Since it's based on the Xen hypervisor, I knew that in theory any rPath-based appliance should "just work" as an Amazon Machine Image (AMI).

The good news: it's true!

Here's the skinny on how to get any rPath-based appliance that has been built in the [xen,domU] flavor working on EC2. For this exercise, you'll need both an Amazon S3 account (since EC2 uses image files drawn from S3) and an EC2 account. The latter is harder to come by since EC2 is still in limited beta.

Make sure you've already got your AWS account ID, access key and secret access key, and have generated a cert and pk file locally. More tool instructions here.

Then, make sure you have the Amazon command-line tools for working with EC2. The 'operational' EC2 tools are written in Java, shipped as a tarball, and seem to run fine. The tools for bundling new AMI images and uploading them to S3 are written in Ruby and available for download as an RPM. (Installing these on Mac OS X is painful, by the way.)

To use a given rPath-based appliance image:

  1. Grab the Xen filesystem image from rBuilder Online. (e.g. MediaWiki Appliance)

  2. Unzip and rename it to mediawiki.img for consistency with Amazon's docs.

  3. Bundle it up using ec2-bundle-image

    ec2-bundle-image -i mediawiki.img -k pk-yourkey.pem -c cert-yourcert.pem -u your_aws_acct_id

  4. Upload it using ec2-upload-bundle

    ec2-upload-bundle -b your_s3_bucket -m /tmp/mediawiki.img.manifest.xml -a your_access_key -s your_secret_key

    If you don't already have an S3 bucket, the easy way to create one is using the AWSZone web UI

  5. Register it using ec2-register

    ec2-register your_s3_bucket/mediawiki.img.manifest.xml

    Make note of the AMI ID reported.

  6. Boot it using ec2-run-instances with the AMI ID. Wait a minute or two.

    ec2-run-instances ami-61b05508

  7. Check status and get the public DNS name using ec2-describe-instances

    ec2-describe-instances

  8. Log in to the rAA console just as you would on any local machine....

That's it!

No magic, nothing special. Just use it.

If you happen to be the developer of the appliance in question, you can get extra credit by adding a few entries to your appliance's /etc/fstab to mount the Amazon-provided sda2 and sda3 as 160GB disk space and swap partitions respectively.
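Assuming the instance layout described above (sda2 as the large ephemeral data partition, sda3 as swap), the fstab entries might look something like this; the /mnt mount point and ext3 filesystem type are assumptions, not something the EC2 docs are quoted on here:

```
# /etc/fstab additions for an EC2 instance (sketch)
/dev/sda2   /mnt   ext3   defaults   0 0
/dev/sda3   swap   swap   defaults   0 0
```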

What's the catch?

By design, all EC2 instances are ephemeral, meaning that when an instance is shut down, all local disk storage evaporates. S3 is Amazon's solution for long-lived storage. Think of it as the NAS to complement the EC2 grid.

To use EC2 for "persistent" appliances is going to take either a snapshot/backup strategy, a fault-tolerant clustered approach, or integration of perhaps the S3 FUSE filesystem driver for persistent storage needs. All good topics for further exploration...
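As a rough illustration of the snapshot/backup idea, here is a minimal Python sketch that tars up a state directory in memory and produces a timestamped object key suitable for an S3 PUT. The function name, key format, and the bucket in the usage comment are all my own invention, not anything from the post or from Amazon's tooling:

```python
import io
import os
import tarfile
import time


def make_snapshot(src_dir):
    """Tar+gzip src_dir in memory; return a timestamped key and the bytes.

    The (key, data) pair is what you would hand to an S3 PUT request.
    """
    key = "snapshot-" + time.strftime("%Y%m%d-%H%M%S") + ".tar.gz"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # arcname keeps paths inside the archive relative, not absolute
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return key, buf.getvalue()


# Hypothetical upload step, e.g. with a modern S3 client library:
# s3.put_object(Bucket="my-backup-bucket", Key=key, Body=data)
```

Run from cron inside the instance, something like this would give you coarse-grained persistence; the clustered and FUSE-based approaches trade that simplicity for lower recovery-point windows.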


  1. When you say that you lose your storage when the instance is "shutdown", I was initially thinking "on reboot" as well... you only lose your data if:

    1) the underlying hardware fails
    2) Amazon decides to move your instance for some reason known only to them
    3) you turn off your instance via the API

    I'm thinking of using EC2 in place of a traditional dedicated server and was initially worried about the ephemeral nature of it. I'm a bit less worried given the actual circumstances that would cause your instance data to go away.

    (Add in the fact that S3 provides a very inexpensive way to back up your data, and it starts to look more attractive than a typical dedicated hosting provider.)

    BTW, I haven't used the service, so please correct me if any of my assumptions are wrong.

    Thanks for the helpful post!

  2. Hardware failure, instance termination via API/cmd line and shutdown -h within the instance are all what I meant by "shutdown".

    Willfulness on the part of Amazon is also, I guess, possible. :-)

    To use EC2 for long-lived stateful servers, you're definitely going to need a backup strategy in place - one of the things I like about the latest rPath Appliance Agent is that it has a backup mechanism built in which can periodically copy your persistent state to an external sink. There's not an S3 sink plugin...yet... ;-)

  3. Anonymous (4:52 PM)

    Would you mind detailing what you did to get the AMI tools to run on Mac OS X?

  4. Happy to share the Mac OS X workaround. See yesterday's post.

  5. These instructions, while valid, are now completely superseded. rBuilder now produces AMIs directly and publishes them to EC2 for you. Said and done.