List ALL the availability zones!

Have you ever tried to google for a list of AWS Regions or Availability Zones? I do it all the time. I need to figure out which AZs my ELB should go across, or I want to try my new CloudFormation template in a different region to make sure I didn’t hardcode something I shouldn’t have. Of course, I usually blank on how many zones each region has, and I always get my directions screwed up, so I can never remember whether Ireland is eu-west or eu-east (and while it is embarrassing to admit, I had no idea where São Paulo was before I started working with the AWS platform).

However, every time I google for a list of AZs, I remember that none of the AWS documentation gives a nice, easy list of all the regions and their availability zones. AWS does add new zones and regions pretty regularly, so a list might go out of date after a few months. Still, I end up looking for these every few days, so I wrote them all out, and then put it on the internet, so hopefully next time I google for “list of availability zones” this page pops up. Maybe it’ll help you too?

Virginia — US East 1
• us-east-1a
• us-east-1b
• us-east-1c
• us-east-1d
• us-east-1e

California — US West 1
• us-west-1a
• us-west-1b
• us-west-1c

Oregon — US West 2
• us-west-2a
• us-west-2b
• us-west-2c

Ireland — EU West 1
• eu-west-1a
• eu-west-1b
• eu-west-1c

Singapore — AP Southeast 1
• ap-southeast-1a
• ap-southeast-1b

Tokyo — AP Northeast 1
• ap-northeast-1a
• ap-northeast-1b
• ap-northeast-1c

Sydney — AP Southeast 2
• ap-southeast-2a
• ap-southeast-2b

São Paulo — SA East 1
• sa-east-1a
• sa-east-1b
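Of course, since a static list like this will drift out of date, you can also pull the current list straight from the API. A quick sketch with the unified AWS CLI (this assumes the `aws` tool is installed and configured with valid credentials):

```shell
# List every region your account can see
aws ec2 describe-regions --output text

# List the availability zones within one region
aws ec2 describe-availability-zones --region us-east-1 --output text
```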


How to manually run Chef on an OpsWorks instance

While OpsWorks gives you a lot of power and flexibility for configuring your infrastructure, its most powerful feature is its ability to get you through your Netflix queue: OpsWorks takes a really long time to do anything, so you’re left twiddling your thumbs a lot. And if you’re trying to troubleshoot an issue, you can sometimes fit in an entire episode of The Next Generation between when you kick off a stack and when it finally fails. However, there are some tricks you can use to mitigate that, so let’s talk about one today.

We’ve come up with a couple of techniques to mitigate OpsWorks slowness, and one of them is to run problematic cookbooks on the failed OpsWorks instances until you can figure out a solution. When OpsWorks fails to set up an instance, it doesn’t destroy the instance; you can still SSH into it. By doing that, you can attempt to manually re-run your failed cookbook from that point, instead of having to wait for OpsWorks to recreate everything.

(Of course, to be able to do any of this, your OpsWorks instances need to be associated with an SSH keypair, or you need to have SSH permissions enabled for the stack and have a public key associated with your account in the OpsWorks settings. So make sure you’re doing that, at least during development!)

A normal workflow for us is we’ll see a cookbook failure, bang our head on our desk for making such a simple mistake, and then we’ll go fix the cookbook. After the change is committed to our repo, we’ll log into the failed OpsWorks instance and run these commands:

sudo -i
cd /opt/aws/opsworks/current/
# dump the instance’s current OpsWorks attributes to a JSON file
opsworks-agent-cli get_json > attributes.json
# re-run chef-solo with only the recipes you specify
bin/chef-solo -c conf/solo.rb -j attributes.json -o recipe[whatever],recipe[whatever_else::specific_recipe]

This will kick off Chef, but only run the recipe(s) you define. While waiting around for OpsWorks is something you can’t get rid of entirely, using this technique will let you save a bit of time (so you might not be able to finish Season 3 today).
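If you end up doing this over and over, the same commands can be wrapped in a small helper script. This is just a sketch built from the commands above (the script name and the /tmp path are our own choices, not anything OpsWorks provides); run it as root on the failed instance:

```shell
#!/bin/bash -e
# rerun-recipes.sh — re-run specific recipes on a failed OpsWorks instance
# Usage: ./rerun-recipes.sh 'recipe[whatever]' 'recipe[whatever_else::specific_recipe]'
cd /opt/aws/opsworks/current/

# Dump the instance's current attributes, exactly as OpsWorks would pass them
opsworks-agent-cli get_json > /tmp/attributes.json

# Join the recipe arguments into the comma-separated run list chef-solo expects
run_list=$(IFS=,; echo "$*")
bin/chef-solo -c conf/solo.rb -j /tmp/attributes.json -o "$run_list"
```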

Once you fix the issue, don’t forget to commit your changes to source control! Also, you’ll still need to create a whole new OpsWorks stack; the instance will still show as failed in the OpsWorks console, and features like deployments and auto-healing won’t be applied to it.


Creating a Secure Deployment Pipeline in Amazon Web Services

Many organizations require a secure infrastructure. I’ve yet to meet a customer that says that security isn’t a concern. But, the decision on “how secure?” should be closely associated with a risk analysis for your organization.

Since Amazon Web Services (AWS) is often referred to as a “public cloud”, people sometimes infer that “public” must mean it’s “out in the public” for all to see. I’ve always seen “public/private clouds” as an unfortunate use of terms. In this context, public means something more like “public utility”. People often interpret “private clouds” to be inherently more secure. Assuming that “public cloud” = less secure and “private cloud” = more secure couldn’t be further from the truth. Like most things, it’s all about how you architect your infrastructure. While you can define your infrastructure to have open access, AWS provides many tools to create a truly secure infrastructure while eliminating access for all but authorized users.

I’ve created an initial list of many of the practices we use. We don’t employ all these practices in all situations, as it often depends on our customers’ particular security requirements. But, if someone asked me “How do I create a secure AWS infrastructure using a Deployment Pipeline?”, I’d offer some of these practices in the solution. I’ll be expanding these over the next few weeks, but I want to start with some of our practices.

AWS Security

* After initial AWS account creation and login, configure IAM so that there’s no need to use the AWS root account
* Apply least privilege to all IAM accounts. Be very careful about who gets Administrator access.
* Enable all IAM password rules
* Enable MFA for all users
* Secure all data at rest
* Secure all data in transit
* Put all AWS resources in a Virtual Private Cloud (VPC).
* No EC2 Key Pairs should be shared with others. Same goes for Access Keys.
* Only open required ports to the Internet. For example, with the exception of, say, port 80, no security groups should have a CIDR source of (i.e. anywhere). The bastion host might have access to port 22 (SSH), but you should use CIDR blocks to limit access to specific subnets. Using a VPC is part of a solution to eliminate Internet access. No canonical environments should have SSH/RDP access.
* Use IAM to limit access to specific AWS resources and/or remove/limit AWS console access
* Apply a bastion host configuration to reduce your attack profile
* Use IAM Roles so that there’s no need to configure Access Keys on the instances
* Use resource-level permissions in EC2 and RDS
* Use SSE to secure objects in S3 buckets
* Share initial IAM credentials with others through a secure mechanism (e.g. AES-256 encryption)
* Use and monitor AWS CloudTrail logs
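On the credential-sharing point above, plain OpenSSL on the command line is enough for a one-off AES-256 exchange, as long as the passphrase travels over a different channel than the ciphertext. A minimal sketch (the file names, example key material, and passphrase are all placeholders; the -pbkdf2 flag assumes OpenSSL 1.1.1 or newer):

```shell
# Write the credentials to a file (placeholder values, obviously)
echo "AKIAEXAMPLEKEY:examplesecret" > creds.txt

# Encrypt with AES-256-CBC; share the passphrase out of band
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in creds.txt -out creds.txt.enc -pass pass:use-a-real-passphrase

# The recipient decrypts with the same passphrase
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in creds.txt.enc -out creds-decrypted.txt -pass pass:use-a-real-passphrase
```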

Deployment Pipeline

A deployment pipeline is a staged process in which the complete software system is built and tested with every change. Team members receive feedback as the change completes each stage. With most customers, we usually construct between four and seven deployment pipeline stages, and the pipeline only goes on to the next stage if the previous stages were successful. If a stage fails, the whole pipeline instance fails. The first stage (often referred to as the “Commit Stage”) will usually take no more than 10 minutes to complete. Other stages may take longer than this. Most stages require no human intervention as the software system goes through more extensive testing on its way to production. With a deployment pipeline, software systems can be released at any time the business chooses to do so. Here are some of the security-based practices we employ in constructing a deployment pipeline.

* Automate everything: Networking (VPC, Route 53), Compute (EC2), Storage, etc. All AWS automation should be defined in CloudFormation. All environment configuration should be defined using infrastructure automation scripts – such as Chef, Puppet, etc.
* Version Everything: Application Code, Configuration, Infrastructure and Data
* Manage your binary dependencies. Be specific about binary version numbers. Ensure you have control over these binaries.
* Lock down pipeline environments. Do not allow SSH/RDP access to any environment in the deployment pipeline
* For projects that require it, use permissions on the CI server or deployment application to limit who can run deployments in certain environments – such as QA, Pre-Production and Production. When you have a policy in which all changes are applied through automation and environments are locked down, this usually becomes less of a concern. But it can still be a requirement on some teams.
* Use the Disposable Environments pattern – instances are terminated once every few days. This approach reduces the attack profile
* Log everything outside of the EC2 instances (so that the logs can be accessed later). Ensure these log files are encrypted (e.g. stored securely in S3)
* All canonical changes are only applied through automation that is part of the deployment pipeline. This includes application, configuration, infrastructure and data changes. Infrastructure patch management would be a part of the pipeline just like any other software system change.
* No one has access to nor can make direct changes to pipeline environments
* Create high-availability systems using Multi-AZ, Auto Scaling, Elastic Load Balancing and Route 53
* For non-Admin AWS users, only provide access to AWS through a secure Continuous Integration (CI) server or a self-service application
* Use Self-Service Deployments and give developers full SSH/RDP access to their self-service deployment. Only their particular EC2 Key Pair can access the instance(s) associated with the deployment. Self-Service Deployments can be defined in the CI server or a lightweight self-service application.
* Provide capability for any authorized user to perform a self-service deployment with full SSH/RDP access to the environment they created (while eliminating outside access)
* Run two active environments – We’ve yet to do this for customers, but if you want to eliminate all access to the canonical production environment, you might choose to run two active environments at once. Engineers can then access the non-production environment to troubleshoot a problem; since it has exactly the same configuration and data, you’re troubleshooting accurately.
* Run automated infrastructure tests to test for security vulnerabilities (e.g. cross-site scripting, SQL injections, etc.) with every change committed to the version-control repository as part of the deployment pipeline.
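As one concrete example of the logging point above, shipping a log file off an instance into S3 with server-side encryption is a single AWS CLI call (the bucket name and log path here are placeholders, and the instance role is assumed to have write access to the bucket):

```shell
# Copy a log into S3, encrypted at rest with SSE (AES-256)
aws s3 cp /var/log/myapp/production.log \
  "s3://example-pipeline-logs/$(hostname)/production-$(date +%Y%m%d%H%M%S).log" \
  --sse AES256
```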


* What is a canonical environment? It’s your system of record. You want your canonical environment to be solely defined in source code and versioned. If someone makes a change to the canonical system that affects everyone, it should only be done through automation. While you can use a self-service deployment to get a copy of the canonical system, any direct change you make to that environment is isolated and never made part of the canonical system unless code is committed to the version-control repository.
* How can I troubleshoot if I cannot directly access canonical environments? Using a self-service deployment, you can usually determine the cause of the problem. If it’s a data-specific problem, you might import a copy of the production database. If this isn’t possible for time or security reasons, you might run multiple versions of the application at once.
* Why should we dispose of environments regularly? Two primary reasons. The first is to reduce your attack profile (i.e. if environments are always going up and down, it’s more difficult to home in on specific resources). The second reason is that it ensures that all team members are used to applying all canonical changes through automation, rather than relying on environments always being up and running somewhere.
* Why should we lockdown environments? To prevent people from making disruptive environment changes that don’t go through the version-control repository.


Continuous Integration in the Cloud

This article is the third part of the Cloud Delivery Blueprints series. It discusses how to go from no cloud infrastructure and no continuous integration set up to having a functioning Deployment Pipeline in Amazon Web Services. It discusses high level topics while also providing a reference implementation so it’s easy to follow along with. You can read part one here and part two here.

What we’re going to do today:

• Identify your build steps and how you unit test
• Write a script to automate your build and unit test process
• Commit to your source repo
• Make Commit Stage Go Green

So, in the last post, we got your source into version control and your Jenkins server up and running. Now what? It’s time to start building the blocks for our pipeline. First up: setting up the Commit Stage of the pipeline.

What is the Commit Stage?

The commit stage is the first real stage of the pipeline, and its goal is to build artifacts for later stages in the pipeline and run a fast suite of tests on the code to make sure the code is ready to move on to those later stages. The key word is ‘fast’; your commit stage shouldn’t take longer than it takes for your developers to go get a cup of coffee. Less than five minutes is ideal; over ten and you’re taking too long.
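Those five- and ten-minute numbers are easy to enforce mechanically rather than by feel. A sketch of a guard you might call at the end of your build script with the elapsed time (the function and thresholds below are our own, not anything Jenkins provides):

```shell
# Warn past five minutes, fail the build past ten
BUDGET_WARN=300
BUDGET_FAIL=600

check_duration() {
  local seconds="$1"
  if [ "$seconds" -gt "$BUDGET_FAIL" ]; then
    echo "commit stage too slow: ${seconds}s"
    return 1
  elif [ "$seconds" -gt "$BUDGET_WARN" ]; then
    echo "commit stage slower than ideal: ${seconds}s"
  else
    echo "commit stage within budget: ${seconds}s"
  fi
}
```

In a bash build script, calling `check_duration "$SECONDS"` as the last line works nicely, since bash tracks elapsed time in `$SECONDS` automatically.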

The different tasks that get done during the commit stage are:

• Compile or syntax check your code
• Run commit stage tests
• Static analysis of the code
• Package code into a distributable binary (if necessary)
• Preparation of any other artifacts needed by later stages of the pipeline (like test databases, or other quickly producible artifacts)
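Stitched together, those tasks make a skeletal commit stage script. The sketch below uses `bash -n` as a stand-in syntax check so it’s runnable anywhere; in a real project you’d swap in your language’s checker, test runner, and analyzer:

```shell
#!/bin/bash
# Skeletal commit stage: syntax check, fast tests, static analysis.
commit_stage() {
  local src="$1"

  # 1. Compile or syntax check every source file
  #    (bash -n here; substitute ruby -c, javac, etc.)
  find "$src" -name '*.sh' -print0 | xargs -0 -r -n1 bash -n || return 1

  # 2. Run the fast, unit-level test suite
  #    e.g. bundle exec rspec spec/ -f d

  # 3. Static analysis
  #    e.g. brakeman -o brakeman-output.tabs

  echo "commit stage passed"
}
```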

The “commit stage tests” is a bit of a nebulous term, so let’s be specific: these are the fast-running unit tests that cover a decent amount of your code base. These two goals seem to work against each other — if the tests are fast, how can they be comprehensive? At the commit stage, we’re looking for unit-test style verification; these tests should focus on testing very specific units of code, and often rely heavily on mocking out different parts of the codebase. If you can avoid interacting with outside resources, like the file system or database, that’s ideal. It’s not necessary, though, as long as your tests run quickly.

The other set of tests is static analysis. Static analysis examines the source code, looking for patterns that are likely to be faulty code, poorly performing code, etc. Most languages have some sort of static analysis tools, and some specialize in looking for certain things. For example, for Ruby apps, Reek will look for code smells (like bad variable names or lack of comments), whereas Brakeman will look for specific Ruby on Rails security holes. Which static analysis tools make sense for your project is left as an exercise for the reader, but a quick Google search of “[language name] static analysis” will most likely give you more results than you care to dig through.

Not all languages require the code to be packaged, but for the ones that do, this is the step to do it in. For example, if you have a Java project, you’ll want to package your jars/wars/ears in this step, and make those artifacts available for later stages in the pipeline.

The last thing to do in the commit stage is to set up any other resources that future stages will need. These could be test databases that contain a subset of what a production database would look like, an archive of all the images used on the site, Docker containers, etc.

The Reference Implementation’s Commit Stage

The reference implementation we’ve developed as a companion to this series comes with a commit stage build job, prepopulated with the configuration to pull down the code from a GitHub repository and run the build script. If you’ve spun up a Jenkins server (directions for that are in our last entry) and entered a custom GitHub repo, you’ll see it in the configuration. If you didn’t, it’ll point to Stelligent’s CanaryBoard project.

If you look at the configuration of the commit stage job, there are a couple of important bits to check out:

• The source control repo: this defines the repository where your source code lives. It was defined as a parameter when the Jenkins OpsWorks stack was created; by default, it points to Stelligent’s CanaryBoard repo.
• The build step: the build step is pretty simple, and calls the build script that exists in the source control repo.

Let’s look at the build script CanaryBoard uses:

#!/bin/bash -e

This is a bash shebang line; the -e flag tells the shell to stop the script and exit with an error as soon as any command within fails. This is important so that Jenkins properly interprets what happened during the build.

find . -name '*.rb' | xargs -n1 ruby -c > /dev/null

Since this is a Ruby on Rails project, the first thing we want to do is a syntax check of the Ruby code. If this was in a compiled language, we’d do the compilation first.

gem install bundler
bundle install
bundle exec rake db:setup
bundle exec rake db:test:prepare
bundle exec rspec spec/ -f d

After we’re sure the code is valid, we want to run our tests. We make sure all the required dependencies are installed, then set up our test database, and then run the tests.

gem install brakeman --version 2.1.1
brakeman -o brakeman-output.tabs

Finally, we want to run Brakeman, which is a static analysis tool for Ruby on Rails applications. In the Jenkins server, we have the Brakeman plugin installed, so it will parse the output into something visual.

Writing your own

Writing a build script is pretty easy. In fact, build scripts usually get written pretty early in a project, so you likely already have one. However, your build script may not be named what the commit stage job expects to call, and it may not even be an actual shell script.

Most languages have build tools that will do all the work that a commit stage is supposed to do. A good number of Java projects will use Maven to handle most of this work. A lot of Ruby projects use Rake and Bundler to handle most of these steps. (CanaryBoard uses these tools.) .NET projects use Visual Studio to build the code, and there are many third party tools to handle other commit stage activities.

Depending on how your project is set up, you’ll need to write your build script differently. You’ll need to figure out the different parts of your project’s build cycle and then script them out. If you’re using build tools, you’ll need to figure out how you’re calling them and what each one does.

In most cases, this information will be handy, even if you aren’t practicing continuous delivery yet. You are delivering the software at some point, so someone should know how to build and package your software. They should be able to provide you with the necessary build steps. Once you have those, it’s a matter of scripting them out and committing them to your repo.

Note: your build script doesn’t need to be written as a shell script. You can write it in any language you like, and then either call it from the shell script the job expects, or edit the commit stage configuration to call your script directly.

Regardless of how you write your build script, you’ll want to make sure you hit three main areas:

• Compile or syntax check the code
• Run your unit-level tests
• Conduct any static analysis of your code

Once your build script is complete and you’ve completed your local testing, commit it to the repository. The pipeline should detect the change and automatically kick off a build, calling your script.

Wrapping up

The commit stage is the first step of a deployment pipeline, and provides quick feedback to developers that their code is working as intended. It builds, packages, and tests code, acting as a gate to later stages and preparing items for those later stages. The key to a good commit stage is to have it provide as much valuable feedback as possible in a short amount of time; less than five minutes is ideal, and more than ten is too much.

Our Jenkins server looks for a build script that handles all of these items. It is part of the application’s source repo, so developers can also run it before they commit, to make sure that they won’t be the ones breaking the build.

The commit stage is fast, but its testing is not comprehensive. In the next stage, Acceptance, we’ll talk about integration testing. We’ll also cover automated infrastructure provisioning, a key part of the Acceptance stage, as well as of all the stages that follow.


Laying the Foundations of a Continuous Delivery Pipeline

This article is part of the Continuous Delivery Blueprints series. It discusses how to go from no cloud infrastructure and no continuous integration set up to having a functioning Continuous Delivery Pipeline in Amazon Web Services. It discusses high level topics while also providing a reference implementation so it’s easy to follow along with. You can read part one here.

What we’re going to do today:

• Set up Version Control System
• Migrate code into Version Control System
• Launch Jenkins Server
• Explore Jenkins Server

With your AWS account set up and ready to go, it’s time to start setting up your pipeline. At the end of this article, you’ll have a functioning Continuous Integration server, and can start building out your pipeline.

Set up a Version Control System

To do any kind of continuous integration, delivery, or deployment, you need your source code in source control. It’s the foundation upon which everything else is built. If your source is already under a version control system, you can just skip this step.

Otherwise, it’s time to set up a source control system. There are two options when it comes to source control: you can build and run the server on which your source control service lives, or you can use a hosted source control provider.

Choosing a hosted provider gives you a lot of things — you can instantly start using it without having to configure any sort of server; you don’t have to worry about backups and disaster recovery; you’ll always be able to take advantage of the latest features. On the downside, you are giving your source code to another entity, which may not be an acceptable trade-off at your organization. You’ll need to figure out which solution works best for you, but if you’re not using source control already, a hosted provider is probably the easiest option.

For the purposes of these blueprints, we’ll be using Github for our source hosting. If you’d like to set up your own git server, there are directions for that online.

There are other version control options (like Mercurial) and other hosted source options (like Bitbucket). All major version control solutions will work with the Continuous Delivery Pipeline. That said, we’d advise against using Subversion. Subversion is easy to pick up, but once you really get going with it you can lose a lot of time dealing with merge headaches. Distributed version control systems like Git and Mercurial are better choices in the long run. (If you already have your source code in Subversion, don’t worry — you’ll still be able to do everything we talk about.)

Throughout the blueprints, we’ll be referring to our reference implementation project, CanaryBoard, an open source project Stelligent built. It is currently hosted on GitHub, and if you’re just getting started, we recommend you use GitHub as well. Setting up an account with GitHub is pretty easy; here’s the account sign up form if you don’t already have an account. Note that with a free GitHub account you do not get any private repositories, and your code is open to the world. If you need a private repository, GitHub offers paid plans that include them.

With an account set up, it’s time to import your code. If your organization doesn’t use source control, it’s a snap to get started: just follow these instructions to set up a repo, copy your code in, and then commit the code and push it to GitHub. Boom. Done.

If you’re not ready to import your code, you can fork the CanaryBoard repository and follow along using that as a reference implementation. To do that, just go to the CanaryBoard repo page, and look for the “Fork” button at the top right.

Set up your AWS Access Credentials

To interact with AWS programmatically, you need to provide it with access credentials. These are created via the AWS console and stored in environment variables for the scripts to use. Since we’re going to be running scripts to set up the Jenkins server, we need to create those keys now. To do this, log into the AWS Console:

1. Navigate to the IAM panel
2. Select “Users” from the left side
3. Select the user you want to create credentials for
4. In the lower pane, click the “Security Credentials” tab
5. Click “Manage Access Keys”
6. Click “Create Access Key”
7. Click “Download Credentials”

Now we have the credentials created, but we need to store them in environment variables. On Linux or OS X, use these commands (replacing the values with your actual key values):

export AWS_ACCESS_KEY_ID=QWERTY1234567890QWERTY
export AWS_SECRET_ACCESS_KEY=qwerty1234567890qwerty1234567890qwerty

On Windows:

set AWS_ACCESS_KEY_ID=QWERTY1234567890QWERTY
set AWS_SECRET_ACCESS_KEY=qwerty1234567890qwerty1234567890qwerty

Boom. Now we’re ready to programmatically create some computing instances.
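One gotcha: if either variable is missing or misspelled, the provisioning scripts fail in confusing ways much later. A small pre-flight check like this sketch (the function is our own, written for bash) catches that up front:

```shell
# Fail fast when required environment variables are unset or empty
require_env() {
  local var
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the
    # variable whose name is held in $var
    if [ -z "${!var:-}" ]; then
      echo "missing required environment variable: $var" >&2
      return 1
    fi
  done
}

# e.g. require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```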

Set up a Continuous Integration Server

So, with your source in source control, we’re ready to set up your Continuous Integration server. For the purpose of these blueprints, we’ll be using Jenkins. Jenkins is a free, open source Continuous Integration server, and it’s widely used. In fact, you might already be using Jenkins somewhere in your organization.

Jenkins is pretty easy to set up, but during these lessons we’ll be taking advantage of several Jenkins plugins and using a bunch of job configuration that is pretty static. Because of that, we’re able to provide you with a script to launch a Jenkins server. It creates an OpsWorks stack, and then pulls in some Chef cookbooks we collected to set up a Jenkins server that’s ready for you to work on.

The first thing you’ll want to do is clone the Stelligent CDRI repo and run the Jenkins server script:

git clone
cd cdri
bundle install
ruby bin/create_jenkins_server_stack.rb

(Don’t have Git and/or Ruby installed? Well then, you’ll probably want to follow these directions for installing Git or installing Ruby.) The script will take a couple of minutes to lay down everything it needs to set up a Jenkins server. After the script completes, though, it’ll still take a bit of time for the Jenkins server to build itself and be ready to go. The script also has a couple of options you may want to take advantage of:

$ ruby bin/create_jenkins_server_stack.rb --help
 --region, -r <s>: The AWS region to use (default: us-west-2)
 --zone, -z <s>: The AWS availability zone to use (default: us-west-2a)
 --source, -s <s>: The github repo where the source to build resides (will not work with anything but github!)
 --size, -i <s>: The instance size to use (default: m1.large)
 --help, -h: Show this message

The default region is us-west-2. If you’re in North America, this is probably fine; otherwise you may want to switch to a region closer to you. (A list of AWS regions is available here.) The default zone is us-west-2a, so if you change the region, you will need to change this as well.

The source repo is configurable, with a couple of caveats: the repository must be a GitHub repository, and it must be public. It is possible to have Jenkins connect to just about any other repository type, and it is possible to set up username/password information or SSH keys to authenticate against a private repo. Both increase complexity, so to keep things simple, we’ve restricted the script to only run against public GitHub repositories.

Finally, you can configure the size of the EC2 instance that the Jenkins server will run on. A list of instance sizes and prices is available here. Note that anything smaller than a c3.large instance will take considerably longer to start up, but shouldn’t impact your ability to run builds at all. (However, Jenkins is a bit too heavy to run on a t1.micro instance, so don’t try going that low; the Chef scripts won’t even run successfully.)

How the Jenkins Server is Set Up

To set up the Jenkins server, we take advantage of two AWS services: CloudFormation and OpsWorks. These services handle a lot of the tedious bits of setting up AWS resources, and coordinate all the necessary Chef calls. AWS has a bunch of CloudFormation and OpsWorks documentation you can refer to, and any questions you have should be answered there. In these blueprints we’ll just give you the highlights you need to get around the tools we provide; if you need more information, please refer to the documentation.

The first thing the script does is create a CloudFormation stack to create everything that the subsequent OpsWorks stack will need (security groups, roles, and policies). Once those are in place, it then sets up a new OpsWorks stack with a single custom layer and instance for the Jenkins server. If you go to the OpsWorks console, you’ll see a Jenkins stack with an instance starting. Once that instance turns green, your Jenkins server will be up and running. You can access it by clicking on the IP address in the instance details.

Since you’re just getting started, it’s probably a good idea to note that the server you just launched will cost you money for each hour it’s up. If you’re not going to be doing anything with it right away, you may want to stop it when you’re not using it.

Note: when you stop the instance through OpsWorks you’ll lose any configuration changes you’ve made to the server!

Exploring the Pipeline

The Jenkins server comes prepopulated with the beginnings of a continuous delivery pipeline. If you go to the “Continuous Delivery Pipeline” view, you’ll see all the different stages laid out. As you work through this series, you’ll build out each of these stages for your application. For now, let’s take a look at the first two stages, the trigger stage and the commit stage.

The trigger stage is a simple monitoring job. Its purpose is to look for changes in the repositories of your project, and if it detects a change, it kicks off a new run of the pipeline.

The commit stage handles building, packaging, and unit testing the application. It is supposed to run quickly (around five minutes) and give fast feedback to the developers. We will cover the commit stage in depth in the next part of this series.

Wrapping Up

If you didn’t already have source control set up, you do now. That’s huge! Source control is probably the most important part of a continuous delivery pipeline. We also set up a Jenkins server to automate the different parts of our pipeline. With your code in source control and your continuous integration server running, you’re ready to start building out the different stages of the pipeline to work with your application. In the next article, we’ll talk about how to set up your commit stage to build and test your application.

Questions? Comments? Reach out on twitter: @jonathansywulak


Getting Started with AWS (the right way)

This article is part of the Continuous Delivery Blueprints series. It discusses how to go from no cloud infrastructure and no continuous integration set up to having a functioning Continuous Delivery Pipeline in Amazon Web Services. It discusses high level topics while also providing a reference implementation so it’s easy to follow along with.

Everyone is talking about migrating to the cloud these days, and getting started with Amazon Web Services is super simple to do. However, most people just rush in, creating headaches for themselves down the road. There are some best practices you should adopt at the beginning of your cloud migration that will make things easier and more secure, and allow you to scale up and out better.

What we’re going to do today:

• Create an AWS Account
• Turn On AWS CloudTrail
• Turn On Programmatic Billing
• Create IAM Users and Groups
• Add MFA for New Users

Create your AWS Account

It all starts here: Find the big sign up button and just follow the prompts. A couple of things to note before getting started:

  1. It’ll prompt you for your information (name, email, address, etc.) and credit card info, so you should get that figured out first.
  2. You’ll need to verify your account via a phone call, so have your phone handy.
  3. You don’t need to sign up for support just yet.

Once you’re signed up, just log in to the AWS console. The console allows you to interact with most AWS services. Most people will start building their servers in the sky right away, but there’s a bit of information you should probably know up front, and some account setup we recommend before getting started. Let’s go over that first.

What You Need To Know About AWS Before Setting Stuff Up

Amazon Web Services offers a lot of different services, from virtual computing instances and storage to transcoding and streaming. Going over each service would take a whole series of blog posts, but an understanding of how AWS is laid out will be helpful when getting started.

AWS has data centers all over the world, and it has two ways of grouping them. At the global scale there are regions, representing parts of or entire continents. Inside each region are availability zones. Regions are completely distinct entities, and you can only work in one at a time. Availability zones within a region are designed to talk to each other, and many AWS services will automatically spread your resources across them. Availability zones, however, can only talk to zones within the same region.

Choosing a region is important, though these directions are more or less the same in every region. Be aware, however, that not all services are available in all regions, and pricing varies by region. In addition, US-East-1 is the “default” region when you start with AWS, and it has been around the longest. For that reason it’s also the most popular, and sometimes you won’t be able to allocate resources in certain Availability Zones in the US-East-1 region because those zones are at capacity.

AWS provides lots of documentation on how to choose a region, so definitely look through that to decide the best place to host your infrastructure. If you’re just doing initial investigation into AWS and aren’t sure what region to use, just pick one close to you.

Making a Name For Yourself

We’ll be talking about several AWS services in this section, and many of them make use of AWS Simple Storage Service, or S3. S3 allows you to store objects in the cloud with a high degree of durability. S3 objects are stored in containers called “buckets,” and bucket names have to be unique, not just across your account, but across the entire world. By the time we’re done, we’ll have created a couple of buckets, as well as a globally unique login URL. For that reason, you should come up with a unique identifier now. For example, when we tested this documentation, we used the identifier “stelligent-cdblueprints.” Just note it down now and we’ll refer to it as we go on.

Turn on CloudTrail

First thing is to turn on CloudTrail. CloudTrail is basically logging for your AWS account. It will generate JSON files and store them in an S3 bucket (Amazon’s cloud storage solution) every time an action is performed on the account. While we won’t be doing a lot with CloudTrail right away, we’re turning it on now because it’s not retroactive — you can only see logs after you’ve turned it on. So let’s turn it on first.

(Quick note: CloudTrail is a relatively new service, and at the time of this writing is only available in two regions: US-East-1 and US-West-2. If you’re using a different region, you might not be able to turn CloudTrail on. If that’s the case, just skip on to the next step.)

  1. Find the CloudTrail panel in the main AWS Console.
  2. Click Get Started and punch in an S3 bucket name. (As mentioned above, the bucket name has to be globally unique. One approach is to take the unique identifier you came up with before and append -cloudtrail to it. We’ve named our bucket “stelligent-cdblueprints-cloudtrail”.)
  3. Click OK and you’re done.

That was easy.
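Once events start flowing, each log file CloudTrail drops in the bucket holds a JSON document of records along these lines (abbreviated, with illustrative values — not a real account’s data):

```json
{
  "Records": [
    {
      "eventVersion": "1.0",
      "eventTime": "2014-02-07T21:22:36Z",
      "eventSource": "ec2.amazonaws.com",
      "eventName": "StopInstances",
      "awsRegion": "us-east-1",
      "sourceIPAddress": "198.51.100.17",
      "userIdentity": {
        "type": "IAMUser",
        "userName": "alice"
      }
    }
  ]
}
```

Each record tells you who did what, where, and when — exactly the trail you’ll want later when auditing account activity.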

Turn on Programmatic Billing

Next, we’ll want to turn on Programmatic Billing. This will store your AWS billing in JSON files in another S3 bucket, so that other services can analyze your spending and plot trends over time. We’ll be visiting those kind of tools later on, but we want to enable programmatic billing now because (just like CloudTrail) it only generates data from the present — there’s no way to go back and generate historical data. By turning it on now, when we do start parsing that data for trends, you’ll have a good amount of data to go back through.

Unlike CloudTrail, you’ll need to create and permission the bucket for this yourself.

  1. Go to the S3 console so we can create a new bucket. (Taking your unique identifier and appending -billing to it isn’t a bad idea. We’ve named ours “stelligent-cdblueprints-billing” to keep with the theme.)
  2. Click Create Bucket and punch that name in.
  3. Next we’ll need a bucket permissions policy. Luckily, AWS will generate that for us on the Billing Preferences page (we’ll need to flip back to the S3 page in a second, so open it in a new tab).
  4. Go down the list and turn everything on, one at a time.
  5. When you get to Programmatic Billing, punch in the name of your bucket and click “sample policy.” Copy that policy, then flip back to your S3 bucket.
  6. Click on the bucket, then Properties, then Permissions, and you’ll see an option for setting an access policy.
  7. Click into that, paste the policy you just copied, and save.
  8. Flip back to the Billing Preferences page and click Save there.
  9. Continue to enable everything else on that page.

If CloudTrail and Programmatic Billing are so important, why aren’t they turned on by default?

One thing to be aware of with these two services is that they will put data into your S3 buckets. S3 storage is very cheap, and while it is pretty close, it is not free. You’ll be paying between nine and fifteen cents a gig for storage, depending on region. For more details, check out the S3 pricing page. The services themselves don’t cost anything, though; you only pay for storing the data they generate.
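To put those numbers in perspective, here’s a back-of-the-envelope calculation using the circa-2014 per-GB prices quoted above (check the S3 pricing page for current rates in your region):

```ruby
# Rough monthly S3 storage cost. The prices are the illustrative
# nine-to-fifteen-cents-per-GB figures from the text, not current rates.
def monthly_s3_cost(gigabytes, price_per_gb)
  (gigabytes * price_per_gb).round(2)
end

monthly_s3_cost(10, 0.09)  # ten gigabytes of logs at the low end of the range
monthly_s3_cost(10, 0.15)  # the same data at the high end
```

Even a year of CloudTrail and billing data typically amounts to pocket change, which is why turning these services on early is such an easy call.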

Create IAM Users

Now that the bookkeeping is taken care of, let’s set up some users. A lot of new AWS users will start doing everything as the root account, which besides being a bit of a security risk, also poses some issues when you try to have multiple developers building solutions in your cloud. That’s why we strongly recommend setting up IAM users and roles from the beginning.

We’re going to use the AWS Identity and Access Management (IAM) console. IAM allows you to create users, groups, and roles so that you can manage users and access to your AWS account. For the first section, we’ll only be creating one user (for you) and one group (admins) but as your usage of the cloud increases and you need to add more users, you’ll be able to control that from here.

To create a new admins group, head to the IAM console:

  1. Click Create Group, and follow the prompts.
  2. We’ll name the group “admins” and give it Administrator access.

Now that we have an admins group, go to the Users panel and create a new user for yourself to log in as. It’s pretty straightforward, and if you hit any bumps in the road, AWS has some pretty good documentation about it.

After you create the user, add it to the admins group. Then, for each user, we want to set up two types of authentication. The first is a simple password. Under each user’s Security Credentials tab, click the “Manage Passwords” button and you’ll be able to assign a password.

After each user logs in, you’ll want to require them to add a multi-factor authentication (MFA) device to their account. To add an MFA device:

  1. The user will need to log in and go to the IAM console.
  2. Find their username.
  3. Under the Security Credentials tab, select “Manage MFA Device.”
  4. Follow the steps to add a virtual MFA device to the account.

Having MFA set up for all accounts helps ensure that AWS accounts won’t be compromised, keeping your data safe. It also helps ensure that your account won’t be used for malicious purposes (DDoS attacks, spam emails, etc.), which at best would increase your AWS bill and at worst could get your entire account disabled. We strongly recommend enabling MFA for all user accounts.

Now that users are able to log in, we’ll need to give them a URL to do so. If you go to the main IAM console, you’ll find an IAM User Sign-In URL section. Remember the unique identifier you came up with for your CloudTrail and Programmatic Billing buckets? That’s probably a good option for your sign-in URL. Changing it is optional, though highly recommended.

Wrapping Up

Using AWS is easy; using it well takes some thought. By setting up logging of your usage and billing information, you’ll be able to identify trends as time goes on. By setting up groups and users, your account is prepared to scale as you bring on more developers. And by giving those users multi-factor authentication, you’ve helped ensure the security of the account. You’re in a great place to start using the cloud. In our next post, we’ll lay the foundations for building a continuous delivery pipeline.


How we use AWS OpsWorks

Amazon Web Services (AWS) OpsWorks was released one year ago this month. In the past year, we’ve used OpsWorks on several Cloud Delivery projects at Stelligent and at some of our customers. This article describes what’s worked for us and our customers.

One of our core aims with any customer is to create a fully repeatable process for delivering software. To us, this translates into several more specific objectives. For each process we automate, the process must be fully documented, tested, scripted, versioned and continuous. This article describes how we achieved each of these five objectives in delivering OpsWorks solutions to our customers.

In creating any solution, we version any and every asset required to create the software system. With the exception of certain binary packages, the entire software system gets described in code. This includes the application code, configuration, infrastructure and data.

As a note, we’ve developed other AWS solutions without OpsWorks using CloudFormation, Chef, Puppet and some of the other tools mentioned here, but the purpose of this is to describe our approach when using OpsWorks.

AWS Tools

AWS has over 30 services, and we use a majority of these services when creating deployment pipelines for continuous delivery and automating infrastructure. However, we typically use only a few services directly when building this infrastructure. For instance, when creating infrastructure with OpsWorks, we’ll use the AWS Ruby SDK to provision the OpsWorks resources and CloudFormation for the resources we cannot provision through OpsWorks. Through these we access services such as EC2, Route 53, VPC, S3, Elastic Load Balancing, Auto Scaling, etc. These three services are described below.

AWS OpsWorks – OpsWorks is an infrastructure orchestration and event modeling service for provisioning infrastructure resources. It also enables you to call out to Chef cookbooks (more on Chef later). The OpsWorks model logically defines infrastructure in terms of stacks, layers and apps. Within stacks, you can define layers; within layers you can define applications; and within applications, you can run deployments. An event model automatically triggers events against these stacks (e.g. Setup, Configure, Deploy, Undeploy, Shutdown). As mentioned, we use the AWS API (through the Ruby SDK) to script the provisioning of all OpsWorks behavior. We never manually make changes to OpsWorks through the console (we make these changes to the versioned AWS API scripts).

CloudFormation – We use CloudFormation to automatically provision resources that we cannot provision directly through OpsWorks. For example, while OpsWorks connects with Virtual Private Clouds (VPCs) and Elastic Load Balancers (ELBs), you cannot provision a VPC or ELB directly through OpsWorks. Since we choose to script all infrastructure provisioning and workflow, we wrote CloudFormation templates for defining VPCs, ELBs, Relational Database Service (RDS) and ElastiCache. We orchestrate the workflow in Jenkins so that these resources are automatically provisioned prior to provisioning the OpsWorks stacks. This way, the OpsWorks stacks can consume the resources that were provisioned by the CloudFormation templates. As with any other program, these templates are version-controlled.
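As a sketch of the kind of template involved (the resource name and CIDR block are illustrative, not our actual templates), a minimal VPC template needs only a few lines of JSON:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Minimal sketch: a VPC for OpsWorks stacks to consume",
  "Resources": {
    "AppVPC": {
      "Type": "AWS::EC2::VPC",
      "Properties": { "CidrBlock": "10.0.0.0/16" }
    }
  },
  "Outputs": {
    "VpcId": {
      "Description": "Exposed so downstream jobs can hand the VPC to OpsWorks",
      "Value": { "Ref": "AppVPC" }
    }
  }
}
```

The Outputs section is what lets a Jenkins job capture the created resource IDs and feed them into the OpsWorks provisioning step.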

AWS API (using Ruby SDK) – We use the AWS Ruby SDK to script the provisioning of OpsWorks stacks. While we avoid using the SDK directly for most other AWS services (because we can use CloudFormation), we chose to use the SDK for scripting OpsWorks because CloudFormation does not currently support OpsWorks. Everything that you might do using the OpsWorks dashboard – creating stacks, JSON configuration, calling out to Chef, deployments – is written in Ruby programs that utilize the OpsWorks portion of the AWS API.
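To give a feel for this, here is a minimal, hypothetical sketch of how a stack definition can live in code as a plain parameter hash for the OpsWorks create_stack call — the names, OS value, and cookbook URL are placeholders, and the real programs are more involved:

```ruby
# Hypothetical helper: build the parameter hash we would pass to an
# OpsWorks create_stack call via the AWS Ruby SDK. Keeping the definition
# in code like this is what lets us version it in Git.
def stack_params(name, region)
  {
    name: name,
    region: region,
    default_os: 'Ubuntu 12.04 LTS',
    use_custom_cookbooks: true,
    custom_cookbooks_source: {
      type: 'git',
      url: 'git://github.com/example/cookbooks.git' # placeholder repo
    }
  }
end

params = stack_params('my-stack', 'us-east-1')
# An OpsWorks client's create_stack(params) call would then provision the
# stack; any change goes through this versioned code, never the console.
```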

Infrastructure Automation

There are other non-AWS-specific tools that we use in automating infrastructure. One of them is the infrastructure automation tool Chef; Chef Solo is called from OpsWorks. We use infrastructure automation tools both to script and to document the process of provisioning infrastructure.

Chef – OpsWorks is designed to run Chef cookbooks (i.e. scripts/programs). Ultimately, Chef is where a bulk of the behavior for provisioning environments is defined – particularly once the EC2 instance is up and running. In Chef, we write recipes (logically stored in cookbooks) to install and configure web servers such as Apache and Nginx or application servers such as Rails and Tomcat. All of these Chef recipes are version-controlled and called from OpsWorks or CloudFormation.
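For a flavor of what these recipes look like, here is a minimal, hypothetical Apache recipe — not our actual cookbook, and the template name is a placeholder:

```ruby
# Install Apache, keep the service enabled and running, and manage its
# default vhost from a template shipped in the cookbook.
package 'apache2'

service 'apache2' do
  action [:enable, :start]
end

template '/etc/apache2/sites-available/default' do
  source 'default.erb' # hypothetical template in the cookbook
  notifies :restart, 'service[apache2]'
end
```

The same recipe runs unchanged whether it’s invoked by OpsWorks, by CloudFormation, or locally under Vagrant.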

Ubuntu – When using OpsWorks and there’s no specific operating system flavor requirement from our customer, we choose to use Ubuntu 12.04 LTS. We do this for two reasons. First, at the time of this writing, OpsWorks supports only two Linux flavors: Amazon Linux and Ubuntu 12.04 LTS. Second, Ubuntu allows us to use Vagrant (more on Vagrant later), which gives us a way to test our Chef infrastructure automation scripts locally – increasing our infrastructure development speed.

Supporting Tools

Other supporting tools such as Jenkins, Vagrant and Cucumber help with Continuous Integration, local infrastructure development and testing. Each are described below.

Jenkins – Jenkins is a Continuous Integration server, but we also use it to orchestrate the coarse-grained workflow for the Cloud Delivery system and infrastructure for our customers. We use Jenkins fairly regularly in creating Cloud Delivery solutions for our customers. We configure Jenkins to run Cucumber features, build scripts, automated tests, static analysis, AWS Ruby SDK programs, CloudFormation templates and many more activities. Since Jenkins is an infrastructure component as well, we’ve automated its creation with OpsWorks and Chef, and it also runs Cucumber features that we’ve written. These scripts and configuration are stored in Git as well, and we can type a single command to get the Jenkins environment up and running. Any canonical changes to the Jenkins server are made by modifying the programs or configuration stored in Git.

Vagrant – Vagrant runs a virtualized environment on your desktop and comes with support for certain OS flavors and environments. As mentioned, we use Vagrant to run and test our infrastructure automation scripts locally to increase the speed of development. In many cases, the same Chef cookbooks that take 30-40 minutes to run in AWS take 4-5 minutes to run locally in Vagrant – significantly increasing our infrastructure development productivity.
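To make that local loop concrete, a minimal Vagrantfile that runs the same cookbooks might look like this (box and recipe names are placeholders):

```ruby
# Boot an Ubuntu 12.04 box and provision it with Chef Solo, using the
# same cookbooks OpsWorks runs in AWS.
Vagrant.configure('2') do |config|
  config.vm.box = 'precise64'

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = 'cookbooks'
    chef.add_recipe 'apache2' # hypothetical recipe name
  end
end
```

A `vagrant up` then gives you a fast, disposable copy of the environment for iterating on cookbook changes before they ever touch AWS.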

Cucumber – We use Cucumber to write infrastructure specifications in code called features. This provides executable documented specifications that get run with each Jenkins build. Before we write any Chef, OpsWorks or CloudFormation code, we write Cucumber features. When completed, these features are run automatically after the Chef, OpsWorks and/or CloudFormation scripts provision the infrastructure to ensure the infrastructure is meeting the specifications described in the features. At first, these features are written without step definitions (i.e. they don’t actually verify behavior against the infrastructure), but then we iterate through a process of writing programs to automate the infrastructure provisioning while adding step definitions and refining the Cucumber features.

Once all of this is hooked up to the Jenkins Continuous Integration server, it provisions the infrastructure and then runs the infrastructure tests/features written in Cucumber. Just like writing XUnit tests for the application code, this approach ensures our infrastructure behaves as designed and provides a set of regression tests that are run with every change to any part of the software system. So, Cucumber helps us document the feature as well as automate infrastructure tests. We also write usage and architecture documentation in READMEs, wikis, etc.
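A hypothetical infrastructure feature of this kind might read:

```gherkin
Feature: Web server provisioning
  In order to serve the application
  As an infrastructure developer
  I want the web server configured automatically

  Scenario: Apache is installed and running
    Given the environment has been provisioned
    When I request the home page over HTTP
    Then I should receive a 200 response
```

The feature reads as documentation on day one, and once step definitions are added it doubles as a regression test run by Jenkins after every provisioning.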


Getting to know the Chaos Monkey

Moving your infrastructure to the cloud changes the way you think about a lot of things. With the attitude of abundance that comes with having unlimited instances at your command, you can do all sorts of cool things that would be prohibitive with actual hardware: elastic scaling of infrastructure, transient environments, blue/green deployments, etc. Some things that were just plain bad ideas with real servers have become best practices in the cloud – like just randomly turning off your production servers to see what happens.

Chaos Monkey

One of the major concepts of working in the cloud is the idea of “designing for failure.” It’s mentioned in AWS’s Cloud Best Practices and myriad blog entries. The main idea behind designing for failure is accepting that things are going to go wrong, and making sure your infrastructure is set up to handle that. But it’s one thing to say that your infrastructure is resilient; it’s quite another to prove it by running tools whose sole purpose is to tear your infrastructure apart.

There are a bunch of different tools out there that do this (including Stelligent’s Havoc), but probably the best known is Netflix’s Chaos Monkey. On the downside, it’s not the easiest tool to get going, but hopefully this post can alleviate some of that.

Chaos Monkey is free-to-use and open source, and available on Netflix’s Simian Army GitHub page. Once targeted at an Auto Scaling Group (ASG), Chaos Monkey will randomly delete EC2 instances, challenging your application to recover. Chaos Monkey is initially configured to only operate during business hours, letting you see how resilient your architecture is in controlled conditions, when you’re in the office; as opposed to seeing it happen in the wild, when you’re asleep in bed.

The Chaos Monkey quick start guide shows you how to set up Launch Configs, Auto Scaling Groups, and SimpleDB domains using the AWS CLI tools. Depending on your amount of patience and free time, you might be able to make it through those. However, Netflix has another tool, Asgard, which makes setting up all those things a cinch (and we have a blog post that makes setting up Asgard itself a cinch), so for the purposes of this explanation, we’re going to assume you’re using Asgard.

As Chaos Monkey will be going in and killing EC2 instances, we highly recommend working with it in a contained environment until you figure out how you’d like to leverage it in your organization. So it’s best to at least set up a new Auto Scaling group, but ideally use an account that you’re not hosting your production instances with, at first.

The first thing you need to do once you have Asgard set up is define an Application for it to use. Select the Apps menu and choose Create New Application. Create a new Standalone Application called MonkeyApp, and enter your name and email address and click Create New Application.

With your new application set up, you’ll need to create an auto-scaling group by going to the Cluster Menu and selecting Auto Scaling Groups, and then hitting the Create New Auto Scaling Group button. Select monkeyapp from the application dropdown, then enter 3 for all the instance counts fields (desired, min, max). The defaults are fine for everything else, so click Create New Autoscaling Group at the bottom of the page.

Once the auto-scaling group is running, you’ll see it spin up EC2 instances to match your ASG sizing. If you were to terminate these instances manually, within a few minutes, another instance would spin up in its place. In this way, you can be your own Chaos Monkey, inflicting targeted strikes against your application’s infrastructure.

Feel free to go give that a shot. Of course, why do anything yourself when you can make the computer do it for you?

To set up Chaos Monkey, the first thing you’ll need to do is set up an Amazon Simple DB domain for Chaos Monkey to use. In Asgard, it’s a cinch: just go to SDB and hit Create New SimpleDB Domain. Call it SIMIAN_ARMY and hit the Create button.

Now comes the finicky part of setting up Chaos Monkey on an EC2 instance. Chaos Monkey has a history of not playing well with OpenJDK, and overall getting it installed is more of an exercise in server administration than applying cloud concepts, so we’ve provided a CloudFormation template which will fast forward you to the point where you can just play around.

Once you have Chaos Monkey installed, you’ll need to make a few changes to the configuration to make it work:

vi src/main/resources/

Enter your AWS account and secret keys, as well as change the AWS region if necessary.

vi src/main/resources/

Uncomment the isMonkeyTime key and set it to true. By default, Chaos Monkey is restricted to running during business hours, and when you’re playing around with Chaos Monkey, it may not be during business hours.

vi src/main/resources/

simianarmy.chaos.leashed = false
simianarmy.chaos.ASG.enabled = true
simianarmy.chaos.ASG.maxTerminationsPerDay = 100
simianarmy.chaos.ASG.<monkey-target>.enabled = true
simianarmy.chaos.ASG.<monkey-target>.probability = 6.0

(Replace <monkey-target> with the name of your auto-scaling group, likely monkeyapp if you’ve been following the directions above.) This is the fun part of the Chaos Monkey config. It unleashes the Chaos Monkey (otherwise it would just say that it thought about taking down an instance, instead of actually doing it). The probability is the daily probability that it’ll kill an instance — 1.0 means an instance will definitely be killed at some point today; 6.0 means an instance will be killed on the first run. And we bump up the max number of terminations per day so we can watch Chaos Monkey go nuts.

It’s also probably a good idea to turn off the Janitor and Volume Tagging monkeys, since they’ll just clutter up the logs with messages saying they’re not doing anything at all.

vi src/main/resources/
vi src/main/resources/

and set simianarmy.janitor.enabled and simianarmy.volumeTagging.enabled to false in the respective files.

Once you’ve configured everything, the following command will kick off the SimianArmy application:

./gradlew jettyRun

After bootstrapping itself, it should identify your auto-scaling group, pick a random instance in it, and terminate it. Your auto-scaling group will respond by spinning up a new instance.

But then what? Did your customers lose all the data on the form they just filled out, or were they sent over to another instance? Did their streaming video cut out entirely, or did quality just degrade momentarily? Did your application respond to the outage seamlessly, or was your customer impacted?

These are the issues that Chaos Monkey will surface, helping you identify where you haven’t been designing for failure.

(NOTE: When you’re all done playing around with Chaos Monkey, you’ll need to change your monkeyapp Auto Scaling Group instance counts to 0, otherwise AWS will keep those instances up, which could result in higher usage fees than you’re used to seeing. In Asgard, select Cluster > Auto Scaling Groups > monkeyapp > Edit and set all instance counts to zero, and AWS will terminate your test instances. If you’d like to come back later and play around, you can just shut down your Chaos Monkey and Asgard instances and turn them back on when you’re ready; otherwise you can just delete the CloudFormation stacks entirely and that’ll clean up everything for you.)

A quick look at Netflix’s Asgard

Netflix OSS

Asgard is an open source application from Netflix that makes it easier to work with Amazon Web Services. It offers lots of functionality that isn’t accessible via AWS’s normal web interface: for several AWS features, Asgard removes a lot of the cryptic command line tools required. It’s provided free of charge by Netflix, and the source is available on GitHub.

Asgard is a great tool to use when you’re beginning to understand certain concepts in AWS. A lot of the more powerful features available working in the cloud are handled by CLI tools, APIs or CloudFormation. These are great if you’re developing applications to take advantage of the platform, but can make understanding the concepts difficult when you’re exploring them for the first time.

For example, one of the biggest advantages of working in the cloud is you can automatically scale your hardware to match usage. AWS does this with Launch Configurations and Auto Scaling Groups, a powerful feature that is a pain to implement: getting auto-scaling groups set up correctly involves all sorts of CLI tools or calling into APIs. With Asgard (and with our tool, Stelligent Havoc), it’s a couple clicks and a few fields to fill out.

Another painful AWS feature is SimpleDB, which is needed to run another Netflix Open Source Tool, Chaos Monkey (which is the focus of another blog entry). Setting up a Simple DB via the AWS CLI tools requires some tricky scripting and a bit of luck. With Asgard, you just have to punch in the name and hit create.

Asgard has other benefits besides just making complicated AWS functions easier. If you have an organization where multiple developers need access to work in your AWS account, but you want to keep your access and private keys under wraps, you can enter them into Asgard and give your developers access there, empowering them but mitigating the security risks.

Asgard also provides a way to log the changes being made to your AWS environment. When using the CLI tools or API, it’s on you to make sure that all activities are appropriately documented. With Asgard, you get all that for free.

Also, if you’d like to take advantage of the hidden access keys or logging without having to operate through the web interface, Asgard offers a REST API that you can use to drive it programmatically.

Asgard is available as a self-contained war — you can download it and have it running on your machine in a few minutes. Also available is a war you can place in your already running servlet container. There are a couple of gotchas you should watch out for, so make sure you read the troubleshooting page.

Or if you are the instant-gratification type, we’ve developed a CloudFormation template you can use which will set up Asgard in AWS for you in ten minutes; all you have to do is enter a few simple parameters. Our template uses Chef scripts to set up Tomcat and then install the Asgard web application. The scripts are open source and available on Stelligent’s GitHub page. Just enter a username and password for your Asgard installation, and the name of the key pair to use for the EC2 instance. (You just need the key pair name; you can leave off the .pem.) If you’ve never set up a key pair, check out these directions on the AWS Documentation site.

After you get Asgard running, you’ll be prompted for your access key and secret key (which you can find on your AWS Security Credentials page) and then it’ll take a few minutes to start up. If it takes a long time, check the Asgard troubleshooting page.

Once the setup is complete, you’ll be able to start taking advantage of the powerful AWS features, without all the hassle of dealing with the command line.


Continuous Delivery in the Cloud: Infrastructure Automation (Part 6 of 6)

In part 1 of this series, I introduced the Continuous Delivery (CD) pipeline for the Manatee Tracking application. In part 2, I went over how we use this CD pipeline to deliver software from checkin to production. In part 3, we focused on how CloudFormation is used to script the virtual AWS components that create the Manatee infrastructure. Then in part 4, we focused on a “property file less” environment by dynamically setting and retrieving properties. Part 5 explained how we use Capistrano for scripting our deployment. A list of topics for each of the articles is summarized below:

Part 1: Introduction – Introduction to continuous delivery in the cloud and the rest of the articles;
Part 2: CD Pipeline – In-depth look at the CD Pipeline;
Part 3: CloudFormation – Scripted virtual resource provisioning;
Part 4: Dynamic Configuration – “Property file less” infrastructure;
Part 5: Deployment Automation – Scripted deployment orchestration;
Part 6: Infrastructure Automation – What you’re reading now;

In this part of the series, I am going to show how we use Puppet in combination with CloudFormation to script our target environment infrastructure, preparing it for a Manatee application deployment.

What is Puppet?

Puppet is a Ruby-based infrastructure automation tool. Puppet is primarily used for provisioning environments and managing configuration. Puppet is built to support multiple operating systems, making your infrastructure automation cross-platform.

How does Puppet work?

Puppet uses a library called Facter which collects facts about your system. Facter returns details such as the operating system, architecture, IP address, etc. Puppet uses these facts to make decisions for provisioning your environment. Below is an example of the facts returned by Facter.

# Facter
architecture => i386
ipaddress =>
is_virtual => true
kernel => Linux
kernelmajversion => 2.6
operatingsystem => CentOS
operatingsystemrelease => 5.5
physicalprocessorcount => 0
processor0 => Intel(R) Core(TM)2 Duo CPU     P8800  @ 2.66GHz
processorcount => 1
productname => VMware Virtual Platform

Puppet can use the operating system fact to decide the service name, as shown below:

case $operatingsystem {
  centos, redhat: {
    $service_name = 'ntpd'
    $conf_file    = 'ntp.conf.el'
  }
}

With this case statement, if the operating system is either centos or redhat, the service name ntpd and the configuration file ntp.conf.el are used.

Puppet is declarative by nature. Inside a Puppet module you define the end state of the environment after the Puppet run, and Puppet enforces this state during the run. If at any point the environment cannot be brought to the desired state, the Puppet run fails.
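For example, an illustrative manifest declaring the desired state of NTP (consistent with the case statement shown earlier) looks like:

```puppet
# Declare the end state; Puppet enforces it on every run.
package { 'ntp':
  ensure => installed,
}

service { 'ntpd':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],
}
```

You say *what* should be true (package installed, service running), and Puppet works out *how* to make it so on each platform.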

Anatomy of a Puppet Module

To script the infrastructure, Puppet uses modules to organize related code that performs a specific task. A Puppet module has multiple subdirectories containing the resources for performing the intended task. Below are these resources:

manifests/: Contains the manifest class files for defining how to perform the intended task
files/: Contains static files that the node can download during the installation
lib/: Contains plugins
templates/: Contains templates which can be used by the module’s manifests
tests/: Contains tests for the module
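Putting those subdirectories together, a hypothetical ntp module might be laid out like this (file names illustrative):

```
ntp/
├── manifests/
│   └── init.pp
├── files/
├── lib/
├── templates/
│   └── ntp.conf.erb
└── tests/
    └── init.pp
```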

Puppet also uses a site-wide manifest, site.pp, to manage multiple modules together, and a node manifest, default.pp, to define what to install on each node.

How to run Puppet

Puppet can be run using either a master agent configuration or a solo installation (puppet apply).

Master Agent: With a master/agent installation, you configure one main Puppet master node which manages and configures all of your agent nodes (target environments). The master initiates the installation of the agent and manages it throughout its lifecycle. This model enables you to push infrastructure changes to your agents in parallel by controlling the master node.

Solo: In a solo Puppet run, it’s up to the user to place the desired Puppet modules on the target environment. Once the modules are on the target environment, the user needs to run puppet apply --modulepath=/path/to/modules/ /path/to/site.pp. Puppet then provisions the server with the provided modules and site.pp without relying on another node.

Why do we use Puppet?

We use Puppet to script and automate our infrastructure — making our environment provisioning repeatable, fully automated, and less error prone. Furthermore, scripting our environments gives us complete control over our infrastructure and the ability to terminate and recreate environments as often as we choose.

Puppet for Manatees

In the Manatee infrastructure, we use Puppet for provisioning our target environments. I am going to go through our manifests and modules while explaining their use and purpose. In our Manatee infrastructure, we create a new target environment as part of the CD pipeline – discussed in part 2 of the series, CD Pipeline. Below I provide a high-level summary of the environment provisioning process:

1. CloudFormation dynamically creates a params.pp manifest with AWS variables
2. CloudFormation runs puppet apply as part of UserData
3. Puppet runs the modules defined in hosts/default.pp.
4. Cucumber acceptance tests are run to verify the infrastructure was provisioned correctly.

Now that we know at a high-level what’s being done during the environment provisioning, let’s take a deeper look at the scripts in more detail. The actual scripts can be found here: Puppet

First we will start off with the manifests.

The site.pp manifest (shown below) serves two purposes. It loads the other manifests, default.pp and params.pp, and also sets the stages pre, main, and post.

import "hosts/*"
import "classes/*"

stage { [pre, post]: }
Stage[pre] -> Stage[main] -> Stage[post]

These stages define the order in which Puppet modules should be run. If a Puppet module is assigned to pre, it will run before modules assigned to main or post. If stages aren’t defined, Puppet determines the order of execution itself. The default.pp manifest (referenced below) shows how stages are assigned when executing Puppet modules.

node default {
  class { "params": stage => pre }
  class { "java": stage => pre }
  class { "system": stage => pre }
  class { "tomcat6": stage => main }
  class { "postgresql": stage => main }
  class { "subversion": stage => main }
  class { "httpd": stage => main }
  class { "groovy": stage => main }
}

The default.pp manifest also defines which Puppet modules to use for provisioning the target environment.

params.pp (shown below), loaded from site.pp, is dynamically created using CloudFormation. params.pp is used for setting AWS property values that are used later in the Puppet modules.

class params {
  $s3_bucket = ''
  $application_name = ''
  $hosted_zone = ''
  $access_key = ''
  $secret_access_key = ''
  $jenkins_internal_ip = ''
}

Now that we have an overview of the manifests used, let’s take a look at the Puppet modules themselves.

In our java module, which is run in the pre stage, we are running a simple installation using packages. This is easily dealt with in Puppet by using the package resource. This relies on Puppet’s knowledge of the operating system and the package manager. Puppet simply installs the package that is declared.

class java {
  package { "java-1.6.0-openjdk": ensure => "installed" }
}

The next module we’ll discuss is system. System is also run during the pre stage and is used for the setup of all the extra operations that don’t necessarily need their own module. These actions include setting up general packages (gcc, make, etc.), installing ruby gems (AWS sdk, bundler, etc.), and downloading custom scripts used on the target environment.

class system {

  include params

  $access_key = $params::access_key
  $secret_access_key = $params::secret_access_key

  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }

  package { "gcc": ensure => "installed" }
  package { "mod_proxy_html": ensure => "installed" }
  package { "perl": ensure => "installed" }
  package { "libxslt-devel": ensure => "installed" }
  package { "libxml2-devel": ensure => "installed" }
  package { "make": ensure => "installed" }

  package { "bundler":
    ensure => "1.1.4",
    provider => gem
  }

  package { "trollop":
    ensure => "2.0",
    provider => gem
  }

  package { "aws-sdk":
    ensure => "1.5.6",
    provider => gem,
    require => [
      Package["gcc"],
      Package["make"],
      Package["libxml2-devel"],
      Package["libxslt-devel"]
    ]
  }

  file { "/home/ec2-user/aws.config":
    content => template("system/aws.config.erb"),
    owner => 'ec2-user',
    group => 'ec2-user',
    mode => '500',
  }

  define download_file($site="",$cwd="",$creates=""){
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}"
    }
  }

  download_file { "database_update.rb":
    site => "",
    cwd => "/home/ec2-user",
    creates => "/home/ec2-user/database_update.rb",
  }

  download_file { "":
    site => "",
    cwd => "/tmp",
    creates => "/tmp/"
  }

  exec { "authorized_keys":
    command => "cat /tmp/ >> /home/ec2-user/.ssh/authorized_keys",
    require => Download_file[""]
  }
}
First I want to point out that at the top we specify include params. This enables the system module to access the params.pp file, so we can use the properties defined there.

include params

$access_key = $params::access_key
$secret_access_key = $params::secret_access_key

This enables us to define the parameters in one central location and then reference them from other modules.

As we move through the script we are using the package resource similar to previous modules. For each rubygem we use the package resource and explicitly tell Puppet to use the gem provider. You can specify other providers like rpm and yum.

We use the file resource to create files from templates.

  :access_key_id => "<%= "#{access_key}" %>",
  :secret_access_key => "<%= "#{secret_access_key}" %>"

In the aws.config.erb template (referenced above) we are using the properties defined in params.pp for dynamically creating an aws.config credential file. This file is then used by our database_update.rb script for connecting to S3.
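To make the templating concrete, here is a minimal Ruby sketch that renders that exact ERB fragment with placeholder credentials (the values are made up; the real ones come from params.pp):

```ruby
require 'erb'

# Placeholder credentials: real values come from params.pp via CloudFormation.
access_key        = 'AKIAEXAMPLE'
secret_access_key = 'wJalrEXAMPLEKEY'

# The template body mirrors the aws.config.erb fragment shown above.
template = <<~'ERB'
  :access_key_id => "<%= "#{access_key}" %>",
  :secret_access_key => "<%= "#{secret_access_key}" %>"
ERB

# ERB pulls access_key and secret_access_key out of the local binding.
puts ERB.new(template).result(binding)
```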

Speaking of the database_update.rb script, we need to get it on the target environment. To do this, we define a download_file resource.

define download_file($site="",$cwd="",$creates=""){
  exec { $name:
    command => "wget ${site}/${name}",
    cwd => $cwd,
    creates => "${cwd}/${name}"
  }
}

This creates a new resource type for Puppet to use. With it, we are able to download both the database_update.rb script and the public SSH key.

As a final step in setting up the system, we execute a bash line that appends the downloaded public key to the ec2-user's authorized_keys file. This enables clients holding the corresponding id_rsa private key to ssh into the target environment as ec2-user.

The Manatee infrastructure uses Apache for the web server, Tomcat for the app server, and PostgreSQL for its database. Puppet sets these up as part of the main stage, meaning they run in order after the pre stage modules have run.

In our httpd module, we perform several steps discussed previously: the httpd package is installed and a new file is created from a template.

class httpd {
  include params

  $application_name = $params::application_name
  $hosted_zone = $params::hosted_zone

  package { 'httpd':
    ensure => installed,
  }

  file { "/etc/httpd/conf/httpd.conf":
    content => template("httpd/httpd.conf.erb"),
    require => Package["httpd"],
    owner => 'ec2-user',
    group => 'ec2-user',
    mode => '664',
  }

  service { 'httpd':
    ensure => running,
    enable => true,
    require => [
      Package['httpd'],
      File['/etc/httpd/conf/httpd.conf'],
    ],
    subscribe => Package['httpd'],
  }
}

The new piece of functionality used in our httpd module is service. service allows us to define the state the httpd service should be in at the end of our run. In this case, we declare that it should be running.

The Tomcat module again uses package to define what to install and service to declare the end state of the tomcat service.

class tomcat6 {

  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }

  package { "tomcat6":
    ensure => "installed"
  }

  $backup_directories = [
    # backup directory paths elided
  ]

  file { $backup_directories:
    ensure => "directory",
    owner => "tomcat",
    group => "tomcat",
    mode => '777',
    require => Package["tomcat6"],
  }

  service { "tomcat6":
    enable => true,
    require => [
      Package["tomcat6"],
      File[$backup_directories],
    ],
    ensure => running,
  }
}

The tomcat6 module uses the file resource differently than previous modules: it uses file to create directories. This is defined using ensure => "directory".

We are using the package resource for installing PostgreSQL, building files from templates using the file resource, performing bash executions with exec, and declaring the intended state of the PostgreSQL service with the service resource.

class postgresql {

  include params

  $jenkins_internal_ip = $params::jenkins_internal_ip

  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }

  define download_file($site="",$cwd="",$creates=""){
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}"
    }
  }

  download_file { "wildtracks.sql":
    site => "",
    cwd => "/tmp",
    creates => "/tmp/wildtracks.sql"
  }

  download_file { "createDbAndOwner.sql":
    site => "",
    cwd => "/tmp",
    creates => "/tmp/createDbAndOwner.sql"
  }

  package { "postgresql8-server":
    ensure => installed,
  }

  exec { "initdb":
    command => "service postgresql initdb",
    require => Package["postgresql8-server"]
  }

  file { "/var/lib/pgsql/data/pg_hba.conf":
    content => template("postgresql/pg_hba.conf.erb"),
    require => Exec["initdb"],
    owner => 'postgres',
    group => 'postgres',
    mode => '600',
  }

  file { "/var/lib/pgsql/data/postgresql.conf":
    content => template("postgresql/postgresql.conf.erb"),
    require => Exec["initdb"],
    owner => 'postgres',
    group => 'postgres',
    mode => '600',
  }

  service { "postgresql":
    enable => true,
    require => [
      File["/var/lib/pgsql/data/pg_hba.conf"],
      File["/var/lib/pgsql/data/postgresql.conf"],
    ],
    ensure => running,
  }

  exec { "create-user":
    command => "echo CREATE USER root | psql -U postgres",
    require => Service["postgresql"]
  }

  exec { "create-db-owner":
    require => [
      Service["postgresql"],
      Download_file["createDbAndOwner.sql"],
      Exec["create-user"],
    ],
    command => "psql < /tmp/createDbAndOwner.sql -U postgres"
  }

  exec { "load-database":
    require => [
      Service["postgresql"],
      Download_file["wildtracks.sql"],
      Exec["create-db-owner"],
    ],
    command => "psql -U manatee_user -d manatees_wildtrack -f /tmp/wildtracks.sql"
  }
}

In this module we are creating a new user on the PostgreSQL database:

exec { "create-user":
  command => "echo CREATE USER root | psql -U postgres",
  require => Service["postgresql"]
}

In this next section we download the latest Manatee database SQL dump.

download_file { "wildtracks.sql":
  site => "",
  cwd => "/tmp",
  creates => "/tmp/wildtracks.sql"
}

In the section below, we load the database with the SQL file. This builds our target environments with the production database content giving developers an exact replica sandbox to work in.

exec { "load-database":
  require => [
    Service["postgresql"],
    Download_file["wildtracks.sql"],
    Exec["create-db-owner"],
  ],
  command => "psql -U manatee_user -d manatees_wildtrack -f /tmp/wildtracks.sql"
}

Lastly in our Puppet run, we install subversion and groovy on the target node. We could have just included these in our system module, but they seemed general purpose enough to create individual modules.

Subversion manifest:

class subversion {
  package { "subversion":
    ensure => "installed"
  }
}

Groovy manifest:

class groovy {
  Exec { path => '/usr/bin:/bin:/usr/sbin:/sbin' }

  define download_file($site="",$cwd="",$creates=""){
    exec { $name:
      command => "wget ${site}/${name}",
      cwd => $cwd,
      creates => "${cwd}/${name}"
    }
  }

  download_file { "groovy-1.8.2.tar.gz":
    site => "",
    cwd => "/tmp",
    creates => "/tmp/groovy-1.8.2.tar.gz",
  }

  file { "/usr/bin/groovy-1.8.2/":
    ensure => "directory",
    owner => "root",
    group => "root",
    mode => '755',
    require => Download_file["groovy-1.8.2.tar.gz"],
  }

  exec { "extract-groovy":
    command => "tar -C /usr/bin/groovy-1.8.2/ -xvf /tmp/groovy-1.8.2.tar.gz",
    require => File["/usr/bin/groovy-1.8.2/"],
  }
}

The Subversion manifest is relatively straightforward, as we are only using the package resource. The Groovy manifest is slightly different: we download the Groovy tar, place it on the filesystem, and then extract it.

We’ve gone through how the target environment is provisioned. We do, however, have one more task: testing. It’s not enough to assume that everything was installed successfully just because Puppet didn’t error out. For this reason, we use Cucumber to run acceptance tests against our environment. Our tests check that services are running, configuration files are present, and the right packages have been installed.
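A sketch of what such a check might look like in Cucumber (the feature and step wording here is illustrative, not our actual test suite):

```gherkin
Feature: Target environment provisioning
  The provisioned node should be ready for a Manatee deployment

  Scenario: Services are running after the Puppet run
    Given the target environment has been provisioned by Puppet
    When I check the running services
    Then the "httpd" service should be running
    And the "tomcat6" service should be running
    And the "postgresql" service should be running
```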

Puppet allows us to completely script and version our target environments. Consequently, this enables us to treat environments as disposable entities. As a practice, we create a new target environment every time our CD pipeline is run. This way we are always deploying against a known state.

As our blog series is coming to a close, let’s recap what we’ve gone through. In the Manatee infrastructure we use a combination of CloudFormation for scripting AWS resources, Puppet for scripting target environments, Capistrano for deployment automation, SimpleDB and CloudFormation for dynamic properties, and Jenkins for coordinating all the resources into one cohesive unit for moving a Manatee application change from check-in to production in a single click.