03-02-2014

Creating a Secure Deployment Pipeline in Amazon Web Services

Many organizations require a secure infrastructure. I’ve yet to meet a customer that says that security isn’t a concern. But, the decision on “how secure?” should be closely associated with a risk analysis for your organization.

Since Amazon Web Services (AWS) is often referred to as a “public cloud”, people sometimes infer that “public” must mean it’s “out in the public” for all to see. I’ve always seen “public/private clouds” as an unfortunate use of terms. In this context, public means something closer to “public utility”. People often interpret “private clouds” to be inherently more secure, but assuming that “public cloud” = less secure and “private cloud” = more secure couldn’t be further from the truth. Like most things, it’s all about how you architect your infrastructure. While you can define your infrastructure to have open access, AWS provides many tools to create a truly secure infrastructure while eliminating access to all but authorized users.

I’ve created an initial list of many of the practices we use. We don’t employ all these practices in all situations, as it often depends on our customers’ particular security requirements. But, if someone asked me “How do I create a secure AWS infrastructure using a Deployment Pipeline?”, I’d offer some of these practices in the solution. I’ll be expanding these over the next few weeks, but I want to start with some of our practices.

AWS Security

* After initial AWS account creation and login, configure IAM so that there’s no need to use the AWS root account
* Apply least privilege to all IAM accounts. Be very careful about who gets Administrator access.
* Enable all IAM password rules
* Enable MFA for all users
* Secure all data at rest
* Secure all data in transit
* Put all AWS resources in a Virtual Private Cloud (VPC).
* No EC2 Key Pairs should be shared with others. Same goes for Access Keys.
* Only open required ports to the Internet. For example, with the exception of, say, port 80, no security groups should have a CIDR source of 0.0.0.0/0. The bastion host might have access to port 22 (SSH), but you should use CIDR restrictions to limit access to specific subnets. Using a VPC is part of a solution to eliminate Internet access. No canonical environments should have SSH/RDP access. (See the sketch after this list.)
* Use IAM to limit access to specific AWS resources and/or remove/limit AWS console access
* Apply a bastion host configuration to reduce your attack profile
* Use IAM Roles so that there’s no need to configure Access Keys on the instances
* Use resource-level permissions in EC2 and RDS
* Use SSE to secure objects in S3 buckets
* Share initial IAM credentials with others through a secure mechanism (e.g. AES-256 encryption)
* Use and monitor AWS CloudTrail logs
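
To make a couple of these practices concrete – restricting ingress to a specific CIDR instead of 0.0.0.0/0, and using SSE for objects in S3 – here is a minimal sketch using the AWS Ruby SDK (aws-sdk v1). The VPC ID, CIDR block, bucket and object names are placeholders, not values from a real account.

require 'rubygems'
require 'aws-sdk'

ec2 = AWS::EC2.new
s3  = AWS::S3.new

# Security group that only allows SSH from an internal subnet (no 0.0.0.0/0)
bastion_sg = ec2.security_groups.create('bastion-ssh', :vpc => 'vpc-12345678')
bastion_sg.authorize_ingress(:tcp, 22, '10.0.1.0/24')

# Store an object in S3 with server-side encryption (SSE) enabled
s3.buckets['secure-artifacts-bucket'].objects['config/app.properties'].write(
  File.read('app.properties'),
  :server_side_encryption => :aes256)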

Deployment Pipeline

A deployment pipeline is a staged process in which the complete software system is built and tested with every change. Team members receive feedback as the system completes each stage. With most customers, we usually construct between four and seven deployment pipeline stages, and the pipeline only proceeds to the next stage if the previous stages were successful. If a stage fails, the whole pipeline instance fails. The first stage (often referred to as the “Commit Stage”) will usually take no more than 10 minutes to complete. Other stages may take longer than this. Most stages require no human intervention as the software system goes through more extensive testing on its way to production. With a deployment pipeline, software systems can be released at any time the business chooses to do so. Here are some of the security-based practices we employ in constructing a deployment pipeline.

* Automate everything: Networking (VPC, Route 53), Compute (EC2), Storage, etc. All AWS automation should be defined in CloudFormation. All environment configuration should be defined using infrastructure automation scripts – such as Chef, Puppet, etc.
* Version Everything: Application Code, Configuration, Infrastructure and Data
* Manage your binary dependencies. Be specific about binary version numbers. Ensure you have control over these binaries.
* Lock down pipeline environments. Do not allow SSH/RDP access to any environment in the deployment pipeline
* For projects that require it, use permissions on the CI server or deployment application to limit who can run deployments in certain environments – such as QA, Pre-Production and Production. When you have a policy in which all changes are applied through automation and environments are locked down, this usually becomes less of a concern. But, it can still be a requirement on some teams.
* Use the Disposable Environments pattern – instances are terminated once every few days. This approach reduces the attack profile. (See the sketch after this list.)
* Log everything outside of the EC2 instances (so that the logs can be accessed later). Ensure these log files are encrypted (e.g. stored securely in S3)
* All canonical changes are only applied through automation that is part of the deployment pipeline. This includes application, configuration, infrastructure and data changes. Infrastructure patch management would be a part of the pipeline just like any other software system change.
* No one has access to nor can make direct changes to pipeline environments
* Create high-availability systems using Multi-AZ, Auto Scaling, Elastic Load Balancing and Route 53
* For non-Admin AWS users, only provide access to AWS through a secure Continuous Integration (CI) server or a self-service application
* Use Self-Service Deployments and give developers full SSH/RDP access to their self-service deployment. Only their particular EC2 Key Pair can access the instance(s) associated with the deployment. Self-Service Deployments can be defined in the CI server or a lightweight self-service application.
* Provide capability for any authorized user to perform a self-service deployment with full SSH/RDP access to the environment they created (while eliminating outside access)
* Run two active environments – We’ve yet to do this for customers, but if you want to eliminate all access to the canonical production environment, you might choose to run two active environments at once. Engineers can then access the non-production environment to troubleshoot a problem in an environment that has the exact same configuration and data, so the troubleshooting is accurate.
* Run automated infrastructure tests to test for security vulnerabilities (e.g. cross-site scripting, SQL injections, etc.) with every change committed to the version-control repository as part of the deployment pipeline.
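
As one illustration of the Disposable Environments pattern referenced above, a sketch like the following (AWS Ruby SDK v1) could run on a schedule and terminate instances tagged as disposable once they are more than a couple of days old. The tag name/value and the age threshold are illustrative assumptions.

require 'rubygems'
require 'aws-sdk'

MAX_AGE_DAYS = 2
ec2 = AWS::EC2.new

# Terminate running instances tagged as disposable that exceed the age limit
ec2.instances.filter('tag:environment', 'disposable').each do |instance|
  next unless instance.status == :running
  age_in_days = (Time.now - instance.launch_time) / (60 * 60 * 24)
  instance.terminate if age_in_days > MAX_AGE_DAYS
end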

FAQ

* What is a canonical environment? It’s your system of record. You want your canonical environment to be solely defined in source code and versioned. If someone makes a change to the canonical system that affects everyone, it should only be made through automation. While you can use a self-service deployment to get a copy of the canonical system, any direct change you make to that environment is isolated and never made part of the canonical system unless code is committed to the version-control repository.
* How can I troubleshoot if I cannot directly access canonical environments? Using a self-service deployment, you can usually determine the cause of the problem. If it’s a data-specific problem, you might import a copy of the production database. If this isn’t possible for time or security reasons, you might run multiple versions of the application at once.
* Why should we dispose of environments regularly? Two primary reasons. The first is to reduce your attack profile (i.e. if environments always go up and down, it’s more difficult to home in on specific resources). The second reason is that it ensures that all team members are used to applying all canonical changes through automation and not relying on environments to always be up and running somewhere.
* Why should we lock down environments? To prevent people from making disruptive environment changes that don’t go through the version-control repository.

02-03-2014

How we use AWS OpsWorks

Amazon Web Services (AWS) OpsWorks was released one year ago this month. In the past year, we’ve used OpsWorks on several Cloud Delivery projects at Stelligent and at some of our customers. This article describes what’s worked for us and our customers.

One of our core aims with any customer is to create a fully repeatable process for delivering software. To us, this translates into several more specific objectives. For each process we automate, the process must be fully documented, tested, scripted, versioned and continuous. This article describes how we achieved each of these five objectives in delivering OpsWorks solutions to our customers. In creating any solution, we version any and every asset required to create the software system. With the exception of certain binary packages, the entire software system gets described in code. This includes the application code, configuration, infrastructure and data.

As a note, we’ve developed other AWS solutions without OpsWorks using CloudFormation, Chef, Puppet and some of the other tools mentioned here, but the purpose of this is to describe our approach when using OpsWorks.

AWS Tools

AWS has over 30 services and we use a majority of these services when creating deployment pipelines for continuous delivery and automating infrastructure. However, we typically use only a few services directly when building this infrastructure. For instance, when creating infrastructure with OpsWorks, we’ll use the AWS Ruby SDK to provision the OpsWorks resources and CloudFormation for the resources we cannot provision through OpsWorks. We use these three services – OpsWorks, CloudFormation and the AWS API via the Ruby SDK – to access services such as EC2, Route 53, VPC, S3, Elastic Load Balancing, Auto Scaling, etc. These three services are described below.


AWS OpsWorks – OpsWorks is an infrastructure orchestration and event modeling service for provisioning infrastructure resources. It also enables you to call out to Chef cookbooks (more on Chef later). The OpsWorks model logically defines infrastructure in terms of stacks, layers and apps. Within stacks, you can define layers; within layers you can define applications; and within applications, you can run deployments. An event model automatically triggers events against these stacks (e.g. Setup, Configure, Deploy, Undeploy, Shutdown). As mentioned, we use the AWS API (through the Ruby SDK) to script the provisioning of all OpsWorks behavior. We never manually make changes to OpsWorks through the console (we make these changes to the versioned AWS API scripts).
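
As a rough illustration – not our actual provisioning code – the following sketch uses the OpsWorks client in the AWS Ruby SDK (v1) to create a stack, a custom layer, an app and a deployment. The names and IAM ARNs are placeholders.

require 'rubygems'
require 'aws-sdk'

opsworks = AWS::OpsWorks.new.client

stack = opsworks.create_stack(
  :name => 'delivery-stack',
  :region => 'us-east-1',
  :service_role_arn => 'arn:aws:iam::123456789012:role/aws-opsworks-service-role',
  :default_instance_profile_arn => 'arn:aws:iam::123456789012:instance-profile/aws-opsworks-ec2-role')

layer = opsworks.create_layer(
  :stack_id => stack[:stack_id],
  :type => 'custom',
  :name => 'App Server',
  :shortname => 'app')

app = opsworks.create_app(
  :stack_id => stack[:stack_id],
  :name => 'manatee',
  :type => 'other')

# Trigger the Deploy lifecycle event for the app across the stack
opsworks.create_deployment(
  :stack_id => stack[:stack_id],
  :app_id => app[:app_id],
  :command => { :name => 'deploy' })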

CloudFormation – We use CloudFormation to automatically provision resources that we cannot provision directly through OpsWorks. For example, while OpsWorks connects with Virtual Private Clouds (VPCs) and Elastic Load Balancers (ELBs), you cannot provision a VPC or ELB directly through OpsWorks. Since we choose to script all infrastructure provisioning and workflow, we wrote CloudFormation templates for defining VPCs, ELBs, Relational Database Service (RDS) and ElastiCache. We orchestrate the workflow in Jenkins so that these resources are automatically provisioned prior to provisioning the OpsWorks stacks. This way, the OpsWorks stacks can consume the resources that were provisioned by the CloudFormation templates. As with any other program, these templates are version-controlled.
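
For example, a hedged sketch of provisioning one of these CloudFormation templates from the orchestration workflow with the AWS Ruby SDK (v1) might look like the following; the template path, stack name and parameter are illustrative.

require 'rubygems'
require 'aws-sdk'

cfn = AWS::CloudFormation.new

# Create the VPC/ELB stack before the OpsWorks stack consumes its outputs
stack = cfn.stacks.create('delivery-vpc',
  File.read('templates/vpc.template'),
  :parameters => { 'KeyName' => 'delivery-key' })

# Wait for completion, then print the outputs (e.g. subnet and ELB names)
sleep 15 while stack.status == 'CREATE_IN_PROGRESS'
stack.outputs.each { |output| puts "#{output.key}=#{output.value}" }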

AWS API (using Ruby SDK) – We use the AWS Ruby SDK to script the provisioning of OpsWorks stacks. While we avoid using the SDK directly for most other AWS services (because we can use CloudFormation), we chose to use the SDK for scripting OpsWorks because CloudFormation does not currently support OpsWorks. Everything that you might do using the OpsWorks dashboard – creating stacks, JSON configuration, calling out to Chef, deployments – are all written in Ruby programs that utilize the OpsWorks portion of the AWS API.

Infrastructure Automation

There are other, non-AWS-specific tools that we use in automating infrastructure. One of them is the infrastructure automation tool Chef; Chef Solo is called from OpsWorks. We use infrastructure automation tools to script – and, in doing so, document – the process of provisioning infrastructure.

Chef – OpsWorks is designed to run Chef cookbooks (i.e. scripts/programs). Ultimately, Chef is where a bulk of the behavior for provisioning environments is defined – particularly once the EC2 instance is up and running. In Chef, we write recipes (logically stored in cookbooks) to install and configure web servers such as Apache and Nginx or application servers such as Rails and Tomcat. All of these Chef recipes are version-controlled and called from OpsWorks or CloudFormation.
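
As a small, illustrative example of the kind of recipe OpsWorks runs (not one of our actual cookbooks), the following installs Apache, enables the service and renders a virtual host from a template; the cookbook layout and template name are assumptions.

# Install and start the Apache web server
package 'apache2' do
  action :install
end

service 'apache2' do
  action [:enable, :start]
end

# Render a virtual host from a template in the cookbook and restart Apache on change
template '/etc/apache2/sites-available/app.conf' do
  source 'app.conf.erb'
  owner 'root'
  group 'root'
  mode '0644'
  notifies :restart, 'service[apache2]'
end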

Ubuntu – When using OpsWorks, if there’s no specific operating-system requirement from our customer, we choose Ubuntu 12.04 LTS. We do this for two reasons. The first is that, at the time of this writing, OpsWorks supports two Linux flavors: Amazon Linux and Ubuntu 12.04 LTS. The second is that Ubuntu allows us to use Vagrant (more on Vagrant later). Vagrant provides us a way to test our Chef infrastructure automation scripts locally – increasing our infrastructure development speed.

Supporting Tools

Other supporting tools such as Jenkins, Vagrant and Cucumber help with Continuous Integration, local infrastructure development and testing. Each is described below.

Jenkins – Jenkins is a Continuous Integration server, but we also use it to orchestrate the coarse-grained workflow for the Cloud Delivery system and infrastructure for our customers. We use Jenkins fairly regularly in creating Cloud Delivery solutions for our customers. We configure Jenkins to run Cucumber features, build scripts, automated tests, static analysis, AWS Ruby SDK programs, CloudFormation templates and many more activities. Since Jenkins is an infrastructure component as well, we’ve automated its creation with OpsWorks and Chef, and it also runs the Cucumber features that we’ve written. These scripts and configuration are stored in Git as well, and we can simply type a single command to get the Jenkins environment up and running. Any canonical changes to the Jenkins server are made by modifying the programs or configuration stored in Git.

Vagrant – Vagrant runs a virtualized environment on your desktop and comes with support for certain OS flavors and environments. As mentioned, we use Vagrant to run and test our infrastructure automation scripts locally to increase the speed of development. In many cases, Chef cookbooks that might take 30-40 minutes to run against AWS take only 4-5 minutes to run locally in Vagrant – significantly increasing our infrastructure development productivity.
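
A minimal Vagrantfile along these lines – the box name and run list are assumptions – lets us converge the same cookbooks locally on Ubuntu 12.04 with the chef-solo provisioner.

Vagrant.configure('2') do |config|
  # Ubuntu 12.04 LTS base box to mirror the OpsWorks instances
  config.vm.box = 'precise64'

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = 'cookbooks'
    chef.add_recipe 'apache2'
  end
end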

Cucumber – We use Cucumber to write infrastructure specifications in code called features. This provides executable documented specifications that get run with each Jenkins build. Before we write any Chef, OpsWorks or CloudFormation code, we write Cucumber features. When completed, these features are run automatically after the Chef, OpsWorks and/or CloudFormation scripts provision the infrastructure to ensure the infrastructure is meeting the specifications described in the features. At first, these features are written without step definitions (i.e. they don’t actually verify behavior against the infrastructure), but then we iterate through a process of writing programs to automate the infrastructure provisioning while adding step definitions and refining the Cucumber features. Once all of this is hooked up to the Jenkins Continuous Integration server, it provisions the infrastructure and then runs the infrastructure tests/features written in Cucumber. Just like writing XUnit tests for the application code, this approach ensures our infrastructure behaves as designed and provides a set of regression tests that are run with every change to any part of the software system. So, Cucumber helps us document the feature as well as automate infrastructure tests. We also write usage and architecture documentation in READMEs, wikis, etc.
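
To give a sense of what backing a feature with a step definition looks like, here is an illustrative Ruby step that checks whether a provisioned web server responds; the instance variable holding the host and the expected status are assumptions rather than code from this project.

require 'net/http'

# Verifies that the provisioned host serves HTTP on the given port
Then(/^the web server should respond on port (\d+)$/) do |port|
  response = Net::HTTP.get_response(URI("http://#{@host}:#{port}/"))
  raise "Unexpected HTTP status #{response.code}" unless response.code == '200'
end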

10-03-2012

Continuous Delivery in the Cloud: Dynamic Configuration (Part 4 of 6)

In part 1 of this series, I introduced the Continuous Delivery (CD) pipeline for the Manatee Tracking application. In part 2 I went over how we use this CD pipeline to deliver software from checkin to production. In part 3, we focused on how CloudFormation is used to script the virtual AWS components that create the Manatee infrastructure. A list of topics for each of the articles is summarized below:

Part 1: Introduction – Introduction to continuous delivery in the cloud and the rest of the articles;
Part 2: CD Pipeline – In-depth look at the CD Pipeline;
Part 3: CloudFormation – Scripted virtual resource provisioning;
Part 4: Dynamic Configuration –  What you’re reading now;
Part 5: Deployment Automation – Scripted deployment orchestration;
Part 6: Infrastructure Automation – Scripted environment provisioning (Infrastructure Automation)

In this part of the series, I am going to explain how we dynamically generate our configuration and avoid property files whenever possible. Instead of using property files, we store and retrieve configuration on the fly – as part of the CD pipeline – without predefining these values in a static file (i.e. a properties file) ahead of time. We do this using two methods: AWS SimpleDB and CloudFormation.

SimpleDB is a highly available non-relational data storage service that only stores strings in key value pairs. CloudFormation, as discussed in Part 3 of the series, is a scripting language for allocating and configuring AWS virtual resources.

Using SimpleDB

Throughout the CD pipeline, we often need to manage state across multiple Jenkins jobs. To do this, we use SimpleDB. As the pipeline executes, values that will be needed by subsequent jobs get stored in SimpleDB as properties. When the properties are needed, we use a simple Ruby script to return the key/value pair from SimpleDB and then use it as part of the job. The values being stored and retrieved range from IP addresses and domain names to AMI (Amazon Machine Image) IDs.

So what makes this dynamic? As Jenkins jobs or CloudFormation templates are run, we often end up with properties that need to be used elsewhere. Instead of hard coding all of the values to be used in a property file, we create, store and retrieve them as the pipeline executes.

Below is the CreateTargetEnvironment Jenkins job script that creates a new target environment from the CloudFormation template production.template.


if [ "$deployToProduction" == "true" ]
then
  SSH_KEY=production
else
  SSH_KEY=development
fi

# Create CloudFormation Stack
ruby /usr/share/tomcat6/scripts/aws/create_stack.rb ${STACK_NAME} ${WORKSPACE}/production.template ${HOST} ${JENKINSIP} ${SSH_KEY} ${SGID} ${SNS_TOPIC}

# Load SimpleDB Domain with Key/Value Pairs
ruby /usr/share/tomcat6/scripts/aws/load_domain.rb ${STACK_NAME}

# Pull and store variables from SimpleDB
host=`ruby /usr/share/tomcat6/scripts/aws/showback_domain.rb ${STACK_NAME} InstanceIPAddress`

# Run Acceptance Tests
cucumber features/production.feature host=${host} user=ec2-user key=/usr/share/tomcat6/.ssh/id_rsa

Referenced in the CreateTargetEnvironment snippet above, the load_domain.rb script below iterates over a properties file and sends key/value pairs to SimpleDB.

require 'rubygems'
require 'aws-sdk'
load File.expand_path('../../config/aws.config', __FILE__)

stackname=ARGV[0]

file = File.open("/tmp/properties", "r")

sdb = AWS::SimpleDB.new

AWS::SimpleDB.consistent_reads do
  domain = sdb.domains["stacks"]
  item = domain.items["#{stackname}"]

  file.each_line do |line|
    key, value = line.chomp.split '='
    item.attributes.set(
      "#{key}" => "#{value}")
  end
end

Also referenced in the CreateTargetEnvironment snippet above, the showback_domain.rb script below connects to SimpleDB and returns the value for a given key.

require 'rubygems'
require 'aws-sdk'
load File.expand_path('../../config/aws.config', __FILE__)

item_name=ARGV[0]
key=ARGV[1]

sdb = AWS::SimpleDB.new

AWS::SimpleDB.consistent_reads do
  domain = sdb.domains["stacks"]
  item = domain.items["#{item_name}"]

  item.attributes.each_value do |name, value|
    if name == "#{key}"
      puts "#{value}".chomp
    end
  end
end

In the CreateTargetEnvironment snippet above, we store the outputs of the CloudFormation stack in a temporary file. We then iterate over the file with the load_domain.rb script and store the key/value pairs in SimpleDB.

Following this, we make a call to SimpleDB with the showback_domain.rb script and return the instance IP address (created in the CloudFormation template) and store it in the host variable. host is then used by cucumber to ssh into the target instance and run the acceptance tests.

Using CloudFormation

In our CloudFormation templates we allocate multiple AWS resources. Every time we run the template, a new set of resources is created. For example, in our jenkins.template we create a new IAM user; every time we run the template, a different IAM user with different credentials is created. We need a way to reference these resources, and this is where CloudFormation’s Ref function comes in. Using Ref, you can reference one resource from within another and dynamically refer to values such as an IP address, a domain name, etc.

In the snippet below, we create an IAM user, reference that user to create AWS access keys, and then store the keys in environment variables.


"CfnUser" : {
  "Type" : "AWS::IAM::User",
  "Properties" : {
    "Path": "/",
    "Policies": [{
      "PolicyName": "root",
      "PolicyDocument": {
        "Statement":[{
          "Effect":"Allow",
          "Action":"*",
          "Resource":"*"
        }
      ]}
    }]
  }
},

"HostKeys" : {
  "Type" : "AWS::IAM::AccessKey",
  "Properties" : {
    "UserName" : { "Ref": "CfnUser" }
  }
},

"# Add AWS Credentials to Tomcat\n",
"echo \"AWS_ACCESS_KEY=", { "Ref" : "HostKeys" }, "\" >> /etc/sysconfig/tomcat6\n",
"echo \"AWS_SECRET_ACCESS_KEY=", {"Fn::GetAtt": ["HostKeys", "SecretAccessKey"]}, "\" >> /etc/sysconfig/tomcat6\n",

We can then use these access keys in other scripts by referencing the $AWS_ACCESS_KEY and $AWS_SECRET_ACCESS_KEY environment variables.
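
For example, a script running on the instance could configure the AWS Ruby SDK directly from those environment variables. This is a sketch; it assumes the variables written to /etc/sysconfig/tomcat6 above are exported into the process environment.

require 'rubygems'
require 'aws-sdk'

# Configure the SDK using the credentials generated by CloudFormation
AWS.config(
  :access_key_id     => ENV['AWS_ACCESS_KEY'],
  :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'])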

How is this different from typical configuration management?

Typically in many organizations, there’s a big property file with hard-coded key/value pairs that gets passed into the pipeline. The pipeline executes using the given parameters and cannot scale or change without a user modifying the property file. Because all of the properties are hard coded, if the property file hard codes the IP of an EC2 instance and that instance goes down for whatever reason, the pipeline doesn’t work until someone fixes the property file. There are more effective ways of doing this when using the cloud. The cloud provides on-demand resources that are constantly changing. These resources will have different IP addresses, domain names, etc. associated with them every time.

With dynamic configuration, there are no property files; every property is generated as part of the pipeline.

With this dynamic approach, the pipeline values change with every run. As new cloud resources are allocated, the pipeline adjusts itself automatically without the need for users to constantly modify property files. This leads to less time spent debugging the cumbersome property-file management issues that plague most companies.

In the next part of our series – which is all about Deployment Automation – we’ll go through scripting and testing your deployment using industry-standard tools. In this next article, you’ll see how to orchestrate deployment sequences and configuration using Capistrano.

09-18-2012

Continuous Delivery in the Cloud: CD Pipeline (Part 2 of 6)

In part 1 of this series, I introduced the Continuous Delivery (CD) pipeline for the Manatee Tracking application and how we use this pipeline to deliver software from checkin to production. In this article I will take an in-depth look at the CD pipeline. A list of topics for each of the articles is summarized below.

Part 1: Introduction – Introduction to continuous delivery in the cloud and the rest of the articles;
Part 2: CD Pipeline – What you’re reading now;
Part 3: CloudFormation – Scripted virtual resource provisioning;
Part 4: Dynamic Configuration – “Property file less” infrastructure;
Part 5: Deployment Automation – Scripted deployment orchestration;
Part 6: Infrastructure Automation – Scripted environment provisioning (Infrastructure Automation)

The CD pipeline consists of five Jenkins jobs. These jobs are configured to run one after the other. If any one of the jobs fails, the pipeline fails and that release candidate cannot be released to production. The five Jenkins jobs are listed below (further details of these jobs are provided later in the article).

  1. A job that sets the variables used throughout the pipeline (SetupVariables)
  2. Build job (Build)
  3. Production database update job (StoreLatestProductionData)
  4. Target environment creation job (CreateTargetEnvironment)
  5. A deployment job (DeployManateeApplication), which enables a one-click deployment into production.

We used Jenkins plugins to add features to the core Jenkins configuration; you can extend the standard Jenkins setup this way. The plugins we use for the Sea to Shore Alliance Continuous Delivery configuration are listed below.

Grails: http://updates.jenkins-ci.org/download/plugins/grails/1.5/grails.hpi
Groovy: http://updates.jenkins-ci.org/download/plugins/groovy/1.12/groovy.hpi
Subversion: http://updates.jenkins-ci.org/download/plugins/subversion/1.40/subversion.hpi
Parameterized Trigger: http://updates.jenkins-ci.org/download/plugins/parameterized-trigger/2.15/parameterized-trigger.hpi
Copy Artifact: http://updates.jenkins-ci.org/download/plugins/copyartifact/1.21/copyartifact.hpi
Build Pipeline: http://updates.jenkins-ci.org/download/plugins/build-pipeline-plugin/1.2.3/build-pipeline-plugin.hpi
Ant: http://updates.jenkins-ci.org/download/plugins/ant/1.1/ant.hpi
S3: http://updates.jenkins-ci.org/download/plugins/s3/0.2.0/s3.hpi

The Parameterized Trigger, Build Pipeline and S3 plugins are used for moving the application through the pipeline jobs. The Ant, Groovy, and Grails plugins are used for running the build for the application code. The Subversion plugin is used for polling and checking out from version control.

Below, I describe each of the jobs that make up the CD pipeline in greater detail.

SetupVariables: Jenkins job used for entering in necessary property values which are propagated along the rest of the pipeline.

Parameter: STACK_NAME
Type: String
Where: Used in both CreateTargetEnvironment and DeployManateeApplication jobs
Purpose: Defines the CloudFormation Stack name and SimpleDB property domain associated with the CloudFormation stack.

Parameter: HOST
Type: String
Where: Used in both CreateTargetEnvironment and DeployManateeApplication jobs
Purpose: Defines the CNAME of the domain created in the CreateTargetEnvironment job. The DeployManateeApplication job uses it when it dynamically creates configuration files. For instance, in test.oneclickdeployment.com, test would be the HOST

Parameter: PRODUCTION_IP
Type: String
Where: Used in the StoreProductionData job
Purpose: Sets the production IP for the job so that it can SSH into the existing production environment and run a database script that exports the data and uploads it to S3.

Parameter: deployToProduction
Type: Boolean
Where: Used in both CreateTargetEnvironment and DeployManateeApplication jobs
Purpose: Determines whether to use the development or production SSH keypair.

In order for the parameters to propagate through the pipeline, we pass the current build parameters using the Parameterized Trigger plugin.

Build: Compiles the Manatee application’s Grails source code and creates a WAR file.

To do this, we utilize the Jenkins Grails plugin and run Grails targets such as compile and prod war. Next, we archive the Grails migrations for use in the DeployManateeApplication job, and then the job pushes the Manatee WAR up to S3, which is used as an artifact repository.

Lastly, using the Parameterized Trigger plugin, we trigger the StoreProductionData job with the current build parameters.

StoreProductionData: This job performs a pg_dump (PostgreSQL dump) of the production database and then stores the dump in S3 for the environment creation job to use when building up the environment. Below is a snippet from this job.

ssh -i /usr/share/tomcat6/development.pem -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ec2-user@${PRODUCTION_IP} ruby /home/ec2-user/database_update.rb

On the target environments created using the CD pipeline, a database script is stored. The script goes into the PostgreSQL database and runs a pg_dump. It then pushes the pg_dump SQL file to S3 to be used when creating the target environment.
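
A hedged sketch of what a script like database_update.rb might do – dump the database and push the file to S3 with the AWS Ruby SDK (v1) – is below. The database name, bucket and file naming are assumptions, not the actual script.

require 'rubygems'
require 'aws-sdk'

dump_file = "/tmp/manatee-#{Time.now.strftime('%Y%m%d%H%M%S')}.sql"

# Dump the PostgreSQL database to a local file
system("pg_dump -U postgres manatee > #{dump_file}") or raise 'pg_dump failed'

# Push the dump to S3 so CreateTargetEnvironment can seed the new environment
s3 = AWS::S3.new
s3.buckets['manatee-db-dumps'].objects[File.basename(dump_file)].write(
  :file => dump_file, :server_side_encryption => :aes256)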

After the SQL file is stored successfully, the CreateTargetEnvironment job is triggered.

CreateTargetEnvironment: Creates a new target environment using a CloudFormation template to create all the AWS resources, and calls Puppet to provision the environment itself from a base operating system to a fully working target environment ready for deployment. Below is a snippet from this job.

if [ "$deployToProduction" == "true" ]
then
  SSH_KEY=production
else
  SSH_KEY=development
fi

# Create CloudFormation Stack
ruby ${WORKSPACE}/config/aws/create_stack.rb ${STACK_NAME} ${WORKSPACE}/infrastructure/manatees/production.template ${HOST} ${JENKINSIP} ${SSH_KEY} ${SGID} ${SNS_TOPIC}

# Load SimpleDB Domain with Key/Value Pairs
ruby ${WORKSPACE}/config/aws/load_domain.rb ${STACK_NAME}

# Pull and store variables from SimpleDB
host=`ruby ${WORKSPACE}/config/aws/showback_domain.rb ${STACK_NAME} InstanceIPAddress`

# Run Acceptance Tests
cucumber ${WORKSPACE}/infrastructure/manatees/features/production.feature host=${host} user=ec2-user key=/usr/share/tomcat6/.ssh/id_rsa

# Publish notifications to SNS
sns-publish --topic-arn $SNS_TOPIC --subject "New Environment Ready" --message "Your new environment is ready. IP Address: $host. An example command to ssh into the box would be: ssh -i development.pem ec2-user@$host This instance was created by $JENKINS_DOMAIN" --aws-credential-file /usr/share/tomcat6/aws_access

Once the environment is created, a set of Cucumber tests is run to ensure it’s in the correct working state. If any test fails, the entire pipeline fails and the developer is notified something went wrong. Otherwise if it passes, the DeployManateeApplication job is kicked off and an AWS SNS email notification with information to access the new instance is sent to the developer.

DeployManateeApplication: Runs a Capistrano script which coordinates the steps of the deployment. A snippet from this job is displayed below.

if [ "$deployToProduction" != "true" ]
then
  SSH_KEY=/usr/share/tomcat6/development.pem
else
  SSH_KEY=/usr/share/tomcat6/production.pem
fi

#/usr/share/tomcat6/.ssh/id_rsa

cap deploy:setup stack=${STACK_NAME} key=${SSH_KEY}

sed -i "s@manatee0@${HOST}@" ${WORKSPACE}/deployment/features/deployment.feature

host=`ruby ${WORKSPACE}/config/aws/showback_domain.rb ${STACK_NAME} InstanceIPAddress`
cucumber deployment/features/deployment.feature host=${host} user=ec2-user key=${SSH_KEY} artifact=

sns-publish --topic-arn $SNS_TOPIC --subject "Manatee Application Deployed" --message "Your Manatee Application has been deployed successfully. You can view it by going to http://$host/wildtracks This instance was deployed to by $JENKINS_DOMAIN" --aws-credential-file /usr/share/tomcat6/aws_access

This deployment job is the final piece of the delivery pipeline; it pulls together all of the pieces created in the previous jobs to successfully deliver working software.

During the deployment, the Capistrano script SSHes into the target server, deploys the new WAR and updated configuration, and restarts all services. Then the Cucumber tests are run to ensure the application is available and running successfully. Assuming the tests pass, an AWS SNS email gets dispatched to the developer with information on how to access their new development application.
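
To illustrate, a fragment of a Capistrano (v2-style) task along these lines might look like the following; the paths, file names and service name are assumptions rather than the project’s actual deploy script.

set :user, 'ec2-user'
set :use_sudo, true

namespace :deploy do
  desc 'Copy the new WAR into Tomcat and restart services'
  task :manatee do
    upload 'target/manatee.war', '/tmp/manatee.war'
    run 'sudo cp /tmp/manatee.war /usr/share/tomcat6/webapps/wildtracks.war'
    run 'sudo service tomcat6 restart'
  end
end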

We use Jenkins as the orchestrator of the pipeline. Jenkins executes a set of scripts and passes parameters along as it runs each job. Because of the role Jenkins plays, we want to make sure it’s treated the same way as the application – meaning we version and test all of our changes to the system. For example, if a developer modifies the create-environment job configuration, we want to have the ability to revert back if necessary. For this reason, we version the Jenkins configuration: the jobs, plugins and main configuration. To do this, a script is executed each hour using cron.hourly that checks for new jobs or updated configuration and commits them to version control.
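
A hedged sketch of that hourly script – the Jenkins home path and commit message are assumptions – might simply add any new job configuration files and commit the changes to Subversion.

jenkins_home = '/var/lib/jenkins'

Dir.chdir(jenkins_home) do
  # Pick up newly created jobs and top-level configuration files
  system('svn add --force --quiet config.xml *.xml jobs/*/config.xml')
  # Commit only if something actually changed
  changes = `svn status -q`
  unless changes.empty?
    system('svn commit -m "Automated commit of Jenkins configuration"')
  end
end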

The CD pipeline that we have built for the Manatee application enables any change in the application, infrastructure, database or configuration to move through to production seamlessly using automation. This allows any new features, security fixes, etc. to be fully tested as they get delivered to production at the click of a button.

In the next part of our series – which is all about using CloudFormation – we’ll go through a CloudFormation template used to automate the creation of a Jenkins environment. In this next article, you’ll see how CloudFormation procures AWS resources and provisions our Jenkins CD Pipeline environment.

Continuous Delivery in the Cloud Case Study for the Sea to Shore Alliance – Introduction (part 1 of 6)

We help companies deliver software reliably and repeatedly using Continuous Delivery in the Cloud. With Continuous Delivery (CD), teams can deliver new versions of software to production by flattening the software delivery process and decreasing the cycle time between an idea and usable software through the automation of the entire delivery system: build, deployment, test, and release. CD is enabled through a delivery pipeline. With CD, our customers can choose when and how often to release to production. On top of this, we utilize the cloud so that customers can scale their infrastructure up and down and deliver software to users on demand.

Stelligent offers a solution called Elastic Operations which provides a Continuous Delivery platform along with expert engineering support and monitoring of a delivery pipeline that builds, tests, provisions and deploys software to target environments – as often as our customers choose. We’re in the process of open sourcing the platform utilized by Elastic Operations.

In this six-part blog series, I am going to go over how we built out a Continuous Delivery solution for the Sea to Shore Alliance:

Part 1: Introduction – What you’re reading now;
Part 2: CD Pipeline – Automated pipeline to build, test, deploy, and release software continuously;
Part 3: CloudFormation – Scripted virtual resource provisioning;
Part 4: Dynamic Configuration – “Property file less” infrastructure;
Part 5: Deployment Automation – Scripted deployment orchestration;
Part 6: Infrastructure Automation – Scripted environment provisioning (Infrastructure Automation)

This year, we delivered this Continuous Delivery in the Cloud solution to the Sea to Shore Alliance. The Sea to Shore Alliance is a non-profit organization whose mission is to protect and conserve the world’s fragile coastal ecosystems and their endangered species such as manatees, sea turtles, and right whales. One of their first software systems tracks and monitors manatees. Prior to Stelligent‘s involvement, the application was running on a single instance that was manually provisioned and deployed. As a result of the manual processes, there were no automated tests for the infrastructure or deployment. This made it impossible to reproduce environments or deployments the same way every time. Moreover, the knowledge to recreate these environments, builds and deployments was locked in the heads of a few key individuals. The production application for tracking these manatees, developed by Sarvatix, is located here.

In this case study, I describe how we went from an untested manual process in which the development team was manually building software artifacts, creating environments and deploying, to a completely automated delivery pipeline that is triggered with every change.

Figure 1 illustrates the AWS architecture of the infrastructure that we designed for this Continuous Delivery solution.

There are two CloudFormation stacks being used: the Jenkins stack – or Jenkins environment – shown on the left, and the Manatee stack – or Target environment – shown on the right.

The Jenkins Stack

  * Creates the jenkins.example.com Route53 Hosted Zone
  * Creates an EC2 instance with Tomcat and Jenkins installed and configured on it.
  * Runs the CD Pipeline

The Manatee stack is slightly different; it utilizes the configuration provided by SimpleDB to create itself. This stack defines the target environment to which the application software is deployed.

The Manatee Stack

  * Creates the manatee.example.com Route53 Hosted Zone
  * Creates an EC2 instance with Tomcat, Apache and PostgreSQL installed on it.
  * Runs the Manatee application.

The Manatee stack is configured with CPU alarms that send an email notification to the developers/administrators when an instance becomes over-utilized. We’re in the process of automatically scaling to additional instances when these types of alarms are triggered.
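
A sketch of creating such a CPU alarm with the AWS Ruby SDK (v1) is below; the instance ID, SNS topic ARN and threshold are placeholders, not the project’s actual values.

require 'rubygems'
require 'aws-sdk'

cloudwatch = AWS::CloudWatch.new

# Notify the SNS topic when average CPU stays above 80% for two periods
cloudwatch.alarms.create('manatee-cpu-high',
  :namespace           => 'AWS/EC2',
  :metric_name         => 'CPUUtilization',
  :dimensions          => [{ :name => 'InstanceId', :value => 'i-12345678' }],
  :statistic           => 'Average',
  :period              => 300,
  :evaluation_periods  => 2,
  :threshold           => 80,
  :comparison_operator => 'GreaterThanThreshold',
  :alarm_actions       => ['arn:aws:sns:us-east-1:123456789012:manatee-alarms'])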

Both instances are encapsulated behind a security group so that they can talk to each other over the internal AWS network.

Fast Facts
Industry: Non-Profit
Profile: Customer tracks and monitors endangered species such as manatees.
Key Business Issues: The customer’s development team needed to have unencumbered access to resources along with automated environment creation and deployment.
Stakeholders: Development team, scientists, and others from the Sea to Shore Alliance
Solution: Continuous Delivery in the Cloud (Elastic Operations)
Key Tools/Technologies: AWS – Amazon Web Services (CloudFormation, EC2, S3, SimpleDB, IAM, CloudWatch, SNS), Jenkins, Capistrano, Puppet, Subversion, Cucumber, Liquibase

The Business Problem
The customer needed an operations team that could be scaled up or down depending on the application need. The customer’s main requirements were to have unencumbered access to resources such as virtual hardware. Specifically, they wanted to have the ability to create a target environment and run an automated deployment to it without going to a separate team and submitting tickets, emails, etc. In addition to being able to create environments, the customer wanted to have more control over the resources being used; they wanted to have the ability to terminate resources if they were unused. To address these requirements we introduced an entirely automated solution which utilizes the AWS cloud for providing resources on-demand, along with other solutions for providing testing, environment provisioning and deployment.

On the Manatee project, we have five key objectives for the delivery infrastructure. The development team should be able to:

  1. Deliver new software or updates to users on demand
  2. Reprovision target environment configuration on demand
  3. Provision environments on demand
  4. Remove configuration bottlenecks
  5. Terminate instances on demand

Our Team
Stelligent’s team consisted of an account manager and one polyskilled DevOps Engineer that built, managed, and supported the Continuous Delivery pipeline.

Our Solution
Our solution is a single delivery pipeline that gives our customer (developers, testers, etc.) unencumbered access to resources and single-click automated deployment to production. To enable this, the pipeline needed to include:

  1. The ability for any authorized team member to create a new target environment using a single click
  2. Automated deployment to the target environment
  3. End-to-end testing
  4. The ability to terminate unnecessary environments
  5. Automated deployment into production with a single click

The delivery pipeline improves efficiency and reduces costs by not limiting the development team. The solution includes:

  • On-Demand Provisioning – All hardware is provided via EC2’s virtual instances in the cloud, on demand. As part of the CD pipeline, any authorized team member can use the Jenkins CreateTargetEnvironment job to order target environments for development work.
  • Continuous Delivery Solution so that the team can deliver software to users on demand:
  • Development Infrastructure – Consists of:
    • Tomcat: used for hosting the Manatee Application
    • Apache: Hosts the front-end website and uses virtual hosts for proxying and redirection.
    • PostgreSQL: Database for the Manatee application
    • Groovy: the application is written in Grails which uses Groovy.
  • Instance Management – Any authorized team member is able to monitor virtual instance usage by viewing Jenkins. There is a policy that test instances are automatically terminated every two days. This promotes ephemeral environments and test automation.
  • Deployment to Production – There’s a boolean value (i.e. a checkbox the user selects) in the delivery pipeline used for deciding whether to deploy to production.
  • System Monitoring and Disaster Recovery – Using the AWS CloudWatch service, AWS provides us with detailed monitoring to notify us of instance errors or anomalies through statistics such as CPU utilization, Network IO, Disk utilization, etc. Using these solutions we’ve implemented an automated disaster recovery solution.


A list of the tools we utilized is enumerated below.

Tool: AWS EC2
What is it? Cloud-based virtual hardware instances
Our Use: We use EC2 for all of our virtual hardware needs. All instances, from development to production are run on EC2

Tool: AWS S3
What is it? Cloud-based storage
Our Use: We use S3 as both a binary repository and a place to store successful build artifacts.

Tool:  AWS IAM
What is it? User-based access to AWS resources
Our Use: We create users dynamically and use their AWS access and secret access keys so we don’t have to store credentials as properties

Tool: AWS CloudWatch
What is it? System monitoring
Our Use: Monitors all instances in production. If an instance takes an abnormal amount of strain or shuts down unexpectedly, SNS sends an email to designated parties

Tool: AWS SNS
What is it? Email notifications
Our Use: When an environment is created or a deployment is run, SNS is used to send notifications to affected parties.

Tool: Cucumber
What is it? Acceptance testing
Our Use: Cucumber is used for testing at almost every step of the way. We use Cucumber to test infrastructure, deployments and application code to ensure correct functionality. Cucumber’s English-like syntax allows both technical personnel and customers to communicate using an executable test.

Tool: Liquibase
What is it? Automated database change management
Our Use: Liquibase is used for all database changesets. When a change is necessary within the database, it is made in a Liquibase changelog.xml file.

Tool: AWS CloudFormation
What is it? Templating language for orchestrating all AWS resources
Our Use: CloudFormation is used for creating a fully working Jenkins environment and Target environment. For instance, for the Jenkins environment it creates the EC2 instance with CloudWatch monitoring alarms, an associated IAM user, an SNS notification topic – everything required for Jenkins to build. This, along with Jenkins, makes up the major pieces of the infrastructure.

Tool: AWS SimpleDB
What is it? Cloud-based NoSQL database
Our Use: SimpleDB is used for storing dynamic property configuration and passing properties through the CD Pipeline. As part of the environment creation process, we store multiple values such as IP addresses that we need when deploying the application to the created environment.

Tool: Jenkins
What is it? We’re using Jenkins to implement a CD pipeline using the Build Pipeline plugin.
Our Use: Jenkins runs the CD pipeline which does the building, testing, environment creation and deploying. Since the CD pipeline is also code (i.e. configuration code), we version our Jenkins configuration.

Tool: Capistrano
What is it? Deployment automation
Our Use: Capistrano orchestrates and automates deployments. Capistrano is a Ruby-based deployment DSL that can be used to deploy to multiple platforms including Java, Ruby and PHP. It is called as part of the CD pipeline and deploys to the target environment.

Tool: Puppet
What is it? Infrastructure automation
Our Use: Puppet takes care of the environment provisioning. CloudFormation requests the environment and then calls Puppet to do the dynamic configuration. We configured Puppet to install, configure, and manage the packages, files and services.

Tool: Subversion
What is it? Version control system
Our Use: Subversion is the version control repository where every piece of the Manatee infrastructure is stored. This includes the environment scripts such as the Puppet modules, the CloudFormation templates, Capistrano deployment scripts, etc.

We combined the on-demand capability of the cloud with a proven continuous delivery approach to build an automated, one-click method for building and deploying software into scripted production environments.

In the blog series, I will describe the technical implementation of how we went about building this infrastructure into a complete solution for continuously delivering software. This series will consist of the following:

Part 2 of 6 – CD Pipeline: I will go through the technical implementation of the CD pipeline using Jenkins. I will also cover Jenkins versioning, pulling and pushing artifacts from S3, and Continuous Integration.

Part 3 of 6 – CloudFormation: I will go through a CloudFormation template we’re using to orchestrate the creation of AWS resources and to build the Jenkins and target infrastructure.

Part 4 of 6 – Dynamic Configuration: Will cover dynamic property configuration using SimpleDB

Part 5 of 6 – Deployment Automation: I will explain Capistrano in detail along with how we used Capistrano to deploy build artifacts and run Liquibase database changesets against target environments

Part 6 of 6 – Infrastructure Automation: I will describe the features of Puppet in detail along with how we’re using Puppet to build and configure target environments – for which the software is deployed.

07-30-2012

Continuous Delivery in the Cloud Case Study

A Case Study on using 100% Cloud-based Resources with Automated Software Delivery

We help – typically large – organizations create one-click software delivery systems so that they can deliver software in a more rapid, reliable and repeatable manner (AKA Continuous Delivery). The only way this works is when Development works with Operations. As has been written elsewhere in this series, this means changing the hearts and minds of people, because most organizations are used to working in ‘siloed’ environments. In this entry, I focus on implementation by describing a real-world case study in which a team of our Systems and Software Engineers brought Continuous Delivery Operations to the Cloud.

For years, we’ve helped customers with Continuous Integration and Testing, so more of our work was with Developers and Testers. Several years ago, we hired a Sys Admin/Engineer/DBA who was passionate about automation. As a result, we began assembling multiple two-person “DevOps” teams consisting of a Software Engineer and a Systems Engineer, both of them big-picture thinkers and not just “Developers” or “Sys Admins”. These days, we put together these targeted teams of Continuous Delivery and Cloud experts with hands-on experience as Software Engineers and Systems Engineers so that organizations can deliver software as quickly and as often as the business requires.

A couple of years ago, we already had a few people in the company who were experimenting with Cloud infrastructures, so we saw a great opportunity to provide cloud-based delivery solutions. In this case study, I cover a project we are currently working on for a large organization. It is a new Java-based web services project, so we’ve been able to implement solutions using our recommended software delivery patterns rather than being constrained by legacy tools or decisions. However, as I note, we aren’t without constraints on this project. If I were you, I’d call “BS!” on any “case study” in which everything went flawlessly and assume it was an extremely small or a theoretical project in the author’s mind. This is the real deal. Enough said, on to the case study.

AWS Tools

Fast Facts

Industry: Healthcare, Public Sector
Profile: The customer is making available to all, free of charge, a series of software specifications and open source software modules that together make up an oncology-extended Electronic Health Record capability.
Key Business Issues: The customer wanted all team members to have “unencumbered” access to infrastructure resources without the usual “request and wait” queue-based procedures present in most organizations.
Stakeholders: Over 100 people consisting of Developers, Testers, Analysts, Architects, and Project Management.
Solution: Continuous Delivery Operations in the Cloud
Key Tools/Technologies: Amazon Web Services – AWS (Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Block Storage (EBS), etc.), Jenkins, JIRA Studio, Ant, Ivy, Tomcat and PostgreSQL

The Business Problem
The customer was used to dealing with long, drawn-out processes with Operations teams that lacked agility. They were accustomed to submitting Word documents via email to an Operations team, attending multiple meetings and getting their environments set up weeks or months later. We were compelled to develop a solution that reduced or eliminated these problems, which are all too common in many large organizations (Note: each problem is identified with a letter and number, for example P1, and referred to later):


  1. Unable to deliver software to users on demand (P1)
  2. Queued requests for provisioned instances (P2)
  3. Unable to reprovision precise target environment configuration on demand (P3)
  4. Unable to provision instances on demand (P4)
  5. Configuration errors in target environments presenting deployment bottlenecks while Operations and Development teams troubleshoot errors (P5)
  6. Underutilized instances (P6)
  7. No visibility into purpose of instance (P7)
  8. No visibility into the costs of instance (P8)
  9. Users cannot terminate instances (P9)
  10. Increased Systems Operations personnel costs (P10)


Our Team
We put together a four-person team to create a solution for delivering software and managing the internal Systems Operations for this 100+ person project. We also hired a part-time Security expert. The team consists of two Systems Engineers and two Software Engineers focused on Continuous Delivery and the Cloud. One of the Software Engineers is the Solutions Architect/PM for our team.

Our Solution
We began with the end in mind based on the customer’s desire for unencumbered access to resources. To us, “unencumbered” did not mean without controls; it meant providing automated services over queued “request and wait for the Ops guy to fulfill the request” processes. Our approach is that every resource is in the cloud: Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS), to reduce operations costs (P10) and increase efficiency. In doing this, effectively all project resources are available on demand in the cloud. We have also automated the software delivery process to Development and Test environments and are working on one-click delivery to production. I’ve identified the problems we’re solving – from the list above – in parentheses (P1, P8, etc.). The solution includes:

  • On-Demand Provisioning – All hardware is provided via EC2’s virtual instances in the cloud, on demand (P2). We’ve developed a “Provisioner” (PaaS) that provides any authorized team member the capability to click a button and get their project-specific target environment (P3) in the AWS’ cloud – thus, providing unencumbered access to hardware resources. (P4) The Provisioner provides all authorized team members the capability to monitor instance usage (P6) and adjust accordingly. Users can terminate their own virtual instances (P9).
  • Continuous Delivery Solution so that the team can deliver software to users on demand (P1):
    • Automated build script using Ant – used to drive most of the other automation tools
    • Dependency Management using Ivy. We will be adding Sonatype Nexus
    • Database Integration/Change using Ant and Liquibase
    • Automated Static Analysis using Sonar (with CheckStyle, FindBugs, JDepend, and Cobertura)
    • Test framework hooks for running JUnit, etc.
    • Reusing remote Deployment custom Ant scripts that use Java Secure Channel and Web container configuration. However, we will be starting a process of using a more robust tool such as ControlTier to perform deployment
    • Automated document generation using Grand, SchemaSpy (ERDs) and UMLGraph
    • Continuous Integration server using Hudson
    • Continuous Delivery pipeline system – we are customizing Hudson to emulate a Deployment Pipeline
  • Issue Tracking – We’re using the JIRA Studio SaaS product from Atlassian (P10), which provides issue tracking, version-control repository, online code review and a Wiki. We also manage the relationship with the vendor and perform the user administration including workflow management and reporting.
  • Development Infrastructure - There were numerous tools selected by the customer for Requirements Management and Test Management and Execution, including HP QC, LoadRunner, SoapUI and Jama Contour. Many of these tools were installed and managed by our team on the EC2 instances
  • Instance Management - Any authorized team member is able to monitor virtual instance usage by viewing a web-based dashboard (P6, P7, P8) we developed. This helps to determine instances that should no longer be in use or may be eating up too much money. There is a policy that test instances (e.g. Sprint Testing) are terminated no less than every two weeks. This promotes ephemeral environments and test automation.
  • Deployment to Production – Much of the pre-production infrastructure is in place, but we will be adding some additional automation features to make it available to users in production (P1). The deployment sites are unique in that we aren’t hosting a single instance used by all users and it’s likely the software will be installed at each site. One plan is to deploy separate instances to the cloud or to virtual instances that are shipped to the user centers

  • System Monitoring and Disaster Recovery – Using CloudKick to notify us of instance errors or anomalies. EC2 provides us with some monitoring as well. We will be implementing a more robust monitoring solution using Nagios or something similar in the coming months. Through automation and supporting process, we’ve implemented a disaster recovery solution.

Benefits
The benefits are primarily around removing the common bottlenecks from processes so that software can be delivered to users and team members more often. Also, we think our approach of providing on-demand services over queue-based requests increases agility and significantly reduces costs. Here are some of the benefits:

  • Deliver software more often – to users and internally (testers, managers, demos)
  • Deliver software more quickly – since the software delivery process is automated, we identify the SVN tag and click a button to deliver the software to any environment
  • Software delivery is rapid, reliable and repeatable. All resources can be reproduced with a single click – source code, configuration, environment configuration, database and network configuration is all checked in and versioned and part of a single delivery system.
  • Increased visibility to environments and other resources – All preconfigured virtual hardware instances are available for any project member to provision without needing to submit forms or attend countless meetings

Tools
Here are some of the tools we are using to deliver this solution. Some of the tools were chosen by our team exclusively and some by other stakeholders on the project.

  • AWS EC2 - Cloud-based virtual hardware instances
  • AWS S3 – Cloud-based storage. We use S3 to store temporary software binaries and backups
  • AWS EBS – Elastic Block Storage. We use EBS to attach PostgreSQL data volumes
  • Ant – Build Automation
  • CloudKick – Real-time Cloud instance monitoring
  • ControlTier – Deployment Automation. Not implemented yet.
  • HP LoadRunner – Load Testing
  • HP Quality Center (QC) – Test Management and Orchestration
  • Ivy – Dependency Management
  • Jama Contour - Requirements Management
  • Jenkins – Continuous Integration Server
  • JIRA Studio - Issue Tracking, Code Review, Version-Control, Wiki
  • JUnit – Unit and Component Testing
  • Liquibase – Automated database change management
  • Nagios – or Zenoss – Systems Monitoring. Not implemented yet
  • Nexus – Dependency Management Repository Manager (not implemented yet)
  • PostgreSQL – Database used by Development team. We've written scripts that automate database change management
  • Provisioner (Custom Web-based) – Target Environment Provisioning and Virtual Instance Monitoring
  • Puppet – Systems Configuration Management
  • QTP – Test Automation
  • SoapUI – Web Services Test Automation
  • Sonar – code quality analysis (Includes CheckStyle, PMD, Cobertura, etc.)
  • Tomcat/JBoss – Web container used by Development. We've written scripts to automate the deployment and container configuration

Solutions we’re in the process of Implementing
We’re less than a year into the project and have much more work to do. Here are a few projects we’re in the process of implementing or will be starting soon:

  • System Configuration Management – We’ve started using Puppet, but we will be expanding how it’s used in the future
  • Deployment Automation – The move to a more robust Deployment automation tool such as ControlTier
  • Development Infrastructure Automation – Automating the provisioning and configuration of tools such as HP QC in a cloud environment

What we would do Differently
Typically, if we were to start a Java-based project and recommend tools around testing, we might choose the following tools for testing, requirements and test management, based on the particular need:

  • Selenium with SauceLabs
  • JIRA Studio for Test Management
  • JIRA Studio for Requirements Management
  • JMeter – or other open source tool – for Load Testing

However, like most projects, there are many stakeholders who have their preferred approach and tools they are familiar with, just as our team does. Overall, we are pleased with how things are going so far, and the customer is happy with the infrastructure and approach that is in place at this time. I could probably do another case study on dealing with multiple SaaS vendors, but I will leave that for another post.

Summary
There’s much more I could have written about what we’re doing, but I hope this gives you a decent perspective of how we’ve implemented a DevOps philosophy with Continuous Delivery and the Cloud, and how this has led our customer to a more service-based, unencumbered and agile environment.

07-25-2012

DevOps in the Cloud LiveLessons (Video)

DevOps in the Cloud LiveLessons walks viewers through the process of putting together a complete continuous delivery platform for a working software application written in Ruby on Rails, along with examples in other development platforms such as Grails and Java on the companion website. These applications are deployed to Amazon Web Services (AWS), which is an infrastructure-as-a-service provider commonly referred to as “the cloud”. Paul M. Duvall goes through the pieces that make up this platform, including infrastructure and environments, continuous integration, build and deployment scripting, testing and database. Viewers will also learn configuration management and collaboration practices and techniques, along with what the nascent terms DevOps, continuous delivery and continuous deployment are all about. Finally, since this LiveLesson focuses on deploying to the cloud, viewers will learn the ins and outs of many of the services that make up the AWS cloud infrastructure. DevOps in the Cloud LiveLessons includes contributions by Brian Jakovich, who is a Continuous Delivery Engineer at Stelligent.


Visit www.devopscloud.com to download the complete continuous delivery platform examples that are used in these LiveLessons.

Lesson 1:
Deploying a Working Software Application to the Cloud provides a high-level introduction to all parts of the Continuous Delivery system in the Cloud by walking through the deployment of a working software application into the cloud.

Lesson 2:
DevOps, Continuous Delivery, Continuous Deployment and the Cloud covers how to define motivations and differentiators around Continuous Delivery, DevOps, Continuous Deployment and the Cloud. The lesson also covers diagramming software delivery using spaghetti diagrams and value-stream maps.

Lesson 3:
Amazon Web Services covers the basics of the leading Infrastructure as a Service provider. You’ll learn how to use the AWS Management Console, launch and interact with Elastic Compute Cloud (EC2) instances, define security groups to control access to EC2 instances, set up an elastic load balancer to distribute load across EC2 instances, set up Auto Scaling, and monitor resource usage with CloudWatch.

Lesson 4:
Continuous Integration shows how to set up a Continuous Integration (CI) environment, which is the first step to Continuous Integration, Continuous Delivery and Continuous Deployment.

Lesson 5:
Infrastructure Automation covers how to fully script an infrastructure so that you can recreate any environment at any time utilizing AWS CloudFormation and the infrastructure automation tool, Puppet. You’ll also learn about the “Chaos Monkey” tool made popular by NetFlix – a tool that randomly and automatically terminates instances.

Lesson 6:
Building and Deploying Software teaches the basics of building and deploying a software application. You’ll learn how to create and run a scripted build, create and run a scripted deployment in Capistrano, manage dependent libraries using the Bundler and an Amazon S3-backed repository, and deploy the software to various target environments including production using the Jenkins CI server. You will also learn how anyone on the team can use Jenkins to perform self-service deployments on demand.

Lesson 7:
Configuration Management covers the best approaches to versioning everything in a way where you have a single source of truth and can look at the software system and everything it takes to create the software as a holistic unit. You’ll learn how to work from the canonical version, version configurations, set up a dynamic configuration management database that reduces the repetition, and develop collective ownership of all artifacts.

Lesson 8:
Database covers how to entirely script a database, upgrade and downgrade a database, use a database sandbox to isolate changes from other developers and, finally, version all database changes so that they can run as part of a Continuous Delivery system.

Lesson 9:
Testing covers the basics of writing and running various tests as part of a Continuous Integration process. You’ll learn how to write fast-running unit tests at the code level, infrastructure tests, and deployment tests – sometimes called smoke tests. You will also learn how to get feedback on the test results from the CI system.

Lesson 10:
Delivery Pipeline demonstrates how to use the Build Pipeline plug-in in Jenkins to create a delivery pipeline for the commit, acceptance, load & performance and Production stages so that software can potentially be delivered to users with every change.

07-22-2012

Continuous Delivery with Jenkins, CloudFormation and Puppet Online Course

This 3-hour online course is ideal for Developers or Sys Admins who want to know how to script a complete Continuous Delivery platform in the Cloud using Jenkins, AWS and Puppet. By the end of the course, you will have a working Continuous Integration system on AWS, along with all of the source code. The course is led by Paul Duvall, Stelligent CTO and award-winning author of Continuous Integration. The next course is Tuesday, August 14, 2012 at 1PM EDT.

Prerequisites:
Attendees should be at least a junior-level Developer or Linux Sys Admin

In the course, you will:

  1. Use AWS CloudFormation to script the provisioning of AWS resources such as EC2, Route 53 and S3
  2. Script the entire Jenkins and target environment infrastructure using Puppet
  3. Create a fully-functioning Continuous Delivery platform using Jenkins and AWS
  4. Participate in discussions on Continuous Delivery patterns and best practices

You will also be able to email the trainer after the course with questions on setting up your CI platform in AWS.

Register now

There will be 15+ separate exercises that you will go through in creating a working Jenkins system in AWS. If you do not have an AWS account, you will need to sign up with Amazon Web Services as part of the course pre-work (a credit card to pay AWS fees is required). The course material will be provided to students prior to the session. You will spend approximately $10 on AWS fees during the 3-hour training session.

You will be provided access to all of the source code and instructional material prior to, during and after the course. If you have any questions, contact us at training@elasticoperations.com. A website with some of the example content for the course is located at http://onlinecd.s3-website-us-east-1.amazonaws.com/

05-02-2012

Automating Infrastructures: Testing, Scripting, Versioning, Continuous

While we often employ a similar process when automating other parts of a software delivery system (build, deployment, database, etc.), in this article I’m focusing on automating infrastructures. Typically, we go through a 5-step process toward automating environments that will be used as part of a Continuous Delivery pipeline: document, test, script, version and continuous. Each step is described in more detail below.

  1. Document – We document the process required to manually install all operating system and software components on each OS. The key difference with this kind of documentation is that you’re documenting in such a way that it can be automated later on. You might use a wiki or similar tool to document the environment creation process. Before moving on, be sure you’re able to manually create the environment using the documented instructions – at least once. Some example documentation written in a Confluence wiki is shown below.
    [Screenshot: example environment documentation on a Confluence wiki page]

  2. Test – We apply a rigorous acceptance test-driven approach in which automated tests are written along with the scripts that automate the creation of the infrastructure. Tools we often use include Cucumber and Cucumber-Nagios, along with other tools that support an acceptance test-driven approach to the infrastructure. An example using Cucumber is shown below.
    Feature: Scripted provisioning of target environment
    As a developer
    I would like a scripted installation of my target environment
    so that I can assume the environment will be the same everytime and my deployments will be predictable

    Background:
    Given I am sshed into the environment

    Scenario: Is Passenger installed?
    When I run "gem list"
    Then I should see "passenger"

    Scenario: Is the proper version of Ruby installed?
    When I run "/usr/bin/ruby -v"
    Then I should see "ruby 1.8.7"

  3. Script – We script the entire process using tools like Chef, Puppet and CloudFormation so that the process can be repeated and the knowledge required to install and configure the components is not isolated to a few key individuals. These scripts adhere to the behavior described in the automated test(s). A partial example using AWS CloudFormation is shown below.

  6. "Domain" : {
    "Type" : "AWS::Route53::RecordSetGroup",
    "Properties" : {
    "HostedZoneName" : { "Fn::Join" : [ "", [ {"Ref" : "HostedZone"}, "." ]]},
    "RecordSets" : [
    {
    "Name" : { "Fn::Join" : ["", [ { "Ref" : "ApplicationName" }, ".", { "Ref" : "HostedZone" }, "." ]]},
    "Type" : "A",
    "TTL" : "900",
    "ResourceRecords" : [ { "Ref" : "IPAddress" } ]
    }]
    }
    },

    "WebServer": {
    "Type": "AWS::EC2::Instance",
    "Metadata" : {
    "AWS::CloudFormation::Init" : {
    "config" : {
    "packages" : {
    "yum" : {
    "puppet" : []
    }
    },

  4. Version – Since the entire operating system and related components of the infrastructure are scripted, we ensure every infrastructure automation script and test is versioned in a version-control repository so that no change is ever lost. At this point, the documentation in step 1 is no longer used. Version-control systems typically used include Git, Subversion and Perforce – to name a few.

    git add .
    git commit -m "added route 53 hosted zone"
    git push

  5. Continuous – Finally, we ensure that the infrastructure can be recreated with every change that anyone applies – not just to the infrastructure, but to the entire software system. We use a Continuous Integration server to do this. The CI server polls the version-control system for changes; once it finds these changes, it creates the infrastructure and runs the infrastructure tests. If any failure occurs, it stops the creation of the environment and reports the error through the CI server, email and other feedback mechanisms. Typically, we use tools such as Jenkins and Anthill Pro to run a “continuous” process like this; a minimal sketch of such a CI job step is shown below.
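
To make this step concrete, here is a minimal sketch of what a CI job step (in Jenkins, for example) might execute on every change: provision the environment from the versioned template, then run the infrastructure tests. The stack name, template path, use of the boto3 AWS SDK for Python and the cucumber command are illustrative assumptions, not a specific project's actual job configuration.

    # Sketch only: provision the environment from the versioned template, then test it.
    # Assumes boto3 and the cucumber CLI are available on the CI agent.
    import subprocess
    import sys

    import boto3

    STACK_NAME = "target-environment"                # hypothetical stack name
    TEMPLATE_PATH = "infrastructure/stack.template"  # hypothetical path in version control

    cfn = boto3.client("cloudformation")
    with open(TEMPLATE_PATH) as template:
        cfn.create_stack(StackName=STACK_NAME, TemplateBody=template.read())

    # Block until the stack is fully created; a failure here fails the CI build.
    cfn.get_waiter("stack_create_complete").wait(StackName=STACK_NAME)

    # Run the Cucumber infrastructure tests; a non-zero exit code fails the build,
    # which triggers the CI server's email and other feedback mechanisms.
    result = subprocess.run(["cucumber", "features/infrastructure"])
    sys.exit(result.returncode)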


We use this approach when automating everything – from builds to deployments to creating the database and data. We’ve found that when companies are looking to automate their processes, they tend to focus only on scripting and not the rest. We find that an effective Continuous Delivery system requires all five steps so that every change – application code, configuration, data and infrastructure – is part of a single path to production.

12-02-2011

NetFlix Chaos Monkey and Traditional vs. Cloud Operations Mindset

NetFlix has written a lot about how they are effectively using Amazon Web Services to operate their infrastructure. I've found their development and use of the Chaos Monkey (they have even proposed a vision of a "Simian Army") to be particularly interesting. The basic premise is that all systems fail eventually, so the Chaos Monkey is an automated tool that intentionally disrupts the infrastructure on a regular basis by terminating instances and, in general, wreaking havoc. Their philosophy is that you should always treat your environments as "disposable" resources that will fail…eventually. In practice, this is a new mindset, but it shouldn't be. It highlights the difference between a Traditional Operations mindset and a Cloud Operations mindset.
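
The core idea can be boiled down to a few lines. Below is a minimal, hypothetical sketch in the spirit of the Chaos Monkey – not NetFlix's actual implementation – using the boto3 AWS SDK for Python; the region and the opt-in tag are assumptions for illustration.

    # Sketch only: randomly terminate one instance from a disposable, opted-in group,
    # in the spirit of the Chaos Monkey. Not NetFlix's implementation.
    import random

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

    # Only consider running instances explicitly opted in via a hypothetical "chaos" tag.
    response = ec2.describe_instances(
        Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag:chaos", "Values": ["opt-in"]},
        ]
    )
    instances = [
        i["InstanceId"]
        for r in response["Reservations"]
        for i in r["Instances"]
    ]
    if instances:
        victim = random.choice(instances)
        print("Terminating", victim)
        ec2.terminate_instances(InstanceIds=[victim])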

We architect and operate Continuous Delivery systems in the Cloud and help companies migrate their operational infrastructure to the Cloud. We've found that what prevents teams from getting the most benefit from the cloud is a traditional operations mindset.

The traditional operations mindset posits that hardware environments are not ephemeral and are something to nurture and maintain for weeks, months or even years. An informed cloud mindset assumes that anything and everything will fail – eventually – and that all environments are "disposable". I emphasize an informed Cloud mindset because many take the traditional Operations mindset when moving to the Cloud.

Since we do a lot of work with AWS environments in the cloud, we've noticed some interesting antipatterns when working with traditional Development and Operations teams who aren't used to working in the cloud mindset. I've listed some of these antipatterns below.  

Environment Lease Time Policies

The traditional operations mindset believes that environment lease times are perpetual. You can spot this on a project that uses the cloud when development and QA lease times are continually extended. The cloud mindset treats all environments as ephemeral. Reasonable lease times on a cloud project could be as long as 14 days or as short as a couple of hours. There are obviously steady-state runtime environments in the cloud, but even these environments should be capable of being moved to other instances at a moment's notice. In AWS, tools like the Elastic Load Balancer, CloudWatch and Auto Scaling support failover architectures such as this.

Centralized Control

The traditional operations mindset is all built around control. This is because, traditionally, it's the Ops team that's responsible for ensuring the applications are up and always running. Bottom line: their ass is on the line. This means that whenever you request a resource from an Ops team, such as a virtual environment, database, etc., the request is put into a queue in which you must wait your turn based on the priority and request load of the Ops team. In a cloud operations mindset, control can be more decentralized in terms of requesting a resource. This is coupled with fully versioned assets. The reason traditional operations teams typically fear decentralized control is that configuration assets are not managed or versioned. When these assets are managed and versioned, it's much easier to allow anyone to request any resource – particularly in non-production environments – because they can be easily re-provisioned or configured at any point. In cloud operations, resource requests can be asynchronous through the use of fully automated configuration of environments and other resources.

Lack of Configuration Management

In a traditional operations mindset, configuration is typically hidden on someone's machine, embedded within a tool managed by the Ops team, or simply in the heads of one or a few people on the Ops team. The reason for this is that the Ops team must control and secure the information – particularly in Staging and Production environments. The problem with this approach is that the information is locked away in a few people's heads, which creates a significant bottleneck that slows down the entire software delivery process.

In a cloud operations mindset, all configuration is managed in a database or configuration files accessible to any tool that interfaces with it on the software team. This doesn't mean that everyone has access to all configuration values in all environments (such as, say, Production), but it does mean that any team member can perform a self-service deployment without going through a separate Operations team.
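
As a simple illustration of configuration that lives in versioned files rather than in someone's head, here is a minimal sketch in Python. The config/ directory layout, file names and keys are hypothetical assumptions; the point is that any team member or tool can resolve environment settings directly from the repository.

    # Sketch only: load versioned, per-environment configuration from the repository.
    # The config/ directory, file names and keys are hypothetical.
    import json
    from pathlib import Path

    def load_config(environment):
        """Read config/<environment>.json (e.g. config/qa.json) held in version control."""
        config_path = Path("config") / (environment + ".json")
        with config_path.open() as config_file:
            return json.load(config_file)

    if __name__ == "__main__":
        # Any team member (or the CI server) can resolve settings for a target
        # environment without going through a separate Operations queue.
        qa_config = load_config("qa")
        print(qa_config.get("database_host"), qa_config.get("app_port"))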

Golden Images

Golden images are particularly insidious because they can seem like you're doing the right thing, but you're not. Having a golden image is better than having nothing at all. A golden image is an antipattern in which you have a snapshot of an instance/environment at a particular point in time. Some teams might even regularly snapshot their images, which is a good practice. However, the installation and configuration it took to create the image is lost. When employing the golden image antipattern, there's no way anyone can recreate the environments in exactly the same manner every single time. Moreover, the steps it took are either locked in team members' heads or captured statically at a particular point in time through documentation. Having documentation to manually configure the environment is definitely better than no documentation, but it significantly reduces the reliability and repeatability of environments. The cloud operations mindset says that all of the steps in creating environments are scripted and versioned in a version-control system, and any engineer on the team should be capable of recreating these environments by typing a single command, clicking a button or running them headlessly through a Continuous Integration tool.

This touches on only a few of the antipatterns that occur when applying a traditional Operations mindset to the Cloud. Teams won't realize the myriad benefits when moving to the Cloud until they change their mindset.

Start thinking like the Chaos Monkey and employing a Cloud Operations mindset!