Sunday, December 14, 2014

Configuring a Linux Swap Device with Cloud-Init

Cloud-Init is a set of Python scripts used to configure Linux instances when they boot in AWS. Cloud-Init is included on Ubuntu and Amazon Linux AMIs.

You can think of a Cloud-Init script as a bare-bones configuration management solution like Chef or Puppet. A Cloud-Init script is passed as user data. If you have ever passed a shell script as user data, it was Cloud-Init that queried the meta-data service and executed the script. But Cloud-Init also offers a higher-level syntax known as cloud-config.

A common use is to mount devices when the instance boots. An obvious example is creating a swap volume from the ephemeral disk you are likely not using. Here is a script that will create a swap volume on an m3.medium with its ephemeral volume attached at /dev/sdb. (Note: even though you select /dev/sdb in the console, Linux will see it as /dev/xvdb.)

#cloud-config
repo_update: true
repo_upgrade: all

mounts:
  - [ ephemeral0, none, swap, sw, 0, 0 ]

bootcmd:
  - mkswap /dev/xvdb
  - swapon /dev/xvdb

There are three parts to the script:

First, repo_update and repo_upgrade will update the package repository and upgrade all packages, respectively. Note that this only occurs once, on the first boot.

Second, mounts will mount the ephemeral volume as a swap device with no mount point. This will both mount the volume and update fstab so it is mounted on future reboots.

Third, bootcmd will run a series of commands on each boot. In this case, it sets up the swap area and then enables it.

Note that I am using bootcmd rather than runcmd. runcmd will only run on the first boot, while bootcmd will run every time Linux boots. If you used runcmd, the swap device would be configured on the first boot, but if you stop and start the instance you would be assigned a new host (with a fresh ephemeral disk) and the swap device would never be reconfigured.
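
For completeness, here is one way to pass this cloud-config as user data when you launch the instance from PowerShell.  This is just a sketch; the AMI ID and the path to the saved cloud-config file are placeholders you would replace with your own.

#Read the cloud-config shown above from disk (placeholder path)
$UserData = Get-Content -Path 'C:\Scripts\swap-cloud-config.txt' -Raw

#User data must be base64 encoded
$UserData64 = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($UserData))

#Launch an m3.medium with the cloud-config as user data (placeholder AMI ID)
New-EC2Instance -ImageId 'ami-xxxxxxxx' -InstanceType 'm3.medium' -MinCount 1 -MaxCount 1 -UserData $UserData64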

Tuesday, September 30, 2014

CloudWatch Logs Push

In my last post I used the awslogs daemon to push tcpdump events to AWS CloudWatch Logs.  At the time it felt silly to use a file on disk and a daemon to push events from an interactive session.  Well, I had some time to dig, and I found a much cleaner way to do it without the daemon.

It turns out that CloudWatch Logs is implemented as a plugin to the AWS CLI.  The plugin can be configured to read from a file, or you can simply pipe events directly to it on the command line.

You need to register the plugin in your config file (~/.aws/config).  Mine looks like this.
[plugins]
cwlogs = cwlogs
[default]
region = us-east-1
aws_access_key_id = XXXXXXXXXX
aws_secret_access_key = YYYYYYYYYY

Now you can simply pipe data to "aws logs push."  You need to specify the group, stream, and date format as parameters.  And, of course, the group and stream must already exist in AWS.  For example:
sudo tcpdump -tttt port 80 | aws logs push --log-group-name NetworkTrace --log-stream-name i-125731f9 --datetime-format '%Y-%m-%d:%H:%M:%S.%f'
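
If you would rather script the group and stream creation than click through the console, recent versions of the AWS Tools for PowerShell include CloudWatch Logs cmdlets that should do it.  A rough sketch, using the same names as the example above:
New-CWLLogGroup -LogGroupName 'NetworkTrace'
New-CWLLogStream -LogGroupName 'NetworkTrace' -LogStreamName 'i-125731f9'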

Friday, September 26, 2014

CloudWatch Logs and TCPDump

I was recently debugging an issue with a fleet of Apache web servers.  I needed to watch for some low-level network events we felt might be causing an issue (TCP resets, etc.).  I thought CloudWatch Logs would be a cool, albeit unnecessary, solution.

NOTE: I found a much cleaner way to do this presented here.

The awslogs package/daemon can be configured to upload any log file.  Just add a new configuration block to /etc/awslogs/awslogs.conf.  For example, the configuration below says to upload the contents of /var/log/tcpdump to a stream identified by the server's instance id in a log group called NetworkTrace.  Note that the group and stream must be created in the AWS console first.
[/var/log/tcpdump]
file = /var/log/tcpdump
log_group_name = NetworkTrace
log_stream_name = {instance_id}
datetime_format =  %Y-%m-%d:%H:%M:%S.%f

With that done, you can start tcpdump and have it write to a file.  But, by default, tcpdump does not include the full date and time in each record.  You need to include the -tttt option so that awslogs can parse the date and time correctly.  The -tttt option will use the format 2014-09-24 15:20:29.522949.

Now simply start a background process to dump the trace to a file and you should start to see events in CloudWatch.  For example, this will capture everything with minimal detail.
sudo tcpdump -tttt >> /var/log/tcpdump &
If you want to capture more detail, you should filter it to only some events.  For example, this will capture all traffic on port 80, including a hex dump of the data.

sudo tcpdump -tttt -nnvvXS tcp port 80 >> /var/log/tcpdump &

Sunday, August 10, 2014

Decoding Your AWS Bill (Part 3) Loading a Data Warehouse

In the last two posts (part 1, part 2) in this series we used PowerShell to glean information from our monthly AWS billing report.  While you can use those scripts to learn a great deal about your AWS usage, you will eventually outgrow PowerShell.  In this post I will show you how to load the bill into SQL Server for more detailed analysis.

In the prior posts we used the monthly reports.  These reports contain a single line item for each resource for the entire month.  In this post we are going to use the hourly report.  This report shows the detailed expenses incurred each hour.  If you shut a machine down at night you will see that reflected in the report.

Creating a Staging Schema

The first step is to create a table in SQL Server to hold the data.  I am calling this a staging table because this post will present a star schema for a data warehouse.  But, you could simply use this one table and run reports directly against it.  

AWS does not publish a lot of detail on the schema for the billing reports.  I have been using the following table schema for about a year now and have worked out most of the kinks.  Some of my fields are likely larger than needed, but since my data eventually ends up in a star schema, I prefer to have a little buffer.  Also note that I am using consolidated billing, but I am ignoring the blended rates.  If you wanted to add blended rates, a real column should suffice.

CREATE TABLE [dbo].[StageLineItems](
 [ReportFileName] [nvarchar](256) NOT NULL,
 [InvoiceID] [varchar](16) NOT NULL,
 [PayerAccountId] [bigint] NOT NULL,
 [LinkedAccountId] [bigint] NOT NULL,
 [RecordType] [varchar](16) NOT NULL,
 [RecordID] [decimal](26, 0) NOT NULL,
 [ProductName] [varchar](64) NOT NULL,
 [RateId] [int] NOT NULL,
 [SubscriptionId] [int] NOT NULL,
 [PricingPlanId] [int] NOT NULL,
 [UsageType] [varchar](64) NOT NULL,
 [Operation] [varchar](32) NOT NULL,
 [AvailabilityZone] [varchar](16) NULL,
 [ReservedInstance] [char](1) NOT NULL,
 [ItemDescription] [varchar](256) NOT NULL,
 [UsageStartDate] [datetime] NOT NULL,
 [UsageEndDate] [datetime] NOT NULL,
 [UsageQuantity] [real] NOT NULL,
 [Rate] [real] NOT NULL,
 [Cost] [real] NOT NULL,
 [ResourceId] [varchar](128) NULL,
 [user:Name] [varchar](256) NULL,
 [user:Owner] [varchar](256) NULL,
 [user:Project] [varchar](256) NULL
)

Loading the Data

Once you have the table created you will need to load the report.  I use an SSIS job for this.  I am loading the "Detailed Line Items with Resources and Tags" report.  This is the most detailed report available.  I created a custom script task in SSIS to download the report from S3.  This package runs every six hours to refresh the data in the warehouse.

My script expects the account id, bucket name, access key, and secret key.  In addition, you can optionally specify the month, year, and file name.  If you don't specify the optional params, the script will calculate the current month and return it so you can use it later in the package.  See below.



The code for my script task is below.  It calculates the dates and then uses the AWS .Net API to download the file.  Note that you must GAC AWSSDK.dll before SSIS will load it. 

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using System.IO;

namespace ST_a8746170d9b84f1da9baca053cbdc671.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            string accountId = (string)Dts.Variables["accountId"].Value;
            string accessKey = (string)Dts.Variables["accessKey"].Value;
            string secretKey = (string)Dts.Variables["secretKey"].Value;
            string bucketName = (string)Dts.Variables["bucketName"].Value;

            System.DateTime date = DateTime.Now.AddDays(-5);
            int month = (int)Dts.Variables["Month"].Value;
            if (month == 0) { month = date.Month; }
            int year = (int)Dts.Variables["Year"].Value;
            if (year == 0) { year = date.Year; }

            try
            {
                string keyName = string.Format("{0}-aws-billing-detailed-line-items-with-resources-and-tags-{1:0000}-{2:00}.csv.zip", accountId, year, month);
                Dts.Variables["keyName"].Value = keyName;

                string zipFilePath = Path.Combine(Path.GetTempPath(), keyName);
                Dts.Variables["zipFilePath"].Value = zipFilePath;

                string csvFilePath = zipFilePath.Replace(".zip", "");
                Dts.Variables["csvFilePath"].Value = csvFilePath;

                AmazonS3Config config = new AmazonS3Config()
                {
                    ServiceURL = "s3.amazonaws.com"
                };

                using (IAmazonS3 client = Amazon.AWSClientFactory.CreateAmazonS3Client(accessKey, secretKey, RegionEndpoint.USEast1))
                {
                    GetObjectRequest request = new GetObjectRequest()
                    {
                        BucketName = bucketName,
                        Key = keyName
                    };

                    using (GetObjectResponse response = client.GetObject(request))
                    {
                        if (File.Exists(zipFilePath)) File.Delete(zipFilePath);
                        response.WriteResponseStreamToFile(zipFilePath);
                    }
                }
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (AmazonS3Exception amazonS3Exception)
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}

Notice that the report we are downloading is a .zip file.  The detailed report can get very large.  I am simply shelling out to 7-Zip from the SSIS package to decompress the report.  Finally, note that the report contains a few summary lines you will likely want to exclude when you load the data.  I use the following filter.

RecordType == "LineItem" && RateId != 0 && !ISNULL(RecordId)

Star Schema

The final piece of the puzzle is loading the data into a warehouse for reporting.  I'm not going to take you through the details of designing a data warehouse, but I can share the schema I am using.  I analyzed the data a few times using a Data Profiling Task, and ultimately settled on the following dimension tables.


I hope this series has been helpful.  

Saturday, August 9, 2014

Decoding Your AWS Bill (Part 2) Chargeback with Tags

It took six months, but I finally got time to continue the series on decoding your AWS bill.  In the last post, we used PowerShell to download and query the monthly bill.  In this post we use tags to create a cost allocation report.  In the next, and final, post in this series, I will show you how to load the hourly detail report into SQL Server.

Let's assume that we have multiple project teams at our company and they all have servers running in the same AWS account.  We want to "charge back" each team for their usage.  We begin by tagging each instance with a project name (see figure below).  Notice that I also include a name and owner.



This is a good start, but we learned in part one that charges are allocated to the instances as well as the volumes and network interfaces that are attached to them.  Therefore, we have to tag those resources as well as the instance itself.  It is probably unrealistic to ask our users to tag all the resources, so let's create a script that copies tags from the instance to any resources attached to it.  This way our users only have to remember to tag their instances.

The script below will read all of the tags from the instance and copy them to each resource.  I have something very similar scheduled to run once a day on each of my accounts.

(Get-EC2Instance).Instances | % {

    $Instance = $_

    #First, get the tags from each instance
    $NameTag = $Instance.Tags | Where-Object { $_.Key -eq 'Name' }
    $OwnerTag = $Instance.Tags | Where-Object { $_.Key -eq 'Owner' }
    $ProjectTag = $Instance.Tags | Where-Object { $_.Key -eq 'Project' }

    $Instance.BlockDeviceMappings | % {
        #Copy the tags to each volume
        If($NameTag -ne $null) {New-EC2Tag -Resources $_.Ebs.VolumeId -Tag $NameTag}
        If($OwnerTag -ne $null) {New-EC2Tag -Resources $_.Ebs.VolumeId -Tag $OwnerTag}
        If($ProjectTag -ne $null) {New-EC2Tag -Resources $_.Ebs.VolumeId -Tag $ProjectTag}
    }

    $Instance.NetworkInterfaces | % {
        #Copy the tags to each NIC
        If($NameTag -ne $null) {New-EC2Tag -Resources $_.NetworkInterfaceId -Tag $NameTag}
        If($OwnerTag -ne $null) {New-EC2Tag -Resources $_.NetworkInterfaceId -Tag $OwnerTag}
        If($ProjectTag -ne $null) {New-EC2Tag -Resources $_.NetworkInterfaceId -Tag $ProjectTag}
    }
}
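
As mentioned above, I run something like this once a day.  If you want to do the same, one option is the PSScheduledJob module; a minimal sketch, assuming the script is saved to C:\Scripts\Copy-InstanceTags.ps1 (a hypothetical path):

#Register a scheduled job that copies instance tags to attached resources every morning
$Trigger = New-JobTrigger -Daily -At '3:00 AM'
Register-ScheduledJob -Name 'CopyInstanceTags' -FilePath 'C:\Scripts\Copy-InstanceTags.ps1' -Trigger $Trigger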

This is a good start, but it will not really scale well.  It makes an API call for every resource every time we run it.  It will work well for a handful of instances, but as we add more instances the script will take longer and longer to run.  It would be better to cache the tags collection and only update those resources that need to be changed.  Here is a much better version.

$AllTags = Get-EC2Tag

Function Rectify-Tag {
    #This function only updates a tag if the current value is not correct
    Param ($ResourceId, $Tag)
    #Find the current tag in the cached collection
    $OldTag = $AllTags | Where-Object {(($_.ResourceId -eq $ResourceId) -and ($_.Key -eq $Tag.Key))}
    If(($OldTag -eq $null) -or ($OldTag.Value -ne $Tag.Value)) {
        #The current tag is wrong, let's fix it.
        New-EC2Tag -Resources $ResourceId -Tag $Tag
    }
}

(Get-EC2Instance).Instances | % {

    $Instance = $_

    #First, get the tags from each instance
    $NameTag = $Instance.Tags | Where-Object { $_.Key -eq 'Name' }
    $OwnerTag = $Instance.Tags | Where-Object { $_.Key -eq 'Owner' }
    $ProjectTag = $Instance.Tags | Where-Object { $_.Key -eq 'Project' }

    $Instance.BlockDeviceMappings | % {
        #Copy the tags to each volume
        If($NameTag -ne $null) {Rectify-Tag -ResourceId $_.Ebs.VolumeId -Tag $NameTag}
        If($OwnerTag -ne $null) {Rectify-Tag -ResourceId $_.Ebs.VolumeId -Tag $OwnerTag}
        If($ProjectTag -ne $null) {Rectify-Tag -ResourceId $_.Ebs.VolumeId -Tag $ProjectTag}
    }

    $Instance.NetworkInterfaces | % {
        #Copy the tags to each NIC
        If($NameTag -ne $null) {Rectify-Tag -ResourceId $_.NetworkInterfaceId -Tag $NameTag}
        If($OwnerTag -ne $null) {Rectify-Tag -ResourceId $_.NetworkInterfaceId -Tag $OwnerTag}
        If($ProjectTag -ne $null) {Rectify-Tag -ResourceId $_.NetworkInterfaceId -Tag $ProjectTag}
    }
}

Now we have to add the tags we created to our reports.  I assume at this point that you have billing reports enabled.  If not, see my prior blog post.  Log into the web console using your account credentials (not IAM credentials) and click on your name in the top right corner.  From the dropdown, click "Billing and Cost Management."  Choose "Preferences" from the menu down the left side of the screen.  Finally, click the "Manage Report Tags" link toward the end of the screen.

Now, find the tags you want to include in the report (see the figure below).  Make sure you include the project tag.





Now we can download and query the report just like we did in the last post.  The only change is that we are going to use the "$AccountId-aws-cost-allocation-$Year-$Month.csv" report rather than the "$AccountId-aws-billing-csv-$Year-$Month.csv" report we used before.

In addition, note that the custom tags we added will appear in the report as user:tag.  So our Project tag will appear as user:Project.  Therefore, if we wanted to return all the costs associated with the ERP project we would use a PowerShell query like this:

$PayerLineItems |  Where-Object {$_.'user:Project' -eq 'ERP'} | Measure-Object TotalCost -Sum

Now, we have a little problem.  You may notice that if you add up all the costs associated with all the projects, they do not sum to the invoice total.  This is expected.  There are a few costs we did not capture.  First, we only tagged EC2.  If you want to allocate other services, you will need to develop a strategy similar to the one we used above for EC2.  Second, you may have a support contract that adds 10% to the bill.  Third, there are some EC2 costs, like snapshots, that do not include tags in the report.  There is nothing we can do with these last two but allocate them to the projects as overhead.  The script below will do just that.  I'm not going to go into detail, but you can look through my script to understand it.

Set-AWSCredentials LAB

Function Get-CostAllocationReport {
    Param(
        [string][parameter(mandatory=$false)]$AccountId,
        [string][parameter(mandatory=$false)]$BucketName, 
        [string][parameter(mandatory=$false)]$Month, 
        [string][parameter(mandatory=$false)]$Year
    )


    #If no BucketName was specified, assume it is the same as the account alias
    If([System.String]::IsNullOrEmpty($BucketName)) {$BucketName = Get-IAMAccountAlias}

    #If no AccountId was specified, use the account of the current user
    If([System.String]::IsNullOrEmpty($AccountId)) {$AccountID = (Get-IAMUser).ARN.Replace('arn:aws:iam::','').Substring(0,12)}
    
    #If no month and year were specified, use last month
    If([System.String]::IsNullOrEmpty($Month)) {$Month = If((Get-Date).Month -eq 1){12}Else{(Get-Date).Month - 1}}
    If([System.String]::IsNullOrEmpty($Year)) {$Year = If($Month -eq 12){(Get-Date).Year - 1}Else{(Get-Date).Year}} 
    $Month = "{0:D2}" -f [int]$Month #Pad single digit with 0 

    #Get the latest report
    $Key = "$AccountId-aws-cost-allocation-$Year-$Month.csv" 
    $FileName = "$env:TEMP\$AccountId-aws-cost-allocation-$Year-$Month.csv" 

    #Download the report from S3
    If(Test-Path $FileName) {Remove-Item $FileName}
    $Null = Read-S3Object -BucketName $BucketName -Key $Key -File $FileName 

    #Strip off the first line of the file.  It is a warning: "Don't see your tags in the report? New tags are excluded by default - go to https://portal.aws.amazon.com/gp/aws/developer/account?action=cost-allocation-report to update your cost allocation keys."
    $Temp = Get-Content -Path $FileName | Select -Skip 1 
    $Temp | Set-Content -Path $FileName

    Import-Csv $FileName 
}

$Report = Get-CostAllocationReport

Write-Host "Statement total =" ($Report | Where-Object {$_.RecordType -eq 'StatementTotal'}).TotalCost
$PayerLineItems = $Report | Where-Object {$_.RecordType -eq 'PayerLineItem'} 

$Summary = $PayerLineItems | Group-Object -Property 'user:Project' | % { 
    New-Object psobject -Property @{ 
        Project = $_.Name.Trim(); 
        TotalCost = ($_.Group | Measure-Object TotalCost -Sum).Sum;
    }
} 

$AllocatedCost = ($PayerLineItems |  Where-Object {$_.'user:Project' -ne ''} | Measure-Object TotalCost -Sum).Sum
$UnallocatedCost = ($PayerLineItems |  Where-Object {$_.'user:Project' -eq ''} | Measure-Object TotalCost -Sum).Sum

$ProjectAllocation = $PayerLineItems | Where-Object {$_.'user:Project' -ne ''} | Group-Object -Property 'user:Project' | % { 
    $DirectCost = ($_.Group | Measure-Object TotalCost -Sum).Sum
    $AllocationRate = $DirectCost / $AllocatedCost;
    $IndirectCost = $UnallocatedCost * $AllocationRate;

    New-Object psobject -Property @{ 
        Project = $_.Name.Trim(); 
        DirectCost = $DirectCost;
        Overhead = $IndirectCost;
        TotalCost = $DirectCost + $IndirectCost;
    }
} 

$ProjectAllocation | Format-Table Project, DirectCost, Overhead, TotalCost -AutoSize

When you run this script it should output the statement total and a table showing the costs allocated to each project, similar to the following.

Statement total = 2317.048424386140

Project   DirectCost          Overhead        TotalCost
-------   ----------          --------        ---------
ERP       468.191124  25.2253642231116 493.416488223112
CRM      1181.834959  63.6753149817962  1245.5102739818
DotCom    149.640772    8.062397561231 157.703169561231
ITOM      398.925069  21.4934236199978 420.418492619998

That's it for this post.  In the next post we use the hourly report to populate a warehouse in SQL Server.

Wednesday, July 16, 2014

Bulk Importing EC2 Instances

I have been testing a preview of a new PowerShell command, Import-EC2Instance, that will be added to the AWS PowerShell API next week.  The new command allows you to import a VM from VMware or Hyper-V.  I covered this in my book, but at the time the functionality was not available in PowerShell and I had to use the Java API.

While the new command will upload and convert your VM, you can also do the upload and conversion independently.  This left me wondering if I could use the AWS Import/Export Service to ship an external drive full of VMDK files and skip the upload process.  After some testing, it turns out you can.  Depending on the number of VMs you plan to migrate and the speed of your internet connection, this may be a great alternative.

Let me clarify that I am speaking of two similarly named services here.  EC2 Import is used to convert a VMDK (or VHD) into an EC2 instance.  AWS Import/Export allows you to ship large amounts of data using removable media.

Normally, the EC2 Import process works like this.  First, the PowerShell module breaks up the VMDK into 10MB chunks and uploads them to an S3 bucket.  Next, it generates a manifest file that describes how to put the pieces back together, and uploads that to S3.  Then, it calls the ec2-import-instance REST API, passing a reference to the manifest.  Finally, the import service uses the manifest to reassemble the VMDK file and convert it into an EC2 instance.

The large file is broken into chunks to make the upload easier and to allow it to recover from a connection error (retrying a part rather than the entire file).  With the AWS Import/Export Service there is no need to break up the file.  Note that S3 supports objects up to 5TB and EC2 volumes can only be 1TB.  So there is no reason not to upload the VMDK as a single file.

So, all we need to do is create the manifest file and call the EC2 Import API, passing a reference to the manifest file.  If you have ever looked at one of these manifest files, they can look really daunting.  But, with only a single part, it's actually really simple.  Note that all of the URLs are pre-signed so that the import service can access your VMDK file without your granting it IAM permissions.

$Bucket = 'MyBucket'
$VolumeKey = 'WIN2012/Volume.vmdk'
$VolumeSize = 20

#We need to know how big the file is to create the Manifest
$FileSize = (Get-S3Object -BucketName $Bucket -Key $VolumeKey).Size
$ByteRangeEnd = $FileSize - 1 #Byte Range is zero based

#Let's create a few pre-signed URLs for the import engine
$ManifestKey = "$VolumeKey.Manifest.xml"
$SelfDestructURL = Get-S3PreSignedURL -BucketName $Bucket -Key $ManifestKey -Expires (Get-Date).AddDays(7) -Protocol HTTPS -Verb GET
$HeadURL = Get-S3PreSignedURL -BucketName $Bucket -Key $VolumeKey -Expires (Get-Date).AddDays(7) -Protocol HTTPS -Verb HEAD
$GetURL = Get-S3PreSignedURL -BucketName $Bucket -Key $VolumeKey -Expires (Get-Date).AddDays(7) -Protocol HTTPS -Verb GET
$DeleteURL = Get-S3PreSignedURL -BucketName $Bucket -Key $VolumeKey -Expires (Get-Date).AddDays(7) -Protocol HTTPS -Verb DELETE

#The URLs need to be HTML encoded to write into the XML document
Add-Type -AssemblyName System.Web
$SelfDestructURL = [System.Web.HttpUtility]::HtmlEncode($SelfDestructURL)
$HeadURL = [System.Web.HttpUtility]::HtmlEncode($HeadURL)
$GetURL = [System.Web.HttpUtility]::HtmlEncode($GetURL)
$DeleteURL = [System.Web.HttpUtility]::HtmlEncode($DeleteURL)

#Create a Manifest file with one enormous part
$Manifest = @"
<manifest>
    <version>2010-11-15</version>
    <file-format>VMDK</file-format>
    <importer>
        <name>ec2-upload-disk-image</name>
        <version>1.0.0</version>
        <release>2010-11-15</release>
    </importer>
    <self-destruct-url>$SelfDestructURL</self-destruct-url>
    <import>
        <size>$FileSize</size>
        <volume-size>$VolumeSize</volume-size>
        <parts count="1">
            <part index="0">
                <byte-range end="$ByteRangeEnd" start="0"/>
                <key>$VolumeKey</key>
                <head-url>$HeadURL</head-url>
                <get-url>$GetURL</get-url>
                <delete-url>$DeleteURL</delete-url>
            </part>
        </parts>
    </import>
</manifest>
"@

#Upload the manifest file to S3
Write-S3Object -BucketName $Bucket -Content $Manifest -Key $ManifestKey

#Kick off an import task using the new manifest file
$Task = Import-EC2Instance -BucketName $Bucket -ManifestFileKey $ManifestKey -InstanceType m1.large -Architecture x86_64 -Platform Windows
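
Import-EC2Instance returns the conversion task, so you can check on the import without going to the console.  A sketch, assuming the returned object exposes a ConversionTaskId and a State:

#Poll the conversion task until the import service finishes with it
Do {
    Start-Sleep -Seconds 60
    $Status = (Get-EC2ConversionTask -ConversionTaskId $Task.ConversionTaskId).State
    Write-Host "Conversion task state: $Status"
} While ($Status -eq 'active')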

Obviously there is room for improvement here.  You could import directly to a VPC, support Linux instances, or use the Import-EC2Volume command to import additional (non-boot) volumes.  Hopefully this is a good starting point.

Note that the prerequisites for EC2 Import still apply.  For example, you must convert the VMDK files to an OVF before shipping.

Friday, June 20, 2014

Writing to the EC2 Console

I have been building a bunch of Windows AMIs for EC2 recently.  If the instance fails to build it can be a real bear to diagnose issues.  You don't have access to the console to watch what's happening.  It would be great if I could log to the EC2 Console (also called the System Log on the web site) so I knew what was happening.  So I hacked the EC2Config Service to see how it was writing to the console.

The EC2 Console, it turns out, is listening to serial port COM1.  So if you want to write a message to the log, all you have to do is write to COM1.  Of course the EC2 Config Service already has COM1 open, so we have to close it first.  Here is a quick sample.

Stop-Service Ec2Config
$Port = New-Object System.IO.Ports.SerialPort
$Port.PortName = "COM1"
$Port.BaudRate = 0x1c200
$Port.Parity = [System.IO.Ports.Parity]::None
$Port.DataBits = 8
$Port.StopBits = [System.IO.Ports.StopBits]::One
$Port.Open()
$Port.WriteLine("This was written directly to the serial port");
$Port.Close()

You can also use a helper class that ships with the EC2 Config Service called ConsoleLibrary.  This implementation is thread-safe, adds the date and time, and takes care of all the serial port configuration details.  Of course you still need to stop the EC2 Config Service before running this code.

#Stop the EC2 Config Service
Stop-Service Ec2Config
#Ensure the log file exists or you will get an error
New-Item -ItemType File -Force -Path 'C:\Windows\system32\WindowsPowerShell\v1.0\Logs\Ec2ConfigLog.txt'
#Load the Ec2Config Library
[System.Reflection.Assembly]::LoadFrom("C:\Program Files\Amazon\Ec2ConfigService\Ec2ConfigLibrary.dll")
#Write to the Console
[ConsoleLibrary.ConsoleLibrary]::Instance().WriteToConsole("This was written using the Ec2ConfigLibrary", $true)

As you can see below, my messages appear mixed in with the standard console messages, but note that the console is only updated during boot.  If you write to the log after boot, the messages will not appear until the next reboot.

2014/07/19 18:18:50Z: AMI Origin Version: 2014.07.10
2014/07/19 18:18:50Z: AMI Origin Name: Windows_Server-2012-R2_RTM-English-64Bit-Base
2014/07/19 18:18:50Z: OS: Microsoft Windows NT 6.2.9200.0
2014/07/19 18:18:50Z: Language: en-US
2014/07/19 18:18:50Z: EC2 Agent: Ec2Config service v2.2.5.0
2014/07/19 18:18:50Z: Driver: AWS PV Network Device v7.2.0.0
2014/07/19 18:18:50Z: Driver: AWS PV Storage Host Adapter v7.2.0.0
2014/07/19 18:18:51Z: Message: Waiting for meta-data accessibility...
2014/07/19 18:18:51Z: Message: Meta-data is now available.
2014/07/19 18:18:51Z: AMI-ID: ami-9ade1df2
2014/07/19 18:18:51Z: Instance-ID: i-05132d2e
2014/07/19 18:18:51Z: Ec2SetPassword: Disabled
2014/07/19 18:18:51Z: RDPCERTIFICATE-SUBJECTNAME: WIN-JCK73T6NRFU
2014/07/19 18:18:51Z: RDPCERTIFICATE-THUMBPRINT: 68D80A1B0567E60D0BD2C6A8068D7E1D55ED3DDC
2014/07/19 18:18:53Z: Message: Windows is Ready to use
This was written directly to the serial port
2014/07/19 19:30:54Z: This was written using the Ec2ConfigLibrary
2014/07/19 19:32:57Z: AMI Origin Version: 2014.07.10
2014/07/19 19:32:57Z: AMI Origin Name: Windows_Server-2012-R2_RTM-English-64Bit-Base
2014/07/19 19:32:57Z: OS: Microsoft Windows NT 6.2.9200.0
2014/07/19 19:32:57Z: Language: en-US
2014/07/19 19:32:57Z: EC2 Agent: Ec2Config service v2.2.5.0
2014/07/19 19:32:58Z: Driver: AWS PV Network Device v7.2.0.0
2014/07/19 19:32:58Z: Driver: AWS PV Storage Host Adapter v7.2.0.0
2014/07/19 19:32:58Z: Message: Waiting for meta-data accessibility...
2014/07/19 19:32:59Z: Message: Meta-data is now available.
2014/07/19 19:32:59Z: AMI-ID: ami-9ade1df2
2014/07/19 19:32:59Z: Instance-ID: i-05132d2e
2014/07/19 19:32:59Z: Ec2SetPassword: Disabled
2014/07/19 19:32:59Z: RDPCERTIFICATE-SUBJECTNAME: WIN-JCK73T6NRFU
2014/07/19 19:32:59Z: RDPCERTIFICATE-THUMBPRINT: 68D80A1B0567E60D0BD2C6A8068D7E1D55ED3DDC
2014/07/19 19:33:00Z: Message: Windows is Ready to use
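
If you want to read the console back without using the web site, there is a cmdlet for that too.  The output comes back base64 encoded, so you have to decode it.  For example, from a machine with the AWS tools and credentials configured:

#Fetch and decode the EC2 console output for the instance above
$Output = (Get-EC2ConsoleOutput -InstanceId 'i-05132d2e').Output
[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($Output))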


Friday, May 30, 2014

Setting the Hostname in a SysPreped AMI

When you create a Windows AMI (Amazon Machine Image) it is configured to generate a random server name.  Often this name does not meet your needs.  Maybe your company has a specific naming convention (e.g. US-NYC-1234) or you just want to use a descriptive name (e.g. WEB01).  Whatever the reason, let's look at how to set the name when you launch the machine.

In this post we will use PowerShell to read the name from a tag on the instance.  When done, you set the hostname in the launch wizard by simply filling in the Name tag.  See the image below.  Our script will read this tag and rename the server when it boots for the first time.



It is important to automate the name change.  As your cloud adoption matures, you quickly realize that you cannot have an admin log in and rename the server when it's launched.  First, it takes too long.  Second, you want servers to launch automatically, for example, in response to an auto-scaling event.

So how can you set the name?  You will find a ComputerName element in the SysPrep2008.xml file that ships with the EC2 Config Service (or in the unattend.xml file if you're not using the EC2 Config Service).  The computer name is in the specialize section.  In the snippet below, you can see the default value of "*".  The star means that Windows should generate a random name.

<component language="neutral" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" versionScope="nonSxS" publicKeyToken="31bf3856ad364e35" processorArchitecture="amd64" name="Microsoft-Windows-Shell-Setup">
    <ComputerName>*</ComputerName>
    <CopyProfile>true</CopyProfile>
    <RegisteredOrganization>Amazon</RegisteredOrganization>
    <TimeZone>Eastern Standard Time</TimeZone>
</component>

If you want to change the name you can simply hard-code whatever you want here.  Of course, if you hard-code it before you run SysPrep, every machine you create from the AMI will have the same name.  That's not what we want.  So the trick is to set the name when the machine first boots, before specialize runs.

Let's quickly review how SysPrep works.  When you run SysPrep, it wipes any identifying information from the machine (e.g. name, SIDs, etc.).  This is known as the generalize phase.  After the generalize phase you shut down the machine and take the image.

When a SysPreped image first boots, it runs Windows setup (WinDeploy.exe).  This is known as the specialize phase.  If you have ever bought a new home computer, you have experienced the setup wizard that allows you to configure your time zone, etc.  In the cloud you cannot answer questions, so you have to supply an unattend.xml file with the answers to all the questions.

We need to inject our script into the specialize phase before setup runs.  Our script will get the machine name from the EC2 API and modify the unattend.xml file.  Here is a sample script to do just that.  The script has three parts.

  • The first part uses the meta-data service to discover the identity of the instance and the region the machine is running in.
  • The second part of the script uses the EC2 API to get the Name tag for the instance.  Note that I have not included any credentials.  I assume that the instance is in a role that allows access to the Get-EC2Tag API call.
  • The third part of the script modifies the unattended.xml file.  This is the same file shown earlier.  The script simply finds the ComputerName node and replaces the * with the correct name.


Write-Host "Discovering instance identity from meta-data web service"
$InstanceId = (Invoke-RestMethod 'http://169.254.169.254/latest/meta-data/instance-id').ToString()
$AvailabilityZone = (Invoke-RestMethod 'http://169.254.169.254/latest/meta-data/placement/availability-zone').ToString()
$Region = $AvailabilityZone.Substring(0,$AvailabilityZone.Length-1)

Write-Host "Getting Tags for the instance"
$Tags = Get-EC2Tag -Filters @{Name='resource-id';Value=$InstanceId} -Region $Region
$InstanceName = ($Tags | Where-Object {$_.Key -eq 'Name'}).Value
$InstanceOwner = ($Tags | Where-Object {$_.Key -eq 'Owner'}).Value
Write-Host "`tFound Instance Name: $InstanceName"
Write-Host "`tFound Instance Owner: $InstanceOwner"

If($InstanceName -ne $null) {
    Write-Host "Setting the machine name to $InstanceName"
    $AnswerFilePath = "C:\Windows\Panther\unattend.xml"
    $AnswerFile = [xml](Get-Content -Path $AnswerFilePath)
    $ns = New-Object System.Xml.XmlNamespaceManager($AnswerFile.NameTable)
    $ns.AddNamespace("ns", $AnswerFile.DocumentElement.NamespaceURI)
    $ComputerName = $AnswerFile.SelectSingleNode('/ns:unattend/ns:settings[@pass="specialize"]/ns:component[@name="Microsoft-Windows-Shell-Setup"]/ns:ComputerName', $ns)
    $ComputerName.InnerText = $InstanceName
    $AnswerFile.Save($AnswerFilePath)
}

So how do we get this script to run before setup? That's the tricky part.  Let's dig a bit deeper.  I said earlier that when a SysPreped image first boots it will run WinDeploy.exe.  To be more specific, it will run whatever it finds in the HKLM:\System\Setup registry key.  SysPrep will put c:\Windows\System32\oobe\windeploy.exe in the registry key before shutdown.

So we need to change that registry key after SysPrep runs, but before the system shuts down.  To do that we need to pass the /quit flag rather than /shutdown.  I'm writing about AWS, so I assume you are calling SysPrep from the EC2Config service.  If you are, you need to edit the Switches element of the BundleConfig.xml file in the EC2Config folder.  The Switches element is about midway down the file.  See the example below.  Just remove /shutdown and replace it with /quit.

<Sysprep version="6.0">
      <PreSysprepRunCmd>C:\Program Files\Amazon\Ec2ConfigService\Scripts\BeforeSysprep.cmd</PreSysprepRunCmd>
      <ExePath>C:\windows\system32\sysprep\sysprep.exe</ExePath>
      <AnswerFilePath>sysprep2008.xml</AnswerFilePath>
      <Switches>/oobe /shutdown /generalize</Switches>
</Sysprep>

Alright, we are almost there.  Now you can run SysPrep and it will give you a chance to make changes before shutting down.  You want to update the HKLM:\System\Setup registry key to run the script we created above instead of WinDeploy.exe.  Don't forget to add a line to call WinDeploy.exe at the end of the script.
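
I have not shown the registry change itself because your setup may differ, but the idea looks roughly like the sketch below.  It assumes the boot command lives in the CmdLine value under HKLM:\System\Setup (an assumption on my part) and that the rename script is saved to C:\Scripts\SetHostName.ps1 (a hypothetical path).

#Point setup at our rename script (CmdLine value name is an assumption; verify on your image)
Set-ItemProperty -Path 'HKLM:\System\Setup' -Name 'CmdLine' -Value 'powershell.exe -ExecutionPolicy Bypass -File C:\Scripts\SetHostName.ps1'
#The rename script itself should end by handing control back to setup:
#& 'C:\Windows\System32\oobe\windeploy.exe'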

With all that done (it's not as bad as it sounds) you can shut down and take an image.  It will take a few tries to get all this working correctly.  I recommend that you log the output of the script using Start-Transcript.  If the server fails to boot you can attach the volume to another instance and read the log.

Friday, March 7, 2014

LI.Net User Group

I'll be presenting on AWS tonight at the LI.Net User Group.  The presentation is available here.

Monday, February 10, 2014

NYC PowerShell User Group

I'll be presenting on AWS tonight at the NYC PowerShell User Group.  The presentation is available here.

Saturday, January 25, 2014

Decoding Your AWS Bill (Part 1)

As you begin to adopt AWS you will likely be asked to report on both usage and cost. One way to do this is using the Monthly Billing report. In this post I will show you how to download your bill and analyze it using PowerShell.
AWS offers a feature called Programmatic Billing Access. When programmatic billing access is enabled, AWS periodically saves a copy of your bill to an S3 bucket. To enable programmatic billing access click here. Be sure to enable the Monthly Report.
Once programmatic billing access is enabled you can download your bill using PowerShell. The function below will download the monthly report and load it into memory.

 
Function Get-MonthlyReport {
    Param(
        [string][parameter(mandatory=$false)]$AccountId,
        [string][parameter(mandatory=$false)]$BucketName,
        [string][parameter(mandatory=$false)]$Month,
        [string][parameter(mandatory=$false)]$Year
    )
 
    If([System.String]::IsNullOrEmpty($BucketName)){
        #If no BucketName was specified, assume it is the same as the account alias
        $BucketName = Get-IAMAccountAlias
    }
 
    If([System.String]::IsNullOrEmpty($AccountId)){
        #If no AccountId was specified, use the account of the current user
        $AccountID = (Get-IAMUser).ARN.Replace('arn:aws:iam::','').Substring(0,12)
    }
   
    #If no month and year were specified, use last month
    If([System.String]::IsNullOrEmpty($Month)) {$Month = If((Get-Date).Month -eq 1){12}Else{(Get-Date).Month - 1}}
    If([System.String]::IsNullOrEmpty($Year)) {$Year = If($Month -eq 12){(Get-Date).Year - 1}Else{(Get-Date).Year}}
    $Month = "{0:D2}" -f [int]$Month #Pad single digit with 0
 
    #Download the report from S3 and save to the temp directory
    $Key = "$AccountId-aws-billing-csv-$Year-$Month.csv"
    $FileName = "$env:TEMP\$AccountId-aws-billing-csv-$Year-$Month.csv"
    If(Test-Path $FileName) {Remove-Item $FileName}
    $Null = Read-S3Object -BucketName $BucketName -Key $Key -File $FileName
 
    #Import the file from the temp directory
    Import-Csv $FileName
}
The monthly report is a CSV file with all of the line items in the bill you receive each month. In addition to the line items, the bill includes a few total lines. If you have consolidated billing enabled, there is an invoice total for each account and a statement total that includes the overall total. To get the total of your bill, you simply find the StatementTotal line. For example:

$Report = Get-MonthlyReport
$Report | Where-Object {$_.RecordType -eq 'StatementTotal'}
Alternatively you could sum up the PayerLineItems using Measure-Object.
($Report | Where-Object {$_.RecordType -eq 'PayerLineItem'} | Measure-Object TotalCost -Sum ).Sum
You can also find specific line items. For example, the following script will find the total number of on-demand instance hours.
($Report | Where-Object {$_.UsageType -like 'BoxUsage*'} | Measure-Object UsageQuantity -Sum ).Sum
And this line will find the total cost of the on-demand instances.
($Report | Where-Object {$_.UsageType -like 'BoxUsage*'} | Measure-Object TotalCost -Sum ).Sum
These will find the usage and cost of EBS storage.
($Report | Where-Object {$_.UsageType -like 'EBS:VolumeUsage*'} | Measure-Object UsageQuantity -Sum ).Sum
($Report | Where-Object {$_.UsageType -like 'EBS:VolumeUsage*'} | Measure-Object TotalCost -Sum ).Sum
These will find the usage and cost of S3.
($Report | Where-Object {$_.UsageType -like 'TimedStorage*'} | Measure-Object UsageQuantity -Sum ).Sum
($Report | Where-Object {$_.UsageType -like 'TimedStorage*'} | Measure-Object TotalCost -Sum ).Sum
And this one will show you snapshots.
($Report | Where-Object {$_.UsageType -like 'EBS:SnapshotUsage*'} | Measure-Object UsageQuantity -Sum ).Sum
($Report | Where-Object {$_.UsageType -like 'EBS:SnapshotUsage*'} | Measure-Object TotalCost -Sum ).Sum
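If you want a quick breakdown by service, you can also group the line items. This is just a sketch, assuming your bill includes a ProductName column (mine does).
$Report | Where-Object {$_.RecordType -eq 'PayerLineItem'} |
    Group-Object ProductName | % {
        New-Object psobject -Property @{
            Product   = $_.Name;
            TotalCost = ($_.Group | Measure-Object TotalCost -Sum).Sum;
        }
    } | Sort-Object TotalCost -Descending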
As you can see there is a lot of interesting information in your bill that you can use to report on both usage and costs. In the next post I will use the cost allocation report to calculate chargebacks.



Monday, January 20, 2014

Fun with AWS CloudTrail and SQS

CloudTrail is a new service that logs all AWS API calls to an S3 bucket. While the obvious use case is creating an audit trail for security compliance, there are many other purposes. For example, we might use the CloudTrail logs to keep a Change Management Database (CMDB) up to date by looking for all API calls that create, modify or delete an instance. In this exercise I’ll use CloudTrail, Simple Storage Service (S3), Simple Notification Service (SNS), Simple Queue Service (SQS) and PowerShell to parse CloudTrail logs looking for new events.
The picture below describes the solution. CloudTrail periodically writes log files to an S3 bucket (1). When each file is written, CloudTrail also sends out an SNS notification (2). SQS is subscribed to the notification (3) and will hold it until we get around to processing it. When the PowerShell script runs, it polls the queue (4) for new CloudTrail notifications. If there are new notifications, the script downloads the log file (5) and processes it. If the script finds interesting events in the log file, it writes them to another queue (6). Now, other applications (like our CMDB) can subscribe to just the events they need and do not have to bother processing the log files.

Let’s start by configuring CloudTrail. I just created a new S3 bucket and enabled SNS notifications, creating a new topic named “CloudTrail”.

Now let’s create a new queue called “CloudTrail”. I just left the default values. This queue will hold notifications that a new CloudTrail log file has been written. You should also create queues for each of the events you care about. I created a queue for instances (to update the CMDB) and one for users (to notify the security team of new users).

Next, we need to subscribe our “CloudTrail” SQS queue to the “CloudTrail” SNS topic. Right click on the CloudTrail queue and choose “Subscribe Queue to SNS Topic.” Then choose the “CloudTrail” topic from the dropdown and click Subscribe.
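
If you prefer to script these steps, the AWS Tools for PowerShell can create the queue and handle the subscription as well. A sketch, using the same placeholder account number as the rest of this post:
#Create the queue and subscribe it to the CloudTrail SNS topic
$QueueUrl = New-SQSQueue -QueueName 'CloudTrail'
Connect-SQSQueue -QueueUrl $QueueUrl -TopicArn 'arn:aws:sns:us-east-1:999999999999:CloudTrail'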

The messages in the queue will look like the example below. The CloudTrail message (yellow) is wrapped in an SNS notification (green), which in turn is wrapped in an SQS message (blue). Our script will need to unwrap this structure to get to the CloudTrail message.

Let’s begin our PowerShell script by defining the queues. First we need the URL of our CloudTrail queue.
$CloudTrailQueue = 'https://sqs.us-east-1.amazonaws.com/999999999999/CloudTrail'
In addition, we need a list of CloudTrail log events we are interested in, along with which Queue to write them to. I used a hash table for this.
$InterestingEvents = @{
    'RunInstances'            = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'ModifyInstanceAttribute' = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'TerminateInstances'      = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'CreateUser'              = 'https://sqs.us-east-1.amazonaws.com/999999999999/Users';
    'DeleteUser'              = 'https://sqs.us-east-1.amazonaws.com/999999999999/Users';
}
Now we can get a batch of messages from the queue and use a loop to process them one by one.
$SQSMessages = Receive-SQSMessage $CloudTrailQueue 
$SQSMessages | % { $SQSMessage = $_  …
Remember that the message we are interested in is wrapped in both an SNS and SQS message. Therefore, we have to unpack the message which is stored as JSON.
$SNSMessage = $SQSMessage.Body | ConvertFrom-Json
$CloudTrailMessage = $SNSMessage.Message | ConvertFrom-Json
Also remember that the CloudTrail message does not contain the log file. Rather, the log file is stored in S3, and the message contains the name of the bucket and the path to the log file. We next have to download the CloudTrail log file from S3 and save it to the temp folder.
Read-S3Object -BucketName $CloudTrailMessage.s3Bucket -Key $CloudTrailMessage.s3ObjectKey[0] -File "$env:TEMP\CloudTrail.json.gz"
The log file is in JSON format, but compressed using gzip. Therefore, I am using WinZip to uncompress the JSON file. If you don’t have WinZip, you can replace this line with your favorite tool.
Start-Process -Wait -FilePath 'C:\Program Files\WinZip\winzip32.exe' '-min -e -o CloudTrail.json.gz' -WorkingDirectory $env:TEMP
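For example, here is a rough sketch that uses the .NET GZipStream class instead of WinZip.
$InStream  = [System.IO.File]::OpenRead("$env:TEMP\CloudTrail.json.gz")
$OutStream = [System.IO.File]::Create("$env:TEMP\CloudTrail.json")
$GzipStream = New-Object System.IO.Compression.GZipStream($InStream, [System.IO.Compression.CompressionMode]::Decompress)
$GzipStream.CopyTo($OutStream)
$GzipStream.Dispose(); $OutStream.Dispose(); $InStream.Dispose()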
Now we finally have the detailed log file. Load it and loop over the records.
$CloudTrailFile = Get-Content "$env:TEMP \CloudTrail.json" -Raw |  ConvertFrom-Json
$CloudTrailFile.Records | % { $CloudTrailRecord = $_ …
I check the event type of each record against the hash table of events we are interested in.
$QueueUrl = $InterestingEvents[$CloudTrailRecord.eventName]
If($QueueUrl -ne $null){
                $Response = Send-SQSMessage -QueueUrl $QueueUrl -MessageBody ($CloudTrailRecord | ConvertTo-Json)
}
Finally, we remove the message from the queue so we don't process it again.
Remove-SQSMessage -QueueUrl $CloudTrailQueue -ReceiptHandle $SQSMessage.ReceiptHandle -Force
Here is the full script.
Set-AWSCredentials LAB

$CloudTrailQueue = 'https://sqs.us-east-1.amazonaws.com/999999999999/CloudTrail'

$InterestingEvents = @{
    'RunInstances'            = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'ModifyInstanceAttribute' = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'TerminateInstances'      = 'https://sqs.us-east-1.amazonaws.com/999999999999/Instances';
    'CreateUser'              = 'https://sqs.us-east-1.amazonaws.com/999999999999/Users';
    'DeleteUser'              = 'https://sqs.us-east-1.amazonaws.com/999999999999/Users';
}

#First, let's get a batch of up to 10 messages from the queue
$SQSMessages = Receive-SQSMessage $CloudTrailQueue -VisibilityTimeout 60 -MaxNumberOfMessages 10
Write-Host "Found" $SQSMessages.Count "messages in the queue."


$SQSMessages | % {
    Try {

        $SQSMessage = $_

        #Second, let's unpack the SQS message to get the SNS message
        $SNSMessage = $SQSMessage.Body | ConvertFrom-Json

        #Third, we unpack the SNS message to get the original CloudTrail message
        $CloudTrailMessage = $SNSMessage.Message | ConvertFrom-Json

        #Fourth, we download the cloud trail log file from S3 and save it to the temp folder
        $Null = Read-S3Object -BucketName $CloudTrailMessage.s3Bucket -Key $CloudTrailMessage.s3ObjectKey[0] -File "$env:TEMP\CloudTrail.json.gz"

        #Fifth, we uncompress the CloudTrail JSON file.  I'm using winzip here.
        Start-Process -Wait -FilePath 'C:\Program Files\WinZip\winzip32.exe' '-min -e -o CloudTrail.json.gz' -WorkingDirectory $env:TEMP

        #Read the JSON file from disk
        $CloudTrailFile = Get-Content "$env:TEMP\\CloudTrail.json" -Raw |  ConvertFrom-Json

        #Loop over all the records in the log file
        $CloudTrailFile.Records | % {

            $CloudTrailRecord = $_
            
            #Check each event against our hash table of interesting events 
            $QueueUrl = $InterestingEvents[$CloudTrailRecord.eventName]
            If($QueueUrl -ne $null){
                Write-Host "Found event " $CloudTrailRecord.eventName
        
                #If this event is interesting, write to the corresponding queue
                $Response = Send-SQSMessage -QueueUrl $QueueUrl -MessageBody ($CloudTrailRecord | ConvertTo-Json)
            }
        
        }

        #Finally, remove the message from the queue so we don't process it again
        Remove-SQSMessage -QueueUrl $CloudTrailQueue -ReceiptHandle $SQSMessage.ReceiptHandle -Force
     }
     Catch
     {
        #Log errors to the console
        Write-Host "Oh No!" $_
     }
     Finally
     {
        #Clean up the temp folder
        If(Test-Path "$env:TEMP\CloudTrail.json.gz") {Remove-Item "$env:TEMP\CloudTrail.json.gz"}
        If(Test-Path "$env:TEMP\CloudTrail.json") {Remove-Item "$env:TEMP\CloudTrail.json"}
     }
}

Tuesday, January 14, 2014

I've Been Published

My book, "Pro PowerShell for Amazon Web Services," was published today.  It's been a long road, but a great experience.


I took down anything related to AWS a few months ago to avoid a conflict of interest.  Now I can start blogging about AWS again.