AWS/Terraform Workshop #2: EC2 Networking, Autoscaling Groups, CloudWatch

Artem Nosulchik
Published in Universal Language
9 min read · Jan 23, 2017

This post is part of the AWS/Terraform Workshops series that we share with you along with our vision of Service Oriented Architecture (SOA). Check out the introductory workshop and new posts at the Smartling Engineering Blog.

Prerequisites

Preface

AWS EC2 Networking Concepts

Amazon EC2 is hosted in multiple locations worldwide, composed of Regions and Availability Zones (AZs). Each region is a separate geographic area that contains multiple, isolated locations known as Availability Zones. Amazon EC2 gives you the ability to place resources, such as EC2 instances and data, in multiple locations. Resources aren't replicated across regions unless you do so explicitly.

By placing your resources in multiple AZs you increase the probability that failures on Amazon's side won't cause a complete outage of your service, but only a temporary drop in its capacity and, most probably, a slowdown in its performance. Amazon data centers provide high-speed connections between AZs, but keep in mind that inter-AZ communication usually adds about 1 ms of latency compared to communication within the same AZ.

Amazon Virtual Private Cloud (VPC) enables you to launch AWS resources in a virtual network that by default is isolated from other networks. A VPC in AWS can be configured with an IP address range, routing tables, subnets, network gateways and security settings like ACLs. In Smartling's SOA tech stack the VPC comes preconfigured, so developers shouldn't have to worry about networking configuration like the IP addressing scheme, routing etc. At the same time, service owners remain free to make changes to the AWS networking configuration.
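If you did need to define a VPC yourself, a minimal Terraform sketch could look like the following. This is illustrative only — the CIDR range and tag values are made up, not Smartling's actual configuration:

```hcl
# Hypothetical VPC definition; in our stack the VPC comes preconfigured.
resource "aws_vpc" "workshop" {
  cidr_block           = "10.0.0.0/16"  # example address range
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags {
    Name = "workshop-vpc"
  }
}
```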

A subnet is a range of IP addresses in your VPC, and you launch AWS resources into a subnet that you select. Each subnet resides within one AZ and cannot span zones. There are public and private subnets. Public subnets make it possible to reach AWS resources like EC2 instances from the Internet. Private subnets are used for resources which should not be exposed to the Internet. Most of our SOA components reside in private subnets for security reasons.
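As a sketch, a public and a private subnet in Terraform might look like this. The CIDR blocks and the AZ are placeholders, and a `vpc_id` variable is assumed, like the one in the workshop's terraform.tfvars:

```hcl
# Illustrative subnets: one public, one private, each pinned to a single AZ.
resource "aws_subnet" "public" {
  vpc_id            = "${var.vpc_id}"
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  # Instances launched here get public IPs, making them Internet-reachable
  # (an Internet gateway and a route to it are also required, omitted here).
  map_public_ip_on_launch = true

  tags {
    Name = "workshop-public"
  }
}

resource "aws_subnet" "private" {
  vpc_id            = "${var.vpc_id}"
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1a"

  tags {
    Name = "workshop-private"
  }
}
```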

An EC2 security group acts as a virtual firewall that controls the traffic for one or more EC2 instances. When you launch an instance, you associate one or more security groups with it. You can add rules to each security group that allow traffic to or from its associated instances. You can modify the rules for a security group at any time; the new rules are automatically applied to all instances that are associated with the security group.
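A minimal security group in Terraform could be sketched as below. The group name is hypothetical, and a `vpc_id` variable is assumed; the rules simply allow inbound SSH and all outbound traffic:

```hcl
# Hypothetical security group: SSH in, anything out.
resource "aws_security_group" "workshop" {
  name   = "workshop-sg"
  vpc_id = "${var.vpc_id}"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"  # all protocols
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```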


AWS CloudWatch

Amazon CloudWatch monitors your AWS resources and the applications you run on AWS. You can use CloudWatch to collect and track metrics, the variables you want to measure for your resources and applications, such as CPU load, network I/O etc. CloudWatch alarms send notifications or automatically make changes to the resources you are monitoring based on rules that you define, for example triggering an Auto Scaling Group to scale.

A metric is the fundamental concept in CloudWatch. It represents a time-ordered set of data points that are published to CloudWatch. These data points can be either your custom metrics or metrics from services in AWS. You can retrieve statistics about those data points as an ordered set of time-series data. Think of a metric as a variable to monitor, and the data points represent the values of that variable over time. For example, the CPU usage of a particular Amazon EC2 instance is one metric, and the latency of an Elastic Load Balancing load balancer is another.

CloudWatch namespaces are containers for metrics. Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics. Example namespaces: AWS/EC2 for EC2 instance metrics and AWS/ELB for Elastic Load Balancer metrics.

A dimension is a name/value pair that helps you to uniquely identify a metric. Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. Dimensions help you design a structure for your statistics plan. Because dimensions are part of the unique identifier for a metric, whenever you add a unique name/value pair to one of your metrics, you are creating a new metric. Examples:

  • AutoScalingGroupName (this dimension filters the data you request for all instances in a specified capacity group e.g. total CPU load within ASG)
  • InstanceId (this dimension filters the data you request for the identified instance only)
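Namespace, metric name and dimensions come together when you define an alarm. As a sketch, a Terraform alarm on average CPU across one ASG might look like this — the alarm name, threshold and the `asg_name` variable are illustrative:

```hcl
# Hypothetical alarm on average CPU utilization across an ASG.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "workshop-cpu-high"
  namespace           = "AWS/EC2"         # metric namespace
  metric_name         = "CPUUtilization"  # the metric itself
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "40"
  period              = "60"              # seconds per evaluation period
  evaluation_periods  = "2"

  # The dimension scopes the metric to all instances in one ASG.
  dimensions {
    AutoScalingGroupName = "${var.asg_name}"
  }
}
```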

A CloudWatch alarm watches a single metric over a time period you specify, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods. CloudWatch alarms do not invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods. After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action associated with the alarm. For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.

An alarm has three possible states:

  • OK: The metric is within the defined threshold
  • ALARM: The metric is outside of the defined threshold
  • INSUFFICIENT_DATA: The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state


AWS Autoscaling Groups

Auto Scaling helps you ensure that you have the correct number of EC2 instances available to handle the load for your application. You can create collections of EC2 instances called Auto Scaling Groups (ASG). For each ASG you specify a minimum number of instances, ensuring the group never goes below this size, and likewise a maximum number it never grows above. If you specify a desired capacity, either when you create the group or at any time thereafter, the ASG ensures that your group has this many instances.
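In Terraform these size bounds map directly to ASG arguments. A sketch, with the bounds used in the hands-on part of this workshop and illustrative names (the launch configuration is assumed to be defined elsewhere in your config):

```hcl
# Sketch of an ASG with its size bounds; names are illustrative.
resource "aws_autoscaling_group" "workshop" {
  name                 = "workshop-asg"
  launch_configuration = "${aws_launch_configuration.workshop.name}"
  vpc_zone_identifier  = ["${var.subnet_id}"]

  min_size         = 1  # never fewer than one instance
  max_size         = 3  # never more than three
  desired_capacity = 1  # the ASG converges on this count

  tag {
    key                 = "Name"
    value               = "workshop-instance"
    propagate_at_launch = true
  }
}
```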

An ASG launch configuration is a template that an ASG uses to launch EC2 instances. When you create a launch configuration, you specify information for the instances such as the ID of the Amazon Machine Image (AMI), the instance type, security groups etc. Changes to a launch configuration don't trigger the ASG to recreate existing instances with the new template, so you have to rotate instances on your own.
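A launch configuration sketch in Terraform might look as follows. The AMI ID is a placeholder, and the `security_group_id` variable and user-data file name are assumptions for illustration:

```hcl
# Illustrative launch configuration; the AMI ID is a placeholder.
resource "aws_launch_configuration" "workshop" {
  name_prefix     = "workshop-"
  image_id        = "ami-0123456"  # placeholder, use a real AMI ID
  instance_type   = "t2.micro"
  security_groups = ["${var.security_group_id}"]
  user_data       = "${file("user-data.sh")}"

  # Launch configurations can't be modified in place; name_prefix plus
  # create_before_destroy lets Terraform create a replacement first.
  lifecycle {
    create_before_destroy = true
  }
}
```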

Auto Scaling provides several ways for you to scale your ASG:

  • Maintain current instance levels at all times (to maintain the current instance levels, Auto Scaling performs a periodic health check on running instances within ASG).
  • Manual scaling. You only need to specify the change in the maximum, minimum, or desired capacity of your Auto Scaling group.
  • Scale based on a schedule. Scaling by schedule means that scaling actions are performed automatically as a function of time and date.
  • Scale based on demand. A more advanced way to scale your resources, scaling by policy, lets you define parameters that control the Auto Scaling process. For example, you can create a policy that calls for enlarging your fleet of EC2 instances whenever the average CPU utilization rate stays above ninety percent for fifteen minutes.

Demand-based scaling works in conjunction with AWS CloudWatch, which collects metrics and triggers the scaling process.

The Auto Scaling cooldown period is a configurable setting for your Auto Scaling group that helps to ensure that Auto Scaling doesn’t launch or terminate additional instances before the previous scaling activity takes effect. After the Auto Scaling group dynamically scales using a simple scaling policy, Auto Scaling waits for the cooldown period to complete before resuming scaling activities.
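A demand-based scale-up policy with its cooldown could be sketched in Terraform like this. The names and the `asg_name` variable are illustrative; a CloudWatch alarm would reference this policy's ARN in its alarm actions to trigger it:

```hcl
# Hypothetical simple scaling policy: add one instance per alarm trigger.
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "workshop-scale-up"
  autoscaling_group_name = "${var.asg_name}"
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1   # +1 instance
  cooldown               = 60  # seconds to wait before the next scaling action
}
```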

Auto Scaling enables you to put an instance that is in the InService state into the Standby state, update or troubleshoot the instance, and then return the instance to service. Instances that are on standby are still part of the Auto Scaling group, but they do not actively handle application traffic.

Auto Scaling enables you to suspend and then resume one or more of the Auto Scaling processes in your Auto Scaling group. This can be very useful when you want to investigate a configuration problem or other issue with your web application and then make changes to your application, without triggering the Auto Scaling process.
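Suspended processes can also be declared in Terraform via the `suspended_processes` argument of the ASG resource. A fragment, assuming the rest of the group's arguments are filled in; the process names are standard Auto Scaling process types:

```hcl
resource "aws_autoscaling_group" "workshop" {
  # ... other ASG arguments (sizes, launch configuration, subnets) ...

  # Keep these Auto Scaling processes paused while you investigate.
  suspended_processes = ["ReplaceUnhealthy", "AZRebalance"]
}
```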


Hands On

1. Login to the AWS console, switch to the proper AWS account (if you are using multiple accounts) and go to the VPC Management Console.
  a. Go to the VPC section, select Your VPCs and write down the VPC ID.
  b. Go to the Subnets section, choose a private subnet and write down its ID as well as the Availability Zone it's bound to.
  Note: At Smartling we mark subnets with AWS tags so that it's easy to identify private and public subnets without digging into the routing tables they are associated with.
3. Specify the collected data in the terraform configuration:
  a. Go to the w2 directory in the cloned Smartling/aws-terraform-workshops git repository.
  b. Edit the file terraform.tfvars: specify the VPC ID, Subnet ID and AZ, for example:

$ cat terraform.tfvars
vpc_id = "vpc-1234567"
subnet_id = "subnet-1234567"
availability_zone_id = "us-east-1c"

4. Follow the terraform documentation for ASG and the comments in the autoscaling.tf file to complete the Auto Scaling Group and Launch Configuration.
  a. Add the missing names for terraform resources.
  b. Configure the launch configuration to create one t2.micro instance in the security group that is created in the ec2.tf file.
  c. Set min_size = 1 and max_size = 3 in the ASG, cooldown 60 seconds.
  d. Make sure user-data for EC2 instances in the ASG contains your public SSH key.
  e. Apply the terraform configuration:

$ terraform plan
$ terraform apply

  Note #1: always run terraform plan before apply and examine what terraform is actually going to change/create/delete in AWS.
  Note #2: you need to configure terraform with your AWS credentials here. There are multiple ways to do it and you can find one in our AWS/Terraform Workshop #1.
  f. Check the newly created ASG in the AWS EC2 Management Console. You should see the Auto Scaling group, Launch Configuration and an EC2 instance created by the ASG.
5. Uncomment the code in the terraform configuration files to create a CloudWatch (CW) alarm that triggers the ASG scale-up policy if total CPU load in the ASG is more than 40%.
  a. Enable EC2 detailed monitoring for EC2 instances in the ASG so that CloudWatch collects metrics every 1 minute (Hint: see the docs for the launch configuration terraform resource).
  b. Configure the CW alarm to add +1 instance if CPU load in the ASG is more than 40%, cooldown = 60.
  c. Use the CW alarm ARN to reference it in the template.
  d. Apply the terraform configuration.
  e. Check the CW alarm in the AWS web console.
6. Enable scale-in protection for an EC2 instance.
  a. Find your autoscaling group in the AWS EC2 Management Console and go to the "Instances" tab.
  b. Select an instance and click "Actions->Instance protection->Set scale in protection".
7. Generate CPU load to trigger the CloudWatch alarm and the ASG scale-up process.
  a. Login to the EC2 instance via SSH and run the following command:

$ dd if=/dev/urandom bs=1M count=200096 | gzip -9 | gzip -9 | gzip -9 >/dev/null

  b. Review the ASG events in the AWS web console and watch +2 instances appear within a couple of minutes. You can see this in the Activity History tab for the ASG.
8. Add a scale-down policy to remove one EC2 instance in case CPU load in the ASG is less than 35%.
  a. Create a new CW alarm that will trigger the scale-down ASG policy.
  b. Apply the terraform configuration.
9. Watch the scaling activity for the Auto Scaling Group.
10. Destroy the AWS resources.
  a. Disable instance protection in the AWS web console.
  b. Run the destroy command:

$ terraform destroy

Note: It will take slightly more time to terminate all resources in AWS than in the previous workshop.
