Auto scaling is simple in concept: when your servers start to become overloaded with traffic, AWS's auto-scaling systems spin up new servers to meet demand. This can help you both cut costs and scale quickly.
Auto Scaling Saves You Money
Auto scaling allows you to scale up to meet traffic needs, but it also fixes an issue with traditional server hosting: you must build your servers around peak load, but those servers may sit mostly idle during non-peak hours. You'll still be paying the server's hourly price, however, even when you aren't using the capacity. This is bad for your wallet, and also bad for AWS, which could be selling that spare capacity to someone else.
Say your application requires 16 vCPUs' worth of power during peak load. You could accomplish this with a c5.4xlarge instance, which costs around $500 per month. You can bring that down to an effective $200 or so per month by buying Reserved Instances upfront on a 3-year contract, but you'll still be paying full price for an instance sized around your peak capacity. And if your needs change within the contract period, you'll be stuck with that instance until the contract is up.
But if your application load changes throughout the day, auto scaling can help optimize costs. You could instead use multiple c5.xlarge instances with 4 vCPUs each, and spin up new ones when you need to meet demand. With EC2 Spot Instances, you can also have your auto-scaling group purchase spare compute capacity at steep discounts.
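To make the potential savings concrete, here's a back-of-the-envelope comparison. Every price and hour count below is an illustrative assumption, not current AWS pricing; check the EC2 pricing page for real numbers.

```shell
# Rough monthly-cost comparison (all figures are assumptions).
on_demand_monthly=500        # c5.4xlarge (16 vCPUs) running 24/7
small_hourly_cents=17        # one c5.xlarge (4 vCPUs), assumed ~$0.17/hr

# Assume peak load (16 vCPUs = 4 small instances) lasts 8 hours a day,
# and a single small instance covers the remaining 16 off-peak hours.
peak_cents=$((4 * small_hourly_cents * 8 * 30))
offpeak_cents=$((1 * small_hourly_cents * 16 * 30))
autoscaled_monthly=$(( (peak_cents + offpeak_cents) / 100 ))

echo "Always-on peak capacity: \$${on_demand_monthly}/month"
echo "Auto-scaled estimate:    \$${autoscaled_monthly}/month"
```

Even before Spot discounts, scaling the fleet with the traffic cuts the bill roughly in half in this scenario.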
Building Your Infrastructure Around Automation
To make auto scaling work, you must automate your server’s whole lifecycle. The process of creating a server, installing all the dependencies your app needs to run, installing your code, running your code at startup—everything must be handled properly for auto scaling to make sense.
There are two easy ways to do this, and both have different use cases.
The first is to create a custom Amazon Machine Image (AMI): a snapshot of a fully configured server that new instances can boot from. This method is very useful if you're simply reaching the limits of a single server and want to scale up, or if you simply want to cut costs by scaling your servers throughout the day. The main issue is that version management is a pain; you'll have to create a new AMI every time you want to make changes, or automate some way of pulling updated code and configuration from a tool like Git.
The second method is to use containers. Containers are a Unix concept that allows applications to be bundled up and run in an isolated, virtualized environment, while still maintaining most of the speed benefits of running on bare metal. You can think of it like having all the stuff your application needs to run on a CD; you could burn multiple copies of that CD and run them on multiple servers.
Every time you need to make an update, you simply update the CD and redistribute the updated version. With the way Docker works, this makes version management quite simple. But moving an existing application to Docker may require more initial setup than you're comfortable doing, as it requires a significant shift in how you develop and operate your systems.
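As a sketch of the CD analogy, a minimal Dockerfile for a hypothetical Node.js app might look like this (the base image, port, and entry-point file are all assumptions, not part of any particular setup):

```dockerfile
# Base image with the runtime preinstalled
FROM node:18-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY package*.json ./
RUN npm install --production

# Copy in the application code itself
COPY . .

EXPOSE 3000
CMD ["node", "server.js"]
```

Running `docker build` on this file burns a new "CD"; pushing the resulting image to a registry is how you redistribute it to every server.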
How to Get Started
You'll need a few things to get started. First is the custom AMI. They're relatively simple to create: from the EC2 Management Console, right-click your current server and select Image > Create Image. This opens a dialog that will take a snapshot of your server and create an AMI from that snapshot; give it a name and description and select "Create Image."
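If you prefer the command line, the same step can be done with the AWS CLI; the instance ID and names here are placeholders:

```sh
# Snapshots the instance and registers an AMI from it.
# Note: by default this reboots the instance; pass --no-reboot to skip that.
aws ec2 create-image \
  --instance-id i-0123456789abcdef0 \
  --name "my-app-ami-v1" \
  --description "App server image for auto scaling"
```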
Once the AMI is created (it may take a few minutes), scroll down to the bottom of the EC2 sidebar and select “Launch Configuration” under the “Auto Scaling” tab. Create a new launch configuration and select your custom AMI as the base.
You should choose the instance type you want to use as your scaling increment. For example, if you'd like to scale up in 2 vCPU increments, choose a 2 vCPU instance type. Scaling events will be more frequent, but your costs may be better optimized.
Next, you'll configure the launch details. You'll want to make sure to request Spot Instances, especially if you're planning on scaling up during the day and scaling down at night. Keep in mind that AWS can reclaim Spot Instances when it needs the capacity back (Spot Instances with a defined duration, or "Spot blocks," run for up to six hours), so they work best for interruption-tolerant workloads. You'll have to specify a max price; if you set this to the hourly cost of the On-Demand version of the instance, it will almost never be interrupted for price reasons.
You can also specify a setup script here, under the advanced settings, to install dependencies and start your app on boot. You can paste the script in as text or upload it as a file.
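The console steps above map onto a single AWS CLI call; every name, ID, and the Spot price below is a placeholder assumption:

```sh
aws autoscaling create-launch-configuration \
  --launch-configuration-name my-app-lc-v1 \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5.xlarge \
  --key-name my-keypair \
  --security-groups sg-0123456789abcdef0 \
  --spot-price "0.17" \
  --user-data file://setup.sh     # the startup script from above
```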
Next, you’ll add storage, select a security group, and select a key pair, as you usually would when creating an EC2 instance (though this is simply a template).
At the end, choose to create an auto-scaling group with the newly created launch configuration. Set a name for the group, initial size, and select your subnet.
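The same group can be created from the CLI, referencing the launch configuration; the group name, sizes, and subnet ID are placeholders:

```sh
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-app-asg \
  --launch-configuration-name my-app-lc-v1 \
  --min-size 1 \
  --max-size 4 \
  --desired-capacity 2 \
  --vpc-zone-identifier subnet-0123456789abcdef0
```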
Next, you’ll configure your scaling policies. You’ll want to choose a range to scale between, and a metric to use to scale the instances, such as average CPU utilization or average network traffic. You can also set up CloudWatch alarms to scale instances based on other metrics.
You'll also need to specify the time in seconds that instances need to warm up. If you're booting from a prebuilt AMI, this time will be much lower than if you install everything with a startup script, but you'll still need to do testing to figure out how long it takes.
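As a sketch, a target-tracking policy on average CPU with an estimated warmup could look like this; the 50% target and 120-second warmup are illustrative values you'd tune from your own testing:

```sh
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-app-asg \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 120 \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 50.0
  }'
```

With target tracking, AWS creates the CloudWatch alarms for you and adds or removes instances to hold the metric near the target.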
Next, you can configure notifications and tags, and review your configuration before launch. Note that creating this auto-scaling group will provision servers for you, so be prepared to pay for them.
From the “Auto Scaling Groups” tab in the EC2 Console, you can view the activity of your group, such as the current running instances or launch failures. Your group should now scale up and down, depending on load. You’ll want to keep a close eye on its behavior for the first few days, to make sure everything is in order.
When you need to update your servers, you'll have to create a new launch configuration that uses a new AMI, then point your auto-scaling group at the new configuration.
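That swap is a one-liner from the CLI (names are placeholders); instances launched after this use the new configuration, while existing ones keep running until replaced:

```sh
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-app-asg \
  --launch-configuration-name my-app-lc-v2
```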