Auto scaling is a feature offered by many cloud providers, including AWS and Google Cloud Platform, that automatically creates and deletes servers in your network, allowing you to scale your application to meet varying loads.
What Is Auto Scaling?
Say you have two servers behind a load balancer, each handling half of your traffic. If you need to handle more demand, you add another server. However, demand is often cyclic, peaking at certain times each day, so adding and removing servers manually would be a pain.
Auto scaling handles it, as the name implies, automatically. You define a prebuilt template that is used to start up a copy of your servers from scratch. Whenever your network reaches a predetermined amount of load, say, 70% CPU usage, auto scaling will fire up a new instance to smooth things out. When the load calms down, it’ll scale down the number of instances.
Of course, setting up this template takes some work, but GCP has tools to make it simpler, such as the ability to use a container as a machine image.
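To make the scaling rule concrete, here is a minimal Python sketch of a proportional scaling decision. It is an illustration under stated assumptions, not GCP's actual algorithm: the function name desired_instances and the idea of spreading total load evenly across instances are ours, and the 70% target is just the example figure used above.

```python
import math


def desired_instances(current_instances: int, avg_cpu: float,
                      target_cpu: float = 0.70) -> int:
    """Estimate how many instances would bring average CPU back to the target.

    Hypothetical helper: assumes load spreads evenly across instances.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if not 0 < target_cpu <= 1:
        raise ValueError("target_cpu must be in the range (0, 1]")
    # Total work is instances * average utilization; divide it so that
    # each instance would sit at the target utilization.
    ratio = round(current_instances * avg_cpu / target_cpu, 9)
    return max(1, math.ceil(ratio))


if __name__ == "__main__":
    print(desired_instances(2, 0.90))  # busy: suggests scaling out
    print(desired_instances(4, 0.30))  # quiet: suggests scaling in
```

With two instances at 90% CPU, this sketch suggests a third instance. With four instances idling at 30%, it suggests dropping to two.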
While Auto Scaling lets you scale up to meet rising demand, up to the maximum you configure, it can also save you money by scaling down when the capacity isn’t needed. With traditional server hosting, you need to plan for peak demand: if your server can’t handle peak traffic, you need a better server. However, this usually wastes money, because during off-hours, when your application isn’t under peak load, you’re paying for more capacity than you need.
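A rough worked example shows where the savings come from. Every number in this Python sketch is hypothetical (the hourly price, the per-instance capacity, and the daily demand curve), and it assumes instances can be added or removed instantly at the top of each hour, which real autoscaling will not do.

```python
import math

# All figures below are made up for illustration, not GCP pricing.
PRICE_CENTS_PER_INSTANCE_HOUR = 10
CAPACITY_PER_INSTANCE = 100  # requests/sec one instance can serve

# Hypothetical demand in requests/sec, one value per hour of the day.
demand = [50] * 8 + [150] * 4 + [300] * 4 + [150] * 4 + [50] * 4


def instances_needed(load: int) -> int:
    """Smallest number of instances that can serve the given load."""
    return max(1, math.ceil(load / CAPACITY_PER_INSTANCE))


# Fixed provisioning: run enough servers for the peak, all day long.
fixed_hours = instances_needed(max(demand)) * len(demand)
# Auto scaling: run only what each hour actually needs.
scaled_hours = sum(instances_needed(load) for load in demand)

fixed_cost_cents = fixed_hours * PRICE_CENTS_PER_INSTANCE_HOUR
scaled_cost_cents = scaled_hours * PRICE_CENTS_PER_INSTANCE_HOUR

if __name__ == "__main__":
    print(f"Fixed:  {fixed_hours} instance-hours, {fixed_cost_cents} cents")
    print(f"Scaled: {scaled_hours} instance-hours, {scaled_cost_cents} cents")
```

In this made-up day, provisioning for peak costs 72 instance-hours, while scaling with demand costs 40. The gap grows with how spiky your traffic is.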
Even if you’re only using one or two servers, setting up Auto Scaling can help your network handle spikes in traffic activity, and is a useful feature for any high availability network.
Setting Up a Managed Instance Group
From the GCP Management Console, select Compute Engine > Instance Groups.
Create a new instance group, and choose “New Managed Instance Group.”
You can set this group to spawn across multiple zones, which is better for high availability. Each instance group will be fixed to one region though, and this setting is permanent. You’ll need to create additional instance groups for every other region you plan on having servers in.
You’ll, of course, need an instance template set up to define what data gets put on your server, and how a new node in the Auto Scaling group gets started up. If you have one already, select it here. If not, you can read our guide on setting them up.
Below that, you’ll find the settings for Auto Scaling. The default mode scales both out and in, but you can disable scaling in so that the group only adds instances. You can also set the metric used for Auto Scaling, which defaults to 60% CPU utilization.
The cool-down period should match how long a new server takes to start up. If your server takes a minute or two to get everything set up, you don’t want GCP looking at its metrics during that time, as it could report unusually high CPU usage and trigger unnecessary scaling.
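The effect of the cool-down period can be sketched as a simple filter: instances younger than the cool-down are left out of the average. This is a simplified Python model, not GCP's implementation, and the Instance class, its fields, and the 60-second value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical record of one server's age and CPU utilization."""
    name: str
    age_seconds: float
    cpu: float  # utilization between 0.0 and 1.0


def average_cpu(instances: List[Instance],
                cool_down_seconds: float = 60.0) -> Optional[float]:
    """Average CPU over instances that have finished their cool-down."""
    ready = [i for i in instances if i.age_seconds >= cool_down_seconds]
    if not ready:
        return None  # nothing trustworthy to measure yet
    return sum(i.cpu for i in ready) / len(ready)


if __name__ == "__main__":
    group = [
        Instance("web-1", age_seconds=600, cpu=0.50),
        Instance("web-2", age_seconds=900, cpu=0.70),
        Instance("web-3", age_seconds=20, cpu=0.99),  # still booting
    ]
    print(average_cpu(group))
```

Here web-3 is still starting up, so its 99% reading is ignored and the group average reflects only the two established servers.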
You can also change the minimum and maximum number of instances, to ensure performance and limit costs, respectively.
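In effect, the minimum and maximum act as a clamp on whatever size the autoscaler would otherwise choose. A minimal Python sketch, with a hypothetical function name and example limits:

```python
def clamp_group_size(desired: int, min_instances: int,
                     max_instances: int) -> int:
    """Keep the requested group size within the configured limits."""
    if min_instances > max_instances:
        raise ValueError("min_instances cannot exceed max_instances")
    # The floor protects performance; the ceiling caps cost.
    return max(min_instances, min(desired, max_instances))


if __name__ == "__main__":
    print(clamp_group_size(0, 1, 10))   # never drops below the minimum
    print(clamp_group_size(25, 1, 10))  # never grows past the maximum
```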
The last feature is Autohealing, which regularly performs health checks on the services running on each instance. If an instance starts failing those checks, it is replaced automatically. A load balancer on its own will route traffic away from an unhealthy instance, but it won’t repair the instance; that requires autohealing. We recommend that you enable this feature.
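The idea behind autohealing can be modeled as counting consecutive failed health checks per instance and flagging the instance for replacement once a threshold is reached. The Python sketch below is illustrative only: the AutohealMonitor class and the threshold of three failures are assumptions, not GCP's implementation.

```python
from collections import defaultdict


class AutohealMonitor:
    """Hypothetical tracker of consecutive health check failures."""

    def __init__(self, unhealthy_threshold: int = 3) -> None:
        self.unhealthy_threshold = unhealthy_threshold
        self._failures = defaultdict(int)

    def record(self, instance: str, healthy: bool) -> bool:
        """Record one check result; return True if the instance should be replaced."""
        if healthy:
            # A passing check resets the failure streak.
            self._failures[instance] = 0
            return False
        self._failures[instance] += 1
        return self._failures[instance] >= self.unhealthy_threshold


if __name__ == "__main__":
    monitor = AutohealMonitor(unhealthy_threshold=3)
    for result in [False, False, True, False, False, False]:
        if monitor.record("web-1", result):
            print("web-1 should be recreated from the instance template")
```

Requiring several failures in a row keeps one slow response from triggering a replacement, at the cost of reacting a little later to a real outage.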
Click “Create,” and the minimum number of instances will be created. You can manage them individually from the Compute Engine console, or manage the instance template to edit the settings for the whole group.