Deploy infrastructure and applications with zero-downtime
Zero-downtime deployment strategies aim to reduce or eliminate downtime when you update your infrastructure or applications. These strategies involve deploying new versions incrementally rather than all at once, so you can detect and resolve issues before they affect every user. Each strategy lets you test the new version in an environment with real user traffic. This helps validate the new release's performance and reliability.
This guide covers best practices for popular zero-downtime deployment methods, such as blue/green, canary, and rolling deployments. It will help you decide which deployment method is best for your organization and provide the resources to implement that method.
Note
Stateful workloads like databases require additional work for blue/green, canary, and rolling deployments. Consult your database’s documentation while considering these zero-downtime strategies.
Deployment methods overview
Blue/green, canary, and rolling deployments all improve application reliability and reduce risk. While they share similar goals, each approach offers unique advantages that make it more suitable for certain types of applications or organizational needs. By choosing the most appropriate deployment method, companies can ensure smoother updates and reduce the likelihood of service disruptions.
Blue/green deployments maintain two identical production environments concurrently. This method allows you to shift traffic from the current version (blue) to the upgraded version (green).
Canary deployments introduce new versions incrementally to a subset of users. This approach lets you test upgrades with limited exposure, and it can complement blue/green or rolling deployments.
Rolling deployments update applications gradually across multiple servers. This technique ensures only a portion of your infrastructure changes at once, reducing the risk of widespread issues.
These strategies differ in how and where you deploy the application: the environment the application runs in, cost considerations, the deployment mechanism, and how you direct traffic.
- Environment setup:
- Blue/Green: Requires two nearly identical environments.
- Canary: Requires two nearly identical environments. Initially uses a small subset of users or servers.
- Rolling: Updates subsets of servers in batches.
- Traffic Switching:
- Blue/Green: Switches all traffic at once.
- Canary: Gradually increases traffic to the new version.
- Rolling: Sequentially updates and transitions traffic.
- Rollback Mechanism:
- Blue/Green: Rollback involves switching traffic back to the blue environment.
- Canary: Rollback involves reducing or stopping the canary deployment.
- Rolling: Rollback involves reverting batches, which can be more complex.
Since all three strategies offer similar benefits, the changes you plan to make are the most important consideration when deciding which method to implement. These changes fall into two categories: infrastructure changes and application changes.
Infrastructure changes involve setting up your environments so they are prepared to host your zero-downtime application. With blue/green deployments, you must have two identical environments. The green environment can range from a full new stack (servers, networking, and databases) to a new cluster for running containers, or a single green VM added to an existing infrastructure stack.
However, it is important to note that running two identical infrastructure environments increases costs. To save money, you can limit blue/green deployments to your production environment. You should also have an infrastructure lifecycle strategy, such as using infrastructure as code to deploy your green environment only when you plan to release a new application version.
Application changes involve deploying and directing traffic to your new application version. You can configure your load balancer or reverse proxies to direct traffic to your green stack and perform canary testing or direct traffic in a controlled manner for rolling deployments.
Service mesh deployments use service splitters to implement zero-downtime deployments. These components, often used in service mesh architectures, allow traffic to route between different versions of an application dynamically.
Infrastructure changes
Properly managing changes to your infrastructure, such as updating network policies or upgrading your Kubernetes cluster, is important to ensure the reliability of your upgraded application and achieve zero-downtime deployments.
Blue/green deployments are good for deploying your application on a new infrastructure. Blue/green deployments require two identical application infrastructure environments, a method for deploying your application to your two environments, and a way to route your traffic between them.
The following diagram shows a basic blue/green deployment. The blue environment is the infrastructure where your current application runs. The green environment is identical except you upgraded it to host the new version of the application. This environment can be a set of servers or a new cluster running a new AMI or container.
Your blue and green environments need to be as similar as possible. Infrastructure as code (IaC) lets you describe your environment as code and consistently deploy identical environments.
IaC makes your operations more cost-effective by allowing you to easily build and remove resources when you do not need them. Using IaC also lets you spin up your green environment whenever you need it. Instead of letting your blue and green environments persist indefinitely or allocating time to build them, you can deploy your green infrastructure environment when you want to deploy your new software application. Once your green environment is stable, you can tear down your blue environment.
HashiCorp's Terraform is an infrastructure as code tool that can help you deploy and manage blue/green infrastructure environments. By using Terraform modules, you can consistently deploy identical infrastructure using the same code but different environments through variables. You can also define feature toggles in your Terraform code to create the blue and green deployment environments simultaneously. You can then test your application in the new green environment and, when you're ready, set the toggle in your code to destroy the blue environment.
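As an illustrative sketch, a feature toggle for the green environment might be a boolean variable that drives a module's `count`. The module path, variable names, and module arguments here are hypothetical, and your own module would define the actual environment resources:

```hcl
# Hypothetical feature toggle: a boolean variable turns the
# green environment on and off without touching the blue one.
variable "enable_green_env" {
  description = "Deploy the green environment alongside blue"
  type        = bool
  default     = false
}

variable "green_app_version" {
  description = "Application version to run in the green environment"
  type        = string
  default     = "v2.0.0"
}

# The same module deploys both environments, so blue and green
# stay as similar as possible. count creates or destroys green.
module "green" {
  source = "./modules/app-environment" # hypothetical module path
  count  = var.enable_green_env ? 1 : 0

  environment = "green"
  app_version = var.green_app_version
}
```

Setting `enable_green_env = false` after the cutover destroys the green module's resources on the next `terraform apply`, which keeps the second environment from persisting indefinitely.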
HashiCorp resources:
- Read the use Application Load Balancers for blue-green and canary deployments tutorial.
- Feature Toggles, Blue-Green Deployments & Canary Tests with Terraform blog by Rosemary Wang
External resources:
- Blue Green Deployment blog by Martin Fowler
- Continuous Blue-Green Deployments With Kubernetes blog by Tomas Fernandez
Application changes
Application changes can use blue/green, canary, rolling, or a combination of the three. Your deployment method depends on whether you use virtual machines or containers, along with the criticality of your application.
Load balancers and proxies
Load balancers and reverse proxies can manage your application by directing traffic between your blue and green environments. They can then direct a subset of users for canary deployments and testing and control traffic for rolling deployments.
Regardless of your cloud provider, you can use Terraform to manage the deployment and control of load balancers and proxies.
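As one hedged example, the Terraform AWS provider supports weighted forwarding between two target groups on an Application Load Balancer listener. The resource names (`aws_lb.app`, the `blue` and `green` target groups) and weights below are illustrative assumptions:

```hcl
# Sketch: an ALB listener that splits traffic between blue and
# green target groups by weight (names and weights are examples).
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn # assumed to exist elsewhere
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90 # current version keeps most traffic
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10 # canary: a small share goes to the new version
      }
    }
  }
}
```

Adjusting the weights over successive applies (for example 90/10, then 50/50, then 0/100) implements the gradual traffic shift described above.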
HashiCorp resources:
- Read the use Application Load Balancers for blue-green and canary deployments tutorial.
External resources:
- AWS Fine-tuning blue/green deployments on application load balancer
- Using AWS Load Balancer Controller for blue/green deployment, canary deployment and A/B testing
- Azure Blue-Green deployments using Azure Traffic Manager
- F5 Flexible Load Balancing for Blue/Green Deployments and Beyond
Non-containerized applications
Using a blue/green or rolling deployment is a good approach if you are deploying applications on virtual machines.
With blue/green deployments, you deploy your new application version to your new green environment. Once you deploy your application, you can start testing the new version in-house, and once you deem it ready, you can switch production traffic over to it.
For high-impact applications, we advise incorporating canary testing into your blue/green deployment strategy. This testing method allows you to validate your new version before fully transitioning your traffic, ensuring a stable and desired user experience. The following is an example of canary testing your green environment: After the green environment is ready, the load balancer sends a small fraction of the traffic to the green environment (in this example, 10%).
If the canary test succeeds without errors, you can incrementally direct more traffic to the green environment (for example, a 50/50 split) over time. In the end state, you direct all traffic to the green environment. After verifying the new deployment, you can destroy the old blue environment. The green environment is now your current production service.
Containerized applications
Containerized applications can use rolling, blue/green, and canary deployments through orchestration tools such as Nomad and Kubernetes.
Rolling deployments are a popular strategy for deploying applications using orchestration systems. With rolling deployments, the orchestrator gradually replaces old instances with new ones. Once the new instances are available and pass health checks, the orchestrator can direct traffic to the new instances and then destroy the old instances.
Nomad supports rolling updates as a first-class feature. To enable rolling updates, you can annotate a job or task group with a high-level description of the update strategy using the update block.
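A minimal Nomad job sketch with an update block might look like the following; the job name, counts, timing values, and Docker image are illustrative:

```hcl
# Sketch of a Nomad job using the update block for rolling
# updates with a canary (all values are examples).
job "web" {
  group "app" {
    count = 4

    update {
      max_parallel     = 1    # replace one allocation at a time
      canary           = 1    # deploy one canary before rolling
      min_healthy_time = "30s"
      healthy_deadline = "5m"
      auto_revert      = true  # roll back if the new version fails
      auto_promote     = false # promote the canary manually
    }

    task "server" {
      driver = "docker"
      config {
        image = "example/app:v2" # hypothetical image
      }
    }
  }
}
```

With `canary = 1`, Nomad runs one allocation of the new version alongside the old ones; promoting the deployment then triggers the rolling replacement of the remaining allocations.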
Kubernetes by default uses rolling updates. Kubernetes does this by incrementally replacing current pods with new ones. The new Pods are scheduled on Nodes with available resources, and Kubernetes waits for those new Pods to start before removing the old Pods.
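If you manage Kubernetes resources with Terraform, the Kubernetes provider can express the same rolling update strategy. The deployment name, labels, and image below are hypothetical:

```hcl
# Sketch: a Kubernetes Deployment managed by Terraform with an
# explicit rolling update strategy (names and values are examples).
resource "kubernetes_deployment" "app" {
  metadata {
    name = "example-app"
  }

  spec {
    replicas = 4

    selector {
      match_labels = { app = "example-app" }
    }

    strategy {
      type = "RollingUpdate"
      rolling_update {
        max_surge       = "25%" # extra Pods allowed during the update
        max_unavailable = "0"   # keep full capacity for zero downtime
      }
    }

    template {
      metadata {
        labels = { app = "example-app" }
      }
      spec {
        container {
          name  = "app"
          image = "example/app:v2" # hypothetical image
        }
      }
    }
  }
}
```

Setting `max_unavailable` to `0` forces Kubernetes to start each new Pod before removing an old one, which preserves serving capacity throughout the update.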
As described in the Infrastructure changes section, both Nomad and Kubernetes support blue/green deployments. Before sending all your traffic to your new cluster, you can use canary testing to ensure the new cluster works as intended.
HashiCorp resources:
- Learn how to use blue/green deployments with the Nomad blue/green and canary deployments tutorial.
- To learn about Nomad rolling updates, refer to Nomad's Rolling updates tutorial.
- Learn about Nomad's update block, which controls behavior such as rolling upgrades and canary deployments.
Service mesh deployments
You can use service splitters to implement zero-downtime deployments. These components, often used in service mesh architectures, allow traffic to route between different versions of an application dynamically.
You can use Consul to help make traffic splitting decisions. Consul proxy metrics give you detailed health and performance information about your service mesh applications. This includes upstream/downstream network traffic metrics, ingress/egress request details, error rates, and additional performance information that you can use to understand your distributed applications.
With blue/green deployments, you can configure a service splitter to initially direct all traffic to your application's "blue" (current) version. When you're ready to deploy the "green" (new) version, you can gradually adjust the splitter to shift traffic from blue to green.
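As a sketch, a Consul service-splitter configuration entry can express this shift. This assumes a hypothetical `web` service and a service-resolver (not shown) that already defines `blue` and `green` subsets; the weights are illustrative:

```hcl
# Sketch of a Consul service-splitter config entry that shifts
# traffic between blue and green subsets of the "web" service.
Kind = "service-splitter"
Name = "web"

Splits = [
  {
    Weight        = 90
    ServiceSubset = "blue"  # current version
  },
  {
    Weight        = 10
    ServiceSubset = "green" # new version under test
  },
]
```

Re-applying this entry with updated weights gradually moves traffic toward the green subset until it receives 100% of requests.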
You can use Consul to manage traffic for zero downtime deployments using the following steps:
First, you register your service’s blue and green versions with Consul and configure health checks to monitor the availability and health of each service instance. Once the instances are healthy, you can deploy your new version to the green stack, ensure it passes Consul's health checks, and then update traffic splitting or routing rules to shift traffic from the blue to the green service gradually.
Since your green service is now receiving traffic, you should monitor the health and performance of both versions. If issues arise, you can roll back the traffic to your blue service. Once all health and performance checks pass, you can decommission the blue service to complete your blue/green deployment.
With canary deployments, you can release new software gradually, and identify and reduce the potential blast radius of a failed software release. You first route a small fraction of the service to the new version. Similar to blue/green deployments, this can be done with a service splitter. When you confirm no errors, you slowly increase traffic to the new service until you fully promote the new environment.
Amazon EKS and Azure Kubernetes Service can use Consul service mesh to observe traffic within your service mesh. This observability enables you to quickly understand how services interact with each other and effectively debug your services' traffic.
HashiCorp resources:
- Deploy seamless canary deployments with service splitters tutorial
- Register your services to Consul tutorial
- Monitor your application health with distributed checks tutorial
- Observe Consul service mesh traffic tutorial
- Monitor application health and performance with Consul proxy metrics tutorial
- Service splitting documentation
Next steps
Blue/green, canary, and rolling deployments help you update and deploy new versions of your infrastructure and application without downtime. By using these strategies, you can ensure your application is available for your users and meet your organization’s uptime goals.
To learn more about application deployments, visit our Streamline application deployments documentation.