Best practices to automate infrastructure
When you establish operational excellence, you enable your team to focus on development by creating safe, consistent, and reliable workflows for deployment. Standardized processes allow teams to work efficiently and more easily adapt to changes in technology or business requirements.
Manually provisioning infrastructure is risky, inefficient, and difficult to scale. Operator error is inevitable, and while you can create audit logs of user actions, it can be hard to diagnose failures. As your organization grows, you will have a higher volume of changes to monitor and deploy, and manual processes will slow your development velocity. By standardizing on best practices and automating repeated workflows, you can introduce infrastructure changes more safely and efficiently.
This guide will provide you with best practices and resources for achieving operational excellence. You will learn why you should incorporate infrastructure as code, version control, reusable components, atomic infrastructure, standard workflows, and planning for scale to accelerate your development process.
Use infrastructure as code
Infrastructure as code (IaC) tools let you codify your resource definitions, making it easier to understand your resource configurations and infrastructure topology. Codifying your resources also enables collaboration since your team can more easily review changes made in code than manual updates. When you define your infrastructure as code, you can use the same engineering practices for your infrastructure as for application development, such as code review, automated deployment, and phased rollout that allows you to test your configuration across environments.
Infrastructure as code provides the following operational excellence benefits:
- Infrastructure that is written as code can follow your organization's established development best practices.
- Version control systems such as GitHub, GitLab, or Bitbucket, let you version your infrastructure code. This allows you to audit infrastructure changes, and roll back changes as needed.
- Infrastructure as code enables team and cross-team collaboration. By storing your code in a code repository, you can share the code with other developers. These developers can then contribute to the infrastructure code, provide feedback, and catch issues such as security or other policy violations.
- Infrastructure as code allows you to test the code similar to how you test application code.
- Deploying infrastructure as code lets you automate infrastructure deployment through scripts and CI/CD systems.
- Infrastructure as code increases deployment consistency and repeatability. You will know what infrastructure your code will deploy, and you should be able to deploy the same code multiple times with nearly identical outcomes.
Terraform is a cloud-agnostic infrastructure as code tool. It lets you define resources and infrastructure in human-readable, declarative configuration files, and manages your infrastructure's lifecycle. Using Terraform has several advantages:
- Terraform can manage infrastructure on multiple cloud platforms.
- Terraform is declarative, so running code with the same inputs will result in the same infrastructure output.
- The human-readable configuration language helps you write infrastructure code quickly.
- Terraform's state allows you to track resource changes throughout your deployments.
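As a minimal sketch of this declarative style, the following Terraform configuration defines a single object storage bucket. The bucket name and tags are placeholder values; applying the same configuration repeatedly produces the same resource.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Declares the desired end state; Terraform computes the changes needed
# to reach it, rather than you scripting each provisioning step.
resource "aws_s3_bucket" "logs" {
  bucket = "example-org-app-logs" # placeholder name

  tags = {
    ManagedBy = "terraform"
  }
}
```

Because the configuration describes the end state rather than the steps to reach it, you can review, version, and re-apply it like any other code.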
HashiCorp resources:
- What is Infrastructure as Code with Terraform?
- Infrastructure as Code: What is it? Why is it important?
- Retrieving CI/CD secrets from Vault
Use version control
Version control systems add predictability and visibility to your infrastructure management process by creating a single source of truth for your infrastructure configuration. Storing your configuration in version control also allows you to revert infrastructure to previous commits, tags, or releases.
Version control also helps facilitate collaboration between team members by allowing them to test out specific code versions locally or remotely. They will also be able to conduct code reviews by leaving comments or suggestions for the version they are testing.
HCP Terraform streamlines your development process by integrating directly with your version control system and CI/CD pipelines. These integrations preview your infrastructure changes so that your team can review and approve them before you apply.
HashiCorp provides GitHub Actions that integrate with the HCP Terraform API. These actions let you create your own custom CI/CD workflows to meet your organization's needs.
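One way version control pays off in practice is by pinning infrastructure code to a specific release. In this sketch, a module is sourced from a hypothetical git repository at a tagged version; rolling back means pointing at an earlier tag and re-applying.

```hcl
# The repository URL, tag, and input variable are placeholders for
# illustration.
module "network" {
  # Pinning to the v1.4.0 tag makes this deployment reproducible and
  # auditable; the git history records who changed what and when.
  source = "git::https://github.com/example-org/terraform-network.git?ref=v1.4.0"

  environment = "staging"
}
```

Pinning to tags rather than a branch keeps deployments reproducible even as the upstream repository continues to change.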
HashiCorp resources:
- Learn how to use HCP Terraform to automate Terraform with GitHub Actions.
- Why should I use version control for my infrastructure?
- Terraform code style guide
- Write Terraform Tests
Identify reusable components
If you repeatedly provision the same group of resources, you can refactor your configuration to use reusable collections instead. Reusable collections of components reduce the time it takes to deploy infrastructure by allowing developers to reuse configuration instead of writing it from scratch. Developers can also design reusable collections to comply with organizational best practices and security guidelines.
Defining reusable collections of components, such as Terraform modules, also reduces your time to provision, giving engineers a configurable way to deploy commonly used resources. You can adapt these components to account for changes to service demand, modify them based on failure modes, and release updates to allow downstream users to deploy your updated configuration.
A Terraform module is a set of Terraform configuration files in a single directory. Modules are reusable and customizable; you can wrap modules with configurations to fit your organization's standards. Creating a more modular infrastructure encourages your organization to decouple services by helping you focus on logically related resources. Decoupling can reduce the scope of failure and enable more efficient deployment due to reduced system dependencies.
For example, if your team manages object storage for multiple applications that all follow your organization's common standards, such as security or lifecycle management, you can use an object storage Terraform module for your cloud providers. The object storage module can contain configuration such as lifecycle policies, or security standards. You can store and version the module in a version control repository or Terraform registry and share it across your organization for developers to access.
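The object storage example above might look like the following from a consumer's perspective. The module source, organization name, and inputs are hypothetical; they assume the module has been published to a private registry.

```hcl
module "app_assets" {
  source  = "app.terraform.io/example-org/object-storage/aws"
  version = "~> 1.0"

  bucket_name       = "example-app-assets"
  enable_versioning = true

  # Lifecycle policies and security defaults (encryption, public-access
  # blocks) are enforced inside the module, so every consumer inherits
  # the organization's standards without re-implementing them.
}
```

Versioning the module with a `~>` constraint lets consumers pick up patch releases automatically while holding back breaking changes.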
Machine images can also benefit from following a reusable component workflow, commonly called a golden image. A golden image is an image on top of which developers can build applications, letting them focus on the application itself instead of system dependencies and patches. A typical golden image includes a common system, logging and monitoring tools, recent security patches, and application dependencies.
You can create a golden image with Packer and make it available for your developers and operations teams. These teams can then use Packer to ingest the golden image and install their applications and other dependencies before deploying the image with tools like Terraform. You can integrate this process with a CI/CD system to create a complete application deployment workflow for deploying to your cloud infrastructure. You can also use this workflow for containers.
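A golden image build can be sketched as a Packer HCL2 template. The region, base image filter, and provisioning steps below are placeholders; a real template would install your organization's logging, monitoring, and security tooling.

```hcl
source "amazon-ebs" "golden" {
  region        = "us-east-1"
  instance_type = "t3.micro"
  ssh_username  = "ubuntu"
  ami_name      = "golden-ubuntu-{{timestamp}}"

  # Start from a recent upstream base image.
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"] # Canonical
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.golden"]

  # Bake in security patches (and, in practice, your monitoring agents
  # and common dependencies) so application teams start from a known base.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
    ]
  }
}
```

Application teams can then build on the resulting image, and Terraform can reference it when provisioning instances.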
If you have common infrastructure that your developers use to deploy their applications, you can use HCP Waypoint to accelerate deployments. Waypoint templates allow platform engineers to pre-define infrastructure in a Terraform no-code module. You can create a template with common infrastructure that complies with your organization's security, finance, scaling, and other policies.
For example, a template can consist of a code repository template configured with your organization's default frontend framework, linting libraries, and CI/CD pipelines. A developer could use the Waypoint template to deploy their application and know that the underlying infrastructure is configured correctly. Other template examples include a production-ready Kubernetes cluster or backend API framework configured for serverless.
HashiCorp resources:
- What is a Terraform module
- Terraform module registry
- Learn how to create Terraform modules
- Terraform create and use no-code modules
- Learn how to reuse configuration with modules
- Build a golden image pipeline with HCP Packer
Deploy atomic infrastructure components
Cloud infrastructure can be large and complex. You should deploy small updates to your infrastructure on a frequent cadence. Small and frequent deployments lower the risk of bad deployments and increase the ability to roll back changes.
You can deploy infrastructure more frequently when you couple your changes with CI/CD pipelines. These pipelines increase the speed and reliability of your deployments, allowing you to ship updates to your services faster.
It is important to understand the changes Terraform will apply to your infrastructure before you execute them. Terraform lets you preview changes with the plan command, so you can understand the effects of your modifications before deploying them. Many popular CI/CD products integrate with Terraform, allowing you to manage your infrastructure effectively.
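The plan-then-apply workflow can be sketched as the following CLI session. Saving the plan to a file and applying that exact file ensures that what you reviewed is what gets deployed.

```
terraform init              # install required providers and modules
terraform plan -out=tfplan  # preview changes and save them to a plan file
terraform apply tfplan      # apply exactly the reviewed plan, nothing more
```

In a CI/CD pipeline, the plan output is typically posted for review, and the apply step runs only after approval.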
Standardize and automate workflows
One of the key HashiCorp principles is to design for workflows, not underlying technologies. Focusing on workflows gives you the flexibility to introduce new tools more easily to your organization as necessary. When establishing a culture of automation, you should also ensure that you regularly reflect on your operations procedures as your team evolves. You can more easily modify and adjust standardized automation procedures than inconsistent manual processes. This also allows you to review any operational failures and update your workflows accordingly.
Terraform allows you to standardize your cloud infrastructure workflow to manage resources across cloud providers, so you do not need to learn provider-specific workflows. Standard cloud infrastructure workflows let your team work more efficiently and enable you to choose the best service for the job rather than tying you to any one platform. HCP Terraform has run triggers that can combine multiple workflows. When one workflow completes, such as creating a Kubernetes cluster, a second workflow can start automatically to create a Vault instance for the Kubernetes cluster to use.
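The run trigger pattern described above can itself be managed as code with the hashicorp/tfe provider. This is a sketch; the organization and workspace names are hypothetical.

```hcl
data "tfe_workspace" "cluster" {
  name         = "k8s-cluster"
  organization = "example-org"
}

data "tfe_workspace" "vault" {
  name         = "vault-on-k8s"
  organization = "example-org"
}

# When a run in the cluster workspace completes successfully, HCP Terraform
# automatically queues a run in the Vault workspace.
resource "tfe_run_trigger" "vault_after_cluster" {
  workspace_id  = data.tfe_workspace.vault.id
  sourceable_id = data.tfe_workspace.cluster.id
}
```

Managing the trigger in Terraform keeps the workflow chain itself under version control, like the rest of your infrastructure.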
As highlighted in the reusable components section, you can use Packer to create a standard, automated workflow for multi-cloud deployments. Packer lets you define an image in a single configuration file, called a template, and build matching images for cloud providers such as AWS, Azure, and GCP.
HashiCorp resources:
- Terraform multi-cloud provisioning
- Build a golden image pipeline with HCP Packer
- HCP Terraform Run triggers
- Implement HashiCorp's Tao and principles
- Get started with Terraform on AWS, Azure, GCP, and OCI
Plan for scale
You should plan for variations in capacity and traffic by automating scaling events. By using monitoring and alerting to track your infrastructure and service resource usage, you can proactively and dynamically respond to varying demands for your services, ensuring more reliability and resilience.
Most major cloud providers have native auto-scaling features. You can use Terraform to manage the autoscaling configurations through the auto-scaling resources, such as the aws_autoscaling_group resource.
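The following sketch shows an aws_autoscaling_group managed with Terraform. The AMI ID, subnet IDs, and sizing values are placeholders.

```hcl
resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = "ami-0123456789abcdef0" # replace with your golden image ID
  instance_type = "t3.micro"
}

# The group scales between min_size and max_size; scaling policies or
# scheduled actions can adjust desired_capacity in response to demand.
resource "aws_autoscaling_group" "app" {
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}
```

Codifying the scaling bounds makes capacity decisions reviewable and versioned rather than ad hoc console changes.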
Monitoring cost is an important factor when planning for scale. HashiCorp's Sentinel is a policy-as-code framework that allows you to introduce logic-based policy decisions to your systems. Codifying your policies offers the same benefits as IaC, allowing for collaborative development, visibility, and predictability in your operations. You can use Sentinel to help manage your infrastructure spending.
HashiCorp resources:
- Manage infrastructure and service monitoring
- Manage cloud native resources monitoring with Terraform
- Monitor infrastructure cost with Sentinel
- Learn about HashiCorp Sentinel
External resources:
- AWS Auto Scaling
- Azure autoscaling
- GCP autoscaling instances and load balancing
Next steps
In this guide, you learned about infrastructure automation best practices and principles that you can incorporate into your development culture. To learn more about how to achieve operational excellence, refer to the following resources.