Reliability implementation resources

Introduction

The reliability pillar recommends strategies that help prevent disruptions caused by a single point of failure, ensuring the high availability and business continuity of your mission-critical applications and systems.

To implement our reliability recommendations, select a best practice and resource type below.

Reference architectures

HashiCorp's reliability best practices expect that Vault and Consul are deployed in one of the following recommended configurations.

Vault infrastructure recommendations

The Vault Reference Architecture recommends best practices for infrastructure architects and operators to follow when deploying Vault using the Consul storage backend in a production environment.

In this tutorial, you will architect your Vault clusters according to HashiCorp recommended patterns and practices for replicating data.

This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Vault using the Integrated Storage (Raft) storage backend in a production environment.

Consul infrastructure recommendations

This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Consul in a production environment.

Tolerate failure

This best practice highlights principles to consider when you design, implement, and operate your business systems so that you can best achieve your reliability goals.

Tutorials

Configure fault resiliency for your Consul Enterprise datacenter using redundancy zones. Redundancy zones make it possible to run one voter and any number of non-voters in each defined zone.
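The "one voter per zone" rule can be illustrated with a small model. This is only a sketch: Consul Enterprise makes the actual selection itself, and the function name assign_voters, the server names, and the zone names below are hypothetical. Picking the first server listed in each zone as the voter is an arbitrary choice made for the example, not Consul's algorithm.

```python
from collections import defaultdict


def assign_voters(servers):
    """Split servers into (voters, non_voters) with one voter per zone.

    `servers` is a list of (name, zone) pairs. The first server listed
    in each zone becomes that zone's voter; this is an illustrative
    choice, not how Consul picks voters.
    """
    by_zone = defaultdict(list)
    for name, zone in servers:
        by_zone[zone].append(name)

    voters, non_voters = [], []
    for zone in sorted(by_zone):
        members = by_zone[zone]
        voters.append(members[0])        # exactly one voter per zone
        non_voters.extend(members[1:])   # any number of non-voters
    return voters, non_voters


# Hypothetical six-server datacenter spread across three zones.
servers = [
    ("s1", "zone-a"), ("s2", "zone-a"),
    ("s3", "zone-b"), ("s4", "zone-b"),
    ("s5", "zone-c"), ("s6", "zone-c"),
]
voters, non_voters = assign_voters(servers)
print("voters:", voters)
print("non-voters:", non_voters)
```

In this model, losing a whole zone removes only one voter, and a non-voter in a surviving zone remains available as a candidate for promotion.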

Consul's audit logging provides the ability to capture records of all Consul events. With audit logs, the audit team can inspect event data to learn which credentials have been used, what actions have taken place, and the timestamps associated with all of these transactions.

To protect your Vault deployment against catastrophic failure of an entire cluster, Vault Enterprise supports multi-datacenter deployment, where you can replicate data across datacenters to improve performance and enable disaster recovery.

With Vault Enterprise, standby nodes can handle most read-only requests and are referred to as performance standby nodes. Performance standby nodes are designed to provide horizontal scalability of read requests within a single Vault cluster.
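One way to tell active, standby, and performance standby nodes apart is the HTTP status code returned by Vault's /v1/sys/health endpoint. The sketch below assumes the endpoint's default status codes (they can be overridden with query parameters) and uses hypothetical helper names (classify, can_serve_reads, probe). Treat it as an illustration for a health check or load balancer rule, not a definitive implementation.

```python
import urllib.error
import urllib.request

# Default /v1/sys/health status codes (assumed not overridden by query parameters).
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify(status_code):
    """Map a health status code to a node role; unknown codes return 'unknown'."""
    return HEALTH_STATES.get(status_code, "unknown")


def can_serve_reads(status_code):
    """Active nodes and performance standbys can handle read-only requests."""
    return classify(status_code) in {"active", "performance-standby"}


def probe(addr):
    """Return the health status code for a Vault node at `addr` (hypothetical URL)."""
    try:
        with urllib.request.urlopen(f"{addr}/v1/sys/health", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses; the code is still the answer we want.
        return err.code


for code in (200, 429, 473):
    print(code, classify(code), can_serve_reads(code))
```

Calling probe("https://vault.example.com:8200") against a real node (the address is a placeholder) would return one of these codes, which classify then turns into a role.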

Vault Enterprise performance replication provides consistency, scalability, and highly available disaster recovery. In this tutorial, you will activate and manage performance replication nodes.

Reference documentation

Consul Enterprise read replicas provide the ability to scale clustered Consul servers. Read replicas still receive data through cluster replication, but they do not take part in quorum election operations. Expanding your Consul cluster in this way can scale reads without impacting write latency.
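The reason read replicas do not affect write latency follows from majority-quorum arithmetic: a write must be acknowledged by a majority of voting servers, and read replicas are not counted among the voters. The sketch below works through that arithmetic. The Cluster class and its field names are hypothetical and exist only for the example.

```python
from dataclasses import dataclass


def quorum_size(voters):
    """Majority of voting servers needed to commit a write."""
    return voters // 2 + 1


def failure_tolerance(voters):
    """Number of voters that can fail while a quorum is still reachable."""
    return voters - quorum_size(voters)


@dataclass
class Cluster:
    voters: int
    read_replicas: int = 0

    @property
    def quorum(self):
        # Read replicas receive replicated data but never count toward quorum.
        return quorum_size(self.voters)


small = Cluster(voters=3)
scaled = Cluster(voters=3, read_replicas=5)
print(small.quorum, scaled.quorum)   # same quorum with or without replicas
print(failure_tolerance(3), failure_tolerance(5))
```

Adding five read replicas leaves the quorum at two of three voters, whereas adding voters would raise the number of acknowledgements each write has to wait for.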

Consul Enterprise offers a network area mechanism that allows operators to federate Consul datacenters together on a pairwise basis, enabling partially-connected network topologies.

The Consul agent collects various runtime metrics about the performance of different libraries and subsystems.
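These runtime metrics can be read from the agent's /v1/agent/metrics HTTP endpoint, which returns JSON containing lists such as Gauges, Counters, and Samples. The following is a minimal sketch: the helper names fetch_metrics and gauge_value are hypothetical, the address assumes a local agent on the default HTTP port, and the sample payload uses made-up values. An agent with ACLs enabled would also need a token, which is omitted here.

```python
import json
import urllib.request


def fetch_metrics(addr="http://127.0.0.1:8500"):
    """Fetch the agent's metrics snapshot (assumes a reachable local agent)."""
    with urllib.request.urlopen(f"{addr}/v1/agent/metrics", timeout=5) as resp:
        return json.load(resp)


def gauge_value(metrics, name):
    """Return the value of the named gauge, or None if it is absent."""
    for gauge in metrics.get("Gauges", []):
        if gauge.get("Name") == name:
            return gauge.get("Value")
    return None


# Made-up payload in the shape of the endpoint's response, for illustration only.
sample = {
    "Gauges": [
        {"Name": "consul.runtime.alloc_bytes", "Value": 4096.0, "Labels": {}},
        {"Name": "consul.runtime.num_goroutines", "Value": 42.0, "Labels": {}},
    ],
    "Counters": [],
    "Samples": [],
}
print(gauge_value(sample, "consul.runtime.num_goroutines"))
```

Against a running agent, gauge_value(fetch_metrics(), "consul.runtime.alloc_bytes") would return the current value instead of the sample one.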

Automated backups

Vault Enterprise performance replication provides consistency, scalability, and highly available disaster recovery.

Terraform Enterprise supports forwarding its logs to one or more external destinations, a process called log forwarding. Log forwarding provides increased observability, helps you comply with log retention requirements, and supplies information during troubleshooting.

Blogs

Understand the benefits of using Consul service mesh with Datadog for modern networking and how to implement tracing in your environments.
