CRE Tooling
- Distributing Load and Volume with Auto-Scaling and Load Balancing
- Enabling Automatic Failovers for High Availability
- Facilitating Controlled Deployments with Rollback Strategies
- Providing Chaos Engineering Capabilities for Resilience Testing
- Assisting in Incident Response with Automation
- Ensuring Proper Configuration Management
- Leveraging Immutable Infrastructure as a Service
- Practicing Disaster Recovery Frequently
- Case Study
- Summary
- Q&A
Tools That Support Automatic Failovers, Automatic Rollbacks, Automatic Deployments, Chaos Engineering, Incident Response, Configuration Management, Immutable Infrastructure, and Disaster Recovery
Proper tooling is essential in cloud reliability engineering (CRE) to maintain the reliability, availability, and performance of cloud-based systems. Automation helps streamline recovery operations, reduce manual intervention for testing scenarios, and ensure that teams can proactively respond to issues. Cloud providers offer a large number of automation tools that embrace these principles and techniques. In this chapter, we will review some of the tools and discuss why they are important in CRE.
Distributing Load and Volume with Auto-Scaling and Load Balancing
Reliability engineering focuses on measuring how resilient, stable, and scalable your systems are. This requires distributing and balancing loads to ensure an “always on” posture for your most critical systems. Amazon Web Services’ (AWS) Well-Architected Tool is an example of a tool that allows you to conduct reviews of your applications according to AWS architectural best practices. It provides a structured framework for assessing your architecture, identifying areas in need of improvement, and making informed decisions to optimize your AWS workloads. Let’s take a look at how application teams can use this tool to configure and test the resilience, stability, and scalability of their systems.
Auto-Scaling
Cloud auto-scaling is a cloud computing feature that automatically adjusts the number of compute resources (e.g., virtual machines [VMs]) allocated to an application based on changing demand. The primary goal of auto-scaling is to ensure that applications can handle varying levels of traffic and workloads efficiently and without manual intervention. This is where the concept of elasticity becomes a major player in your cloud implementations.
AWS Auto Scaling (see Figure 7.1) automatically adjusts the number of instances in response to changes in demand, ensuring that applications are neither overprovisioned nor underprovisioned. An example of auto-scaling is the dynamic resource allocation that occurs when AWS Auto Scaling monitors the performance and resource utilization of your application. When certain predefined conditions are met, such as increased traffic or CPU utilization, AWS Auto Scaling automatically provisions additional resources based on scaling policies, such as CPU usage metrics, network traffic, or custom application-specific metrics. When demand decreases, it can scale down resources to avoid overprovisioning and reduce costs.
Figure 7.1 AWS Auto Scaling (source: https://aws.amazon.com/autoscaling/; © 2024, Amazon Web Services, Inc.)
Auto-scaling also provides elasticity to your applications, allowing your systems to seamlessly handle traffic spikes and other fluctuations in demand without manual intervention. This elasticity contributes to high availability and improved performance.
Finally, by automatically scaling resources up and down, auto-scaling helps optimize cloud costs so that you only pay for the resources you use, which can lead to cost savings during periods of lower demand. AWS provides multiple resources for cost optimization, including AWS Cost Explorer, AWS Compute Optimizer, and the AWS Pricing Calculator.
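The scaling-policy behavior described above can be sketched in a few lines. The following is a simplified illustration, not AWS code: it assumes a CPU-based target-tracking policy, where the desired instance count is adjusted proportionally so that average utilization moves toward the target, clamped to the group's minimum and maximum size.

```python
# Simplified sketch of a CPU-based target-tracking scaling decision
# (illustrative only; not the AWS Auto Scaling implementation).
import math

def desired_capacity(current_capacity: int, avg_cpu: float,
                     target_cpu: float, min_size: int, max_size: int) -> int:
    """Return the new instance count, clamped to the group's min/max bounds."""
    if avg_cpu <= 0:
        return min_size
    # Target tracking scales proportionally: capacity * (actual / target).
    proposed = math.ceil(current_capacity * (avg_cpu / target_cpu))
    return max(min_size, min(max_size, proposed))

# Traffic spike: 4 instances at 90% CPU against a 50% target scale out.
print(desired_capacity(4, 90.0, 50.0, min_size=2, max_size=10))  # → 8
# Quiet period: the same group at 10% CPU scales back in to the minimum.
print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))  # → 2
```

Note how the same rule handles both directions: scale-out during spikes and scale-in during quiet periods, which is where the cost savings come from.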
GCP provides the following auto-scaling tools.
Google Kubernetes Engine (GKE) cluster autoscaler: The GKE cluster autoscaler automatically adjusts the number of nodes in a given cluster based on the demands of your workloads, adding nodes when pods cannot be scheduled and removing underutilized ones. Combined with horizontal pod autoscaling, workloads can also scale on metrics such as CPU utilization and custom metrics.
Google Compute Engine Autoscaler: This tool adjusts the number of instances in a managed instance group based on the current load. It can scale based on CPU utilization, HTTP load-balancing serving capacity, and custom metrics.
Google App Engine Autoscaling: Google App Engine provides automatic scaling based on request rates, response latencies, and other application metrics. It ensures that your application always has enough instances to handle incoming traffic.
Microsoft Azure offers the following options:
Azure Autoscale: Azure Autoscale enables you to automatically adjust the number of compute resources based on demand. It supports scaling based on metrics such as CPU usage, queue length, and schedule-based scaling.
Azure Virtual Machine Scale Sets: Azure Virtual Machine Scale Sets let you create and manage a group of identical, load-balanced VMs. You can define auto-scale rules based on CPU usage or other metrics to automatically adjust the number of VM instances.
Azure App Service Autoscale: Azure App Service Autoscale offers auto-scaling capabilities for web apps, API apps, and mobile apps. It can scale instances horizontally based on metrics such as CPU usage, memory usage, and HTTP queue length.
Figure 7.2 illustrates how auto-scaling can be configured for an efficient and reliable application posture. At minimum load, the application runs on two VMs; as new workloads and users connect, the infrastructure elastically grows from two to a maximum of five VMs. In the scenario shown, the application has scaled out to three VMs because a scaling condition, such as CPU usage, memory usage, or HTTP queue length, was met.
Figure 7.2 Azure Autoscale (source: https://learn.microsoft.com/en-us/azure/azure-monitor/Autoscale/autoscale-overview/; © 2024, Microsoft)
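The rule pictured in Figure 7.2 can be expressed as a simple threshold policy. The sketch below is illustrative, not Azure's implementation; the 70% and 30% CPU thresholds are assumed values, while the two-to-five VM bounds come from the figure.

```python
# Minimal sketch of a threshold-based autoscale rule like the one in
# Figure 7.2: scale out by one VM when CPU is high, scale in when low,
# never leaving the configured 2-5 instance range. Thresholds are assumed.
MIN_VMS, MAX_VMS = 2, 5

def autoscale_step(vm_count: int, cpu_percent: float,
                   scale_out_at: float = 70.0, scale_in_at: float = 30.0) -> int:
    if cpu_percent > scale_out_at:
        return min(vm_count + 1, MAX_VMS)   # add a VM, capped at the maximum
    if cpu_percent < scale_in_at:
        return max(vm_count - 1, MIN_VMS)   # remove a VM, floored at the minimum
    return vm_count                          # comfortable band: no change

# Load climbs past the threshold, so the group grows from two VMs to three.
print(autoscale_step(2, 85.0))  # → 3
```

Evaluating this rule on a schedule (say, once per minute) reproduces the elastic behavior the figure depicts.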
Load Balancing
Load balancing is a technique used to distribute incoming network traffic or requests across multiple servers or computing resources. The primary purpose of load balancing is to ensure that no single server or resource is overwhelmed by traffic, thereby improving the availability, fault tolerance, and performance of applications. AWS Elastic Load Balancing (ELB) distributes incoming application traffic across multiple targets, increasing availability and fault tolerance.
Load balancing starts with traffic distribution: load balancers spread incoming requests or network traffic evenly across a pool of resources, ensuring efficient resource utilization. Load balancers also monitor the health and status of backend services and reroute traffic away from any server that becomes nonresponsive. When traffic is additionally distributed across multiple regions, this health-aware routing further improves application resilience, availability, and performance.
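The two behaviors just described, even distribution plus health-aware rerouting, can be sketched together. This is an illustrative toy (not any cloud provider's API): round-robin routing over a pool, skipping any backend that has failed its health probe. The server names are hypothetical.

```python
# Toy load balancer: round-robin distribution with health checks.
from itertools import cycle

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.healthy = {s: True for s in servers}
        self._ring = cycle(servers)

    def mark_unhealthy(self, server):
        self.healthy[server] = False  # simulates a failed health probe

    def route(self):
        # Round-robin over the pool, skipping servers that failed their probe.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy backends available")

lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark_unhealthy("app-2")  # app-2 stops answering health checks
print([lb.route() for _ in range(4)])  # traffic now flows only to app-1 and app-3
```

A real load balancer layers much more on top (connection draining, weighted targets, probe intervals), but the core availability mechanism is exactly this: unhealthy targets drop out of rotation automatically.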
All major cloud providers offer load-balancing tools. Following is a sample of those available.
Google Cloud Load Balancing: Google Cloud Load Balancing distributes incoming traffic across multiple instances or backend services to ensure high availability and reliability of your applications. It offers several types of load balancers:
HTTP(S) load balancing: For distributing HTTP and HTTPS traffic across multiple backend instances or services
TCP proxy load balancing: For distributing TCP traffic to backend instances
SSL proxy load balancing: For distributing SSL/TLS traffic to backend instances
Internal TCP/UDP load balancing: For distributing internal TCP and UDP traffic within your virtual private cloud (VPC) network
Azure Load Balancer: Azure Load Balancer distributes incoming network traffic across multiple VM instances in a backend pool. It supports both inbound and outbound scenarios and can be configured for various protocols including TCP, UDP, and HTTP/S.
Azure Application Gateway: Azure Application Gateway is a layer 7 load balancer that provides application-level routing and load-balancing services. It offers features such as SSL termination, cookie-based session affinity, URL-based routing, and web application firewall (WAF) capabilities.
Azure Traffic Manager: Azure Traffic Manager is a domain name system (DNS)–based traffic load balancer that distributes incoming traffic across multiple endpoints located in different Azure regions or globally. It provides various load-balancing methods including priority, weighted, performance, and geographic routing.
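Traffic Manager's weighted routing method can be illustrated with a short simulation. This is a hedged sketch of the concept, not Azure code: each DNS query is answered with a single endpoint, chosen in proportion to its configured weight. The endpoint hostnames below are made up for the example.

```python
# Conceptual sketch of DNS-based weighted traffic routing
# (illustrative only; not the Azure Traffic Manager implementation).
import random

def pick_endpoint(weighted_endpoints, rng=random):
    """weighted_endpoints: list of (endpoint, weight) pairs."""
    endpoints, weights = zip(*weighted_endpoints)
    return rng.choices(endpoints, weights=weights, k=1)[0]

# Hypothetical endpoints: send ~75% of traffic to East US, ~25% to West Europe.
endpoints = [("eastus.contoso.example", 3), ("westeurope.contoso.example", 1)]
answers = [pick_endpoint(endpoints) for _ in range(1000)]
print(answers.count("eastus.contoso.example"))  # roughly 750 of 1,000 queries
```

Priority routing would instead always return the highest-priority healthy endpoint, and performance routing would pick the endpoint with the lowest measured latency for the querying client.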
Table 7.1 outlines Azure’s load-balancing methods and features.
Table 7.1 Azure’s Load-Balancing Methods and Features
| | Azure Traffic Manager | Azure Application Gateway | Azure Front Door | Azure Load Balancer |
|---|---|---|---|---|
| OSI layer | 7 | 7 | 7 | 4 |
| Health probes | HTTP/HTTPS/TCP | HTTP/HTTPS | HTTP/HTTPS | TCP/HTTP |
| SKUs | — | Basic/standard | — | Basic/standard |
| Load balancing | Global | Regional | Global | Global |
| Works at | VMs | Any IP address | DNS CNAME | — |
| Protocols | DNS | HTTP/HTTPS/HTTP2/WS | HTTP/HTTPS/HTTP2 | TCP and UDP |
| Sticky sessions | Supported | Supported | Supported | Supported |
| Traffic control | — | Network Security Group | — | Network Security Group |
| WAF | — | WAF | WAF | — |
All of these load-balancing tools and features help distribute incoming traffic across multiple backend instances or services to ensure high availability, scalability, and performance of your applications. With sticky sessions, a load balancer assigns an identifying attribute to a user, either by issuing a cookie or by tracking the user's IP details. Based on that tracking ID, the load balancer then routes all of the user's requests to the same server for the duration of the session. This creates a seamless and stable experience for users, with response latency comparable to what they would see if every request were served by the same host behind the load balancer.
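The IP-tracking variant of session affinity described above can be sketched with a stable hash. This is a toy illustration under assumed names (not a production affinity scheme): hashing the client's IP address always maps a given user to the same backend, so no cookie is required.

```python
# Toy sketch of IP-based sticky sessions: a stable hash of the client
# identity picks the same backend server on every request.
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

def sticky_route(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Repeated requests from the same client always land on the same backend.
print(sticky_route("203.0.113.7") == sticky_route("203.0.113.7"))  # → True
```

Cookie-based affinity works the same way in spirit: the identifying attribute (the cookie value rather than the IP) deterministically selects one backend for the life of the session.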
Cloud auto-scaling and load balancing are fundamental techniques for ensuring that your applications can efficiently handle varying workloads, maintain high availability, and optimize resource utilization in the cloud. Auto-scaling adapts to changing demand by adjusting the number of resources, while load balancing evenly distributes traffic to prevent overload and improve fault tolerance. Together, these technologies help create robust and responsive cloud-based applications.
