CRE Tooling
- Distributing Load and Volume with Auto-Scaling and Load Balancing
- Enabling Automatic Failovers for High Availability
- Facilitating Controlled Deployments with Rollback Strategies
- Providing Chaos Engineering Capabilities for Resilience Testing
- Assisting in Incident Response with Automation
- Ensuring Proper Configuration Management
- Leveraging Immutable Infrastructure as a Service
- Practicing Disaster Recovery Frequently
- Case Study
- Summary
- Q&A
Tools That Support Automatic Failovers, Automatic Rollbacks, Automatic Deployments, Chaos Engineering, Incident Response, Configuration Management, Immutable Infrastructure, and Disaster Recovery
Proper tooling is essential in cloud reliability engineering (CRE) to maintain the reliability, availability, and performance of cloud-based systems. Automation helps streamline recovery operations, reduce manual intervention for testing scenarios, and ensure that teams can proactively respond to issues. Cloud providers offer a large number of automation tools that embrace these principles and techniques. In this chapter, we will review some of the tools and discuss why they are important in CRE.
Distributing Load and Volume with Auto-Scaling and Load Balancing
Reliability engineering focuses on measuring how resilient, stable, and scalable your systems are. This requires distributing and balancing loads to ensure an “always on” posture for your most critical systems. Amazon Web Services’ (AWS) Well-Architected Tool is an example of a tool that allows you to conduct reviews of your applications according to AWS architectural best practices. It provides a structured framework for assessing your architecture, identifying areas in need of improvement, and making informed decisions to optimize your AWS workloads. Let’s take a look at how application teams can use this tool to configure and test the resilience, stability, and scalability of their systems.
Auto-Scaling
Cloud auto-scaling is a cloud computing feature that automatically adjusts the number of compute resources (e.g., virtual machines [VMs]) allocated to an application based on changing demand. The primary goal of auto-scaling is to ensure that applications can handle varying levels of traffic and workloads efficiently and without manual intervention. This is where the concept of elasticity becomes a major player in your cloud implementations.
AWS Auto Scaling (see Figure 7.1) automatically adjusts the number of instances in response to changes in demand, ensuring that applications are neither overprovisioned nor underprovisioned. An example of auto-scaling is the dynamic resource allocation that occurs when AWS Auto Scaling monitors the performance and resource utilization of your application. When certain predefined conditions are met, such as increased traffic or CPU utilization, AWS Auto Scaling automatically provisions additional resources based on scaling policies, such as CPU usage metrics, network traffic, or custom application-specific metrics. When demand decreases, it can scale down resources to avoid overprovisioning and reduce costs.
Figure 7.1 AWS Auto Scaling (source: https://aws.amazon.com/autoscaling/; © 2024, Amazon Web Services, Inc.)
Auto-scaling also provides elasticity to your applications, allowing your systems to seamlessly handle traffic spikes and other fluctuations in demand without manual intervention. This elasticity contributes to high availability and improved performance.
Finally, by automatically scaling resources up and down, auto-scaling helps optimize cloud costs so that you only pay for the resources you use, which can lead to cost savings during periods of lower demand. AWS provides multiple resources for cost optimization, including AWS Cost Explorer, AWS Compute Optimizer, and the AWS Pricing Calculator.
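The scaling-policy behavior described above can be sketched in a few lines. The following is a simplified illustration, not AWS code: it assumes a CPU-based target-tracking policy, where the desired instance count is adjusted proportionally so that average utilization moves toward the target, clamped to the group's minimum and maximum size.

```python
# Simplified sketch of a CPU-based target-tracking scaling decision
# (illustrative only; not the AWS Auto Scaling implementation).
import math

def desired_capacity(current_capacity: int, avg_cpu: float,
                     target_cpu: float, min_size: int, max_size: int) -> int:
    """Return the new instance count, clamped to the group's min/max bounds."""
    if avg_cpu <= 0:
        return min_size
    # Target tracking scales proportionally: capacity * (actual / target).
    proposed = math.ceil(current_capacity * (avg_cpu / target_cpu))
    return max(min_size, min(max_size, proposed))

# Traffic spike: 4 instances at 90% CPU against a 50% target scale out.
print(desired_capacity(4, 90.0, 50.0, min_size=2, max_size=10))  # → 8
# Quiet period: the same group at 10% CPU scales back in to the minimum.
print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))  # → 2
```

Note how the same rule handles both directions: scale-out during spikes and scale-in during quiet periods, which is where the cost savings come from.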
GCP provides the following auto-scaling tools.
Google Kubernetes Engine (GKE) cluster autoscaler: The GKE cluster autoscaler automatically adjusts the number of nodes in a given cluster based on the demands of your workloads, adding nodes when pods cannot be scheduled and removing underutilized ones. Combined with horizontal pod autoscaling, workloads can also scale on metrics such as CPU utilization and custom metrics.
Google Compute Engine Autoscaler: This tool adjusts the number of instances in a managed instance group based on the current load. It can scale based on CPU utilization, HTTP load-balancing serving capacity, and custom metrics.
Google App Engine Autoscaling: Google App Engine provides automatic scaling based on request rates, response latencies, and other application metrics. It ensures that your application always has enough instances to handle incoming traffic.
Microsoft Azure offers the following options:
Azure Autoscale: Azure Autoscale enables you to automatically adjust the number of compute resources based on demand. It supports scaling based on metrics such as CPU usage, queue length, and schedule-based scaling.
Azure Virtual Machine Scale Sets: Azure Virtual Machine Scale Sets let you create and manage a group of identical, load-balanced VMs. You can define auto-scale rules based on CPU usage or other metrics to automatically adjust the number of VM instances.
Azure App Service Autoscale: Azure App Service Autoscale offers auto-scaling capabilities for web apps, API apps, and mobile apps. It can scale instances horizontally based on metrics such as CPU usage, memory usage, and HTTP queue length.
Figure 7.2 illustrates how auto-scaling can be configured for an efficient and reliable application posture. At minimum load, the application runs on two VMs; as new workloads and users connect, the infrastructure elastically grows from two to a maximum of five VMs. In the scenario shown, the application has scaled out to three VMs because a scaling condition, such as CPU usage, memory usage, or HTTP queue length, was met.
Figure 7.2 Azure Autoscale (source: https://learn.microsoft.com/en-us/azure/azure-monitor/Autoscale/autoscale-overview/; © 2024, Microsoft)
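The rule pictured in Figure 7.2 can be expressed as a simple threshold policy. The sketch below is illustrative, not Azure's implementation; the 70% and 30% CPU thresholds are assumed values, while the two-to-five VM bounds come from the figure.

```python
# Minimal sketch of a threshold-based autoscale rule like the one in
# Figure 7.2: scale out by one VM when CPU is high, scale in when low,
# never leaving the configured 2-5 instance range. Thresholds are assumed.
MIN_VMS, MAX_VMS = 2, 5

def autoscale_step(vm_count: int, cpu_percent: float,
                   scale_out_at: float = 70.0, scale_in_at: float = 30.0) -> int:
    if cpu_percent > scale_out_at:
        return min(vm_count + 1, MAX_VMS)   # add a VM, capped at the maximum
    if cpu_percent < scale_in_at:
        return max(vm_count - 1, MIN_VMS)   # remove a VM, floored at the minimum
    return vm_count                          # comfortable band: no change

# Load climbs past the threshold, so the group grows from two VMs to three.
print(autoscale_step(2, 85.0))  # → 3
```

Evaluating this rule on a schedule (say, once per minute) reproduces the elastic behavior the figure depicts.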
Load Balancing
Load balancing is a technique used to distribute incoming network traffic or requests across multiple servers or computing resources. The primary purpose of load balancing is to ensure that no single server or resource is overwhelmed by traffic, thereby improving the availability, fault tolerance, and performance of applications. AWS Elastic Load Balancing (ELB) distributes incoming application traffic across multiple targets, increasing availability and fault tolerance.
Load balancing starts with traffic distribution: load balancers spread incoming requests or network traffic evenly across a pool of resources, ensuring efficient resource utilization. Load balancers also monitor the health and status of backend services and reroute traffic away from any server that becomes nonresponsive. When traffic is additionally distributed across multiple regions, this health-aware routing further improves application resilience, availability, and performance.
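The two behaviors just described, even distribution plus health-aware rerouting, can be sketched together. This is an illustrative toy (not any cloud provider's API): round-robin routing over a pool, skipping any backend that has failed its health probe. The server names are hypothetical.

```python
# Toy load balancer: round-robin distribution with health checks.
from itertools import cycle

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.healthy = {s: True for s in servers}
        self._ring = cycle(servers)

    def mark_unhealthy(self, server):
        self.healthy[server] = False  # simulates a failed health probe

    def route(self):
        # Round-robin over the pool, skipping servers that failed their probe.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if self.healthy[server]:
                return server
        raise RuntimeError("no healthy backends available")

lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark_unhealthy("app-2")  # app-2 stops answering health checks
print([lb.route() for _ in range(4)])  # traffic now flows only to app-1 and app-3
```

A real load balancer layers much more on top (connection draining, weighted targets, probe intervals), but the core availability mechanism is exactly this: unhealthy targets drop out of rotation automatically.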
All major cloud providers offer load-balancing tools. Following is a sample of those available.
Google Cloud Load Balancing: Google Cloud Load Balancing distributes incoming traffic across multiple instances or backend services to ensure high availability and reliability of your applications. It offers several types of load balancers:
HTTP(S) load balancing: For distributing HTTP and HTTPS traffic across multiple backend instances or services
TCP proxy load balancing: For distributing TCP traffic to backend instances
SSL proxy load balancing: For distributing SSL/TLS traffic to backend instances
Internal TCP/UDP load balancing: For distributing internal TCP and UDP traffic within your virtual private cloud (VPC) network
Azure Load Balancer: Azure Load Balancer distributes incoming network traffic across multiple VM instances in a backend pool. It supports both inbound and outbound scenarios and can be configured for various protocols including TCP, UDP, and HTTP/S.
Azure Application Gateway: Azure Application Gateway is a layer 7 load balancer that provides application-level routing and load-balancing services. It offers features such as SSL termination, cookie-based session affinity, URL-based routing, and web application firewall (WAF) capabilities.
Azure Traffic Manager: Azure Traffic Manager is a domain name system (DNS)–based traffic load balancer that distributes incoming traffic across multiple endpoints located in different Azure regions or globally. It provides various load-balancing methods including priority, weighted, performance, and geographic routing.
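Traffic Manager's weighted routing method can be illustrated with a short simulation. This is a hedged sketch of the concept, not Azure code: each DNS query is answered with a single endpoint, chosen in proportion to its configured weight. The endpoint hostnames below are made up for the example.

```python
# Conceptual sketch of DNS-based weighted traffic routing
# (illustrative only; not the Azure Traffic Manager implementation).
import random

def pick_endpoint(weighted_endpoints, rng=random):
    """weighted_endpoints: list of (endpoint, weight) pairs."""
    endpoints, weights = zip(*weighted_endpoints)
    return rng.choices(endpoints, weights=weights, k=1)[0]

# Hypothetical endpoints: send ~75% of traffic to East US, ~25% to West Europe.
endpoints = [("eastus.contoso.example", 3), ("westeurope.contoso.example", 1)]
answers = [pick_endpoint(endpoints) for _ in range(1000)]
print(answers.count("eastus.contoso.example"))  # roughly 750 of 1,000 queries
```

Priority routing would instead always return the highest-priority healthy endpoint, and performance routing would pick the endpoint with the lowest measured latency for the querying client.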
Table 7.1 outlines Azure’s load-balancing methods and features.
Table 7.1 Azure’s Load-Balancing Methods and Features
| | Azure Traffic Manager | Azure Application Gateway | Azure Front Door | Azure Load Balancer |
|---|---|---|---|---|
| OSI layer | 7 | 7 | 7 | 4 |
| Health probes | HTTP/HTTPS/TCP | HTTP/HTTPS | HTTP/HTTPS | TCP/HTTP |
| SKUs | — | Basic/standard | — | Basic/standard |
| Load balancing | Global | Regional | Global | Global |
| Works at | VMs | Any IP address | DNS CNAME | — |
| Protocols | DNS | HTTP/HTTPS/HTTP2/WS | HTTP/HTTPS/HTTP2 | TCP and UDP |
| Sticky sessions | Supported | Supported | Supported | Supported |
| Traffic control | — | Network Security Group | — | Network Security Group |
| WAF | — | WAF | WAF | — |
All of these load-balancing tools and features help distribute incoming traffic across multiple backend instances or services to ensure high availability, scalability, and performance of your applications. With sticky sessions, a load balancer assigns an identifying attribute to a user, either by issuing a cookie or by tracking the user's IP details. Based on that tracking ID, the load balancer then routes all of the user's requests to the same server for the duration of the session. This creates a seamless and stable experience for users, with response latency comparable to what they would see if every request were served by the same host behind the load balancer.
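The IP-tracking variant of session affinity described above can be sketched with a stable hash. This is a toy illustration under assumed names (not a production affinity scheme): hashing the client's IP address always maps a given user to the same backend, so no cookie is required.

```python
# Toy sketch of IP-based sticky sessions: a stable hash of the client
# identity picks the same backend server on every request.
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

def sticky_route(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Repeated requests from the same client always land on the same backend.
print(sticky_route("203.0.113.7") == sticky_route("203.0.113.7"))  # → True
```

Cookie-based affinity works the same way in spirit: the identifying attribute (the cookie value rather than the IP) deterministically selects one backend for the life of the session.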
Cloud auto-scaling and load balancing are fundamental techniques for ensuring that your applications can efficiently handle varying workloads, maintain high availability, and optimize resource utilization in the cloud. Auto-scaling adapts to changing demand by adjusting the number of resources, while load balancing evenly distributes traffic to prevent overload and improve fault tolerance. Together, these technologies help create robust and responsive cloud-based applications.
