Load Balancers

Every production system starts with one server. It works — until traffic grows, or the server crashes, or you need to deploy without downtime. A single server handling everything is a single point of failure: one machine going down takes the entire application down with it. The solution is multiple servers, and the component that coordinates them is the load balancer.

[Figure: Load Balancer, Round Robin & Health Checks. A client request reaches the load balancer, which distributes requests round robin across Server 1, Server 2, and Server 3. Server 2 is marked offline (✗).]

What a Load Balancer Does

A load balancer sits in front of your servers. Every incoming request hits the load balancer first. The load balancer decides which server gets that request and forwards it along. The server processes it and responds — either directly to the client or back through the load balancer.

From the client's perspective, there's one address and one system. The load balancer makes multiple servers look like one. Behind it, you can add servers when traffic grows, remove them for maintenance, and route around failures — all transparently.

Load Balancing Algorithms

The load balancer needs a strategy for deciding where to send each request. Several algorithms are in common use:

Round Robin is the simplest. Requests cycle through servers in sequence: request 1 goes to Server A, request 2 to Server B, request 3 to Server C, request 4 back to Server A. Every server gets an equal share. Works well when all servers are equivalent and requests are roughly the same cost.
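As a rough illustration of the cycling described above, here is a minimal Python sketch. The class name RoundRobinBalancer and its pick method are made up for this example and do not correspond to any particular load balancer's API.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical helper: hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: A, B, C, A, B, C, ...
        self._order = cycle(list(servers))

    def pick(self):
        """Return the server that should receive the next request."""
        return next(self._order)


balancer = RoundRobinBalancer(["Server A", "Server B", "Server C"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # ['Server A', 'Server B', 'Server C', 'Server A']
```

The fourth request wraps back to Server A, matching the sequence in the text.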

Least Connections sends each new request to whichever server currently has the fewest active connections. Better than round robin when requests vary significantly in how long they take — a long-running request won't keep a server occupied while others sit idle.
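A minimal sketch of the same idea in Python, assuming the balancer itself counts connections as they open and close. The names LeastConnectionsBalancer, acquire, and release are illustrative only.

```python
class LeastConnectionsBalancer:
    """Hypothetical helper: tracks active connections per server."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}

    def acquire(self):
        """Pick the server with the fewest active connections."""
        # min() returns the first lowest entry, so ties go to the
        # server listed earliest.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request on `server` finishes."""
        self.active[server] -= 1


lc = LeastConnectionsBalancer(["Server A", "Server B", "Server C"])
first = lc.acquire()    # Server A (all tied at 0)
second = lc.acquire()   # Server B
third = lc.acquire()    # Server C
lc.release(second)      # Server B finishes its request early
fourth = lc.acquire()   # Server B again: it now has the fewest connections
print(first, second, third, fourth)
```

Under round robin the fourth request would have gone to Server A, even though Server A is still busy with the first one.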

IP Hashing routes each client to the same server every time, based on a hash of their IP address. This gives you sticky sessions, which are useful when server-side state is tied to a specific machine. The downsides: traffic isn't evenly distributed if clients have very different request volumes or if many clients share one IP address behind a NAT or proxy, and the mapping only holds while the set of servers stays the same.
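One way to sketch the hashing step in Python is shown below. The function name pick_server and the choice of SHA-256 are assumptions made for the example; real load balancers use their own hash functions. A stable hash from hashlib is used because Python's built-in hash() of a string changes between processes.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server, deterministically (illustrative)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and wrap it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["Server A", "Server B", "Server C"]
chosen = pick_server("203.0.113.7", servers)
# The same IP always lands on the same server while the list is unchanged.
assert chosen == pick_server("203.0.113.7", servers)
print(chosen)
```

Because the index depends on len(servers), adding or removing a server can remap many clients at once, which is one reason simple modulo hashing gives only weak stickiness.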

Weighted Round Robin is round robin with multipliers. If Server A has twice the capacity of Server B, it can be assigned weight 2 and receive twice the traffic. Useful for gradually rolling out new hardware or during blue/green deployments.
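The weighting can be sketched by simply repeating each server in the rotation according to its weight. This is a deliberately naive version: the function name weighted_round_robin is made up for the example, weights are assumed to be positive integers, and production balancers often interleave the picks more smoothly instead of sending bursts to one server.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """Return an endless iterator of servers, repeated by weight.

    `weights` maps server name -> positive integer weight (assumed).
    """
    schedule = [
        server
        for server, weight in weights.items()
        for _ in range(weight)
    ]
    return cycle(schedule)


picker = weighted_round_robin({"Server A": 2, "Server B": 1})
picks = [next(picker) for _ in range(6)]
print(picks)
# ['Server A', 'Server A', 'Server B', 'Server A', 'Server A', 'Server B']
```

Over six requests, Server A receives four and Server B receives two, matching the 2:1 weights.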

Layer 4 vs Layer 7

Load balancers operate at different levels of the network stack, and the level determines how smart they can be.

Layer 4 load balancing routes traffic based on IP addresses and TCP/UDP ports. It doesn't inspect the content of the request — it just sees packets and forwards them. This is fast and low-overhead, but dumb: it can't make routing decisions based on what's inside the request.

Layer 7 load balancing operates at the application layer and understands HTTP. It can inspect the full request — the URL, headers, cookies — and make routing decisions based on content. This enables powerful patterns:

Route /api/* to your application servers and /images/* to a CDN or image-serving cluster
Route requests with a specific X-Beta-User header to a canary deployment
Terminate TLS (SSL) at the load balancer and forward plain HTTP to internal servers
Route traffic based on geographic location or language preference

Layer 7 is slower than Layer 4 due to the additional processing, but for most web applications the added latency is small compared with the time the application itself spends handling the request. Most web-facing production systems use Layer 7 load balancers. AWS ALB (Application Load Balancer) operates at Layer 7, and Nginx is most often deployed as a Layer 7 proxy.
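To make content-based routing concrete, here is a small Python sketch of the kind of decision a Layer 7 balancer makes for the first two patterns listed above. The function choose_pool and the pool names ("canary", "image-servers", "app-servers", "default") are hypothetical; real products express these rules in their own configuration languages.

```python
def choose_pool(path, headers):
    """Pick a backend pool from the request path and headers (illustrative)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    if "x-beta-user" in normalized:
        return "canary"
    if path.startswith("/images/"):
        return "image-servers"
    if path.startswith("/api/"):
        return "app-servers"
    return "default"


print(choose_pool("/api/orders", {}))                     # app-servers
print(choose_pool("/images/logo.png", {}))                # image-servers
print(choose_pool("/api/orders", {"X-Beta-User": "1"}))   # canary
```

A Layer 4 balancer could not apply any of these rules, because it never parses the path or headers.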

Health Checks

A load balancer is only useful if it knows which servers are actually capable of handling requests. Health checks are how it finds out.

The load balancer periodically sends a small probe to each server — typically an HTTP request to a /health endpoint. If the server responds with a 200, it's healthy. If it fails to respond, or responds with an error, the load balancer marks it as unhealthy and stops sending traffic to it. Requests that would have gone to that server are routed to the healthy ones instead.

This happens automatically. A server can crash, run out of memory, or hang on a deadlocked query, and the load balancer detects the failure after a few missed checks (typically within seconds, depending on the check interval) and routes around it. No manual intervention required.

When the unhealthy server recovers and starts passing health checks again, the load balancer automatically reintroduces it. This is the foundation of graceful failover in production systems.
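The check, eject, and reintroduce cycle can be sketched in Python as follows. Everything here is illustrative: the class HealthCheckedPool, the unhealthy_after threshold of two consecutive failures, and the fake_probe function are assumptions for the example. The probe is injected so the sketch runs without a network; in practice it would send an HTTP request to each server's /health endpoint.

```python
class HealthCheckedPool:
    """Hypothetical pool that tracks which servers pass health checks."""

    def __init__(self, servers, probe, unhealthy_after=2):
        self.servers = list(servers)
        self.probe = probe                      # callable: server -> status code
        self.unhealthy_after = unhealthy_after  # assumed threshold
        self.failures = {server: 0 for server in self.servers}

    def run_checks(self):
        """Probe every server once and update consecutive failure counts."""
        for server in self.servers:
            try:
                ok = self.probe(server) == 200
            except Exception:
                ok = False  # no response counts as a failed check
            self.failures[server] = 0 if ok else self.failures[server] + 1

    def healthy(self):
        """Servers that are still eligible to receive traffic."""
        return [s for s in self.servers
                if self.failures[s] < self.unhealthy_after]


# Simulated /health responses, keyed by server name.
status = {"Server 1": 200, "Server 2": 200, "Server 3": 200}


def fake_probe(server):
    return status[server]


pool = HealthCheckedPool(status, fake_probe)
pool.run_checks()
all_up = pool.healthy()           # all three servers

status["Server 2"] = 503          # Server 2 starts failing
pool.run_checks()
pool.run_checks()
after_failure = pool.healthy()    # Server 2 is removed from rotation

status["Server 2"] = 200          # Server 2 recovers
pool.run_checks()
after_recovery = pool.healthy()   # Server 2 is back
print(all_up, after_failure, after_recovery)
```

Requiring more than one failed check before ejecting a server is a common way to avoid reacting to a single dropped probe, at the cost of slightly slower detection.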

Real-World Tools

AWS ALB (Application Load Balancer) is one of the most widely used managed load balancers in cloud environments. It operates at Layer 7, integrates deeply with other AWS services, supports routing rules, WebSocket, and HTTP/2, and scales automatically.

GCP Cloud Load Balancing is Google's equivalent. Its global load balancers can route each request to the nearest healthy backend across regions.

Nginx is an open-source web server that doubles as an excellent Layer 7 load balancer and reverse proxy. Highly configurable, runs on your own infrastructure, and is the backbone of many self-hosted production stacks.

HAProxy is a high-performance open-source load balancer focused on reliability and raw throughput. Used by some of the highest-traffic sites on the internet. Operates at both Layer 4 and Layer 7.

The Load Balancer as a Single Point of Failure

Here's the irony: a load balancer that protects your servers from single points of failure is itself a single point of failure. If your load balancer goes down, everything behind it becomes unreachable.

The solution is redundant load balancers. You run two load balancers: an active and a standby. The standby monitors the active one, often through periodic heartbeat messages. If the active load balancer fails, the standby takes over, typically within seconds when a shared virtual IP is moved using a protocol like VRRP (Virtual Router Redundancy Protocol). DNS failover is another option, though it tends to be slower because clients and resolvers cache DNS records. Managed cloud load balancers (AWS ALB, GCP Cloud Load Balancing) handle this redundancy for you automatically, which is one of the main reasons to use a managed service rather than running your own.
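The takeover logic can be sketched as a heartbeat timeout. This Python example is not an implementation of VRRP; it only illustrates the general idea of a standby promoting itself when the active node goes quiet. The class StandbyBalancer, the 3-second timeout, and the explicit timestamps are all assumptions for the sketch.

```python
class StandbyBalancer:
    """Hypothetical standby node that promotes itself on missed heartbeats."""

    def __init__(self, timeout_seconds=3.0):
        self.timeout_seconds = timeout_seconds  # assumed value
        self.last_heartbeat = None
        self.role = "standby"

    def on_heartbeat(self, now):
        """Record that the active node was alive at time `now`."""
        self.last_heartbeat = now

    def tick(self, now):
        """Check the timeout; promote to active if the peer went silent."""
        silent_too_long = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_seconds
        )
        if self.role == "standby" and silent_too_long:
            self.role = "active"  # here a real system would claim the virtual IP
        return self.role


standby = StandbyBalancer(timeout_seconds=3.0)
standby.on_heartbeat(now=0.0)
role_at_2s = standby.tick(now=2.0)   # still "standby": heartbeat is recent
role_at_4s = standby.tick(now=4.0)   # "active": no heartbeat for over 3 seconds
print(role_at_2s, role_at_4s)
```

Timestamps are passed in explicitly so the behavior is easy to follow; a running system would use a monotonic clock and real network messages.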

At very high scale, you might use anycast routing — the same IP address is announced from multiple geographic locations, and traffic routes to the nearest one. If one region's load balancer fails, traffic automatically flows to the next nearest.

Key Takeaways

A load balancer sits in front of your servers and distributes incoming requests across them — making multiple servers look like one to the client
Round Robin cycles through servers evenly; Least Connections routes to the least busy server; IP Hashing creates sticky sessions; Weighted Round Robin handles unequal server capacity
Layer 4 load balancers route on IP addresses and TCP/UDP ports (fast but content-blind); Layer 7 load balancers understand HTTP (slower, but able to route on URL, headers, and cookies)
Health checks automatically detect failed servers and route around them within seconds — no manual intervention needed
The load balancer itself can be a single point of failure; the solution is redundant load balancers or using a managed cloud service that handles redundancy automatically
Common tools: AWS ALB and GCP Cloud Load Balancing for managed cloud, Nginx and HAProxy for self-hosted

What's Next

Load balancers distribute traffic across your servers. But as systems grow, not all communication is synchronous request-response. In the next lesson we'll look at message queues — how to decouple services, absorb traffic spikes, and build systems where components don't need to be online at the same time.