Message Queues

When Service A needs to talk to Service B, the simplest option is a direct HTTP call. It works — until Service B is slow, overloaded, or temporarily down. Now Service A is stuck waiting, and the failure cascades.

Message queues break that dependency. Instead of calling Service B directly, Service A drops a message into a queue and moves on. Service B picks it up when it's ready.

Figure: Message Queue, Normal Flow. Service A (Producer) → Queue → Service B (Consumer). Messages that fail processing are moved to a Dead Letter Queue, where they can be retried.

The producer (Service A) never waits for the consumer (Service B). They're decoupled in time — one can be restarted, scaled, or swapped without the other caring.

Producer and Consumer

The two actors in any queue system:

Producer — sends messages into the queue. Doesn't know (or care) who consumes them.
Consumer — reads messages from the queue and processes them at its own pace.

A queue between them acts as a buffer. The producer can keep publishing even if the consumer is temporarily unavailable. Messages wait in the queue until they're picked up.
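A minimal sketch of that buffering, using Python's standard-library `queue.Queue` as an in-process stand-in for a real broker. The function names and the `order-N` payloads are hypothetical. The producer enqueues everything without waiting, and a slower consumer thread works through the backlog at its own pace.

```python
import queue
import threading
import time

# In-process stand-in for a broker: a thread-safe FIFO buffer.
messages: "queue.Queue[str | None]" = queue.Queue()
processed: list[str] = []

def producer(count: int) -> None:
    """Publish messages without waiting for anyone to consume them."""
    for i in range(count):
        messages.put(f"order-{i}")
    messages.put(None)  # sentinel: no more messages

def consumer() -> None:
    """Read messages at its own (slower) pace until the sentinel arrives."""
    while True:
        msg = messages.get()
        if msg is None:
            break
        time.sleep(0.001)  # simulate slow processing
        processed.append(msg)

worker = threading.Thread(target=consumer)
worker.start()
producer(100)  # returns once everything is enqueued, not once it is processed
worker.join()
print(len(processed))  # 100
```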

Why This Matters

Traffic spikes. If 10,000 requests arrive in one second and your database can handle 500/s, a direct architecture collapses. With a queue, those 10,000 messages pile up and are drained steadily at 500/s, clearing the backlog in about 20 seconds: no dropped requests and no cascading failures, at the cost of some added delay.
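The arithmetic behind the spike example: 10,000 queued messages drained at 500/s take 10,000 / 500 = 20 seconds. Below is a toy simulation of that, assuming a single consumer with a fixed per-second capacity and no further traffic after the burst. `simulate_backlog` is a hypothetical helper, not part of any queue library.

```python
def simulate_backlog(arrivals: list[int], capacity_per_s: int) -> list[int]:
    """Return the queue depth at the end of each second until the queue is empty.

    arrivals[t] is the number of messages arriving during second t; the
    consumer processes at most capacity_per_s messages per second.
    """
    depth = 0
    history = []
    t = 0
    while t < len(arrivals) or depth > 0:
        depth += arrivals[t] if t < len(arrivals) else 0
        depth -= min(depth, capacity_per_s)  # consumer drains what it can
        history.append(depth)
        t += 1
    return history

history = simulate_backlog([10_000], capacity_per_s=500)
print(len(history))  # 20 seconds to drain
print(history[0])    # 9500 messages still waiting after the first second
```

Nothing is dropped; the spike is converted into latency for the messages at the back of the queue.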

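Resilience. If the consumer crashes, messages stay in the queue. When it restarts, it picks up where it left off, and any message it had received but not yet acknowledged is delivered again. As long as the queue itself is durable, nothing is lost.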
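One way to picture the redelivery behavior is a toy broker that only forgets a message once the consumer acknowledges it. This is a sketch under simplifying assumptions: the `AckQueue` class and its methods are hypothetical, and real brokers also handle timeouts, multiple consumers, and persistence to disk.

```python
from __future__ import annotations

from collections import deque

class AckQueue:
    """Toy broker: a message is removed only when the consumer acknowledges it.

    Delivered-but-unacknowledged messages are held "in flight" and put back at
    the front of the queue if the consumer disconnects (e.g. crashes).
    """

    def __init__(self) -> None:
        self.ready: deque[str] = deque()
        self.in_flight: dict[int, str] = {}  # delivery tag -> message body
        self._next_tag = 0

    def publish(self, body: str) -> None:
        self.ready.append(body)

    def deliver(self) -> tuple[int, str] | None:
        if not self.ready:
            return None
        body = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self.in_flight[tag] = body
        return tag, body

    def ack(self, tag: int) -> None:
        del self.in_flight[tag]  # processing finished: safe to forget the message

    def consumer_disconnected(self) -> None:
        # Requeue everything delivered but never acknowledged, in original order.
        for tag in sorted(self.in_flight, reverse=True):
            self.ready.appendleft(self.in_flight.pop(tag))

broker = AckQueue()
for body in ["m1", "m2", "m3"]:
    broker.publish(body)

tag, body = broker.deliver()   # consumer receives m1 ...
broker.ack(tag)                # ... and acknowledges it
broker.deliver()               # consumer receives m2, then crashes before acking
broker.consumer_disconnected()
print(list(broker.ready))      # ['m2', 'm3']: m2 comes back, m1 is gone
```

The same mechanism is why duplicates happen: if m2 had actually been processed before the crash, it would still be delivered again.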

Decoupling. Producers and consumers can be deployed, scaled, and updated independently. Add more consumer instances to drain a backlog without touching the producer at all.
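Adding consumer instances to drain a backlog is often called the competing-consumers pattern. A minimal sketch, again with `queue.Queue` standing in for the broker and threads standing in for separate consumer instances (the worker names are hypothetical):

```python
import queue
import threading
from collections import Counter

backlog: "queue.Queue[int]" = queue.Queue()
for job_id in range(1_000):
    backlog.put(job_id)  # a backlog the producer built up earlier

handled: list[tuple[int, str]] = []  # (job_id, worker name)
lock = threading.Lock()

def worker(name: str) -> None:
    """One consumer instance; all instances compete for messages on the same queue."""
    while True:
        try:
            job_id = backlog.get_nowait()  # each message goes to exactly one worker
        except queue.Empty:
            return  # backlog drained
        with lock:
            handled.append((job_id, name))

# Scaling out means starting more consumers; the producer is not touched.
workers = [threading.Thread(target=worker, args=(f"worker-{n}",)) for n in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(len(handled))                            # 1000
print(len({job_id for job_id, _ in handled}))  # 1000: no job processed twice
print(Counter(name for _, name in handled))    # the split across workers varies per run
```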

Async workflows. Some work doesn't need an immediate response. A user uploads a video — you don't need to transcode it before saying "upload received." Queue the transcoding job, respond immediately, process in the background.
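The "respond immediately, process in the background" shape can be sketched as a request handler that only enqueues a job and a background worker that does the slow part. `handle_upload` and the transcoding step are hypothetical stand-ins; a real service would use a broker rather than an in-process queue.

```python
import queue
import threading
import uuid

jobs: "queue.Queue[dict]" = queue.Queue()
results: dict[str, str] = {}

def handle_upload(video_name: str) -> dict:
    """Hypothetical request handler: enqueue the slow work, answer right away."""
    job_id = str(uuid.uuid4())
    jobs.put({"job_id": job_id, "video": video_name})
    return {"status": "upload received", "job_id": job_id}  # no transcoding yet

def transcode_worker() -> None:
    """Background consumer that does the slow part later."""
    while True:
        job = jobs.get()
        results[job["job_id"]] = f"{job['video']} transcoded"  # stand-in for real work
        jobs.task_done()

threading.Thread(target=transcode_worker, daemon=True).start()

response = handle_upload("cat.mp4")
print(response["status"])           # upload received (the handler did not wait)
jobs.join()                         # demo only: wait for the background work to finish
print(results[response["job_id"]])  # cat.mp4 transcoded
```

Returning a job ID lets the client check on the result later, since the response itself cannot contain it.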

Real Example: Video Upload

A simplified view of what happens when you upload a video to YouTube:

Your browser sends the file to an upload service → 200 OK returned immediately
An "uploaded" event is published with the video ID
Each downstream consumer receives its own copy of the event and works in parallel:
- Transcoder converts the video to multiple resolutions
- Thumbnail generator picks a frame
- Content moderation scans for policy violations
- Metadata indexer updates search
Each step is independent — one failure doesn't block the others
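For every consumer to see the same "uploaded" event, the system needs fan-out (publish/subscribe) rather than a single shared queue, where each message would go to only one consumer. A toy sketch, assuming each subscriber gets its own queue; `Topic`, `drain`, and the service names are hypothetical:

```python
from collections import deque
from typing import Callable

class Topic:
    """Toy pub/sub topic: every subscriber has its own queue and gets its own copy."""

    def __init__(self) -> None:
        self.subscriptions: dict[str, deque] = {}

    def subscribe(self, name: str) -> None:
        self.subscriptions[name] = deque()

    def publish(self, event: dict) -> None:
        for q in self.subscriptions.values():
            q.append(dict(event))  # independent copy per subscriber

def drain(topic: Topic, name: str, handler: Callable[[dict], None]) -> list[str]:
    """Process one subscriber's queue; a failure here never touches other subscribers."""
    errors = []
    q = topic.subscriptions[name]
    while q:
        event = q.popleft()
        try:
            handler(event)
        except Exception as exc:  # a real system would retry or dead-letter the event
            errors.append(f"{name}: {exc}")
    return errors

uploads = Topic()
for service in ["transcoder", "thumbnails", "moderation", "indexer"]:
    uploads.subscribe(service)
uploads.publish({"type": "uploaded", "video_id": "v123"})

completed: list[str] = []

def record(event: dict) -> None:
    completed.append(event["video_id"])

def thumbnails_down(event: dict) -> None:
    raise RuntimeError("thumbnail service down")

errors = drain(uploads, "thumbnails", thumbnails_down)
for service in ["transcoder", "moderation", "indexer"]:
    errors += drain(uploads, service, record)

print(completed)  # ['v123', 'v123', 'v123']
print(errors)     # ['thumbnails: thumbnail service down']
```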

Without queues, all of this would have to happen synchronously before you got a response. With queues, you get a confirmation in milliseconds and the work happens in the background.

Dead Letter Queue (DLQ)

What happens when a message can't be processed? Maybe the payload is malformed, or a downstream dependency is broken.

If you just retry forever, you create an infinite loop that blocks other messages. Instead, after a configurable number of retry attempts, failed messages are moved to a Dead Letter Queue — a separate queue for messages that couldn't be handled.
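The retry-then-park logic can be sketched in a few lines. This is a toy model, not any broker's API: `MAX_ATTEMPTS`, `process`, and `run` are hypothetical, and real systems (for example, SQS redrive policies or RabbitMQ dead-letter exchanges) configure this on the broker rather than in application code.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit

def process(message: dict) -> None:
    """Stand-in handler: rejects payloads without an 'amount' field."""
    if "amount" not in message["body"]:
        raise ValueError("malformed payload")

def run(main: deque, dlq: list) -> list[dict]:
    done = []
    while main:
        message = main.popleft()
        try:
            process(message)
            done.append(message)
        except ValueError as exc:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                message["last_error"] = str(exc)
                dlq.append(message)   # park it instead of retrying forever
            else:
                main.append(message)  # retry later, behind the other messages
    return done

main = deque([
    {"id": 1, "body": {"amount": 10}, "attempts": 0},
    {"id": 2, "body": {}, "attempts": 0},  # poison message
    {"id": 3, "body": {"amount": 5}, "attempts": 0},
])
dlq: list[dict] = []
done = run(main, dlq)
print([m["id"] for m in done])                  # [1, 3]
print([(m["id"], m["attempts"]) for m in dlq])  # [(2, 3)]
```

Because the failing message goes to the back of the queue on each retry, message 3 is processed without waiting for message 2 to give up.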

From the DLQ you can:

Inspect failed messages to understand what went wrong
Fix the underlying issue and replay the messages
Alert on-call when failure rates exceed a threshold
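The three DLQ activities listed above can be sketched against an in-memory list of parked messages. Everything here is hypothetical: the message shapes, the "fixed" handler, and the 5% `FAILURE_RATE_THRESHOLD`.

```python
from collections import Counter

FAILURE_RATE_THRESHOLD = 0.05  # hypothetical: alert above 5% failed messages

# Messages parked in the DLQ, with the error recorded on their last attempt.
dead_letters = [
    {"id": 7, "body": {"currency": "usd"}, "last_error": "unknown currency"},
    {"id": 9, "body": {"currency": "usd"}, "last_error": "unknown currency"},
]

# Inspect: group failures by cause to see what went wrong.
causes = Counter(m["last_error"] for m in dead_letters)
print(causes)  # Counter({'unknown currency': 2})

# Fix: the (hypothetical) handler now normalizes lowercase currency codes.
def fixed_handler(body: dict) -> str:
    return body["currency"].upper()

# Replay: move parked messages back to the main queue and process them again.
main_queue: list[dict] = []
main_queue.extend(dead_letters)
dead_letters.clear()
results = [fixed_handler(m["body"]) for m in main_queue]
print(results)  # ['USD', 'USD']

# Alert: page on-call when the failure rate crosses the threshold.
def should_alert(failed: int, total: int) -> bool:
    return total > 0 and failed / total > FAILURE_RATE_THRESHOLD

print(should_alert(failed=2, total=1000))   # False (0.2%)
print(should_alert(failed=80, total=1000))  # True (8%)
```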

DLQs turn silent data loss into observable, recoverable failures.

Kafka vs RabbitMQ

Two of the most widely used systems, with different design philosophies:

	Kafka	RabbitMQ
Model	Log-based (consumers track offset)	Push-based (broker routes to consumer)
Message retention	Days/weeks (configurable)	Until acknowledged
Throughput	Very high (millions/s)	High (tens of thousands/s)
Ordering	Per-partition	Per-queue
Replay	Yes: rewind the log	Classic queues: no, acknowledged messages are deleted (RabbitMQ Streams, added in 3.9, support replay)
Best for	Event streaming, audit logs, analytics	Task queues, RPC, routing

Kafka stores messages as an immutable log. Multiple consumer groups can read the same messages independently. Great for event sourcing, stream processing, and any case where replay matters.
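The log-plus-offsets model can be illustrated with a toy single-partition log. This is not the Kafka client API. `Log`, `poll`, and `seek` are hypothetical names, and real Kafka also spreads partitions across the members of each consumer group. Reading never deletes anything; each group just moves its own offset, and replay is simply moving that offset backwards.

```python
class Log:
    """Toy append-only log (one partition). Reading never deletes anything."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self.offsets: dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, event: str) -> int:
        self.entries.append(event)
        return len(self.entries) - 1  # offset of the new entry

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        start = self.offsets.get(group, 0)
        batch = self.entries[start:start + max_records]
        self.offsets[group] = start + len(batch)  # "commit" the new position
        return batch

    def seek(self, group: str, offset: int) -> None:
        self.offsets[group] = offset  # rewind (or skip ahead) for replay

log = Log()
for event in ["signup:alice", "signup:bob", "purchase:alice"]:
    log.append(event)

print(log.poll("analytics"))  # all three events
print(log.poll("billing"))    # the same three events, read independently
print(log.poll("analytics"))  # []: analytics is caught up
log.seek("analytics", 0)
print(log.poll("analytics", max_records=1))  # ['signup:alice'], replayed
```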

RabbitMQ is a traditional message broker. It routes messages to queues based on rules (exchanges and bindings), pushes to consumers, and deletes messages after acknowledgement. Simpler to operate for task queue use cases.
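Exchanges and bindings are the part that differs most from Kafka. Below is a toy version of the simplest case, a direct exchange that routes on an exact routing-key match. The class, queue names, and routing keys are hypothetical and do not use the RabbitMQ client library; RabbitMQ also offers fanout, topic, and headers exchanges with other matching rules.

```python
from collections import defaultdict, deque

class DirectExchange:
    """Toy direct exchange: route each message to every queue bound with its exact key."""

    def __init__(self) -> None:
        self.bindings: dict[str, list[str]] = defaultdict(list)  # routing key -> queue names
        self.queues: dict[str, deque] = {}

    def bind(self, queue_name: str, routing_key: str) -> None:
        self.queues.setdefault(queue_name, deque())
        self.bindings[routing_key].append(queue_name)

    def publish(self, routing_key: str, body: str) -> None:
        for name in self.bindings.get(routing_key, []):  # no binding: dropped in this toy
            self.queues[name].append(body)

exchange = DirectExchange()
exchange.bind("emails", "user.signup")
exchange.bind("audit", "user.signup")
exchange.bind("invoices", "order.paid")

exchange.publish("user.signup", "alice")
exchange.publish("order.paid", "order-42")
exchange.publish("unknown.key", "ignored")

print({name: list(q) for name, q in exchange.queues.items()})
# {'emails': ['alice'], 'audit': ['alice'], 'invoices': ['order-42']}
```

The producer only chooses a routing key; which queues (and therefore which consumers) receive the message is decided by the bindings on the broker.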

Tools at a Glance

Tool	Notes
Apache Kafka	High-throughput distributed log. Industry standard for event streaming.
RabbitMQ	Mature broker with flexible routing. Great for task queues and RPC patterns.
AWS SQS	Managed queue with Standard (at-least-once, best-effort ordering) and FIFO (strict ordering, exactly-once processing within a 5-minute deduplication window) modes. Zero ops.
GCP Pub/Sub	Google's managed pub/sub. Scales automatically, global delivery.
Redis Streams	Lightweight streams built into Redis. Good when you already have Redis.

Tradeoffs

Complexity. You've added an infrastructure component. It needs to be deployed, monitored, and kept healthy.

Eventual consistency. The consumer processes the message after the producer moves on. If your use case requires an immediate response with the result of processing, a queue isn't the right tool.

At-least-once delivery. Most queue systems guarantee a message will be delivered at least once, which means it may be delivered more than once: if a consumer finishes processing but fails before its acknowledgement reaches the broker, the message is redelivered. Your consumers need to be idempotent (processing the same message twice has the same effect as processing it once).
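A common way to get idempotency is to record the IDs of messages already applied and skip repeats. A minimal sketch, assuming every message carries a unique `id`; the `PaymentConsumer` class is hypothetical, and in production the seen-ID set would live in durable storage, updated in the same transaction as the side effect.

```python
class PaymentConsumer:
    """Idempotent consumer: remembers processed message IDs and skips repeats."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()  # stand-in for a durable dedup table
        self.balance = 0

    def handle(self, message: dict) -> None:
        if message["id"] in self.seen_ids:
            return  # duplicate delivery: already applied, do nothing
        self.balance += message["amount"]
        self.seen_ids.add(message["id"])

consumer = PaymentConsumer()
deposit = {"id": "msg-001", "amount": 50}
consumer.handle(deposit)
consumer.handle(deposit)  # redelivered after a lost acknowledgement
print(consumer.balance)   # 50, not 100
```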

Ordering. Strict global ordering across all messages is hard to guarantee at scale. Kafka gives ordering within a partition; RabbitMQ within a single queue. If order matters globally, design for it explicitly.
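The usual way to design for it is to pick a key (such as a user or order ID) and send every message with that key to the same partition, so order is preserved per key even though there is no global order. A sketch, assuming 4 partitions and using `zlib.crc32` as a stable hash; Kafka's default partitioner uses murmur2 instead, and Python's built-in `hash()` is avoided because it is randomized per process for strings.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash, so a key always lands in the same place."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

partitions: list[list[str]] = [[] for _ in range(NUM_PARTITIONS)]
events = [
    ("alice", "created"), ("bob", "created"), ("alice", "paid"),
    ("bob", "cancelled"), ("alice", "shipped"),
]
for key, action in events:
    partitions[partition_for(key)].append(f"{key}:{action}")

# Alice's events all sit in one partition, in the order they were published.
alice_events = [e for e in partitions[partition_for("alice")] if e.startswith("alice:")]
print(alice_events)  # ['alice:created', 'alice:paid', 'alice:shipped']
```

Two different keys may share a partition, which is fine: ordering is only promised within a key, not across keys.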

Key Takeaways

Queues decouple producers and consumers in time — neither waits on the other
They absorb traffic spikes by buffering messages until consumers can keep up
Failed messages go to a Dead Letter Queue instead of being silently dropped
Kafka is built for high-throughput event streaming with replay; RabbitMQ for flexible routing and task queues
Consumers must be idempotent — at-least-once delivery is the norm

Next: CDNs — how static assets and dynamic content are served closer to the user to reduce latency globally.