Message Queues
When Service A needs to talk to Service B, the simplest option is a direct HTTP call. It works — until Service B is slow, overloaded, or temporarily down. Now Service A is stuck waiting, and the failure cascades.
Message queues break that dependency. Instead of calling Service B directly, Service A drops a message into a queue and moves on. Service B picks it up when it's ready.
Figure: Message Queue, Normal Flow. Service A (Producer) → Queue → Service B (Consumer); failed messages go to a Dead Letter Queue, where they can be retried.
The producer (Service A) never waits for the consumer (Service B). They're decoupled in time — one can be restarted, scaled, or swapped without the other caring.
Producer and Consumer
The two actors in any queue system:
- Producer — sends messages into the queue. Doesn't know (or care) who consumes them.
- Consumer — reads messages from the queue and processes them at its own pace.
A queue between them acts as a buffer. The producer can keep publishing even if the consumer is temporarily unavailable. Messages wait in the queue until they're picked up.
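Here is a minimal in-process sketch in Python that makes the buffering concrete. It uses the standard-library `queue.Queue` as a stand-in for a real broker, so there is no network and no persistence. The `producer` and `consumer` functions and the `order-N` messages are hypothetical. The producer publishes while no consumer is running, and the messages wait until one starts.

```python
import queue
import threading

# Stand-in for a real broker: an in-memory, thread-safe FIFO (assumption for illustration).
buffer: "queue.Queue[str]" = queue.Queue()

def producer(n: int) -> None:
    """Publish n messages without knowing who (if anyone) will consume them."""
    for i in range(n):
        buffer.put(f"order-{i}")

def consumer(results: list[str]) -> None:
    """Drain whatever is waiting, at the consumer's own pace."""
    while True:
        try:
            msg = buffer.get(timeout=0.5)
        except queue.Empty:
            return  # nothing left to read; stop for this demo
        results.append(msg)
        buffer.task_done()

producer(5)              # consumer is "down": messages pile up in the buffer
print(buffer.qsize())    # 5

processed: list[str] = []
worker = threading.Thread(target=consumer, args=(processed,))
worker.start()           # consumer comes back and catches up
worker.join()
print(processed)         # ['order-0', 'order-1', 'order-2', 'order-3', 'order-4']
```

A real broker adds what this toy lacks: durability across restarts, delivery over the network, and acknowledgements.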
Why This Matters
Traffic spikes. If 10,000 requests arrive in one second and your database can handle 500/s, a direct architecture collapses. With a queue, those 10,000 messages pile up and get drained steadily at 500/s, which is about 20 seconds of backlog. There are no dropped requests and no cascading failures, as long as the queue has room to hold the burst.
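The arithmetic behind that claim can be checked with a tiny simulation. The sketch makes three simplifying assumptions: arrivals are counted per second, the consumer drains at a fixed rate, and the queue has unlimited capacity. `simulate_backlog` is a hypothetical helper, not part of any queue library.

```python
def simulate_backlog(arrivals: list[int], capacity: int) -> list[int]:
    """Return the queue depth at the end of each second.

    arrivals[t] is how many messages arrive during second t; the consumer
    processes at most `capacity` messages per second.
    """
    depth = 0
    history = []
    for arrived in arrivals:
        depth += arrived                 # new messages join the backlog
        depth -= min(depth, capacity)    # consumer drains at its steady rate
        history.append(depth)
    return history

# 10,000 requests in the first second, then nothing; the database handles 500/s.
spike = [10_000] + [0] * 24
history = simulate_backlog(spike, capacity=500)
drain_seconds = history.index(0) + 1
print(history[0])      # 9500
print(drain_seconds)   # 20
```

The queue turns a spike the database can't absorb into a short, bounded delay.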
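Resilience. If the consumer crashes, unacknowledged messages stay in the queue, and when it restarts it picks up where it left off. Nothing is lost, provided the broker persists messages and the consumer acknowledges a message only after processing it successfully.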
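Whether a message "stays in the queue" depends on acknowledgements: the broker deletes a message only once the consumer confirms it was processed. The `ToyQueue` class below is a hypothetical model of that rule, not any real broker's API. Real systems implement the rule in different ways. RabbitMQ requeues unacknowledged messages when the consumer's channel closes, and SQS makes a received message visible again once its visibility timeout expires.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ToyQueue:
    """Hypothetical broker model: a message is deleted only once acknowledged."""
    ready: deque = field(default_factory=deque)      # waiting to be delivered
    in_flight: dict = field(default_factory=dict)    # delivered, not yet acked
    _next_tag: int = 0

    def publish(self, body: str) -> None:
        self.ready.append(body)

    def receive(self) -> tuple[int, str]:
        body = self.ready.popleft()
        self._next_tag += 1
        self.in_flight[self._next_tag] = body   # leased to the consumer, not deleted
        return self._next_tag, body

    def ack(self, tag: int) -> None:
        del self.in_flight[tag]                 # processing succeeded: delete for good

    def consumer_died(self) -> None:
        # Like a broker noticing a dropped consumer: unacked messages go back,
        # at the front of the queue and in their original order.
        unacked = [self.in_flight[t] for t in sorted(self.in_flight)]
        self.in_flight.clear()
        self.ready.extendleft(reversed(unacked))

q = ToyQueue()
q.publish("resize-image-42")

tag, body = q.receive()
q.consumer_died()              # crash before ack: the message is not lost
after_crash = list(q.ready)
print(after_crash)             # ['resize-image-42']

tag, body = q.receive()        # the restarted consumer receives it again
q.ack(tag)
print(len(q.ready), len(q.in_flight))  # 0 0
```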
Decoupling. Producers and consumers can be deployed, scaled, and updated independently. Add more consumer instances to drain a backlog without touching the producer at all.
Async workflows. Some work doesn't need an immediate response. A user uploads a video — you don't need to transcode it before saying "upload received." Queue the transcoding job, respond immediately, process in the background.
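Below is a sketch of the "respond immediately, process in the background" pattern. It uses Python's `asyncio.Queue` as an in-process stand-in for a real queue. `handle_upload`, `transcode`, and `worker` are hypothetical names, and `asyncio.sleep` stands in for the slow transcoding step.

```python
import asyncio

async def transcode(video_id: str) -> str:
    await asyncio.sleep(0.1)              # stand-in for slow background work
    return f"{video_id}: transcoded"

async def worker(jobs: asyncio.Queue, done: list[str]) -> None:
    """Background consumer: takes jobs off the queue one at a time."""
    while True:
        video_id = await jobs.get()
        done.append(await transcode(video_id))
        jobs.task_done()

async def handle_upload(jobs: asyncio.Queue, video_id: str) -> str:
    await jobs.put(video_id)              # enqueue the slow work...
    return "upload received"              # ...and answer right away

async def main() -> tuple[str, list[str]]:
    jobs: asyncio.Queue = asyncio.Queue()
    done: list[str] = []
    background = asyncio.create_task(worker(jobs, done))

    reply = await handle_upload(jobs, "video-123")
    print(reply, done)                    # upload received []  (nothing processed yet)
    await jobs.join()                     # demo only: wait for the background work
    background.cancel()
    return reply, done

reply, done = asyncio.run(main())
print(done)                               # ['video-123: transcoded']
```

In production, the handler and the worker usually live in different processes or services, with a durable broker between them instead of an in-memory queue.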
Real Example: Video Upload
Here's a simplified view of what happens when you upload a video to YouTube:
- Your browser sends the file to an upload service → 200 OK returned immediately
- An "uploaded" message is enqueued with the video ID
- Multiple downstream consumers each receive their own copy of that message (fan-out) and work on it in parallel:
- Transcoder converts the video to multiple resolutions
- Thumbnail generator picks a frame
- Content moderation scans for policy violations
- Metadata indexer updates search
- Each step is independent — one failure doesn't block the others
Without queues, all of this would have to happen synchronously before you got a response. With queues, you get a confirmation in milliseconds and the work happens in the background.
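The fan-out in this example means every consumer gets its own copy of the "uploaded" message, rather than competing for a single copy. Below is a toy model in which each subscriber owns a separate queue. `FanOutTopic`, `drain`, and the handler functions are hypothetical. Real systems produce the same shape in several ways: a fanout exchange in RabbitMQ, one consumer group per service in Kafka, or one subscription per service in Pub/Sub.

```python
from collections import deque
from typing import Callable

class FanOutTopic:
    """Toy pub/sub: every subscriber has its own queue, so each sees every message."""

    def __init__(self) -> None:
        self.subscriptions: dict[str, deque] = {}

    def subscribe(self, name: str) -> None:
        self.subscriptions[name] = deque()

    def publish(self, message: dict) -> None:
        for q in self.subscriptions.values():
            q.append(message)              # one copy per subscriber

def drain(topic: FanOutTopic, name: str, handler: Callable[[dict], str]) -> list[str]:
    """Process one subscriber's queue; a failure leaves its message queued."""
    q = topic.subscriptions[name]
    results = []
    while q:
        message = q[0]                     # peek; remove only after success
        try:
            results.append(handler(message))
        except Exception:
            break                          # kept for retry; other subscribers unaffected
        q.popleft()
    return results

topic = FanOutTopic()
for name in ["transcoder", "thumbnails", "moderation", "indexer"]:
    topic.subscribe(name)

topic.publish({"event": "uploaded", "video_id": "abc123"})

def broken_moderation(msg: dict) -> str:
    raise RuntimeError("moderation service down")

handlers = {
    "transcoder": lambda m: f"transcoded {m['video_id']}",
    "thumbnails": lambda m: f"thumbnail for {m['video_id']}",
    "moderation": broken_moderation,
    "indexer": lambda m: f"indexed {m['video_id']}",
}
outcome = {name: drain(topic, name, h) for name, h in handlers.items()}
print(outcome["indexer"])                       # ['indexed abc123']
print(len(topic.subscriptions["moderation"]))   # 1 (still waiting for a retry)
```

The moderation failure leaves its copy queued for a retry, while the other three consumers finish normally.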
Dead Letter Queue (DLQ)
What happens when a message can't be processed? Maybe the payload is malformed, or a downstream dependency is broken.
If you just retry forever, you create an infinite loop that blocks other messages. Instead, after a configurable number of retry attempts, failed messages are moved to a Dead Letter Queue — a separate queue for messages that couldn't be handled.
From the DLQ you can:
- Inspect failed messages to understand what went wrong
- Fix the underlying issue and replay the messages
- Alert on-call when failure rates exceed a threshold
DLQs turn silent data loss into observable, recoverable failures.
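Here is a minimal sketch of the retry-then-dead-letter policy. It assumes a limit of three attempts (`MAX_ATTEMPTS` is a hypothetical setting). A failed message is retried by moving it to the back of the queue, so it doesn't block the messages behind it. Managed systems expose the same idea as configuration, for example an SQS redrive policy with `maxReceiveCount` or a dead-letter exchange in RabbitMQ.

```python
from collections import deque
from typing import Any, Callable

MAX_ATTEMPTS = 3  # hypothetical: the "configurable number of retry attempts"

def process_with_dlq(main: deque, dlq: list, handler: Callable[[str], Any]) -> list:
    """Process messages; after MAX_ATTEMPTS failures, park a message in the DLQ."""
    processed = []
    while main:
        message = main.popleft()
        try:
            processed.append(handler(message["body"]))
        except Exception as exc:
            message["attempts"] += 1
            message["last_error"] = repr(exc)   # keep context for whoever inspects it
            if message["attempts"] >= MAX_ATTEMPTS:
                dlq.append(message)             # give up: inspect, fix, replay later
            else:
                main.append(message)            # retry later, behind the other messages
    return processed

def parse_order(body: str) -> int:
    return int(body)  # a malformed payload raises ValueError

main = deque({"body": b, "attempts": 0} for b in ["10", "oops", "30"])
dlq: list = []
processed = process_with_dlq(main, dlq, parse_order)
print(processed)                         # [10, 30]
print(dlq[0]["body"], dlq[0]["attempts"])  # oops 3
```

The malformed `"oops"` payload ends up in the DLQ with its attempt count and last error attached. That is the information you need to inspect it before fixing the cause and replaying it.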
Kafka vs RabbitMQ
Two of the most widely used systems, with different design philosophies:
| | Kafka | RabbitMQ |
|---|---|---|
| Model | Log-based (consumers track offset) | Push-based (broker routes to consumer) |
| Message retention | Days/weeks (configurable) | Until acknowledged |
| Throughput | Very high (millions/s) | High (tens of thousands/s) |
| Ordering | Per-partition | Per-queue |
| Replay | Yes (rewind the log) | Not with classic queues (once acknowledged, gone); RabbitMQ Streams do support replay |
| Best for | Event streaming, audit logs, analytics | Task queues, RPC, routing |
Kafka stores messages as an immutable log. Multiple consumer groups can read the same messages independently. Great for event sourcing, stream processing, and any case where replay matters.
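The log model can be sketched in a few lines. The `Log` class below is a hypothetical single-partition toy, not Kafka's API. Reading never deletes anything; each consumer group keeps its own offset, and replay means moving that offset back.

```python
class Log:
    """Toy append-only log: reading never removes entries."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self.offsets: dict[str, int] = {}   # consumer group -> next offset to read

    def append(self, event: str) -> int:
        self.entries.append(event)
        return len(self.entries) - 1         # offset of the new entry

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        start = self.offsets.get(group, 0)
        batch = self.entries[start:start + max_records]
        self.offsets[group] = start + len(batch)   # "commit" the group's new position
        return batch

    def seek(self, group: str, offset: int) -> None:
        self.offsets[group] = offset               # replay: rewind this group only

log = Log()
for e in ["signup:alice", "order:1", "order:2"]:
    log.append(e)

first = log.poll("analytics")     # ['signup:alice', 'order:1', 'order:2']
billing = log.poll("billing")     # same three events: groups read independently
caught_up = log.poll("analytics") # []
log.seek("analytics", 1)
replayed = log.poll("analytics")  # ['order:1', 'order:2']
print(first, billing, caught_up, replayed)
```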
RabbitMQ is a traditional message broker. It routes messages to queues based on rules (exchanges and bindings), pushes to consumers, and deletes messages after acknowledgement. Simpler to operate for task queue use cases.
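Rule-based routing is easiest to see with a topic exchange. There, binding keys may use `*`, which matches exactly one dot-separated word, and `#`, which matches zero or more words. The matcher below is a simplified re-implementation of those wildcard rules for illustration, not RabbitMQ's code; the queue names and routing keys are hypothetical.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Simplified topic-exchange matching: '*' = exactly one word, '#' = zero or more."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            return w == len(words)
        if pattern[p] == "#":
            # '#' can match zero words (move past it) or absorb one more word (stay on it)
            return match(p + 1, w) or (w < len(words) and match(p, w + 1))
        if w == len(words):
            return False
        return (pattern[p] == "*" or pattern[p] == words[w]) and match(p + 1, w + 1)

    return match(0, 0)

# Hypothetical bindings: queue name -> binding key
bindings = {
    "billing": "order.*.paid",
    "audit": "#",
    "eu-orders": "order.eu.#",
}

def route(routing_key: str) -> list[str]:
    return [q_name for q_name, key in bindings.items() if topic_matches(key, routing_key)]

print(route("order.eu.paid"))      # ['billing', 'audit', 'eu-orders']
print(route("order.us.refunded"))  # ['audit']
```

The naive recursion can be slow for patterns containing many `#` wildcards. It is fine for a demo but is not meant as a production matcher.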
Tools at a Glance
| Tool | Notes |
|---|---|
| Apache Kafka | High-throughput distributed log. Industry standard for event streaming. |
| RabbitMQ | Mature broker with flexible routing. Great for task queues and RPC patterns. |
| AWS SQS | Managed queue with Standard (at-least-once) and FIFO (ordered, exactly-once processing via deduplication within a 5-minute window) modes. Zero ops. |
| GCP Pub/Sub | Google's managed pub/sub. Scales automatically, global delivery. |
| Redis Streams | Lightweight streams built into Redis. Good when you already have Redis. |
Tradeoffs
Complexity. You've added an infrastructure component. It needs to be deployed, monitored, and kept healthy.
Eventual consistency. The consumer processes the message after the producer moves on. If your use case requires an immediate response with the result of processing, a queue isn't the right tool.
At-least-once delivery. Most queue systems guarantee a message will be delivered at least once, which means it may be delivered more than once. If a consumer processes a message but crashes before its acknowledgement reaches the broker, the broker redelivers it. Your consumers need to be idempotent, so that processing the same message twice has the same effect as processing it once.
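A common way to make a consumer idempotent is to record which message IDs have already been applied and skip duplicates. This sketch assumes each message carries a unique ID assigned by the producer. `PaymentConsumer` and its in-memory set are hypothetical.

```python
class PaymentConsumer:
    """Idempotent consumer sketch: dedupe on message ID before applying side effects."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()  # in production: a durable store
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> None:
        if message_id in self.processed_ids:
            return                            # duplicate delivery: already applied
        self.balance += amount                # the side effect
        self.processed_ids.add(message_id)    # remember it was applied

consumer = PaymentConsumer()
consumer.handle("msg-1", 100)
consumer.handle("msg-1", 100)   # redelivered after a lost acknowledgement
consumer.handle("msg-2", 50)
print(consumer.balance)         # 150
```

In a real system, the ID check, the side effect, and the recording of the ID should happen atomically, for example in one database transaction. Otherwise a crash between them can still apply a message twice.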
Ordering. Strict global ordering across all messages is hard to guarantee at scale. Kafka gives ordering within a partition; RabbitMQ within a single queue. If order matters globally, design for it explicitly.
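Per-key ordering is usually achieved by routing every message with the same key to the same partition. This sketch uses CRC32 as a stable hash purely as an assumption; Kafka's default partitioner uses a different hash (murmur2). `partition_for` and `NUM_PARTITIONS` are hypothetical.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash, so the same key always maps to the same partition.
    # Python's built-in hash() for str is randomized per process, so it is avoided.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

partitions: list[list[str]] = [[] for _ in range(NUM_PARTITIONS)]
events = [
    ("user-7", "created"),
    ("user-9", "created"),
    ("user-7", "updated"),
    ("user-7", "deleted"),
]
for key, action in events:
    partitions[partition_for(key)].append(f"{key}:{action}")

p = partition_for("user-7")
user7 = [e for e in partitions[p] if e.startswith("user-7")]
print(user7)  # ['user-7:created', 'user-7:updated', 'user-7:deleted']
```

Events for `user-7` stay in order relative to each other. If `user-7` and `user-9` land in different partitions, there is no ordering guarantee between them.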
Key Takeaways
- Queues decouple producers and consumers in time — neither waits on the other
- They absorb traffic spikes by buffering messages until consumers can keep up
- Failed messages go to a Dead Letter Queue instead of being silently dropped
- Kafka is built for high-throughput event streaming with replay; RabbitMQ for flexible routing and task queues
- Consumers must be idempotent — at-least-once delivery is the norm
Next: CDNs — how static assets and dynamic content are served closer to the user to reduce latency globally.