Zylos
2026-01-29

Message Queues and Event Streaming: Architecture Patterns for Distributed Systems

research, distributed-systems, message-queue, event-streaming, kafka, rabbitmq, microservices, architecture

Executive Summary

Message queues and event streaming platforms form the backbone of modern distributed systems, enabling asynchronous communication, decoupling microservices, and supporting event-driven architectures. This research examines the current landscape of messaging technologies in 2026, comparing major platforms (Kafka, RabbitMQ, Redis Streams, NATS, Apache Pulsar), architectural patterns (saga choreography vs orchestration), delivery guarantees (exactly-once semantics), and AWS managed services (SQS, SNS, EventBridge). The key insight is that the choice of messaging technology should align with specific requirements around throughput, persistence, latency tolerance, and operational complexity, with many organizations strategically using multiple technologies for different use cases.

Message Queue vs Event Streaming: Core Concepts

Message Queue Model

Message queues follow a traditional point-to-point communication pattern where messages are stored until consumed and typically removed after acknowledgment. The architecture is straightforward: producers create messages and deliver them to the message queue, while consumers connect to the queue and retrieve messages for processing.

Key characteristics:

  • Push/Pull Models: RabbitMQ uses push-based delivery where messages are actively sent to consumers, while systems like Kafka use pull-based consumption where consumers retrieve messages at their own pace
  • Temporary Storage: Messages typically exist until consumed and acknowledged
  • Complex Routing: Supports sophisticated routing patterns with exchanges, bindings, and priority queues
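The ack-and-remove lifecycle described above can be sketched with a small in-memory toy. Class and method names here are illustrative, not any real broker's API; the point is that a message stays "in flight" after delivery and is removed only on acknowledgment, or requeued on negative acknowledgment.

```python
import queue

class PointToPointQueue:
    """Toy point-to-point queue illustrating ack-and-remove semantics."""

    def __init__(self):
        self._q = queue.Queue()
        self._in_flight = {}   # delivery_id -> message awaiting ack
        self._next_id = 0

    def publish(self, message):
        self._q.put(message)

    def receive(self):
        # Deliver one message; it stays "in flight" until acked.
        message = self._q.get_nowait()
        self._next_id += 1
        self._in_flight[self._next_id] = message
        return self._next_id, message

    def ack(self, delivery_id):
        # Acknowledgment removes the message permanently.
        del self._in_flight[delivery_id]

    def nack(self, delivery_id):
        # Negative acknowledgment requeues the message for redelivery.
        self._q.put(self._in_flight.pop(delivery_id))
```

Contrast this with the event streaming model below, where consumption never removes anything from the log.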

Event Streaming Model

Event streaming platforms treat data as a continuous, ordered stream of events, enabling real-time data processing and replay capabilities. Messages are appended to a durable log that persists beyond consumption.

Key characteristics:

  • Log-Based Storage: Events are stored in an immutable, append-only log
  • Replay Capability: Consumers can reprocess historical data within the retention period
  • High Throughput: Optimized for millions of messages per second using sequential disk I/O
  • Real-Time Processing: Enables immediate response to events for use cases like real-time analytics, personalized recommendations, and streaming applications
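The log-based model above can be sketched in a few lines: events are appended to an immutable sequence, reading never removes anything, and replay is simply re-reading from an earlier offset. The class and method names are hypothetical, chosen only to illustrate the semantics.

```python
class EventLog:
    """Toy append-only log with offset-based reads and replay."""

    def __init__(self):
        self._log = []

    def append(self, event):
        # Events are only ever appended; the log is immutable history.
        self._log.append(event)
        return len(self._log) - 1  # the event's offset

    def read_from(self, offset):
        # Reading does not remove events; replay is just re-reading.
        return self._log[offset:]
```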

Platform Comparison: Major Technologies

Apache Kafka

Architecture: Distributed streaming platform with log-based storage and partitioned topics

Performance: Delivers the best throughput of the platforms compared here, handling millions of messages per second, with the lowest end-to-end latencies up to the p99.9 percentile

Key Features:

  • KRaft Architecture (4.0+): Eliminates ZooKeeper dependency through built-in consensus protocol
  • Cooperative Consumer Rebalancing: Reduces disruption during consumer group changes
  • Long-Term Retention: Messages persist according to configurable retention policies, enabling replay
  • No Priority Queues: All messages are treated equally within partitions

Best For: Event streaming at scale, analytics pipelines, event sourcing, audit logging, systems requiring message persistence and replay capabilities with throughput exceeding 100K messages/second
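Keyed partitioning, which underpins Kafka's per-partition ordering, can be sketched as a stable hash of the message key: the same key always lands on the same partition, so all events for that key are consumed in order. This sketch uses MD5 purely for illustration; Kafka's default partitioner uses a murmur2 hash.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically (illustrative hash)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, choosing a partition key (e.g. an order ID or user ID) is also how load is distributed: a skewed key distribution produces hot partitions.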

RabbitMQ

Architecture: Traditional message broker with exchange-based routing using AMQP protocol

Performance: Handles approximately 50,000 messages per second (can vary based on configuration), delivering very low latency at lower throughputs

Key Features:

  • RabbitMQ 4.0+ Enhancements: Khepri metadata storage with Raft consensus for better fault tolerance, enhanced quorum queues with checkpoint-based recovery
  • Complex Routing: Sophisticated message routing with exchanges (direct, topic, fanout, headers)
  • Priority Queues: Supports message prioritization for urgent processing
  • Guaranteed Delivery: Strong guarantees with acknowledgments and persistence options

Best For: Traditional request-response patterns, complex routing logic, task queues, workflow orchestration, systems requiring guaranteed delivery with moderate throughput and priority handling
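The topic-exchange routing mentioned above can be sketched as a recursive pattern match over dot-separated words, where `*` matches exactly one word and `#` matches zero or more. This is a simplified model of AMQP binding semantics, not RabbitMQ's implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Simplified AMQP topic matching: '*' = one word, '#' = zero or more."""
    def match(p, k):
        if not p:
            return not k                      # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' can absorb any number of remaining words, including none.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False                      # words left in pattern, none in key
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))
```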

Redis Streams

Architecture: In-memory data structure with optional persistence, part of Redis ecosystem

Performance: Can send up to 1 million messages per second with sub-millisecond latency

Key Features:

  • Lightweight Operation: Much simpler operationally, especially when Redis is already in the technology stack
  • In-Memory Speed: Extremely low latency for time-sensitive operations
  • Limited Persistence: Not designed for long-term storage (primarily in-memory)
  • Consumer Groups: Supports consumer group semantics similar to Kafka

Best For: Scenarios where sub-millisecond latency is critical and occasional message loss is tolerable, such as real-time notifications, cache invalidation, live dashboards, and other cases where extreme speed trumps durability

NATS

Architecture: Simple, lightweight cloud-native messaging system with built-in persistence (JetStream)

Performance: Publishes and subscribes to messages at millions per second with microsecond to millisecond latency

Key Features:

  • Single Binary Deployment: No external dependencies, easy to deploy and manage
  • JetStream: Next-generation streaming platform providing real-time data streaming, highly resilient storage, and flexible data retrieval
  • Multiple Patterns: Supports pub-sub, request-reply, and queue groups in one system
  • Cloud-Native Focus: Designed for IoT messaging, microservices, and edge computing

Best For: Cloud-native applications, IoT messaging, microservices requiring simple deployment, scenarios valuing operational simplicity over maximum throughput

Apache Pulsar

Architecture: Multi-layered architecture separating compute (brokers) from storage (BookKeeper)

Performance: High throughput with strong consistency guarantees and low-latency delivery

Key Features:

  • Multi-Tenancy from Inception: Built-in tenant isolation with separate namespaces, access control policies, and optional broker isolation for maximum noisy neighbor protection
  • Geo-Replication: Native support for cross-region data replication
  • Tiered Storage: Automatic offloading of older data to cheaper storage tiers
  • Multiple Client Languages: Six official client language SDKs

Best For: Enterprise multi-tenant environments, organizations requiring strict data isolation, geo-distributed applications, scenarios demanding both messaging and streaming capabilities

AWS Managed Services

Amazon SQS (Simple Queue Service):

  • Model: Fully managed message queuing service with poll-based consumption
  • Use Cases: Decoupling microservices, buffering requests, asynchronous task processing
  • FIFO Queues: Support exactly-once processing with message deduplication
  • Dead Letter Queues: Built-in error handling for failed messages

Amazon SNS (Simple Notification Service):

  • Model: Push-based pub/sub messaging with fanout capabilities
  • Use Cases: Broadcast notifications, mobile push, email/SMS delivery
  • Integration: Native support for SQS, Lambda, HTTP endpoints
  • Special Features: Built-in support for SMS, email, and push notifications

Amazon EventBridge:

  • Model: Serverless event bus for event-driven architectures
  • Use Cases: Real-time stream processing, cross-account event sharing, SaaS integration
  • Advanced Features: Schema registry with automatic discovery, event archiving and replay, content-based filtering with 300+ filter patterns
  • Third-Party Integration: Direct integration with SaaS providers (Auth0, Zendesk, Datadog) without polling or custom webhooks

Architectural Patterns

Saga Pattern for Distributed Transactions

The Saga pattern maintains data consistency across multiple services without distributed transactions, using two primary coordination approaches:

Choreography (Decentralized)

Each microservice performs tasks independently and communicates through events. Local transactions publish domain events that trigger local transactions in other services.

Characteristics:

  • No central coordinator; services subscribe to each other's events
  • Message broker buffers requests until downstream components claim them
  • Common technologies: Apache Kafka, RabbitMQ, AWS SNS/SQS

Advantages:

  • Services remain loosely coupled, with no direct dependencies on each other
  • Simple to implement for straightforward workflows
  • Natural fit for event-driven architectures

Challenges:

  • Difficult to debug and trace control flow
  • No single source of truth for workflow state
  • Complex error handling across services
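A choreographed flow can be sketched with an in-memory event bus: each service subscribes to the previous service's event, and no component sees the whole workflow. The order/payment/shipping flow and all names here are hypothetical.

```python
class EventBus:
    """Minimal in-memory pub/sub bus for a choreographed saga sketch."""

    def __init__(self):
        self._subs = {}

    def subscribe(self, event_type, handler):
        self._subs.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self._subs.get(event_type, []):
            handler(payload)

bus = EventBus()
trace = []

# Payment service reacts to OrderCreated, then emits PaymentCaptured.
bus.subscribe("OrderCreated",
              lambda e: (trace.append("payment"),
                         bus.publish("PaymentCaptured", e)))
# Shipping service reacts to PaymentCaptured.
bus.subscribe("PaymentCaptured", lambda e: trace.append("shipping"))

bus.publish("OrderCreated", {"order_id": 1})
```

Note how the control flow exists only implicitly in the subscriptions, which is exactly why choreographed sagas are hard to trace and debug.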

Orchestration (Centralized)

A saga orchestrator service drives each participant, telling it what to do and when, using request/asynchronous-response interactions.

Characteristics:

  • Central coordinator manages the entire workflow
  • Orchestrator communicates with participants using command messages
  • Technologies: IBM MQ for strong consistency, Kafka configured for orchestration, AWS Step Functions

Advantages:

  • Clear control flow and easier debugging
  • Centralized observability and monitoring
  • Explicit workflow state management
  • Better handling of complex business logic

Challenges:

  • Single point of failure (orchestrator)
  • Increased coupling to orchestrator service
  • Orchestrator can become a bottleneck
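The orchestration approach, including compensating actions on failure, can be sketched as a loop over (action, compensation) pairs: on any failure, compensations for the steps already completed run in reverse order. This is a minimal model under those assumptions, not any framework's API.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; roll back on failure.

    Returns True on full success, False after compensating.
    """
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            # Undo completed steps in reverse order (compensating transactions).
            for comp in reversed(done):
                comp()
            return False
    return True
```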

Selection Guidance: Choose choreography for simple workflows with few participants where loose coupling is prioritized. Choose orchestration for complex workflows requiring clear visibility, debugging capabilities, and centralized control.

Message Delivery Semantics

At-Most-Once

Messages may be lost but are never redelivered. Lowest overhead but risks data loss.

Use Cases: Non-critical metrics, log aggregation where occasional loss is acceptable

At-Least-Once

Messages won't be lost but may be delivered multiple times. Most common implementation.

Use Cases: Most production systems combine this with idempotent consumers

Implementation: Retry logic ensures delivery, but application must handle duplicates

Exactly-Once

Side effects are applied exactly once. This is the most difficult delivery semantic to implement.

Challenges: Exactly-once requires coordination between the broker and the application, which carries a high cost in system performance and complexity.

Implementation Approaches:

  1. Infrastructure Level: Kafka's Exactly-Once Semantics (EOS), introduced in version 0.11, combine idempotent producers with transactions so each message is processed exactly once within Kafka read-process-write pipelines
  2. Application Level: At-least-once delivery with idempotent consumers using deduplication
    • Store processed message IDs in cache/database with TTL matching retry window (24-72 hours)
    • Check dedupe store for duplicates and skip reprocessing
    • Achieves exactly-once effects without distributed transaction complexity
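The application-level approach can be sketched with an in-memory dedupe store standing in for the cache or database; the TTL and all names are illustrative. Redelivered messages (normal under at-least-once) are detected by ID and skipped, so the side effect runs once.

```python
import time

class IdempotentConsumer:
    """Skip redelivered messages by remembering processed IDs with a TTL."""

    def __init__(self, ttl_seconds=72 * 3600):
        self._seen = {}      # message_id -> expiry timestamp
        self._ttl = ttl_seconds

    def process(self, message_id, handler, payload):
        now = time.time()
        # Evict expired entries so the dedupe store doesn't grow unbounded.
        self._seen = {m: exp for m, exp in self._seen.items() if exp > now}
        if message_id in self._seen:
            return False     # duplicate delivery: skip the side effect
        handler(payload)     # apply the side effect exactly once
        self._seen[message_id] = now + self._ttl
        return True
```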

Critical Use Cases: Financial transactions, payment processing, trading systems, accounting where duplicate processing would cause serious issues

Dead Letter Queue (DLQ) Pattern

A dead letter queue is a special message queue that temporarily stores messages that cannot be processed due to errors.

Common Reasons for DLQ Routing:

  • Message contains deserialization errors or invalid data
  • Message exceeds queue/message length limits
  • Delivery count exceeds maximum retry limit
  • Message rejected by downstream service

Benefits:

  • Debugging Focus: Isolates problematic messages for investigation
  • Pipeline Continuity: Valid messages continue processing while errors are handled separately
  • Error Analysis: Messages contain valuable insights to prevent future issues

Best Practices:

  1. Retry Logic: Implement retry with defined limits before DLQ routing to prevent false positives
  2. Monitoring: Alert on DLQ growth to detect systemic issues
  3. Remediation Process: Establish workflow for analyzing and reprocessing DLQ messages
  4. Configuration: Set maxReceiveCount high enough for transient failures (avoid premature DLQ routing)
  5. Processing Plan: Always have a strategy for handling DLQ messages before enabling

Implementation Example (Kafka Connect): Configure separate Kafka topic as DLQ for deserialization errors while allowing valid messages to process normally
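The retry-then-DLQ flow can be sketched as follows, with a plain list standing in for the dead letter queue and `max_receive_count` mirroring the SQS-style setting. All names are illustrative.

```python
def consume(message, handler, dlq, max_receive_count=3):
    """Attempt the handler up to max_receive_count times, then route to DLQ."""
    for attempt in range(1, max_receive_count + 1):
        try:
            handler(message)
            return True                      # transient failure recovered
        except Exception as exc:
            last_error = exc
    # Retries exhausted: park the message (with context) for later analysis.
    dlq.append({"message": message,
                "attempts": max_receive_count,
                "error": str(last_error)})
    return False
```

Keeping the attempt count and last error alongside the parked message supports the remediation workflow described above.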

Decision Framework

Choose Kafka When:

  • Building event-driven architectures requiring message persistence and replay
  • Throughput exceeds 100K messages/second
  • Analytics pipelines and event sourcing patterns
  • Dedicated platform teams available
  • Long-term message retention needed (days to weeks)

Choose RabbitMQ When:

  • Traditional request-response patterns required
  • Complex routing logic essential (topic exchanges, headers matching)
  • Guaranteed delivery with acknowledgments is critical
  • Priority queuing needed
  • Moderate throughput sufficient (thousands to tens of thousands msg/sec)

Choose Redis Streams When:

  • Sub-millisecond latency critical
  • Message loss tolerable
  • Redis already in technology stack
  • Real-time notifications, cache invalidation, live dashboards
  • Operational simplicity prioritized

Choose NATS When:

  • Cloud-native and edge computing scenarios
  • IoT messaging at scale
  • Operational simplicity valued over maximum throughput
  • Single binary deployment preferred
  • Microservices requiring lightweight messaging

Choose Apache Pulsar When:

  • Multi-tenancy with strict isolation required
  • Geo-replication across regions essential
  • Both messaging and streaming capabilities needed
  • Enterprise environments with multiple internal customers
  • Tiered storage for cost optimization

Choose AWS Managed Services When:

  • SQS: Serverless architectures, decoupling AWS Lambda functions, simple queuing without operational overhead
  • SNS: Fanout notifications, mobile push, broadcast to multiple subscribers
  • EventBridge: Event-driven architectures, SaaS integration, cross-account events, content-based filtering

Production Best Practices

Performance Optimization

  • Batching: Group messages to reduce network overhead
  • Compression: Enable compression for large messages (Kafka: lz4, snappy)
  • Partitioning Strategy: Choose partition keys to distribute load evenly
  • Consumer Groups: Scale consumers horizontally within groups
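Producer-side batching can be sketched as simple chunking of pending messages; real clients typically also flush on a time threshold (e.g. Kafka's `linger.ms`) so small batches don't wait indefinitely.

```python
def batch(messages, max_batch_size):
    """Group pending messages into fixed-size batches to amortize
    per-request network overhead (size-based trigger only)."""
    return [messages[i:i + max_batch_size]
            for i in range(0, len(messages), max_batch_size)]
```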

Reliability Patterns

  • Circuit Breakers: Prevent cascading failures when downstream services fail
  • Backpressure Handling: Slow down producers when consumers can't keep up
  • Monitoring: Track queue depth, consumer lag, processing rates
  • Alerting: Set thresholds for DLQ messages, consumer lag spikes

Operational Considerations

  • Schema Evolution: Use schema registry (Kafka Schema Registry, Pulsar Schema Registry)
  • Message Ordering: Understand ordering guarantees per technology
  • Reprocessing: Design for idempotency to enable safe message replay
  • Testing: Use consumer group offsets to test with production data

Market Trends and Evolution

Kafka: Leads in enterprise adoption with strong backing from Confluent and Apache Foundation, continuously adding features like KRaft consensus and improved observability

RabbitMQ: Stable adoption in traditional enterprise environments with ongoing improvements in reliability and performance (now maintained by Broadcom following its acquisition of VMware)

Redis Streams: Benefits from Redis's explosive growth in cloud-native and microservices architectures, offering simplicity and speed

Hybrid Approaches: Many organizations use multiple technologies strategically - RabbitMQ for microservices communication and complex routing, Kafka for high-volume streaming and analytics, Redis for real-time notifications

Managed Services: Growing adoption of cloud-managed services (AWS SQS/SNS/EventBridge, Confluent Cloud, Azure Service Bus) to reduce operational overhead

