Message Queues and Event Streaming: Architecture Patterns for Distributed Systems
Executive Summary
Message queues and event streaming platforms form the backbone of modern distributed systems, enabling asynchronous communication, decoupling microservices, and supporting event-driven architectures. This research examines the current landscape of messaging technologies in 2026, comparing major platforms (Kafka, RabbitMQ, Redis Streams, NATS, Apache Pulsar), architectural patterns (saga choreography vs orchestration), delivery guarantees (exactly-once semantics), and AWS managed services (SQS, SNS, EventBridge). The key insight is that the choice of messaging technology should align with specific requirements around throughput, persistence, latency tolerance, and operational complexity, with many organizations strategically using multiple technologies for different use cases.
Message Queue vs Event Streaming: Core Concepts
Message Queue Model
Message queues follow a traditional point-to-point communication pattern where messages are stored until consumed and typically removed after acknowledgment. The architecture is straightforward: producers create messages and deliver them to the message queue, while consumers connect to the queue and retrieve messages for processing.
Key characteristics:
- Push/Pull Models: RabbitMQ uses push-based delivery where messages are actively sent to consumers, while systems like Kafka use pull-based consumption where consumers retrieve messages at their own pace
- Temporary Storage: Messages typically exist until consumed and acknowledged
- Complex Routing: Supports sophisticated routing patterns with exchanges, bindings, and priority queues
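The queue semantics above (temporary storage, acknowledgment, redelivery) can be sketched in a few lines. This is a minimal in-memory illustration, not any real broker's API: a message stays "in flight" after delivery until acknowledged, and a negative acknowledgment requeues it.

```python
import collections
import itertools

class PointToPointQueue:
    """Minimal in-memory sketch of queue semantics: deliver, ack, requeue."""
    def __init__(self):
        self._ready = collections.deque()   # messages awaiting delivery
        self._in_flight = {}                # delivery_tag -> message
        self._tags = itertools.count(1)

    def publish(self, message):
        self._ready.append(message)

    def consume(self):
        """Deliver the next message; it stays in flight until acked."""
        if not self._ready:
            return None
        tag = next(self._tags)
        self._in_flight[tag] = self._ready.popleft()
        return tag, self._in_flight[tag]

    def ack(self, tag):
        """Acknowledgment removes the message permanently."""
        del self._in_flight[tag]

    def nack(self, tag):
        """Negative acknowledgment requeues the message for redelivery."""
        self._ready.appendleft(self._in_flight.pop(tag))

q = PointToPointQueue()
q.publish("task-1")
tag, msg = q.consume()
q.nack(tag)               # processing failed: message goes back on the queue
tag2, msg2 = q.consume()  # the same message is redelivered
q.ack(tag2)               # now it is removed for good
```

Note how acknowledgment, not delivery, is what removes the message — this is the property that distinguishes the queue model from fire-and-forget sends.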
Event Streaming Model
Event streaming platforms treat data as a continuous, ordered stream of events, enabling real-time data processing and replay capabilities. Messages are appended to a durable log that persists beyond consumption.
Key characteristics:
- Log-Based Storage: Events are stored in an immutable, append-only log
- Replay Capability: Consumers can reprocess historical data within the retention period
- High Throughput: Optimized for millions of messages per second using sequential disk I/O
- Real-Time Processing: Enables immediate response to events for use cases like real-time analytics, personalized recommendations, and streaming applications
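The log model can be contrasted with the queue sketch using a few lines of illustrative Python (again, no real platform's API): events are appended to an immutable log, reads never remove anything, and each consumer simply tracks its own offset, which is what makes replay possible.

```python
class EventLog:
    """Minimal sketch of an append-only log: events persist after reads,
    and each consumer tracks its own offset."""
    def __init__(self):
        self._log = []  # append-only; existing entries are never mutated

    def append(self, event):
        self._log.append(event)
        return len(self._log) - 1  # the new event's offset

    def read_from(self, offset):
        """Reading never removes events, so any consumer can replay from any offset."""
        return list(self._log[offset:])

log = EventLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two independent consumers hold independent offsets into the same log:
print(log.read_from(2))  # a consumer that is nearly caught up
print(log.read_from(0))  # full replay from the beginning, within retention
```

In a real platform the retention policy, not a read, is what eventually deletes events; this sketch omits retention for brevity.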
Platform Comparison: Major Technologies
Apache Kafka
Architecture: Distributed streaming platform with log-based storage and partitioned topics
Performance: Delivers the highest throughput of the compared systems, sustaining millions of messages per second, with the lowest end-to-end latencies up to the p99.9th percentile
Key Features:
- KRaft Architecture (4.0+): Eliminates ZooKeeper dependency through built-in consensus protocol
- Cooperative Consumer Rebalancing: Reduces disruption during consumer group changes
- Long-Term Retention: Messages persist according to configurable retention policies, enabling replay
- No Priority Queues: All messages are treated equally within partitions
Best For: Event streaming at scale, analytics pipelines, event sourcing, audit logging, systems requiring message persistence and replay capabilities with throughput exceeding 100K messages/second
RabbitMQ
Architecture: Traditional message broker with exchange-based routing using AMQP protocol
Performance: Handles approximately 50,000 messages per second (can vary based on configuration), delivering very low latency at lower throughputs
Key Features:
- RabbitMQ 4.0+ Enhancements: Khepri metadata storage with Raft consensus for better fault tolerance, enhanced quorum queues with checkpoint-based recovery
- Complex Routing: Sophisticated message routing with exchanges (direct, topic, fanout, headers)
- Priority Queues: Supports message prioritization for urgent processing
- Guaranteed Delivery: Strong guarantees with acknowledgments and persistence options
Best For: Traditional request-response patterns, complex routing logic, task queues, workflow orchestration, systems requiring guaranteed delivery with moderate throughput and priority handling
Redis Streams
Architecture: In-memory data structure with optional persistence, part of Redis ecosystem
Performance: Can send up to 1 million messages per second with sub-millisecond latency
Key Features:
- Lightweight Operation: Much simpler operationally, especially when Redis is already in the technology stack
- In-Memory Speed: Extremely low latency for time-sensitive operations
- Limited Persistence: Not designed for long-term storage (primarily in-memory)
- Consumer Groups: Supports consumer group semantics similar to Kafka
Best For: Scenarios where latency under 1ms is critical and message loss is tolerable, such as real-time notifications, cache invalidation, live dashboards, and other cases where extreme speed trumps durability
NATS
Architecture: Simple, lightweight cloud-native messaging system with built-in persistence (JetStream)
Performance: Handles millions of published and delivered messages per second with microsecond-to-millisecond latency
Key Features:
- Single Binary Deployment: No external dependencies, easy to deploy and manage
- JetStream: Next-generation streaming platform providing real-time data streaming, highly resilient storage, and flexible data retrieval
- Multiple Patterns: Supports pub-sub, request-reply, and queue groups in one system
- Cloud-Native Focus: Designed for IoT messaging, microservices, and edge computing
Best For: Cloud-native applications, IoT messaging, microservices requiring simple deployment, scenarios valuing operational simplicity over maximum throughput
Apache Pulsar
Architecture: Multi-layered architecture separating compute (brokers) from storage (BookKeeper)
Performance: High throughput with strong consistency guarantees and low-latency delivery
Key Features:
- Multi-Tenancy from Inception: Built-in tenant isolation with separate namespaces, access control policies, and optional broker isolation for maximum noisy neighbor protection
- Geo-Replication: Native support for cross-region data replication
- Tiered Storage: Automatic offloading of older data to cheaper storage tiers
- Multiple Client Languages: Six official client language SDKs
Best For: Enterprise multi-tenant environments, organizations requiring strict data isolation, geo-distributed applications, scenarios demanding both messaging and streaming capabilities
AWS Managed Services
Amazon SQS (Simple Queue Service):
- Model: Fully managed message queuing service with poll-based consumption
- Use Cases: Decoupling microservices, buffering requests, asynchronous task processing
- FIFO Queues: Support exactly-once processing with message deduplication
- Dead Letter Queues: Built-in error handling for failed messages
Amazon SNS (Simple Notification Service):
- Model: Push-based pub/sub messaging with fanout capabilities
- Use Cases: Broadcast notifications, mobile push, email/SMS delivery
- Integration: Native support for SQS, Lambda, HTTP endpoints
- Special Features: Built-in support for SMS, email, and push notifications
Amazon EventBridge:
- Model: Serverless event bus for event-driven architectures
- Use Cases: Real-time stream processing, cross-account event sharing, SaaS integration
- Advanced Features: Schema registry with automatic discovery, event archiving and replay, content-based filtering with 300+ filter patterns
- Third-Party Integration: Direct integration with SaaS providers (Auth0, Zendesk, Datadog) without polling or custom webhooks
Architectural Patterns
Saga Pattern for Distributed Transactions
The Saga pattern maintains data consistency across multiple services without distributed transactions, using two primary coordination approaches:
Choreography (Decentralized)
Each microservice performs tasks independently and communicates through events. Local transactions publish domain events that trigger local transactions in other services.
Characteristics:
- No central coordinator; services subscribe to each other's events
- Message broker buffers events until downstream services consume them
- Common technologies: Apache Kafka, RabbitMQ, AWS SNS/SQS
Advantages:
- Loose coupling between independently deployable services
- Simple to implement for straightforward workflows
- Natural fit for event-driven architectures
Challenges:
- Difficult to debug and trace control flow
- No single source of truth for workflow state
- Complex error handling across services
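The choreography style can be sketched with a tiny in-memory event bus standing in for a broker such as Kafka or SNS/SQS. The service names and event types here are hypothetical; the point is that there is no coordinator — each service subscribes to the events it cares about and publishes its own.

```python
import collections

class EventBus:
    """Minimal in-memory pub/sub bus standing in for a real message broker."""
    def __init__(self):
        self._handlers = collections.defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, data):
        for handler in self._handlers[event_type]:
            handler(data)

bus = EventBus()
log = []

# Each service reacts to events and emits its own — no central coordinator.
def payment_service(order):
    log.append(f"charged {order}")
    bus.publish("payment_completed", order)

def shipping_service(order):
    log.append(f"shipped {order}")

bus.subscribe("order_placed", payment_service)
bus.subscribe("payment_completed", shipping_service)
bus.publish("order_placed", "order-1")
print(log)
```

The control flow here emerges from the subscriptions alone, which illustrates both the appeal (loose coupling) and the challenge (no single place to see the workflow) described above.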
Orchestration (Centralized)
A saga orchestrator service drives each participant, telling it what to do and when, using asynchronous request/response-style interaction.
Characteristics:
- Central coordinator manages the entire workflow
- Orchestrator communicates with participants using command messages
- Technologies: IBM MQ for strong consistency, Kafka configured for orchestration, AWS Step Functions
Advantages:
- Clear control flow and easier debugging
- Centralized observability and monitoring
- Explicit workflow state management
- Better handling of complex business logic
Challenges:
- Single point of failure (orchestrator)
- Increased coupling to orchestrator service
- Orchestrator can become a bottleneck
Selection Guidance: Choose choreography for simple workflows with few participants where loose coupling is prioritized. Choose orchestration for complex workflows requiring clear visibility, debugging capabilities, and centralized control.
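The orchestration approach, including its defining feature of compensating completed steps on failure, can be sketched as follows. The step and compensation functions (reserve_inventory, charge_payment, and so on) are hypothetical stand-ins for calls to participant services.

```python
class SagaOrchestrator:
    """Sketch of saga orchestration: a central coordinator runs each step and,
    on failure, executes compensations for completed steps in reverse order."""
    def __init__(self, steps):
        self.steps = steps  # list of (action, compensation) callables

    def run(self):
        completed = []
        for action, compensation in self.steps:
            try:
                action()
                completed.append(compensation)
            except Exception:
                for comp in reversed(completed):
                    comp()  # undo already-completed steps, newest first
                return "rolled back"
        return "committed"

events = []
def reserve_inventory(): events.append("reserved")
def release_inventory(): events.append("released")
def charge_payment(): raise RuntimeError("card declined")  # simulated failure
def refund_payment(): events.append("refunded")

saga = SagaOrchestrator([(reserve_inventory, release_inventory),
                         (charge_payment, refund_payment)])
print(saga.run())
```

Note that only steps that actually completed are compensated — the failed payment is never "refunded" — and the orchestrator holds the full workflow state, which is exactly what makes this style easier to observe and debug than choreography.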
Message Delivery Semantics
At-Most-Once
Messages may be lost but are never redelivered. Lowest overhead but risks data loss.
Use Cases: Non-critical metrics, log aggregation where occasional loss is acceptable
At-Least-Once
Messages won't be lost but may be delivered multiple times. Most common implementation.
Use Cases: Most production systems combine this with idempotent consumers
Implementation: Retry logic ensures delivery, but application must handle duplicates
Exactly-Once
Side effects are applied exactly once; messages are neither lost nor duplicated.
Challenges: The most difficult delivery semantic to implement, with a high cost in system performance and complexity.
Implementation Approaches:
- Infrastructure Level: Kafka's Exactly-Once Semantics (EOS), introduced in version 0.11, guarantee each message is processed exactly once within Kafka read-process-write pipelines
- Application Level: At-least-once delivery with idempotent consumers using deduplication
  - Store processed message IDs in a cache/database with a TTL matching the retry window (24-72 hours)
  - Check the dedupe store for duplicates and skip reprocessing
  - Achieves exactly-once effects without distributed-transaction complexity
Critical Use Cases: Financial transactions, payment processing, trading systems, accounting where duplicate processing would cause serious issues
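The application-level dedupe-store approach described above can be sketched as follows. A real deployment would use a shared store such as Redis or a database with TTL support rather than this in-process dictionary, which is shown only to make the logic concrete.

```python
import time

class IdempotentConsumer:
    """Sketch of the dedupe-store approach: remember processed message IDs
    (with a TTL matching the retry window) and skip duplicate deliveries."""
    def __init__(self, ttl_seconds=72 * 3600):
        self.ttl = ttl_seconds
        self.seen = {}  # message_id -> expiry timestamp

    def process(self, message_id, handler):
        now = time.time()
        # Evict expired entries (a real store would use Redis EXPIRE or a DB TTL).
        self.seen = {m: t for m, t in self.seen.items() if t > now}
        if message_id in self.seen:
            return "skipped duplicate"
        result = handler()                # apply the side effect exactly once
        self.seen[message_id] = now + self.ttl
        return result

consumer = IdempotentConsumer()
charges = []
print(consumer.process("msg-42", lambda: charges.append(100) or "charged"))
print(consumer.process("msg-42", lambda: charges.append(100) or "charged"))
```

The second delivery of `msg-42` is skipped, so the charge is applied once even though the broker delivered the message twice — at-least-once transport, exactly-once effect.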
Dead Letter Queue (DLQ) Pattern
A dead letter queue is a special message queue that temporarily stores messages that cannot be processed due to errors.
Common Reasons for DLQ Routing:
- Message contains deserialization errors or invalid data
- Message exceeds queue/message length limits
- Delivery count exceeds maximum retry limit
- Message rejected by downstream service
Benefits:
- Debugging Focus: Isolates problematic messages for investigation
- Pipeline Continuity: Valid messages continue processing while errors are handled separately
- Error Analysis: Messages contain valuable insights to prevent future issues
Best Practices:
- Retry Logic: Implement retry with defined limits before DLQ routing to prevent false positives
- Monitoring: Alert on DLQ growth to detect systemic issues
- Remediation Process: Establish workflow for analyzing and reprocessing DLQ messages
- Configuration: Set maxReceiveCount high enough for transient failures (avoid premature DLQ routing)
- Processing Plan: Always have a strategy for handling DLQ messages before enabling
Implementation Example (Kafka Connect): Configure a separate Kafka topic as a DLQ for deserialization errors while allowing valid messages to process normally
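The retry-then-dead-letter flow can be sketched in broker-agnostic form. The retry limit below plays the role of SQS's maxReceiveCount; the parse handler and message payloads are hypothetical.

```python
MAX_RECEIVE_COUNT = 3  # analogous to SQS maxReceiveCount

def consume_with_dlq(messages, handler):
    """Sketch: retry each message up to a limit, then route it to the DLQ
    so the rest of the pipeline keeps flowing."""
    processed, dlq = [], []
    for msg in messages:
        for attempt in range(1, MAX_RECEIVE_COUNT + 1):
            try:
                processed.append(handler(msg))
                break
            except ValueError:  # e.g. a deserialization error
                if attempt == MAX_RECEIVE_COUNT:
                    dlq.append(msg)  # retries exhausted: dead-letter it
    return processed, dlq

def parse(msg):
    if msg == "garbled":
        raise ValueError("cannot deserialize")
    return msg.upper()

ok, dead = consume_with_dlq(["a", "garbled", "b"], parse)
print(ok)    # valid messages keep processing
print(dead)  # the poison message is isolated for later analysis
```

The key property is pipeline continuity: one poison message never blocks the messages behind it, and the DLQ preserves it intact for debugging and reprocessing.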
Decision Framework
Choose Kafka When:
- Building event-driven architectures requiring message persistence and replay
- Throughput exceeds 100K messages/second
- Analytics pipelines and event sourcing patterns
- Dedicated platform teams available
- Long-term message retention needed (days to weeks)
Choose RabbitMQ When:
- Traditional request-response patterns required
- Complex routing logic essential (topic exchanges, headers matching)
- Guaranteed delivery with acknowledgments is critical
- Priority queuing needed
- Moderate throughput sufficient (thousands to tens of thousands msg/sec)
Choose Redis Streams When:
- Sub-millisecond latency critical
- Message loss tolerable
- Redis already in technology stack
- Real-time notifications, cache invalidation, live dashboards
- Operational simplicity prioritized
Choose NATS When:
- Cloud-native and edge computing scenarios
- IoT messaging at scale
- Operational simplicity valued over maximum throughput
- Single binary deployment preferred
- Microservices requiring lightweight messaging
Choose Apache Pulsar When:
- Multi-tenancy with strict isolation required
- Geo-replication across regions essential
- Both messaging and streaming capabilities needed
- Enterprise environments with multiple internal customers
- Tiered storage for cost optimization
Choose AWS Managed Services When:
- SQS: Serverless architectures, decoupling AWS Lambda functions, simple queuing without operational overhead
- SNS: Fanout notifications, mobile push, broadcast to multiple subscribers
- EventBridge: Event-driven architectures, SaaS integration, cross-account events, content-based filtering
Production Best Practices
Performance Optimization
- Batching: Group messages to reduce network overhead
- Compression: Enable compression for large messages (Kafka: lz4, snappy)
- Partitioning Strategy: Choose partition keys to distribute load evenly
- Consumer Groups: Scale consumers horizontally within groups
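The partitioning bullet above can be made concrete as a stable hash of the message key. This is only an illustration of the principle — Kafka's default partitioner uses murmur2, not MD5 — but the property is the same: one key always maps to one partition, preserving per-key ordering while spreading distinct keys evenly.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically: the same key always
    lands on the same partition (per-key ordering), while distinct keys
    spread across partitions (load distribution)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All of one user's events stay ordered on one partition; users spread out.
print(partition_for("user-123", 6))
print(partition_for("user-123", 6))  # same partition every time
```

A common pitfall this sketch also implies: changing `num_partitions` remaps keys, so per-key ordering guarantees only hold while the partition count is stable.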
Reliability Patterns
- Circuit Breakers: Prevent cascading failures when downstream services fail
- Backpressure Handling: Slow down producers when consumers can't keep up
- Monitoring: Track queue depth, consumer lag, processing rates
- Alerting: Set thresholds for DLQ messages, consumer lag spikes
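The circuit-breaker bullet above can be sketched as follows. Thresholds and timings are illustrative; production systems typically use a library or service mesh rather than hand-rolling this.

```python
import time

class CircuitBreaker:
    """Sketch of a circuit breaker: after `threshold` consecutive failures the
    circuit opens and calls fail fast until `reset_after` seconds elapse,
    protecting a struggling downstream service from further load."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result

cb = CircuitBreaker(threshold=2, reset_after=60)
def flaky():
    raise IOError("downstream down")
for _ in range(2):
    try:
        cb.call(flaky)
    except IOError:
        pass
# Circuit is now open: further calls fail fast instead of hitting the service.
```

In a messaging context, the "fail fast" branch is where a consumer would pause, back off, or dead-letter rather than hammering the failing downstream dependency.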
Operational Considerations
- Schema Evolution: Use schema registry (Kafka Schema Registry, Pulsar Schema Registry)
- Message Ordering: Understand ordering guarantees per technology
- Reprocessing: Design for idempotency to enable safe message replay
- Testing: Replay production data safely by attaching a separate consumer group with rewound offsets
Market Trends and Evolution
Kafka: Leads in enterprise adoption with strong backing from Confluent and Apache Foundation, continuously adding features like KRaft consensus and improved observability
RabbitMQ: Stable adoption in traditional enterprise environments with ongoing improvements in reliability and performance (maintained by the core team at Broadcom, following its acquisition of VMware)
Redis Streams: Benefits from Redis's explosive growth in cloud-native and microservices architectures, offering simplicity and speed
Hybrid Approaches: Many organizations use multiple technologies strategically - RabbitMQ for microservices communication and complex routing, Kafka for high-volume streaming and analytics, Redis for real-time notifications
Managed Services: Growing adoption of cloud-managed services (AWS SQS/SNS/EventBridge, Confluent Cloud, Azure Service Bus) to reduce operational overhead
Sources:
- Kafka vs RabbitMQ - AWS
- Kafka vs RabbitMQ - DataCamp
- RabbitMQ vs Kafka vs ActiveMQ
- When to use RabbitMQ or Apache Kafka
- Kafka Fastest Messaging System - Confluent
- Event Streaming - Confluent
- Stream Processing Platforms - Ably
- Kafka vs RabbitMQ vs Redis Benchmarks
- Choosing the Right Messaging System
- Redis vs RabbitMQ vs Kafka
- Message Queue Pattern - Microservices
- Understanding Message Queues - ByteByteGo
- Messaging Patterns - Solace
- NATS.io Official
- NATS Overview - RisingWave
- Apache Pulsar Multi-Tenancy
- Pulsar Enterprise Messaging
- Saga Pattern - Microservices.io
- Saga Orchestration vs Choreography - ByteByteGo
- Saga Pattern - Azure Architecture
- Saga Orchestration vs Choreography - Temporal
- Exactly-Once Semantics - ByteByteGo
- Exactly-Once Delivery
- Kafka Exactly-Once Semantics
- Delivery Semantics Overview
- Dead-Letter Queues - AWS
- Service Bus Dead-Letter Queues
- Kafka Dead Letter Queue
- AWS SQS vs SNS vs EventBridge Decision Guide
- AWS Messaging Services - nOps
- Choosing Between Messaging Services - AWS Blog
- EventBridge vs SQS - Ably

