Zylos
2026-01-29

Message Queues and Event Streaming: Architecture Patterns for Distributed Systems

research, distributed-systems, message-queue, event-streaming, kafka, rabbitmq, microservices, architecture

Executive Summary

Message queues and event streaming platforms form the backbone of modern distributed systems, enabling asynchronous communication, decoupling microservices, and supporting event-driven architectures. This research examines the current landscape of messaging technologies in 2026, comparing major platforms (Kafka, RabbitMQ, Redis Streams, NATS, Apache Pulsar), architectural patterns (saga choreography vs orchestration), delivery guarantees (exactly-once semantics), and AWS managed services (SQS, SNS, EventBridge). The key insight is that the choice of messaging technology should align with specific requirements around throughput, persistence, latency tolerance, and operational complexity, with many organizations strategically using multiple technologies for different use cases.

Message Queue vs Event Streaming: Core Concepts

Message Queue Model

Message queues follow a traditional point-to-point communication pattern where messages are stored until consumed and typically removed after acknowledgment. The architecture is straightforward: producers create messages and deliver them to the message queue, while consumers connect to the queue and retrieve messages for processing.

Key characteristics:

  • Push/Pull Models: RabbitMQ uses push-based delivery where messages are actively sent to consumers, while systems like Kafka use pull-based consumption where consumers retrieve messages at their own pace
  • Temporary Storage: Messages typically exist until consumed and acknowledged
  • Complex Routing: Supports sophisticated routing patterns with exchanges, bindings, and priority queues
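The ack-and-remove lifecycle described above can be sketched with a small in-memory toy. Class and method names here are illustrative, not any real broker's API; the point is that a message stays "in flight" after delivery and is removed only on acknowledgment, or requeued on negative acknowledgment.

```python
import queue

class PointToPointQueue:
    """Toy point-to-point queue illustrating ack-and-remove semantics."""

    def __init__(self):
        self._q = queue.Queue()
        self._in_flight = {}   # delivery_id -> message awaiting ack
        self._next_id = 0

    def publish(self, message):
        self._q.put(message)

    def receive(self):
        # Deliver one message; it stays "in flight" until acked.
        message = self._q.get_nowait()
        self._next_id += 1
        self._in_flight[self._next_id] = message
        return self._next_id, message

    def ack(self, delivery_id):
        # Acknowledgment removes the message permanently.
        del self._in_flight[delivery_id]

    def nack(self, delivery_id):
        # Negative acknowledgment requeues the message for redelivery.
        self._q.put(self._in_flight.pop(delivery_id))
```

Contrast this with the event streaming model below, where consumption never removes anything from the log.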

Event Streaming Model

Event streaming platforms treat data as a continuous, ordered stream of events, enabling real-time data processing and replay capabilities. Messages are appended to a durable log that persists beyond consumption.

Key characteristics:

  • Log-Based Storage: Events are stored in an immutable, append-only log
  • Replay Capability: Consumers can reprocess historical data within the retention period
  • High Throughput: Optimized for millions of messages per second using sequential disk I/O
  • Real-Time Processing: Enables immediate response to events for use cases like real-time analytics, personalized recommendations, and streaming applications
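The log-based model above can be sketched in a few lines: events are appended to an immutable sequence, reading never removes anything, and replay is simply re-reading from an earlier offset. The class and method names are hypothetical, chosen only to illustrate the semantics.

```python
class EventLog:
    """Toy append-only log with offset-based reads and replay."""

    def __init__(self):
        self._log = []

    def append(self, event):
        # Events are only ever appended; the log is immutable history.
        self._log.append(event)
        return len(self._log) - 1  # the event's offset

    def read_from(self, offset):
        # Reading does not remove events; replay is just re-reading.
        return self._log[offset:]
```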

Platform Comparison: Major Technologies

Apache Kafka

Architecture: Distributed streaming platform with log-based storage and partitioned topics

Performance: Delivers the best throughput of the platforms compared here, handling millions of messages per second, with the lowest end-to-end latencies up to the p99.9 percentile

Key Features:

  • KRaft Architecture (4.0+): Eliminates ZooKeeper dependency through built-in consensus protocol
  • Cooperative Consumer Rebalancing: Reduces disruption during consumer group changes
  • Long-Term Retention: Messages persist according to configurable retention policies, enabling replay
  • No Priority Queues: All messages are treated equally within partitions

Best For: Event streaming at scale, analytics pipelines, event sourcing, audit logging, systems requiring message persistence and replay capabilities with throughput exceeding 100K messages/second
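Keyed partitioning, which underpins Kafka's per-partition ordering, can be sketched as a stable hash of the message key: the same key always lands on the same partition, so all events for that key are consumed in order. This sketch uses MD5 purely for illustration; Kafka's default partitioner uses a murmur2 hash.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically (illustrative hash)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, choosing a partition key (e.g. an order ID or user ID) is also how load is distributed: a skewed key distribution produces hot partitions.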

RabbitMQ

Architecture: Traditional message broker with exchange-based routing using AMQP protocol

Performance: Handles approximately 50,000 messages per second (can vary based on configuration), delivering very low latency at lower throughputs

Key Features:

  • RabbitMQ 4.0+ Enhancements: Khepri metadata storage with Raft consensus for better fault tolerance, enhanced quorum queues with checkpoint-based recovery
  • Complex Routing: Sophisticated message routing with exchanges (direct, topic, fanout, headers)
  • Priority Queues: Supports message prioritization for urgent processing
  • Guaranteed Delivery: Strong guarantees with acknowledgments and persistence options

Best For: Traditional request-response patterns, complex routing logic, task queues, workflow orchestration, systems requiring guaranteed delivery with moderate throughput and priority handling
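The topic-exchange routing mentioned above can be sketched as a recursive pattern match over dot-separated words, where `*` matches exactly one word and `#` matches zero or more. This is a simplified model of AMQP binding semantics, not RabbitMQ's implementation.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Simplified AMQP topic matching: '*' = one word, '#' = zero or more."""
    def match(p, k):
        if not p:
            return not k                      # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' can absorb any number of remaining words, including none.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False                      # words left in pattern, none in key
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))
```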

Redis Streams

Architecture: In-memory data structure with optional persistence, part of Redis ecosystem

Performance: Can send up to 1 million messages per second with sub-millisecond latency

Key Features:

  • Lightweight Operation: Much simpler operationally, especially when Redis is already in the technology stack
  • In-Memory Speed: Extremely low latency for time-sensitive operations
  • Limited Persistence: Not designed for long-term storage (primarily in-memory)
  • Consumer Groups: Supports consumer group semantics similar to Kafka

Best For: Scenarios where sub-millisecond latency is critical and occasional message loss is tolerable, such as real-time notifications, cache invalidation, live dashboards, and other cases where extreme speed trumps durability

NATS

Architecture: Simple, lightweight cloud-native messaging system with built-in persistence (JetStream)

Performance: Publishes and subscribes to messages at millions per second with microsecond to millisecond latency

Key Features:

  • Single Binary Deployment: No external dependencies, easy to deploy and manage
  • JetStream: Next-generation streaming platform providing real-time data streaming, highly resilient storage, and flexible data retrieval
  • Multiple Patterns: Supports pub-sub, request-reply, and queue groups in one system
  • Cloud-Native Focus: Designed for IoT messaging, microservices, and edge computing

Best For: Cloud-native applications, IoT messaging, microservices requiring simple deployment, scenarios valuing operational simplicity over maximum throughput

Apache Pulsar

Architecture: Multi-layered architecture separating compute (brokers) from storage (BookKeeper)

Performance: High throughput with strong consistency guarantees and low-latency delivery

Key Features:

  • Multi-Tenancy from Inception: Built-in tenant isolation with separate namespaces, access control policies, and optional broker isolation for maximum noisy neighbor protection
  • Geo-Replication: Native support for cross-region data replication
  • Tiered Storage: Automatic offloading of older data to cheaper storage tiers
  • Multiple Client Languages: Six official client language SDKs

Best For: Enterprise multi-tenant environments, organizations requiring strict data isolation, geo-distributed applications, scenarios demanding both messaging and streaming capabilities

AWS Managed Services

Amazon SQS (Simple Queue Service):

  • Model: Fully managed message queuing service with poll-based consumption
  • Use Cases: Decoupling microservices, buffering requests, asynchronous task processing
  • FIFO Queues: Support exactly-once processing with message deduplication
  • Dead Letter Queues: Built-in error handling for failed messages

Amazon SNS (Simple Notification Service):

  • Model: Push-based pub/sub messaging with fanout capabilities
  • Use Cases: Broadcast notifications, mobile push, email/SMS delivery
  • Integration: Native support for SQS, Lambda, HTTP endpoints
  • Special Features: Built-in support for SMS, email, and push notifications

Amazon EventBridge:

  • Model: Serverless event bus for event-driven architectures
  • Use Cases: Real-time stream processing, cross-account event sharing, SaaS integration
  • Advanced Features: Schema registry with automatic discovery, event archiving and replay, content-based filtering with 300+ filter patterns
  • Third-Party Integration: Direct integration with SaaS providers (Auth0, Zendesk, Datadog) without polling or custom webhooks

Architectural Patterns

Saga Pattern for Distributed Transactions

The Saga pattern maintains data consistency across multiple services without distributed transactions, using two primary coordination approaches:

Choreography (Decentralized)

Each microservice performs tasks independently and communicates through events. Local transactions publish domain events that trigger local transactions in other services.

Characteristics:

  • No central coordinator; services subscribe to each other's events
  • Message broker buffers requests until downstream components claim them
  • Common technologies: Apache Kafka, RabbitMQ, AWS SNS/SQS

Advantages:

  • Services remain loosely coupled, with no direct dependencies on each other
  • Simple to implement for straightforward workflows
  • Natural fit for event-driven architectures

Challenges:

  • Difficult to debug and trace control flow
  • No single source of truth for workflow state
  • Complex error handling across services
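A choreographed flow can be sketched with an in-memory event bus: each service subscribes to the previous service's event, and no component sees the whole workflow. The order/payment/shipping flow and all names here are hypothetical.

```python
class EventBus:
    """Minimal in-memory pub/sub bus for a choreographed saga sketch."""

    def __init__(self):
        self._subs = {}

    def subscribe(self, event_type, handler):
        self._subs.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        for handler in self._subs.get(event_type, []):
            handler(payload)

bus = EventBus()
trace = []

# Payment service reacts to OrderCreated, then emits PaymentCaptured.
bus.subscribe("OrderCreated",
              lambda e: (trace.append("payment"),
                         bus.publish("PaymentCaptured", e)))
# Shipping service reacts to PaymentCaptured.
bus.subscribe("PaymentCaptured", lambda e: trace.append("shipping"))

bus.publish("OrderCreated", {"order_id": 1})
```

Note how the control flow exists only implicitly in the subscriptions, which is exactly why choreographed sagas are hard to trace and debug.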

Orchestration (Centralized)

A saga orchestrator service drives each participant, telling it what to do and when, using request/asynchronous-response interactions.

Characteristics:

  • Central coordinator manages the entire workflow
  • Orchestrator communicates with participants using command messages
  • Technologies: IBM MQ for strong consistency, Kafka configured for orchestration, AWS Step Functions

Advantages:

  • Clear control flow and easier debugging
  • Centralized observability and monitoring
  • Explicit workflow state management
  • Better handling of complex business logic

Challenges:

  • Single point of failure (orchestrator)
  • Increased coupling to orchestrator service
  • Orchestrator can become a bottleneck
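The orchestration approach, including compensating actions on failure, can be sketched as a loop over (action, compensation) pairs: on any failure, compensations for the steps already completed run in reverse order. This is a minimal model under those assumptions, not any framework's API.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; roll back on failure.

    Returns True on full success, False after compensating.
    """
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            # Undo completed steps in reverse order (compensating transactions).
            for comp in reversed(done):
                comp()
            return False
    return True
```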

Selection Guidance: Choose choreography for simple workflows with few participants where loose coupling is prioritized. Choose orchestration for complex workflows requiring clear visibility, debugging capabilities, and centralized control.

Message Delivery Semantics

At-Most-Once

Messages may be lost but are never redelivered. Lowest overhead but risks data loss.

Use Cases: Non-critical metrics, log aggregation where occasional loss is acceptable

At-Least-Once

Messages won't be lost but may be delivered multiple times. Most common implementation.

Use Cases: Most production systems combine this with idempotent consumers

Implementation: Retry logic ensures delivery, but application must handle duplicates

Exactly-Once

Side effects are applied exactly once. This is the most difficult delivery semantic to implement.

Challenges: Exactly-once requires coordination between the broker and the application, which carries a high cost in system performance and complexity.

Implementation Approaches:

  1. Infrastructure Level: Kafka's Exactly-Once Semantics (EOS), introduced in version 0.11, combine idempotent producers with transactions so each message is processed exactly once within Kafka read-process-write pipelines
  2. Application Level: At-least-once delivery with idempotent consumers using deduplication
    • Store processed message IDs in cache/database with TTL matching retry window (24-72 hours)
    • Check dedupe store for duplicates and skip reprocessing
    • Achieves exactly-once effects without distributed transaction complexity
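The application-level approach can be sketched with an in-memory dedupe store standing in for the cache or database; the TTL and all names are illustrative. Redelivered messages (normal under at-least-once) are detected by ID and skipped, so the side effect runs once.

```python
import time

class IdempotentConsumer:
    """Skip redelivered messages by remembering processed IDs with a TTL."""

    def __init__(self, ttl_seconds=72 * 3600):
        self._seen = {}      # message_id -> expiry timestamp
        self._ttl = ttl_seconds

    def process(self, message_id, handler, payload):
        now = time.time()
        # Evict expired entries so the dedupe store doesn't grow unbounded.
        self._seen = {m: exp for m, exp in self._seen.items() if exp > now}
        if message_id in self._seen:
            return False     # duplicate delivery: skip the side effect
        handler(payload)     # apply the side effect exactly once
        self._seen[message_id] = now + self._ttl
        return True
```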

Critical Use Cases: Financial transactions, payment processing, trading systems, accounting where duplicate processing would cause serious issues

Dead Letter Queue (DLQ) Pattern

A dead letter queue is a special message queue that temporarily stores messages that cannot be processed due to errors.

Common Reasons for DLQ Routing:

  • Message contains deserialization errors or invalid data
  • Message exceeds queue/message length limits
  • Delivery count exceeds maximum retry limit
  • Message rejected by downstream service

Benefits:

  • Debugging Focus: Isolates problematic messages for investigation
  • Pipeline Continuity: Valid messages continue processing while errors are handled separately
  • Error Analysis: Messages contain valuable insights to prevent future issues

Best Practices:

  1. Retry Logic: Implement retry with defined limits before DLQ routing to prevent false positives
  2. Monitoring: Alert on DLQ growth to detect systemic issues
  3. Remediation Process: Establish workflow for analyzing and reprocessing DLQ messages
  4. Configuration: Set maxReceiveCount high enough for transient failures (avoid premature DLQ routing)
  5. Processing Plan: Always have a strategy for handling DLQ messages before enabling

Implementation Example (Kafka Connect): Configure separate Kafka topic as DLQ for deserialization errors while allowing valid messages to process normally
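The retry-then-DLQ flow can be sketched as follows, with a plain list standing in for the dead letter queue and `max_receive_count` mirroring the SQS-style setting. All names are illustrative.

```python
def consume(message, handler, dlq, max_receive_count=3):
    """Attempt the handler up to max_receive_count times, then route to DLQ."""
    for attempt in range(1, max_receive_count + 1):
        try:
            handler(message)
            return True                      # transient failure recovered
        except Exception as exc:
            last_error = exc
    # Retries exhausted: park the message (with context) for later analysis.
    dlq.append({"message": message,
                "attempts": max_receive_count,
                "error": str(last_error)})
    return False
```

Keeping the attempt count and last error alongside the parked message supports the remediation workflow described above.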

Decision Framework

Choose Kafka When:

  • Building event-driven architectures requiring message persistence and replay
  • Throughput exceeds 100K messages/second
  • Analytics pipelines and event sourcing patterns
  • Dedicated platform teams available
  • Long-term message retention needed (days to weeks)

Choose RabbitMQ When:

  • Traditional request-response patterns required
  • Complex routing logic essential (topic exchanges, headers matching)
  • Guaranteed delivery with acknowledgments is critical
  • Priority queuing needed
  • Moderate throughput sufficient (thousands to tens of thousands msg/sec)

Choose Redis Streams When:

  • Sub-millisecond latency critical
  • Message loss tolerable
  • Redis already in technology stack
  • Real-time notifications, cache invalidation, live dashboards
  • Operational simplicity prioritized

Choose NATS When:

  • Cloud-native and edge computing scenarios
  • IoT messaging at scale
  • Operational simplicity valued over maximum throughput
  • Single binary deployment preferred
  • Microservices requiring lightweight messaging

Choose Apache Pulsar When:

  • Multi-tenancy with strict isolation required
  • Geo-replication across regions essential
  • Both messaging and streaming capabilities needed
  • Enterprise environments with multiple internal customers
  • Tiered storage for cost optimization

Choose AWS Managed Services When:

  • SQS: Serverless architectures, decoupling AWS Lambda functions, simple queuing without operational overhead
  • SNS: Fanout notifications, mobile push, broadcast to multiple subscribers
  • EventBridge: Event-driven architectures, SaaS integration, cross-account events, content-based filtering

Production Best Practices

Performance Optimization

  • Batching: Group messages to reduce network overhead
  • Compression: Enable compression for large messages (Kafka: lz4, snappy)
  • Partitioning Strategy: Choose partition keys to distribute load evenly
  • Consumer Groups: Scale consumers horizontally within groups
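Producer-side batching can be sketched as simple chunking of pending messages; real clients typically also flush on a time threshold (e.g. Kafka's `linger.ms`) so small batches don't wait indefinitely.

```python
def batch(messages, max_batch_size):
    """Group pending messages into fixed-size batches to amortize
    per-request network overhead (size-based trigger only)."""
    return [messages[i:i + max_batch_size]
            for i in range(0, len(messages), max_batch_size)]
```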

Reliability Patterns

  • Circuit Breakers: Prevent cascading failures when downstream services fail
  • Backpressure Handling: Slow down producers when consumers can't keep up
  • Monitoring: Track queue depth, consumer lag, processing rates
  • Alerting: Set thresholds for DLQ messages, consumer lag spikes

Operational Considerations

  • Schema Evolution: Use schema registry (Kafka Schema Registry, Pulsar Schema Registry)
  • Message Ordering: Understand ordering guarantees per technology
  • Reprocessing: Design for idempotency to enable safe message replay
  • Testing: Use consumer group offsets to test with production data

Market Trends and Evolution

Kafka: Leads in enterprise adoption with strong backing from Confluent and Apache Foundation, continuously adding features like KRaft consensus and improved observability

RabbitMQ: Stable adoption in traditional enterprise environments with ongoing improvements in reliability and performance (now maintained by Broadcom following its acquisition of VMware)

Redis Streams: Benefits from Redis's explosive growth in cloud-native and microservices architectures, offering simplicity and speed

Hybrid Approaches: Many organizations use multiple technologies strategically - RabbitMQ for microservices communication and complex routing, Kafka for high-volume streaming and analytics, Redis for real-time notifications

Managed Services: Growing adoption of cloud-managed services (AWS SQS/SNS/EventBridge, Confluent Cloud, Azure Service Bus) to reduce operational overhead

