Transactional Outbox Pattern: Reliable Side-Effect Delivery in Microservices
Executive Summary
The transactional outbox pattern solves one of distributed systems' most stubborn problems: atomically committing a business state change and publishing a downstream event when the database and the message broker are two separate systems. Without it, any crash between the two writes leaves the system in an inconsistent state — either a silent data loss or a phantom event. The pattern's solution is elegant and low-ceremony: write the event into an outbox table in the same database transaction as the business record, then let a separate relay process forward it to the broker asynchronously. This document covers the full pattern — core mechanics, relay strategies (polling vs. CDC), the complementary inbox pattern for consumer idempotency, production observability requirements, and the anti-patterns that signal the pattern is being overused.
The Problem: Dual Write in Distributed Systems
Every service that both persists state and publishes events faces the dual-write problem. Two independent writes cannot be made atomic without a distributed coordination protocol:
- Write business data to the database — then crash before publishing → event is lost, downstream systems are never notified.
- Publish the event first, then write to the database — then crash before writing → a phantom event triggers downstream effects with no business record to back them.
Two-Phase Commit (2PC) would solve this at the protocol level but is prohibitively expensive at scale: it blocks participants, reduces throughput, and most message brokers do not participate in XA transactions. The outbox pattern trades 2PC for a simpler invariant: use the database itself as the source of truth for both business data and pending events.
Core Mechanics
The Outbox Table
The outbox table lives in the same database as the business tables. A canonical schema looks like:
CREATE TABLE outbox_tasks (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending', -- pending | processed | failed
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    processed_at TIMESTAMPTZ,
    retry_count  INT NOT NULL DEFAULT 0
);
The Atomic Write
The business handler inserts into both the domain table and the outbox table in a single transaction:
BEGIN;
INSERT INTO organizations (id, name, ...) VALUES (...);
INSERT INTO outbox_tasks (event_type, payload)
VALUES ('org.created', '{"org_id": "...", "name": "..."}');
COMMIT;
Either both records land, or neither does. The relay process is now the only actor responsible for forwarding the event — application code never talks to the broker directly.
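The same atomic write can be sketched in Go with the standard database/sql package. This is a minimal illustration, not a reference implementation: the execer interface, createOrg, inTx, and the recorder stand-in are hypothetical names, and the organizations table is assumed to have only id and name columns. The recorder exists only so the sketch runs without a database.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
)

// execer is the subset of *sql.Tx used here (hypothetical name), so the
// write logic can also run against a stand-in.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// createOrg issues both inserts through the same execer. When that execer
// is a *sql.Tx, the two rows commit or roll back together.
func createOrg(ctx context.Context, tx execer, orgID, name string) error {
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO organizations (id, name) VALUES ($1, $2)`, orgID, name); err != nil {
		return fmt.Errorf("insert organization: %w", err)
	}
	payload, err := json.Marshal(map[string]string{"org_id": orgID, "name": name})
	if err != nil {
		return fmt.Errorf("marshal payload: %w", err)
	}
	if _, err := tx.ExecContext(ctx,
		`INSERT INTO outbox_tasks (event_type, payload) VALUES ($1, $2)`,
		"org.created", string(payload)); err != nil {
		return fmt.Errorf("insert outbox row: %w", err)
	}
	return nil
}

// inTx runs fn in a transaction and commits only if fn returns nil.
func inTx(ctx context.Context, db *sql.DB, fn func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // has no effect once Commit has succeeded
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

// recorder is a stand-in execer that only records the SQL it receives.
type recorder struct{ queries []string }

func (r *recorder) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.queries = append(r.queries, q)
	return nil, nil
}

// demo runs createOrg against the recorder and returns the statements issued.
func demo() ([]string, error) {
	r := &recorder{}
	err := createOrg(context.Background(), r, "org-1", "Acme")
	return r.queries, err
}

func main() {
	queries, err := demo()
	if err != nil {
		panic(err)
	}
	for _, q := range queries {
		fmt.Println(q)
	}
	// With a real database the call site would look like:
	//   inTx(ctx, db, func(tx *sql.Tx) error { return createOrg(ctx, tx, id, name) })
}
```

Accepting an interface rather than *sql.DB makes it harder to record the event outside the business transaction by accident, since the caller has to hand in the transaction it already holds.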
The Relay (Outbox Processor)
A background process periodically reads pending rows, publishes them to the broker, then marks them processed:
SELECT id, event_type, payload
FROM outbox_tasks
WHERE status = 'pending'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED; -- safe for concurrent relay workers
After each successful publish, still inside the transaction that took the row locks (the locks are released when it commits):
UPDATE outbox_tasks SET status = 'processed', processed_at = now()
WHERE id = $1;
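One relay pass built from the two statements above might look like the following Go sketch. It assumes database/sql with a PostgreSQL driver that scans UUID columns into strings; task, forward, relayOnce, and the publish callback are hypothetical names. The ordering decision (stop at the first failed publish so later rows stay pending) is one possible policy, not something the pattern mandates.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

type task struct {
	ID, EventType string
	Payload       []byte
}

const claimSQL = `SELECT id, event_type, payload FROM outbox_tasks
WHERE status = 'pending' ORDER BY created_at LIMIT 100 FOR UPDATE SKIP LOCKED`

const markSQL = `UPDATE outbox_tasks SET status = 'processed', processed_at = now() WHERE id = $1`

// forward publishes tasks in order and marks each one after a successful
// publish. It stops at the first error so later rows stay pending, and
// returns how many tasks were both published and marked.
func forward(tasks []task, publish, mark func(task) error) (int, error) {
	for i, t := range tasks {
		if err := publish(t); err != nil {
			return i, fmt.Errorf("publish %s: %w", t.ID, err)
		}
		if err := mark(t); err != nil {
			return i, fmt.Errorf("mark %s: %w", t.ID, err)
		}
	}
	return len(tasks), nil
}

// relayOnce claims a batch, forwards it, and commits, all in one
// transaction so the row locks are held until the status updates land.
func relayOnce(ctx context.Context, db *sql.DB, publish func(task) error) (int, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback()

	rows, err := tx.QueryContext(ctx, claimSQL)
	if err != nil {
		return 0, err
	}
	var tasks []task
	for rows.Next() {
		var t task
		if err := rows.Scan(&t.ID, &t.EventType, &t.Payload); err != nil {
			rows.Close()
			return 0, err
		}
		tasks = append(tasks, t)
	}
	if err := rows.Err(); err != nil {
		return 0, err
	}

	n, ferr := forward(tasks, publish, func(t task) error {
		_, err := tx.ExecContext(ctx, markSQL, t.ID)
		return err
	})
	// Commit the rows already marked even if a later publish failed;
	// the remaining rows stay pending for the next pass.
	if err := tx.Commit(); err != nil {
		return 0, err
	}
	return n, ferr
}

// demo exercises forward with a publisher that fails on the third task.
func demo() (int, []string) {
	var marked []string
	tasks := []task{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	n, _ := forward(tasks,
		func(t task) error {
			if t.ID == "c" {
				return errors.New("broker down")
			}
			return nil
		},
		func(t task) error { marked = append(marked, t.ID); return nil })
	return n, marked
}

func main() {
	n, marked := demo()
	fmt.Println(n, marked) // prints: 2 [a b]
}
```

A crash after publish but before Commit leaves the row pending, so the next pass publishes it again. That is the at-least-once behaviour consumers have to tolerate.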
Relay Strategies: Polling vs. Change Data Capture
There are two fundamentally different approaches to implementing the relay, with meaningful trade-offs between them.
Polling Publisher
The relay runs on a timer (typically every 100ms–2s) and queries the outbox table directly.
Advantages:
- Simple to build and debug — pure application code, no infrastructure dependencies beyond the database.
- Works with any relational database.
- No additional operational components.
Disadvantages:
- Adds continuous query load to the database (SELECT ... FOR UPDATE SKIP LOCKED).
- At scale, each microservice running its own polling loop compounds database pressure: index bloat, lock contention, and connection exhaustion become real risks.
- Introduces artificial latency equal to the polling interval.
Recommendation: Start here. The schema is identical to the CDC approach, so switching later requires no application changes.
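A polling loop of this kind can be sketched in Go as follows. The poll function and its relay callback are hypothetical names; the callback stands for a single relay pass that returns the number of rows it forwarded. One assumption beyond the text: when a batch comes back full, the loop polls again immediately instead of waiting for the next tick, so a backlog drains faster than one batch per interval.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// poll calls relay until ctx is cancelled. After a partial or empty batch
// (or an error) it waits for the next tick; after a full batch it polls
// again right away because more rows are probably waiting.
func poll(ctx context.Context, interval time.Duration, batchSize int,
	relay func(context.Context) (int, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		n, err := relay(ctx)
		if err != nil {
			fmt.Println("relay error:", err) // a real service would log and count this
		}
		if err == nil && n == batchSize && ctx.Err() == nil {
			continue // backlog: skip the wait
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// demo feeds poll a scripted sequence of batch sizes and cancels the
// context once the backlog is drained. It returns how often relay ran.
func demo() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	sizes := []int{100, 100, 30}
	calls := 0
	poll(ctx, 10*time.Millisecond, 100, func(context.Context) (int, error) {
		calls++
		if calls > len(sizes) {
			cancel()
			return 0, nil
		}
		return sizes[calls-1], nil
	})
	return calls
}

func main() {
	fmt.Println("relay passes:", demo())
}
```

The interval sets the worst-case added latency for an idle outbox, which is the trade-off described in the disadvantages above.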
Change Data Capture (CDC) with Debezium
CDC tools such as Debezium read events directly from the database's Write-Ahead Log (WAL). PostgreSQL's logical replication exports committed changes as a stream; Debezium translates these into broker messages without ever querying the outbox table.
Advantages:
- Near-zero latency — events appear in the broker within milliseconds of commit.
- No additional load from SELECT queries on the application database.
- Scales horizontally without compounding database pressure.
Disadvantages:
- In the common Kafka Connect deployment, requires Kafka (or a compatible broker) as a target.
- WAL retention must be managed: Debezium holds a replication slot, and if it falls behind, the WAL grows without bound and can fill the disk.
- More operational surface area — Debezium connector, Kafka Connect cluster, slot health monitoring.
The critical production risk with CDC is not CPU; it is WAL retention. WAL accumulates until the replication slot's LSN advances. Monitor slot lag, the distance between pg_current_wal_lsn() and the slot's restart_lsn and confirmed_flush_lsn columns in pg_replication_slots, and alert aggressively.
Recommendation: Adopt CDC when polling latency or database load becomes a measurable problem, or when the team already operates Kafka infrastructure. The outbox table schema is the same either way.
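To make the slot lag concrete: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a byte position), so the WAL retained for a slot is the byte difference between the current WAL position and the slot's restart_lsn. The Go sketch below computes that difference client-side; parseLSN and slotLagBytes are hypothetical helper names, and the query constant shows the server-side equivalent using pg_wal_lsn_diff.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// slotLagQuery is the server-side equivalent of slotLagBytes below.
const slotLagQuery = `SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots`

// parseLSN converts the textual form "X/Y" (two hexadecimal halves) into
// a 64-bit byte position.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

// slotLagBytes returns how many bytes of WAL lie between the slot's
// restart position and the current WAL position.
func slotLagBytes(current, restart string) (uint64, error) {
	c, err := parseLSN(current)
	if err != nil {
		return 0, err
	}
	r, err := parseLSN(restart)
	if err != nil {
		return 0, err
	}
	if r > c {
		return 0, nil
	}
	return c - r, nil
}

func main() {
	lag, err := slotLagBytes("1/00000000", "0/FF000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(lag) // prints: 16777216 (16 MiB)
}
```

An alert threshold on this number is best chosen relative to free disk space on the WAL volume, since that is what an abandoned slot eventually exhausts.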
Delivery Guarantees and Consumer Idempotency
The outbox pattern guarantees at-least-once delivery, not exactly-once. The classic failure scenario:
- Relay publishes the event to the broker successfully.
- Relay crashes before updating status = 'processed'.
- On restart, the relay re-reads the pending row and publishes again.
The broker now has a duplicate. Downstream consumers must handle it. Two standard approaches:
Idempotency Key Check
Every event carries a stable, unique identifier (e.g., the outbox row's UUID). The consumer maintains a processed_events table:
CREATE TABLE processed_events (
event_id UUID PRIMARY KEY,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Before processing: INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT DO NOTHING — if zero rows are inserted, the event is a duplicate and is skipped.
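A consumer-side handler built on that check could look like the Go sketch below. The names execer, handleOnce, memStore, and affected are hypothetical; memStore imitates the primary-key constraint in memory so the example runs without a database. In a real consumer the execer would be a *sql.Tx, so that the claim on the event ID and the business effect commit together.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx used here (hypothetical name).
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// handleOnce claims eventID in processed_events and runs apply only when
// the claim actually inserted a row. It reports whether apply was run.
func handleOnce(ctx context.Context, tx execer, eventID string, apply func() error) (bool, error) {
	res, err := tx.ExecContext(ctx,
		`INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT DO NOTHING`,
		eventID)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	if n == 0 {
		return false, nil // duplicate delivery: skip
	}
	return true, apply()
}

// affected is a minimal sql.Result carrying a fixed row count.
type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

// memStore imitates the primary-key constraint on processed_events.
type memStore struct{ seen map[string]bool }

func (m *memStore) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	id := args[0].(string)
	if m.seen[id] {
		return affected(0), nil
	}
	m.seen[id] = true
	return affected(1), nil
}

// demo delivers evt-1 twice and evt-2 once, and counts applied effects.
func demo() int {
	store := &memStore{seen: map[string]bool{}}
	applied := 0
	for _, id := range []string{"evt-1", "evt-1", "evt-2"} {
		if _, err := handleOnce(context.Background(), store, id, func() error {
			applied++
			return nil
		}); err != nil {
			panic(err)
		}
	}
	return applied
}

func main() {
	fmt.Println(demo()) // prints: 2
}
```

If apply fails inside a real transaction, rolling back also removes the processed_events row, so a later redelivery is processed rather than skipped.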
The Inbox Pattern
The inbox is the consumer-side complement to the outbox. Instead of executing business logic directly on receipt, the consumer first writes the incoming event to a local inbox table and only then acknowledges the message. A separate background processor dequeues each event and handles it once, marking the inbox row as handled in the same transaction as the local business operation.
The inbox provides:
- Deduplication: the inbox table's unique constraint on event_id rejects duplicates at the database level.
- Consumer auditability: the inbox is a persistent record of all events received, in arrival order.
- Decoupling from broker availability: events already stored in the inbox can be processed even if the broker is momentarily unavailable.
Outbox + Inbox together provide end-to-end reliability: the outbox guarantees the event leaves the producer; the inbox guarantees it is processed exactly once by the consumer.
Comparison with Related Patterns
Outbox vs. Saga
These patterns operate at different levels and are often combined:
| Dimension | Transactional Outbox | Saga Pattern |
|---|---|---|
| Concern | Reliable event publishing | Coordinating a multi-step distributed transaction |
| Scope | Single service, single step | Multiple services, multiple steps |
| Failure recovery | Relay retries until published | Compensating transactions undo prior steps |
| Complexity | Low | Medium–High |
The outbox is typically used inside a saga: each saga step publishes its transition event through the outbox, guaranteeing the next saga participant is notified reliably. Without the outbox, a saga step can commit locally but fail to notify the next participant.
Outbox vs. 2PC (Two-Phase Commit)
| Dimension | Transactional Outbox | Two-Phase Commit |
|---|---|---|
| Atomicity | Per-database ACID + async relay | Distributed protocol across participants |
| Throughput | High (no blocking coordinator) | Low (all participants blocked during prepare phase) |
| Infrastructure | Database + relay process | Distributed transaction manager |
| Broker support | Not required | Most brokers do not support XA |
The outbox wins on every operational dimension at the cost of accepting eventual consistency rather than synchronous consistency.
Go + PostgreSQL Implementation Patterns
The coco-outbox library (used in the Zylos/coco-workspace ecosystem) provides a reusable implementation of this pattern for Go services backed by PostgreSQL.
Key API surface
// Record an event atomically within a transaction
store.WithTx(tx).Record(ctx, outbox.Task{
    EventType: "org.created",
    Payload:   json.RawMessage(`{"org_id": "..."}`),
})

// The relay polls and dispatches
service.Start(ctx) // begins background polling
The WithTx(tx) call is the critical integration point — it binds the outbox write to the caller's existing transaction, enforcing the atomic invariant at the API level rather than relying on caller discipline.
Integration Phase Pattern (cws-core case)
When integrating coco-outbox into an existing service that previously had its own ad-hoc init_task implementation:
- Phase 1 (module replacement): Replace the bespoke init_task implementation with coco-outbox's Store/Service API. The event is still recorded in a standalone call (not within the business transaction). This is acceptable as an interim state: it does not yet close the dual-write window, but it establishes the API boundary.
- Phase 2 (transaction integration): Once handler-level transaction support lands (e.g., Issue #79 in cws-core), move the Record call inside the business transaction using store.WithTx(tx). No schema change is required; only the call site moves.
This two-phase approach decouples module replacement from full consistency guarantees, reducing the blast radius of each change.
Production Observability Requirements
The outbox relay is a silent background process — without monitoring, it can fail for hours with no visible symptom while events queue up.
Metrics to instrument
| Metric | Alert Condition |
|---|---|
| Outbox lag — time between event creation and successful publish | Alert if P99 exceeds SLA (e.g., > 30s for non-realtime, > 500ms for realtime) |
| Pending row count (rows in status = 'pending') | Alert if count grows monotonically (relay is stuck or falling behind) |
| Retry count distribution | Alert if any row exceeds max retries |
| DLQ size — events routed to dead-letter after max retries | Alert on any non-zero growth |
| Relay process health — PM2/process supervisor status | Alert on crash or restart |
| WAL slot lag (CDC only) | Alert if slot lag exceeds configured threshold |
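Two of the alert conditions in the table reduce to small computations, sketched here in Go. The function names p99, growing, and sampleLags are hypothetical, the percentile uses the nearest-rank method, and "grows monotonically" is read as a strict increase across every consecutive sample in the window. A production setup would more likely express both in its metrics system rather than in application code.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// p99 returns the 99th-percentile lag using the nearest-rank method.
func p99(lags []time.Duration) time.Duration {
	if len(lags) == 0 {
		return 0
	}
	s := append([]time.Duration(nil), lags...) // sort a copy, keep the input intact
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	rank := (len(s)*99 + 99) / 100 // ceil(0.99 * n)
	return s[rank-1]
}

// growing reports whether the pending-row count rose between every pair
// of consecutive samples, the signal that the relay is stuck or behind.
func growing(counts []int) bool {
	if len(counts) < 2 {
		return false
	}
	for i := 1; i < len(counts); i++ {
		if counts[i] <= counts[i-1] {
			return false
		}
	}
	return true
}

// sampleLags returns 100 synthetic lag samples: 100ms down to 1ms.
func sampleLags() []time.Duration {
	lags := make([]time.Duration, 0, 100)
	for ms := 100; ms >= 1; ms-- {
		lags = append(lags, time.Duration(ms)*time.Millisecond)
	}
	return lags
}

func main() {
	fmt.Println(p99(sampleLags()), growing([]int{10, 40, 90}), growing([]int{10, 40, 5}))
	// prints: 99ms true false
}
```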
Dead Letter Queue
Events that fail to publish after N retries must not be silently dropped. Route them to a DLQ with the original event, error details, and retry history. This enables:
- Manual replay after fixing the downstream issue.
- SRE investigation of the failure root cause.
- Audit trail completeness.
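The retry-then-dead-letter decision can be sketched in Go as below. All names (task, attempt, deadLetter, recordFailure) and the limit of three retries are assumptions for illustration; the status values match the outbox schema earlier in this document. The point of the sketch is that the dead-letter record carries the original event, the last error, and the full retry history.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// attempt is one failed publish, kept as retry history.
type attempt struct {
	At  time.Time
	Err string
}

type task struct {
	ID         string
	Payload    []byte
	RetryCount int
	Status     string // pending | processed | failed
	History    []attempt
}

// deadLetter bundles what an operator needs for investigation and replay.
type deadLetter struct {
	Task      task
	LastError string
}

// recordFailure updates t after a failed publish. It returns nil while
// retries remain, and a deadLetter once maxRetries is reached.
func recordFailure(t *task, cause error, now time.Time, maxRetries int) *deadLetter {
	t.RetryCount++
	t.History = append(t.History, attempt{At: now, Err: cause.Error()})
	if t.RetryCount < maxRetries {
		return nil // stays pending and will be retried
	}
	t.Status = "failed"
	return &deadLetter{Task: *t, LastError: cause.Error()}
}

// demo fails the same task repeatedly until it is dead-lettered.
func demo() (*deadLetter, int) {
	t := &task{ID: "evt-1", Status: "pending"}
	now := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	attempts := 0
	for {
		attempts++
		if dl := recordFailure(t, errors.New("broker unavailable"), now, 3); dl != nil {
			return dl, attempts
		}
	}
}

func main() {
	dl, attempts := demo()
	fmt.Println(attempts, dl.Task.Status, len(dl.Task.History)) // prints: 3 failed 3
}
```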
When NOT to Use the Outbox Pattern
The pattern adds real complexity: an extra table, a relay process, observability requirements, and consumer idempotency logic. It is not the right default for every situation.
Avoid the outbox pattern when:
- The service is a monolith with a single database. The outbox solves an inter-service consistency problem. When every side effect is a write to that same database, one local transaction already makes them atomic and no relay is needed.
- The downstream action is synchronous and user-visible. If the user is waiting for the action and the UI reflects its result immediately, eventual consistency may create a confusing experience. Synchronous calls with retry may be simpler.
- No message broker exists. Introducing a broker just to support the outbox relay is rarely justified for a single use case. Consider whether the use case requires pub/sub at all.
- The system can tolerate missed events. If the downstream effect is best-effort (e.g., cache warming, analytics), a best-effort async call without the outbox overhead may be sufficient.
- The failure window is observable and recoverable. If the service can detect and manually repair a missed event (e.g., a reconciliation job), the simpler dual-write may be acceptable.
The key question is: what is the business cost of a missed event? If the answer is "critical — it causes billing errors, missing notifications, or data inconsistency," the outbox pattern is justified. If the answer is "low — it delays a cache refresh," simpler alternatives are likely better.
Operational Checklist
For teams deploying the outbox pattern in production:
- Outbox table indexed on (status, created_at) to support efficient polling
- SELECT ... FOR UPDATE SKIP LOCKED used by relay to support concurrent workers
- Relay process supervised (PM2, systemd, Kubernetes liveness probe)
- Outbox lag metric instrumented and alerted
- Pending row count alerted on monotonic growth
- DLQ implemented for events exceeding max retries
- Consumer idempotency implemented (inbox table or idempotency key check)
- WAL slot lag monitored if using CDC/Debezium
- Periodic cleanup job for processed rows older than retention window
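The last checklist item, the cleanup job, can be sketched in Go as follows. The names execer, purgeProcessed, memTable, row, and affected are hypothetical, and the seven-day retention window is an arbitrary example. The in-memory table only emulates the DELETE so the sketch runs without a database; against PostgreSQL the execer would be a *sql.DB or *sql.Tx.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// execer is the subset of *sql.DB and *sql.Tx used here (hypothetical name).
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// purgeProcessed deletes processed rows whose processed_at is older than
// the retention window and reports how many rows were removed.
func purgeProcessed(ctx context.Context, db execer, now time.Time, retention time.Duration) (int64, error) {
	res, err := db.ExecContext(ctx,
		`DELETE FROM outbox_tasks WHERE status = 'processed' AND processed_at < $1`,
		now.Add(-retention))
	if err != nil {
		return 0, fmt.Errorf("purge outbox: %w", err)
	}
	return res.RowsAffected()
}

// affected is a minimal sql.Result carrying a fixed row count.
type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

type row struct {
	status      string
	processedAt time.Time
}

// memTable emulates the DELETE above against an in-memory slice.
type memTable struct{ rows []row }

func (m *memTable) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	cutoff := args[0].(time.Time)
	var kept []row
	for _, r := range m.rows {
		if r.status == "processed" && r.processedAt.Before(cutoff) {
			continue
		}
		kept = append(kept, r)
	}
	removed := len(m.rows) - len(kept)
	m.rows = kept
	return affected(removed), nil
}

// demo purges with a 7-day window and returns rows removed and remaining.
func demo() (int64, int) {
	now := time.Date(2024, 1, 31, 0, 0, 0, 0, time.UTC)
	table := &memTable{rows: []row{
		{"processed", now.AddDate(0, 0, -10)}, // older than the window: purged
		{"processed", now.AddDate(0, 0, -1)},  // inside the window: kept
		{"pending", time.Time{}},              // not processed: never purged
	}}
	removed, err := purgeProcessed(context.Background(), table, now, 7*24*time.Hour)
	if err != nil {
		panic(err)
	}
	return removed, len(table.rows)
}

func main() {
	fmt.Println(demo()) // prints: 1 2
}
```

On a large table it may be worth deleting in bounded batches so the job does not hold long locks or generate a burst of WAL, which matters in particular when a CDC slot is reading the same log.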
Key Takeaways
- The transactional outbox pattern eliminates the dual-write problem by making the database — not the broker — the single atomic unit of truth for both state and pending events.
- Two relay strategies: polling (simple, start here) and CDC (low latency, lower DB load, higher operational complexity). The outbox schema is identical for both, so migration is non-breaking.
- The pattern guarantees at-least-once delivery. Exactly-once processing requires consumer-side idempotency — either an inbox table or an idempotency key log.
- Outbox + Inbox together provide end-to-end reliability across the producer-consumer boundary without distributed transactions.
- Observability is non-optional in production: instrument outbox lag, pending row count, DLQ size, and relay process health.
- Don't default to the outbox everywhere — evaluate the business cost of a missed event against the complexity the pattern adds.

