Webhook Reliability and Delivery Guarantees for AI Agent Platforms
Executive Summary
Webhooks are the connective tissue of modern event-driven architectures, and their reliability characteristics directly determine system-wide correctness. When an AI agent misses a payment confirmation, triggers a duplicate action, or processes events out of order, the consequences cascade unpredictably across business logic that was never designed for those edge cases. Getting webhooks right is not optional — it is a fundamental infrastructure concern.
The core insight that every practitioner eventually learns is that exactly-once delivery is a theoretical ideal, not an engineering reality. Even Stripe, GitHub, and AWS — who collectively process billions of webhook deliveries — operate on at-least-once semantics. The practical solution is to combine at-least-once delivery from the sender with idempotent receivers that can absorb duplicate deliveries without producing duplicate effects. This pairing, supported by idempotency keys and persistent deduplication stores, is the actual production-grade pattern.
For AI agent platforms specifically, the stakes are higher than for conventional SaaS. Agents act on webhook payloads autonomously: an agent that receives a duplicate task.completed event may mark a pipeline step done twice, skip a human review gate, or dispatch a second notification to a user. The infrastructure reliability layer must be treated as a first-class concern rather than delegated to whatever the hosting framework provides by default.
This article covers the full stack: delivery semantics, retry strategies with exponential backoff and circuit breakers, HMAC signature verification and replay attack prevention, dead letter queues, ordering challenges, rate limiting, infrastructure patterns from Stripe and GitHub, testing approaches, and the emerging standards (CloudEvents and Standard Webhooks) that are beginning to regularize the space. A dedicated section addresses AI agent-specific considerations that go beyond typical SaaS webhook guidance.
Delivery Semantics: At-Least-Once vs. Exactly-Once
The Fundamental Impossibility
Exactly-once delivery is the holy grail of distributed messaging — and it is essentially unachievable across network boundaries without distributed transaction protocols (like two-phase commit) that introduce unacceptable latency and complexity. The fundamental problem: the sender must know whether the receiver processed the message, but the acknowledgment itself can be lost. If the sender retries after a network timeout, the receiver may process the event twice. If the sender does not retry, the receiver may never process it at all.
This is not a solvable problem at the transport layer. It is a consequence of the Two Generals problem: over an unreliable network, sender and receiver can never reach certain agreement that a message was both delivered and processed. The correct response is to accept it and build systems that tolerate it.
At-Least-Once: The Production Standard
At-least-once delivery guarantees that every event will be delivered — eventually — even if it means delivering it multiple times. This is the model used by:
- Stripe: Webhooks may be delivered more than once; endpoints should be idempotent
- GitHub: Does not automatically retry failed deliveries, but exposes delivery logs and manual redelivery — and duplicates are still possible
- Slack: Explicitly states that event delivery may be attempted multiple times
- AWS SNS/SQS: At-least-once is the default; exactly-once requires FIFO queues with additional configuration
At-least-once is achievable because the sender only needs to retry until it receives a 2xx acknowledgment. It does not need to coordinate a distributed transaction or verify that the receiver's side effects completed.
At-Most-Once: The Simple but Lossy Alternative
At-most-once delivery fires and forgets — the sender makes one attempt and moves on regardless of outcome. This is appropriate only for genuinely loss-tolerant use cases like analytics events or heartbeat pings. For anything that drives business logic, at-most-once is the wrong choice.
Achieving Practical Exactly-Once: Idempotent Consumers
The industry standard pattern combines at-least-once delivery with idempotent processing on the receiver side:
// Idempotent webhook handler (TypeScript/Node.js)
import { createClient } from 'redis';
import { db } from './database';
const redis = createClient();
async function handleWebhookEvent(event: WebhookEvent): Promise<void> {
const idempotencyKey = `webhook:processed:${event.id}`;
// Attempt to acquire the lock atomically
// NX = only set if not exists, EX = expiry in seconds
const acquired = await redis.set(idempotencyKey, '1', {
NX: true,
EX: 86400, // 24-hour window to detect duplicates
});
if (!acquired) {
// Already processed — return 200 to prevent sender retrying
console.log(`Duplicate event ${event.id} — skipping`);
return;
}
// Process event exactly once; release the lock if processing fails
// so the sender's retry can succeed instead of being treated as a duplicate
try {
  await db.transaction(async (tx) => {
    await processBusinessLogic(tx, event);
    await tx.insert('processed_events', {
      event_id: event.id,
      processed_at: new Date(),
      payload_hash: hashPayload(event),
    });
  });
} catch (error) {
  await redis.del(idempotencyKey);
  throw error; // surface a 5xx so the sender retries
}
}
The Redis SET NX operation is atomic — only one instance of a horizontally scaled receiver will succeed in claiming the lock, preventing parallel duplicate processing. The 24-hour TTL ensures the deduplication store does not grow unbounded.
A simpler database-only approach using a unique constraint:
async function handleWebhookIdempotent(event: WebhookEvent): Promise<void> {
try {
// Unique constraint on event_id prevents double insertion
await db.insert('processed_events').values({
event_id: event.id,
processed_at: new Date(),
});
} catch (error) {
if (isUniqueConstraintViolation(error)) {
// Already processed — acknowledge and return
return;
}
throw error;
}
// Process only if the insert succeeded. Caveat: if processing fails here,
// the event is marked processed but never applied — run both steps in a
// single transaction when the database supports it.
await processBusinessLogic(event);
}
Idempotency Keys in Practice
Idempotency keys are the unique identifiers that make deduplication possible. Best practices:
- Use the sender's event ID, not a locally generated ID. The webhook payload's `id` field (or `event_id`, depending on the platform) is stable across retries — the same event keeps its ID each time it is delivered.
- Use the idempotency key as the storage key. Store processed event IDs in a database or Redis. Check this store before processing.
- Include payload hashing for corruption detection. If the same event ID arrives with a different payload, it indicates tampering or a bug — reject and alert.
- Set appropriate TTLs. Stripe recommends checking idempotency for at least 24 hours; 48–72 hours is safer for systems with long retry windows.
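The payload-hash check from the practices above can be sketched as follows — a minimal in-memory version, where `checkEvent` and the `Map` store are illustrative stand-ins for a Redis- or database-backed implementation with TTLs:

```typescript
import { createHash } from 'crypto';

// Hash the payload so a duplicate event ID with a different body can be
// detected. JSON.stringify is key-order-sensitive; production systems
// should canonicalize keys before hashing.
function hashPayload(payload: unknown): string {
  return createHash('sha256').update(JSON.stringify(payload)).digest('hex');
}

// In-memory store stands in for Redis/DB; entries would carry a TTL in production
const seen = new Map<string, string>();

type DedupResult = 'new' | 'duplicate' | 'conflict';

function checkEvent(eventId: string, payload: unknown): DedupResult {
  const hash = hashPayload(payload);
  const prior = seen.get(eventId);
  if (prior === undefined) {
    seen.set(eventId, hash);
    return 'new'; // first delivery — process it
  }
  if (prior === hash) return 'duplicate'; // benign retry — ack and skip
  return 'conflict'; // same ID, different body — reject and alert
}
```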
Retry Strategies
Exponential Backoff
The simplest retry strategy — fixed intervals — is dangerous in production. If all subscribers to a platform retry at T+5s after a failure, the recovering service faces a thundering herd at exactly that moment. Exponential backoff spreads retries across time:
delay(n) = base * (multiplier ^ n)
Typical parameters:
- `base`: 1–5 seconds
- `multiplier`: 2
- Max delay: 1–24 hours
- Max attempts: 3–10 (depending on SLA)
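As a concrete illustration of the formula with parameters in these ranges (the specific values below are illustrative, not prescriptive):

```typescript
// delay(n) = base * multiplier^n, capped at a maximum delay
function retrySchedule(
  baseSeconds: number,
  multiplier: number,
  maxDelaySeconds: number,
  maxAttempts: number,
): number[] {
  const delays: number[] = [];
  for (let n = 0; n < maxAttempts; n++) {
    delays.push(Math.min(baseSeconds * Math.pow(multiplier, n), maxDelaySeconds));
  }
  return delays;
}

// base 5s, multiplier 2, cap 1h, 7 attempts → 5s, 10s, 20s, 40s, 80s, 160s, 320s
```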
Real-world schedules used by major platforms:
| Platform | Retry Schedule |
|---|---|
| Stripe | Immediately, then exponentially up to 3 days (total ~87 attempts) |
| Svix | 5s, 5m, 30m, 2h, 5h, 10h, 10h (7 attempts over ~27h) |
| GitHub | No automatic retries; failed deliveries are redelivered manually via UI or API |
| Shopify | 19 attempts over 48 hours |
Jitter: Breaking Synchronization
Even exponential backoff can cause synchronized spikes if many subscribers failed at the same moment (e.g., during a platform-wide outage recovery). Jitter adds randomness:
function computeBackoffDelay(
attempt: number,
baseDelayMs: number = 1000,
maxDelayMs: number = 300_000, // 5 minutes
): number {
// Exponential: base * 2^attempt
const exponential = baseDelayMs * Math.pow(2, attempt);
const capped = Math.min(exponential, maxDelayMs);
// Full jitter: random value in [0, capped]
// This is generally preferred over "equal jitter" for distributed systems
return Math.random() * capped;
}
// Usage in a retry loop — with the helpers the function relies on defined
class NonRetryableError extends Error {}
class MaxAttemptsExceeded extends Error {}
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function deliverWithRetry(
  endpoint: string,
  payload: unknown,
  maxAttempts: number = 7,
): Promise<void> {
for (let attempt = 0; attempt < maxAttempts; attempt++) {
try {
const response = await fetch(endpoint, {
method: 'POST',
body: JSON.stringify(payload),
headers: { 'Content-Type': 'application/json' },
signal: AbortSignal.timeout(30_000), // 30s timeout
});
if (response.ok) return; // Success
// Don't retry 4xx (except 429) — they won't succeed on retry
if (response.status >= 400 && response.status < 500 && response.status !== 429) {
throw new NonRetryableError(`Received ${response.status} — moving to DLQ`);
}
// 5xx or 429 — retry with backoff
} catch (error) {
if (error instanceof NonRetryableError) throw error;
// Network errors, timeouts — retry
}
if (attempt < maxAttempts - 1) {
const delay = computeBackoffDelay(attempt);
await sleep(delay);
}
}
throw new MaxAttemptsExceeded(`Failed after ${maxAttempts} attempts`);
}
Adding jitter substantially reduces synchronized retry spikes in real-world systems; AWS's published analysis of backoff-and-jitter strategies found full jitter among the most effective variants at smoothing contention.
Response Code Handling Matrix
Not all failures should be retried:
| HTTP Status | Meaning | Action |
|---|---|---|
| 2xx | Success | Acknowledge, mark delivered |
| 400 Bad Request | Malformed payload | Move to DLQ immediately — retrying won't help |
| 401 Unauthorized | Invalid credentials | Move to DLQ, alert operator |
| 404 Not Found | Endpoint removed | Disable endpoint, notify subscriber |
| 408 Request Timeout | Slow endpoint | Retry with backoff |
| 429 Too Many Requests | Rate limited | Retry with backoff, honor Retry-After header |
| 5xx Server Error | Transient failure | Retry with backoff |
| Network timeout | Infrastructure issue | Retry with backoff |
| DNS resolution failure | Configuration error | Retry limited times, then alert |
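The matrix above can be collapsed into a routing function — a sketch, with the action names chosen for illustration:

```typescript
type DeliveryAction = 'ack' | 'retry' | 'dlq' | 'disable';

// Map an HTTP response status to the delivery action from the matrix above
function classifyResponse(status: number): DeliveryAction {
  if (status >= 200 && status < 300) return 'ack';      // delivered
  if (status === 404) return 'disable';                 // endpoint removed
  if (status === 408 || status === 429) return 'retry'; // slow or rate limited
  if (status >= 400 && status < 500) return 'dlq';      // won't succeed on retry
  return 'retry';                                       // 5xx / transient
}
```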
Circuit Breakers
Where exponential backoff operates at the individual event level, circuit breakers operate at the endpoint level. They prevent sending any events to an endpoint that has been consistently failing — giving it space to recover without being hammered:
enum CircuitState {
CLOSED = 'CLOSED', // Normal operation
OPEN = 'OPEN', // Blocking requests
HALF_OPEN = 'HALF_OPEN', // Testing recovery
}
class WebhookCircuitBreaker {
private state: CircuitState = CircuitState.CLOSED;
private failureCount: number = 0;
private lastFailureTime?: Date;
private successCount: number = 0;
constructor(
private readonly failureThreshold: number = 5,
private readonly recoveryTimeMs: number = 60_000, // 1 minute
private readonly halfOpenSuccessThreshold: number = 2,
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (this.shouldAttemptReset()) {
this.state = CircuitState.HALF_OPEN;
this.successCount = 0;
} else {
// Route to holding queue instead
throw new CircuitOpenError('Circuit is OPEN — routing to holding queue');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess(): void {
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= this.halfOpenSuccessThreshold) {
this.state = CircuitState.CLOSED;
this.failureCount = 0;
}
}
this.failureCount = 0;
}
private onFailure(): void {
this.failureCount++;
this.lastFailureTime = new Date();
if (this.failureCount >= this.failureThreshold) {
this.state = CircuitState.OPEN;
}
}
private shouldAttemptReset(): boolean {
if (!this.lastFailureTime) return false;
return Date.now() - this.lastFailureTime.getTime() > this.recoveryTimeMs;
}
}
When a circuit opens, new events bypass the delivery queue and go to a holding queue. When the circuit closes (after the recovery window), the holding queue is drained at a controlled rate — preventing the thundering herd that would occur if all queued events were dispatched simultaneously.
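A sketch of that controlled drain — the `deliver` callback and the pacing rate are assumptions for illustration:

```typescript
// Release held events at a fixed rate instead of all at once,
// so the recovering endpoint is not immediately overwhelmed
async function drainHoldingQueue<T>(
  held: T[],
  deliver: (event: T) => Promise<void>,
  maxPerSecond: number,
): Promise<number> {
  const intervalMs = 1000 / maxPerSecond;
  let drained = 0;
  for (const event of held) {
    await deliver(event); // a failure here would re-open the circuit in practice
    drained++;
    await new Promise<void>(res => setTimeout(res, intervalMs));
  }
  return drained;
}
```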
HMAC Signature Verification
Why Signatures Matter
Without signature verification, any actor who knows your webhook endpoint URL can send arbitrary payloads to it. For an AI agent that acts autonomously on webhook content, this is a critical security surface: a malicious actor could trigger agent actions, inject false data, or cause denial-of-service.
HMAC (Hash-based Message Authentication Code) solves this by using a shared secret known only to the sender and receiver. The sender signs the payload, and the receiver verifies the signature before processing.
The HMAC-SHA256 Standard
The overwhelming industry standard is HMAC-SHA256, used by Stripe, GitHub, Shopify, Slack, and the Standard Webhooks specification:
import { createHmac, timingSafeEqual } from 'crypto';
interface WebhookVerificationOptions {
secret: string;
toleranceSeconds?: number; // Default: 300 (5 minutes)
}
function verifyWebhookSignature(
rawBody: Buffer,
headers: Record<string, string>,
options: WebhookVerificationOptions,
): boolean {
const { secret, toleranceSeconds = 300 } = options;
// Extract timestamp and signatures from header
// Stripe format: "t=1492774577,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a05bd539313b8d25746afe"
const sigHeader = headers['stripe-signature'] ?? headers['webhook-signature'];
if (!sigHeader) return false;
const parts = Object.fromEntries(
sigHeader.split(',').map(part => part.split('=') as [string, string])
);
const timestamp = parseInt(parts['t'], 10);
const receivedSignatures = sigHeader
.split(',')
.filter(p => p.startsWith('v1='))
.map(p => p.slice(3));
// 1. Validate timestamp to prevent replay attacks
const now = Math.floor(Date.now() / 1000);
if (Math.abs(now - timestamp) > toleranceSeconds) {
throw new WebhookError('Timestamp outside tolerance window — possible replay attack');
}
// 2. Compute expected signature
// Signed payload = timestamp + "." + raw body
const signedPayload = `${timestamp}.${rawBody.toString('utf8')}`;
const expectedSignature = createHmac('sha256', secret)
.update(signedPayload)
.digest('hex');
// 3. Constant-time comparison to prevent timing attacks
const expectedBuffer = Buffer.from(expectedSignature, 'hex');
return receivedSignatures.some(sig => {
const sigBuffer = Buffer.from(sig, 'hex');
if (sigBuffer.length !== expectedBuffer.length) return false;
return timingSafeEqual(sigBuffer, expectedBuffer);
});
}
Why Timing-Safe Comparison is Non-Negotiable
Standard string comparison (===) short-circuits on the first mismatched character. An attacker can measure response times to determine how many leading characters of their guessed signature were correct, eventually reconstructing the valid signature through statistical analysis. timingSafeEqual compares all bytes in constant time, eliminating this timing side-channel.
Replay Attack Prevention
HMAC alone prevents payload tampering but not replay attacks. An attacker who captures a valid request (including its valid signature) can replay it hours or days later. The defense is timestamp inclusion:
- The sender includes a timestamp in the signature header (`t=1492774577`)
- The sender includes the timestamp in the signed data (so the signature is over `"timestamp.body"`, not just `"body"`)
- The receiver rejects any request where `|now - timestamp| > tolerance`
// Middleware for Express — reject stale webhooks
function webhookTimestampGuard(toleranceSeconds: number = 300) {
return (req: Request, res: Response, next: NextFunction) => {
const sigHeader = req.headers['stripe-signature'] as string;
const timestampStr = sigHeader?.match(/t=(\d+)/)?.[1];
if (!timestampStr) {
return res.status(400).json({ error: 'Missing timestamp in signature header' });
}
const timestamp = parseInt(timestampStr, 10);
const now = Math.floor(Date.now() / 1000);
if (Math.abs(now - timestamp) > toleranceSeconds) {
return res.status(400).json({ error: 'Request timestamp too old — possible replay attack' });
}
next();
};
}
Standard tolerance window: 5 minutes (300 seconds). This accommodates clock skew between distributed systems while keeping the replay window small. Some high-security implementations use 3 minutes.
GitHub's Webhook Verification Pattern
GitHub uses a slightly different scheme — the signature is in X-Hub-Signature-256 as sha256=<hex>:
function verifyGitHubWebhook(
payload: string,
signature: string,
secret: string,
): boolean {
const expected = `sha256=${createHmac('sha256', secret).update(payload).digest('hex')}`;
const expectedBuffer = Buffer.from(expected);
const receivedBuffer = Buffer.from(signature);
if (expectedBuffer.length !== receivedBuffer.length) return false;
return timingSafeEqual(expectedBuffer, receivedBuffer);
}
Note: GitHub does not include a timestamp in the signature — their replay attack mitigation is delivery-ID-based deduplication rather than timestamp validation.
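That deduplication can be sketched as follows; the in-memory `Set` is an illustrative stand-in for a persistent store with a TTL:

```typescript
// Deduplicate on the X-GitHub-Delivery UUID so a replayed request
// (valid signature included) is processed at most once
const seenDeliveries = new Set<string>();

function isFirstDelivery(headers: Record<string, string>): boolean {
  const deliveryId = headers['x-github-delivery'];
  if (!deliveryId) return false;                    // header required — reject
  if (seenDeliveries.has(deliveryId)) return false; // replay or duplicate — skip
  seenDeliveries.add(deliveryId);
  return true;
}
```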
Key Rotation
Shared secrets should be rotatable without downtime. Best practice:
// Support multiple valid secrets during rotation window
async function verifyWithKeyRotation(
rawBody: Buffer,
headers: Record<string, string>,
secrets: string[], // [currentSecret, oldSecret]
): Promise<boolean> {
return secrets.some(secret =>
verifyWebhookSignature(rawBody, headers, { secret })
);
}
Maintain two valid secrets during rotation: the new key and the old key. After all senders have been updated, remove the old key.
Dead Letter Queues
Why Events End Up in the DLQ
Dead letter queues capture events that have exhausted all retry attempts. Common causes:
- Persistent endpoint unavailability: Downstream service is down for longer than the retry window
- Permanently malformed payloads: The receiver always returns 400 because the payload does not match the expected schema
- Business logic errors: The receiver returns 500 because of a bug in event processing
- Credential rotation failures: HMAC verification fails because the subscriber's secret is stale
- Rate limit exhaustion: The receiver's rate limit is too low for the event volume
Events should also be sent directly to the DLQ (bypassing retries) when the failure is clearly non-transient — specifically on 4xx responses other than 429 and 408, matching the response code matrix above.
DLQ Architecture
Event Producer
      │
      ▼
Delivery Queue ──(retry exhausted)──► Dead Letter Queue
      │                                      │
   Workers                             DLQ Inspector
      │                                      │
   Endpoint                           Alert → Ops Team
                                             │
                                      Manual Replay CLI
                                             │
                                      Delivery Queue (re-enqueue)
// NestJS-style DLQ handler
@Injectable()
class DeadLetterQueueService {
constructor(
private readonly db: DatabaseService,
private readonly alertService: AlertService,
private readonly deliveryQueue: DeliveryQueueService,
) {}
async enqueue(event: FailedEvent): Promise<void> {
await this.db.insert('dead_letter_queue', {
event_id: event.id,
endpoint_id: event.endpointId,
payload: event.payload,
failure_reason: event.lastError,
attempt_count: event.attemptCount,
last_attempted_at: event.lastAttemptedAt,
enqueued_at: new Date(),
});
// Alert on first DLQ entry for an endpoint in the past hour
await this.alertService.maybeSendDLQAlert(event.endpointId);
}
// Manual replay: re-enqueue with reset attempt counter
async replay(dlqItemId: string): Promise<void> {
const item = await this.db.findOne('dead_letter_queue', dlqItemId);
if (!item) throw new NotFoundError(`DLQ item ${dlqItemId} not found`);
await this.deliveryQueue.enqueue({
...item,
attempt_count: 0,
replayed_from_dlq: true,
replayed_at: new Date(),
});
await this.db.update('dead_letter_queue', dlqItemId, {
status: 'replayed',
replayed_at: new Date(),
});
}
// Batch replay — rate limited to avoid thundering herd
async replayBatch(endpointId: string, batchSize: number = 100): Promise<number> {
const items = await this.db.query(
'SELECT * FROM dead_letter_queue WHERE endpoint_id = ? AND status = ? LIMIT ?',
[endpointId, 'pending', batchSize]
);
// Stagger replays to avoid overwhelming the recovering endpoint
for (const item of items) {
await this.replay(item.id);
await sleep(100); // 10 RPS max during replay
}
return items.length;
}
}
Alerting Thresholds
Effective DLQ alerting avoids both alert fatigue (too sensitive) and missed incidents (too loose):
| Trigger | Action | Urgency |
|---|---|---|
| First DLQ entry for endpoint | Slack notification | Low |
| 5+ DLQ entries in 10 minutes | Page on-call | Medium |
| DLQ depth > 1000 events | Incident | High |
| DLQ growing faster than draining | Incident | High |
| Same event ID in DLQ 3+ times | Investigate payload | Medium |
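One way to encode the table as a decision function — the field names and exact checks below are assumptions mirroring the thresholds above:

```typescript
interface DlqStats {
  depth: number;                    // total events in DLQ for this endpoint
  entriesLast10Min: number;         // recent enqueue count
  enqueueRatePerMin: number;        // events entering per minute
  drainRatePerMin: number;          // events leaving per minute
  isFirstEntryForEndpoint: boolean;
}

type Urgency = 'none' | 'low' | 'medium' | 'high';

// Evaluate the alerting matrix, most urgent condition first
function dlqAlertUrgency(s: DlqStats): Urgency {
  if (s.depth > 1000) return 'high';                          // incident
  if (s.enqueueRatePerMin > s.drainRatePerMin) return 'high'; // growing faster than draining
  if (s.entriesLast10Min >= 5) return 'medium';               // page on-call
  if (s.isFirstEntryForEndpoint) return 'low';                // Slack notification
  return 'none';
}
```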
Observability Metrics
Key DLQ metrics to track:
- `dlq.depth` — total events in queue (per endpoint)
- `dlq.enqueue_rate` — events entering DLQ per minute
- `dlq.age_max` — age of oldest DLQ event (indicates how long issues go unresolved)
- `dlq.replay_success_rate` — fraction of replayed events that succeed
- `webhook.delivery_success_rate` — overall delivery health (7/28-day views)
- `webhook.latency_p99` — time from event emission to delivery acknowledgment
Webhook vs. WebSocket Trade-offs
Protocol Comparison
| Dimension | Webhook (HTTP) | WebSocket |
|---|---|---|
| Connection model | Stateless, new connection per event | Persistent bidirectional connection |
| Latency | 50–500ms (connection overhead) | <10ms (connection already open) |
| Infrastructure | Any HTTP server | Requires sticky sessions or pub/sub layer |
| Scalability | Horizontally simple — stateless | Requires connection state management |
| Reliability | Built-in retry infrastructure | Application-level reconnect logic needed |
| Push direction | Server → Client only | Bidirectional |
| Debugging | HTTP logs, standard tooling | Requires WebSocket-aware tooling |
| Firewall/proxy | Works through all firewalls | May be blocked; falls back to long-polling |
When to Use Webhooks
Webhooks are the right choice when:
- Events are infrequent and discrete: Order placed, payment confirmed, file uploaded
- Receiver is a server, not a browser: B2B integrations, Zapier/n8n flows, CI/CD pipelines
- Guaranteed delivery matters more than latency: Business-critical events that must not be lost
- The sender and receiver are different organizations: No shared infrastructure, standard HTTP is the lingua franca
- Receivers need to be independently scalable: Each subscriber processes at its own pace
When to Use WebSockets
WebSockets are the right choice when:
- Sub-100ms latency is required: Live collaborative editing, real-time trading, multiplayer games
- Bidirectional communication is needed: The client sends updates as well as receiving them
- High message frequency: Hundreds of events per second per connection
- Streaming AI responses: Token-by-token LLM output streaming, agent status updates
- Live dashboards: Metrics, logs, activity feeds that update continuously
The Liveblocks Pattern: Building AI Agents on WebSockets
Liveblocks, which builds collaborative AI copilots, chose WebSockets over HTTP for their agent infrastructure. Their reasoning:
"Copilots need to do more than return text — they should call tools, render UI, and give users control when manual confirmation is required. WebSockets are especially valuable because every client stays in sync whenever a user acts."
This reflects a broader pattern: AI agents that need to maintain shared state with multiple clients (users, other agents, dashboards) benefit from the always-open connection that WebSockets provide.
Hybrid Architectures
Most production AI agent platforms use both:
External Systems                     AI Agent Platform
      │                                     │
      │  Webhooks (inbound events)          │
      ├───────────────────────────────────► │
      │                                     │
      │                            Internal event bus
      │                                     │
      │                              Agent Workers
      │                                     │
      │  WebSockets (real-time UI sync)     │
      │ ◄───────────────────────────────────┤
      │                                     │
Browser Clients                       Web Dashboard
The pattern: receive external events via webhooks (reliable, authenticated, retry-enabled), deliver real-time updates to UI clients via WebSockets (low latency, bidirectional). Webhooks and WebSockets are complementary, not competing.
Server-Sent Events as a Middle Ground
SSE (Server-Sent Events) provides a one-directional push channel over HTTP/1.1 without the complexity of WebSocket upgrade negotiation. For AI agent platforms that only need to push events to browser clients (not receive client pushes), SSE is often sufficient:
- No special proxy configuration required
- Automatic reconnection built into the browser API
- Works with standard HTTP caching and auth middleware
- But: unidirectional only, no binary frame support
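A minimal sketch of an SSE push endpoint using Node's built-in http module — the `/events` route and `agent.status` event name are illustrative assumptions, not a specific framework's API:

```typescript
import { createServer } from 'http';

// SSE wire format: optional "event:" line, "data:" line, blank-line terminator
function formatSseMessage(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const server = createServer((req, res) => {
  if (req.url !== '/events') {
    res.writeHead(404).end();
    return;
  }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  // Push a status update every second; the browser's EventSource API
  // reconnects automatically if the connection drops
  const timer = setInterval(() => {
    res.write(formatSseMessage('agent.status', { state: 'running' }));
  }, 1000);
  req.on('close', () => clearInterval(timer));
});

// server.listen(3000);
```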
Event Ordering Guarantees
Why Ordering is Hard
Even if the sender emits events in strict order, delivery order is not guaranteed because:
- Parallel delivery workers: Event 1 and Event 2 are dispatched simultaneously by different workers; Event 2 may reach the endpoint first if Event 1 encounters a transient network issue
- Retry asymmetry: Event 1 fails and is retried at T+5s; Event 2 succeeds immediately. The receiver sees Event 2 before Event 1.
- CDN/proxy routing: Different HTTP connections may take different network paths with different latencies
- Multi-datacenter delivery: Events sent from geographically distributed infrastructure may arrive out of order
Svix, one of the leading webhook infrastructure providers, states directly: "You can't guarantee webhook ordering." This is not a limitation of Svix specifically — it is a fundamental property of distributed HTTP delivery.
Sequence Numbers
The best mitigation is including sequence numbers in event payloads:
interface WebhookEvent {
id: string; // Unique event ID (idempotency key)
type: string; // e.g. "order.updated"
sequence: number; // Monotonically increasing per entity
entity_id: string; // The entity this event is about
entity_version: number; // Entity state version after this event
timestamp: string; // ISO 8601
data: unknown; // Event payload
}
The receiver can then detect out-of-order delivery:
async function handleOrderEvent(event: WebhookEvent): Promise<void> {
const current = await db.getEntityState(event.entity_id);
// Reject events older than current state
if (event.entity_version <= current.version) {
console.log(`Stale event — current version ${current.version}, event version ${event.entity_version}`);
return; // Acknowledge (don't retry), but don't apply
}
// Buffer if a gap exists
if (event.entity_version > current.version + 1) {
await eventBuffer.store(event);
// Try to fetch the missing events via API polling
await backfillMissingEvents(event.entity_id, current.version + 1, event.entity_version - 1);
return;
}
// Apply in-order event
await applyEvent(event);
await drainBufferedEvents(event.entity_id, event.entity_version + 1);
}
Thin Payload + Pull Pattern
An alternative that sidesteps ordering entirely: send "thin" webhook payloads containing only the event type and entity ID, then have the receiver pull the full current state from an API:
// Thin payload approach
interface ThinWebhookPayload {
event_type: 'order.updated';
order_id: string;
timestamp: string;
}
async function handleThinWebhook(payload: ThinWebhookPayload): Promise<void> {
// Don't trust the payload for state — fetch authoritative current state
const order = await apiClient.getOrder(payload.order_id);
await updateLocalOrderState(order);
}
This sacrifices the efficiency of embedded payloads but guarantees the receiver always has the correct final state, regardless of delivery order. The trade-off: higher API call volume, but simpler correctness model.
Per-Entity Ordering Channels
For systems where ordering matters, partition the delivery queue by entity ID so that all events for a given entity are processed serially:
// Route events to a worker based on a stable hash of entity_id;
// FNV-1a here is a simple, dependency-free stand-in for murmurhash
function fnv1a(s: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    hash ^= s.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}
function getWorkerIndex(entityId: string, workerCount: number): number {
  return fnv1a(entityId) % workerCount;
}
// All events for a given entity always hash to the same worker, preserving intra-entity ordering
This preserves ordering within an entity while allowing parallel processing across different entities.
Rate Limiting Outbound Webhooks
Why Rate Limiting Matters
Without rate limits, a platform that generates bursts of events (e.g., bulk imports, batch operations, end-of-day processing) will deliver those bursts directly to subscriber endpoints. A subscriber receiving 10,000 events per minute when their endpoint handles 100 per minute will cascade into failures, DLQ overflow, and potential outages for the subscriber.
Rate limiting is about protecting receivers, not senders.
Token Bucket Implementation
The token bucket algorithm is well-suited for webhook rate limiting because it smooths bursts while allowing short-term spikes:
class TokenBucketRateLimiter {
private tokens: number;
private readonly maxTokens: number;
private readonly refillRatePerMs: number;
private lastRefillTime: number;
constructor(
private readonly endpointId: string,
maxRequestsPerMinute: number,
) {
this.maxTokens = maxRequestsPerMinute;
this.tokens = maxRequestsPerMinute;
this.refillRatePerMs = maxRequestsPerMinute / 60_000;
this.lastRefillTime = Date.now();
}
canDeliver(): boolean {
this.refill();
if (this.tokens >= 1) {
this.tokens -= 1;
return true;
}
return false;
}
timeUntilNextToken(): number {
this.refill();
if (this.tokens >= 1) return 0;
return (1 - this.tokens) / this.refillRatePerMs;
}
private refill(): void {
const now = Date.now();
const elapsed = now - this.lastRefillTime;
const newTokens = elapsed * this.refillRatePerMs;
this.tokens = Math.min(this.maxTokens, this.tokens + newTokens);
this.lastRefillTime = now;
}
}
Per-Destination Configuration
Different subscriber endpoints have different capacity. A rate limiting system should support per-destination configuration:
interface EndpointRateLimitConfig {
endpoint_id: string;
max_requests_per_minute: number;
burst_allowance: number; // Allow short bursts above the rate
behavior_on_limit: 'queue' | 'drop' | 'error';
}
// Hookdeck's approach: adjustable per destination
// "Each user can have multiple destinations with different rate limits"
Handling 429 Responses
When a receiver returns 429, the webhook infrastructure should:
- Immediately pause delivery to that endpoint
- Check the `Retry-After` header — if present, wait exactly that long
- If no `Retry-After`, apply exponential backoff
- Resume delivery at a reduced rate after the wait period
async function handleRateLimitResponse(
response: Response,
endpoint: WebhookEndpoint,
): Promise<number> {
const retryAfter = response.headers.get('Retry-After');
if (retryAfter) {
// Could be a timestamp or delay in seconds
const isTimestamp = isNaN(Number(retryAfter));
if (isTimestamp) {
return new Date(retryAfter).getTime() - Date.now();
}
return parseInt(retryAfter, 10) * 1000;
}
// No Retry-After — use endpoint's current backoff
return computeBackoffDelay(endpoint.consecutiveRateLimitHits);
}
Webhook Infrastructure at Scale
Stripe's Architecture
Stripe processes billions of webhook deliveries monthly. Key architectural decisions:
- Async decoupling: Events are persisted immediately after creation; delivery is handled by a separate async pipeline. The API response does not wait for webhook delivery.
- Standardized envelope: Every Stripe webhook event has consistent fields (`id`, `type`, `created`, `livemode`, `data.object`), enabling generic processing without custom parsers per event type.
- Resilient delivery with DLQs: Stripe uses AWS SQS with DLQ integration for resilient event delivery. Events that fail all retries go to an SQS Dead Letter Queue with configurable retry policies.
- Idempotent event IDs: `evt_*` identifiers are stable across retries. Consumers can safely use them as idempotency keys.
- Webhook endpoint health monitoring: Stripe automatically disables endpoints that consistently fail (after 72 hours of failures) and sends email notifications.
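The envelope shape described above can be sketched as a TypeScript interface with a narrowing guard — the field names follow Stripe's documented event shape, but the guard itself is an illustrative assumption, not Stripe code:

```typescript
// Consistent envelope fields on every Stripe-style event
interface StripeStyleEvent {
  id: string;                // "evt_..." — stable across retries
  type: string;              // e.g. "payment_intent.succeeded"
  created: number;           // Unix timestamp
  livemode: boolean;
  data: { object: unknown }; // the event's subject
}

// Generic structural check before dispatching to per-type handlers
function isWebhookEnvelope(value: unknown): value is StripeStyleEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    typeof v.type === 'string' &&
    typeof v.created === 'number' &&
    typeof v.livemode === 'boolean' &&
    typeof v.data === 'object' && v.data !== null
  );
}
```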
GitHub's Architecture
GitHub's webhook system emphasizes simplicity and observability:
- 10-second acknowledgment window: GitHub expects endpoints to return 2xx within 10 seconds. Process asynchronously — immediately acknowledge receipt, then process in a background job.
- 30-day delivery history: GitHub exposes webhook delivery history via API, enabling subscribers to replay failed deliveries manually.
- Per-repository and organization-level hooks: Allows fine-grained event routing at the source rather than requiring receivers to filter.
- `X-GitHub-Delivery` header: A unique UUID per delivery attempt, usable as an idempotency key.
- Recent delivery logs: The GitHub UI shows recent deliveries with request/response details — invaluable for debugging.
Slack's Architecture
Slack's event API highlights challenges unique to high-volume platforms:
- 3-second acknowledgment requirement: Even stricter than GitHub — Slack retries if no 2xx arrives within 3 seconds.
- Challenge verification: Before enabling webhooks, Slack sends a challenge request that the receiver must echo back — a one-time verification that the endpoint is legitimate.
- `X-Slack-Retry-Num` and `X-Slack-Retry-Reason` headers: Slack includes these on retry deliveries, enabling receivers to distinguish a first delivery from retries without relying on event ID deduplication alone.
- Event deduplication via `event_id`: `X-Slack-Retry-Num: 1` combined with the stable `event_id` gives receivers everything they need for idempotent processing.
Svix: Infrastructure for Webhook Providers
Svix provides webhook infrastructure-as-a-service for teams building webhook systems. Their architecture insights:
- Message queue + workers: Decouple ingestion from delivery; workers pull from a queue and attempt delivery, enabling horizontal scaling of delivery capacity independently of ingestion
- Automatic endpoint management: Track endpoint health, auto-disable consistently failing endpoints, notify application owners
- Consumer portal: Embeddable UI that shows subscribers their own delivery logs, allowing self-service debugging without contacting support
General Scalable Architecture Pattern
┌─────────────────────────────────┐
│ Event Producer │
│ (API, DB trigger, scheduler) │
└──────────────┬──────────────────┘
│ Persist event
▼
┌─────────────────────────────────┐
│ Events Database │
│ (authoritative event store) │
└──────────────┬──────────────────┘
│ Fan-out
▼
┌─────────────────────────────────┐
│ Delivery Queue (per tenant) │
│ (Redis / SQS / Kafka) │
└──────────────┬──────────────────┘
┌──────────────┼──────────────────┐
▼ ▼ ▼
Worker 1 Worker 2 Worker N
│ │ │
Rate limiter Rate limiter Rate limiter
Circuit breaker Circuit breaker Circuit breaker
│ │ │
Endpoint A Endpoint B Endpoint C
│
(on failure)
▼
Dead Letter Queue
│
Alert + Manual Replay
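The worker stage in the diagram above can be sketched as a loop that dequeues a delivery, applies a per-endpoint rate limit and circuit breaker, and routes exhausted deliveries to the DLQ. Every interface here (`dequeue`, `allow`, `circuitOpen`, `send`, `requeue`, `toDlq`) is illustrative — substitute your queue, limiter, and breaker of choice:

```typescript
// Sketch of one delivery-worker iteration, combining the stages in the diagram.
// All names are illustrative interfaces, not a specific library's API.
interface Delivery { eventId: string; endpoint: string; payload: unknown; attempt: number }

interface DeliveryDeps {
  dequeue(): Promise<Delivery | null>;
  allow(endpoint: string): boolean;        // token-bucket rate limiter
  circuitOpen(endpoint: string): boolean;  // per-endpoint circuit breaker
  send(d: Delivery): Promise<number>;      // attempts HTTP delivery, returns status
  requeue(d: Delivery, delayMs: number): Promise<void>;
  toDlq(d: Delivery, reason: string): Promise<void>;
}

const MAX_ATTEMPTS = 7; // the "7 attempts over 24h" default from the text

async function runWorkerOnce(deps: DeliveryDeps): Promise<void> {
  const d = await deps.dequeue();
  if (!d) return;

  // Hold deliveries for unhealthy or rate-limited endpoints instead of dropping them
  if (deps.circuitOpen(d.endpoint) || !deps.allow(d.endpoint)) {
    await deps.requeue(d, 5_000);
    return;
  }

  try {
    const status = await deps.send(d);
    if (status >= 200 && status < 300) return; // delivered successfully
    if (d.attempt + 1 >= MAX_ATTEMPTS) {
      await deps.toDlq(d, `exhausted retries, last status ${status}`);
    } else {
      await deps.requeue({ ...d, attempt: d.attempt + 1 }, 2 ** d.attempt * 1_000);
    }
  } catch {
    // Network error: treat like a failed attempt and back off
    await deps.requeue({ ...d, attempt: d.attempt + 1 }, 2 ** d.attempt * 1_000);
  }
}
```

Because the rate limiter and breaker are consulted per endpoint, one slow subscriber backs up only its own queue — the property the per-tenant queue in the diagram is there to guarantee.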
Testing Webhook Integrations
The Testing Challenge
Webhooks present a unique testing challenge: the sender is a third-party system outside your control. You can't simply call a function in a unit test — you need an external system to push an HTTP request to your endpoint.
Local Tunnels with ngrok
ngrok creates a secure tunnel from a public HTTPS URL to your local development server:
# Install ngrok
npm install -g ngrok
# Expose local port 3000
ngrok http 3000
# Output:
# Forwarding https://abc123.ngrok.io -> http://localhost:3000
With the public URL, you can configure real webhook providers (Stripe, GitHub, etc.) to deliver to your local environment. ngrok's web interface at http://localhost:4040 shows all incoming requests with full headers and bodies — invaluable for debugging.
ngrok CLI for automated testing:
# Extract the public forwarding URL from ngrok's log output
ngrok http 3000 --log=stdout | \
grep "url=" | \
head -1 | \
sed 's/.*url=//'
Alternatives to ngrok:
- Cloudflare Tunnel: Free, no rate limits, works with Cloudflare's global network
- LocalTunnel: Open-source, self-hostable
- Smee.io: Lightweight, purpose-built for webhook proxying
- Webhook Relay: Persistent URLs even when your tunnel restarts
Mocking Webhook Payloads
For unit and integration tests that don't require live external systems:
// Jest example: mock webhook delivery
import { createHmac } from 'crypto';
import { webhookHandler } from './webhook-handler';
describe('Webhook Handler', () => {
const secret = 'test-webhook-secret';
function createStripeWebhookRequest(payload: object): {
body: Buffer;
headers: Record<string, string>;
} {
const body = Buffer.from(JSON.stringify(payload));
const timestamp = Math.floor(Date.now() / 1000);
const signedPayload = `${timestamp}.${body.toString('utf8')}`;
const signature = createHmac('sha256', secret)
.update(signedPayload)
.digest('hex');
return {
body,
headers: {
'stripe-signature': `t=${timestamp},v1=${signature}`,
'content-type': 'application/json',
},
};
}
it('processes payment.succeeded idempotently', async () => {
const payload = {
id: 'evt_test_123',
type: 'payment_intent.succeeded',
data: { object: { id: 'pi_test_456', amount: 2000 } },
};
const req = createStripeWebhookRequest(payload);
// First delivery
const result1 = await webhookHandler(req.body, req.headers);
expect(result1.status).toBe(200);
// Duplicate delivery — should not double-process
const result2 = await webhookHandler(req.body, req.headers);
expect(result2.status).toBe(200);
// Verify business effect occurred exactly once
const payment = await db.findPayment('pi_test_456');
expect(payment.processedCount).toBe(1);
});
it('rejects replayed requests outside tolerance window', async () => {
const payload = { id: 'evt_replay_test', type: 'payment_intent.succeeded' };
const staleTimestamp = Math.floor(Date.now() / 1000) - 600; // 10 minutes ago
const body = Buffer.from(JSON.stringify(payload));
const signature = createHmac('sha256', secret)
.update(`${staleTimestamp}.${body}`)
.digest('hex');
const req = {
body,
headers: { 'stripe-signature': `t=${staleTimestamp},v1=${signature}` },
};
const result = await webhookHandler(req.body, req.headers);
expect(result.status).toBe(400);
});
});
End-to-End Testing with WireMock
For integration tests in CI/CD pipelines without external dependencies:
# docker-compose.test.yml
services:
wiremock:
image: wiremock/wiremock:latest
ports:
- "8080:8080"
volumes:
- ./test/wiremock:/home/wiremock
// test/wiremock/mappings/stripe-webhook.json
{
"request": {
"method": "POST",
"url": "/stripe/webhook"
},
"response": {
"status": 200,
"body": "{\"received\": true}"
}
}
Testing with Stripe CLI
# Stripe CLI — forward real Stripe events to local endpoint
stripe listen --forward-to localhost:3000/webhooks/stripe
# Trigger specific event types for testing
stripe trigger payment_intent.succeeded
stripe trigger customer.subscription.deleted
This combines the realism of actual Stripe event payloads with the convenience of local development.
CI/CD Webhook Testing Checklist
- Unit tests for signature verification (valid, invalid, replay attack)
- Unit tests for idempotent processing (first delivery, duplicate delivery)
- Integration tests with mocked payloads for all handled event types
- Load tests for high-volume scenarios
- Chaos tests: what happens when the webhook handler is slow (>3s)?
- Contract tests: does the payload schema match the documented API?
Emerging Standards
Standard Webhooks
The Standard Webhooks initiative (standardwebhooks.com), led by Svix with backing from Zapier, Twilio, Lob, Mux, ngrok, Supabase, and Kong, aims to eliminate the fragmentation in webhook implementations. The specification defines:
Required Headers:
| Header | Description |
|---|---|
| `webhook-id` | Unique message ID — same across retries; use as the idempotency key |
| `webhook-timestamp` | Unix timestamp — integer seconds since epoch |
| `webhook-signature` | HMAC-SHA256 signature(s); may include multiple for key rotation |
Signature Format:
webhook-signature: v1,<base64-encoded-hmac-sha256>
Multiple signatures (for key rotation):
webhook-signature: v1,base64_sig1 v1,base64_sig2
Signed Payload Construction:
{webhook-id}.{webhook-timestamp}.{body}
Why Standard Webhooks Matters:
Instead of each SDK implementing platform-specific signature verification, a single StandardWebhooks library covers all compliant providers:
import { Webhook } from 'standardwebhooks';
const wh = new Webhook(secret);
// Works for any Standard Webhooks-compliant provider
const event = wh.verify(rawBody, headers);
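Under the hood, verification is just HMAC-SHA256 over `{webhook-id}.{webhook-timestamp}.{body}` with base64 output and a constant-time comparison. A hand-rolled sketch follows; the 5-minute tolerance window is an assumption mirroring the replay guidance elsewhere in this article, and the raw-Buffer secret glosses over the spec's `whsec_`-prefixed encoding:

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Sketch of Standard Webhooks verification without a library.
// TOLERANCE_SECONDS is an assumed replay window, not mandated by this snippet's source.
const TOLERANCE_SECONDS = 300;

function verifyStandardWebhook(
  secret: Buffer,
  body: string,
  headers: Record<string, string>,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  const id = headers['webhook-id'];
  const timestamp = headers['webhook-timestamp'];
  const signatureHeader = headers['webhook-signature'];
  if (!id || !timestamp || !signatureHeader) return false;

  // Reject stale or future-dated timestamps (replay protection)
  const ts = Number(timestamp);
  if (!Number.isInteger(ts) || Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return false;

  // Signed payload is {id}.{timestamp}.{body}
  const expected = createHmac('sha256', secret)
    .update(`${id}.${timestamp}.${body}`)
    .digest();

  // The header may carry several space-separated signatures for key rotation;
  // accept the request if any of them matches.
  return signatureHeader.split(' ').some(part => {
    const [version, sig] = part.split(',');
    if (version !== 'v1' || !sig) return false;
    const candidate = Buffer.from(sig, 'base64');
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

Note the length check before timingSafeEqual — Node's implementation throws on mismatched buffer lengths rather than returning false.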
CloudEvents
CloudEvents is a CNCF graduated project (January 2024) that standardizes the envelope format for event data:
{
"specversion": "1.0",
"type": "com.example.order.created",
"source": "https://example.com/orders",
"subject": "order-123",
"id": "evt-abc123",
"time": "2026-02-26T10:00:00Z",
"datacontenttype": "application/json",
"data": {
"order_id": "order-123",
"total": 9999
}
}
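The required attributes are only `specversion`, `id`, `source`, and `type`; everything else in the envelope above is optional. A minimal TypeScript shape and structural check, written here as an illustrative sketch rather than an official SDK type:

```typescript
// Minimal CloudEvents 1.0 envelope type and a structural guard.
// Covers the required attributes plus the optional ones shown in the JSON example.
interface CloudEvent<T = unknown> {
  specversion: '1.0';
  id: string;        // unique per event; id + source together identify the event
  source: string;    // URI-reference identifying the producer
  type: string;      // reverse-DNS event type, e.g. "com.example.order.created"
  subject?: string;
  time?: string;             // RFC 3339 timestamp
  datacontenttype?: string;  // e.g. "application/json"
  data?: T;
}

function isCloudEvent(value: unknown): value is CloudEvent {
  const e = value as Record<string, unknown>;
  return (
    typeof e === 'object' && e !== null &&
    e.specversion === '1.0' &&
    typeof e.id === 'string' &&
    typeof e.source === 'string' &&
    typeof e.type === 'string'
  );
}
```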
CloudEvents also defines an HTTP webhook binding specification that includes:
- `WebHook-Request-Origin` header for abuse protection (the sender identifies itself)
- `WebHook-Request-Rate` header for rate negotiation
- A handshake validation mechanism to prevent SSRF-style abuse
CloudEvents is gaining adoption in cloud-native infrastructure (Azure Event Grid, Google Eventarc, Knative Eventing, AWS EventBridge all support it), making it particularly relevant for AI agents deployed in cloud environments.
The Convergence
Standard Webhooks focuses on security and delivery mechanics (signatures, idempotency). CloudEvents focuses on event semantics (envelope format, source/type taxonomy). They are complementary: a CloudEvents payload can be delivered via a Standard Webhooks-compliant transport.
AI Agent-Specific Considerations
Autonomous Action Risk
Traditional webhook consumers are humans or deterministic code — they either process events correctly or fail visibly. AI agents operate differently: they may take autonomous actions (sending messages, modifying data, calling external APIs) based on webhook content. The consequences of duplicate delivery are therefore more severe:
- A duplicate `task.assigned` event may cause an agent to send two notifications to a user
- A duplicate `payment.confirmed` event may trigger two fulfillment workflows
- A duplicate `message.received` event may generate two AI responses in a chat thread
Mitigation: Implement idempotency at every layer. The webhook receiver should be idempotent, but the downstream agent actions should also be idempotent or guarded by unique action IDs.
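One way to guard downstream actions is to derive a deterministic action ID from the event ID and run each action at most once. The sketch below uses an in-memory Set as a stand-in for a durable store (Redis SET NX or a database unique constraint in production); `runOnce` and its key format are illustrative:

```typescript
// Sketch: action-level idempotency guard keyed on a deterministic action ID.
// The in-memory Set stands in for a durable store (Redis SET NX, DB unique index).
const executedActions = new Set<string>();

async function runOnce(
  eventId: string,
  actionName: string,
  action: () => Promise<void>,
): Promise<boolean> {
  // Deterministic key: the same event + action always yields the same ID,
  // so a redelivered event cannot trigger the action a second time.
  const actionId = `${eventId}:${actionName}`;
  if (executedActions.has(actionId)) return false; // already executed — skip
  executedActions.add(actionId); // claim before executing (at-most-once bias)
  await action();
  return true;
}
```

With this guard, a duplicate `task.assigned` event maps to the same notification action ID and is skipped. Claiming before executing biases toward at-most-once: if the action throws after the claim, it will not re-run, so production code may want to record a status and release failed claims.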
Long-Running Agent Tasks
AI agents often handle long-running tasks (minutes to hours). Webhook delivery systems expect acknowledgment within seconds (typically 3–30 seconds). The solution: immediate acknowledgment + async processing:
// Acknowledge immediately, process asynchronously
app.post('/webhooks/agent-task', async (req, res) => {
// 1. Verify signature immediately
if (!verifySignature(req)) {
return res.status(401).json({ error: 'Invalid signature' });
}
// 2. Persist event to durable queue
const jobId = await taskQueue.enqueue({
eventId: req.body.id,
payload: req.body,
receivedAt: new Date(),
});
// 3. Acknowledge within seconds — don't wait for agent processing
res.status(202).json({
received: true,
job_id: jobId,
// Optionally provide a status endpoint for polling
status_url: `/jobs/${jobId}/status`,
});
// Processing happens asynchronously in background workers
});
Agent-to-Agent Webhooks
As multi-agent systems become common, agents receive webhooks from other agents — not just from external platforms. Unique considerations:
- Recursive call loops: Agent A receives an event and calls Agent B; Agent B's response triggers an event that calls Agent A again. Implement loop detection via call-chain headers:
interface AgentWebhookHeaders {
'x-agent-call-chain': string; // Comma-separated agent IDs
'x-agent-call-depth': string; // Current depth as a string
'x-agent-trace-id': string; // Root trace ID
}
function detectCallLoop(headers: AgentWebhookHeaders, agentId: string): boolean {
const chain = headers['x-agent-call-chain']?.split(',') ?? [];
return chain.includes(agentId);
}
-
Trust levels: Agent-to-agent webhooks require authentication — use per-agent API keys or short-lived JWT tokens, not a single shared secret.
-
Event schema versioning: Agents may be updated independently. Include a schema version field and handle backward compatibility:
interface AgentEvent {
schema_version: '1.0' | '1.1' | '2.0';
// ... other fields
}
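Detection is only half of the loop problem — the calling agent also has to extend the chain on every outbound call. A sketch that does both, using the illustrative header names from the interface above; `MAX_CALL_DEPTH` is an assumed safety limit, not a standard value:

```typescript
import { randomUUID } from 'crypto';

// Sketch: extend the call chain before making an outbound agent-to-agent call.
// Header names match the illustrative AgentWebhookHeaders interface above.
const MAX_CALL_DEPTH = 5; // assumed limit — tune per platform

function buildOutboundHeaders(
  incoming: Record<string, string>,
  selfAgentId: string,
): Record<string, string> {
  const chain = incoming['x-agent-call-chain']
    ? incoming['x-agent-call-chain'].split(',')
    : [];
  const depth = Number(incoming['x-agent-call-depth'] ?? '0');

  if (chain.includes(selfAgentId)) {
    throw new Error(`Call loop detected: ${[...chain, selfAgentId].join(' -> ')}`);
  }
  if (depth + 1 > MAX_CALL_DEPTH) {
    throw new Error(`Max call depth ${MAX_CALL_DEPTH} exceeded`);
  }

  return {
    'x-agent-call-chain': [...chain, selfAgentId].join(','),
    'x-agent-call-depth': String(depth + 1),
    // Preserve the root trace ID; mint one if this call originates the chain
    'x-agent-trace-id': incoming['x-agent-trace-id'] ?? randomUUID(),
  };
}
```

The depth cap catches indirect loops (A → B → C → A variants that a simple membership check misses when agents are replicated) and bounds runaway fan-out even when no strict loop exists.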
Webhook Fan-Out for Multi-Tenant Agent Platforms
Platforms hosting multiple AI agents (like BotsHub) need to route incoming webhooks to the correct agent:
// Fan-out: one incoming webhook → multiple agent handlers
async function fanOutWebhook(event: WebhookEvent): Promise<void> {
// Find all agents subscribed to this event type
const subscribers = await db.findSubscribersForEvent({
tenantId: event.tenantId,
eventType: event.type,
});
// Deliver to each subscriber independently
// Failure for one subscriber doesn't affect others
await Promise.allSettled(
subscribers.map(subscriber =>
deliveryQueue.enqueue({
eventId: `${event.id}:${subscriber.agentId}`,
agentId: subscriber.agentId,
payload: event,
})
)
);
}
Key principle: fan-out copies must have independent delivery tracking. If delivery to Agent A fails, it should not block or affect delivery to Agent B.
Observability for Agent Webhook Processing
Traditional webhook observability tracks delivery from sender to receiver. For agent platforms, you need to track the full chain:
External Event → Webhook Received → Agent Task Created →
Agent Processing Started → Tool Calls Made →
Agent Response Generated → Side Effects Applied
Each step should be traceable via a shared trace_id derived from the original webhook event ID. This enables debugging questions like: "Why did this payment webhook cause the agent to send 3 messages to the user?"
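A minimal sketch of that chain: derive one stable trace ID from the webhook event ID and stamp every stage with it. The stage names mirror the chain above; the in-memory log is an illustrative stand-in for a real tracing backend:

```typescript
// Sketch: one trace ID per webhook event, stamped on every processing stage.
// The array stands in for a real tracing backend (OpenTelemetry, a log pipeline, etc.).
interface StageLog { traceId: string; stage: string; at: string }

const traceLog: StageLog[] = [];

function traceIdForEvent(eventId: string): string {
  // Deterministic: the same webhook event always maps to the same trace
  return `trace:${eventId}`;
}

function logStage(traceId: string, stage: string): void {
  traceLog.push({ traceId, stage, at: new Date().toISOString() });
}

// Usage across the chain described above:
const traceId = traceIdForEvent('evt_abc123');
logStage(traceId, 'webhook_received');
logStage(traceId, 'agent_task_created');
logStage(traceId, 'tool_call');
logStage(traceId, 'side_effect_applied');
```

Querying the backend by that one trace ID then answers questions like the one above: every message the agent sent carries the trace of the webhook that caused it, so three messages from one payment event show up as three `side_effect_applied` entries under a single trace.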
Practical Recommendations
For Webhook Producers (Building Webhook Systems)
- Adopt Standard Webhooks headers: Use `webhook-id`, `webhook-timestamp`, and `webhook-signature`. This gives consumers a standard library path for verification.
- Include sequence numbers and entity version numbers in payloads. This gives consumers the data they need to detect and handle out-of-order delivery.
- Implement multi-stage retry with exponential backoff and jitter: immediate retry → short backoff → long backoff → DLQ. Use 7 attempts over 24 hours as a reasonable default.
- Support multiple active secrets for key rotation. Never require subscribers to update their secret during a maintenance window.
- Expose delivery logs via API. Subscribers debugging failures should not need to contact your support team.
- Use circuit breakers per endpoint, not per event. When an endpoint goes unhealthy, hold events in a buffer rather than dropping them.
- Provide a test/replay mechanism. A `POST /webhooks/{event_id}/redeliver` endpoint makes subscriber debugging dramatically easier.
For Webhook Consumers (Building Receivers)
- Verify signatures before processing. Use constant-time comparison. Reject requests outside the 5-minute timestamp window.
- Acknowledge immediately, process asynchronously. Return 2xx within the provider's timeout window (usually 3–30 s). Enqueue to a durable queue, then process in background workers.
- Store processed event IDs with a 24–72-hour TTL. Check this store before processing to handle at-least-once delivery correctly.
- Choose failure response codes intentionally. Returning 500 causes the sender to retry (possibly desirable); returning 400 tells the sender the event is unprocessable (it will land in the DLQ).
- Design for out-of-order delivery. Use entity version numbers to detect stale events. Buffer out-of-order events when ordering matters.
- Monitor webhook processing latency and error rates. Alert when DLQ depth grows. Track time-to-process as a leading indicator of capacity issues.
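The processed-event store in the list above can be sketched with an in-memory map and a TTL. In production this would typically be Redis (`SET key 1 NX EX ttl`) or a database unique constraint; the class below is an illustrative stand-in:

```typescript
// Sketch: processed-event store with TTL, standing in for Redis SET NX EX.
// markProcessed returns true only the first time an event ID is seen within the TTL.
class ProcessedEventStore {
  private seen = new Map<string, number>(); // eventId -> expiry, epoch ms

  constructor(private ttlMs: number = 48 * 3600 * 1000) {} // 48h, mid-range of 24–72h

  markProcessed(eventId: string, now = Date.now()): boolean {
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > now) return false; // duplicate delivery
    this.seen.set(eventId, now + this.ttlMs);
    return true; // first delivery — safe to process
  }
}
```

Calling `markProcessed(event.id)` before any business logic turns at-least-once delivery into effectively-once processing, which is the pairing the rest of this article builds on.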
For AI Agent Platforms Specifically
- Treat idempotency as a first-class concern at every layer: receiver idempotency, agent task idempotency, and downstream action idempotency.
- Use hybrid WebSocket + webhook architectures: webhooks for reliable delivery of discrete events from external systems, WebSockets for real-time bidirectional communication with UI clients and agent dashboards.
- Implement call-chain tracking for agent-to-agent webhooks to prevent infinite loops.
- Track the full processing chain with a shared trace ID from webhook receipt to side effects. This is essential for debugging autonomous agent behavior.
- Use fan-out with independent delivery tracking when routing events to multiple agents. One agent's failure should never block another's delivery.
- Apply rate limits both outbound and inbound. Protect your subscribers from burst traffic (outbound limiting) and protect your agents from being overwhelmed (inbound rate limiting with backpressure).
- Evaluate CloudEvents adoption if operating in cloud-native infrastructure. Azure Event Grid, Google Eventarc, and Knative all speak CloudEvents — the standard envelope eliminates custom parsing per provider.
Summary Reference Table
| Concern | Recommendation | Key Tools/Standards |
|---|---|---|
| Delivery semantics | At-least-once + idempotent consumers | Redis SET NX, DB unique constraints |
| Retry strategy | Exponential backoff with full jitter | 7 attempts over 24h |
| Circuit breaker | Per-endpoint, not per-event | CLOSED → OPEN → HALF_OPEN |
| Signature verification | HMAC-SHA256 + timestamp | Standard Webhooks spec |
| Replay prevention | 5-minute timestamp window | Constant-time comparison |
| Failed delivery | Dead letter queue with alerting | SQS DLQ, custom DLQ |
| Event ordering | Sequence numbers + entity version | Buffer + backfill pattern |
| Rate limiting | Token bucket per destination | 429 + Retry-After handling |
| Real-time UI | WebSocket in addition to webhooks | Hybrid architecture |
| Standardization | Standard Webhooks + CloudEvents | standardwebhooks.com, cloudevents.io |
| Testing | ngrok + Stripe CLI + mocked payloads | Jest, Wiremock |
| AI agents | Async acknowledgment + trace ID chain | Fan-out with independent tracking |
References
- At-Least-Once vs. Exactly-Once Webhook Delivery Guarantees — Hookdeck
- How to Implement Webhook Idempotency — Hookdeck
- Webhooks at Scale: Best Practices and Lessons Learned — Hookdeck
- Webhook Retry Best Practices — Svix
- Why You Can't Guarantee Webhook Ordering — Svix Blog
- Announcing Standard Webhooks — Svix Blog
- Dead Letter Queue — Svix Resources
- How We Built a Rate Limiter for Outbound Webhooks — Hookdeck
- Building Resilient Webhook Handlers in AWS: DLQs for Stripe Events — Stripe Dev Blog
- HMAC Webhook Security — webhooks.fyi
- Replay Prevention — webhooks.fyi
- Why We Built Our AI Agents on WebSockets Instead of HTTP — Liveblocks
- CloudEvents Specification — CNCF
- Standard Webhooks Specification — GitHub
- How to Implement Event Ordering Guarantees — OneUptime Blog
- Webhook Security Fundamentals — Hooklistener
- Connect AI Agents with Webhooks — EverWorker
- Building a Webhook System with NestJS — DEV Community
- Mastering Exponential Backoff in Distributed Systems — BetterStack
- Webhook Architecture Patterns for Real-Time Integrations — Technori