Webhook retries and backoff without duplicates
Retries are a feature: providers assume your endpoint might be down and will try again.
The goal is to make duplicates harmless by building idempotency into your processing path.
TL;DR
- Retries are normal: providers retry on timeouts, 5xx, and transient network failures.
- Duplicate processing is optional: duplicates will arrive, but receiver-side idempotency with a dedupe key + window makes them no-ops.
- Classify failures: retryable (timeouts/5xx) vs permanent (validation/4xx) and handle differently.
- Use exponential backoff + jitter, plus a max-attempt policy and a dead-letter path.
- Make side effects idempotent (DB unique constraints, idempotency keys for downstream APIs).
- Track retry counts, duplicate rate, DLQ volume, and end-to-end latency as first-class metrics.
If you are still processing inline, start with Webhook API.
Anti-patterns
- Retrying everything (including validation errors and auth failures).
- No dedupe store: every retry becomes a second side effect.
- Sleeping/retrying inside the webhook HTTP handler instead of a worker.
If you need idempotency basics, see payment webhooks.
Core concepts
There are only two reliable assumptions: the provider will retry, and you will see duplicates.
At-least-once delivery
Providers retry deliveries because they cannot know whether you processed the event; a 2xx response tells them to stop retrying this delivery.
Dedupe key
Pick a stable key: provider event ID, delivery ID, or a stable hash of raw payload + key headers. Store it for a window.
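A minimal sketch of the fallback-hash approach, in Python (the header names here are illustrative — substitute whatever ID header your provider actually sends):

```python
import hashlib
import json

def dedupe_key(payload: dict, headers: dict) -> str:
    """Prefer a provider-supplied ID; fall back to a stable payload hash."""
    # Provider event/delivery IDs are the most reliable keys when present.
    for header in ("X-Event-Id", "X-Delivery-Id"):  # example header names
        if header in headers:
            return headers[header]
    # Fallback: hash the canonicalized payload so retries of the same event
    # produce the same key regardless of whitespace or key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Canonicalizing before hashing matters: two retries of the same event may serialize keys in a different order, and a naive hash of the raw bytes would then miss the duplicate.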
Exactly-once effects
You can’t guarantee exactly-once delivery, but you can guarantee exactly-once side effects with idempotency + durable writes.
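One way to get exactly-once effects is to record the dedupe key and perform the side effect in the same transaction, guarded by a unique constraint. A sketch using SQLite (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processed_events (
        event_id TEXT PRIMARY KEY,  -- the dedupe key: unique constraint
        effect   TEXT NOT NULL
    )
""")

def apply_side_effect(event_id: str, effect: str) -> bool:
    """Return True if the effect ran, False if this event was a duplicate."""
    with conn:  # transaction: the marker row and the effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, effect) VALUES (?, ?)",
            (event_id, effect),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip the side effect
        # ...perform the real side effect here, inside the same transaction...
        return True
```

Because the insert and the side effect share one transaction, a crash between them rolls both back, and the retry starts clean — which is exactly what at-least-once delivery needs.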
Simple flow
Provider (sends event, may retry) → Queue (persist first, dedupe keys) → Worker (retry safely, Ack/Nack/Reject)
Backoff belongs in workers, not in webhook HTTP handlers. Keep the inbound path fast and deterministic.
Production checklist
A practical list for “duplicate-proof” webhook processing, including backoff and dead-lettering.
- [ ] Assume at-least-once delivery (retries + duplicates)
- [ ] Dedupe key chosen per provider (event id / delivery id) + fallback stable hash
- [ ] Dedupe store with TTL/window (DB unique constraint, Redis SETNX, etc.)
- [ ] Side effects are idempotent (unique constraints + idempotency keys)
- [ ] Error classification: retryable vs permanent
- [ ] Backoff: exponential + jitter + max attempts
- [ ] Dead-letter policy: quarantine permanent failures + notify humans
- [ ] Separate “accept” (2xx/202) from “processed”
- [ ] Worker concurrency is bounded; avoid thundering herds on retries
- [ ] Metrics: attempts per message, retry reasons, duplicates, DLQ volume, processing latency
- [ ] Alerts: backlog growth, sustained failures, DLQ above baseline
Reference implementation
The key is explicit lifecycle control: Ack for success, Nack for retryable failures, Reject for permanent failures.
Node
Error classification + Ack/Nack/Reject
// Consume webhook payloads from Hooque and classify failures as retryable or permanent
const QUEUE_NEXT_URL =
  process.env.HOOQUE_QUEUE_NEXT_URL ??
  "https://app.hooque.io/queues/cons_webhook_events/next";
const TOKEN = process.env.HOOQUE_TOKEN ?? "hq_tok_replace_me";
const headers = { Authorization: `Bearer ${TOKEN}` };

// Throw RetryableError for transient failures (e.g. downstream 5xx),
// PermanentError for failures that will never succeed on retry.
class RetryableError extends Error {}
class PermanentError extends Error {}

async function processPayload(payload) {
  // Example: input validation failures are permanent.
  if (!payload || typeof payload !== "object") throw new PermanentError("invalid payload");
  // TODO: enforce idempotency using provider event ID (payload.id) or stable hash.
  // TODO: perform side effects (DB writes, API calls) with idempotency keys where supported.
}

while (true) {
  const resp = await fetch(QUEUE_NEXT_URL, { headers });
  if (resp.status === 204) break;
  if (!resp.ok) throw new Error(`Hooque next() failed: ${resp.status}`);
  const payload = await resp.json();
  const meta = JSON.parse(resp.headers.get("X-Hooque-Meta") ?? "{}");
  try {
    await processPayload(payload);
    await fetch(meta.ackUrl, { method: "POST", headers });
  } catch (err) {
    const isPermanent = err instanceof PermanentError;
    const url = isPermanent ? meta.rejectUrl : meta.nackUrl;
    const reason = isPermanent ? `permanent: ${err.message}` : `retryable: ${err.message}`;
    await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ reason }),
    });
  }
}
Python
Error classification + Ack/Nack/Reject
# Consume webhook payloads from Hooque and classify failures as retryable or permanent
import json
import os

import requests

QUEUE_NEXT_URL = os.getenv(
    "HOOQUE_QUEUE_NEXT_URL",
    "https://app.hooque.io/queues/cons_webhook_events/next",
)
TOKEN = os.getenv("HOOQUE_TOKEN", "hq_tok_replace_me")
headers = {"Authorization": f"Bearer {TOKEN}"}


class RetryableError(Exception):
    """Raise for transient failures (e.g. downstream 5xx) that should be retried."""


class PermanentError(Exception):
    """Raise for failures (e.g. invalid payloads) that will never succeed on retry."""


def process_payload(payload: dict) -> None:
    if not isinstance(payload, dict):
        raise PermanentError("invalid payload")
    # TODO: enforce idempotency using provider event ID (payload.get("id")) or stable hash.
    # TODO: perform side effects with idempotency keys where supported.


while True:
    resp = requests.get(QUEUE_NEXT_URL, headers=headers, timeout=30)
    if resp.status_code == 204:
        break
    if resp.status_code >= 400:
        raise RuntimeError(f"Hooque next() failed: {resp.status_code} {resp.text}")
    payload = resp.json()
    meta = json.loads(resp.headers.get("X-Hooque-Meta", "{}"))
    try:
        process_payload(payload)
        requests.post(meta["ackUrl"], headers=headers, timeout=30)
    except Exception as err:
        is_permanent = isinstance(err, PermanentError)
        url = meta.get("rejectUrl") if is_permanent else meta.get("nackUrl")
        reason = f"{'permanent' if is_permanent else 'retryable'}: {err}"
        requests.post(
            url,
            headers={**headers, "Content-Type": "application/json"},
            json={"reason": reason},
            timeout=30,
        )
Common failure modes
Retry bugs are subtle. They show up as double charges, out-of-order state, and “stuck” backlogs.
Double side effects (e.g. double email, double charge)
Likely causes
- No idempotency key/dedupe store.
- Provider retries on timeout or transient 5xx.
- Worker processes concurrently without a unique constraint.
Next checks
- Add dedupe store keyed by provider event ID with TTL.
- Enforce DB unique constraints on side-effect records.
- Add idempotency keys to downstream APIs where supported.
Backlog grows after an outage
Likely causes
- Workers retry too aggressively (no backoff/jitter).
- Retries amplify load on a degraded dependency.
- Poison message blocks progress (no DLQ).
Next checks
- Add exponential backoff + jitter and cap attempts.
- Quarantine poison messages to DLQ.
- Scale workers with bounded concurrency.
Permanent errors keep retrying forever
Likely causes
- No error classification; everything is treated as retryable.
- Validation moved downstream and now fails repeatedly.
- Bad payload versioning / schema drift.
Next checks
- Reject permanent failures (4xx-like) and alert humans.
- Version payload schemas and support migration paths.
- Add dashboards for retry reasons and top errors.
How Hooque helps
Hooque is designed around at-least-once delivery: you control outcomes explicitly and can isolate failures safely.
- Durable queues for webhook payloads so timeouts and provider retries do not drop events.
- Explicit Ack / Nack / Reject lifecycle control for correct retry semantics.
- Inspection + replay tooling to safely re-run after a fix (idempotency required).
- Per-consumer streams (REST or SSE) so you can scale workers without changing ingest.
- Metrics to measure attempts, failures, and processing latency.
Compare patterns with payment webhooks and review pricing.
FAQ
The most common questions about duplicates, backoff, and “exactly-once” behavior.
Why do webhook providers retry?
Retries happen on timeouts, transient network issues, and non-2xx responses (often 5xx). Providers assume your endpoint might be temporarily unhealthy and will attempt delivery again. With Hooque, ingest persists immediately and your worker can retry safely using explicit Ack/Nack/Reject outcomes.
How do I prevent duplicate processing from retries?
Implement receiver-side idempotency. Use a provider event ID or delivery ID as a dedupe key, store it in a dedupe table/store with a TTL/window, and make side effects idempotent with unique constraints or idempotency keys. With Hooque, the queue interface plus per-delivery metadata makes it straightforward to implement dedupe in your consumer.
What errors should I retry vs not retry?
Retry transient failures (timeouts, network errors, 5xx). Do not retry permanent failures like invalid payloads, auth failures, or schema violations (usually 4xx) — send them to a dead-letter path for inspection. With Hooque, you can Nack retryable failures and Reject permanent failures explicitly (with a reason).
What backoff strategy should I use for webhook processing?
Use exponential backoff with jitter, cap maximum delay, and enforce a max attempt policy. Jitter avoids synchronized retry storms when many deliveries fail at once. With Hooque, backoff and retry policy live in your worker while the ingest layer remains fast and durable.
Do I need a DLQ for webhooks?
Yes if you care about reliability. A DLQ (or equivalent quarantine path) prevents poison messages from blocking progress and gives you a place to inspect and replay after a fix. With Hooque, you can Reject poison messages with a reason and use inspection/replay to recover after fixes.
Are webhooks exactly-once?
Usually no. Most webhook systems are at-least-once. You can achieve exactly-once side effects by combining idempotent processing with durable dedupe keys and transactional writes. With Hooque, the delivery lifecycle is explicit (Ack/Nack/Reject) so you can build exactly-once effects on top of at-least-once delivery.
Start processing webhooks reliably
Decouple ingest from processing, then handle retries safely with explicit Ack/Nack/Reject lifecycle control.
No credit card required