WorkOS webhook retries without duplicates
Retries are guaranteed; duplicate side effects don’t have to be.
This page shows reliable retry/backoff patterns for WorkOS webhooks, plus a minimal worker/consumer reference you can ship.
TL;DR
- Assume at-least-once delivery: retries happen and duplicates are normal.
- Authenticate first: verification bugs often look like “random retries” and “phantom duplicates”.
- Ack fast; do side effects in a worker with bounded concurrency.
- Make processing idempotent with a stable dedupe key + dedupe store + window (TTL).
- Use exponential backoff + jitter; cap attempts and dead-letter permanent failures.
- Track retry attempts, duplicate rate, and dead letter queue volume as first-class metrics.
Want examples in production contexts? See monitoring webhooks and review pricing.
Anti-patterns
- Retrying everything (including validation/auth failures) and amplifying load.
- No dedupe store: every retry turns into a second side effect.
- Sleeping/retrying inside the webhook HTTP handler instead of a worker.
If you are still processing inline, start with the Webhook API.
Core concepts
Treat retries as normal and engineer your receiver so duplicates are harmless.
At-least-once delivery
Providers retry because they cannot know if you processed the event. A 2xx tells the provider to stop retrying that delivery; it does not mean your side effects completed.
Verification is reliability
Auth failures can create false “retry storms”. Verify early and treat failed verification as permanent (don’t back off forever).
Dedupe key + window
Pick a stable key (provider id if available; else stable hash) and store it for a window so retries/replays don’t duplicate side effects.
Verification prerequisite (auth ↔ retries): Verify authenticity and handle retries with explicit outcomes.
If verification breaks (wrong secret, parsing over mutated body, clock skew), providers may retry repeatedly or your system may drop valid events. Reliability starts with a deterministic, fast ingest path.
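As a sketch of that deterministic ingest check: WorkOS-style signatures combine a timestamp with an HMAC over the raw body. The `t=…, v1=…` header format and the tolerance below are illustrative assumptions; confirm the exact header name, timestamp units, and recommended tolerance against the WorkOS docs.

```python
# Sketch: timestamp + HMAC-SHA256 verification over the RAW request body.
# Header format (t=..., v1=...) is an assumption; check the WorkOS docs for
# the exact header name, timestamp units, and tolerance.
import hashlib
import hmac
import time

TOLERANCE_S = 5 * 60  # reject stale timestamps to limit replay


def verify_signature(raw_body: bytes, sig_header: str, secret: str, now=None) -> bool:
    try:
        parts = dict(p.strip().split("=", 1) for p in sig_header.split(","))
        ts = int(parts["t"])
        received = parts["v1"]
    except (ValueError, KeyError):
        return False  # malformed header: permanent failure, do not retry
    if abs((now if now is not None else time.time()) - ts) > TOLERANCE_S:
        return False  # outside the timestamp window (replay or clock skew)
    # Sign "<timestamp>.<raw body>" -- never a re-serialized/parsed body.
    expected = hmac.new(secret.encode(), f"{ts}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)
```

Failing this check is a permanent outcome (reject), not a retryable one, which is what keeps verification bugs from masquerading as retry storms.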
Normal flow vs With Hooque flow
The difference is separating “accepted” from “processed” and making outcomes explicit.
Normal retry/duplicate flow
- Provider calls your endpoint.
- Handler does DB + external side effects inline.
- Timeout/5xx/network issue triggers provider retry.
- Duplicates arrive; side effects may run twice.
- Manual “replay scripts” are risky and hard to audit.
With Hooque
- Provider → Hooque ingest (fast accept; verification at ingest).
- Worker pulls from a durable queue at its own pace.
- Explicit outcomes: Ack (done), Nack (retry later/backoff), Reject (dead letter queue).
- Inspection + replay after fixes, with an audit trail.
- Idempotency still required, but easier to enforce at the worker boundary.
Production checklist
A short checklist for retries, webhook idempotency, deduplication, and dead letter queue handling.
- [ ] Treat WorkOS webhooks as at-least-once delivery (retries + duplicates)
- [ ] Verification prerequisite: Verify authenticity and handle retries with explicit outcomes.
- [ ] Respond 2xx/202 quickly (accept != processed); do not block on downstream dependencies
- [ ] Choose a stable dedupe key (provider event id when available; else stable hash) + TTL/window
- [ ] Dedupe store is durable (DB unique constraint, Redis SETNX, etc.); not just in-memory
- [ ] Side effects are idempotent (unique constraints + idempotency keys for downstream APIs where supported)
- [ ] Classify failures: retryable (timeouts/5xx) vs permanent (validation/auth/4xx-like)
- [ ] Backoff: exponential + jitter, bounded concurrency, and max attempts
- [ ] Dead letter queue policy: quarantine poison messages and alert humans
- [ ] Safe replay rules: re-run only through the same idempotent worker code path
- [ ] Metrics: attempts per message, retry reasons, duplicates/dedupe hit rate, DLQ volume, processing latency
Reference implementation
Minimal, correct building blocks: a dedupe-first worker and a consumer loop that uses Ack/Nack/Reject.
1) Idempotent worker (dedupe key + dedupe store)
Swap the in-memory store for Redis/DB in production; the shape stays the same.
Node
Dedupe key + TTL window
// Idempotent worker example: dedupe key + dedupe store (in-memory TTL map)
// Node 18+
import crypto from "node:crypto";

const DEDUPE_WINDOW_MS = 24 * 60 * 60 * 1000;
const dedupeStore = new Map(); // key -> expiresAtMs

function cleanup(nowMs) {
  // Tiny O(n) cleanup for demo purposes; use Redis/DB in production.
  for (const [k, exp] of dedupeStore.entries()) {
    if (exp <= nowMs) dedupeStore.delete(k);
  }
}

function computeDedupeKey(payload) {
  // Prefer a stable provider event id if present; otherwise fall back to a stable hash.
  if (payload && typeof payload === "object") {
    const maybeId = payload.id ?? payload.event_id ?? payload.eventId;
    if (typeof maybeId === "string" && maybeId.length > 0) return maybeId;
  }
  // Note: JSON.stringify is key-order dependent; hash the raw request body
  // (or canonicalize keys) if you rely on this fallback in production.
  return crypto.createHash("sha256").update(JSON.stringify(payload ?? {})).digest("hex");
}

function shouldProcessOnce(dedupeKey) {
  const now = Date.now();
  cleanup(now);
  const exp = dedupeStore.get(dedupeKey);
  if (typeof exp === "number" && exp > now) return false;
  dedupeStore.set(dedupeKey, now + DEDUPE_WINDOW_MS);
  return true;
}

export async function handleWebhook(payload) {
  const dedupeKey = computeDedupeKey(payload);
  if (!shouldProcessOnce(dedupeKey)) return { deduped: true };
  // TODO: perform side effects here (DB writes, API calls).
  // Tip: enforce exactly-once effects with unique constraints or downstream idempotency keys.
  return { deduped: false, ok: true };
}

// Demo runner: node worker.mjs '{"id":"evt_123","type":"example"}'
if (import.meta.url === `file://${process.argv[1]}`) {
  const payload = JSON.parse(process.argv[2] ?? "{}");
  console.log(await handleWebhook(payload));
}
Python
Dedupe key + TTL window
# Idempotent worker example: dedupe key + dedupe store (in-memory TTL dict)
import hashlib
import json
import sys
import time

DEDUPE_WINDOW_S = 24 * 60 * 60
dedupe_store = {}  # key -> expires_at_epoch_s

def cleanup(now_s: int) -> None:
    for k, exp in list(dedupe_store.items()):
        if exp <= now_s:
            del dedupe_store[k]

def compute_dedupe_key(payload: object) -> str:
    if isinstance(payload, dict):
        for field in ["id", "event_id", "eventId"]:
            v = payload.get(field)
            if isinstance(v, str) and v:
                return v
    raw = json.dumps(payload or {}, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def should_process_once(dedupe_key: str) -> bool:
    now = int(time.time())
    cleanup(now)
    exp = dedupe_store.get(dedupe_key)
    if isinstance(exp, int) and exp > now:
        return False
    dedupe_store[dedupe_key] = now + DEDUPE_WINDOW_S
    return True

def handle_webhook(payload: object) -> dict:
    dedupe_key = compute_dedupe_key(payload)
    if not should_process_once(dedupe_key):
        return {"deduped": True}
    # TODO: perform side effects here (DB writes, API calls).
    return {"deduped": False, "ok": True}

if __name__ == "__main__":
    payload = json.loads(sys.argv[1]) if len(sys.argv) > 1 else {}
    print(handle_webhook(payload))
2) Hooque consumer loop (pull + Ack/Nack/Reject)
Pull messages with GET .../next, parse X-Hooque-Meta, then POST the action URLs.
Node
GET next + meta header + POST ack/nack/reject
// Hooque consumer loop: GET next, parse X-Hooque-Meta, POST ackUrl/nackUrl/rejectUrl
// Node 18+ (fetch built-in)
const QUEUE_NEXT_URL =
  process.env.HOOQUE_QUEUE_NEXT_URL ??
  "https://app.hooque.io/queues/cons_webhook_events/next";
const TOKEN = process.env.HOOQUE_TOKEN ?? "hq_tok_replace_me";
const headers = { Authorization: `Bearer ${TOKEN}` };

class RetryableError extends Error {}
class PermanentError extends Error {}

async function processPayload(payload) {
  if (!payload || typeof payload !== "object") throw new PermanentError("invalid payload");
  // TODO: enforce idempotency (dedupe key + dedupe store + window).
  // TODO: do side effects with unique constraints / downstream idempotency keys.
}

while (true) {
  const resp = await fetch(QUEUE_NEXT_URL, { headers });
  if (resp.status === 204) break; // queue drained
  if (!resp.ok) throw new Error(`Hooque next() failed: ${resp.status}`);
  const payload = await resp.json();
  const meta = JSON.parse(resp.headers.get("X-Hooque-Meta") ?? "{}");
  try {
    await processPayload(payload);
    await fetch(meta.ackUrl, { method: "POST", headers });
  } catch (err) {
    const isPermanent = err instanceof PermanentError;
    const url = isPermanent ? meta.rejectUrl : meta.nackUrl;
    const reason = isPermanent ? `permanent: ${err.message}` : `retryable: ${err.message}`;
    await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ reason }),
    });
  }
}
Python
GET next + meta header + POST ack/nack/reject
# Hooque consumer loop: GET next, parse X-Hooque-Meta, POST ackUrl/nackUrl/rejectUrl
import json
import os

import requests

QUEUE_NEXT_URL = os.getenv(
    "HOOQUE_QUEUE_NEXT_URL",
    "https://app.hooque.io/queues/cons_webhook_events/next",
)
TOKEN = os.getenv("HOOQUE_TOKEN", "hq_tok_replace_me")
headers = {"Authorization": f"Bearer {TOKEN}"}

class RetryableError(Exception):
    pass

class PermanentError(Exception):
    pass

def process_payload(payload: object) -> None:
    if not isinstance(payload, dict):
        raise PermanentError("invalid payload")
    # TODO: enforce idempotency (dedupe key + dedupe store + window).
    # TODO: do side effects with unique constraints / downstream idempotency keys.
    return None

while True:
    resp = requests.get(QUEUE_NEXT_URL, headers=headers, timeout=30)
    if resp.status_code == 204:
        break
    if resp.status_code >= 400:
        raise RuntimeError(f"Hooque next() failed: {resp.status_code} {resp.text}")
    payload = resp.json()
    meta = json.loads(resp.headers.get("X-Hooque-Meta", "{}"))
    try:
        process_payload(payload)
        requests.post(meta["ackUrl"], headers=headers, timeout=30)
    except Exception as err:
        is_permanent = isinstance(err, PermanentError)
        url = meta.get("rejectUrl") if is_permanent else meta.get("nackUrl")
        reason = f"{'permanent' if is_permanent else 'retryable'}: {err}"
        requests.post(
            url,
            headers={**headers, "Content-Type": "application/json"},
            json={"reason": reason},
            timeout=30,
        )
Common failure modes
Retries are normal. The incidents come from duplicates, infinite retries, and missing dead-letter paths.
Duplicate side effects (double charge/email/state updates)
Likely causes
- No dedupe key/dedupe store.
- Retries + concurrent workers.
- Replays executed through a different code path.
Next checks
- Enforce a stable dedupe key + TTL window.
- Make side effects idempotent with unique constraints/idempotency keys.
- Replay only through the same idempotent worker code.
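A durable dedupe store can be as simple as a unique constraint. This sketch uses SQLite standing in for your production database (Postgres, MySQL, etc.): inserting the dedupe key claims the event, and a constraint violation means another delivery already claimed it.

```python
# Sketch: durable dedupe via a UNIQUE constraint (SQLite standing in for your DB).
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real database in production
conn.execute(
    "CREATE TABLE IF NOT EXISTS processed_events ("
    "  dedupe_key TEXT PRIMARY KEY,"
    "  processed_at INTEGER NOT NULL"
    ")"
)


def claim(dedupe_key: str, now_s: int) -> bool:
    """Return True if this delivery claimed the key; False for a duplicate."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO processed_events (dedupe_key, processed_at) VALUES (?, ?)",
                (dedupe_key, now_s),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another delivery (or a concurrent worker) got there first
```

Because the constraint is enforced by the database, this stays correct under concurrent workers, unlike an in-memory map.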
Retry storm after a transient outage
Likely causes
- No backoff/jitter (immediate retries).
- Downstream dependency flaps under load.
- Workers scale without bounded concurrency.
Next checks
- Add exponential backoff + jitter and cap attempts.
- Throttle concurrency per dependency/tenant.
- Use dead-lettering for repeated failures.
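Those checks can be sketched as exponential backoff with full jitter, a delay cap, and a hard attempt budget. The constants here are illustrative; tune them to your dependencies.

```python
# Sketch: exponential backoff with full jitter and a max-attempt budget.
import random

BASE_DELAY_S = 1.0   # illustrative
MAX_DELAY_S = 60.0   # cap so retries never wait unboundedly long
MAX_ATTEMPTS = 8     # past the budget, dead-letter instead of retrying


def backoff_delay(attempt: int) -> float:
    """Delay before retry `attempt` (1-based): full jitter in [0, min(cap, base * 2^(attempt-1))]."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** (attempt - 1)))
    return random.uniform(0, ceiling)


def should_retry(attempt: int) -> bool:
    """Retry while under the budget; exhausting it routes the message to the DLQ."""
    return attempt < MAX_ATTEMPTS
```

Full jitter (a random delay up to the exponential ceiling) spreads retries out so a fleet of workers does not hammer a recovering dependency in lockstep.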
Auth/verification failures look like random retries
Likely causes
- Wrong secret/token or unsafe rotation.
- Verifying over a mutated body (wrong parsing order).
- Clock skew breaks timestamp windows (where used).
Next checks
- Verify over the raw body; validate timestamps consistently.
- Rotate secrets with overlap windows.
- Use the WorkOS security guide to validate the exact verification steps.
How Hooque helps
Durable ingest + explicit outcomes makes retries predictable and duplicates manageable.
- Hosted ingest for WorkOS webhooks with verification at ingest (reduces fake/forged duplicates).
- Durable queues so timeouts and provider retries do not drop events.
- Explicit Ack / Nack / Reject lifecycle control for correct retry semantics.
- Inspection + replay tooling to safely re-run after a fix (idempotency required).
- Metrics to measure attempts, failures, and processing latency (including DLQ volume).
Start with signup, review pricing, and see monitoring webhooks.
FAQ
Common questions about duplicates, backoff, and dead-lettering.
Why do WorkOS webhooks retry?
General: webhook senders retry on timeouts, transient network failures, and non-2xx responses (often 5xx). Retries imply at-least-once delivery, so duplicates are expected. How Hooque helps: ingest persists quickly and your worker controls outcomes explicitly with Ack (done), Nack (retry later/backoff), and Reject (dead letter queue).
Why am I receiving duplicate webhook events?
General: duplicates come from retries, replays, and delivery uncertainty. Prevent double side effects with receiver-side idempotency (stable dedupe key + dedupe store + window) and idempotent side effects (unique constraints / idempotency keys). How Hooque helps: pull/stream consumption plus per-delivery action URLs makes “process once” patterns straightforward in your consumer.
What is the best idempotency key to use?
General: prefer a stable provider event id or delivery id (something that is identical across duplicates). If you do not have one, fall back to a stable hash of the raw payload (and select headers if needed). How Hooque helps: you can persist raw payloads and build a deterministic hash-based dedupe key, then enforce it in your worker before side effects.
Should I return 2xx before processing?
General: yes—acknowledge receipt (2xx/202) as soon as you have safely accepted the event, then process asynchronously. Returning 2xx only means “accepted”, not “processed”. How Hooque helps: Hooque can accept and persist at ingest, then your consumer processes from a durable queue without risking provider timeouts.
How long should I retry before dead-lettering?
General: use exponential backoff + jitter, cap maximum delay, and enforce a max-attempt policy. Permanent failures (auth/validation/4xx-like) should go to a dead letter queue quickly. How Hooque helps: Reject provides a clean DLQ path, while Nack supports controlled retries (your worker decides backoff and limits).
What’s the difference between retry and replay/redelivery?
General: retries are automatic re-attempts after a failure; replays/redeliveries are intentional re-runs after a fix or for testing. Both require idempotency. How Hooque helps: you can inspect deliveries and re-run safely with the same consumer code path while preserving an audit trail of outcomes.
What should be retried vs rejected?
General: retry transient failures (timeouts, network errors, 5xx, temporarily unavailable dependencies). Reject permanent failures (invalid payloads, auth failures, schema mismatches) to avoid infinite loops. How Hooque helps: your consumer can Nack retryable errors and Reject poison messages explicitly with a reason.
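That split can be encoded as a small classifier the worker consults before choosing Nack or Reject. The status-code buckets below are a common convention, not a WorkOS or Hooque rule; adjust them to your dependencies.

```python
# Sketch: classify failures as retryable (Nack) vs permanent (Reject).
RETRYABLE_STATUSES = {408, 425, 429, 500, 502, 503, 504}  # common convention


def is_retryable(status_code=None, exc=None) -> bool:
    """Retry transient failures; treat everything else as permanent."""
    if exc is not None:
        # Network-level transience: worth another attempt with backoff.
        return isinstance(exc, (TimeoutError, ConnectionError))
    if status_code is not None:
        return status_code in RETRYABLE_STATUSES
    return False  # unknown cause: dead-letter it and let a human inspect
```

Defaulting unknown causes to permanent keeps poison messages out of the retry loop; the dead letter queue makes them visible instead of invisible.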
Start processing webhooks reliably
Decouple ingest from processing, then handle retries safely with webhook idempotency, backoff, and explicit Ack/Nack/Reject outcomes.
No credit card required