Webhook retries and backoff without duplicates
Retries are a feature: providers assume your endpoint might be down and will try again.
The goal is to make duplicates harmless by building idempotency into your processing path.
TL;DR
- Retries are normal: providers retry on timeouts, 5xx, and transient network failures.
- Duplicate processing is optional: duplicates will arrive, but receiver-side idempotency with a dedupe key + window makes them no-ops.
- Classify failures: retryable (timeouts/5xx) vs permanent (validation/4xx) and handle differently.
- Use exponential backoff + jitter, plus a max-attempt policy and a dead-letter path.
- Make side effects idempotent (DB unique constraints, idempotency keys for downstream APIs).
- Track retry counts, duplicate rate, DLQ volume, and end-to-end latency as first-class metrics.
If you are still processing inline, start with Webhook API.
Anti-patterns
- Retrying everything (including validation errors and auth failures).
- No dedupe store: every retry becomes a second side effect.
- Sleeping/retrying inside the webhook HTTP handler instead of a worker.
If you need idempotency basics, see payment webhooks.
Core concepts
There are only two reliable assumptions: the provider will retry, and you will see duplicates.
At-least-once delivery
Providers retry deliveries because they cannot know whether you processed the event; a 2xx response tells them to stop retrying this delivery.
Dedupe key
Pick a stable key: provider event ID, delivery ID, or a stable hash of raw payload + key headers. Store it for a window.
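A minimal sketch of the fallback-hash approach, in Python (the header names here are illustrative — substitute whatever ID header your provider actually sends):

```python
import hashlib
import json

def dedupe_key(payload: dict, headers: dict) -> str:
    """Prefer a provider-supplied ID; fall back to a stable payload hash."""
    # Provider event/delivery IDs are the most reliable keys when present.
    for header in ("X-Event-Id", "X-Delivery-Id"):  # example header names
        if header in headers:
            return headers[header]
    # Fallback: hash the canonicalized payload so retries of the same event
    # produce the same key regardless of whitespace or key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Canonicalizing before hashing matters: two retries of the same event may serialize keys in a different order, and a naive hash of the raw bytes would then miss the duplicate.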
Exactly-once effects
You can’t guarantee exactly-once delivery, but you can guarantee exactly-once side effects with idempotency + durable writes.
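One way to get exactly-once effects is to record the dedupe key and perform the side effect in the same transaction, guarded by a unique constraint. A sketch using SQLite (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processed_events (
        event_id TEXT PRIMARY KEY,  -- the dedupe key: unique constraint
        effect   TEXT NOT NULL
    )
""")

def apply_side_effect(event_id: str, effect: str) -> bool:
    """Return True if the effect ran, False if this event was a duplicate."""
    with conn:  # transaction: the marker row and the effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, effect) VALUES (?, ?)",
            (event_id, effect),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip the side effect
        # ...perform the real side effect here, inside the same transaction...
        return True
```

Because the insert and the side effect share one transaction, a crash between them rolls both back, and the retry starts clean — which is exactly what at-least-once delivery needs.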
Simple flow
Provider (sends event, may retry) → Queue (persist first, dedupe keys) → Worker (retry safely, Ack/Nack/Reject)
Backoff belongs in workers, not in webhook HTTP handlers. Keep the inbound path fast and deterministic.
Production checklist
A practical list for “duplicate-proof” webhook processing, including backoff and dead-lettering.
- [ ] Assume at-least-once delivery (retries + duplicates)
- [ ] Dedupe key chosen per provider (event id / delivery id) + fallback stable hash
- [ ] Dedupe store with TTL/window (DB unique constraint, Redis SETNX, etc.)
- [ ] Side effects are idempotent (unique constraints + idempotency keys)
- [ ] Error classification: retryable vs permanent
- [ ] Backoff: exponential + jitter + max attempts
- [ ] Dead-letter policy: quarantine permanent failures + notify humans
- [ ] Separate “accept” (2xx/202) from “processed”
- [ ] Worker concurrency is bounded; avoid thundering herds on retries
- [ ] Metrics: attempts per message, retry reasons, duplicates, DLQ volume, processing latency
- [ ] Alerts: backlog growth, sustained failures, DLQ above baseline
Reference implementation
The key is explicit lifecycle control: Ack for success, Nack for retryable failures, Reject for permanent failures.
Node
Error classification + Ack/Nack/Reject
// Consume webhook payloads from Hooque and classify failures as retryable or permanent
const QUEUE_NEXT_URL =
  process.env.HOOQUE_QUEUE_NEXT_URL ??
  "https://app.hooque.io/queues/cons_webhook_events/next";
const TOKEN = process.env.HOOQUE_TOKEN ?? "hq_tok_replace_me";
const headers = { Authorization: `Bearer ${TOKEN}` };

// Throw RetryableError for transient failures (e.g. downstream 5xx),
// PermanentError for failures that will never succeed on retry.
class RetryableError extends Error {}
class PermanentError extends Error {}

async function processPayload(payload) {
  // Example: input validation failures are permanent.
  if (!payload || typeof payload !== "object") throw new PermanentError("invalid payload");
  // TODO: enforce idempotency using provider event ID (payload.id) or stable hash.
  // TODO: perform side effects (DB writes, API calls) with idempotency keys where supported.
}

while (true) {
  const resp = await fetch(QUEUE_NEXT_URL, { headers });
  if (resp.status === 204) break;
  if (!resp.ok) throw new Error(`Hooque next() failed: ${resp.status}`);
  const payload = await resp.json();
  const meta = JSON.parse(resp.headers.get("X-Hooque-Meta") ?? "{}");
  try {
    await processPayload(payload);
    await fetch(meta.ackUrl, { method: "POST", headers });
  } catch (err) {
    const isPermanent = err instanceof PermanentError;
    const url = isPermanent ? meta.rejectUrl : meta.nackUrl;
    const reason = isPermanent ? `permanent: ${err.message}` : `retryable: ${err.message}`;
    await fetch(url, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ reason }),
    });
  }
}
Python
Error classification + Ack/Nack/Reject
# Consume webhook payloads from Hooque and classify failures as retryable or permanent
import json
import os

import requests

QUEUE_NEXT_URL = os.getenv(
    "HOOQUE_QUEUE_NEXT_URL",
    "https://app.hooque.io/queues/cons_webhook_events/next",
)
TOKEN = os.getenv("HOOQUE_TOKEN", "hq_tok_replace_me")
headers = {"Authorization": f"Bearer {TOKEN}"}


class RetryableError(Exception):
    """Raise for transient failures (e.g. downstream 5xx) that should be retried."""


class PermanentError(Exception):
    """Raise for failures (e.g. invalid payloads) that will never succeed on retry."""


def process_payload(payload: dict) -> None:
    if not isinstance(payload, dict):
        raise PermanentError("invalid payload")
    # TODO: enforce idempotency using provider event ID (payload.get("id")) or stable hash.
    # TODO: perform side effects with idempotency keys where supported.


while True:
    resp = requests.get(QUEUE_NEXT_URL, headers=headers, timeout=30)
    if resp.status_code == 204:
        break
    if resp.status_code >= 400:
        raise RuntimeError(f"Hooque next() failed: {resp.status_code} {resp.text}")
    payload = resp.json()
    meta = json.loads(resp.headers.get("X-Hooque-Meta", "{}"))
    try:
        process_payload(payload)
        requests.post(meta["ackUrl"], headers=headers, timeout=30)
    except Exception as err:
        is_permanent = isinstance(err, PermanentError)
        url = meta.get("rejectUrl") if is_permanent else meta.get("nackUrl")
        reason = f"{'permanent' if is_permanent else 'retryable'}: {err}"
        requests.post(
            url,
            headers={**headers, "Content-Type": "application/json"},
            json={"reason": reason},
            timeout=30,
        )
Common failure modes
Retry bugs are subtle. They show up as double charges, out-of-order state, and “stuck” backlogs.
Double side effects (e.g. double email, double charge)
Likely causes
- No idempotency key/dedupe store.
- Provider retries on timeout or transient 5xx.
- Worker processes concurrently without a unique constraint.
Next checks
- Add dedupe store keyed by provider event ID with TTL.
- Enforce DB unique constraints on side-effect records.
- Add idempotency keys to downstream APIs where supported.
Backlog grows after an outage
Likely causes
- Workers retry too aggressively (no backoff/jitter).
- Retries amplify load on a degraded dependency.
- Poison message blocks progress (no DLQ).
Next checks
- Add exponential backoff + jitter and cap attempts.
- Quarantine poison messages to DLQ.
- Scale workers with bounded concurrency.
Permanent errors keep retrying forever
Likely causes
- No error classification; everything is treated as retryable.
- Validation moved downstream and now fails repeatedly.
- Bad payload versioning / schema drift.
Next checks
- Reject permanent failures (4xx-like) and alert humans.
- Version payload schemas and support migration paths.
- Add dashboards for retry reasons and top errors.
How Hooque helps
Hooque is designed around at-least-once delivery: you control outcomes explicitly and can isolate failures safely.
- Durable queues for webhook payloads so timeouts and provider retries do not drop events.
- Explicit Ack / Nack / Reject lifecycle control for correct retry semantics.
- Inspection + replay tooling to safely re-run after a fix (idempotency required).
- Per-consumer streams (REST or SSE) so you can scale workers without changing ingest.
- Metrics to measure attempts, failures, and processing latency.
Compare patterns with payment webhooks and review pricing.
FAQ
The most common questions about duplicates, backoff, and “exactly-once” behavior.
Why do webhook providers retry?
Retries happen on timeouts, transient network issues, and non-2xx responses (often 5xx). Providers assume your endpoint might be temporarily unhealthy and will attempt delivery again. With Hooque, ingest persists immediately and your worker can retry safely using explicit Ack/Nack/Reject outcomes.
How do I prevent duplicate processing from retries?
Implement receiver-side idempotency. Use a provider event ID or delivery ID as a dedupe key, store it in a dedupe table/store with a TTL/window, and make side effects idempotent with unique constraints or idempotency keys. With Hooque, the queue interface plus per-delivery metadata makes it straightforward to implement dedupe in your consumer.
What errors should I retry vs not retry?
Retry transient failures (timeouts, network errors, 5xx). Do not retry permanent failures like invalid payloads, auth failures, or schema violations (usually 4xx) — send them to a dead-letter path for inspection. With Hooque, you can Nack retryable failures and Reject permanent failures explicitly (with a reason).
What backoff strategy should I use for webhook processing?
Use exponential backoff with jitter, cap maximum delay, and enforce a max attempt policy. Jitter avoids synchronized retry storms when many deliveries fail at once. With Hooque, backoff and retry policy live in your worker while the ingest layer remains fast and durable.
Do I need a DLQ for webhooks?
Yes if you care about reliability. A DLQ (or equivalent quarantine path) prevents poison messages from blocking progress and gives you a place to inspect and replay after a fix. With Hooque, you can Reject poison messages with a reason and use inspection/replay to recover after fixes.
Are webhooks exactly-once?
Usually no. Most webhook systems are at-least-once. You can achieve exactly-once side effects by combining idempotent processing with durable dedupe keys and transactional writes. With Hooque, the delivery lifecycle is explicit (Ack/Nack/Reject) so you can build exactly-once effects on top of at-least-once delivery.
Start processing webhooks reliably
Decouple ingest from processing, then handle retries safely with explicit Ack/Nack/Reject lifecycle control.
No credit card required