Webhook not working? A production debugging playbook
Webhook bugs feel urgent because they are: missing events and duplicates usually mean broken business logic.
Use this playbook to triage systematically: provider logs → reachability → auth → latency → idempotency.
No credit card required
TL;DR
- Start with the provider delivery log: is it sending, and what response codes does it see?
- Confirm endpoint reachability: TLS, DNS, redirects, firewall/WAF, and correct URL/method.
- If signatures fail, verify over the raw body bytes and validate the timestamp (clock skew matters).
- If events are missing, check filters, environments (test vs prod), and provider retry/disable settings.
- If duplicates happen, assume retries and add idempotency (dedupe on event ID/delivery ID).
- Capture raw payloads safely and replay deterministically after fixes.
If auth is the problem, read webhook security.
Anti-patterns
- Debugging only from your app logs without checking provider delivery logs.
- Logging secrets or full PII payloads during incidents.
- Fixing symptoms (timeouts) without addressing the root cause (inline processing).
If you’re missing events during incidents, see monitoring & alerting.
Core concepts
Debugging webhooks is mostly “follow the delivery.” You need both provider and receiver visibility.
Provider truth
Provider delivery logs tell you whether the provider is attempting delivery and what it observed (code/latency).
Receiver truth
Receiver logs tell you whether the request arrived, whether auth passed, and where time was spent (parse/verify/enqueue).
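To see where time was spent, one option is to time each stage of the handler and emit a single structured log line per request. The sketch below is illustrative only: `StageTimer`, the stage names, and the `trace_id` field are hypothetical and not part of any provider or Hooque API.

```python
import json
import time
import uuid
from contextlib import contextmanager


class StageTimer:
    """Collects per-stage durations (in ms) for one webhook request."""

    def __init__(self, trace_id=None):
        # Hypothetical correlation ID; reuse the provider's request ID if one is sent.
        self.trace_id = trace_id or uuid.uuid4().hex
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised.
            self.stages[name] = round((time.perf_counter() - start) * 1000, 3)

    def summary(self, outcome):
        return {"trace_id": self.trace_id, "outcome": outcome, "stages_ms": self.stages}


def handle(raw_body: bytes) -> dict:
    timer = StageTimer()
    with timer.stage("verify"):
        pass  # signature check over raw_body would go here
    with timer.stage("parse"):
        event = json.loads(raw_body)
    with timer.stage("enqueue"):
        pass  # hand `event` to a queue/worker here
    record = timer.summary("accepted")
    print(json.dumps(record))  # one log line per request
    return record
```

Logging the same `trace_id` from the worker makes it possible to join receiver latency with the eventual processing outcome.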
Replay safety
If you can store raw payloads and replay safely, you can recover from most incidents without losing data.
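A minimal sketch of capture and replay, assuming a local JSON Lines file as the store. The function names (`capture`, `load`, `replay`), the file layout, and the list of headers to drop are all assumptions made for illustration; a production store would also need access control and retention rules.

```python
import base64
import json
from pathlib import Path
from typing import Callable, Dict, Iterator, Tuple

# Assumed list of headers that should never be persisted.
SENSITIVE_HEADERS = {"authorization", "cookie"}


def capture(store: Path, headers: Dict[str, str], raw_body: bytes) -> None:
    """Append one delivery to the store, keeping the body byte-exact via base64."""
    safe = {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}
    record = {"headers": safe, "body_b64": base64.b64encode(raw_body).decode("ascii")}
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load(store: Path) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, raw_body) pairs in the order they were captured."""
    with store.open("r", encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            yield record["headers"], base64.b64decode(record["body_b64"])


def replay(store: Path, process: Callable[[Dict[str, str], bytes], None]) -> int:
    """Feed every captured delivery to the same function live traffic uses."""
    count = 0
    for headers, raw_body in load(store):
        process(headers, raw_body)
        count += 1
    return count
```

Storing the body as bytes rather than parsed JSON keeps signature verification reproducible during replay, since re-serialized JSON may not match the original bytes.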
Simple flow
- Provider: delivery log (attempts + codes)
- Receiver: logs + traces (auth + latency)
- Replay: raw payloads (re-run safely)
The fastest path to root cause is: provider log → HTTP status/latency → receiver auth/latency → worker outcomes.
Triage checklist
A repeatable, production-friendly sequence that reduces guesswork under pressure.
- [ ] Provider: is the webhook enabled? correct environment? correct event types?
- [ ] Provider delivery log: request URL/method correct? response code? latency? retries?
- [ ] Reachability: TLS certificate valid? DNS correct? no redirects? firewall/WAF blocks?
- [ ] Handler: respond with a 2xx (for example 202) quickly; no slow work in the request path
- [ ] Signature verification:
  - [ ] verify over raw body bytes (before JSON parse)
  - [ ] constant-time compare
  - [ ] timestamp max-age + clock skew
- [ ] Duplicates: add dedupe store keyed by event ID/delivery ID (+ TTL/window)
- [ ] Payload parsing: content-type expected? body size limits? gzip/encoding handled?
- [ ] Observability: correlate provider request ID + internal trace ID + processing outcome
- [ ] Replay: store raw payload + headers; replay through the same processing code
Reference implementation
When you need to inspect what is happening, pull one message and print the metadata and payload.
Node
Debug pull + inspect meta

    // Minimal “debug pull”: fetch one message, print meta + payload, then choose Ack/Nack/Reject.
    // Run as an ES module (top-level await) on Node 18+ (global fetch).
    const QUEUE_NEXT_URL =
      process.env.HOOQUE_QUEUE_NEXT_URL ??
      "https://app.hooque.io/queues/cons_webhook_events/next";
    const TOKEN = process.env.HOOQUE_TOKEN ?? "hq_tok_replace_me";
    const headers = { Authorization: `Bearer ${TOKEN}` };

    const resp = await fetch(QUEUE_NEXT_URL, { headers });
    if (resp.status === 204) {
      console.log("queue empty");
      process.exit(0);
    }
    if (!resp.ok) throw new Error(`Hooque next() failed: ${resp.status}`);

    const payload = await resp.json();
    const meta = JSON.parse(resp.headers.get("X-Hooque-Meta") ?? "{}");
    console.log("meta:", meta);
    console.log("payload:", payload);

    // Choose an outcome:
    // - Ack when you are confident it processed successfully
    // - Nack when the failure is transient (retry later)
    // - Reject when the payload is permanently bad
    if (!meta.ackUrl) throw new Error("X-Hooque-Meta has no ackUrl; cannot ack");
    await fetch(meta.ackUrl, { method: "POST", headers });

Python
Debug pull + inspect meta

    # Minimal “debug pull”: fetch one message, print meta + payload, then choose Ack/Nack/Reject.
    import json
    import os

    import requests

    QUEUE_NEXT_URL = os.getenv(
        "HOOQUE_QUEUE_NEXT_URL",
        "https://app.hooque.io/queues/cons_webhook_events/next",
    )
    TOKEN = os.getenv("HOOQUE_TOKEN", "hq_tok_replace_me")
    headers = {"Authorization": f"Bearer {TOKEN}"}

    resp = requests.get(QUEUE_NEXT_URL, headers=headers, timeout=30)
    if resp.status_code == 204:
        print("queue empty")
        raise SystemExit(0)
    if resp.status_code >= 400:
        raise RuntimeError(f"Hooque next() failed: {resp.status_code} {resp.text}")

    payload = resp.json()
    meta = json.loads(resp.headers.get("X-Hooque-Meta", "{}"))
    print("meta:", meta)
    print("payload:", payload)

    # Choose an outcome:
    # - Ack when you are confident it processed successfully
    # - Nack when the failure is transient (retry later)
    # - Reject when the payload is permanently bad
    requests.post(meta["ackUrl"], headers=headers, timeout=30)

Common failure modes
Use symptoms to narrow the search space quickly, then follow the checklist to confirm.
Provider shows timeouts
Likely causes
- Handler does slow work inline.
- Network path issues (WAF, cold starts).
- Downstream calls block the response.
Next checks
- Return 2xx/202 immediately after acceptance.
- Move work into a queue/worker.
- Instrument latency breakdown and add timeouts.
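The "accept fast, process later" pattern can be sketched without any web framework. This is a minimal illustration using an in-process queue; `accept`, `start_worker`, and the queue size are hypothetical names and values, and an in-memory queue loses events on crash, so a durable queue would replace it in production.

```python
import queue
import threading

# Bounded so a stalled worker produces back-pressure instead of unbounded memory use.
work_queue: queue.Queue = queue.Queue(maxsize=1000)


def accept(raw_body: bytes) -> int:
    """Request path: enqueue and return an HTTP status code. No slow work here."""
    try:
        work_queue.put_nowait(raw_body)
    except queue.Full:
        return 503  # let the provider retry later rather than block the response
    return 202


def worker(handle) -> None:
    """Background path: slow work runs here, outside the HTTP response."""
    while True:
        raw_body = work_queue.get()
        try:
            handle(raw_body)
        except Exception as exc:
            # Keep the worker alive; a real system would retry or dead-letter.
            print(f"processing failed: {exc!r}")
        finally:
            work_queue.task_done()


def start_worker(handle) -> threading.Thread:
    thread = threading.Thread(target=worker, args=(handle,), daemon=True)
    thread.start()
    return thread
```

A web handler would call `accept(request_body)` after signature verification and return its status code directly, so response latency no longer depends on downstream calls.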
401/403 signature failures
Likely causes
- Wrong secret or wrong environment.
- Verifying parsed JSON instead of raw bytes.
- Timestamp validation failing due to skew.
Next checks
- Check which secret version is active, and accept both old and new secrets during rotation.
- Capture raw bytes and compare locally.
- Allow small skew and set a max age window.
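For comparing locally, the checks above can be sketched as a single function. The signing scheme here (hex HMAC-SHA256 over `timestamp + "." + raw_body`, Unix-seconds timestamp) is an assumption for illustration; providers differ in header names, encodings, and the signed string, so check the provider's documentation before relying on it.

```python
import hashlib
import hmac
import time
from typing import Optional, Sequence


def verify(secrets: Sequence[bytes], raw_body: bytes, timestamp: str,
           signature_hex: str, max_age_s: int = 300,
           now: Optional[float] = None) -> bool:
    """Return True if the signature matches any active secret and is fresh."""
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False

    # abs() tolerates clock skew in both directions, bounded by max_age_s.
    current = time.time() if now is None else now
    if abs(current - sent_at) > max_age_s:
        return False

    # Sign the exact bytes received, never a re-serialized JSON object.
    signed = timestamp.encode("utf-8") + b"." + raw_body
    received = signature_hex.encode("utf-8")

    # Trying every active secret supports overlapping secret rotation.
    return any(
        hmac.compare_digest(
            hmac.new(secret, signed, hashlib.sha256).hexdigest().encode("utf-8"),
            received,
        )
        for secret in secrets
    )
```

Passing `now` explicitly makes it possible to re-run verification against a captured request with the clock pinned to the original delivery time, which separates "wrong secret" from "timestamp too old".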
Duplicates and out-of-order state
Likely causes
- Provider retries.
- No idempotency/dedupe store.
- Worker concurrency violates ordering assumptions.
Next checks
- Add dedupe keys + TTL and unique constraints.
- Fetch authoritative state via API when needed.
- Sequence processing per object/tenant.
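A dedupe store with a TTL and a unique constraint might look like the following. It is a sketch under assumptions: SQLite stands in for whatever database is in use, and `DedupeStore`, the table name, and the 24-hour default window are hypothetical.

```python
import sqlite3
import time
from typing import Optional


class DedupeStore:
    """Remembers event IDs for ttl_s seconds; the primary key enforces uniqueness."""

    def __init__(self, path: str = ":memory:", ttl_s: int = 86400):
        self.ttl_s = ttl_s
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "event_id TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
        )

    def first_time(self, event_id: str, now: Optional[float] = None) -> bool:
        """Return True only the first time an event ID is seen within the window."""
        now = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on error
            # Expire entries older than the dedupe window.
            self.db.execute("DELETE FROM seen WHERE seen_at < ?", (now - self.ttl_s,))
            # The insert is skipped if the ID already exists.
            cur = self.db.execute(
                "INSERT OR IGNORE INTO seen (event_id, seen_at) VALUES (?, ?)",
                (event_id, now),
            )
        return cur.rowcount == 1
```

A worker would call `first_time(event_id)` and skip processing when it returns False. Note the trade-off: marking an event as seen before processing finishes means a crash mid-processing can cause the retry to be skipped, so some systems record the ID in the same transaction as the side effect instead.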
How Hooque helps
Debugging is easier when you have a durable event history, a replay path, and explicit delivery controls.
- Hosted ingest + durable persistence so payloads aren’t lost during outages.
- Provider-specific signature verification at ingest reduces auth debugging surface.
- Queue consumers with Ack/Nack/Reject so you can quarantine bad payloads.
- Inspection and replay tools so you can reproduce and validate fixes.
- Metrics per webhook/consumer to correlate incidents with spikes and failures.
If you’re comparing solutions, check pricing and see patterns in use cases.
FAQ
Short answers for high-pressure debugging situations.
How do I know if the provider is sending webhooks?
Check the provider’s webhook delivery log. It should show attempts, response codes, and latency. If it is not sending, the issue is usually configuration (disabled endpoint, wrong environment, wrong event types). With Hooque, you also get an ingest history and queue metrics to confirm whether events arrived and what happened next.
Why do I see 401/403 on webhooks?
Authentication failed: missing signature headers, wrong secret (test vs prod), timestamp validation failing, or signature verification performed over parsed body instead of raw bytes. With Hooque, provider-specific signature verification happens at ingest, and failures can be inspected without touching your worker.
Why do webhooks work locally but not in production?
Common causes are TLS/DNS issues, firewalls/WAF blocks, redirects, missing raw-body verification in the deployed stack, or environment mismatch (secrets, URLs, event subscriptions). With Hooque, inbound exposure is centralized in a hosted endpoint so your app only needs outbound access to consume events.
How do I debug signature verification failures?
Capture the exact raw request bytes, signature headers, and timestamp. Re-run verification locally with the same inputs. Ensure you are verifying before parsing and using constant-time comparison. With Hooque, signature verification is handled at ingest (provider-specific) and only verified messages enter your consumer queue.
How do I debug duplicates and out-of-order events?
Assume retries. Add idempotency with dedupe keys, track attempt counts, and ensure your processing is tolerant of out-of-order updates (fetch authoritative state via API when needed). With Hooque, you can Nack to retry, Reject permanent failures, and use metadata and inspection to trace why duplicates happened.
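One way to tolerate out-of-order updates is to keep only the newest version of each object and ignore anything older. The sketch below assumes each event carries an object ID and a monotonically increasing version (or update timestamp); those fields and the `LatestStateStore` name are hypothetical, and when a provider sends no such field, fetching authoritative state via its API is the safer route.

```python
from typing import Dict, Tuple


class LatestStateStore:
    """Keeps the newest known state per object and drops stale updates."""

    def __init__(self) -> None:
        self._state: Dict[str, Tuple[int, dict]] = {}

    def apply(self, object_id: str, version: int, data: dict) -> bool:
        """Apply an update; return False if it is stale or a duplicate."""
        current = self._state.get(object_id)
        if current is not None and version <= current[0]:
            return False
        self._state[object_id] = (version, data)
        return True

    def get(self, object_id: str) -> dict:
        return self._state[object_id][1]
```

With this in place, a late-arriving "trialing" event at version 1 cannot overwrite an "active" state already recorded at version 2.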
How do I safely replay a failed webhook?
Replay from persisted raw payloads through the same processing code. Only do this if side effects are idempotent and you can correlate outcomes (logs/trace IDs) to validate correctness. With Hooque, inspection and replay are built-in so you can re-run after a fix without losing payloads.
Start processing webhooks reliably
Capture events durably and debug with inspection, replay, and explicit delivery controls.