Case Study: How a Weak HMAC Webhook Check Let Attackers In

Somewhere around 2 AM on a Tuesday, a mid-size e-commerce platform started processing refunds it hadn't approved. Not many — maybe 40 in the first hour — but enough that the on-call engineer noticed an anomaly in the dashboard. Orders were flipping to "refunded" without any corresponding action in the admin panel. By morning, the team had traced it back to their payment gateway webhook integration, and the root cause was almost embarrassingly simple: their HMAC verification was broken in two separate ways.

This is a reconstruction of that incident — names changed, details composited from a few real post-mortems I've read and one I was adjacent to — but the vulnerabilities are real, they're common, and the fixes are instructive enough to be worth dissecting carefully.

The Setup

The platform used a popular payment gateway that POSTed a JSON payload to a webhook endpoint whenever a payment event occurred. The gateway signed each request with an HMAC-SHA256 digest of the raw request body, using a shared secret, and put the result in an X-Signature header. Standard stuff. The platform's backend was a Node.js service behind an nginx reverse proxy.

Here's a simplified version of what the webhook handler looked like:

app.post('/webhooks/payment', express.json(), async (req, res) => {
  const incoming = req.headers['x-signature'];
  const expected = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(JSON.stringify(req.body))
    .digest('hex');

  if (incoming !== expected) {
    return res.status(401).send('Invalid signature');
  }

  await processPaymentEvent(req.body);
  res.sendStatus(200);
});

Read it quickly and it looks fine. There's a secret. There's an HMAC. There's a comparison. But there are two critical flaws hiding in plain sight.

Flaw #1 — The Timing Oracle

The comparison incoming !== expected uses JavaScript's native string equality operator, which is not a constant-time comparison. String comparison in most runtimes can return as soon as it finds a mismatched character. That means a signature whose first character is correct can take very slightly longer to reject than one that is wrong from the first character.

In theory this sounds like an academic concern. In practice, with enough requests and a network channel that doesn't introduce too much jitter, an attacker can measure these timing differences and use them to reconstruct the expected signature one byte at a time. It's a classic timing side-channel attack — the same class of vulnerability that broke early TLS implementations.

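With HMAC-SHA256 producing 64 hex characters, an attacker needs to test at most 64 × 16 = 1,024 candidate characters to recover the full expected signature for a given payload, compared with 16^64 possibilities for blind guessing. Each candidate has to be measured many times to separate the signal from network noise, so the real request count is far higher than 1,024. An attacker who can keep probing without getting rate-limited can eventually recover a valid signature for a payload of their choosing.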
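
To make the character-by-character arithmetic concrete, here is a toy simulation. It assumes an idealized oracle that reports exactly how many leading characters of a guess are correct, which is what a perfectly clean timing signal would reveal. Real measurements are noisy and need many samples per guess, and the names makeOracle and recoverSignature are illustrative, not taken from the incident.

```javascript
const crypto = require('crypto');

const HEX = '0123456789abcdef';

// Idealized stand-in for the timing signal: the number of leading
// characters of `guess` that match the secret signature.
function makeOracle(secretSignature) {
  const oracle = {
    queries: 0,
    matchingPrefix(guess) {
      oracle.queries += 1;
      let i = 0;
      while (i < secretSignature.length && guess[i] === secretSignature[i]) i += 1;
      return i;
    },
  };
  return oracle;
}

// Recover the signature one character at a time: at most 16 guesses per
// position, so at most 64 * 16 = 1,024 guesses for a SHA-256 hex digest.
function recoverSignature(oracle, length) {
  let known = '';
  for (let pos = 0; pos < length; pos += 1) {
    for (const ch of HEX) {
      // Pad with a non-hex character so the padding itself never matches.
      const guess = (known + ch).padEnd(length, 'x');
      if (oracle.matchingPrefix(guess) > pos) {
        known += ch;
        break;
      }
    }
  }
  return known;
}

const target = crypto
  .createHmac('sha256', 'demo-secret') // placeholder key
  .update('{"type":"refund.created"}')
  .digest('hex');
const oracle = makeOracle(target);
const recovered = recoverSignature(oracle, target.length);

console.log('recovered matches target:', recovered === target);
console.log('oracle queries used:', oracle.queries);
```

The point of the sketch is the query count, not the attack itself: a leak of "how far did the comparison get" collapses an infeasible search into about a thousand guesses.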

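This platform had no rate limiting on the webhook endpoint. The team thought of it as an internal endpoint, after all, even though it had to be reachable from the internet for the gateway to call it. Who would attack it?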

Flaw #2 — The Body Serialization Mismatch

The second bug is subtler and, in this incident, was actually the more immediately exploitable of the two.

The handler uses JSON.stringify(req.body) to reconstruct the payload for HMAC computation. But req.body at that point is already a parsed JavaScript object; the raw bytes are gone. JSON.stringify re-serializes it, and the result is not guaranteed to match the original bytes. It strips insignificant whitespace, rewrites Unicode escapes and number formats (10.0 becomes 10), collapses duplicate keys, and moves integer-like keys ahead of the others. Any one of these changes produces a different HMAC from the one the payment gateway computed.

In this case, the gateway sent payloads with specific whitespace formatting. JSON.stringify(req.body) produced a different byte sequence, so the computed HMAC never matched the gateway's signature — not even for legitimate requests.
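
The mismatch is easy to reproduce in isolation. The sketch below uses a made-up payload and a placeholder secret; it is not the gateway's real format, just a body containing the kinds of details that do not survive a parse-and-stringify round trip.

```javascript
const crypto = require('crypto');

const secret = 'demo-secret'; // placeholder, not a real key

function sign(bytes) {
  return crypto.createHmac('sha256', secret).update(bytes).digest('hex');
}

// What a gateway might put on the wire: spaces, a trailing ".0", a Unicode
// escape, and integer-like keys that appear after the string keys.
const rawBody = '{ "order": "A-1", "amount": 10.0, "name": "\\u0041", "2": true, "1": false }';
const gatewaySignature = sign(rawBody);

// What the broken handler signed: the parsed object, re-serialized.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(reserialized);
// {"1":false,"2":true,"order":"A-1","amount":10,"name":"A"}

console.log('raw bytes verify:', sign(rawBody) === gatewaySignature);
console.log('re-serialized bytes verify:', sign(reserialized) === gatewaySignature);
```

The two strings describe the same data, but HMAC operates on bytes, so only the first one verifies.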

So what did the team do? They "fixed" it. They commented out the signature check.

// TODO: fix HMAC mismatch issue — skipping for now
// if (incoming !== expected) {
//   return res.status(401).send('Invalid signature');
// }

That comment was committed six months before the incident. It sat in production, forgotten, while the team moved on to other things. The webhook endpoint was now completely unauthenticated — anyone who could reach it and knew the expected JSON structure could trigger refunds.

The Attack

The attacker's technique was simple. They had a legitimate account on the platform, made a real purchase, and were issued a refund for a returned item. Webhooks travel server to server, so the request itself never appeared in their browser, but the gateway's payload structure was partially visible in error messages the platform was leaking (a separate issue). From that they crafted their own POST requests mimicking the refund event structure, targeting order IDs belonging to other customers.

Because the signature check was commented out, every forged request sailed straight through to processPaymentEvent(), which updated order statuses in the database without any secondary validation against the payment gateway's actual transaction records.

The refunds never reached a real payment account; the platform's bookkeeping simply recorded the orders as refunded. Customers got "refunded" status emails. No money moved, but inventory was restocked and the accounting records were corrupted. Unraveling it took three days.

The Fixes

The team's post-mortem resulted in three concrete changes, each addressing a distinct layer of the failure.

Fix 1: Sign the Raw Body, Not the Parsed Object

The body serialization mismatch was fixed by capturing the raw request bytes before Express parses them:

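const crypto = require('crypto');
const express = require('express');

const app = express();

app.post(
  '/webhooks/payment',
  // express.raw() leaves req.body as a Buffer of the exact bytes received
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    const rawBody = req.body;
    const incoming = req.headers['x-signature'];

    // Reject requests with no signature header or an unparsed body
    if (typeof incoming !== 'string' || !Buffer.isBuffer(rawBody)) {
      return res.status(401).send('Invalid signature');
    }

    const expected = crypto
      .createHmac('sha256', process.env.WEBHOOK_SECRET)
      .update(rawBody)
      .digest('hex');

    // timingSafeEqual here is the helper defined under Fix 2
    if (!timingSafeEqual(incoming, expected)) {
      return res.status(401).send('Invalid signature');
    }

    const payload = JSON.parse(rawBody.toString('utf8'));
    await processPaymentEvent(payload);
    res.sendStatus(200);
  }
);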

Using express.raw() instead of express.json() means the raw bytes the gateway signed are exactly what you feed into createHmac().update(). No serialization round-trip, no whitespace normalization, no surprises.

Fix 2: Constant-Time Comparison

The timing-unsafe !== was replaced with a proper constant-time comparison. Node's crypto module provides timingSafeEqual, but it only accepts Buffers (or typed arrays) of equal length and throws if the lengths differ, so the wrapper has to check the length first:

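function timingSafeEqual(a, b) {
  // A missing header arrives as undefined; Buffer.from would throw on it
  if (typeof a !== 'string' || typeof b !== 'string') {
    return false;
  }

  // Note: Buffer.from(..., 'hex') stops decoding at the first non-hex character
  const bufA = Buffer.from(a, 'hex');
  const bufB = Buffer.from(b, 'hex');

  if (bufA.length !== bufB.length) {
    // crypto.timingSafeEqual throws on unequal lengths, so bail out here.
    // The dummy comparison keeps the two code paths similar in cost.
    crypto.timingSafeEqual(bufA, bufA);
    return false;
  }

  return crypto.timingSafeEqual(bufA, bufB);
}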

The dummy comparison when lengths differ is belt and braces rather than a necessity. The length of an HMAC-SHA256 digest is fixed and public, so an early return on a length mismatch tells an attacker nothing about the secret. What the length check must do is run before crypto.timingSafeEqual, which would otherwise throw on a malformed signature and turn a bad request into an unhandled error.
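
The first two fixes also combine into one small function that does not depend on Express, which makes it easy to cover with the kind of known-payload test that would have caught the original bug. This is a sketch under my own assumptions: the name verifyWebhookSignature, the strict 64-hex-character format check, and the demo secret are not from the team's code.

```javascript
const crypto = require('crypto');

// Returns true only if `signatureHeader` is the hex HMAC-SHA256 of `rawBody`.
// `rawBody` must be the exact bytes received (a Buffer or string), never a
// re-serialized object.
function verifyWebhookSignature(rawBody, signatureHeader, secret) {
  // Reject a missing or malformed header up front. The length and format of
  // an HMAC-SHA256 hex digest are public, so this reveals nothing secret.
  if (typeof signatureHeader !== 'string' || !/^[0-9a-f]{64}$/i.test(signatureHeader)) {
    return false;
  }
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, 'hex');
  // Both buffers are exactly 32 bytes here, so timingSafeEqual cannot throw.
  return crypto.timingSafeEqual(expected, received);
}

// Known-payload checks, suitable for a unit test.
const secret = 'demo-secret'; // placeholder value
const body = Buffer.from('{ "type": "refund.created" }', 'utf8');
const goodSignature = crypto.createHmac('sha256', secret).update(body).digest('hex');
const tampered = Buffer.from('{ "type": "refund.created", "x": 1 }', 'utf8');

console.log('valid signature accepted:', verifyWebhookSignature(body, goodSignature, secret));
console.log('tampered body rejected:', !verifyWebhookSignature(tampered, goodSignature, secret));
console.log('missing header rejected:', !verifyWebhookSignature(body, undefined, secret));
```

Comparing the decoded bytes rather than the hex strings also means an uppercase signature from the sender still verifies.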

Fix 3: Verify Against the Source of Truth

The deeper architectural problem was that processPaymentEvent() trusted the webhook payload completely. Even if signature verification passes, a webhook payload is an assertion — not proof. The fix was to add a verification call against the payment gateway's API before acting on any financial event:

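async function processPaymentEvent(payload) {
  if (payload.type === 'refund.created') {
    // Re-fetch the refund from the gateway API to confirm it exists.
    // gatewayClient stands in for the gateway's SDK; some SDKs throw on an
    // unknown ID instead of returning nothing, which also stops processing here.
    const refund = await gatewayClient.refunds.retrieve(payload.data.refund_id);
    if (!refund || refund.status !== 'succeeded') {
      console.warn('Refund event received but gateway shows no succeeded refund', payload);
      return;
    }
    // Apply the gateway's record, not the webhook's claims
    await applyRefund(refund);
  }
}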

This is defense in depth. Even if someone managed to forge a valid signature somehow, the webhook handler would still hit the gateway's API and find no matching refund. The authoritative record lives at the gateway, not in the webhook payload.

Why This Keeps Happening

I don't think the developers who wrote this code were careless. The body serialization bug is genuinely tricky — the fact that req.body is already parsed is easy to miss, especially if you're porting code from a language where body parsing is more explicit. When the HMAC check produced false negatives for legitimate requests, commenting it out while debugging was a reasonable short-term move. The failure was in not tracking that as a blocking issue and letting it sit for six months.

The timing attack vector is even easier to overlook. Most developers know to hash passwords and use constant-time comparisons for those, but the same reasoning applies to any secret comparison — API keys, webhook signatures, session tokens. The mental model of "HMAC is a hash, hashes are secure" doesn't automatically extend to "and I need constant-time comparison when checking the result."

A Checklist Worth Keeping

If you're implementing webhook signature verification — regardless of language or framework — run through these before shipping:

  • Sign raw bytes. Capture the request body as bytes before any parsing layer touches it. Feed those exact bytes into your HMAC. Never re-serialize.
  • Compare with constant-time equality. Use your language's built-in constant-time comparison. In Node: crypto.timingSafeEqual. In Python: hmac.compare_digest. In Go: hmac.Equal (built on subtle.ConstantTimeCompare). In Ruby on Rails: ActiveSupport::SecurityUtils.secure_compare.
  • Don't skip the check temporarily. If HMAC verification is failing for legitimate requests, fix the root cause before going to production. Add a test that sends a known payload and checks that verification passes.
  • Rate-limit the endpoint. Even with correct constant-time comparison, rate limiting closes the timing attack surface further and limits the blast radius of any other bugs.
  • Verify against the source of truth for financial events. Webhooks are notifications, not authoritative records. For anything involving money or permissions, re-fetch the relevant object from the provider's API.
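  • Add replay protection. Many gateways include a timestamp in the webhook payload or in a signed header. Check that it's within a reasonable window (5 minutes is common) and reject anything older. The timestamp only helps if it is covered by the signature; otherwise an attacker can simply rewrite it.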
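
As an illustration of the replay-protection item, here is a sketch that assumes the gateway sends a Unix timestamp in seconds alongside the signature and signs the timestamp, a dot, and the raw body together. Schemes differ between gateways, so treat the signed-string format, the parameter names, and the 300-second tolerance as assumptions to check against your provider's documentation.

```javascript
const crypto = require('crypto');

const TOLERANCE_SECONDS = 300; // 5 minutes; an assumed, commonly used window

// Hypothetical scheme: signature = hex(HMAC-SHA256(secret, timestamp + "." + rawBody))
function verifyTimestampedWebhook({ rawBody, timestamp, signature, secret, nowMs = Date.now() }) {
  if (typeof timestamp !== 'string' || !/^\d{1,12}$/.test(timestamp)) return false;
  if (typeof signature !== 'string' || !/^[0-9a-f]{64}$/i.test(signature)) return false;

  // Verify the signature first: the timestamp is only trustworthy because
  // it is part of the signed data.
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.`)
    .update(rawBody)
    .digest();
  if (!crypto.timingSafeEqual(expected, Buffer.from(signature, 'hex'))) return false;

  // Reject events outside the window (too old, or too far in the future).
  const ageSeconds = Math.abs(nowMs / 1000 - Number(timestamp));
  return ageSeconds <= TOLERANCE_SECONDS;
}

// Usage with fixed clock values so the result is deterministic.
const secret = 'demo-secret'; // placeholder value
const rawBody = Buffer.from('{"type":"refund.created"}', 'utf8');
const timestamp = '1700000000';
const signature = crypto
  .createHmac('sha256', secret)
  .update(`${timestamp}.`)
  .update(rawBody)
  .digest('hex');

const fresh = verifyTimestampedWebhook({
  rawBody, timestamp, signature, secret, nowMs: 1700000060 * 1000, // 60 s later
});
const replayed = verifyTimestampedWebhook({
  rawBody, timestamp, signature, secret, nowMs: 1700003600 * 1000, // 1 hour later
});
const retimed = verifyTimestampedWebhook({
  rawBody, timestamp: '1700003600', signature, secret, nowMs: 1700003600 * 1000,
});

console.log({ fresh, replayed, retimed });
```

A timestamp window only bounds how long a captured request stays replayable. If the gateway also supplies a unique event ID, recording processed IDs closes the remaining gap inside the window.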

The incident in this case study cost the team three days of engineering time to untangle corrupted records, plus the embarrassment of explaining to customers why their order statuses were wrong. The fixes took an afternoon. That asymmetry — a few hours of doing it right versus days of cleaning up — is the real argument for getting webhook security right the first time.