Drift forensics

When a business rule breaks, every API-monitoring tool on the market tells you something is wrong. Only Stoney tells you which change caused it, who wrote the change, and which ticket the change now contradicts.

That’s drift forensics. This page explains what drift is in Stoney’s model, what the forensic report contains, and how to operationalize it without setting up a triage queue.


What drift means here

A drift violation is recorded whenever an approved rule’s expected behavior stops being true.

Two detection paths:

  • Active drift. Your existing CI pipeline runs against a preview deployment, and a contract Stoney’s registry expects to hold no longer does. The failure is captured as a drift violation and auto-resolves when a later run passes.
  • Shape drift. A passively-monitored route’s response shape changes in a way that breaks the baseline Stoney recorded when the rule was approved — a field removed, a type changed, a required field gone optional.

Both paths produce the same artifact: a drift row tied to the rule that broke, a forensic report attached to it, and (if owners and escalation channels are configured) a Slack DM or Jira comment to the rule’s owner. See Ownership and escalation for how that last step works.
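
As a rough illustration of the shape-drift check, here is a minimal TypeScript sketch. `ResponseShape`, `ShapeBreak`, and `findShapeBreaks` are hypothetical names chosen for this example, not Stoney’s API. The sketch assumes a flat response and treats newly added fields as non-breaking, which is an assumption rather than documented behavior.

```typescript
// Hypothetical shape model: field name -> declared type and whether it is required.
type FieldSpec = { type: string; required: boolean };
type ResponseShape = Record<string, FieldSpec>;

type ShapeBreak =
  | { kind: "field_removed"; field: string }
  | { kind: "type_changed"; field: string; from: string; to: string }
  | { kind: "required_became_optional"; field: string };

// Compare the baseline recorded at approval time with the shape observed now.
// Only the three breaking changes named above are reported; fields that exist
// only in `current` are ignored (an assumption of this sketch).
function findShapeBreaks(baseline: ResponseShape, current: ResponseShape): ShapeBreak[] {
  const breaks: ShapeBreak[] = [];
  for (const [field, expected] of Object.entries(baseline)) {
    const actual = current[field];
    if (actual === undefined) {
      breaks.push({ kind: "field_removed", field });
      continue;
    }
    if (actual.type !== expected.type) {
      breaks.push({ kind: "type_changed", field, from: expected.type, to: actual.type });
    }
    if (expected.required && !actual.required) {
      breaks.push({ kind: "required_became_optional", field });
    }
  }
  return breaks;
}

const baseline: ResponseShape = {
  id: { type: "string", required: true },
  total: { type: "number", required: true },
  currency: { type: "string", required: true },
};

const current: ResponseShape = {
  id: { type: "string", required: true },
  total: { type: "string", required: false }, // type changed and no longer required
  note: { type: "string", required: false },  // new field, not a break in this sketch
};

console.log(findShapeBreaks(baseline, current));
```

An empty result would mean the observed shape still satisfies the baseline; any entry would correspond to a shape-drift violation on the rule that owns the baseline.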


What the forensic report contains

The forensic report is the part that makes drift triage tractable. Every drift row carries a structured report with five things:

┌─ Forensic report ────────────────────────────────────────┐
│                                                          │
│  Rule broken:   Orders must have positive totals         │
│  Confidence:    HIGH                                     │
│                                                          │
│  Culprit PR:    #247 by @alice                           │
│                 "Allow negative totals for refunds"      │
│                 Merged 3h before drift was detected      │
│                                                          │
│  Contradicted:  SALES-412 — Order total validation       │
│                 (status: Done)                           │
│                                                          │
│  Diff excerpt:  app/api/orders/route.ts:47               │
│                 - if (body.total <= 0) return badReq…    │
│                 + // validation removed — see SALES-…    │
│                                                          │
│  Reasoning:     PR #247 by @alice merged 3h before       │
│                 drift was detected. It modified          │
│                 orders.ts, which enforces the rule       │
│                 "Orders must have positive totals."      │
│                 That rule was tied to ticket SALES-412   │
│                 — "Order total validation" (Done).       │
│                                                          │
└──────────────────────────────────────────────────────────┘

Each report names:

  1. The rule that broke, with its plain-English statement.
  2. The PR most likely to have caused the break, by author, title, and time relative to the drift.
  3. The Jira ticket the change now contradicts, when one is connected.
  4. The exact diff line that introduced the problem, so the engineer who reads the report doesn’t have to hunt.
  5. A natural-language explanation that ties all of the above together in one paragraph.

That last item does a surprising amount of work. The on-call engineer doesn’t have to assemble the story themselves; the report spells it out.
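
To make the structure concrete, here is one way the report could be modeled in TypeScript. The field names and the `summarize` helper are assumptions made for illustration; the page describes what the report contains, not its actual schema. The sample values are copied from the report shown above, including the truncated diff lines.

```typescript
// Hypothetical schema: field names are illustrative, not Stoney's real format.
type Confidence = "HIGH" | "MEDIUM" | "LOW";

interface ForensicReport {
  rule: string; // plain-English statement of the rule that broke
  confidence: Confidence;
  culpritPr?: { number: number; author: string; title: string; hoursBeforeDrift: number };
  contradictedTicket?: { key: string; summary: string; status: string };
  diffExcerpt?: { path: string; line: number; lines: string[] };
  reasoning: string; // natural-language explanation tying the rest together
  gaps: string[];    // what the resolver could not find, if anything
}

// Build a compact one-line summary from whichever parts are present.
function summarize(report: ForensicReport): string {
  const parts: string[] = [];
  if (report.culpritPr) {
    parts.push(`PR #${report.culpritPr.number} by @${report.culpritPr.author}`);
  } else {
    parts.push("no culprit PR identified");
  }
  if (report.contradictedTicket) {
    parts.push(`contradicts ${report.contradictedTicket.key}`);
  }
  return `[${report.confidence}] ${report.rule}: ${parts.join(", ")}`;
}

const exampleReport: ForensicReport = {
  rule: "Orders must have positive totals",
  confidence: "HIGH",
  culpritPr: {
    number: 247,
    author: "alice",
    title: "Allow negative totals for refunds",
    hoursBeforeDrift: 3,
  },
  contradictedTicket: { key: "SALES-412", summary: "Order total validation", status: "Done" },
  diffExcerpt: {
    path: "app/api/orders/route.ts",
    line: 47,
    // Lines are truncated in the source report and kept that way here.
    lines: ["- if (body.total <= 0) return badReq…", "+ // validation removed — see SALES-…"],
  },
  reasoning: "PR #247 by @alice merged 3h before drift was detected.",
  gaps: [],
};

console.log(summarize(exampleReport));
```

Keeping the PR, ticket, and diff fields optional lets the same type represent a partial report, with the missing pieces explained in `gaps`.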


Confidence levels — what they mean to you

The forensic resolver doesn’t pretend to be certain when it isn’t. Each report carries a confidence level so the engineer reading it knows how much to trust it:

| Level | When the report is at this confidence | What to do |
| --- | --- | --- |
| High | The rule matched cleanly, a PR touched the exact method, an authorizing ticket is linked, and the PR merged within 24h of the drift. | Trust the attribution. Fix the rule violation or revert the PR. |
| Medium | Most of the above but missing one signal, usually a ticket link or per-method match. | Verify in the PR diff before acting. The report is usually right but worth a 30-second double-check. |
| Low | We found a PR that touched the file but couldn’t tie it specifically to the rule. | Treat as a starting point, not an answer. Use the “recent PRs” list as a triage shortlist. |

Stoney’s job is to drastically shrink the investigation surface, not to replace code review. A High-confidence report typically saves an engineer 30 to 90 minutes of git-blame and Jira-spelunking; a Low-confidence report still narrows the suspect set from “all of the last week’s PRs” to “these three.”
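
The three levels can be read as a function of four signals. The sketch below is one plausible reading, not the resolver’s actual logic: `AttributionSignals` and `classifyConfidence` are hypothetical names, and simply counting signals (all four for High, exactly three for Medium, anything less for Low) is an assumption; the real resolver may weight signals differently.

```typescript
// Hypothetical signal set, taken from the High row of the confidence table.
interface AttributionSignals {
  ruleMatched: boolean;      // the rule matched cleanly
  prTouchedMethod: boolean;  // a PR touched the exact method
  ticketLinked: boolean;     // an authorizing ticket is linked
  mergedWithin24h: boolean;  // the PR merged within 24h of the drift
}

type ConfidenceLevel = "High" | "Medium" | "Low";

// Assumption: confidence depends only on how many signals are present.
function classifyConfidence(signals: AttributionSignals): ConfidenceLevel {
  const present = [
    signals.ruleMatched,
    signals.prTouchedMethod,
    signals.ticketLinked,
    signals.mergedWithin24h,
  ].filter(Boolean).length;

  if (present === 4) return "High";
  if (present === 3) return "Medium"; // missing exactly one signal
  return "Low";
}

// A report with everything except a linked ticket lands at Medium.
console.log(
  classifyConfidence({
    ruleMatched: true,
    prTouchedMethod: true,
    ticketLinked: false,
    mergedWithin24h: true,
  }),
);
```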

When the resolver can’t produce a complete report, the gaps field explains honestly what it couldn’t find:

“No recent non-maintenance PRs found that touched this rule’s source files before the drift was detected.”

“No Jira ticket linked to this rule or the culprit PR.”

A partial report with gaps is still more useful than a blank alert. The UI shows the gaps rather than hiding them.
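
A small sketch of how gaps might be collected, assuming the resolver knows which pieces it failed to find. `ResolverFindings` and `collectGaps` are hypothetical names; the two messages are the ones quoted above.

```typescript
// Hypothetical view of what the resolver found for one drift.
interface ResolverFindings {
  culpritPrNumber?: number;
  ticketKey?: string;
}

// Gap messages quoted from the examples in this section.
const NO_PR_GAP =
  "No recent non-maintenance PRs found that touched this rule’s source files before the drift was detected.";
const NO_TICKET_GAP = "No Jira ticket linked to this rule or the culprit PR.";

// Return one message per missing piece; an empty list means a complete report.
function collectGaps(findings: ResolverFindings): string[] {
  const gaps: string[] = [];
  if (findings.culpritPrNumber === undefined) gaps.push(NO_PR_GAP);
  if (findings.ticketKey === undefined) gaps.push(NO_TICKET_GAP);
  return gaps;
}

// A culprit PR was found, but no ticket: one gap is reported.
console.log(collectGaps({ culpritPrNumber: 247 }));
```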


Where forensic reports show up

You won’t need to remember to open Stoney to see them. Reports appear in three places:

On the rule detail page. Every drifting rule has an expandable Forensics section under its Drift tab. The first person to open it generates the report; everyone after sees the cached version.

In the Slack DM the rule owner receives. When ownership is wired up, the owner gets a compact version of the forensic report — PR link, ticket, one-sentence reasoning — in their DM. They can act on it without opening the dashboard at all.

In the SOC 2 evidence export. Each drift event in the compliance bundle includes the same attribution snippet. Auditors love this format because it’s the paper trail they already expect for CC7.1.


What auto-resolution does for you

When a previously-failing contract starts passing again — usually because the PR was reverted or a fix landed — Stoney automatically marks the drift as resolved and records the time-to-resolve. You don’t have to remember to close the loop.

This means two things:

  1. Your unresolved-drift counter actually reflects reality. It decays without manual intervention.
  2. Mean-time-to-resolve (MTTR) becomes a number you can show your auditor or your customers. It’s computed directly from each drift’s open and resolve timestamps.
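
Here is a minimal sketch of both ideas, assuming a drift row stores an open timestamp and an optional resolve timestamp. `DriftRow`, `applyPassingRun`, and `meanTimeToResolveHours` are hypothetical names for this example, and excluding still-open drifts from the average is an assumption about how MTTR is defined.

```typescript
// Hypothetical drift row; only the fields needed for this sketch.
interface DriftRow {
  ruleId: string;
  openedAt: Date;
  resolvedAt?: Date;
  resolution?: "auto" | "manual";
}

// When a later run passes, close the drift and record when it was resolved.
// Already-resolved drifts are returned unchanged.
function applyPassingRun(drift: DriftRow, passedAt: Date): DriftRow {
  if (drift.resolvedAt !== undefined) return drift;
  return { ...drift, resolvedAt: passedAt, resolution: "auto" };
}

// Mean time to resolve in hours, over resolved drifts only.
// Returns null when nothing has been resolved yet.
function meanTimeToResolveHours(drifts: DriftRow[]): number | null {
  let totalMs = 0;
  let resolved = 0;
  for (const d of drifts) {
    if (d.resolvedAt === undefined) continue; // still open, excluded from MTTR
    totalMs += d.resolvedAt.getTime() - d.openedAt.getTime();
    resolved += 1;
  }
  return resolved === 0 ? null : totalMs / resolved / (60 * 60 * 1000);
}

const openedAt = new Date("2024-01-01T00:00:00Z");
const drifts: DriftRow[] = [
  // Resolved manually after 2 hours.
  { ruleId: "a", openedAt, resolvedAt: new Date("2024-01-01T02:00:00Z"), resolution: "manual" },
  // Auto-resolved by a passing run 4 hours after it opened.
  applyPassingRun({ ruleId: "b", openedAt }, new Date("2024-01-01T04:00:00Z")),
  // Still open.
  { ruleId: "c", openedAt },
];

console.log(meanTimeToResolveHours(drifts)); // 3
```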

You can also manually resolve a drift from the rule detail page if you’ve determined the failure was a false positive. Manual resolutions record who closed the drift and why, which feeds into the SOC 2 evidence export.


Re-running forensics

In the first week or two after onboarding, Jira links sometimes need cleanup, and a few forensic reports will land with the wrong authorizing ticket. The fix is fast: click Re-run forensics on the rule’s drift entry. The resolver pulls fresh data and writes a new report.

Re-runs are rate-limited to one per drift per hour so the system can’t be hammered, but the limit is generous enough that day-of-onboarding cleanup is not painful.
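
A limit of one re-run per drift per hour can be sketched as a per-drift timestamp check. `RerunLimiter` and `tryAcquire` are hypothetical names, and the in-memory map is an assumption made to keep the example self-contained; it says nothing about how Stoney stores this state.

```typescript
const RERUN_WINDOW_MS = 60 * 60 * 1000; // one hour

// Hypothetical in-memory limiter keyed by drift ID.
class RerunLimiter {
  private lastRun = new Map<string, number>();

  // Returns true and records the attempt if a re-run is allowed right now.
  tryAcquire(driftId: string, nowMs: number): boolean {
    const last = this.lastRun.get(driftId);
    if (last !== undefined && nowMs - last < RERUN_WINDOW_MS) {
      return false; // a re-run for this drift happened less than an hour ago
    }
    this.lastRun.set(driftId, nowMs);
    return true;
  }
}

const limiter = new RerunLimiter();
const t0 = Date.parse("2024-01-01T00:00:00Z");

console.log(limiter.tryAcquire("drift-1", t0));                  // true: first re-run
console.log(limiter.tryAcquire("drift-1", t0 + 30 * 60 * 1000)); // false: within the hour
console.log(limiter.tryAcquire("drift-2", t0 + 30 * 60 * 1000)); // true: a different drift
console.log(limiter.tryAcquire("drift-1", t0 + RERUN_WINDOW_MS)); // true: the hour has passed
```

Because the limit is tracked per drift, cleaning up several rules on the same day does not queue behind a single shared budget.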


Drift forensics is the killer demo

If you can only show one Stoney feature to a skeptical engineering leader, show them this one. The contrast between “contract test failed” (which is what their current tools say) and “PR #247 by @alice removed the validation, which contradicts SALES-412” (which is what Stoney says) tends to be the moment a buyer’s mental model of API monitoring shifts.

It’s also the feature that compounds the most over time. The longer Stoney has been watching your repo, the better the forensic attribution gets, because the rule’s history and the codebase’s history are more connected.


Next

Forensics tells you what happened. Ownership and escalation covers who hears about it, on what channel, and how to override.
