
Finding Triage

An autonomous pentester is only as valuable as its false-positive rate is low. pwnkit ships a triage pipeline between the research agent and the blind verify agent. Every finding passes through a stack of independent filters, each of which can kill, downgrade, or boost it. Most filters are deterministic, zero-cost, and run before any LLM verification token is spent.

Status (2026-04-11): The effect of this pipeline has now been measured end-to-end. See the FP Reduction Moat page for the numbers and the 2026-04-11 ablation results log for the narrative. Short version: the stack strictly dominates the no-triage baseline on XBOW black-box, is a Pareto tradeoff on XBOW white-box (costs 2 flags at limit=50 for 63% fewer findings), and is a no-op on npm-bench. Layer 11 (EGATS) is the one broken layer and is opt-in only — see pwnkit#116.

Each stage is independently configurable via environment variables and surfaced through packages/core/src/triage/.

Module: triage/holding-it-wrong.ts (always on)

Kills findings where the “vulnerability” is literally the documented behavior of the sink. Classic examples: reporting fs.writeFile as an arbitrary-file-write vuln, vm.compileFunction as code execution, or toFunction(cb) as callback injection. The filter downgrades the finding to info and skips downstream verification.
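
A minimal sketch of how such a filter could be structured, assuming a hand-maintained table of sinks and claims. The `SketchFinding` shape, the claim labels, and the `skipVerify` field are hypothetical and are not taken from `triage/holding-it-wrong.ts`.

```typescript
// Hypothetical finding shape for illustration; pwnkit's real type may differ.
interface SketchFinding {
  sink: string;      // e.g. "fs.writeFile"
  claim: string;     // e.g. "arbitrary-file-write"
  severity: "info" | "low" | "medium" | "high" | "critical";
  skipVerify?: boolean;
}

// Sinks whose documented behavior is exactly the reported "vulnerability".
// The claim labels are made up for this sketch.
const DOCUMENTED_BEHAVIOR = new Map<string, string[]>([
  ["fs.writeFile", ["arbitrary-file-write"]],
  ["vm.compileFunction", ["code-execution"]],
]);

function applyHoldingItWrong(finding: SketchFinding): SketchFinding {
  const documented = DOCUMENTED_BEHAVIOR.get(finding.sink) ?? [];
  if (!documented.includes(finding.claim)) return finding;
  // Downgrade to info and mark the finding so downstream verification is skipped.
  return { ...finding, severity: "info", skipVerify: true };
}
```

Findings that do not match the table pass through unchanged, so the filter can only remove noise, never add it.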

Module: triage/feature-extractor.ts (always available)

Extracts a 45-element numeric vector per finding: response shape (status, size, reflection, error markers), payload signals (encoding, sink class, parameter location), and category priors. Inspired by VulnBERT’s hybrid architecture — handcrafted features alone achieve ~77% recall / 16% FPR, and the same vector fuses cleanly with neural embeddings for downstream ML.

See FEATURE_NAMES in the module for the full ordered feature list.

For a complete reference, use Feature Extractor. For the labeled JSONL pipeline that carries this vector into model training, use Triage Dataset.
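
To make the idea of an ordered numeric vector concrete, here is a small sketch with five illustrative features. It is not the real 45-element vector: the `Observation` shape, the feature names, and the error-marker regex are assumptions chosen for the example.

```typescript
// Hypothetical observation of one probe; the real extractor reads richer data.
interface Observation {
  status: number;
  body: string;
  payload: string;
  paramLocation: "query" | "body" | "header" | "path";
}

// An ordered name list keeps vector indices stable for downstream consumers.
const SKETCH_FEATURE_NAMES = [
  "status_is_5xx",
  "body_length_log",
  "payload_reflected",
  "has_error_marker",
  "param_in_query",
] as const;

function extractSketchFeatures(o: Observation): number[] {
  const errorMarker = /syntax error|stack trace|exception/i;
  return [
    o.status >= 500 && o.status < 600 ? 1 : 0,
    Math.log1p(o.body.length),
    // An empty payload is trivially "contained" in any body, so guard against it.
    o.payload.length > 0 && o.body.includes(o.payload) ? 1 : 0,
    errorMarker.test(o.body) ? 1 : 0,
    o.paramLocation === "query" ? 1 : 0,
  ];
}
```

Keeping the names and the values in the same order is what lets a vector like this be concatenated with neural embeddings later without an index mismatch.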

Module: triage/oracles.ts (always on for supported categories)

Deterministic, category-specific verification oracles. No exploit, no report.

| Category | Oracle | Proof |
| --- | --- | --- |
| SQLi | verifySqli | SQL error signatures + timing delta under sleep payloads |
| Reflected XSS | verifyReflectedXss | Unique token reflected in an executable context |
| SSRF | verifySsrf | Out-of-band callback (spins a local listener on demand) |
| RCE | verifyRce | Command output round-trip through the response |
| Path traversal | verifyPathTraversal | /etc/passwd signature (or Windows equivalent) |
| IDOR | verifyIdor | Differential response across identities |

Call verifyOracleByCategory(finding, target) to dispatch by category.
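
Dispatch by category can be pictured as a lookup table of oracle functions. The sketch below is a simplified stand-in, not the actual `verifyOracleByCategory` signature: the category keys, the `dispatchOracle` name, and the two toy oracles (which inspect a response body that has already been fetched) are assumptions. In particular, the toy XSS oracle only checks reflection, not whether the context is executable.

```typescript
type OracleResult = { verified: boolean; proof?: string };
type Oracle = (responseBody: string, token: string) => OracleResult;

// Toy oracles: real ones would send requests and inspect live responses.
const reflectedToken: Oracle = (body, token) =>
  body.includes(token)
    ? { verified: true, proof: `token ${token} reflected` }
    : { verified: false };

const passwdSignature: Oracle = (body) =>
  /root:.*:0:0:/.test(body)
    ? { verified: true, proof: "/etc/passwd signature" }
    : { verified: false };

// Category keys are hypothetical labels for this sketch.
const ORACLES = new Map<string, Oracle>([
  ["xss-reflected", reflectedToken],
  ["path-traversal", passwdSignature],
]);

// Returns null when no oracle exists for the category.
function dispatchOracle(category: string, body: string, token: string): OracleResult | null {
  const oracle = ORACLES.get(category);
  return oracle ? oracle(body, token) : null;
}
```

Returning `null` for unsupported categories keeps "no oracle available" distinct from "oracle ran and found no proof".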

Module: triage/reachability.ts Flag: PWNKIT_FEATURE_REACHABILITY_GATE=1

When a source tree is available, walks imports, route mounts, and framework entry points to check whether the vulnerable sink is actually reachable from an HTTP handler, CLI main, or user-facing API. Dead code and test-only paths are suppressed before we spend LLM tokens verifying them.

This is a zero-dependency grep/pattern pass today and is deliberately conservative: when it cannot make a confident call it returns reachable: true with low confidence so later stages still get a chance. A tree-sitter-based interprocedural upgrade is planned.
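
The conservative fallback is the key design point, so here is a sketch of it over a pre-built import graph. Everything here is illustrative: the `checkReachable` name, the graph representation, and the confidence labels are assumptions, and the real module works from grep/pattern matches rather than a ready-made graph.

```typescript
interface Reachability {
  reachable: boolean;
  confidence: "low" | "high";
}

// imports: file -> files it imports. entryPoints: files with route mounts or a CLI main.
function checkReachable(
  imports: Map<string, string[]>,
  entryPoints: string[],
  sinkFile: string,
): Reachability {
  // If the sink file never appears in the graph we cannot make a confident call,
  // so report it as reachable with low confidence and let later stages decide.
  const known =
    imports.has(sinkFile) ||
    Array.from(imports.values()).some((deps) => deps.includes(sinkFile));
  if (!known || entryPoints.length === 0) return { reachable: true, confidence: "low" };

  // Breadth-first walk from the entry points along import edges.
  const seen = new Set<string>(entryPoints);
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const file = queue.shift()!;
    if (file === sinkFile) return { reachable: true, confidence: "high" };
    for (const dep of imports.get(file) ?? []) {
      if (!seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  return { reachable: false, confidence: "high" };
}
```

Only the final branch suppresses a finding; every uncertain case falls through to `reachable: true`.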

5. Multi-modal agreement (foxguard × pwnkit)

Module: triage/multi-modal.ts Flag: PWNKIT_FEATURE_MULTIMODAL=1

When both a source tree and the foxguard binary are available, pwnkit runs foxguard against the same code and cross-checks every finding against foxguard’s SARIF output.

  • Both scanners fire on the same file / category → auto-accepted with high confidence.
  • Only pwnkit fires, foxguard scanned the file cleanly → down-weighted or auto-rejected.
  • foxguard didn’t scan the file → no signal either way.
export PWNKIT_FEATURE_MULTIMODAL=1
pwnkit scan --target https://example.com --repo ./source

This is the opensoar-hq trinity validation pattern: pwnkit detects, foxguard cross-checks, opensoar responds.
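
The three agreement rules above reduce to a small decision function. This sketch assumes foxguard's SARIF output has already been parsed into a list of hits and a set of scanned files; the `Hit` shape, the `crossCheck` name, and the verdict labels are hypothetical. How to treat a foxguard hit in a different category on the same file is not specified above, so the sketch treats it as no signal.

```typescript
type Verdict = "accept" | "downweight" | "no_signal";

interface Hit {
  file: string;
  category: string;
}

function crossCheck(finding: Hit, foxguardHits: Hit[], scannedFiles: Set<string>): Verdict {
  // foxguard never looked at this file: no evidence either way.
  if (!scannedFiles.has(finding.file)) return "no_signal";

  const sameFile = foxguardHits.filter((h) => h.file === finding.file);
  // Both scanners fire on the same file and category.
  if (sameFile.some((h) => h.category === finding.category)) return "accept";

  // Assumption: a hit in a different category is neither agreement nor a clean scan.
  return sameFile.length === 0 ? "downweight" : "no_signal";
}
```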

Module: triage/pov-gate.ts Flag: PWNKIT_FEATURE_POV_GATE=1

Backed by the empirical ground truth from All You Need Is A Fuzzing Brain (arXiv:2509.07225): if an agent can’t build a working PoC in N turns, the finding is almost certainly a false positive.

Spins up a narrowly-scoped mini agent loop whose only job is to produce a concrete, executable exploit that demonstrably works. No speculation, no “would-be” payloads — the exploit must run and the response must contain category-specific proof of exploitation.

  • hasPov: true → boost confidence, attach the artifact to finding.evidence.
  • hasPov: false → downgrade severity to info and set triageNote = "no_pov".
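
The two outcomes above can be sketched as a pure function over the finding. The `PovFinding` shape, the representation of `evidence` as a string array, and the size of the confidence boost are assumptions made for illustration; only the `hasPov` flag, the `info` downgrade, and the `"no_pov"` note come from the description above.

```typescript
// Hypothetical shapes; the real finding type likely carries more fields.
interface PovFinding {
  severity: string;
  confidence: number; // 0..1
  evidence: string[];
  triageNote?: string;
}

interface PovResult {
  hasPov: boolean;
  artifact?: string; // e.g. the exploit script or captured response
}

function applyPovResult(finding: PovFinding, pov: PovResult): PovFinding {
  if (pov.hasPov) {
    return {
      ...finding,
      // The +0.3 boost is an arbitrary value chosen for this sketch.
      confidence: Math.min(1, finding.confidence + 0.3),
      evidence: pov.artifact ? [...finding.evidence, pov.artifact] : finding.evidence,
    };
  }
  return { ...finding, severity: "info", triageNote: "no_pov" };
}
```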

Module: triage/verify-pipeline.ts (default when a runtime is available)

Inspired by GitHub Security Lab’s taskflow-agent approach, the single-shot blind verify is decomposed into four focused subtasks, each with domain-specific prompts and category-specific addendums:

  1. Reachability analysis — can the vuln be triggered from external input?
  2. Payload validation — does the PoC actually demonstrate the claim?
  3. Impact assessment — what is the real-world security impact?
  4. Exploit confirmation — independently reproduce with only the PoC and the target path.

Any step failure marks the finding as a false positive.
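
A sketch of the control flow, with each subtask reduced to a synchronous stand-in function (real steps would be asynchronous LLM calls). The `VerifyStep` and `runVerifySteps` names are hypothetical, and stopping at the first failed step is an assumption; the text above only says that any failure marks the finding as a false positive.

```typescript
type StepName = "reachability" | "payload" | "impact" | "confirmation";

interface VerifyStep {
  name: StepName;
  // Stand-in for one focused subtask; returns true when the step passes.
  run: (findingId: string) => boolean;
}

interface PipelineVerdict {
  falsePositive: boolean;
  failedAt?: StepName;
}

function runVerifySteps(findingId: string, steps: VerifyStep[]): PipelineVerdict {
  for (const step of steps) {
    if (!step.run(findingId)) {
      // First failure ends the run; later steps never execute.
      return { falsePositive: true, failedAt: step.name };
    }
  }
  return { falsePositive: false };
}
```

Recording `failedAt` makes it possible to see which subtask rejects the most findings when tuning prompts.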

Flag: PWNKIT_FEATURE_CONSENSUS_VERIFY=1

Runs the structured verify pipeline N times (different sampling seeds) and takes the majority vote. Trades tokens for confidence — useful on ambiguous findings where a single verify pass is noisy.
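
Majority voting over N runs can be sketched as follows. `verifyOnce` is a stand-in for one structured verify run with a given seed, and the requirement that N be odd (to rule out ties) is an assumption of this sketch, not documented pwnkit behavior.

```typescript
interface ConsensusResult {
  confirmed: boolean;
  votes: number; // number of runs that confirmed the finding
}

function consensusVerify(verifyOnce: (seed: number) => boolean, runs: number): ConsensusResult {
  if (!Number.isInteger(runs) || runs < 1 || runs % 2 === 0) {
    throw new Error("runs must be a positive odd integer");
  }
  let votes = 0;
  for (let seed = 0; seed < runs; seed++) {
    if (verifyOnce(seed)) votes++;
  }
  // Strict majority: more than half of the runs must confirm.
  return { confirmed: votes * 2 > runs, votes };
}
```

Token cost grows linearly with N, which is why it makes sense to reserve this for findings where a single pass is ambiguous.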

Module: triage/memories.ts CLI: pwnkit-cli triage ... Flag: PWNKIT_FEATURE_TRIAGE_MEMORIES=1

Semgrep-style per-target persistent FP context that learns from human triage decisions. When a user marks a finding as a false positive (and says why), the reason is stored as a TriageMemory. On future scans the memories are injected as few-shot examples into the verify prompt, and a sufficiently strong match auto-rejects the finding without spending a verify call.

Scope hierarchy:

  • global — applies to every scan.
  • package — applies to findings whose target starts with a given package identifier (npm name, repo prefix).
  • target — applies only to an exact target URL or path.

Relevance is currently a lightweight token-overlap heuristic; an embedding-backed ranker can replace scoreMemory without touching the public API.
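
As an illustration of scope matching plus a token-overlap score, here is a sketch using Jaccard similarity. It is not the actual `scoreMemory` implementation: the `TriageMemorySketch` shape, the tokenizer, and the choice of Jaccard are assumptions.

```typescript
interface TriageMemorySketch {
  scope: "global" | "package" | "target";
  scopeValue?: string;
  reason: string;
}

function inScope(m: TriageMemorySketch, target: string): boolean {
  if (m.scope === "global") return true;
  if (m.scopeValue === undefined) return false;
  // package: prefix match; target: exact match.
  return m.scope === "package" ? target.startsWith(m.scopeValue) : target === m.scopeValue;
}

function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard overlap between the memory's reason and the finding description.
// Out-of-scope memories always score 0.
function scoreMemorySketch(m: TriageMemorySketch, target: string, description: string): number {
  if (!inScope(m, target)) return 0;
  const a = tokenize(m.reason);
  const b = tokenize(description);
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Because the scorer is a single function from memory and finding to a number, an embedding-based ranker could be swapped in behind the same signature.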

# Mark a finding as a false positive and remember why
pwnkit-cli triage mark-fp <finding-id> --reason "test fixture, not prod"
# Add a memory with an explicit scope (here: package-level)
pwnkit-cli triage memory add --finding <id> --reason "sink is harmless helper" \
--scope package --scope-value my-pkg
# List memories
pwnkit-cli triage memory list --scope target

Module: triage/adversarial.ts Flag: PWNKIT_FEATURE_DEBATE=1

Two fresh-context agents argue opposing positions — a prosecutor makes the case that the finding is real, a defender makes the case that it is a false positive — and a third, deliberately skeptical judge picks the winner. Each agent sees only the other side’s written arguments, never the original research agent’s chain of thought.

This is an open-source implementation of the debate protocol from Anthropic’s debate paper (arXiv:2402.06782). The point is error decorrelation: single-pass verify shares priors with the discovery agent (same model, same prompt family), so their mistakes line up. Adversarial agents with opposing instructions have less correlated error modes and can catch cases that a single verifier misses.
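
The orchestration can be sketched with each agent reduced to a function from prompt to text. The `runDebate` name, the prompt wording, the round count, and the rule that anything other than an explicit `REAL` ruling counts as a false positive are all assumptions of this sketch; real agents would be asynchronous model calls with fresh contexts.

```typescript
type Agent = (prompt: string) => string;

interface DebateOutcome {
  verdict: "real" | "false_positive";
  transcript: string[];
}

function runDebate(
  summary: string,
  prosecutor: Agent,
  defender: Agent,
  judge: Agent,
  rounds = 2,
): DebateOutcome {
  const transcript: string[] = [];
  let lastDefense = "";
  for (let i = 0; i < rounds; i++) {
    // Each side sees the finding summary and the opponent's last written argument only.
    const prosecution = prosecutor(`${summary}\nOpponent said: ${lastDefense}`);
    transcript.push(`PROSECUTOR: ${prosecution}`);
    lastDefense = defender(`${summary}\nOpponent said: ${prosecution}`);
    transcript.push(`DEFENDER: ${lastDefense}`);
  }
  const ruling = judge(`Be skeptical. Answer REAL or FP.\n${transcript.join("\n")}`);
  // Skeptical default: only an explicit REAL ruling keeps the finding alive.
  const isReal = ruling.trim().toUpperCase().startsWith("REAL");
  return { verdict: isReal ? "real" : "false_positive", transcript };
}
```

Nothing from the research agent's reasoning is passed in, only the finding summary, which is what keeps the debaters' contexts independent of the discovery run.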

11. EGATS: Evidence-Gated Attack Tree Search

Flag: --egats or PWNKIT_FEATURE_EGATS=1

Beam-search over an explicit hypothesis tree. The agent proposes attack branches, each with required evidence, and only expands branches where prior evidence is observed. Dead hypotheses are pruned aggressively, which keeps the budget focused on exploitable paths.

EGATS is the highest-variance stage in the pipeline — use it when you need breadth (e.g. unknown-class vulnerabilities) rather than depth on a known lead.
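
For intuition, here is a sketch of evidence-gated beam search over a static hypothesis tree. It simplifies heavily: the `Hypothesis` shape, the numeric scores, and the fixed `observed` set are assumptions, whereas in a live run the agent would propose branches and gather evidence as it goes.

```typescript
interface Hypothesis {
  name: string;
  requiredEvidence: string[]; // must all be observed before the branch is expanded
  score: number;              // prior plausibility, higher is better
  children: Hypothesis[];
}

// Returns the names of the hypotheses that were expanded, in order.
function egatsSketch(
  root: Hypothesis,
  observed: Set<string>,
  beamWidth: number,
  maxDepth: number,
): string[] {
  const explored: string[] = [];
  let frontier: Hypothesis[] = [root];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    // Gate: prune branches whose required evidence has not been observed.
    const live = frontier.filter((h) => h.requiredEvidence.every((e) => observed.has(e)));
    // Beam: keep only the top-scoring survivors at this depth.
    const beam = live.sort((a, b) => b.score - a.score).slice(0, beamWidth);
    for (const h of beam) explored.push(h.name);
    frontier = beam.flatMap((h) => h.children);
  }
  return explored;
}
```

A narrow beam concentrates budget on one line of attack, while a wider beam gives the breadth described above at a higher cost.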

| Env var | Default | Stage |
| --- | --- | --- |
| PWNKIT_FEATURE_REACHABILITY_GATE | off | 4 |
| PWNKIT_FEATURE_MULTIMODAL | off | 5 |
| PWNKIT_FEATURE_POV_GATE | off | 6 |
| PWNKIT_FEATURE_CONSENSUS_VERIFY | off | 8 |
| PWNKIT_FEATURE_TRIAGE_MEMORIES | off | 9 |
| PWNKIT_FEATURE_DEBATE | off | 10 |
| PWNKIT_FEATURE_EGATS | off | 11 |
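
The table above maps directly onto a small lookup. The sketch below resolves which opt-in stages are enabled from an environment-like object; the `enabledStages` helper is hypothetical, and treating only the literal value `"1"` as "on" is an assumption based on the examples on this page.

```typescript
// Stage numbers taken from the table above.
const FLAG_STAGES = {
  PWNKIT_FEATURE_REACHABILITY_GATE: 4,
  PWNKIT_FEATURE_MULTIMODAL: 5,
  PWNKIT_FEATURE_POV_GATE: 6,
  PWNKIT_FEATURE_CONSENSUS_VERIFY: 8,
  PWNKIT_FEATURE_TRIAGE_MEMORIES: 9,
  PWNKIT_FEATURE_DEBATE: 10,
  PWNKIT_FEATURE_EGATS: 11,
} as const;

type FlagName = keyof typeof FLAG_STAGES;

// Assumption: only the literal value "1" enables a flag; anything else is off.
function enabledStages(env: Record<string, string | undefined>): number[] {
  return (Object.keys(FLAG_STAGES) as FlagName[])
    .filter((name) => env[name] === "1")
    .map((name) => FLAG_STAGES[name])
    .sort((a, b) => a - b);
}
```

In Node.js, calling `enabledStages(process.env)` would list the opt-in stages active for the current shell.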

See Features for the complete env-var inventory.