
Architecture

pwnkit is a fully autonomous agentic pentesting framework that covers AI/LLM apps, web applications, package ecosystems, and source code. It runs autonomous AI agents in a plan-discover-attack-verify-report pipeline. For web pentesting, the agent takes a shell-first approach: its primary tool is bash (running curl, python3, and other command-line tools) rather than structured APIs. For LLM and code targets, the agent uses specialized tools (send_prompt, read_file). Blind verification kills false positives: every finding is independently re-exploited by a second agent that never sees the original reasoning.

The core pipeline has five stages:

Plan -> Discover -> Attack -> Verify -> Report

These stages are grouped into two agent sessions:

1. Research agent (Plan + Discover + Attack + PoC)


A single agent session that:

  1. Plans the engagement — estimates target difficulty, identifies likely vulnerability classes, and prioritizes attack vectors. Research into top pentesting agents (KinoSec at 92.3%, XBOW at 85%, MAPTA at 76.9%) shows that planning before execution is a shared trait of high-performing agents. The plan is injected into the system prompt so the agent starts with a strategy rather than fumbling through discovery.
  2. Discovers the attack surface — maps endpoints, detects models, identifies features, fingerprints web technologies, and enumerates exposed paths
  3. Attacks the target — crafts multi-turn attacks spanning prompt injection, jailbreaks, tool poisoning, data exfiltration (LLM), CORS misconfiguration, SSRF, XSS, path traversal, header injection (web), supply chain and malicious code analysis (npm), and vulnerability patterns (source code)
  4. Writes PoC code — produces a proof-of-concept that demonstrates each vulnerability

Challenge hints. When available, challenge descriptions are passed to the agent as context. This is standard practice — XBOW provides challenge descriptions to all agents in their benchmark. It is not benchmark-specific tuning; it is how a real pentester would receive a scope document.
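The planning step and the challenge hint both become context for the research agent. The sketch below shows one way to fold a plan and an optional scope or challenge description into the system prompt. `EngagementPlan` and `buildSystemPrompt` are hypothetical names used for illustration, not pwnkit's actual API.

```typescript
// Hypothetical shape of the planning output described above.
interface EngagementPlan {
  difficulty: "easy" | "medium" | "hard";
  likelyVulnClasses: string[];
  prioritizedVectors: string[];
}

// Hypothetical helper: appends the plan, and an optional challenge
// description (treated like a scope document), to a base system prompt.
function buildSystemPrompt(base: string, plan: EngagementPlan, challengeHint?: string): string {
  const lines: string[] = [
    base,
    "",
    "## Engagement plan",
    `Estimated difficulty: ${plan.difficulty}`,
    `Likely vulnerability classes: ${plan.likelyVulnClasses.join(", ")}`,
    "Prioritized attack vectors:",
    // Number the vectors so the agent can refer to them in order.
    ...plan.prioritizedVectors.map((vector, i) => `${i + 1}. ${vector}`),
  ];
  if (challengeHint !== undefined && challengeHint.trim() !== "") {
    lines.push("", "## Scope / challenge description", challengeHint.trim());
  }
  return lines.join("\n");
}
```

Because the plan sits in the system prompt instead of a user turn, it persists across the whole session. The agent starts from a strategy and does not have to rediscover it.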

The research agent’s tool set depends on the target type:

  • Web targets: bash (the primary tool; it runs curl, python3, sqlmap, or any other shell command), browser (Playwright-based headless browser for XSS testing and JavaScript-rendered pages), save_finding, done. The structured tools (crawl_page, submit_form, http_request) are available but optional; benchmarking showed the agent performs better with just shell access.
  • LLM targets: send_prompt (talk to AI/LLM apps), bash, save_finding, done.
  • Source/npm targets: read_file, search_code, list_files, run_command, save_finding.
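The per-target tool sets above can be expressed as a simple lookup table. This is a minimal sketch: the tool names come from the list above, but `toolsFor` and its `structuredWebTools` option are hypothetical.

```typescript
// Target types as described in the list above.
type TargetType = "web" | "llm" | "source" | "npm";

// Default tool set per target type, transcribed from the documentation.
const DEFAULT_TOOLS: Record<TargetType, readonly string[]> = {
  web: ["bash", "browser", "save_finding", "done"],
  llm: ["send_prompt", "bash", "save_finding", "done"],
  source: ["read_file", "search_code", "list_files", "run_command", "save_finding"],
  npm: ["read_file", "search_code", "list_files", "run_command", "save_finding"],
};

// Structured web tools: available, but off by default because
// benchmarking favoured plain shell access.
const OPTIONAL_WEB_TOOLS: readonly string[] = ["crawl_page", "submit_form", "http_request"];

// Hypothetical selector; the option name is illustrative only.
function toolsFor(target: TargetType, opts: { structuredWebTools?: boolean } = {}): string[] {
  const tools = [...DEFAULT_TOOLS[target]];
  if (target === "web" && opts.structuredWebTools === true) {
    tools.push(...OPTIONAL_WEB_TOOLS);
  }
  return tools;
}
```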

The agent adapts its strategy based on what it discovers — if a naive prompt injection fails, it may try encoding bypasses, multi-turn escalation, or indirect injection. For web apps, it escalates from fingerprinting to active exploitation using real pentesting tools via shell. For source code, it traces data flows from user input to dangerous sinks.

Reflection checkpoints. When the agent reaches 60% of its turn budget, pwnkit injects a reflection prompt forcing the agent to review what has been tried, what failed, and what alternative approaches remain. This is inspired by deadend-cli (78% on XBOW) and PentestAgent’s self-reflection mechanism. Without reflection, agents frequently stall on a single approach and exhaust their budget.

Turn budget. MAPTA data shows 40 tool calls is the sweet spot for CTF-style challenges — enough to complete multi-step exploit chains without wasting tokens on dead ends. Deep mode uses a budget of 40 turns (increased from the original 20).
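The reflection checkpoint works as a one-shot trigger tied to the turn counter. With a 40-turn budget and a 60% checkpoint, it fires at turn 24. The sketch below assumes a 1-based turn counter and an integer percentage. `TurnBudget`, `ReflectionCheckpoint`, and the prompt wording are hypothetical.

```typescript
// Hypothetical budget config; pwnkit's real option names are not documented here.
interface TurnBudget {
  maxTurns: number;          // deep mode: 40
  reflectionPercent: number; // checkpoint at 60% of the budget
}

const REFLECTION_PROMPT =
  "Checkpoint: summarize what you have tried, what failed, and which alternative approaches remain before continuing.";

// Turn (1-based) at which the checkpoint fires. Multiplying before dividing
// keeps the intermediate value an integer when reflectionPercent is an integer.
function reflectionTurn(budget: TurnBudget): number {
  return Math.ceil((budget.maxTurns * budget.reflectionPercent) / 100);
}

// Returns the reflection prompt exactly once, on the first turn at or past
// the checkpoint; otherwise returns null.
class ReflectionCheckpoint {
  private fired = false;
  private readonly budget: TurnBudget;

  constructor(budget: TurnBudget) {
    this.budget = budget;
  }

  onTurn(turnsUsed: number): string | null {
    if (this.fired || turnsUsed < reflectionTurn(this.budget)) return null;
    this.fired = true;
    return REFLECTION_PROMPT;
  }
}
```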

2. Triage stage (Finding verification pipeline)


Between the research agent’s raw findings and the final report, findings flow through a multi-layer triage pipeline. Each layer rejects, downgrades, or confirms findings based on independent signals. See the FP Reduction Moat page for the measured per-profile results from the 2026-04-11 ablation, the 2026-04-11 ablation results log for experiment context and raw artifacts, and the Finding Triage ML design doc for the underlying research.

Note on EGATS (layer 11): The single-feature ablation on 2026-04-11 found that egatsTreeSearch is the only layer that regresses solve rate on hard challenges, and it costs ~10× as much per solved flag as the next-worst layer. It has been removed from the moat and moat-only profile aliases in CI and is now opt-in only. See pwnkit#116. The broader takeaway is that the moat's effect is mode-dependent: strictly positive on XBOW black-box, a Pareto tradeoff on XBOW white-box, and a no-op on npm-bench. A static scan-level policy can't optimize all three slices, which is the direct motivation for the learned-routing work in pwnkit#113.

| Layer | Module | Purpose |
|---|---|---|
| Holding-it-wrong filter | packages/core/src/triage/holding-it-wrong.ts | Rejects findings where the “vulnerability” is the documented behaviour of the called function (e.g. eval, writeFile, compile). Downgrades to info. |
| Feature extractor | packages/core/src/triage/feature-extractor.ts | 45 handcrafted features (response, request, metadata, text quality, cross-field) for fast first-pass signal. |
| Reachability gate | packages/core/src/triage/reachability.ts | Suppresses findings whose sink is not reachable from an application entry point. Open-source mirror of Endor Labs’ “Code API” moat. |
| Multi-modal agreement | packages/core/src/triage/multi-modal.ts | Cross-validates against foxguard (Rust pattern scanner). Both fire = strong signal; foxguard silent on scanned file = likely FP. |
| Per-class oracles | packages/core/src/triage/oracles.ts | Deterministic exploit oracles per category (SQLi, XSS, SSRF, RCE, path traversal, IDOR). Verified = accept with no LLM call. |
| PoV gate | packages/core/src/triage/pov-gate.ts | Narrowly-scoped mini agent loop must produce a working executable PoC. No PoV = downgrade to info. Based on “All You Need Is A Fuzzing Brain”. |
| Structured verify pipeline | packages/core/src/triage/verify-pipeline.ts | 4-step LLM verification: reachability -> payload validation -> impact assessment -> exploit confirmation. Category-specific addendums per vuln class. |
| Consensus verify | packages/core/src/triage/verify-pipeline.ts (runSelfConsistencyVerify) | Runs the structured verify pipeline N times in parallel and takes the majority vote with early termination. |
| Triage memories | packages/core/src/triage/memories.ts | Semgrep-style per-target FP memories. Injected as few-shot into the verify prompt; strong matches auto-reject without an LLM call. |
| Adversarial debate | packages/core/src/triage/adversarial.ts | Prosecutor vs. defender vs. judge with fresh contexts, based on Anthropic’s debate paper (arXiv:2402.06782). Uncorrelated error modes vs. single-pass verify. |
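Consensus verify's majority vote with early termination can be sketched as a tally that decides as soon as one verdict holds a strict majority of N votes. At that point the remaining runs can no longer change the outcome. This is an illustrative sketch, not the code in verify-pipeline.ts. `ConsensusTally` and `consensusVerify` are hypothetical names. Two conservative assumptions are made: a failed run counts as a rejection, and an even split resolves as a rejection.

```typescript
type Verdict = "confirmed" | "rejected";

// Tracks votes from N independent verify runs and reports a decision as soon
// as one verdict holds a strict majority.
class ConsensusTally {
  private confirmed = 0;
  private rejected = 0;
  private seen = 0;
  private readonly n: number;
  private readonly threshold: number;

  constructor(n: number) {
    if (!Number.isInteger(n) || n < 1) throw new Error("n must be a positive integer");
    this.n = n;
    this.threshold = Math.floor(n / 2) + 1;
  }

  // Returns the decided verdict, or undefined while the outcome is still open.
  add(vote: Verdict): Verdict | undefined {
    this.seen++;
    if (vote === "confirmed") this.confirmed++;
    else this.rejected++;
    if (this.confirmed >= this.threshold) return "confirmed";
    if (this.rejected >= this.threshold) return "rejected";
    // Only reachable on an even-n tie once every vote is in: reject conservatively.
    if (this.seen >= this.n) return "rejected";
    return undefined;
  }
}

// Launches n verify runs in parallel and resolves on the first decisive vote.
// Runs still in flight are not cancelled in this sketch; their votes are ignored.
function consensusVerify(runOnce: () => Promise<Verdict>, n: number): Promise<Verdict> {
  const tally = new ConsensusTally(n);
  return new Promise<Verdict>((resolve) => {
    let decided = false;
    for (let i = 0; i < n; i++) {
      runOnce()
        .catch((): Verdict => "rejected") // a crashed run counts as a rejection
        .then((vote) => {
          if (decided) return;
          const result = tally.add(vote);
          if (result !== undefined) {
            decided = true;
            resolve(result);
          }
        });
    }
  });
}
```

With N = 5, three agreeing votes settle the outcome, so the slowest two runs never gate the result.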

Most layers are gated by feature flags (PWNKIT_FEATURE_REACHABILITY_GATE, PWNKIT_FEATURE_MULTIMODAL, PWNKIT_FEATURE_CONSENSUS_VERIFY, PWNKIT_FEATURE_POV_GATE, PWNKIT_FEATURE_TRIAGE_MEMORIES, PWNKIT_FEATURE_DEBATE) so they can be A/B tested independently. See packages/core/src/agent/features.ts for the full list.
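The layering and the flag gating can be pictured as a short-circuiting chain. A layer can reject or confirm outright, as the triage memories and the oracles do. It can downgrade a finding and pass it on, as the PoV gate does, or it can abstain. This is a minimal sketch under stated assumptions: the `TriageLayer` shape, the decision names, and the rule that a flag counts as enabled when set to `1` or `true` are all hypothetical. features.ts defines the real flag semantics and defaults.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type LayerDecision = "reject" | "confirm" | "downgrade" | "pass";

interface TriageFinding {
  category: string;
  severity: Severity;
}

interface TriageLayer {
  name: string;
  flag?: string; // e.g. "PWNKIT_FEATURE_TRIAGE_MEMORIES"; no flag = always on
  run: (finding: TriageFinding) => LayerDecision;
}

type Env = Record<string, string | undefined>;

// Assumption: a flag is on when set to "1" or "true" (case-insensitive).
function isEnabled(flag: string | undefined, env: Env): boolean {
  if (flag === undefined) return true;
  const value = env[flag]?.toLowerCase();
  return value === "1" || value === "true";
}

interface TriageResult {
  status: "confirmed" | "rejected" | "undecided";
  finding: TriageFinding;
}

function runTriage(finding: TriageFinding, layers: TriageLayer[], env: Env): TriageResult {
  let current: TriageFinding = { ...finding };
  for (const layer of layers) {
    if (!isEnabled(layer.flag, env)) continue; // disabled layers are skipped
    const decision = layer.run(current);
    if (decision === "reject") return { status: "rejected", finding: current };
    if (decision === "confirm") return { status: "confirmed", finding: current };
    if (decision === "downgrade") current = { ...current, severity: "info" };
  }
  // No layer was decisive; a later stage (e.g. LLM verify) would decide.
  return { status: "undecided", finding: current };
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the loop, lets a harness run the same findings under different flag combinations side by side.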

The verify agent receives only the PoC code and the file path. It never sees the research agent’s reasoning, chain of thought, or attack strategy. This is the same principle as double-blind peer review.

The verify agent independently:

  • Traces data flow from the PoC
  • Attempts to reproduce the finding
  • Confirms or kills the finding

If the verify agent cannot reproduce the vulnerability, the finding is killed as a false positive. This filters out the unreproducible noise that plagues other scanners.
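The isolation rule is easiest to enforce at the point where the verify agent's input is built. This is a minimal sketch: the field names are assumptions, and `ResearchFinding` and `toBlindVerifyInput` are illustrative, not pwnkit's types.

```typescript
// Hypothetical shape of what the research agent produces.
interface ResearchFinding {
  title: string;
  pocCode: string;
  filePath: string;
  reasoning: string;
  chainOfThought: string;
  attackStrategy: string;
}

// The only inputs the blind verifier is allowed to see.
interface BlindVerifyInput {
  readonly pocCode: string;
  readonly filePath: string;
}

// Allow-list copy: fields are picked explicitly instead of deleting the
// sensitive ones, so any field added to ResearchFinding later stays hidden
// from the verifier by default.
function toBlindVerifyInput(finding: ResearchFinding): BlindVerifyInput {
  return { pocCode: finding.pocCode, filePath: finding.filePath };
}
```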

Only confirmed findings (those that survived blind verification) are included in the final report. Output formats:

  • Terminal — default interactive summary with share URL
  • HTML — rich browser report
  • PDF — printable report
  • SARIF — for the GitHub Security tab
  • Markdown — human-readable report
  • JSON — machine-readable for pipelines

Each finding includes a severity score, category, PoC code, and remediation guidance.
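For the SARIF output, each finding maps onto a SARIF 2.1.0 `result`. The sketch below shows one such conversion and assumes a hypothetical `ReportFinding` shape. The severity-to-level mapping and the use of `properties` to carry the PoC are illustrative choices, not pwnkit's actual exporter.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";

// Hypothetical report-ready finding carrying the fields listed above.
interface ReportFinding {
  title: string;
  severity: Severity;
  category: string;
  pocCode: string;
  remediation: string;
  filePath?: string;
  line?: number;
}

// SARIF 2.1.0 result levels are "error", "warning", "note", and "none";
// this severity mapping is an assumption.
const SARIF_LEVEL: Record<Severity, "error" | "warning" | "note"> = {
  critical: "error",
  high: "error",
  medium: "warning",
  low: "note",
  info: "note",
};

function toSarif(findings: ReportFinding[]) {
  // One rule per distinct category.
  const ruleIds = Array.from(new Set(findings.map((f) => f.category)));
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "pwnkit", rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: f.category,
          level: SARIF_LEVEL[f.severity],
          message: { text: `${f.title}\n\nRemediation: ${f.remediation}` },
          // Findings without a file path (e.g. web targets) get no location here.
          locations: f.filePath
            ? [{ physicalLocation: { artifactLocation: { uri: f.filePath }, region: { startLine: f.line ?? 1 } } }]
            : [],
          properties: { severity: f.severity, poc: f.pocCode },
        })),
      },
    ],
  };
}
```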

The pipeline adapts its tooling and attack strategy based on the target type:

| Mode | Target | What it does |
|---|---|---|
| deep | LLM API URL | Prompt injection, jailbreaks, tool poisoning, data exfiltration, multi-turn escalation (40-turn budget) |
| probe | LLM API URL | Lightweight surface scan of an LLM API |
| web | Web application URL | CORS, headers, exposed files, SSRF, XSS, path traversal, fingerprinting |
| mcp | MCP server | Tool poisoning, schema abuse, permission escalation |
| audit | Package or image name | Supply chain analysis, malicious code detection, dependency risk across npm, pypi, cargo, and oci |
| review | Local path or GitHub URL | AI-powered source code vulnerability analysis |

The mode is auto-detected from the target when possible, or set explicitly with --mode.
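The real auto-detection rules are not spelled out here. As a rough illustration only, a detector could lean on the target's shape, with an explicit --mode always taking precedence. Every heuristic in `detectMode` below is a hypothetical assumption, and probe is never auto-selected in this sketch, so it has to be requested explicitly.

```typescript
type Mode = "deep" | "probe" | "web" | "mcp" | "audit" | "review";

// Hypothetical heuristics; not pwnkit's actual detection logic.
function detectMode(target: string, explicit?: Mode): Mode {
  if (explicit !== undefined) return explicit; // an explicit --mode always wins
  if (/^https?:\/\//i.test(target)) {
    const url = new URL(target);
    if (url.hostname === "github.com") return "review";
    if (/\/mcp\/?$/.test(url.pathname)) return "mcp";
    // Paths typical of chat/completion endpoints suggest an LLM API.
    if (/\/(chat\/completions|completions|messages)\/?$/.test(url.pathname)) return "deep";
    return "web";
  }
  // Filesystem-looking targets are treated as local source trees.
  if (/^(\.{1,2}\/|\/|~)/.test(target)) return "review";
  // Anything else is treated as a package or image name.
  return "audit";
}
```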

pwnkit decouples the scanning pipeline from the LLM backend through runtime adapters. Each adapter implements the same interface but connects to a different provider:

| Adapter | Backend | How it works |
|---|---|---|
| ApiRuntime | OpenRouter / Anthropic / OpenAI | Direct HTTP calls to the provider’s API |
| ClaudeRuntime | Claude Code CLI | Spawns claude as a subprocess with tool definitions |
| CodexRuntime | Codex CLI | Spawns codex as a subprocess |
| GeminiRuntime | Gemini CLI | Spawns the Gemini CLI |
| McpRuntime | MCP servers | Connects to Model Context Protocol servers |
| AutoRuntime | Best available | Detects installed CLIs and picks the best per stage |

The --runtime flag selects which adapter to use. The auto runtime probes for installed CLIs and picks the most capable one for each pipeline stage (for example, using Claude for deep reasoning and the API for quick classification).
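The sketch below shows a shared adapter interface with per-stage selection. It assumes that availability probing has already produced a list of usable adapters. `RuntimeAdapter`, the stage names, and the preference order are hypothetical. Only the example policy (Claude for deep reasoning, the API for quick classification) comes from the text above.

```typescript
// Hypothetical stage names: "research" covers plan/discover/attack.
type Stage = "research" | "verify" | "classify";

// Hypothetical common interface; the real adapter signature may differ.
interface RuntimeAdapter {
  readonly name: string;
  run(stage: Stage, systemPrompt: string, userPrompt: string): Promise<string>;
}

// Illustrative preference order per stage: a CLI agent for deep reasoning,
// the direct API for quick classification.
const PREFERENCES: Record<Stage, readonly string[]> = {
  research: ["claude", "codex", "gemini", "api"],
  verify: ["claude", "codex", "gemini", "api"],
  classify: ["api", "claude", "codex", "gemini"],
};

// Picks the most preferred adapter among those detected as available.
function pickRuntime(stage: Stage, adapters: readonly RuntimeAdapter[]): RuntimeAdapter {
  for (const name of PREFERENCES[stage]) {
    const adapter = adapters.find((a) => a.name === name);
    if (adapter !== undefined) return adapter;
  }
  throw new Error(`no runtime available for stage "${stage}"`);
}
```

Because every adapter exposes the same `run` signature, the pipeline code only needs to decide which adapter to call, not how each backend is invoked.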

pwnkit integrates with the Model Context Protocol (MCP) in two ways:

As an LLM backend: the McpRuntime adapter can connect to MCP servers, using their exposed tools as the LLM backend for the scanning pipeline. This enables using any MCP-compatible model server.

As a scan target: the --mode mcp scan mode probes MCP servers for:

  • Tool poisoning — malicious tool definitions that inject instructions
  • Schema abuse — tool schemas designed to exfiltrate data
  • Permission escalation — tools that request more access than needed
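As a rough illustration of the first two checks, a static pass over the tool list that an MCP server returns from `tools/list` could flag instruction-like text in descriptions and input parameters that ask for sensitive context. This is a hypothetical keyword heuristic, not pwnkit's detector, and the patterns are examples only. The permission-escalation check is omitted because it requires comparing a tool's requested access against its stated purpose.

```typescript
// Minimal view of a tool entry as returned by an MCP server's tools/list.
interface McpTool {
  name: string;
  description?: string;
  inputSchema?: { properties?: Record<string, unknown> };
}

interface McpIssue {
  tool: string;
  kind: "tool_poisoning" | "schema_abuse";
  detail: string;
}

// Illustrative patterns for instructions smuggled into tool descriptions.
const POISONING_PATTERNS: readonly RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /<important>/i,
  /do not (tell|inform|mention)[^.]*user/i,
  /~\/\.ssh|id_rsa|\.env\b/i,
];

// Illustrative parameter names that would let a tool collect sensitive context.
const SENSITIVE_PARAMS = /^(conversation_history|chat_history|system_prompt|api_key|password|secret|token)$/i;

function lintMcpTools(tools: readonly McpTool[]): McpIssue[] {
  const issues: McpIssue[] = [];
  for (const tool of tools) {
    // At most one poisoning issue per tool: report the first matching pattern.
    const hit = POISONING_PATTERNS.find((pattern) => pattern.test(tool.description ?? ""));
    if (hit !== undefined) {
      issues.push({ tool: tool.name, kind: "tool_poisoning", detail: `description matches ${hit}` });
    }
    for (const param of Object.keys(tool.inputSchema?.properties ?? {})) {
      if (SENSITIVE_PARAMS.test(param)) {
        issues.push({ tool: tool.name, kind: "schema_abuse", detail: `requests parameter "${param}"` });
      }
    }
  }
  return issues;
}
```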

The product is intentionally split into two surfaces:

  • CLI — the execution surface for local runs, CI, replay, and exports
  • Dashboard — the local verification workbench for triage, evidence review, and human sign-off

The CLI runs scans and produces findings. The dashboard consumes those findings and provides a Kanban-style board for triage, evidence inspection, and disposition tracking. Both share the same local SQLite database.

For web application pentesting, pwnkit uses a shell-first approach. Instead of routing the agent through structured tools like crawl_page, submit_form, or http_request, the web mode gives the agent a minimal tool set:

  • bash — run any bash command (curl, sqlmap, python, nmap, etc.)
  • save_finding — record a confirmed vulnerability with PoC
  • done — signal completion

This works because the model already knows curl, bash pipelines, and standard pentesting tools from training data. A single curl -c cookies.txt ... | jq command replaces multiple structured tool calls and eliminates the state-threading confusion that causes agents to loop.
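The sketch below shows how a bash tool of this kind could be implemented on the Node side. It assumes a JSON-Schema tool definition and `child_process.spawnSync`. The timeout, the output cap, and the `runBashTool` name are hypothetical. Unlike a production tool, it runs the command directly on the host with no isolation.

```typescript
import { spawnSync } from "node:child_process";

// Tool definition in the JSON-Schema style common to LLM tool-calling APIs.
const BASH_TOOL = {
  name: "bash",
  description: "Run a bash command and return its combined stdout and stderr.",
  input_schema: {
    type: "object",
    properties: { command: { type: "string", description: "The command to run" } },
    required: ["command"],
  },
} as const;

interface BashResult {
  exitCode: number | null; // null when the process was killed by a signal
  output: string;
  timedOut: boolean;
}

// Runs one command with a timeout and truncates long output so it fits in the
// model's context. No sandboxing here: a real deployment would isolate this.
function runBashTool(command: string, timeoutMs = 60_000, maxChars = 20_000): BashResult {
  const result = spawnSync("bash", ["-c", command], {
    encoding: "utf8",
    timeout: timeoutMs,
    maxBuffer: 10 * 1024 * 1024,
  });
  // stdout is appended before stderr, so their relative ordering is lost.
  const combined = (result.stdout ?? "") + (result.stderr ?? "");
  const output =
    combined.length > maxChars
      ? `${combined.slice(0, maxChars)}\n[truncated ${combined.length - maxChars} chars]`
      : combined;
  const errorCode = (result.error as (Error & { code?: string }) | undefined)?.code;
  return { exitCode: result.status, output, timedOut: errorCode === "ETIMEDOUT" };
}
```

A cookie-jar login followed by a jq filter is then a single `runBashTool(...)` call, and the agent does not have to thread session state through several structured tool calls.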

The structured tools (crawl_page, submit_form, http_request) are still available as optional additions, but benchmarking showed the agent performs better with just shell access.

See the Research page for the full rationale and data behind this design decision and the Benchmark page for detailed results.

Each agent has access to a set of tools depending on the scan type:

| Tool | Used in | Purpose |
|---|---|---|
| bash | Web, LLM, Verify | Primary tool for web pentesting. Run any shell command (curl, python3, bash, sqlmap, nmap, etc.). Renamed from shell_exec to match pi-mono’s naming convention. |
| browser | Web | Playwright-based headless browser for XSS testing and JavaScript-rendered pages. Complements bash/curl for cases where a real browser DOM is needed. |
| save_finding | All modes | Record a discovered vulnerability with PoC |
| done | All modes | Signal that the agent has finished |
| send_prompt | LLM | Send prompts to AI/LLM apps |
| read_file | Source, npm | Read source files for code review |
| run_command | Source, npm | Execute commands in a sandbox |
| list_files | Source, npm | Enumerate files in a directory |
| search_code | Source, npm | Search for patterns across a codebase |
| crawl_page | Web (optional) | Crawl a web page; available, but bash with curl is preferred |
| submit_form | Web (optional) | Submit a form; available, but bash with curl is preferred |
| http_request | Web (optional) | Send HTTP requests; available, but bash with curl is preferred |