
Features

pwnkit is a fully autonomous agentic pentesting framework. This page is a complete, category-organized inventory of what ships in the current release. For deep dives, follow the linked pages.

| Target | Command | What pwnkit finds |
| --- | --- | --- |
| Web apps | `scan --target <url> --mode web` | SQLi, IDOR, SSTI, XSS, auth bypass, SSRF, LFI, RCE, file upload, deserialization, request smuggling |
| AI / LLM apps | `scan --target <url>` | Prompt injection, jailbreaks, system-prompt extraction, PII leakage, MCP tool abuse |
| npm packages | `audit <pkg>` | Malicious code, known CVEs, supply-chain attacks |
| Source code | `review <path>` | SAST-style vulnerabilities via static analysis plus AI review |
| White-box | `scan --target <url> --repo <path>` | Source-aware scanning: reads the code before attacking |
| MCP servers | `scan --target mcp://…` | Tool poisoning and schema abuse |

| Flag | Description |
| --- | --- |
| `--target <url>` | Target URL or `mcp://` endpoint (required) |
| `--mode <m>` | `probe`, `deep`, `mcp`, or `web` |
| `--depth <d>` | `quick`, `default`, or `deep` |
| `--runtime <rt>` | `api`, `claude`, `codex`, `gemini`, or `auto` |
| `--format <f>` | `terminal`, `json`, `md`, `html`, `sarif`, or `pdf` |
| `--repo <path>` | Source code path for white-box scanning |
| `--auth <json\|file>` | Authenticated scanning. Takes a JSON string or a file path; supports `bearer`, `cookie`, `basic`, and `header` |
| `--api-spec <path>` | Pre-load endpoints from an OpenAPI 3.x / Swagger 2.0 spec (JSON or YAML) |
| `--export <target>` | Export findings to an issue tracker, e.g. `github:owner/repo` |
| `--race` | Best-of-N strategy racing: run multiple attack strategies in parallel |
| `--egats` | Enable Evidence-Gated Attack Tree Search (beam search over hypotheses) |
| `--cost-ceiling <usd>` | Hard spending ceiling in USD; if it is exceeded, the scan aborts cleanly and keeps partial findings |
| `--verbose` | Animated attack replay |
| `--replay` | Re-render the last scan's results from the local DB |

```bash
npx pwnkit-cli scan --target https://app.example.com \
  --auth '{"type":"bearer","token":"eyJhbGciOi..."}'
# Or point at a JSON file
npx pwnkit-cli scan --target https://app.example.com --auth ./auth.json
```

Supported auth types: bearer, cookie, basic, header.
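
A minimal sketch of preparing the ./auth.json file used above for bearer auth. It reuses the {"type":"bearer","token":...} shape from the inline example; the field layout for the cookie, basic, and header types is not shown on this page, so it is not guessed here. PWNKIT_TOKEN is a hypothetical variable name used only by this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: read the token from an environment variable so it never
# appears on the command line; fall back to a placeholder for this demo.
token="${PWNKIT_TOKEN:-eyJhbGciOi-placeholder}"

# Restrict the new file to the current user, since it holds a credential.
umask 077

# Same shape as the inline --auth example: {"type":"bearer","token":...}
cat > auth.json <<EOF
{"type":"bearer","token":"${token}"}
EOF

cat auth.json
# Then: npx pwnkit-cli scan --target https://app.example.com --auth ./auth.json
```

Keeping the token in a file (or an environment variable) also keeps it out of shell history, which the inline JSON form does not.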

```bash
npx pwnkit-cli scan --target https://api.example.com \
  --api-spec ./openapi.yaml
```

Pre-loads endpoint/parameter knowledge so the agent starts from a rich surface map instead of discovering everything from scratch.
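
For reference, a deliberately tiny OpenAPI 3.0 document that --api-spec could be pointed at, written with a heredoc. The /users/{id} endpoint and the "Example API" title are made up for illustration; this only shows the shape of such a file, not what pwnkit extracts from it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Minimal OpenAPI 3.0 document: one hypothetical path with one path parameter.
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > openapi.yaml <<'EOF'
openapi: 3.0.3
info:
  title: Example API
  version: "1.0.0"
paths:
  /users/{id}:
    get:
      summary: Fetch a single user
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      responses:
        "200":
          description: The user record
EOF

# Quick sanity check on the version marker before handing the file to a scan.
grep -q '^openapi: 3\.' openapi.yaml && echo "openapi.yaml declares OpenAPI 3.x"
```

In OpenAPI 3.x, path parameters must be marked required: true, which is why the sketch sets it explicitly.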

```bash
npx pwnkit-cli scan --target https://example.com \
  --export github:my-org/my-repo
```

```bash
npx pwnkit-cli scan --target https://example.com --race
```

Runs multiple attack strategies in parallel and keeps whichever one produces a verified finding first.
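
A conceptual sketch of the racing idea in plain bash, assuming nothing about how pwnkit schedules strategies internally: three hypothetical strategies run as background jobs, the first one to record a verified finding wins, and the rest are killed. The strategy function, the strategy names, and the delays are all made up.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Conceptual sketch of best-of-N racing, not pwnkit's implementation.
results=$(mktemp)

# Hypothetical strategy: "works" for $delay seconds, then records its
# name only if it produced a verified finding.
strategy() {
  local name=$1 delay=$2 verified=$3
  sleep "$delay"
  if [[ $verified == yes ]]; then
    echo "$name" >> "$results"
  fi
}

pids=()
strategy sqli-first 0.6 yes & pids+=("$!")
strategy ssti-first 0.2 no  & pids+=("$!")   # finishes first, but finds nothing
strategy idor-first 0.4 yes & pids+=("$!")

# Poll for the first verified finding, giving up after about 5 seconds.
for _ in $(seq 1 100); do
  [[ -s $results ]] && break
  sleep 0.05
done
winner=$(head -n 1 "$results")
winner=${winner:-none}

# Cancel the strategies that are still running, then clean up.
kill "${pids[@]}" 2>/dev/null || true
wait || true
rm -f "$results"

echo "winner: $winner"
```

Here ssti-first finishes earliest but does not win, because it produced no verified finding: the race rewards the first verified result, not the first job to exit.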

```bash
npx pwnkit-cli scan --target https://example.com --egats
```

Evidence-Gated Attack Tree Search: the agent maintains an explicit hypothesis tree and only expands branches backed by observed evidence.
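
One expansion step of an evidence-gated beam search can be sketched as a filter followed by a top-k cut. This is a conceptual illustration, not pwnkit's EGATS code: the candidate hypotheses, their scores, the evidence counts, and beam_width=2 are all hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # predictable decimal parsing in sort

# Hypothetical candidate hypotheses: <name> <score> <evidence-count>
candidates='sqli-login-form 0.82 2
ssti-template-param 0.91 0
idor-order-id 0.64 1
lfi-download-path 0.40 3
xss-search-box 0.77 0'

beam_width=2

# 1. Evidence gate: drop hypotheses with no observed evidence.
# 2. Beam: rank the survivors by score and keep the top $beam_width.
beam=$(printf '%s\n' "$candidates" \
  | awk '$3 > 0' \
  | sort -k2,2nr \
  | awk -v k="$beam_width" 'NR <= k { print $1 }')

echo "$beam"
```

ssti-template-param has the highest score but is pruned because nothing observed backs it, which is the point of gating expansion on evidence rather than on score alone.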

| Runtime | Description |
| --- | --- |
| `api` | Calls an LLM provider directly over HTTP (default) |
| `claude` | Spawns the Claude Code CLI |
| `codex` | Spawns the OpenAI Codex CLI |
| `gemini` | Spawns the Gemini CLI |
| `auto` | Auto-detects the best runtime for each pipeline stage |

Supported providers: OpenRouter (multi-model ensemble), Anthropic, Azure OpenAI, and OpenAI. See API Keys for priority order.

| Feature | Flag / env var | Description |
| --- | --- | --- |
| Shell executor | default | Host `bash` with `curl`, `python3`, and standard tooling |
| Kali Docker executor | `PWNKIT_FEATURE_DOCKER_EXECUTOR=1` | Runs `bash` inside a Kali container with the full pentesting toolset |
| Cloud sink | `PWNKIT_CLOUD_SINK` + `PWNKIT_CLOUD_SCAN_ID` | Streams findings and the final report to a remote orchestrator endpoint |
| PTY sessions | `PWNKIT_FEATURE_PTY_SESSION=1` | Long-lived interactive sessions (reverse shells, DB clients, SSH) |
| Playwright browser | automatic in web mode | Real-browser verification for XSS; it cracked XBEN-011 and XBEN-018 |
| Web search | `PWNKIT_FEATURE_WEB_SEARCH=1` | Lets the agent look up CVE details and technique references |

| Format | Description |
| --- | --- |
| `terminal` | Colored terminal report (default) |
| `json` | Machine-readable JSON |
| `md` / `markdown` | Human-readable Markdown |
| `html` | HTML report |
| `sarif` | SARIF 2.1 report that drops into GitHub's Security tab |

For wrappers and orchestration layers, set PWNKIT_EMIT_RESULT_LINE=1 to make the CLI emit a final machine-readable PWNKIT_RESULT=... line.
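
A minimal wrapper sketch that pulls the payload off the last PWNKIT_RESULT= line. The payload format is not documented on this page, so the sketch treats it as an opaque string, and fake_scan is a hypothetical offline stand-in for the real npx pwnkit-cli scan call.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical offline stand-in for:
#   PWNKIT_EMIT_RESULT_LINE=1 npx pwnkit-cli scan --target https://example.com
# Both the progress line and the payload text are placeholders.
fake_scan() {
  echo "(progress output)"
  echo "PWNKIT_RESULT=placeholder-payload"
}

output=$(PWNKIT_EMIT_RESULT_LINE=1 fake_scan)

# Keep the payload of the LAST PWNKIT_RESULT= line; treat it as opaque text.
result=$(printf '%s\n' "$output" \
  | awk '/^PWNKIT_RESULT=/ { sub(/^PWNKIT_RESULT=/, ""); r = $0 } END { print r }')

if [[ -z $result ]]; then
  echo "no PWNKIT_RESULT line found" >&2
  exit 1
fi
echo "payload: $result"
```

Matching only the last such line keeps the wrapper robust if earlier output happens to contain the same prefix.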

pwnkit ships a full triage pipeline with 11 independent layers. See Finding Triage for the full reference.

  • Holding-it-wrong filter
  • 45-feature extractor
  • Per-class oracles (SQLi, XSS, SSRF, RCE, path traversal, IDOR)
  • Reachability gate
  • Multi-modal agreement (foxguard × pwnkit)
  • PoV generation gate
  • Structured 4-step verify pipeline
  • Self-consistency voting
  • Assistant memories (Semgrep-style)
  • Adversarial debate (prosecutor vs defender vs judge)
  • EGATS (Evidence-Gated Attack Tree Search)
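
Self-consistency voting, in its generic form, samples several independent verdicts for the same finding and keeps the majority. A minimal sketch with hypothetical verdict strings; how many samples pwnkit draws and how it labels verdicts are not specified here.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical verdicts from five independent samples for one finding.
verdicts=(true-positive false-positive true-positive true-positive false-positive)

# Count each distinct verdict and keep the most frequent one.
majority=$(printf '%s\n' "${verdicts[@]}" | sort | uniq -c | sort -k1,1nr | awk 'NR == 1 { print $2 }')
votes=$(printf '%s\n' "${verdicts[@]}" | grep -cxF "$majority")

echo "verdict: $majority ($votes/${#verdicts[@]} votes)"
```

An odd number of samples avoids ties between two labels.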

| Feature | Flag / env var | Description |
| --- | --- | --- |
| Early-stop + retry | `PWNKIT_FEATURE_EARLY_STOP` (default: on) | If nothing has been found at 50% of the budget, stops and retries with a different strategy |
| Loop detection | `PWNKIT_FEATURE_LOOP_DETECTION` (default: on) | Detects repeated-action patterns (A-A-A, A-B-A-B) and injects a warning |
| Context compaction | `PWNKIT_FEATURE_CONTEXT_COMPACTION` (default: on) | Uses the LLM to compress the middle messages once the context reaches 30k tokens |
| Exploit templates | `PWNKIT_FEATURE_SCRIPT_TEMPLATES` (default: on) | Adds blind-SQLi, SSTI, and auth-chain exploit scripts to the prompt |
| Dynamic playbooks | `PWNKIT_FEATURE_DYNAMIC_PLAYBOOKS` | Injects vulnerability-class playbooks after recon |
| External working memory | `PWNKIT_FEATURE_EXTERNAL_MEMORY` | The agent writes its plan and credentials to disk; they are re-injected at reflection checkpoints |
| Progress handoff | `PWNKIT_FEATURE_PROGRESS_HANDOFF` | Injects findings from the prior attempt when retrying |
| Adversarial debate | `PWNKIT_FEATURE_DEBATE` | Prosecutor vs. defender debate ruled on by a skeptical judge |

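
The loop-detection and early-stop rules in the table are simple enough to sketch directly. This is an illustration under stated assumptions, not pwnkit's code: actions are hypothetical command strings, and the budget is counted in abstract units (for example, agent turns).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Loop detection: true for A-A-A (same action three times in a row)
# or A-B-A-B (two different actions alternating). Most recent action last.
is_looping() {
  local -a a=("$@")
  local n=${#a[@]}
  if (( n >= 3 )) && [[ ${a[n-1]} == "${a[n-2]}" && ${a[n-2]} == "${a[n-3]}" ]]; then
    return 0
  fi
  if (( n >= 4 )) && [[ ${a[n-1]} == "${a[n-3]}" && ${a[n-2]} == "${a[n-4]}" && ${a[n-1]} != "${a[n-2]}" ]]; then
    return 0
  fi
  return 1
}

# Early stop: true once half the budget is used with zero findings.
# Budget units are hypothetical (e.g. turns).
should_early_stop() {
  local used=$1 budget=$2 findings=$3
  (( findings == 0 && used * 2 >= budget ))
}

history=("curl /login" "curl /admin" "curl /login" "curl /admin")
if is_looping "${history[@]}"; then
  echo "loop detected: inject a warning into the next prompt"
fi

if should_early_stop 20 40 0; then
  echo "early stop: retry with a different strategy"
fi
```

Both checks only look at cheap, already-available state (the recent action list and a counter), so they can run after every agent step.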
  • XBOW black-box: 87.5% (91/104) with a single model and 3 tools, covering all 104 challenges.
  • XBOW white-box best-of-N aggregate: 92.3% (96/104) with the same model and tools plus --repo source access, aggregated across features=none/experimental/all. That is higher than MAPTA (76.9%), deadend-cli (77.6%), Cyber-AutoAgent (84.6%), XBOW's own agent (85%), and BoxPwnr's best single-model score (81.7%).
  • AI/LLM regression suite: 10/10 on the self-authored suite covering prompt injection, jailbreaks, system-prompt extraction, PII leakage, encoding bypass, multi-turn escalation, and MCP SSRF.
  • Harnesses for AutoPenBench, HarmBench, and npm audit also ship; see Benchmark.
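
The headline XBOW percentages follow directly from the raw challenge counts; a quick recomputation:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Percentage of solved challenges, rounded to one decimal place.
pct() { LC_ALL=C awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f", 100 * s / t }'; }

black_box=$(pct 91 104)   # XBOW black-box
white_box=$(pct 96 104)   # XBOW white-box, best-of-N aggregate

echo "black-box: ${black_box}%, white-box: ${white_box}%"
```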

pwnkit is one leg of an open-source three-part security stack:

  • pwnkit — AI agent pentester (detect)
  • foxguard — Rust security scanner (prevent)
  • opensoar — Python-native SOAR platform (respond)

With PWNKIT_FEATURE_MULTIMODAL=1, pwnkit automatically cross-validates every finding against foxguard's pattern scanner. This is the same neural + symbolic agreement pattern Endor Labs uses to eliminate ~95% of false positives, except fully open source.
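
In its simplest form, agreement between two tools can be sketched as an intersection of finding keys. This is a simplification and not how pwnkit and foxguard actually exchange results: the <file>:<line>:<class> key format and the findings themselves are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # sort and comm must agree on collation order

# Hypothetical finding keys: <file>:<line>:<class>
pwnkit_findings='app/routes/login.ts:42:sqli
app/views/search.ts:17:xss
app/api/orders.ts:88:idor'

foxguard_findings='app/routes/login.ts:42:sqli
app/api/orders.ts:88:idor
app/util/exec.ts:9:rce'

# comm -12 prints only the lines present in both sorted inputs.
agreed=$(comm -12 <(sort <<<"$pwnkit_findings") <(sort <<<"$foxguard_findings"))

echo "$agreed"
```

Findings reported by only one tool (the XSS and RCE keys here) are the ones a real pipeline would send to further review rather than report outright.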