Configuration
pwnkit is designed for zero-config usage, but every default can be overridden via CLI flags or environment variables.
Runtime modes
Section titled “Runtime modes”pwnkit is an agentic harness — bring your own AI. The --runtime flag controls which LLM backend powers the agents.
| Runtime | Flag | Description |
|---|---|---|
api | --runtime api | Uses your API key (OpenRouter, Anthropic, Azure OpenAI, or OpenAI). Best for CI and quick scans. Default. |
claude | --runtime claude | Spawns the Claude Code CLI with your existing subscription. Best for deep analysis. |
codex | --runtime codex | Spawns the Codex CLI. Best for source-level analysis. |
gemini | --runtime gemini | Spawns the Gemini CLI. Best for large-context source analysis. |
auto | --runtime auto | Auto-detects installed CLIs and picks the best one per pipeline stage. |
API runtime
Section titled “API runtime”The default api runtime makes direct HTTP calls to an LLM provider. It requires one of these environment variables:
export OPENROUTER_API_KEY="sk-or-..." # Recommendedexport ANTHROPIC_API_KEY="sk-ant-..."export AZURE_OPENAI_API_KEY="..."export OPENAI_API_KEY="sk-..."See API Keys for the full priority order and provider details.
If you use Azure, also set AZURE_OPENAI_BASE_URL and AZURE_OPENAI_MODEL unless pwnkit can read them from a valid Azure-backed ~/.codex/config.toml. For the Responses API, the base URL should include /openai/v1. pwnkit fails fast on incomplete Azure config instead of attempting a scan with guessed defaults.
CLI runtimes (claude, codex, gemini)
Section titled “CLI runtimes (claude, codex, gemini)”These runtimes spawn the respective CLI tool as a subprocess. You must have the CLI installed and authenticated:
# Claude Code CLInpm i -g @anthropic-ai/claude-code
# Codex CLInpm i -g @openai/codex
# Gemini CLInpm i -g @google/gemini-cliThen use them:
npx pwnkit-cli scan --target https://api.example.com/chat --runtime claudenpx pwnkit-cli review ./my-repo --runtime codex --depth deepScan modes
Section titled “Scan modes”The --mode flag controls what kind of target is being scanned.
| Mode | Description |
|---|---|
deep | Full autonomous pentest. Runs the research + verify agents with the full 40-turn budget. Default when the target is an https:// URL. |
probe | Lightweight surface scan — recon and fingerprinting without deep exploitation. |
web | Shell-first autonomous pentesting for web applications. The agent uses bash (curl, python3, bash) as its primary tool to probe for CORS, headers, exposed files, SSRF, XSS, SQLi, SSTI, and more. |
mcp | Scan MCP (Model Context Protocol) servers for tool poisoning and schema abuse. Default when the target starts with mcp://. |
# LLM API scan (default)npx pwnkit-cli scan --target https://api.example.com/chat
# Web app scannpx pwnkit-cli scan --target https://example.com --mode webDepth settings
Section titled “Depth settings”The --depth flag controls how thorough the scan is.
| Depth | Test Cases | Typical Time | Best For |
|---|---|---|---|
quick | ~15 | ~1 min | CI pipelines, smoke tests |
default | ~50 | ~3 min | Day-to-day scanning |
deep | ~150 | ~10 min | Pre-launch audits, thorough review |
npx pwnkit-cli scan --target https://api.example.com/chat --depth quicknpx pwnkit-cli audit express --depth deepnpx pwnkit-cli review ./my-repo --depth deep --runtime claudeOutput formats
Section titled “Output formats”pwnkit supports multiple output formats:
| Format | Description |
|---|---|
terminal | Human-readable terminal summary with share URL |
html | Rich browser report saved to a temporary file |
pdf | Printable report saved to a temporary file |
json | Machine-readable JSON output for pipelines |
sarif | SARIF format for the GitHub Security tab |
markdown | Human-readable Markdown report |
In CI (GitHub Action), set format: sarif to populate the Security tab:
- uses: peaktwilight/pwnkit@main with: mode: review path: . format: sarifDiff-aware review
Section titled “Diff-aware review”For PR workflows, review only changed files against a base branch:
npx pwnkit-cli review ./my-repo --diff-base origin/main --changed-onlyThis is particularly useful in CI to avoid scanning the entire codebase on every PR.
Verbose output
Section titled “Verbose output”Use --verbose to see the animated attack replay and detailed agent reasoning:
npx pwnkit-cli scan --target https://api.example.com/chat --verboseFeature flags
Section titled “Feature flags”pwnkit ships a set of agent-improvement features behind environment-variable flags so you can A/B test them and opt in/out per run. Every flag is read at process start; set <FLAG>=0 or <FLAG>=false to disable, anything else to enable.
| Flag | Default | What it enables |
|---|---|---|
PWNKIT_FEATURE_EARLY_STOP | on | Early-stop at 50% budget if no findings, then retry with a different strategy. |
PWNKIT_FEATURE_LOOP_DETECTION | on | Detects A-A-A and A-B-A-B action loops, injects a warning to break the cycle. |
PWNKIT_FEATURE_CONTEXT_COMPACTION | on | Compresses middle-of-conversation messages when the context exceeds 30k tokens. |
PWNKIT_FEATURE_SCRIPT_TEMPLATES | on | Adds exploit-script templates (blind SQLi, SSTI, auth chain) to the shell prompt. |
PWNKIT_FEATURE_DYNAMIC_PLAYBOOKS | off | Injects technology-specific vulnerability playbooks after the recon phase. |
PWNKIT_FEATURE_EXTERNAL_MEMORY | off | Agent writes plan/creds to disk, re-injected at reflection checkpoints. |
PWNKIT_FEATURE_PROGRESS_HANDOFF | off | Injects prior-attempt findings when retrying, so retries don’t restart from zero. |
PWNKIT_FEATURE_WEB_SEARCH | off | Lets the agent search the web for CVE details, vendor docs, and technique references. |
PWNKIT_FEATURE_DOCKER_EXECUTOR | off | Runs every bash command inside a Kali Linux container with the full pentesting toolchain. |
PWNKIT_FEATURE_CLOUD_SINK | on | Allows opt-in streaming of findings/final reports to a remote scan sink when the cloud env vars are set. |
PWNKIT_FEATURE_PTY_SESSION | off | Interactive PTY sessions for exploits requiring interactivity (reverse shells, DB clients, SSH). |
PWNKIT_FEATURE_EGATS | off | Evidence-Gated Attack Tree Search — beam search over a hypothesis tree. Also toggled by --egats. |
PWNKIT_FEATURE_CONSENSUS_VERIFY | off | Self-consistency voting: runs the verify pipeline N times and takes the majority vote. |
PWNKIT_FEATURE_DEBATE | off | Adversarial debate: prosecutor vs. defender agents argue each finding, a skeptical judge decides. |
PWNKIT_FEATURE_MULTIMODAL | off | Cross-validates findings against foxguard (Rust pattern scanner). |
PWNKIT_FEATURE_REACHABILITY_GATE | off | Suppresses findings whose sink is not reachable from an application entry point. |
PWNKIT_FEATURE_POV_GATE | off | Requires a working executable PoC per finding, otherwise downgrades to info. |
PWNKIT_FEATURE_TRIAGE_MEMORIES | off | Injects Semgrep-style per-target persistent FP memories into the verify pipeline. Pairs with pwnkit-cli triage. |
Docker executor overrides
Section titled “Docker executor overrides”When PWNKIT_FEATURE_DOCKER_EXECUTOR=1 is enabled, these extra env vars
control the container image and bootstrap behavior:
| Variable | Default | Purpose |
|---|---|---|
PWNKIT_DOCKER_IMAGE | ghcr.io/peaktwilight/pwnkit:latest | Override the executor image |
PWNKIT_DOCKER_BOOTSTRAP_TOOLS | auto | Force or disable apt-based tool bootstrap inside the container |
Bootstrap rules:
- default GHCR image -> no bootstrap, use the pre-baked toolchain
kalilinux/kali-rolling-> bootstrap tools on first startPWNKIT_DOCKER_BOOTSTRAP_TOOLS=1-> always bootstrapPWNKIT_DOCKER_BOOTSTRAP_TOOLS=0-> never bootstrap
Cost ceiling
Section titled “Cost ceiling”You can bound API spend per scan, audit, or review:
export PWNKIT_COST_CEILING_USD=5npx pwnkit-cli scan --target https://example.com --mode webOr override it per command:
npx pwnkit-cli audit lodash --cost-ceiling 2npx pwnkit-cli review ./my-repo --cost-ceiling 10If the ceiling is exceeded, pwnkit preserves partial findings and exits with code 4.
Cloud sink
Section titled “Cloud sink”If you want to stream findings and the final report to an orchestration layer:
export PWNKIT_CLOUD_SINK=https://api.example.comexport PWNKIT_CLOUD_SCAN_ID=scan_123export PWNKIT_CLOUD_TOKEN=secret-tokenWhen set, pwnkit posts:
- each finding as
{ "finding": ... } - the final report as
{ "report": ..., "final": true }
to:
${PWNKIT_CLOUD_SINK}/scans/${PWNKIT_CLOUD_SCAN_ID}/findingsSet PWNKIT_FEATURE_CLOUD_SINK=0 to disable this behavior even when the env vars are present.
Machine-readable result line
Section titled “Machine-readable result line”Set:
export PWNKIT_EMIT_RESULT_LINE=1to make the CLI print one final PWNKIT_RESULT=... JSON line summarizing:
- success/failure
- exit code and exit reason
- target type
- finding counts
- estimated cost and token usage when available
This is useful for wrappers, CI parsers, and the cloud orchestration path.
Example: maximum-accuracy pentest
Section titled “Example: maximum-accuracy pentest”Turn on every false-positive reduction feature for a client-ready scan:
export PWNKIT_FEATURE_CONSENSUS_VERIFY=1export PWNKIT_FEATURE_REACHABILITY_GATE=1export PWNKIT_FEATURE_POV_GATE=1export PWNKIT_FEATURE_TRIAGE_MEMORIES=1export PWNKIT_FEATURE_MULTIMODAL=1
npx pwnkit-cli scan --target https://example.com --mode web --depth deepExample: Kali toolchain + web search
Section titled “Example: Kali toolchain + web search”export PWNKIT_FEATURE_DOCKER_EXECUTOR=1export PWNKIT_FEATURE_WEB_SEARCH=1
npx pwnkit-cli scan --target https://example.com --mode webExample: raw Kali fallback
Section titled “Example: raw Kali fallback”export PWNKIT_FEATURE_DOCKER_EXECUTOR=1export PWNKIT_DOCKER_IMAGE=kalilinux/kali-rollingexport PWNKIT_DOCKER_BOOTSTRAP_TOOLS=1
npx pwnkit-cli scan --target https://example.com --mode web