
Configuration

pwnkit is designed for zero-config usage, but every default can be overridden via CLI flags or environment variables.

pwnkit is an agentic harness — bring your own AI. The --runtime flag controls which LLM backend powers the agents.

| Runtime | Flag | Description |
| --- | --- | --- |
| api | --runtime api | Uses your API key (OpenRouter, Anthropic, Azure OpenAI, or OpenAI). Best for CI and quick scans. Default. |
| claude | --runtime claude | Spawns the Claude Code CLI with your existing subscription. Best for deep analysis. |
| codex | --runtime codex | Spawns the Codex CLI. Best for source-level analysis. |
| gemini | --runtime gemini | Spawns the Gemini CLI. Best for large-context source analysis. |
| auto | --runtime auto | Auto-detects installed CLIs and picks the best one per pipeline stage. |

The default api runtime makes direct HTTP calls to an LLM provider. It requires one of these environment variables:

export OPENROUTER_API_KEY="sk-or-..." # Recommended
export ANTHROPIC_API_KEY="sk-ant-..."
export AZURE_OPENAI_API_KEY="..."
export OPENAI_API_KEY="sk-..."

See API Keys for the full priority order and provider details.
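As a rough pre-flight check before a scan with the api runtime, a wrapper script could confirm that at least one of the four keys is present. The function name `pwnkit_list_api_keys` is hypothetical, and the sketch only tests for presence; it does not model pwnkit's priority order.

```shell
# Print the name of every provider key that is set and non-empty.
# Returns 0 if at least one was found, 1 otherwise.
# Hypothetical helper: it checks presence only, not pwnkit's priority order.
pwnkit_list_api_keys() {
  found=1
  for name in OPENROUTER_API_KEY ANTHROPIC_API_KEY AZURE_OPENAI_API_KEY OPENAI_API_KEY; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name"
      found=0
    fi
  done
  return "$found"
}

# Example use in a wrapper script:
#   pwnkit_list_api_keys >/dev/null || { echo "no API key set" >&2; exit 1; }
```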

If you use Azure, also set AZURE_OPENAI_BASE_URL and AZURE_OPENAI_MODEL unless pwnkit can read them from a valid Azure-backed ~/.codex/config.toml. For the Responses API, the base URL should include /openai/v1. pwnkit fails fast on incomplete Azure config instead of attempting a scan with guessed defaults.
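The Azure requirement can be sketched as a small check to run before invoking pwnkit. The function name `check_azure_env` is an assumption for illustration, and the sketch deliberately ignores the ~/.codex/config.toml fallback, so it is stricter than pwnkit itself.

```shell
# Hypothetical pre-flight check for the Azure variables named above.
# It ignores the ~/.codex/config.toml fallback, so it is stricter than pwnkit.
check_azure_env() {
  # Azure is not in use: nothing to check.
  [ -n "${AZURE_OPENAI_API_KEY:-}" ] || return 0

  missing=""
  [ -n "${AZURE_OPENAI_BASE_URL:-}" ] || missing="$missing AZURE_OPENAI_BASE_URL"
  [ -n "${AZURE_OPENAI_MODEL:-}" ] || missing="$missing AZURE_OPENAI_MODEL"

  if [ -n "$missing" ]; then
    echo "missing Azure settings:$missing" >&2
    return 1
  fi
  return 0
}

# Example values (placeholders, not real endpoints):
#   export AZURE_OPENAI_BASE_URL="https://<your-resource>/openai/v1"
#   export AZURE_OPENAI_MODEL="<your-deployment>"
```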

The claude, codex, and gemini runtimes each spawn their CLI tool as a subprocess. You must have the corresponding CLI installed and authenticated:

# Claude Code CLI
npm i -g @anthropic-ai/claude-code
# Codex CLI
npm i -g @openai/codex
# Gemini CLI
npm i -g @google/gemini-cli
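
To see which of these CLIs are actually on your PATH, something like the following may help. It assumes the installed binaries are named claude, codex, and gemini; the function name `installed_runtimes` is hypothetical, and the check says nothing about whether a CLI is authenticated or how --runtime auto chooses between them.

```shell
# Print the runtime names whose CLI binary is found on PATH.
# Assumes the binaries are named claude, codex, and gemini.
installed_runtimes() {
  for cli in claude codex gemini; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "$cli"
    fi
  done
}

# Example use:
#   installed_runtimes
```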

Then use them:

npx pwnkit-cli scan --target https://api.example.com/chat --runtime claude
npx pwnkit-cli review ./my-repo --runtime codex --depth deep

The --mode flag controls what kind of target is being scanned.

| Mode | Description |
| --- | --- |
| deep | Full autonomous pentest. Runs the research + verify agents with the full 40-turn budget. Default when the target is an https:// URL. |
| probe | Lightweight surface scan: recon and fingerprinting without deep exploitation. |
| web | Shell-first autonomous pentesting for web applications. The agent uses bash (curl, python3, bash) as its primary tool to probe for CORS, headers, exposed files, SSRF, XSS, SQLi, SSTI, and more. |
| mcp | Scan MCP (Model Context Protocol) servers for tool poisoning and schema abuse. Default when the target starts with mcp://. |

# LLM API scan (default)
npx pwnkit-cli scan --target https://api.example.com/chat
# Web app scan
npx pwnkit-cli scan --target https://example.com --mode web

The --depth flag controls how thorough the scan is.

| Depth | Test Cases | Typical Time | Best For |
| --- | --- | --- | --- |
| quick | ~15 | ~1 min | CI pipelines, smoke tests |
| default | ~50 | ~3 min | Day-to-day scanning |
| deep | ~150 | ~10 min | Pre-launch audits, thorough review |

npx pwnkit-cli scan --target https://api.example.com/chat --depth quick
npx pwnkit-cli audit express --depth deep
npx pwnkit-cli review ./my-repo --depth deep --runtime claude

pwnkit supports multiple output formats:

| Format | Description |
| --- | --- |
| terminal | Human-readable terminal summary with share URL |
| html | Rich browser report saved to a temporary file |
| pdf | Printable report saved to a temporary file |
| json | Machine-readable JSON output for pipelines |
| sarif | SARIF format for the GitHub Security tab |
| markdown | Human-readable Markdown report |

In CI (GitHub Action), set format: sarif to populate the Security tab:

- uses: peaktwilight/pwnkit@main
  with:
    mode: review
    path: .
    format: sarif

For PR workflows, review only changed files against a base branch:

npx pwnkit-cli review ./my-repo --diff-base origin/main --changed-only

This is particularly useful in CI to avoid scanning the entire codebase on every PR.

Use --verbose to see the animated attack replay and detailed agent reasoning:

npx pwnkit-cli scan --target https://api.example.com/chat --verbose

pwnkit ships a set of agent-improvement features behind environment-variable flags so you can A/B test them and opt in/out per run. Every flag is read at process start; set <FLAG>=0 or <FLAG>=false to disable, anything else to enable.

| Flag | Default | What it enables |
| --- | --- | --- |
| PWNKIT_FEATURE_EARLY_STOP | on | Early-stop at 50% budget if no findings, then retry with a different strategy. |
| PWNKIT_FEATURE_LOOP_DETECTION | on | Detects A-A-A and A-B-A-B action loops, injects a warning to break the cycle. |
| PWNKIT_FEATURE_CONTEXT_COMPACTION | on | Compresses middle-of-conversation messages when the context exceeds 30k tokens. |
| PWNKIT_FEATURE_SCRIPT_TEMPLATES | on | Adds exploit-script templates (blind SQLi, SSTI, auth chain) to the shell prompt. |
| PWNKIT_FEATURE_DYNAMIC_PLAYBOOKS | off | Injects technology-specific vulnerability playbooks after the recon phase. |
| PWNKIT_FEATURE_EXTERNAL_MEMORY | off | Agent writes plan/creds to disk, re-injected at reflection checkpoints. |
| PWNKIT_FEATURE_PROGRESS_HANDOFF | off | Injects prior-attempt findings when retrying, so retries don’t restart from zero. |
| PWNKIT_FEATURE_WEB_SEARCH | off | Lets the agent search the web for CVE details, vendor docs, and technique references. |
| PWNKIT_FEATURE_DOCKER_EXECUTOR | off | Runs every bash command inside a Kali Linux container with the full pentesting toolchain. |
| PWNKIT_FEATURE_CLOUD_SINK | on | Allows opt-in streaming of findings/final reports to a remote scan sink when the cloud env vars are set. |
| PWNKIT_FEATURE_PTY_SESSION | off | Interactive PTY sessions for exploits requiring interactivity (reverse shells, DB clients, SSH). |
| PWNKIT_FEATURE_EGATS | off | Evidence-Gated Attack Tree Search: beam search over a hypothesis tree. Also toggled by --egats. |
| PWNKIT_FEATURE_CONSENSUS_VERIFY | off | Self-consistency voting: runs the verify pipeline N times and takes the majority vote. |
| PWNKIT_FEATURE_DEBATE | off | Adversarial debate: prosecutor vs. defender agents argue each finding, a skeptical judge decides. |
| PWNKIT_FEATURE_MULTIMODAL | off | Cross-validates findings against foxguard (Rust pattern scanner). |
| PWNKIT_FEATURE_REACHABILITY_GATE | off | Suppresses findings whose sink is not reachable from an application entry point. |
| PWNKIT_FEATURE_POV_GATE | off | Requires a working executable PoC per finding, otherwise downgrades to info. |
| PWNKIT_FEATURE_TRIAGE_MEMORIES | off | Injects Semgrep-style per-target persistent FP memories into the verify pipeline. Pairs with pwnkit-cli triage. |
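
The enable/disable rule for these flags (0 or false disables, anything else enables, unset falls back to the default in the table) can be sketched as a shell function for use in wrapper scripts. This is only an illustration, not pwnkit's own parser: the name `feature_enabled` is hypothetical, and treating an empty value as enabled and matching false case-sensitively are both assumptions.

```shell
# Usage: feature_enabled FLAG_NAME DEFAULT   (DEFAULT is "on" or "off")
# Returns 0 if the flag would be enabled, 1 otherwise.
# Sketch only: empty-string and case handling are assumptions.
feature_enabled() {
  # Indirect lookup; yields the sentinel __unset__ when the variable is unset.
  eval "value=\${$1-__unset__}"
  case "$value" in
    __unset__) [ "$2" = "on" ] ;;  # unset: fall back to the documented default
    0|false)   return 1 ;;         # explicit disable
    *)         return 0 ;;         # anything else enables
  esac
}

# Example use:
#   if feature_enabled PWNKIT_FEATURE_WEB_SEARCH off; then echo "web search on"; fi
```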

When PWNKIT_FEATURE_DOCKER_EXECUTOR=1 is enabled, these extra env vars control the container image and bootstrap behavior:

| Variable | Default | Purpose |
| --- | --- | --- |
| PWNKIT_DOCKER_IMAGE | ghcr.io/peaktwilight/pwnkit:latest | Override the executor image |
| PWNKIT_DOCKER_BOOTSTRAP_TOOLS | auto | Force or disable apt-based tool bootstrap inside the container |

Bootstrap rules:

  • default GHCR image -> no bootstrap, use the pre-baked toolchain
  • kalilinux/kali-rolling -> bootstrap tools on first start
  • PWNKIT_DOCKER_BOOTSTRAP_TOOLS=1 -> always bootstrap
  • PWNKIT_DOCKER_BOOTSTRAP_TOOLS=0 -> never bootstrap
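
The four bootstrap rules above can be read as a small decision function. The sketch below is one interpretation, with a hypothetical function name; the rules do not say what happens for other images in auto mode, so that case is reported as unknown instead of guessed.

```shell
# Usage: should_bootstrap IMAGE [SETTING]
# SETTING mirrors PWNKIT_DOCKER_BOOTSTRAP_TOOLS and defaults to "auto".
# Prints yes, no, or unknown. Sketch of the rules above, not pwnkit's code.
should_bootstrap() {
  image=$1
  setting=${2:-auto}

  # An explicit 1 or 0 overrides any image-based rule.
  case "$setting" in
    1) echo yes; return ;;
    0) echo no; return ;;
  esac

  case "$image" in
    ghcr.io/peaktwilight/pwnkit:latest) echo no ;;       # pre-baked toolchain
    kalilinux/kali-rolling)             echo yes ;;      # bootstrap on first start
    *)                                  echo unknown ;;  # not covered by the rules
  esac
}

# Example use:
#   should_bootstrap "$PWNKIT_DOCKER_IMAGE" "$PWNKIT_DOCKER_BOOTSTRAP_TOOLS"
```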

You can bound API spend per scan, audit, or review:

export PWNKIT_COST_CEILING_USD=5
npx pwnkit-cli scan --target https://example.com --mode web

Or override it per command:

npx pwnkit-cli audit lodash --cost-ceiling 2
npx pwnkit-cli review ./my-repo --cost-ceiling 10

If the ceiling is exceeded, pwnkit preserves partial findings and exits with code 4.
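
In a wrapper or CI step, the exit code can be used to tell a budget stop apart from other outcomes. The helper below is a hypothetical sketch: only code 4 comes from the text above, treating 0 as success is the usual convention and an assumption here, and every other code is reported without interpretation.

```shell
# Usage: run a pwnkit command, then call: describe_pwnkit_exit "$?"
# Only exit code 4 is documented above; 0 = success is an assumption.
describe_pwnkit_exit() {
  case "$1" in
    0) echo "completed" ;;
    4) echo "cost ceiling exceeded; partial findings kept" ;;
    *) echo "failed with exit code $1" ;;
  esac
}

# Example use:
#   npx pwnkit-cli audit lodash --cost-ceiling 2
#   describe_pwnkit_exit "$?"
```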

To stream findings and the final report to an orchestration layer, set:

export PWNKIT_CLOUD_SINK=https://api.example.com
export PWNKIT_CLOUD_SCAN_ID=scan_123
export PWNKIT_CLOUD_TOKEN=secret-token

When set, pwnkit posts:

  • each finding as { "finding": ... }
  • the final report as { "report": ..., "final": true }

to:

${PWNKIT_CLOUD_SINK}/scans/${PWNKIT_CLOUD_SCAN_ID}/findings

Set PWNKIT_FEATURE_CLOUD_SINK=0 to disable this behavior even when the env vars are present.
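
When building or testing a receiver for the sink, it can help to compute the same endpoint the scan will post to. The helper name `cloud_sink_url` is hypothetical, and the commented curl call only imitates the payload shape described above; how pwnkit transmits PWNKIT_CLOUD_TOKEN is not stated here, so no auth header is shown.

```shell
# Build the findings endpoint from the two env vars described above.
# Plain concatenation, matching the URL template; no normalization is applied.
cloud_sink_url() {
  echo "${PWNKIT_CLOUD_SINK}/scans/${PWNKIT_CLOUD_SCAN_ID}/findings"
}

# Example: send a dummy finding to your own receiver to test it.
# (Auth is omitted because the token transport is not documented here.)
#   curl -X POST -H 'Content-Type: application/json' \
#        -d '{"finding": {}}' "$(cloud_sink_url)"
```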

Set:

export PWNKIT_EMIT_RESULT_LINE=1

to make the CLI print one final PWNKIT_RESULT=... JSON line summarizing:

  • success/failure
  • exit code and exit reason
  • target type
  • finding counts
  • estimated cost and token usage when available

This is useful for wrappers, CI parsers, and the cloud orchestration path.
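
A wrapper might pull the JSON payload out of the captured output like this. The function name `extract_pwnkit_result` is hypothetical, and the sketch assumes the result line starts with PWNKIT_RESULT= at the beginning of a line and reaches the stream you captured; the JSON field names are not listed here, so the payload is passed through unparsed.

```shell
# Read pwnkit output on stdin and print the JSON payload of the
# last line that starts with PWNKIT_RESULT=. Prints nothing if absent.
extract_pwnkit_result() {
  sed -n 's/^PWNKIT_RESULT=//p' | tail -n 1
}

# Example use (note that a pipeline reports the last command's status,
# so capture pwnkit's own exit code separately if you need it):
#   npx pwnkit-cli scan --target https://example.com --mode web > scan.log
#   extract_pwnkit_result < scan.log
```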

Turn on every false-positive reduction feature for a client-ready scan:

export PWNKIT_FEATURE_CONSENSUS_VERIFY=1
export PWNKIT_FEATURE_REACHABILITY_GATE=1
export PWNKIT_FEATURE_POV_GATE=1
export PWNKIT_FEATURE_TRIAGE_MEMORIES=1
export PWNKIT_FEATURE_MULTIMODAL=1
npx pwnkit-cli scan --target https://example.com --mode web --depth deep

Run every bash command inside the Docker executor and let the agent search the web:

export PWNKIT_FEATURE_DOCKER_EXECUTOR=1
export PWNKIT_FEATURE_WEB_SEARCH=1
npx pwnkit-cli scan --target https://example.com --mode web

Use the kalilinux/kali-rolling image for the executor and force the tool bootstrap:

export PWNKIT_FEATURE_DOCKER_EXECUTOR=1
export PWNKIT_DOCKER_IMAGE=kalilinux/kali-rolling
export PWNKIT_DOCKER_BOOTSTRAP_TOOLS=1
npx pwnkit-cli scan --target https://example.com --mode web