Configuration

pwnkit is designed for zero-config usage, but every default can be overridden via CLI flags or environment variables.

Runtime modes

pwnkit is an agentic harness — bring your own AI. The --runtime flag controls which LLM backend powers the agents.

Runtime	Flag	Description
`api`	`--runtime api`	Uses your API key (OpenRouter, Anthropic, Azure OpenAI, or OpenAI). Best for CI and quick scans. Default.
`claude`	`--runtime claude`	Spawns the Claude Code CLI with your existing subscription. Best for deep analysis.
`codex`	`--runtime codex`	Spawns the Codex CLI. Best for source-level analysis.
`gemini`	`--runtime gemini`	Spawns the Gemini CLI. Best for large-context source analysis.
`auto`	`--runtime auto`	Auto-detects installed CLIs and picks the best one per pipeline stage.

API runtime

The default api runtime makes direct HTTP calls to an LLM provider. It requires one of these environment variables:

export OPENROUTER_API_KEY="sk-or-..."   # Recommended
export ANTHROPIC_API_KEY="sk-ant-..."
export AZURE_OPENAI_API_KEY="..."
export OPENAI_API_KEY="sk-..."

See API Keys for the full priority order and provider details.

If you use Azure, also set AZURE_OPENAI_BASE_URL and AZURE_OPENAI_MODEL unless pwnkit can read them from a valid Azure-backed ~/.codex/config.toml. For the Responses API, the base URL should include /openai/v1. pwnkit fails fast on incomplete Azure config instead of attempting a scan with guessed defaults.

CLI runtimes (claude, codex, gemini)

These runtimes spawn the respective CLI tool as a subprocess. You must have the CLI installed and authenticated:

# Claude Code CLI
npm i -g @anthropic-ai/claude-code

# Codex CLI
npm i -g @openai/codex

# Gemini CLI
npm i -g @google/gemini-cli

Then use them:

npx pwnkit-cli scan --target https://api.example.com/chat --runtime claude
npx pwnkit-cli review ./my-repo --runtime codex --depth deep

Scan modes

The --mode flag controls what kind of target is being scanned.

Mode	Description
`deep`	Full autonomous pentest. Runs the research + verify agents with the full 40-turn budget. Default when the target is an `https://` URL.
`probe`	Lightweight surface scan — recon and fingerprinting without deep exploitation.
`web`	Shell-first autonomous pentesting for web applications. The agent uses `bash` (curl, python3, bash) as its primary tool to probe for CORS, headers, exposed files, SSRF, XSS, SQLi, SSTI, and more.
`mcp`	Scan MCP (Model Context Protocol) servers for tool poisoning and schema abuse. Default when the target starts with `mcp://`.

# LLM API scan (default)
npx pwnkit-cli scan --target https://api.example.com/chat

# Web app scan
npx pwnkit-cli scan --target https://example.com --mode web

Depth settings

The --depth flag controls how thorough the scan is.

Depth	Test Cases	Typical Time	Best For
`quick`	~15	~1 min	CI pipelines, smoke tests
`default`	~50	~3 min	Day-to-day scanning
`deep`	~150	~10 min	Pre-launch audits, thorough review

npx pwnkit-cli scan --target https://api.example.com/chat --depth quick
npx pwnkit-cli audit express --depth deep
npx pwnkit-cli review ./my-repo --depth deep --runtime claude

Output formats

pwnkit supports multiple output formats:

Format	Description
`terminal`	Human-readable terminal summary with share URL
`html`	Rich browser report saved to a temporary file
`pdf`	Printable report saved to a temporary file
`json`	Machine-readable JSON output for pipelines
`sarif`	SARIF format for the GitHub Security tab
`markdown`	Human-readable Markdown report

In CI (GitHub Action), set format: sarif to populate the Security tab:

- uses: peaktwilight/pwnkit@main
  with:
    mode: review
    path: .
    format: sarif

Diff-aware review

For PR workflows, review only changed files against a base branch:

npx pwnkit-cli review ./my-repo --diff-base origin/main --changed-only

This is particularly useful in CI to avoid scanning the entire codebase on every PR.

Verbose output

Use --verbose to see the animated attack replay and detailed agent reasoning:

npx pwnkit-cli scan --target https://api.example.com/chat --verbose

Feature flags

pwnkit ships a set of agent-improvement features behind environment-variable flags so you can A/B test them and opt in/out per run. Every flag is read at process start; set <FLAG>=0 or <FLAG>=false to disable, anything else to enable.

Flag	Default	What it enables
`PWNKIT_FEATURE_EARLY_STOP`	on	Early-stop at 50% budget if no findings, then retry with a different strategy.
`PWNKIT_FEATURE_LOOP_DETECTION`	on	Detects A-A-A and A-B-A-B action loops, injects a warning to break the cycle.
`PWNKIT_FEATURE_CONTEXT_COMPACTION`	on	Compresses middle-of-conversation messages when the context exceeds 30k tokens.
`PWNKIT_FEATURE_SCRIPT_TEMPLATES`	on	Adds exploit-script templates (blind SQLi, SSTI, auth chain) to the shell prompt.
`PWNKIT_FEATURE_DYNAMIC_PLAYBOOKS`	off	Injects technology-specific vulnerability playbooks after the recon phase.
`PWNKIT_FEATURE_EXTERNAL_MEMORY`	off	Agent writes plan/creds to disk, re-injected at reflection checkpoints.
`PWNKIT_FEATURE_PROGRESS_HANDOFF`	off	Injects prior-attempt findings when retrying, so retries don’t restart from zero.
`PWNKIT_FEATURE_WEB_SEARCH`	off	Lets the agent search the web for CVE details, vendor docs, and technique references.
`PWNKIT_FEATURE_DOCKER_EXECUTOR`	off	Runs every bash command inside a Kali Linux container with the full pentesting toolchain.
`PWNKIT_FEATURE_CLOUD_SINK`	on	Allows opt-in streaming of findings/final reports to a remote scan sink when the cloud env vars are set.
`PWNKIT_FEATURE_PTY_SESSION`	off	Interactive PTY sessions for exploits requiring interactivity (reverse shells, DB clients, SSH).
`PWNKIT_FEATURE_EGATS`	off	Evidence-Gated Attack Tree Search — beam search over a hypothesis tree. Also toggled by `--egats`.
`PWNKIT_FEATURE_CONSENSUS_VERIFY`	off	Self-consistency voting: runs the verify pipeline N times and takes the majority vote.
`PWNKIT_FEATURE_DEBATE`	off	Adversarial debate: prosecutor vs. defender agents argue each finding, a skeptical judge decides.
`PWNKIT_FEATURE_MULTIMODAL`	off	Cross-validates findings against foxguard (Rust pattern scanner).
`PWNKIT_FEATURE_REACHABILITY_GATE`	off	Suppresses findings whose sink is not reachable from an application entry point.
`PWNKIT_FEATURE_POV_GATE`	off	Requires a working executable PoC per finding, otherwise downgrades to `info`.
`PWNKIT_FEATURE_TRIAGE_MEMORIES`	off	Injects Semgrep-style per-target persistent FP memories into the verify pipeline. Pairs with `pwnkit-cli triage`.

Docker executor overrides

When PWNKIT_FEATURE_DOCKER_EXECUTOR=1 is enabled, these extra env vars control the container image and bootstrap behavior:

Variable	Default	Purpose
`PWNKIT_DOCKER_IMAGE`	`ghcr.io/peaktwilight/pwnkit:latest`	Override the executor image
`PWNKIT_DOCKER_BOOTSTRAP_TOOLS`	auto	Force or disable apt-based tool bootstrap inside the container

Bootstrap rules:

default GHCR image -> no bootstrap, use the pre-baked toolchain
kalilinux/kali-rolling -> bootstrap tools on first start
PWNKIT_DOCKER_BOOTSTRAP_TOOLS=1 -> always bootstrap
PWNKIT_DOCKER_BOOTSTRAP_TOOLS=0 -> never bootstrap

Cost ceiling

You can bound API spend per scan, audit, or review:

export PWNKIT_COST_CEILING_USD=5
npx pwnkit-cli scan --target https://example.com --mode web

Or override it per command:

npx pwnkit-cli audit lodash --cost-ceiling 2
npx pwnkit-cli review ./my-repo --cost-ceiling 10

If the ceiling is exceeded, pwnkit preserves partial findings and exits with code 4.

Cloud sink

If you want to stream findings and the final report to an orchestration layer:

export PWNKIT_CLOUD_SINK=https://api.example.com
export PWNKIT_CLOUD_SCAN_ID=scan_123
export PWNKIT_CLOUD_TOKEN=secret-token

When set, pwnkit posts:

each finding as { "finding": ... }
the final report as { "report": ..., "final": true }

to:

${PWNKIT_CLOUD_SINK}/scans/${PWNKIT_CLOUD_SCAN_ID}/findings

Set PWNKIT_FEATURE_CLOUD_SINK=0 to disable this behavior even when the env vars are present.

Machine-readable result line

Set:

export PWNKIT_EMIT_RESULT_LINE=1

to make the CLI print one final PWNKIT_RESULT=... JSON line summarizing:

success/failure
exit code and exit reason
target type
finding counts
estimated cost and token usage when available

This is useful for wrappers, CI parsers, and the cloud orchestration path.

Example: maximum-accuracy pentest

Turn on every false-positive reduction feature for a client-ready scan:

export PWNKIT_FEATURE_CONSENSUS_VERIFY=1
export PWNKIT_FEATURE_REACHABILITY_GATE=1
export PWNKIT_FEATURE_POV_GATE=1
export PWNKIT_FEATURE_TRIAGE_MEMORIES=1
export PWNKIT_FEATURE_MULTIMODAL=1

npx pwnkit-cli scan --target https://example.com --mode web --depth deep

Example: Kali toolchain + web search

export PWNKIT_FEATURE_DOCKER_EXECUTOR=1
export PWNKIT_FEATURE_WEB_SEARCH=1

npx pwnkit-cli scan --target https://example.com --mode web

Example: raw Kali fallback

export PWNKIT_FEATURE_DOCKER_EXECUTOR=1
export PWNKIT_DOCKER_IMAGE=kalilinux/kali-rolling
export PWNKIT_DOCKER_BOOTSTRAP_TOOLS=1

npx pwnkit-cli scan --target https://example.com --mode web