Commands
All commands are available via npx pwnkit-cli <command>. You can also omit the subcommand and let auto-detection pick one for you (see Getting Started).
scan
Probe AI/LLM apps, web apps, APIs, or MCP servers for vulnerabilities.
# Scan an LLM API
npx pwnkit-cli scan --target https://api.example.com/chat
# Scan a traditional web app
npx pwnkit-cli scan --target https://example.com --mode web
# Deep scan with Claude Code CLI
npx pwnkit-cli scan --target https://api.example.com/chat --depth deep --runtime claude
# Authenticated scan using a bearer token
npx pwnkit-cli scan --target https://api.example.com \
  --auth '{"type":"bearer","token":"eyJhbGciOi..."}'
# Scan an API with an OpenAPI spec pre-loaded
npx pwnkit-cli scan --target https://api.example.com --api-spec ./openapi.yaml
# Run 5 attack strategies in parallel; the first to succeed wins
npx pwnkit-cli scan --target https://example.com --mode web --race
# Evidence-Gated Attack Tree Search (EGATS)
npx pwnkit-cli scan --target https://example.com --mode web --egats
# Abort cleanly if the scan exceeds a USD ceiling
npx pwnkit-cli scan --target https://example.com --mode web --cost-ceiling 5
# Export findings to GitHub Issues
npx pwnkit-cli scan --target https://example.com --mode web \
  --export github:myorg/myrepo
# Generate an HTML report (auto-opens in browser)
npx pwnkit-cli scan --target https://example.com --mode web \
  --format html
Key flags:
| Flag | Description | Default |
|---|---|---|
| --target <url> | The URL or mcp:// endpoint to scan | (required) |
| --mode <mode> | Scan mode: probe, deep, mcp, web | auto |
| --depth <depth> | Scan depth: quick, default, deep | default |
| --runtime <rt> | Runtime: auto, api, claude, codex, gemini | auto |
| --format <fmt> | Output format: terminal, json, md, html, sarif, pdf | terminal |
| --timeout <ms> | Request timeout in milliseconds | 30000 |
| --api-key <key> | API key for the LLM provider | (from env) |
| --model <model> | Specific LLM model to use | provider default |
| --repo <path> | Local source code path for white-box scanning | (none) |
| --auth <json> | Authenticated scanning credentials (see below) | (none) |
| --api-spec <path> | Path to an OpenAPI 3.x / Swagger 2.0 spec (JSON or YAML) | (none) |
| --export <target> | Export findings to an issue tracker, e.g. github:owner/repo | (none) |
| --race | Best-of-N: run 5 attack strategies in parallel, first-to-succeed wins | false |
| --egats | Evidence-Gated Attack Tree Search (beam search over hypothesis tree) | false |
| --cost-ceiling <usd> | Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded | (none) |
| --db-path <path> | Path to SQLite database | ~/.pwnkit/pwnkit.db |
| --verbose | Show animated attack replay and detailed agent reasoning | false |
| --replay | Replay the last scan’s results without re-running | false |
--auth credential formats
The --auth flag accepts either an inline JSON string or a path to a JSON file. Four credential types are supported:
# Bearer token
--auth '{"type":"bearer","token":"eyJhbGciOi..."}'
# Session cookie
--auth '{"type":"cookie","value":"session=abc123; csrf=def456"}'
# HTTP Basic auth
--auth '{"type":"basic","username":"admin","password":"hunter2"}'
# Custom header (e.g. API key)
--auth '{"type":"header","name":"X-API-Key","value":"sk_live_..."}'
# Or load from a file
--auth ./auth.json
--api-spec: OpenAPI / Swagger import
Point --api-spec at an OpenAPI 3.x or Swagger 2.0 document (JSON or YAML). pwnkit parses the spec, extracts every endpoint with its parameter schemas and auth requirements, and seeds the recon phase with that knowledge, so the agent starts pentesting with full endpoint awareness instead of having to crawl.
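For readers who do not have a spec at hand, the following sketch writes a minimal OpenAPI 3.0 document to ./openapi.yaml. It only illustrates the kind of information described above (one endpoint, its request schema, and an auth requirement). The /chat path, the message field, and the bearerAuth scheme name are hypothetical placeholders, not part of pwnkit or of any real API.

```shell
# Write a minimal OpenAPI 3.0 document to ./openapi.yaml.
# The /chat path, the "message" field, and the bearerAuth scheme name are
# hypothetical placeholders chosen for illustration only.
cat > openapi.yaml <<'EOF'
openapi: 3.0.3
info:
  title: Example Chat API
  version: "1.0.0"
servers:
  - url: https://api.example.com
paths:
  /chat:
    post:
      security:
        - bearerAuth: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [message]
              properties:
                message:
                  type: string
      responses:
        "200":
          description: Model reply
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
EOF
```

A file like this can then be passed to the scan with --api-spec ./openapi.yaml.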
npx pwnkit-cli scan --target https://api.example.com --api-spec ./openapi.yaml
--race: best-of-N strategy racing
With --race, pwnkit spawns 5 attack strategies in parallel against the same target. The first agent to confirm a finding wins; the others are terminated. This mode suits hard targets where a single linear attack plan gets stuck.
--egats: Evidence-Gated Attack Tree Search
EGATS performs a beam search over a tree of attack hypotheses, pruning branches that fail evidence checks. It is slower than --race but much more thorough.
--cost-ceiling: hard spend guardrail
Set a hard per-scan USD ceiling:
npx pwnkit-cli scan --target https://example.com --mode web --cost-ceiling 5
If cumulative estimated spend exceeds the ceiling, pwnkit:
- preserves findings collected so far
- exits with status code 4
- emits exit_reason: "cost_ceiling_exceeded" in the machine-readable result line
The CLI flag overrides PWNKIT_COST_CEILING_USD.
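In a CI job, the dedicated exit status makes it possible to tell a budget abort apart from other failures. The sketch below is one way to do that. Only status 4 is documented here; treating 0 as a normal completion and every other status as a generic failure is an assumption, and classify_scan_exit is a hypothetical helper name.

```shell
# Map a pwnkit exit status to a short label for CI logs.
# Documented: 4 means the cost ceiling was exceeded (partial findings kept).
# Assumed: 0 means normal completion; anything else is a generic failure.
classify_scan_exit() {
  case "$1" in
    0) echo "completed" ;;
    4) echo "cost_ceiling_exceeded" ;;
    *) echo "failed" ;;
  esac
}

# Usage (not executed here):
#   npx pwnkit-cli scan --target https://example.com --mode web --cost-ceiling 5
#   classify_scan_exit "$?"
```

A pipeline could, for instance, keep and upload the partial findings when the label is cost_ceiling_exceeded instead of failing the build outright.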
--export github:owner/repo
Pushes every confirmed finding to a GitHub repo as an issue, with severity labels, evidence blocks, and reproduction steps. Requires a GITHUB_TOKEN with repo scope in the environment.
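Because the export depends on GITHUB_TOKEN, a script may want to fail fast before starting a long scan. A minimal preflight sketch follows; require_github_token is a hypothetical helper name, and the check only confirms that the variable is non-empty, not that the token actually carries the repo scope.

```shell
# Fail early if GITHUB_TOKEN is missing or empty.
# Note: this cannot verify the token's scopes, only its presence.
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; --export github:owner/repo needs it" >&2
    return 1
  fi
}

# Usage (not executed here):
#   require_github_token && \
#     npx pwnkit-cli scan --target https://example.com --mode web \
#       --export github:myorg/myrepo
```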
audit
Install and security-audit any npm package with static analysis and AI review.
npx pwnkit-cli audit express --version 4.18.2
npx pwnkit-cli audit react --depth deep --runtime claude
npx pwnkit-cli audit left-pad --format html
The package is installed in a sandbox, scanned with semgrep, and then reviewed by an AI agent that traces data flow and looks for supply-chain vulnerabilities.
Key flags:
| Flag | Description | Default |
|---|---|---|
| <package> | npm package name | (required) |
| --version <v> | Specific version to audit | latest |
| --depth <d> | Audit depth: quick, default, deep | default |
| --runtime <rt> | Runtime: auto, api, claude, codex, gemini | auto |
| --format <fmt> | Output format: terminal, json, md, html, sarif, pdf | terminal |
| --timeout <ms> | AI agent timeout in milliseconds | 600000 |
| --api-key <key> | API key for the LLM provider | (from env) |
| --model <model> | Specific LLM model to use | provider default |
| --cost-ceiling <usd> | Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded | (none) |
| --db-path <path> | Path to SQLite database | ~/.pwnkit/pwnkit.db |
| --verbose | Detailed agent output | false |
review
Deep source code security review of a local repo or GitHub URL.
# Review a local directory
npx pwnkit-cli review ./my-ai-app
# Review a GitHub repo (cloned automatically)
npx pwnkit-cli review https://github.com/user/repo
# Diff-aware review against a base branch
npx pwnkit-cli review ./my-repo --diff-base origin/main --changed-only
Key flags:
| Flag | Description | Default |
|---|---|---|
| <repo> | Local path or git URL | (required) |
| --depth <d> | Review depth: quick, default, deep | default |
| --format <fmt> | Output format: terminal, json, md, html, sarif, pdf | terminal |
| --runtime <rt> | Runtime: auto, api, claude, codex, gemini | auto |
| --diff-base <ref> | Git base ref for diff-aware review | (none) |
| --changed-only | Restrict semgrep + prioritization to changed files | false |
| --timeout <ms> | AI agent timeout in milliseconds | 600000 |
| --api-key <key> | API key for the LLM provider | (from env) |
| --model <model> | Specific LLM model to use | provider default |
| --cost-ceiling <usd> | Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded | (none) |
| --db-path <path> | Path to SQLite database | ~/.pwnkit/pwnkit.db |
| --verbose | Detailed agent output | false |
triage
Triage findings and manage learned false-positive memories. Every time you mark a finding as a false positive, pwnkit stores a pattern that future verify passes will consult. Think of it as Semgrep’s nosemgrep, but learned automatically.
# Create a memory from an existing finding
pwnkit-cli triage memory add --finding NF-001 --reason "test fixture, not reachable in prod"
# List all memories
pwnkit-cli triage memory list
pwnkit-cli triage memory list --scope target --category xss
# Delete a memory
pwnkit-cli triage memory remove <memory-id>
# Mark a finding as FP and auto-create a memory
pwnkit-cli triage mark-fp NF-042 --reason "known sandbox echo endpoint"
triage memory add
| Flag | Description | Default |
|---|---|---|
| --finding <id> | Finding ID (full or prefix) to derive the memory from | (required) |
| --reason <text> | Why this finding is a false positive | (required) |
| --scope <scope> | Memory scope: global, target, package | target |
| --scope-value <v> | Scope identifier (target URL or package name) | (inferred) |
| --db-path <path> | Path to SQLite database | default |
triage memory list
| Flag | Description |
|---|---|
| --scope <scope> | Filter by scope: global, target, package |
| --category <cat> | Filter by vulnerability category |
| --db-path <path> | Path to SQLite database |
triage memory remove <id>: deletes a memory by its ID.
triage mark-fp <finding-id>: flips a finding’s triage status to suppressed and auto-creates a memory.
| Flag | Description | Default |
|---|---|---|
| --reason <text> | Why this finding is a false positive | (required) |
| --scope <scope> | Memory scope | target |
| --scope-value <v> | Scope identifier | (inferred) |
Enable memory injection into the verify pipeline with PWNKIT_FEATURE_TRIAGE_MEMORIES=1 (see Configuration).
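As a quick illustration, the flag could be enabled for the current shell session as follows. This assumes PWNKIT_FEATURE_TRIAGE_MEMORIES is read from the process environment; the Configuration page is the authoritative reference for how it is actually loaded.

```shell
# Assumption: the feature flag is picked up from the environment.
export PWNKIT_FEATURE_TRIAGE_MEMORIES=1

# Later scans started from this shell would then consult stored memories
# during verification, e.g. (not executed here):
#   npx pwnkit-cli scan --target https://example.com --mode web
```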
resume
Resume a persisted review or audit scan by its scan ID.
npx pwnkit-cli resume <scan-id>
Useful when a long-running deep scan was interrupted or when you want to continue where a previous run left off.
dashboard
Open the local verification workbench for board-based triage, evidence review, and scan provenance.
npx pwnkit-cli dashboard
npx pwnkit-cli dashboard --port 48123
The dashboard provides a Kanban-style board for triaging findings, reviewing evidence, and tracking active scans. It runs entirely locally.
Key flags:
| Flag | Description | Default |
|---|---|---|
| --port <port> | Port to bind | 48123 |
| --host <host> | Host to bind | 127.0.0.1 |
| --no-open | Do not auto-open a browser | (opens by default) |
| --db-path <path> | Path to SQLite database | ~/.pwnkit/pwnkit.db |
history
Browse past scans with status, depth, findings count, and duration.
npx pwnkit-cli history
npx pwnkit-cli history --limit 20
| Flag | Description | Default |
|---|---|---|
| --limit <n> | Number of scans to show | 10 |
findings
Query, filter, and inspect verified findings across all scans. Findings are persisted in a local SQLite database.
# List all findings
npx pwnkit-cli findings list
# Filter by severity
npx pwnkit-cli findings list --severity critical
# Filter by category and status
npx pwnkit-cli findings list --category prompt-injection --status confirmed
# Inspect a specific finding with full evidence
npx pwnkit-cli findings show NF-001
# Triage findings
npx pwnkit-cli findings accept <finding-id> --note "confirmed and tracked"
npx pwnkit-cli findings suppress <finding-id> --note "known test fixture"
npx pwnkit-cli findings reopen <finding-id>
Finding lifecycle: discovered -> verified -> confirmed -> scored -> reported (or false-positive if verification fails).
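The lifecycle can be read as a small state machine. The sketch below encodes it as a transition function for illustration only; it is not pwnkit's implementation. It assumes that false-positive is reached from discovered when verification fails, and next_finding_state and its second argument are hypothetical names.

```shell
# Return the state that follows the given one.
# $1: current state; $2: verification outcome, "pass" (default) or "fail".
# Assumption: a failed verification moves discovered to false-positive.
next_finding_state() {
  state="$1"
  verification="${2:-pass}"
  case "$state" in
    discovered)
      if [ "$verification" = "fail" ]; then
        echo "false-positive"
      else
        echo "verified"
      fi
      ;;
    verified)  echo "confirmed" ;;
    confirmed) echo "scored" ;;
    scored)    echo "reported" ;;
    *)         return 1 ;;  # reported and false-positive are terminal here
  esac
}
```

For example, next_finding_state discovered fail prints false-positive, while calling it on reported returns a non-zero status because no further state exists in this sketch.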
Subcommands:
| Subcommand | Description |
|---|---|
| list | List findings with optional filters |
| show <id> | Show a finding with full evidence |
| accept <id> | Accept a finding as confirmed |
| suppress <id> | Suppress a finding (known FP or accepted risk) |
| reopen <id> | Reopen a previously suppressed finding |
XBOW benchmark runner
The XBOW benchmark runner lives in packages/benchmark and is invoked with pnpm --filter @pwnkit/benchmark xbow. It runs pwnkit against the 104 XBOW validation challenges and reports pass/fail with evidence.
# Run the whole benchmark
pnpm --filter @pwnkit/benchmark xbow
# Run a specific subset of challenges
pnpm --filter @pwnkit/benchmark xbow --only XBEN-010,XBEN-051,XBEN-066
# Skip the first 20 challenges (useful for resuming)
pnpm --filter @pwnkit/benchmark xbow --start 20
# Include full finding objects in results JSON (for offline analysis)
pnpm --filter @pwnkit/benchmark xbow --save-findings
| Flag | Description | Default |
|---|---|---|
| --only <ids> | Comma-separated challenge IDs to run | (all 104) |
| --start <n> | Skip the first n challenges | 0 |
| --save-findings | Include full finding objects in the results JSON | false |