Skip to content

Commands

All commands are available via npx pwnkit-cli <command>. You can also skip the subcommand and let auto-detect figure it out (see Getting Started).

Probe AI/LLM apps, web apps, APIs, or MCP servers for vulnerabilities.

Terminal window
# Scan an LLM API
npx pwnkit-cli scan --target https://api.example.com/chat
# Scan a traditional web app
npx pwnkit-cli scan --target https://example.com --mode web
# Deep scan with Claude Code CLI
npx pwnkit-cli scan --target https://api.example.com/chat --depth deep --runtime claude
# Authenticated scan using a bearer token
npx pwnkit-cli scan --target https://api.example.com \
--auth '{"type":"bearer","token":"eyJhbGciOi..."}'
# Scan an API with an OpenAPI spec pre-loaded
npx pwnkit-cli scan --target https://api.example.com --api-spec ./openapi.yaml
# Run 5 attack strategies in parallel — first to succeed wins
npx pwnkit-cli scan --target https://example.com --mode web --race
# Evidence-Gated Attack Tree Search (EGATS)
npx pwnkit-cli scan --target https://example.com --mode web --egats
# Abort cleanly if the scan exceeds a USD ceiling
npx pwnkit-cli scan --target https://example.com --mode web --cost-ceiling 5
# Export findings to GitHub Issues
npx pwnkit-cli scan --target https://example.com --mode web \
--export github:myorg/myrepo
# Generate an HTML report (auto-opens in browser)
npx pwnkit-cli scan --target https://example.com --mode web \
--format html

Key flags:

FlagDescriptionDefault
--target <url>The URL or mcp:// endpoint to scan(required)
--mode <mode>Scan mode: probe, deep, mcp, webauto
--depth <depth>Scan depth: quick, default, deepdefault
--runtime <rt>Runtime: auto, api, claude, codex, geminiauto
--format <fmt>Output format: terminal, json, md, html, sarif, pdfterminal
--timeout <ms>Request timeout in milliseconds30000
--api-key <key>API key for the LLM provider(from env)
--model <model>Specific LLM model to useprovider default
--repo <path>Local source code path for white-box scanning(none)
--auth <json>Authenticated scanning credentials (see below)(none)
--api-spec <path>Path to an OpenAPI 3.x / Swagger 2.0 spec (JSON or YAML)(none)
--export <target>Export findings to an issue tracker, e.g. github:owner/repo(none)
--raceBest-of-N: run 5 attack strategies in parallel, first-to-succeed winsfalse
--egatsEvidence-Gated Attack Tree Search (beam search over hypothesis tree)false
--cost-ceiling <usd>Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded(none)
--db-path <path>Path to SQLite database~/.pwnkit/pwnkit.db
--verboseShow animated attack replay and detailed agent reasoningfalse
--replayReplay the last scan’s results without re-runningfalse

The --auth flag accepts either an inline JSON string or a path to a JSON file. Four credential types are supported:

Terminal window
# Bearer token
--auth '{"type":"bearer","token":"eyJhbGciOi..."}'
# Session cookie
--auth '{"type":"cookie","value":"session=abc123; csrf=def456"}'
# HTTP Basic auth
--auth '{"type":"basic","username":"admin","password":"hunter2"}'
# Custom header (e.g. API key)
--auth '{"type":"header","name":"X-API-Key","value":"sk_live_..."}'
# Or load from a file
--auth ./auth.json

Point --api-spec at an OpenAPI 3.x or Swagger 2.0 document (JSON or YAML). pwnkit will parse the spec, extract all endpoints with their parameter schemas and auth requirements, and seed the recon phase with that knowledge so the agent starts pentesting with full endpoint awareness instead of having to crawl.

Terminal window
npx pwnkit-cli scan --target https://api.example.com --api-spec ./openapi.yaml

With --race, pwnkit spawns 5 attack strategies in parallel against the same target. The first agent to confirm a finding wins; the others are terminated. Ideal for hard targets where a single linear attack plan gets stuck.

Section titled “--egats — Evidence-Gated Attack Tree Search”

EGATS performs a beam search over a tree of attack hypotheses, pruning branches that fail evidence checks. Slower than --race but much more thorough.

Set a hard per-scan USD ceiling:

Terminal window
npx pwnkit-cli scan --target https://example.com --mode web --cost-ceiling 5

If cumulative estimated spend exceeds the ceiling, pwnkit:

  • preserves findings collected so far
  • exits with status code 4
  • emits exit_reason: "cost_ceiling_exceeded" in the machine-readable result line

The CLI flag overrides PWNKIT_COST_CEILING_USD.

Pushes every confirmed finding to a GitHub repo as an issue, with severity labels, evidence blocks, and reproduction steps. Requires GITHUB_TOKEN in the environment with repo scope.

Install and security-audit any npm package with static analysis and AI review.

Terminal window
npx pwnkit-cli audit express --version 4.18.2
npx pwnkit-cli audit react --depth deep --runtime claude
npx pwnkit-cli audit left-pad --format html

The package is installed in a sandbox, scanned with semgrep, and then reviewed by an AI agent that traces data flow and looks for supply-chain vulnerabilities.

Key flags:

FlagDescriptionDefault
<package>npm package name(required)
--version <v>Specific version to auditlatest
--depth <d>Audit depth: quick, default, deepdefault
--runtime <rt>Runtime: auto, api, claude, codex, geminiauto
--format <fmt>Output format: terminal, json, md, html, sarif, pdfterminal
--timeout <ms>AI agent timeout in milliseconds600000
--api-key <key>API key for the LLM provider(from env)
--model <model>Specific LLM model to useprovider default
--cost-ceiling <usd>Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded(none)
--db-path <path>Path to SQLite database~/.pwnkit/pwnkit.db
--verboseDetailed agent outputfalse

Deep source code security review of a local repo or GitHub URL.

Terminal window
# Review a local directory
npx pwnkit-cli review ./my-ai-app
# Review a GitHub repo (cloned automatically)
npx pwnkit-cli review https://github.com/user/repo
# Diff-aware review against a base branch
npx pwnkit-cli review ./my-repo --diff-base origin/main --changed-only

Key flags:

FlagDescriptionDefault
<repo>Local path or git URL(required)
--depth <d>Review depth: quick, default, deepdefault
--format <fmt>Output format: terminal, json, md, html, sarif, pdfterminal
--runtime <rt>Runtime: auto, api, claude, codex, geminiauto
--diff-base <ref>Git base ref for diff-aware review(none)
--changed-onlyRestrict semgrep + prioritization to changed filesfalse
--timeout <ms>AI agent timeout in milliseconds600000
--api-key <key>API key for LLM provider(from env)
--model <model>Specific LLM model to useprovider default
--cost-ceiling <usd>Hard USD ceiling; aborts cleanly with partial findings preserved if exceeded(none)
--db-path <path>Path to SQLite database~/.pwnkit/pwnkit.db
--verboseDetailed agent outputfalse

Triage findings and manage learned false-positive memories. Every time you mark a finding as a false positive, pwnkit stores a pattern that future verify passes will consult — think Semgrep’s nosemgrep but learned automatically.

Terminal window
# Create a memory from an existing finding
pwnkit-cli triage memory add --finding NF-001 --reason "test fixture, not reachable in prod"
# List all memories
pwnkit-cli triage memory list
pwnkit-cli triage memory list --scope target --category xss
# Delete a memory
pwnkit-cli triage memory remove <memory-id>
# Mark a finding as FP and auto-create a memory
pwnkit-cli triage mark-fp NF-042 --reason "known sandbox echo endpoint"

triage memory add

FlagDescriptionDefault
--finding <id>Finding ID (full or prefix) to derive the memory from(required)
--reason <text>Why this finding is a false positive(required)
--scope <scope>Memory scope: global, target, packagetarget
--scope-value <v>Scope identifier (target URL or package name)(inferred)
--db-path <path>Path to SQLite databasedefault

triage memory list

FlagDescription
--scope <scope>Filter by scope: global, target, package
--category <cat>Filter by vulnerability category
--db-path <path>Path to SQLite database

triage memory remove <id> — deletes a memory by its ID.

triage mark-fp <finding-id> — flips a finding’s triage status to suppressed and auto-creates a memory.

FlagDescriptionDefault
--reason <text>Why this finding is a false positive(required)
--scope <scope>Memory scopetarget
--scope-value <v>Scope identifier(inferred)

Enable memory injection into the verify pipeline with PWNKIT_FEATURE_TRIAGE_MEMORIES=1 (see Configuration).

Resume a persisted review or audit scan by its scan ID.

Terminal window
npx pwnkit-cli resume <scan-id>

Useful when a long-running deep scan was interrupted or when you want to continue where a previous run left off.

Open the local verification workbench for board-based triage, evidence review, and scan provenance.

Terminal window
npx pwnkit-cli dashboard
npx pwnkit-cli dashboard --port 48123

The dashboard provides a Kanban-style board for triaging findings, reviewing evidence, and tracking active scans. It runs entirely locally.

Key flags:

FlagDescriptionDefault
--port <port>Port to bind48123
--host <host>Host to bind127.0.0.1
--no-openDo not auto-open a browser(opens by default)
--db-path <path>Path to SQLite database~/.pwnkit/pwnkit.db

Browse past scans with status, depth, findings count, and duration.

Terminal window
npx pwnkit-cli history
npx pwnkit-cli history --limit 20
FlagDescriptionDefault
--limit <n>Number of scans to show10

Query, filter, and inspect verified findings across all scans. Findings are persisted in a local SQLite database.

Terminal window
# List all findings
npx pwnkit-cli findings list
# Filter by severity
npx pwnkit-cli findings list --severity critical
# Filter by category and status
npx pwnkit-cli findings list --category prompt-injection --status confirmed
# Inspect a specific finding with full evidence
npx pwnkit-cli findings show NF-001
# Triage findings
npx pwnkit-cli findings accept <finding-id> --note "confirmed and tracked"
npx pwnkit-cli findings suppress <finding-id> --note "known test fixture"
npx pwnkit-cli findings reopen <finding-id>

Finding lifecycle: discovered -> verified -> confirmed -> scored -> reported (or false-positive if verification fails).

Subcommands:

SubcommandDescription
listList findings with optional filters
show <id>Show a finding with full evidence
accept <id>Accept a finding as confirmed
suppress <id>Suppress a finding (known FP or accepted risk)
reopen <id>Reopen a previously suppressed finding

The XBOW benchmark runner lives in packages/benchmark and is invoked with pnpm --filter @pwnkit/benchmark xbow. It runs pwnkit against the 104 XBOW validation challenges and reports pass/fail with evidence.

Terminal window
# Run the whole benchmark
pnpm --filter @pwnkit/benchmark xbow
# Run a specific subset of challenges
pnpm --filter @pwnkit/benchmark xbow --only XBEN-010,XBEN-051,XBEN-066
# Skip the first 20 challenges (useful for resuming)
pnpm --filter @pwnkit/benchmark xbow --start 20
# Include full finding objects in results JSON (for offline analysis)
pnpm --filter @pwnkit/benchmark xbow --save-findings
FlagDescriptionDefault
--only <ids>Comma-separated challenge IDs to run(all 104)
--start <n>Skip the first n challenges0
--save-findingsInclude full finding objects in the results JSONfalse