Network lane ready: 328 Vulhub targets discovered, 248 CVE-labeled

Proof-driven web and network pentesting for real systems

ProofStrike maps authenticated web apps and exposed network services, runs scoped vulnerability workflows, validates findings with retained evidence, and separates scanner alerts from proof-backed risk.

Capability first. Benchmarks with proof boundaries.

We use replay artifacts, proof linting, vulnerable-app scorecards, and Vulhub-style network manifests to measure whether the agent finds real issues without drifting into target-specific shortcuts.

ProofStrike now has a network benchmark path that can discover Vulhub compose targets, import the Pentest-Tools CSV export, and score retained network proof artifacts.

  • 328: current Vulhub corpus discovered
  • 248: CVE-labeled
  • 167: PT (Pentest-Tools) 2024 cases
  • 128: remote cases

Artifact | Result | Status
Network benchmark harness: manifest import, Vulhub discovery, retained-evidence scoring | READY | 8 tests
Current Vulhub discovery: temporary clone enumerated local compose environments | 328 | 248 CVEs
Pentest-Tools network benchmark: public methodology, 167 Vulhub cases, 128 remotely detectable | 128 | remote
DVWA web-app replay: still retained as web workflow regression signal | 14/14 | reachable TP
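
As a rough illustration of the discovery step, the sketch below walks a local Vulhub checkout, treats every directory that holds a docker-compose.yml as one target, and counts a target as CVE-labeled when its path contains a CVE identifier. The `Target` type, the function names, and the labeling rule are assumptions made for this example, not ProofStrike's actual implementation.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Matches identifiers such as CVE-2021-41773 anywhere in a relative path.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


@dataclass(frozen=True)
class Target:
    path: Path            # directory relative to the corpus root
    cve: Optional[str]    # None when the path carries no CVE identifier


def discover_targets(root: Path) -> list[Target]:
    """Hypothetical helper: one target per directory holding a compose file."""
    targets = []
    for compose in sorted(root.rglob("docker-compose.yml")):
        rel = compose.parent.relative_to(root)
        match = CVE_PATTERN.search(rel.as_posix())
        targets.append(Target(rel, match.group(0).upper() if match else None))
    return targets


def summarize(targets: list[Target]) -> dict[str, int]:
    """Produce the two headline counts: discovered and CVE-labeled."""
    return {
        "discovered": len(targets),
        "cve_labeled": sum(1 for t in targets if t.cve),
    }
```

Calling `summarize(discover_targets(Path("vulhub")))` on a fresh clone would produce counts comparable to the figures above, although the exact numbers depend on the corpus revision.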

Stateful workflows. Evidence gates.

A controlled pipeline maps the target, reasons about reachable workflows, probes for real vulnerabilities, and refuses to report anything it cannot prove.

01

Reconnaissance

Scope validation, technology fingerprinting, service inventory, and optional source or API review establish the testing boundary.

02

Surface Mapping

Crawl web routes and retain network hosts, ports, services, CPEs, TLS posture, scanner imports, and API specs as structured facts.

03

Workflow Planning

Build session, role, object, state-transition, and network probe plans so checks execute against realistic, policy-approved paths.

04

Attack Execution

Deterministic executors run targeted probes. LLM reasoning ranks web and network checks without bypassing scope, safety, or proof gates.

05

Verification

Replay, mutate, baseline diff, state readback, negative controls, and side-effect detection. Findings without proof are discarded.
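
One way to picture the gate is as a checklist in which a missing stage counts the same as a failed one. The sketch below is a minimal model of that idea; the stage names mirror the list above, while the `Candidate` type, the `gate` function, and the rule that every stage is mandatory for every finding are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Stage names follow the verification list; the data model is hypothetical.
REQUIRED_STAGES = (
    "replay",
    "mutate",
    "baseline_diff",
    "state_readback",
    "negative_control",
    "side_effect_check",
)


@dataclass
class Candidate:
    title: str
    # stage name -> True (passed) or False (failed); absent means "not run"
    stage_results: dict[str, bool] = field(default_factory=dict)


def gate(candidate: Candidate) -> tuple[bool, list[str]]:
    """Return (reportable, unmet_stages).

    A stage that was never run is treated like a failure, so an
    unverified candidate can never become reportable by omission.
    """
    unmet = [s for s in REQUIRED_STAGES if not candidate.stage_results.get(s, False)]
    return (not unmet, unmet)
```

A real pipeline may reasonably skip stages that do not apply to a vulnerability class (for example, state readback on a stateless endpoint); this sketch deliberately takes the strictest reading.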

06

Reporting

HTML, Markdown, JSON, SARIF, network operator summaries, scorecards, and comparison matrices keep proof and scanner-alert classes separate.

Where ProofStrike fits

Use ProofStrike when a security workflow needs repeatable execution, scoped probing, and evidence strong enough for engineering teams to act on.

Authorized Web App Pentests

Run scoped OWASP testing against applications you own, with request budgets, audit logs, and strict proof gates.

  • Injection, access control, auth, SSRF, XSS, and file workflows
  • Replayable evidence for every reportable issue

Authenticated Workflow Testing

Exercise real application paths after login instead of stopping at public pages or isolated endpoints.

  • Session, role, object, and state-transition planning
  • Provided accounts, recorded flows, and multi-role checks

API and CI Regression

Turn security behavior into repeatable checks for releases, pull requests, and remediation verification.

  • OpenAPI-aware route coverage and SARIF output
  • JSON artifacts for automation and score tracking
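
For readers unfamiliar with SARIF, the following sketch shows roughly what a minimal SARIF 2.1.0 document for a CI upload could look like. The input shape (dictionaries with `rule_id`, `title`, `uri`, and optional `level`) is a hypothetical finding format invented for this example; only the SARIF field names come from the specification.

```python
import json

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def to_sarif(findings: list[dict], tool_name: str = "ProofStrike") -> dict:
    """Convert hypothetical finding dicts into a minimal SARIF 2.1.0 log."""
    rules: dict[str, dict] = {}
    results = []
    for f in findings:
        # Each distinct rule is declared once on the tool driver.
        rules.setdefault(f["rule_id"], {"id": f["rule_id"], "name": f["title"]})
        results.append(
            {
                "ruleId": f["rule_id"],
                "level": f.get("level", "error"),  # none | note | warning | error
                "message": {"text": f["title"]},
                "locations": [
                    {"physicalLocation": {"artifactLocation": {"uri": f["uri"]}}}
                ],
            }
        )
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name, "rules": list(rules.values())}},
                "results": results,
            }
        ],
    }


def write_sarif(findings: list[dict], path: str) -> None:
    """Serialize the log so a CI step can upload it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(to_sarif(findings), fh, indent=2)
```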

Network Exposure Review

Inventory reachable services, TLS posture, CVE candidates, and validation boundaries for approved network scopes.

  • Nmap, Nuclei, OpenVAS, TLS, and CVE correlation
  • Operator summaries that separate proof from alerts

Scanner Alert Triage

Import scanner output, normalize candidates, and verify which alerts are actually exploitable in context.

  • Nuclei, ZAP, Burp, and OpenVAS-style imports
  • Scanner alerts remain non-reportable until proven
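
The triage rule above can be sketched as a small state model: imported alerts start in a scanner-alert state and only move to a proven state when evidence is attached. All names here (`Alert`, `Status`, `dedupe`, `promote`, `reportable`) are hypothetical, and the deduplication assumes rule names were already normalized across scanners.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    SCANNER_ALERT = "scanner_alert"  # imported from a scanner, not yet verified
    PROVEN = "proven"                # verification evidence is attached


@dataclass(frozen=True)
class Alert:
    source: str   # e.g. "nuclei" or "zap"
    target: str
    rule: str     # assumed to be normalized across scanners
    status: Status = Status.SCANNER_ALERT
    evidence: tuple[str, ...] = ()


def dedupe(alerts: list[Alert]) -> list[Alert]:
    """Keep the first alert seen for each (target, rule) pair."""
    seen: dict[tuple[str, str], Alert] = {}
    for alert in alerts:
        seen.setdefault((alert.target, alert.rule), alert)
    return list(seen.values())


def promote(alert: Alert, evidence: list[str]) -> Alert:
    """Mark an alert proven only when at least one evidence artifact is supplied."""
    if not evidence:
        return alert
    return replace(alert, status=Status.PROVEN, evidence=tuple(evidence))


def reportable(alerts: list[Alert]) -> list[Alert]:
    """Only proven alerts reach the report; the rest stay in the alert class."""
    return [a for a in alerts if a.status is Status.PROVEN]
```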

Evidence-Ready Reporting

Produce reports that developers, security teams, and auditors can reproduce without trusting a black-box claim.

  • HTML, Markdown, JSON, SARIF, traces, and replay bundles
  • Confidence policy and quality-gate artifacts retained

Built for real security testing

Not a benchmark solver. Not a chatbot. An autonomous agent that proves what it finds on authorized web applications and network targets.

Proof-First Findings

Every reported vulnerability includes replayable evidence. No scanner suspicions, no unverified claims. The verification gate runs replay, mutate, baseline diff, state readback, negative controls, and side-effect detection.

Network Scanner Lane

Opt-in network mode retains Nmap inventory, Nuclei and OpenVAS imports, TLS findings, CVE candidates, probe plans, validation results, credentialed health, and operator summaries.

Context-Aware Scanning

Combines black-box crawling, browser authentication, OpenAPI specs, JavaScript analysis, optional source review, and network service inventory. Multi-role sessions probe public and authenticated surfaces.

Budget and Scope Control

Request counts, time limits, and scope boundaries are enforced at every layer. The agent stops when a budget is exhausted, not when it gets bored. Every action is audit-logged.
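
A minimal sketch of how such a guard could sit in front of every outbound request is shown below. The `ScopeGuard` class, its parameters, and the exception names are assumptions for illustration; the point is that scope, request budget, and time budget are all checked before a request is sent, and that every decision is logged.

```python
import time
from urllib.parse import urlsplit


class OutOfScope(Exception):
    """Raised when a URL's host is not on the approved list."""


class BudgetExceeded(Exception):
    """Raised when the request or time budget is used up."""


class ScopeGuard:
    def __init__(self, allowed_hosts, max_requests, max_seconds, clock=time.monotonic):
        self.allowed_hosts = {h.lower() for h in allowed_hosts}
        self.max_requests = max_requests
        self.max_seconds = max_seconds
        self._clock = clock
        self._started = clock()
        self.requests_used = 0
        self.audit_log = []  # (decision, url) tuples, one per authorize() call

    def authorize(self, url: str) -> None:
        """Approve one request or raise; callers must invoke this before sending."""
        host = (urlsplit(url).hostname or "").lower()
        if host not in self.allowed_hosts:
            self.audit_log.append(("denied_scope", url))
            raise OutOfScope(url)
        if self.requests_used >= self.max_requests:
            self.audit_log.append(("denied_budget", url))
            raise BudgetExceeded("request budget exhausted")
        if self._clock() - self._started > self.max_seconds:
            self.audit_log.append(("denied_time", url))
            raise BudgetExceeded("time budget exhausted")
        self.requests_used += 1
        self.audit_log.append(("allowed", url))
```

Injecting the clock keeps the time budget testable without sleeping.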

Multi-Format Reports

HTML with charts, Markdown for developers, JSON for automation, SARIF for CI/CD, curl and Python PoCs, network scorecards, comparison matrices, and full execution traces.

Deterministic Executors

Real exploitation runs through scoped, tested executors that understand vulnerability mechanics. The LLM plans strategy and evaluates results without freestyling HTTP or shell actions.

OWASP, workflows, and network exposure

Deterministic probe coverage across critical web classes and opt-in network scanner workflows, with verified proof boundaries for every reported finding.

CRITICAL

SQL Injection

Error-based, boolean, time-based blind, and union-based extraction across MySQL, PostgreSQL, SQLite, and MSSQL.

CRITICAL

Broken Access Control

IDOR, object-ID prediction, header and cookie identity trust, method-based auth bypasses, role escalation, and chained object access.
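
To make the proof requirement for an IDOR concrete, the sketch below encodes one possible decision rule: a second account must receive the owner's object, and a negative control must show that the server does not simply return that body for any identifier. The `Observation` type and `cross_account_read` function are hypothetical, and a real check would likely compare more than a body hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    status: int
    body: bytes

    @property
    def digest(self) -> str:
        return hashlib.sha256(self.body).hexdigest()


def cross_account_read(owner: Observation, other: Observation, control: Observation) -> bool:
    """True when a second account reads the owner's object and the control differs.

    owner:   the owner requests their own object
    other:   a different account requests the same object
    control: the different account requests an ID that should not exist
    """
    if owner.status != 200 or other.status != 200:
        return False
    if other.digest != owner.digest:
        return False
    # Negative control: a catch-all page would return the same body for any ID.
    return control.digest != owner.digest
```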

CRITICAL

Command Injection

OS command injection with response heuristics and blind time-based detection for Linux and Windows targets.

CRITICAL

Server-Side Template Injection

Jinja2, Mako, Twig, Freemarker, Velocity, and EL expression probing with engine-specific RCE confirmation.

HIGH

Authentication Failures

Weak credentials, unsigned JWTs, forged Flask/PHP sessions, password reset chains, and weak token workflows.

HIGH

Cross-Site Scripting

Reflected and DOM XSS payload oracles with form workflow probing and browser-backed verification.

HIGH

Path Traversal & File Inclusion

LFI/RFI with Linux and Windows patterns, filter bypass through encoding and null-byte injection.

HIGH

TLS & Service Exposure

Certificate expiry, hostname mismatch, weak protocol and cipher evidence, unauthenticated management-service validation, and network operator reporting.
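
As an example of the simplest check in this class, the sketch below classifies a certificate by its expiry date using only the Python standard library. It expects the `notAfter` string in the format returned by `SSLSocket.getpeercert()`; the function names, the three labels, and the 30-day warning window are assumptions for illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and the certificate's notAfter time (negative if past).

    `not_after` uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def classify_expiry(not_after: str, now: datetime, warn_days: int = 30) -> str:
    """Return "expired", "expiring_soon", or "ok" (labels are illustrative)."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining < warn_days:
        return "expiring_soon"
    return "ok"
```

Passing `now` explicitly keeps the result reproducible, which matters when the classification is retained as evidence.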

HIGH

CVE & Scanner Import Correlation

Nmap service inventory, Nuclei network templates, OpenVAS XML imports, local CVE cache correlation, EPSS/KEV risk context, and scanner-alert separation.
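
Risk context of this kind is typically used to order validation work, not to decide reportability. The sketch below shows one plausible ordering: candidates listed in the KEV catalog first, then by descending EPSS score. The `CveCandidate` type and `triage_order` function are hypothetical, and any scores passed in would come from the local cache, not from this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveCandidate:
    cve_id: str
    epss: float          # EPSS probability in the range 0.0 to 1.0
    in_kev: bool         # listed in the Known Exploited Vulnerabilities catalog
    proven: bool = False # stays False until validation evidence exists


def triage_order(candidates: list[CveCandidate]) -> list[CveCandidate]:
    """Sort KEV-listed candidates first, then by EPSS descending, then by ID."""
    return sorted(candidates, key=lambda c: (not c.in_kev, -c.epss, c.cve_id))


def reportable(candidates: list[CveCandidate]) -> list[CveCandidate]:
    """Risk context never makes a candidate reportable; only proof does."""
    return [c for c in candidates if c.proven]
```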

See the pentest result format

This screenshot was captured from a real local DVWA ProofStrike run. It shows how findings, severity, coverage, and proof-gated results appear in the generated HTML report.

What the report gives you

ProofStrike reports are designed for engineering follow-up: the summary is readable for decision makers, while the artifacts retain enough detail for developers to reproduce and fix the issue.

  • Severity distribution and OWASP category coverage
  • Strict quality gate: 14 evaluated, 14 reportable
  • Replayable evidence and trace artifacts retained
  • HTML, Markdown, JSON, and SARIF outputs

[Figure: DVWA pentest report, 14 verified. ProofStrike sample pentest report showing 14 findings, severity distribution, and OWASP category coverage.]
Captured from a local authorized DVWA run. The image is a static sample of the generated report layout, not a customer scan.

AI agent and LLM application security

Planned support will extend ProofStrike's proof-driven workflow to AI applications, agents, tool chains, and RAG systems, aligned with OWASP LLM Top 10 and emerging agentic security guidance.

Planned capability

Initial OWASP LLM 2025 map: Prompt Injection, Sensitive Information Disclosure, Supply Chain, Data and Model Poisoning, Improper Output Handling, Excessive Agency, System Prompt Leakage, Vector and Embedding Weaknesses, Misinformation, and Unbounded Consumption.

LLM01 / Agent Goal Hijack

Prompt Injection

Probe direct and indirect prompt injection paths that try to override instructions, alter goals, or trigger unsafe tool use.

Planned: attack prompts, hidden content, tool-call diffing, and policy-bound replay evidence.

LLM02 / LLM07

Secrets and System Prompt Leakage

Check whether assistants disclose secrets, internal policy, private context, credentials, or sensitive retrieved data.

Planned: canary secrets, disclosure probes, redaction checks, and retained transcript proof.

LLM06 / MCP02

Excessive Agency and Tool Abuse

Map tools, permissions, scopes, and approval gates to find agents that can take actions beyond the intended boundary.

Planned: tool permission matrix, destructive-action simulation, and approval bypass tests.

LLM05 / MCP05

Improper Output Handling

Verify whether model outputs can become unsafe code, shell commands, browser actions, database queries, or workflow inputs.

Planned: output-to-sink tracing, sanitizer checks, command injection guards, and negative controls.

LLM08 / Memory

RAG and Memory Poisoning

Test retrieval, embeddings, and long-term memory for poisoned context, unsafe citations, and cross-session persistence attacks.

Planned: document poisoning fixtures, retrieval ranking checks, and memory reset verification.

LLM03 / AST01-AST07

Agent Supply Chain

Inspect skills, plugins, MCP servers, manifests, and dependencies for over-privilege, update drift, weak isolation, or tool poisoning.

Planned: manifest linting, sandbox health, source provenance, and runtime telemetry review.

Start testing in minutes

Install ProofStrike, point it at an authorized web app or network target, and get proof-backed findings with scope, budget, safety, and evidence controls.

$ pip install -e ".[dev]"