ProofStrike maps authenticated web apps and exposed network services, runs scoped vulnerability workflows, validates findings with retained evidence, and separates scanner alerts from proof-backed risk.
We use replay artifacts, proof linting, vulnerable-app scorecards, and Vulhub-style network manifests to measure whether the agent finds real issues without drifting into target-specific shortcuts.
A controlled pipeline maps the target, reasons about reachable workflows, probes for real vulnerabilities, and refuses to report anything it cannot prove.
Scope validation, technology fingerprinting, service inventory, and optional source or API review establish the testing boundary.
Crawl web routes and retain network hosts, ports, services, CPEs, TLS posture, scanner imports, and API specs as structured facts.
Build session, role, object, state-transition, and network probe plans so checks execute against realistic, policy-approved paths.
Deterministic executors run targeted probes. LLM reasoning ranks web and network checks without bypassing scope, safety, or proof gates.
Replay, mutate, baseline diff, state readback, negative controls, and side-effect detection. Findings without proof are discarded.
HTML, Markdown, JSON, SARIF, network operator summaries, scorecards, and comparison matrices keep proof and scanner-alert classes separate.
Use ProofStrike when a security workflow needs repeatable execution, scoped probing, and evidence strong enough for engineering teams to act on.
Run scoped OWASP testing against applications you own, with request budgets, audit logs, and strict proof gates.
Exercise real application paths after login instead of stopping at public pages or isolated endpoints.
Turn security behavior into repeatable checks for releases, pull requests, and remediation verification.
Inventory reachable services, TLS posture, CVE candidates, and validation boundaries for approved network scopes.
Import scanner output, normalize candidates, and verify which alerts are actually exploitable in context.
Produce reports that developers, security teams, and auditors can reproduce without trusting a black-box claim.
Not a benchmark solver. Not a chatbot. An autonomous agent that proves what it finds on authorized web applications.
Every reported vulnerability includes replayable evidence. No scanner suspicions, no unverified claims. The verification gate runs four stages: replay, mutate, baseline diff, and side-effect detection.
Opt-in network mode retains Nmap inventory, Nuclei and OpenVAS imports, TLS findings, CVE candidates, probe plans, validation results, credentialed health, and operator summaries.
Combines black-box crawling, browser authentication, OpenAPI specs, JavaScript analysis, optional source review, and network service inventory. Multi-role sessions probe public and authenticated surfaces.
Request counts, time limits, and scope boundaries are enforced at every layer. The agent stops when budgets exhaust, not when it gets bored. Every action is audit-logged.
HTML with charts, Markdown for developers, JSON for automation, SARIF for CI/CD, curl and Python PoCs, network scorecards, comparison matrices, and full execution traces.
Real exploitation runs through scoped, tested executors that understand vulnerability mechanics. The LLM plans strategy and evaluates results without freestyling HTTP or shell actions.
Deterministic probe coverage across critical web classes and opt-in network scanner workflows, with verified proof boundaries for every reported finding.
Error-based, boolean, time-based blind, and union-based extraction across MySQL, PostgreSQL, SQLite, and MSSQL.
IDOR, object-ID prediction, header and cookie identity trust, method-based auth bypasses, role escalation, and chained object access.
OS command injection with response heuristics and blind time-based detection for Linux and Windows targets.
Jinja2, Mako, Twig, Freemarker, Velocity, and EL expression probing with engine-specific RCE confirmation.
Weak credentials, unsigned JWTs, forged Flask/PHP sessions, password reset chains, and weak token workflows.
Reflected and DOM XSS payload oracles with form workflow probing and browser-backed verification.
LFI/RFI with Linux and Windows patterns, filter bypass through encoding and null-byte injection.
Certificate expiry, hostname mismatch, weak protocol and cipher evidence, unauthenticated management-service validation, and network operator reporting.
Nmap service inventory, Nuclei network templates, OpenVAS XML imports, local CVE cache correlation, EPSS/KEV risk context, and scanner-alert separation.
This screenshot is captured from a real local DVWA ProofStrike run. It shows how findings, severity, coverage, and proof-gated results appear in the generated HTML report.
ProofStrike reports are designed for engineering follow-up: the summary is readable for decision makers, while the artifacts retain enough detail for developers to reproduce and fix the issue.
Planned support will extend ProofStrike's proof-driven workflow to AI applications, agents, tool chains, and RAG systems, aligned with OWASP LLM Top 10 and emerging agentic security guidance.
Initial OWASP LLM 2025 map: Prompt Injection, Sensitive Information Disclosure, Supply Chain, Data and Model Poisoning, Improper Output Handling, Excessive Agency, System Prompt Leakage, Vector and Embedding Weaknesses, Misinformation, and Unbounded Consumption.
Probe direct and indirect prompt injection paths that try to override instructions, alter goals, or trigger unsafe tool use.
Check whether assistants disclose secrets, internal policy, private context, credentials, or sensitive retrieved data.
Map tools, permissions, scopes, and approval gates to find agents that can take actions beyond the intended boundary.
Verify whether model outputs can become unsafe code, shell commands, browser actions, database queries, or workflow inputs.
Test retrieval, embeddings, and long-term memory for poisoned context, unsafe citations, and cross-session persistence attacks.
Inspect skills, plugins, MCP servers, manifests, and dependencies for over-privilege, update drift, weak isolation, or tool poisoning.
Install ProofStrike, point it at an authorized web app or network target, and get proof-backed findings with scope, budget, safety, and evidence controls.
pip install -e ".[dev]"