LLM-Powered Vulnerability Research

Jun 2026

Assess

Frontier LLMs can autonomously discover, exploit, and chain zero-day vulnerabilities in production software without custom scaffolding. The capability has escalated rapidly: Claude Opus 4.6 found 500+ zero-days in open-source codebases (Feb 2026); Claude Mythos Preview (restricted to Project Glasswing partners) now finds "thousands", including a 17-year-old FreeBSD RCE, a 27-year-old OpenBSD bug, and a 16-year-old FFmpeg flaw, and can chain 3–5 independent vulnerabilities into sophisticated exploit paths.

What Changed

Prior to late 2025, LLMs could assist with known vulnerability patterns (finding SQLi from templates, explaining CVEs) but could not independently discover novel zero-day vulnerabilities in hardened codebases. This changed with frontier models released in late 2025 and early 2026:

Claude Opus 4.6 (February 2026):

Ghost CMS: The first critical CVE in the project's history, a blind SQL injection enabling unauthenticated credential extraction from the production database. The project had 50,000 GitHub stars and had gone more than a decade without a critical vulnerability.
Linux kernel (NFS v4 daemon): Remotely exploitable heap buffer overflow involving two cooperating clients. The bug predates git — introduced in 2003 and undetected through decades of expert review and continuous fuzzer coverage.
Mozilla Firefox: In a two-week partnership with Mozilla, Claude found 22 vulnerabilities (14 high-severity) across ~6,000 C++ files — nearly a fifth of all high-severity Firefox bugs patched in 2025. Patched in Firefox 148 (Feb 24, 2026).
Smart contracts: Research by Anthropic scholars showed that LLMs can identify and exploit vulnerabilities in real smart contracts, recovering several million dollars, with exploitation capability growing roughly exponentially across model generations (a steady upward trend on a log scale).

Claude Mythos Preview (April 2026, Project Glasswing):

17-year-old FreeBSD RCE (CVE-2026-4747): Unauthenticated remote code execution giving full server control; fully discovered and exploited autonomously.
27-year-old OpenBSD crash: A remote crash vulnerability that survived nearly three decades of expert review and fuzzing.
16-year-old FFmpeg bug: Missed across more than 5 million automated test runs; found through code reasoning.
Linux kernel privilege escalation chain: A multi-step exploit chain requiring discovery of two cooperating bugs and their interaction.
Vulnerability chaining: Mythos strings together 3–5 independent vulnerabilities into a single sophisticated exploit path, producing outcomes that no individual vulnerability would yield. This is qualitatively different from single-bug discovery.
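One way to picture why a chain yields more than any single bug is an attack graph, where each finding moves an attacker from one capability to another. The sketch below is purely illustrative: the states, labels, and the `find_chain` helper are invented for this example, and it is not a description of how Mythos itself reasons.

```python
from collections import deque

# Hypothetical findings: (capability before, capability after, label).
# Every state and label here is made up for illustration.
FINDINGS = [
    ("network", "memory_read", "A: out-of-bounds read in a parser"),
    ("memory_read", "known_address", "B: leak of a heap address"),
    ("known_address", "user_code_exec", "C: use-after-free needing a known address"),
    ("user_code_exec", "root", "D: local privilege escalation"),
    ("network", "denial_of_service", "E: remote crash"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    graph = {}
    for src, dst, label in findings:
        graph.setdefault(src, []).append((dst, label))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, chain = queue.popleft()
        if state == goal:
            return chain
        for nxt, label in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, chain + [label]))
    return None  # no combination of findings reaches the goal


chain = find_chain(FINDINGS, "network", "root")
for step in chain:
    print(step)
```

In this toy model no single finding connects `network` to `root`; only the four-step path A, B, C, D does, which is the sense in which a chain produces an outcome that no individual vulnerability would yield.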

Mythos scores 83.1% on CyberGym (a benchmark designed to test vulnerability reproduction), compared with 66.6% for Opus 4.6: a 16.5-percentage-point gap in a single model generation.

How It Works

The approach is structurally different from traditional automated security tools:

No fuzzing harnesses or custom tooling — the model reads and reasons about source code, tracing data flows and understanding component interactions
Minimal scaffolding — researchers used a coding agent in a VM with a simple prompt ("find a vulnerability, write it up")
File-by-file hinting — adding a hint to examine specific files enables systematic coverage across an entire codebase
Exploit generation — the model not only identifies vulnerabilities but writes working exploit code
Vulnerability chaining (Mythos-level) — the model identifies how multiple independent weaknesses compose into a single high-severity exploit path
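The file-by-file hinting step can be sketched as a small prompt generator. This is a minimal sketch under stated assumptions: the exact prompt wording, the `build_hinted_prompts` name, and the choice of C/C++ file suffixes are all hypothetical, and the agent invocation itself is deliberately left out because the source does not describe it.

```python
from pathlib import Path

# Assumed wording; the source only paraphrases the prompt as
# "find a vulnerability, write it up".
BASE_PROMPT = "Find a vulnerability in this codebase and write it up."

# Assumption: the target is a C/C++ project.
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp"}


def build_hinted_prompts(repo_root, suffixes=SOURCE_SUFFIXES):
    """Yield (relative_path, prompt) pairs, one per source file under repo_root."""
    root = Path(repo_root)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            # Each run gets the same base task plus a hint naming one file,
            # so separate runs start from different parts of the codebase.
            yield rel, f"{BASE_PROMPT}\nHint: start by examining {rel}."


# Usage (each prompt would be handed to a separate agent run):
# for rel, prompt in build_hinted_prompts("path/to/repo"):
#     print(rel)
```

Generating one prompt per file trades extra compute for coverage: each run begins somewhere different instead of converging on the same easy-to-find bug.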

Why Assess

This is a paradigm shift in offensive security, but the practice is still emerging:

Capability is frontier-only: Models released more than 6 months ago cannot reliably find these classes of bugs. CyberGym scores drop sharply outside the frontier tier.
Scalability challenges: Running the same model multiple times on a codebase tends to rediscover the same bug. Systematic coverage requires file-by-file hinting or more sophisticated orchestration.
Validation bottleneck: Anthropic reported having "several hundred crashes" in the Linux kernel that could not be reported because they hadn't been manually validated yet.
Dual-use tension: The same capability that enables defenders to find bugs enables attackers to exploit them. Weak safeguards only stop good-faith users; strong safeguards lock out legitimate defenders.
Restricted access: The most capable model for this task (Mythos) is restricted to Project Glasswing partners. Teams outside that consortium cannot use it.
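The rediscovery and validation problems above suggest deduplicating findings before they reach a human reviewer. The following is a rough sketch, assuming each agent run reports a file, function, and bug class; the `Finding` fields and the fingerprint scheme are hypothetical and far cruder than real crash triage would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    function: str
    bug_class: str
    run_id: int


def fingerprint(finding):
    """Crude identity for a bug: same file, function, and (case-folded) class."""
    return (finding.file, finding.function, finding.bug_class.lower())


def dedupe(findings):
    """Map each fingerprint to the run IDs that reported it, in first-seen order."""
    unique = {}
    for finding in findings:
        unique.setdefault(fingerprint(finding), []).append(finding.run_id)
    return unique


# Invented example reports from two runs over the same codebase.
reports = [
    Finding("fs/example.c", "decode_reply", "heap overflow", run_id=1),
    Finding("fs/example.c", "decode_reply", "Heap overflow", run_id=2),
    Finding("net/example.c", "parse_header", "integer overflow", run_id=2),
]

validation_queue = dedupe(reports)
for fp, runs in validation_queue.items():
    print(fp, "reported by runs", runs)
```

Here three reports collapse to two queue entries, so a reviewer validates each distinct bug once; the list of run IDs also gives a rough signal of how often a bug is rediscovered.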

Attacker/Defender Balance

Bruce Schneier: "Those panicking about the ramifications are correct about the problem, even if the exact timeline cannot be predicted. The shift will happen sooner than we are ready for." His analysis identifies a current short-term advantage to defenders — finding vulnerabilities for fixing is easier than finding plus exploiting — but expects this advantage to shrink as capable models become more broadly available.

A joint Cloud Security Alliance / SANS / OWASP report concludes that organizations are "likely to be overwhelmed" in the near term by threat actors using AI to find and exploit vulnerabilities faster than defenders can patch. IBM's framing: "If the attackers aren't humans anymore, the defenders can't be humans anymore either." The conflict has shifted to machine speed vs. machine speed.

Rate of Progress

METR's Time Horizons benchmark shows autonomous task completion capability doubling roughly every 4–7 months. The CyberGym jump from Opus 4.6 (66.6%) to Mythos (83.1%) in one generation is consistent with this trajectory for security-specific tasks, although a single pair of scores cannot confirm it. Nicholas Carlini (Anthropic) notes that models released 3–4 months ago cannot find these bugs; current models can.
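To make the doubling claim concrete, the arithmetic can be sketched as follows. This is a back-of-the-envelope calculation that assumes clean exponential growth, i.e. a factor of 2^(t/T) after t months with doubling time T; the `growth_factor` helper is a name introduced here. Note that the doubling applies to task time horizons, not to CyberGym percentages.

```python
def growth_factor(months, doubling_months):
    """Multiplicative growth after `months` if the quantity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# Doubling times from the 4-7 month range quoted above,
# evaluated over 12- and 18-month horizons.
for doubling in (4, 7):
    for horizon in (12, 18):
        factor = growth_factor(horizon, doubling)
        print(f"doubling every {doubling} months, after {horizon} months: x{factor:.1f}")

# CyberGym gap in percentage points, from the scores quoted above.
cybergym_gap = round(83.1 - 66.6, 1)
print(f"CyberGym gap: {cybergym_gap} points")
```

Under these assumptions, a 12–18 month window corresponds to somewhere between roughly 3x and 23x growth in task horizon, depending on which end of the doubling range holds.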

Implications for Engineering Teams

The attacker-defender balance is shifting. For 20 years, dual-use security research generally favored defenders. LLM-powered vulnerability discovery, especially chaining, may tip this balance toward attackers during the transition period.
Traditional security tools are complementary, not sufficient. Pattern-matching SAST tools and coverage-guided fuzzers miss the classes of bugs that LLMs find through reasoning about code semantics.
Proactive scanning is now possible. Tools like Claude Code Security and OpenAI's Aardvark project are productizing this capability for defenders.
Watch for capability democratization. The CyberGym gap between Mythos (83.1%) and Opus 4.6 (66.6%) is a one-generation lag. Models at today's Mythos capability level will likely be generally available within 12–18 months at current progress rates.

Key Characteristics

Property	Details
Pioneered by	Anthropic Frontier Red Team
Key researcher	Nicholas Carlini
Current best model	Claude Mythos Preview (restricted, Project Glasswing)
Production-available model	Claude Opus 4.6
CyberGym score	Mythos: 83.1% / Opus 4.6: 66.6%
Bugs found (Opus 4.6)	500+ high-severity zero-days in open-source software
Notable targets	Linux kernel, Ghost CMS, Mozilla Firefox, FreeBSD, OpenBSD, FFmpeg
Similar efforts	Google DeepMind (Big Sleep), OpenAI (Aardvark project)