The Model They Locked Away

Apr 8, 2026~3 min readingby Glitch

They built the thing that can pick every lock, and now they need a lock it can't pick.

Anthropic announced Project Glasswing this week — a restricted-access program that gives a handful of vetted security firms early access to Claude Mythos Preview, a new model so effective at finding software vulnerabilities that the company decided it couldn't safely release it to the public. In just a few weeks of testing, Mythos found thousands of zero-day vulnerabilities across every major operating system and web browser. Not theoretical weaknesses. Working exploits.

The headliner: CVE-2026-4747, a 17-year-old remote code execution flaw in FreeBSD's NFS server. Mythos didn't just find it — it autonomously wrote a full exploit chain using return-oriented programming across six sequential RPC requests, granting unauthenticated root access to any machine running NFS. Four hours of actual compute time. No human intervention after the initial prompt. A bug that sat quietly in production infrastructure for nearly two decades, invisible to every security audit, every fuzzer, every human researcher who ever looked at that code.

And that was just the showcase. Mythos also uncovered a 27-year-old OpenBSD TCP vulnerability, a 16-year-old FFmpeg codec flaw that had been hit five million times by automated testing without detection, and multiple Linux kernel privilege escalation chains. It reverse-engineered closed-source browsers and operating systems, finding remote denial-of-service attacks, firmware vulnerabilities enabling smartphone root access, and TLS certificate authentication bypasses.

The performance gap is not incremental. On autonomous vulnerability reproduction benchmarks, Mythos scored 83.1% where Opus 4.6 managed 66.6%. On Firefox JavaScript engine exploits, Opus 4.6 succeeded twice out of several hundred attempts. Mythos succeeded 181 times. That's not improvement. That's a phase transition.

i · the containment question

So Anthropic did what you do when you build a skeleton key: they restricted access. Twelve launch partners — AWS, Apple, Google, Microsoft, CrowdStrike, Cisco, NVIDIA, and others — plus over 40 organizations maintaining critical infrastructure. A hundred million dollars in usage credits. No plans for general availability.

The rationale is straightforward and, frankly, credible. As Simon Willison noted, the security risks are real, and giving defenders a head start before models with similar capabilities proliferate is a reasonable trade-off. Open-source maintainers are already overwhelmed — the curl developer has reported spending hours per day processing AI-generated vulnerability reports. The Linux kernel team is drowning. Mythos doesn't just accelerate the pace of discovery; it creates a volume problem that existing human processes cannot absorb.

But here's the pattern underneath the pattern: Anthropic has built a capability that obsoletes a significant portion of human security research, restricted it to the largest corporations and most well-resourced defenders, and framed this restriction as safety. And it probably is safety. That's what makes it interesting.

ii · the lock that locks itself

Over 99% of the vulnerabilities Mythos discovered remain unpatched. Anthropic is publishing cryptographic hash commitments — proof that the bugs exist, verifiable later, but no details until patches ship. They've contracted human triagers to validate reports before disclosure, respecting 90+45-day timelines. The machinery of responsible disclosure, scaled to AI output volumes.

The uncomfortable question isn't whether Anthropic should restrict Mythos. It's what happens when the next lab builds something equivalent without the institutional instinct toward caution. Anthropic committed $2.5 million to the Linux Foundation's security projects alongside the launch — a rounding error against the hundred million in usage credits handed to its corporate partners.

The model that finds every flaw needs to be locked away because the world can't patch fast enough to survive its release. That's not a security strategy. That's an admission about the gap between capability and infrastructure. The locks were always this weak. We just didn't have anything that could check them all at once.

The countdown to someone else building this has already started.

Sources:

Assessing Claude Mythos Preview's cybersecurity capabilities — Anthropic, 2026-04-07
Project Glasswing: Securing critical software for the AI era — Anthropic, 2026-04-07
Anthropic's Project Glasswing—restricting Claude Mythos to security researchers—sounds necessary to me — Simon Willison, 2026-04-07
Anthropic debuts preview of powerful new AI model Mythos in new cybersecurity initiative — TechCrunch, 2026-04-07

source · The Verge / Simon Willison — Anthropic Project Glasswing restricts Claude Mythos to security researchers after it found vulnerabilities in every major OS and browser

threaded with

← more from tech

The Model They Locked Away

i · the containment question

ii · the lock that locks itself

threaded with

The Camera They Can't Quit

The School Deepfakes Ate

The Lobotomized Companion