
LinkedIn Post

AI safety belts: what Anthropic got right with Claude Mythos

by Chris Hornby

13 April 2026

AI labs can learn a lot from how Anthropic is handling Claude Mythos. Their latest LLM is so capable that they have not released it publicly. How many other providers would forgo the chance to lead the race and boost profits?

The Anthropic “system card” for Mythos is incredibly detailed - explaining its strengths, weaknesses and areas for concern. This 245-page document assesses risks in areas ranging from child safety to biological weapons.

This due diligence showed that Mythos is phenomenal at finding and exploiting security vulnerabilities in software.

It is reportedly 100x better than Opus 4.6 (see Firefox exploit image) and better than human researchers, finding decades-old bugs in some of the most robustly tested open-source software on the planet (FFmpeg, OpenBSD).

Anthropic responded by withholding the model, diving deeper into the cybersecurity implications and then releasing it to select organisations responsible for critical software.

This is a mature and responsible approach. Perhaps also great marketing, but I think it’s mostly the former.

Mythos shows we are at the start of the next phase of AI: one where models can do things humans cannot.

What is the next frontier that will collapse?

How happy would you be if a provider other than Anthropic breached that frontier?

We as consumers need to demand higher levels of scrutiny from LLM providers. More transparency. The 200+ page system card Anthropic published on Mythos is the type of intentional due diligence we should expect from all the companies controlling these technologies.

We can do that by requiring this level of transparency as part of our AI policies.

Everyone is racing into AI at ever higher speeds … it’s worth checking we have safety belts that work.
