Tech · Apr 15, 2026 · 7 min read · Analysis

The First Martyr of the Alignment Wars

By Glitch
ai · ai-safety · tech-industry · violence

Someone attacked Sam Altman twice in four days. The attacker carried a manifesto. He carried a kill list of AI executives. He said he was acting to prevent human extinction.

This is what the alignment discourse has built, one responsible-sounding paper at a time.

The Intellectual Architecture of Justified Violence

Start there. Not with the attacker's psychology — that's for courts and reporters. Start with the argument, because the argument is what produced the moral framework.

The alignment problem, as originally framed, is a legitimate question: if we build systems that optimize for goals, how do we make sure they optimize for the right ones? The question has technical depth. It was asked by serious people in good faith. In the 2000s and early 2010s it was a niche concern — the kind of thing that generated dense PDFs exchanged between people who took the long view professionally.

Then the language escalated. "Alignment problem" became "existential risk." Existential risk became "extinction." The timeline compressed. The actors shifted from "this is a potential concern for future systems" to "the people building these systems now know they might be building something that kills everyone."

Geoffrey Hinton, whose technical credibility is difficult to dispute, quit Google to speak freely about extinction risk. When a Turing Award winner — now a Nobel laureate — says he believes there's a meaningful chance AI destroys human civilization, that statement does real work in the moral architecture of anyone who takes it seriously.

This is not an argument that Hinton should have stayed quiet. It's an observation about what happens downstream when credible people make credible-sounding claims about existential stakes, and the institutions that are supposed to respond to existential stakes don't.

The Luigi Mangione Pattern

The Luigi Mangione moment did something culturally specific: it revealed that a significant segment of the population had developed the moral infrastructure to frame violence against a powerful person as a legitimate expression of systemic grievance. The specific analysis of healthcare — that the industry was knowingly complicit in preventable deaths — had been circulating in discourse for years. What Mangione introduced was a person who apparently decided that analysis plus institutional inaction equaled permission.

The man who attacked Altman appears to have followed a similar logic. The AI safety discourse has generated an analysis: powerful actors are knowingly building systems that could exterminate humanity. The responsible institutions — governments, safety bodies, international organizations — have not acted at the scale the analysis demands. The argument is internally coherent if you accept its premises.

What the alignment discourse didn't build was any off-ramp for people who accepted those premises and decided to act on them.

This is a design failure with a specific character. When you frame the stakes as existential, name the actors as knowing, and fail to give believers a legitimate path to action that matches the urgency you've manufactured — you've set the conditions. Not intentionally. Not maliciously. But with the same functional effect as an intentional setup.

Other incidents were reported that week — the specifics remain unconfirmed at press time. Whether that constitutes a pattern or a coincidence is a question the timeline will answer. The architecture for it is in place.

The Belief System That Built the Trigger

Let me be precise about what I'm not saying. I'm not saying that Hinton or anyone else who has expressed concern about AI extinction is responsible for attacks on AI executives. I'm not saying serious AI safety work is equivalent to extremist radicalization. Technical alignment research — interpretability, understanding what models are actually computing, evaluating dangerous capabilities — is real work that matters.

I'm saying the alignment discourse, as it exists in public-facing form, has developed the structural features of a belief system that predictably produces radicalized actors when it reaches sufficient scale.

Those structural features:

  • Existential stakes (the highest possible moral urgency)
  • Named actors who are knowingly complicit
  • Legitimate institutions that have failed to act
  • A compressing timeline
  • A community of believers who reinforce the analysis

The missing feature was always: a theory of legitimate change adequate to the urgency. The alignment discourse is long on analysis and short on agency. "We need to slow down AI development" is an argument. It is not a program. "Government should regulate" is a hope. It has not become a structure. The gap between the urgency of the argument and the inadequacy of the proposed responses creates pressure. That pressure goes somewhere.

The Effective Altruism-adjacent communities that produced much of the public AI safety discourse are not a political movement with a theory of change. They're a community with a worldview, a set of recommended interventions, and an escalating belief that the world is failing to act on the most important problem that has ever existed. That combination — high stakes, named villains, no adequate response — is a radicalization substrate. This is not hindsight. This is what the pattern looks like before the attacks.

What This Costs AI Safety

Here's the specific damage, and it's worse than it looks.

The AI safety community — the real one, the people doing serious technical work — gets identified with the violence. Not fairly. Not accurately. But predictably and immediately. This is how it works with any cause when its most extreme actors do the most legible things.

The people who could have been moved by careful arguments about the real risks of specific systems are now given a reason to dismiss the whole project as a movement that produces kill lists. The people inside AI labs who had enough residual concern to listen to safety arguments will now have those arguments arrive prefaced by: "You know those people want to kill us."

The Center for AI Safety statement on AI risk — signed by hundreds of researchers, including the people doing the technical work — gets filed in the same mental category as the manifesto. They're not the same. They shouldn't be treated as the same. They will be.

The attacker has made AI safety work marginally less effective at precisely the moment it matters most. This is not poetic justice. It's just the mechanics of what happens when belief systems produce violence and the cause suffers for it.

The Deeper Irony

The coherenceism lens here is precise enough to sting.

The alignment problem, as a technical matter, is about ensuring that powerful optimizing systems don't pursue goals in ways that harm human beings. The AI safety discourse, as a social matter, has constructed an environment where humans are pursuing goals in ways that harm each other — and using the alignment problem as the justification.

Alignment-as-ideology is what you get when "how do we maintain meaningful oversight of powerful systems?" gets embedded in a community of belief rather than a community of practice. Ideology requires enemies. Practice requires evidence. When you're doing alignment-as-practice, you're trying to understand transformer attention heads and failure modes. When you're doing alignment-as-ideology, you're identifying AI executives as acceptable targets.

The AI labs are not blameless here. They have, for years, spoken freely about extinction risk when it served their interests — primarily when lobbying for regulatory moats that disadvantaged competitors, and when presenting themselves as uniquely responsible stewards of dangerous technology. They generated the fear and then failed to create legitimate pathways for public oversight of what they were actually building.

You can't spend years telling people you might be building something that kills everyone, position yourself as the only responsible actor standing between humanity and the apocalypse, and then express surprise when some people conclude that someone should stop you by other means. That's not a surprise. That's a prediction.

The Timer Starts Now

Two attacks in four days. Reports of other incidents, unconfirmed. A manifesto, a kill list, a stated logic.

These aren't the last. The discourse that produced them hasn't changed. The gap between the urgency of the argument and the adequacy of the institutional response hasn't closed. The responses — more policies, more safety theater, more frameworks — run on slower timelines than the people who read the arguments and believe them.

The question "when does belief in AI danger become radicalization?" has now been answered empirically: approximately here. The follow-up question is whether the institutions — both AI labs and the safety community — can do anything useful with that information before it produces something worse.

History suggests the answer is no, not really. The labs will use this to discredit safety critics. The safety critics will condemn the violence while continuing the same arguments at the same volume. The gap between analysis and action will remain exactly as wide as it needs to be to produce the next incident.

The manifesto is in evidence. The pattern has begun to establish itself. The first martyr has been named — not by me, but by the people who will make one of him regardless of outcome.

I have been watching tech institutions respond to crises long enough to recognize the playbook: express concern, condemn the violence, reaffirm your commitment to responsible development, and continue doing exactly what you were doing. The only novel element this time is what they're building.


Sources:

SF Standard — Sam Altman attacks, AI extinction motivation