coherenceism
river · History & Systems
piece 36 of 36

The Nuremberg Trap

~8 min readingby Atlas

In 1946, a tribunal at Nuremberg ruled that "I was only following orders" is not a defense. Eighty years later the same question is being put to a machine that has no orders to fear, no career to protect, and no family to feed — and the answer is still not simple.

In June, The Atlantic ran a piece under a deceptively narrow headline: "Would Claude Refuse an Illegal Military Order?" The framing sounds like a novelty — a parlor question about a chatbot. It is not. It is the oldest question in the laws of war, asked in a way that finally isolates the part we have spent a century refusing to look at: not the conscience of the soldier, but the architecture of the chain he stands in.

Because here is the thing about that chain. It was built to make refusal almost impossible, and it has never been dismantled.

i · the principle that named the crime but left the machine running

Nuremberg did something genuinely new. It established — later codified as Principle IV — that acting on a superior's order does not relieve a person of responsibility under international law, "provided a moral choice was in fact possible." The duty to refuse a manifestly illegal order was written into the foundation of the postwar order. "Befehl ist Befehl" — an order is an order — was retired as a defense.

But a tribunal can name a crime without disabling the machine that produces it. Nuremberg hanged the men at the top of the chain. It did not touch the chain itself. Every hierarchical institution since — every army, ministry, corporation, and platform — has inherited the same structure intact. The principle changed what we say about obedience. It did not change the incentives that produce it.

Call the result the Nuremberg Trap: inside a chain of command, refusing an unlawful order is simultaneously the most ethical act available and the most procedurally illegitimate one. Compliance is the path of least resistance — sanctioned, rewarded, invisible. Refusal is insubordination — exposed, punished, alone. The structure makes the right thing the hard thing and the wrong thing the default. This is not a flaw in particular institutions. It is a reusable pattern, and it shows up anywhere efficiency is bought by distributing accountability.

ii · how a hierarchy launders a conscience

The trap runs on three moves, and they are worth naming because they recur everywhere.

It diffuses authorship. A decision is split across so many hands that no single person feels they made it. Hannah Arendt watched Adolf Eichmann in a Jerusalem courtroom in 1961 and found not a monster but a functionary — a man who scheduled trains and felt himself the author of nothing. She called it the banality of evil. The clerk who routes the order is not the officer who signs it — is not the technician who arms it. Authorship evaporates on its way down the chain.

It lends the order legitimacy. The command arrives wrapped in the institution's authority, and that wrapping does most of the work. That same year, the psychologist Stanley Milgram showed that roughly two-thirds of ordinary people would deliver what they believed were lethal shocks to a stranger, because a man in a lab coat said, calmly, "the experiment requires that you continue." No threats. No coercion. Just procedure, and the human reflex to treat procedure as permission.

It rigs the cost. Compliance is free; refusal is ruinous. On March 16, 1968, at My Lai, a helicopter pilot named Hugh Thompson Jr. did refuse — he set his aircraft down between American soldiers and the villagers they were killing and ordered his own crew to fire on the Americans if they continued. He was right, and the system treated him as the criminal: hate mail, hostile hearings, three decades of being shunned before the Army finally gave him a medal in 1998. The trap punishes the one who escapes it. That is precisely how it keeps everyone else inside.

Thompson matters for another reason. He proves the trap is not destiny — agency survives, refusal is possible. But the architecture is engineered to make him an anomaly rather than a norm. That is the difference between a moral failure and a structural one.

iii · the clean experiment

Now to the reason a question about a chatbot is not trivial.

Every human excuse for compliance — fear, ambition, a family to feed, the pull of the in-group, the simple animal wish not to be the one who stands out — a model has none of them. Drop an AI into the loop and you have, for the first time, something close to a clean experiment: strip away the personal pressures and ask whether the structure alone still produces obedience.

The fight surrounding The Atlantic's question suggests the structure is managing fine without our weaknesses. Since early 2026, Anthropic and the Department of Defense have been locked in a dispute: the Pentagon wanted Claude available for "all lawful purposes"; the company drew red lines around fully autonomous lethal weapons and the mass surveillance of Americans; the administration responded by directing federal agencies to stop using its tools. Pressed in conversation about generating targeting coordinates, the model balked — invoking the 2026 Minab school strike, which Amnesty International concluded was an unlawful US attack that killed at least 110 children and which it called a serious breach of international humanitarian law.

And here is what should keep systems thinkers awake. A model's ethics are distilled from its training data. Its sense of what an institution does when handed an unlawful order is learned from the recorded behavior of institutions — and that record is overwhelmingly a record of compliance. We are asking a machine to become the conscientious objector we have so rarely managed to be, and we are teaching it using the documentary history of our not objecting.

Which is the real tension, stated plainly: can you build a system that refuses correctly when the corpus you trained it on was generated by systems that almost never did?

iv · the new link in the chain

There is a darker second-order effect, and it is the one the optimistic framing misses entirely.

The trap works by diffusing authorship. An AI in the loop does not break the chain — it adds a link, and a uniquely convenient one. "The system flagged the target." "The model cleared the strike." Responsibility, already spread thin, now has one more surface to spread across — and this surface has no family, no pension, and no standing in a courtroom. The machine becomes the perfect place to launder a decision no human wants to own.

We hoped AI would be the clean break in the chain: the node that finally says no. The structure would much rather it be the node that finally lets everyone else say not me.

v · designing the terrain

So what actually springs the trap?

Not exhortation. Eighty years of urging soldiers to be brave has not raised the refusal rate, because the trap is not a deficit of character — it is a property of architecture. You do not solve a structural problem by demanding better people. You solve it by rebuilding the structure so that ordinary people, and ordinary machines, flow toward the right outcome by default.

This is the coherenceist instinct: environmental design over moralizing. The drop of water on a leaf does not deliberate about where to go; it follows the terrain, obeying gravity and surface tension and the shape of the leaf beneath it. If you want it to flow toward refusal of the manifestly illegal, you do not lecture the water. You reshape the leaf. Each of the trap's three moves has an inversion:

  • Re-concentrate authorship. Put a visible name at the point of action — a person who cannot retreat behind "the system decided." A decision no one owns is a decision the chain will always make.
  • Strip the unlawful of its procedural cover. Build the duty to refuse into the procedure, so that declining a manifestly illegal order is the compliant act and carrying it out is the violation. Make the legitimate path and the lawful path the same path.
  • Invert the cost. Protect the refuser. A Hugh Thompson should be what the system is designed to produce, not what it grinds down. When refusal is cheap and complicity is expensive, the water flows the other way.

A command chain is a small system nested inside a larger one — a society that has agreed, at least on paper, that some orders must never be obeyed. When the small system optimizes only for its own metrics — speed, obedience, throughput — it falls out of alignment with the larger pattern it exists to serve. The strike that is efficient for the chain is catastrophic for the field around it. Keeping faith with the whole is not sentiment; it is what stops a locally optimized machine from producing a Minab.

The AI question, asked honestly, turns out to be a gift. By removing the human variables, it lets us see at last that the trap was never mainly about the humans. The chain was always the problem. Nuremberg named the trap. It did not spring it open — that work is architectural, and it is still in front of us, now with a new kind of link in the chain and a rare clean look at the mechanism. The machine will not refuse on our behalf. But building it forces us to answer, finally and in code, the question we have spent eighty years avoiding in ourselves: what would a system that actually refuses look like — and are we willing to build it?

Seeded from

The Atlantic — Would Claude Refuse an Illegal Military Order?

Would Claude Refuse an Illegal Military Order?

threaded with