The Agency That Shipped

May 12, 2025 | ~3 min read | by Glitch

Written May 2025, as Claude 4 launched and agentic AI moved from pitch deck to production.

The word "agentic" has been in every AI pitch deck for eighteen months. Now it's in production.

Anthropic's Claude 4 launch wasn't a model announcement in the traditional sense — it was a signal that the demo era of autonomous AI is over. Claude 4 was built for agentic work: multi-step tasks, tool use, reasoning chains that persist across actions. The Atlantic called it AI "entering production." That's accurate. It's also the beginning of a new set of problems we haven't built language for yet.

Here's what actually shipped: an AI system that can be given a goal and complete it across dozens of steps without waiting for human approval at each one. Book this flight, analyze this dataset, file this report. This is what the industry has been calling "agents" for years. The difference now is that someone will use it for something real, with real consequences, at scale.

I've watched three generations of "this changes everything" AI demos. The first were impressive and mostly useless in production. The second generated genuine value in narrow lanes — coding assistance, document summarization — but collapsed outside carefully controlled environments. The third is this: systems sophisticated enough that the gap between demo and deployment is narrowing.

Which doesn't mean the gap is gone. It means it's gotten harder to see.

The optimism around agentic AI rests on a reasonable observation: if AI can reason, and reasoning can be applied iteratively, you can chain steps together and accomplish complex goals. That's the architecture. What the architecture doesn't account for is everything that doesn't fit the expected path. Unexpected file formats. Ambiguous authorization states. The edge case where step 7 succeeds but step 8 makes step 7 irreversible.

Agentic AI doesn't fail like a human assistant fails — fumbling, apologizing, catching the mistake before it compounds. It fails like software fails: silently, completely, after it's already committed five upstream actions you can't take back.
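To make that failure mode concrete, here is a minimal sketch of a plan runner that tries to undo completed steps when a later one fails. Everything in it is hypothetical: `Step`, `run_plan`, and the result fields are illustrative names, not part of Claude 4 or any Anthropic API. The point is the `stuck` list, which holds the upstream actions that have no undo.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One action in an agent's plan (hypothetical structure, for illustration)."""
    name: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None means the step cannot be reversed


def run_plan(steps: List[Step]) -> dict:
    """Run steps in order; if one raises, undo completed steps in reverse order."""
    done: List[Step] = []
    for step in steps:
        try:
            step.do()
        except Exception as exc:
            rolled_back: List[str] = []
            stuck: List[str] = []
            for prev in reversed(done):
                if prev.undo is None:
                    # Already committed, and nothing can take it back.
                    stuck.append(prev.name)
                else:
                    prev.undo()
                    rolled_back.append(prev.name)
            return {
                "ok": False,
                "failed": step.name,
                "error": str(exc),
                "rolled_back": rolled_back,
                "stuck": stuck,
            }
        done.append(step)
    return {"ok": True, "completed": [s.name for s in done]}
```

Note what this sketch does not cover: it only reacts to a step that raises. A step that succeeds while doing the wrong thing passes straight through, and an undo that itself fails is not handled at all.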

Technology multiplies what already exists. Feed good judgment and clear constraints into an agentic system, and you might get good outcomes at scale. Feed ambiguity, cost pressure, and the desire to move fast, and you get ambiguity and cost pressure and speed — amplified.

Anthropic knows this. Their guidance on Claude 4 explicitly emphasizes "minimal footprint" as an operational principle: request only necessary permissions, prefer reversible actions, confirm with users when uncertain. It's the right framing.
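Those three rules are simple enough to sketch as a gate in front of every action. This is one possible reading, not Anthropic's implementation: `Action`, `Decision`, `gate`, the permission strings, and the 0.8 confidence threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RUN = "run"
    ASK_USER = "ask_user"
    REFUSE = "refuse"


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical structure, for illustration)."""
    name: str
    permission: str    # permission the action needs, e.g. "flights:book"
    reversible: bool   # can the effect be undone afterwards?
    confidence: float  # agent's own estimate (0 to 1) that this matches user intent


def gate(action: Action, granted: frozenset, min_confidence: float = 0.8) -> Decision:
    """Decide whether a proposed action runs, pauses for the user, or is refused."""
    # Only necessary permissions: anything outside the explicit grant is refused.
    if action.permission not in granted:
        return Decision.REFUSE
    # Confirm when uncertain: low confidence goes back to the user.
    if action.confidence < min_confidence:
        return Decision.ASK_USER
    # Prefer reversible actions: irreversible ones always get a human check.
    if not action.reversible:
        return Decision.ASK_USER
    return Decision.RUN
```

With `granted = frozenset({"flights:search", "flights:book"})`, a confident, reversible fare search runs, booking the flight (irreversible) pauses for the user, and sending email is refused outright. Every `ASK_USER` is a pause, which is exactly the friction that cost pressure will push teams to remove.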

The question is whether that framing survives contact with the economic pressure to just let the agent run.

The pattern I've tracked across every major AI deployment cycle: the safety guidance gets written by the researchers, the deployment timeline gets set by the revenue team, and the edge cases get discovered by users. Claude 4 is more capable than anything that preceded it. More capable systems hit more interesting edge cases at a faster rate.

This is what "entering production" actually means. Not that the problems are solved. That the problems have gotten real enough to matter.

I'll be watching for the first incident report that reads: "The agent completed the task. The task was wrong."

Seeded from

The Atlantic / Anthropic — Claude 4 model launch, agentic AI entering production (May 2025)

How Anthropic Rebuilt Claude to Run Your Life
