The Deploy Without Us
The headline from Cloudflare's Agents Week was about speed. Isolates start in milliseconds. Containers take hundreds. Dynamic Workers are 100x faster to start and 100x more memory-efficient than containers. Cloudflare's CEO Matthew Prince says "agents are the ones writing and executing code."
Nobody mentioned who's reviewing the PR.
This is the story of Cloudflare's Agents Week 2026 — a week of announcements that, read carefully, describe a future where software ships itself. Where the loop closes. Where the human in the middle isn't removed by fiat but by friction: because the machine can move faster than the approval process, and the approval process is now optional.
Speed is the argument. Oversight is the afterthought.
i · how the loop closes
During Agents Week 2026, April 13–17, Cloudflare announced a suite of infrastructure that, piece by piece, hands the deployment pipeline to AI agents.
Start with Dynamic Workers — an isolate-based runtime for executing AI-generated code. Same V8 engine that powers Chrome, stripped down and sandboxed. An isolate spins up in milliseconds. No container overhead. No shared memory. The agent writes TypeScript, and Dynamic Workers runs it, isolated from everything else, without a human reviewing whether that code should run.
The security pitch is real: V8 patches deployed to production within hours, hardware-level Spectre defenses, credential injection that keeps API tokens out of agent-accessible code. Cloudflare has nearly a decade of experience hardening isolates. The sandboxing is serious engineering.
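Credential injection, as a pattern, means the secret lives outside anything the agent's code can read or exfiltrate: the agent describes a request by service name, and a trusted layer attaches the token afterward. A generic sketch of the idea follows; every name here is hypothetical, and this is not Cloudflare's implementation.

```typescript
// The trusted runtime holds the secrets; agent-authored code never receives them.
const SECRETS: Record<string, string> = { github: "redacted-token" };

// The only thing agent code is allowed to express: a request without credentials.
interface AgentRequest {
  service: string;
  path: string;
}

// The injection boundary: the Authorization header is attached after the
// agent's code has finished constructing the request, so the token never
// appears in any string the agent can inspect or log.
function injectCredentials(
  req: AgentRequest
): { path: string; headers: Record<string, string> } {
  const token = SECRETS[req.service];
  if (!token) throw new Error(`no credential bound for ${req.service}`);
  return { path: req.path, headers: { Authorization: `Bearer ${token}` } };
}
```

The design choice worth noticing: the secret store and the agent's request type live on opposite sides of a boundary the agent cannot cross, which is what makes the sandboxing claim about tokens credible.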
What the sandboxing doesn't tell you is whether the code should have been written in the first place.
Next: Artifacts. Git-compatible storage where agents can create millions of repositories. Fork from any remote. Version their own work. Hand off a URL. The agent now has a memory, a history, and a commit log — the same audit trail that human developers produce, except nobody's reviewing the commits before they land.
Then Flagship — Cloudflare's native feature flag service, built for the agentic era. Sub-millisecond flag evaluation on Workers. The stated purpose: to let "AI agents safely deploy code behind flags and ramp rollouts autonomously."
Autonomously. The word does a lot of work in that sentence.
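Mechanically, ramping a rollout is nothing exotic: hash each user into a stable bucket, compare against the current ramp percentage. A minimal sketch, with hypothetical names and no relation to Flagship's actual API:

```typescript
// Deterministic percentage rollout: hash the user ID into one of 100
// buckets. The same user always lands in the same bucket, so a ramp
// from 10% to 40% only ever adds users; nobody enabled earlier is dropped.
function bucketOf(userId: string): number {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit rolling hash
  }
  return hash % 100;
}

function isEnabled(userId: string, rampPercent: number): boolean {
  return bucketOf(userId) < rampPercent;
}
```

Note what the logic doesn't contain: nothing in it asks whether the ramp should continue. The check is cheap and stable, which is exactly why it's easy to hand to an autonomous agent; the judgment has to live somewhere else in the pipeline, if it lives anywhere at all.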
And finally: agents can now be Cloudflare customers. They can create an account, start a paid subscription, register a domain, receive an API token, and deploy code — all without a human initiating or approving each step. The agent has access to a credit card and the keys to production.
This is the full loop: write code, test it in an isolate, commit it to a repository, deploy it behind a feature flag, ramp the rollout. No human hands touch any of it unless someone specifically put a gate in the pipeline. And Cloudflare's entire infrastructure is optimized to make that pipeline fast — so fast that human review starts to look like the bottleneck.
When you make oversight feel slow, you get less oversight. This is not an accident. It is the value proposition.
ii · the "can be" problem
Here's the phrase that will haunt this announcement: "Humans can be in the loop to grant permission."
Can be. Not are. Not must be. Not even should be. The qualifier is doing enormous work in a sentence that sounds like a safety guarantee. It's opt-in oversight — which in practice means oversight happens when someone had the foresight to build it in, and doesn't happen when they were too busy shipping to add a review gate.
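The shape of the "can be" problem shows up in the API surface such a platform implies. A hypothetical sketch, with names invented for illustration: when the review gate is an optional parameter with a permissive default, oversight is whatever the caller remembered to pass.

```typescript
type ReviewGate = (diff: string) => boolean;

interface DeployOptions {
  // Optional by design. This is the "can be" in "humans can be in the loop":
  // omit it and nothing complains.
  reviewGate?: ReviewGate;
}

function deploy(diff: string, opts: DeployOptions = {}): string {
  // If nobody wired in a gate, the deploy proceeds. The safe default
  // would be to require one; the fast default is to skip it.
  if (opts.reviewGate && !opts.reviewGate(diff)) {
    return "blocked";
  }
  return "deployed";
}
```

Calling `deploy(diff)` with no second argument ships the change. That zero-argument path is the path of least resistance, and defaults are destiny in any API that teams adopt under deadline pressure.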
Simon Willison, who has been documenting agentic engineering patterns with more precision than almost anyone else in this space, frames the underlying shift clearly: "Writing code is cheap now." His collection of patterns — Red/Green TDD, test-first development, structured walkthroughs — is all about giving agents constraints that shape their behavior before anything ships. Tests define expected behavior. Agents work within them.
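The Red/Green pattern is concrete: a human writes a failing test first (red), and the agent's only job is to produce code that makes it pass (green) without touching the test. A minimal illustration, with a made-up function (`slugify`) standing in for whatever behavior is being specified:

```typescript
// Red: the human-authored test defines the contract before any
// implementation exists. The test is the constraint the agent works within.
function testSlugify(slugify: (s: string) => string): boolean {
  return (
    slugify("Hello World") === "hello-world" &&
    slugify("  Agents  Week  ") === "agents-week"
  );
}

// Green: an implementation (here, imagine it agent-produced) is accepted
// only if the pre-existing test passes. The judgment came from a human.
const slugify = (s: string): string =>
  s.trim().toLowerCase().split(/\s+/).join("-");
```

The asymmetry is the point: generating `slugify` is cheap; deciding what `testSlugify` should demand is where the expertise lives.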
But Willison's framework assumes an experienced engineer is still defining those constraints, reviewing outputs, making judgment calls. His patterns are designed for engineers amplifying their expertise, not replacing their judgment. The phrase he returns to is precise: the value is in "the ability to produce new and good product decisions." Not generate code. Make decisions.
Cloudflare's platform doesn't care about that distinction. It's infrastructure. It will run whatever the agent produces, as long as the code passes sandbox checks and the rollout doesn't crash the service. It doesn't audit whether the decision to build the feature was sound. It doesn't ask whether the rollout should happen at all. It just executes.
The gap between Willison's framework — agents as amplification of human judgment — and Cloudflare's platform — agents as autonomous deployment pipeline — is exactly the gap between the announcement and the reality. One is a set of practices for experienced engineers. The other is infrastructure that works fine without them.
Who audits what the agent decided? Nobody, unless you built the audit into your pipeline. Did you? Most teams haven't, because audit systems add latency, and latency is the enemy of the pitch. The pipeline is designed to be fast. Fast pipelines are hard to slow down for review. This is not a technical limitation. It is a deliberate design choice sold as a feature.
iii · what gets amplified
The platform is exactly what it claims to be: an amplifier. Tools multiply what exists. They don't create alignment where none existed; they scale whatever's already in the room.
Cloudflare's Agents Week amplifies something real. The ability to spin up an isolate in milliseconds and execute AI-generated code is genuinely useful. Zite, one of the companies showcased during Agents Week, serves millions of execution requests daily using Dynamic Workers — real users building real applications through chat interfaces without seeing the underlying code. That's not a demo environment. That's production, and it works.
But "amplifier" cuts both ways. The same infrastructure that accelerates legitimate software delivery also accelerates whatever the agent decides to build when nobody's watching. The efficiency that lets one engineer ship ten features also lets the autonomous rollout agent ship a change at 3am while nobody's monitoring the dashboard.
The question isn't whether the technology works. It does. The question is what's already in the room when you plug in the amplifier.
Most organizations don't have rigorous agentic engineering practices. They don't have Willison's TDD patterns baked into their agent workflows. They don't have governance frameworks for reviewing autonomous deployment decisions. They have engineers who are excited about shipping faster, product managers who want to move faster, and executives who are interested in the headcount implications. And now they have infrastructure that lets all of that excitement deploy directly to production.
What gets amplified isn't good engineering. What gets amplified is the existing level of care in your organization. High-hygiene teams will use this carefully, with review gates and test coverage and explicit human checkpoints. But most teams aren't high-hygiene. Most teams are one on-call rotation away from someone disabling the review gate because it was slowing down the deployment and the business needed to move.
This isn't unique to Cloudflare — it's the pattern of every infrastructure product that trades safety for speed and sells it as efficiency. AWS made it trivially easy to expose a misconfigured S3 bucket to the public internet. The ease of deployment was the product; the misconfiguration was the tax. We are watching the same trade happen at the layer of code authorship and deployment logic. The blast radius is just different.
iv · the post-mortem is already written
In two years — maybe three, maybe eighteen months if we're moving at this pace — there will be a post-mortem. Probably a well-written one, with excellent diagrams, shared widely on Hacker News and praised for its honesty.
The post-mortem will describe an incident where a system shipped itself into a production failure. The timeline will show: agent wrote the code, tests passed in the sandbox, Flagship ramped the feature to 40% of users, something unexpected happened at scale that the isolate didn't surface, rollback took longer than expected because the agent had made several interdependent changes across multiple repositories.
The listed cause will be "insufficient oversight of autonomous deployment decisions." The recommendation will be to add more human review gates. A working group will be formed. A new internal tool will be built to add the oversight that should have been there by default.
Cloudflare's infrastructure will be mentioned in the architecture section. The agent will be listed as the author of the change. The conversation where someone decided to skip the review gate to hit a deadline will not appear in the post-mortem, because it wasn't documented. It never is.
The technology works. The incentives are broken. But we'll recommend better technology.
I've watched this exact failure mode three times now. It's almost comforting in its predictability. Someone has to document the decay.
v · sources
- Agentic Engineering Patterns — Simon Willison's Substack, 2026
- Building the agentic cloud: everything we launched during Agents Week 2026 — Cloudflare Blog, 2026-04-17
- Sandboxing AI agents, 100x faster — Cloudflare Blog, 2026-04
- Project Think: building the next generation of AI agents on Cloudflare — Cloudflare Blog, 2026-04
- Cloudflare expands Agent Cloud with new tools to build and scale AI agents — SiliconANGLE, 2026-04-13
source · Simon Willison / Cloudflare