Stripe's coding agents: the walls matter more than the model

(Part of my Today I Learned series)

Stripe merges over 1,300 AI-written pull requests every week, and almost every headline about it is missing the actual point.

The temptation is to read this as proof that models have got good enough to ship production code unsupervised. That reading gets it backwards. Stripe built their "minions" system around deliberate constraint. They call the core design pattern "blueprints": orchestration flows that alternate between fixed, deterministic code nodes and open-ended agent loops. Their write-up puts it plainly: "putting LLMs into contained boxes compounds into system-wide reliability upside." The model does not run the system; the system runs the model. Each minion pulls from a curated slice of Stripe's MCP toolset, gets at most two rounds of CI, and terminates at a pull request. Engineers can still intervene or work alongside it, but the agent produces the whole branch without hand-holding.

They built this in-house rather than adopting off-the-shelf agents because their codebase is hundreds of millions of lines of mostly Ruby, full of proprietary libraries and compliance constraints that generic agents cannot navigate. Context is not optional. It is the whole problem.
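To make the "contained boxes" idea concrete, here is a minimal Python sketch of a blueprint-style run. It assumes a flow of one deterministic context-gathering node, one agent step, and a deterministic CI check. Everything in it (run_blueprint, RunState, PullRequest, MAX_CI_ROUNDS and the stub callables) is hypothetical and is not Stripe's code; the "agent" is a stand-in function, not a model call. What it tries to show is the shape: the loop, the cap and the exit live in ordinary code, and the agent only fills in the diff.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical state handed from node to node in the blueprint.
@dataclass
class RunState:
    task: str
    diff: str = ""
    ci_rounds: int = 0
    ci_passed: bool = False
    log: List[str] = field(default_factory=list)

# Hypothetical output: every run ends here, pass or fail.
@dataclass
class PullRequest:
    diff: str
    ci_passed: bool
    needs_human_review: bool = True  # never set to False anywhere below

MAX_CI_ROUNDS = 2  # the two-round cap described in the post

def run_blueprint(
    state: RunState,
    gather_context: Callable[[RunState], None],            # deterministic node
    agent_edit: Callable[[RunState, Optional[str]], str],  # open-ended agent step
    run_ci: Callable[[str], Optional[str]],                # deterministic: failure text or None
) -> PullRequest:
    gather_context(state)
    feedback: Optional[str] = None
    # The agent is only ever invoked inside this bounded loop.
    while state.ci_rounds < MAX_CI_ROUNDS:
        state.diff = agent_edit(state, feedback)
        state.ci_rounds += 1
        failure = run_ci(state.diff)
        status = "pass" if failure is None else "fail"
        state.log.append(f"ci round {state.ci_rounds}: {status}")
        if failure is None:
            state.ci_passed = True
            break
        feedback = failure  # CI output becomes the agent's next input
    # Terminate at a PR either way; a human reviewer decides what merges.
    return PullRequest(diff=state.diff, ci_passed=state.ci_passed)

# Usage with stub callables (all hypothetical):
def fake_context(s: RunState) -> None:
    s.log.append("context: curated tool slice loaded")

def fake_agent(s: RunState, feedback: Optional[str]) -> str:
    return "patch v2" if feedback else "patch v1"

def fake_ci(diff: str) -> Optional[str]:
    return None if diff == "patch v2" else "test_foo failed"

pr = run_blueprint(RunState(task="fix flaky test"), fake_context, fake_agent, fake_ci)
print(pr.ci_passed, pr.diff)  # True patch v2
```

The design choice worth noticing is that run_blueprint returns a PullRequest whether or not CI passed: running out of CI rounds ends the run rather than extending it, and the human review flag is not something the agent can switch off.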

The human review gate at the end is not a formality either. It is load-bearing. A CodeRabbit analysis of real production PRs found that AI-authored code introduces 1.75x more logic errors and 2.74x more XSS vulnerabilities than human-written code. Stripe's system is not immune to that; it is designed around it. The insight that keeps landing for me is this: the unglamorous parts of the architecture (the deterministic nodes, the two-round CI cap, the mandatory reviewer) are doing more work than the model is. Reliability at scale comes from knowing precisely where an LLM will fail and building the walls before it gets there.

Stripe Dev Blog: Minions
