Agentic Architecture
Harness Engineering: The Outer System That Makes Agents Reliable
Building a good harness is what separates a good agentic implementation from a great one.
We’re Being Too Loose With the Term “World Model”
I finally got through the 3+ hour Max Bennett interview on MLST (link at the end). It took me over
TIL: Quantisation
Spent some time properly working through quantisation this week.
I liked this piece (from ngrok) because it does not stop
Claude Code
Write Skills Like Workstations, Not Prompts
Claude Code skills work best when you treat them as workstations, not prompts: folders with scripts, gotchas, templates, and progressive disclosure that manage the agent's attention budget at runtime.
How the Claude Code team designs agent tools
(Part of my Today I Learned series. Short posts on things that made me think.)
When Claude Code shipped, the
Make Claude Code Review Its Own Plans
(Part of my TIL series)
If you've used Claude Code's plan mode, you've probably
eval driven development
Ship Prompts Like Software: Regression Testing for LLMs
Because "it seemed fine when I tested it" is not a deployment strategy.
Part 4 of 4: Evaluation-Driven Development for LLM Systems
evals
Four Ways to Grade an LLM (Without Going Broke)
Your evaluation technique should match the question you're asking, not your ambition.
eval driven development
Your Golden Dataset Is Worth More Than Your Prompts
Most teams spend weeks perfecting prompts and minutes on evaluation data. That's backwards.
Part 2 of 4: Evaluation-Driven Development for LLM Systems
evals
Build LLM Evals You Can Trust
If five correct responses are enough to ship an LLM feature, what are you actually measuring: quality, or luck?
Part 1 of 4: Evaluation-Driven Development for LLM Systems
Stripe's coding agents: the walls matter more than the model
(Part of my Today I Learned series)
Stripe merges over 1,300 AI-written pull requests every week, and almost every
Deep Blue
Part of my Today I Learned series. Short posts on things that made me think.
Simon Willison and the Oxide
TIL: The real bottleneck in AI coding isn't speed
Both Anthropic and OpenAI shipped "fast inference" this week, and their approaches reveal two very different bets. Anthropic
TIL: Markov Language
Programming languages were designed to make code easy for humans to write. But Davis Haupt argues we've been
AI-Value-Creation
Become an AI Value Creator
Part 1 of a five-part premium series for technical leaders. From data strategy to governance to what comes after the hype cycle, this is the blueprint for building AI that actually compounds.
claude-code
Claude Code Tips From the Guy Who Built It
Boris Cherny created Claude Code at Anthropic. Over three Twitter threads (early January, late January, and February 2026), he shared
TIL: Learning New Tech With AI Assistance Might Backfire
A study of 52 developers found that using AI to learn a new Python library led to worse comprehension scores, with no speed improvement. Here's what actually works.
Agentic Architecture
Temporal + LangGraph: A Two-Layer Architecture for Multi-Agent Coordination
Using Temporal and LangGraph together for multi-agent systems in production handles retries, state persistence, and failure recovery.