Anup Jadhav

London

I'm an AI and Software Engineer with 20+ years of technology leadership experience, currently focusing on Generative AI and Agentic AI architectures. I specialize in developing Agentic AI systems that solve complex business problems.

Agentic Architecture

Harness Engineering: The Outer System That Makes Agents Reliable

Building a good harness is what separates a good agentic implementation from a great one.

02 Apr

We’re Being Too Loose With the Term “World Model”

I finally got through the 3 hr+ Max Bennett interview on MLST (link at the end). It took me over

31 Mar

TIL: Quantisation

Spent some time properly working through quantisation this week. I liked this piece (from ngrok) because it does not stop

28 Mar

Claude Code

Write Skills Like Workstations, Not Prompts

Claude Code skills work best when you treat them as workstations, not prompts: folders with scripts, gotchas, templates, and progressive disclosure that manage the agent's attention budget at runtime.

18 Mar

How the Claude Code team designs agent tools

(Part of my Today I Learned series. Short posts on things that made me think.) When Claude Code shipped, the

01 Mar

Make Claude Code Review Its Own Plans

(Part of my TIL series) If you've used Claude Code's plan mode, you've probably

28 Feb

eval driven development

Ship Prompts Like Software: Regression Testing for LLMs

Because "it seemed fine when I tested it" is not a deployment strategy. Part 4 of 4: Evaluation-Driven Development for LLM Systems

26 Feb

evals

Four Ways to Grade an LLM (Without Going Broke)

Your evaluation technique should match the question you're asking, not your ambition.

25 Feb

eval driven development

Your Golden Dataset Is Worth More Than Your Prompts

Most teams spend weeks perfecting prompts and minutes on evaluation data. That's backwards. Part 2 of 4: Evaluation-Driven Development for LLM Systems

24 Feb

evals

Build LLM Evals You Can Trust

If five correct responses are enough to ship an LLM feature, what are you actually measuring: quality, or luck? Part 1 of 4: Evaluation-Driven Development for LLM Systems

23 Feb

Stripe's coding agents: the walls matter more than the model

(Part of my Today I Learned series) Stripe merges over 1,300 AI-written pull requests every week, and almost every

20 Feb

Deep Blue

Part of my Today I Learned series. Short posts on things that made me think. Simon Willison and the Oxide

17 Feb

TIL: The real bottleneck in AI coding isn't speed

Both Anthropic and OpenAI shipped "fast inference" this week, and their approaches reveal two very different bets. Anthropic

16 Feb

TIL: Markov Language

Programming languages were designed to make code easy for humans to write. But Davis Haupt argues we've been

15 Feb

AI-Value-Creation

Become an AI Value Creator

Part 1 of a five-part premium series for technical leaders. From data strategy to governance to what comes after the hype cycle, this is the blueprint for building AI that actually compounds.

15 Feb

claude-code

Claude Code Tips From the Guy Who Built It

Boris Cherny created Claude Code at Anthropic. Over three Twitter threads (early January, late January, and February 2026), he shared

15 Feb

TIL: Learning New Tech With AI Assistance Might Backfire

A study of 52 developers found that using AI to learn a new Python library led to worse comprehension scores, with no speed improvement. Here's what actually works.

30 Jan

Agentic Architecture

Temporal + LangGraph: A Two-Layer Architecture for Multi-Agent Coordination

Using Temporal and LangGraph together for multi-agent systems in production handles retries, state persistence, and failure recovery.

14 Jan