Agentic Architecture
Harness Engineering: The Outer System That Makes Agents Reliable
Building a good harness is what separates a good agentic implementation from a great one.
eval driven development
Ship Prompts Like Software: Regression Testing for LLMs
Because "it seemed fine when I tested it" is not a deployment strategy.
Part 4 of 4: Evaluation-Driven Development for LLM Systems
evals
Four Ways to Grade an LLM (Without Going Broke)
Your evaluation technique should match the question you're asking, not your ambition.
eval driven development
Your Golden Dataset Is Worth More Than Your Prompts
Most teams spend weeks perfecting prompts and minutes on evaluation data. That's backwards.
Part 2 of 4: Evaluation-Driven Development for LLM Systems
evals
Build LLM Evals You Can Trust
If five correct responses are enough to ship an LLM feature, what are you actually measuring: quality, or luck?
Part 1 of 4: Evaluation-Driven Development for LLM Systems
Agentic Architecture
Temporal + LangGraph: A Two-Layer Architecture for Multi-Agent Coordination
Using Temporal and LangGraph together for multi-agent systems in production handles retries, state persistence, and failure recovery.
Agentic Architecture
Understanding Generative UI
A layered walkthrough of the Generative UI paper everyone is talking about.
Agentic Architecture
Eliza Redux: A Real-Time Voice AI Crisis Support Agent
I built a crisis support voice AI agent in roughly 90 minutes at a voice AI hackathon and won.
RAG
Rethinking RAG: Meta’s REFRAG
For the past few years, retrieval-augmented generation (RAG) has been the workhorse architecture for grounded LLM applications. You retrieve relevant
AI Agents
MCP + A2A: The Protocols Making AI Agents Actually Work Together
I presented this at the AI Engineer meetup in London. It is a short, practical overview of why agent interoperability
Agentic Architecture
AI Agent Use Case Evaluation: From Risk Assessment to Implementation
When I first started evaluating Agentforce implementations, I made the classic engineer's mistake: jumping straight into technical capabilities