eval driven development
Your Golden Dataset Is Worth More Than Your Prompts
Most teams spend weeks perfecting prompts and minutes on evaluation data. That's backwards.
Part 2 of 4: Evaluation-Driven Development for LLM Systems
evals
Build LLM Evals You Can Trust
If five correct responses are enough to ship an LLM feature, what are you actually measuring: quality, or luck?
Part 1 of 4: Evaluation-Driven Development for LLM Systems
Agentic Architecture
Temporal + LangGraph: A Two-Layer Architecture for Multi-Agent Coordination
Using Temporal and LangGraph together for multi-agent systems in production handles retries, state persistence, and failure recovery.
Agentic Architecture
Understanding Generative UI
A layered walkthrough of the Generative UI paper everyone is talking about
Agentic Architecture
Eliza Redux: A Real-Time Voice AI Crisis Support Agent
I built a crisis support voice AI agent in roughly 90 minutes at a voice AI hackathon and won.
RAG
Rethinking RAG: Meta’s REFRAG
For the past few years, retrieval-augmented generation (RAG) has been the workhorse architecture for grounded LLM applications. You retrieve relevant
AI Agents
MCP + A2A: The Protocols Making AI Agents Actually Work Together
I presented this at the AI Engineer meetup in London. It is a short, practical overview of why agent interoperability
Agentic Architecture
AI Agent Use Case Evaluation: From Risk Assessment to Implementation
When I first started evaluating Agentforce implementations, I made the classic engineer's mistake: jumping straight into technical capabilities