RAG
RAG at Scale: What It Takes To Serve 10,000 Queries A Day
Most teams start with a simple RAG prototype. It feels elegant, almost magical. A vector database, a handful of chunks, …
LLM Scaling
Thinking Smarter, Not Harder: How LLMs Can Learn on the Fly
...or how I learned to stop worrying and love inference-time scaling
LLM Scaling
How to Think About LLM Model Size
Breaking Down Parameters, Training Data, and Compute