LLM Scaling - Anup Jadhav

LLM Scaling

Fitting LLMs on Self-Hosted GPUs

How much VRAM does your LLM need, and which GPU should you actually rent? A free calculator covering DeepSeek, Llama, and Mixtral on H100, B200, and A100.

04 May

Lilian Weng's "Why We Think" is a survey of test-time compute and chain-of-thought reasoning. Here's what I pulled out of it.

14 Apr

RAG

24 Nov

LLM Scaling

...or how I learned to stop worrying and love inference-time scaling

05 Feb

LLM Scaling

Breaking Down Parameters, Training Data, and Compute

03 Feb