TIL: Quantisation

Spent some time properly working through quantisation this week.

I liked this piece (from ngrok) because it does not stop at “make the model smaller”. It gets into the actual mechanics: lower-bit representations, scale factors, dequantisation, and the trade-off between compression and error.
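To make the scale-factor round trip concrete, here is a minimal sketch of symmetric int8 quantisation and dequantisation in NumPy. This is my own toy reconstruction of the general mechanics, not code from the piece; the function names are made up.

```python
import numpy as np

def quantise_int8(x):
    # Symmetric (absmax) quantisation: map the largest magnitude to 127,
    # so every value shares one float scale factor.
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    # Recover approximate floats; the rounding error is at most scale / 2.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 1, 1000).astype(np.float32)

q, scale = quantise_int8(w)
w_hat = dequantise(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))
```

The compression/error trade-off is visible directly: the int8 tensor is a quarter the size of float32, and the worst-case reconstruction error is half the quantisation step (scale / 2), which grows with the tensor's dynamic range.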

I also implemented a small version of the workflow locally to make the ideas concrete for myself. The bit that stood out most was how differently symmetric and asymmetric quantisation behave once you actually look at the error distribution, rather than just the file size.
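The symmetric-vs-asymmetric gap shows up clearly on skewed data. A sketch of the comparison I mean, assuming a one-sided tensor like post-ReLU activations (the helper names here are hypothetical):

```python
import numpy as np

def symmetric_mse(x, bits=8):
    # Symmetric: range is [-absmax, absmax], zero-point fixed at 0.
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return np.mean((x - q * scale) ** 2)

def asymmetric_mse(x, bits=8):
    # Asymmetric: range is [min, max], shifted by a zero-point,
    # so no quantisation levels are wasted on values that never occur.
    qmax = 2 ** bits - 1  # 255 for uint8
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return np.mean((x - (q - zero_point) * scale) ** 2)

rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(0, 1, 10_000), 0)  # skewed: all non-negative

print("symmetric MSE: ", symmetric_mse(acts))
print("asymmetric MSE:", asymmetric_mse(acts))
```

For this non-negative tensor, symmetric quantisation wastes the entire negative half of its range, so its step size is roughly double the asymmetric one and its mean squared error is correspondingly higher. On a roughly zero-centred weight tensor the two schemes come out much closer, which is why the choice depends on what you are quantising.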

My main takeaway is that quantisation is really a precision-allocation problem. The question is not just how much you can compress a model, but how much numerical fidelity you can give up before your task stops working.
