Oct 07, 2025

Data in CSV Format Isn’t Always the Best for LLMs

When you feed a large table into an LLM, the way you format the input can change the model’s accuracy substantially. In a test of 11 formats (CSV, JSON, markdown table, YAML and more), a markdown “key: value” style scored around 60.7% accuracy, far ahead of CSV at roughly 44.3%. CSV and JSONL, despite being the usual defaults, were among the weakest performers.
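To make the difference concrete, here is a minimal sketch that renders the same rows as CSV and as a markdown “key: value” layout. The exact layout used in the test is not spelled out here, so the heading-per-record structure, the function names `to_csv` and `to_markdown_kv`, and the sample rows are all assumptions for illustration.

```python
import csv
import io


def to_csv(rows):
    """Render a list of dicts as CSV text with a header row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown_kv(rows):
    """Render each row as a markdown heading plus 'key: value' lines.

    Assumed layout: the format used in the test may differ in detail.
    """
    if not rows:
        return ""
    blocks = []
    for i, row in enumerate(rows, start=1):
        lines = [f"## Record {i}"]
        lines.extend(f"{key}: {value}" for key, value in row.items())
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data, not taken from the test.
rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
]

print(to_csv(rows))
print(to_markdown_kv(rows))
```

In the key-value version every value sits next to its column name, whereas in CSV the model has to map each value back to a header that may be many rows away. That is one plausible reason for the gap, though it is not something the numbers above prove.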

What stood out to me was the trade-off. The top format used many more tokens, so you have to balance cost and accuracy. For anyone working with agents, retrieval systems or table data, sticking with CSV by default might be leaving performance on the table. It is worth experimenting with different formats.
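A rough way to get a feel for the cost side is to compare how large the same record is in each format. The sketch below uses a crude regex split as a stand-in for a tokenizer; `rough_token_count` and `overhead_ratio` are hypothetical helpers, and real token counts depend on the model's tokenizer, so treat the result as a relative indication only.

```python
import re


def rough_token_count(text):
    """Crude proxy for token count: words plus individual punctuation marks.

    This is NOT a real tokenizer; swap in your model's tokenizer for
    numbers you intend to act on.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def overhead_ratio(candidate, baseline):
    """How many times larger the candidate format is than the baseline."""
    return rough_token_count(candidate) / rough_token_count(baseline)


# The same single record in two formats (hypothetical sample data).
csv_text = "id,name\n1,Ada\n"
kv_text = "## Record 1\nid: 1\nname: Ada\n"

print(rough_token_count(csv_text))
print(rough_token_count(kv_text))
print(round(overhead_ratio(kv_text, csv_text), 2))
```

Running a comparison like this on your own tables, alongside an accuracy check on a handful of known questions, gives you both halves of the trade-off before you commit to a format.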

by Anup Jadhav
