Data in CSV Format Isn’t Always the Best for LLMs

When you feed a large table into an LLM, the input format can noticeably change the model’s accuracy. In a test of 11 formats (including CSV, JSON, Markdown table, and YAML), a Markdown “key: value” style scored around 60.7% accuracy, well ahead of CSV at roughly 44.3%. CSV and JSONL, despite being the usual defaults, were among the weakest performers.
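To make the difference concrete, here is a minimal Python sketch that renders the same records as CSV and as a Markdown "key: value" block. The sample records, the function names, and the exact "## Record N" layout are my own assumptions for illustration; the layout used in the test may differ.

```python
import csv
import io

# Hypothetical sample records; the data used in the test is not shown here.
records = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 70000},
]


def to_csv(rows):
    """Serialize rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown_kv(rows):
    """Serialize rows as one Markdown section per record with 'key: value' lines.

    The heading text and spacing are assumptions, not the tested layout.
    """
    blocks = []
    for i, row in enumerate(rows, start=1):
        lines = [f"## Record {i}"]
        lines += [f"{key}: {value}" for key, value in row.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(to_csv(records))
print(to_markdown_kv(records))
```

In the key-value version every value sits next to its column name, whereas in CSV the model has to map each value back to a header that may be many lines away.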

What stood out to me was the trade-off. The top format used many more tokens, so you have to balance cost against accuracy. For anyone working with agents, retrieval systems, or tabular data, sticking with CSV by default might be leaving performance on the table. It is worth experimenting with different formats.
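One rough way to see the cost side of that trade-off is to compare how large each rendering is. The sketch below uses a crude regex-based count as a stand-in for tokens. The `rough_token_count` helper and the two-row sample are hypothetical, and real token counts depend on the model's tokenizer, so treat the numbers as relative only.

```python
import re


def rough_token_count(text):
    """Crude proxy for tokens: runs of word characters plus single punctuation marks.

    This is an assumption for illustration; use the model's own tokenizer
    for real budgeting.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical two-row table rendered in both formats.
csv_text = "id,name,salary\n1,Alice,95000\n2,Bob,70000\n"
kv_text = (
    "## Record 1\nid: 1\nname: Alice\nsalary: 95000\n\n"
    "## Record 2\nid: 2\nname: Bob\nsalary: 70000\n"
)

csv_tokens = rough_token_count(csv_text)
kv_tokens = rough_token_count(kv_text)
print(f"CSV: {csv_tokens}, key-value: {kv_tokens}, ratio: {kv_tokens / csv_tokens:.2f}")
```

The key-value form repeats every column name in every record, so its overhead grows with the number of rows, while CSV pays for the header only once.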
