Oct 04, 2025

AI agents are starting to do real work

Ethan Mollick argues that AIs have quietly crossed a line. OpenAI recently tested models on complex, expert-designed tasks that usually take humans four to seven hours. Humans still performed better, but the gap is shrinking fast. Most AI mistakes were about formatting or following instructions, not reasoning.

The standout example is Claude 4.5 replicating academic research on its own. Work that would have taken hours was done in minutes, hinting at how whole fields could change when repetitive but valuable tasks get automated.

It’s a reminder that the real shift isn’t just about replacing jobs. It’s about rethinking how we work with AI so we don’t drown in a sea of AI-generated busywork.

👉 Read Ethan’s full piece

Join engineers getting weekly insights on agents, RAG & production LLM systems

No spam, no sharing to third party. Only you and me.

AI agents are starting to do real work

by Anup Jadhav

Member discussion

More like this

Harness Engineering: The Outer System That Makes Agents Reliable

We’re Being Too Loose With the Term “World Model”

TIL: Quantisation

How the Claude Code team designs agent tools

Make Claude Code Review Its Own Plans

Stripe's coding agents: the walls matter more than the model