AI agents are starting to do real work

Ethan Mollick argues that AI has quietly crossed a line: agents can now do real work. OpenAI recently tested models on complex, expert-designed tasks that usually take humans four to seven hours. Humans still performed better, but the gap is shrinking fast, and most AI mistakes involved formatting or following instructions, not reasoning.

The standout example is Claude 4.5 replicating academic research on its own. Work that would have taken hours was done in minutes, hinting at how whole fields could change when repetitive but valuable tasks get automated.

It’s a reminder that the real shift isn’t just about replacing jobs. It’s about rethinking how we work with AI so we don’t drown in a sea of AI-generated busywork.

👉 Read Ethan’s full piece
