AI Agents Are Getting Twice as Capable Every 7 Months

Why Agent Orchestration Is the Next AI Breakthrough

Jun 10, 2025

What if I told you there was a "Moore’s Law" for AI agents?

A new study from METR [1] just revealed exactly that.
The time horizon of AI agents is doubling every seven months.

The time horizon is a simple but powerful idea. It is the length of a task, measured by how long it takes a skilled human, that an AI model can complete with a 50 percent success rate.

This growth is astonishing, and it has major implications for those of us building agentic systems.

What the Research Shows

METR tracked how state-of-the-art AI models perform across a wide range of software and reasoning tasks, from quick decisions to complex multi-step challenges.

They benchmarked each task by asking: How long does this take a skilled human to complete? Then they tested AI models on the same tasks to find the current limits of reliable performance.
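To make the idea of a 50 percent horizon concrete, here is a minimal sketch. It is a simplification and not METR's actual fitting procedure: it takes success rates grouped by human task length and interpolates, on a log-time scale, the point where success crosses 50 percent. The function name and the sample numbers are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Optional, Tuple

def fifty_percent_horizon(buckets: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate the task length (in human minutes) at which success hits 50%.

    `buckets` holds (human_minutes, success_rate) pairs. The estimate is a
    linear interpolation in log2(minutes) between the two neighbouring
    buckets where the success rate falls from >= 0.5 to < 0.5.
    Returns None if the data never crosses 50%.
    """
    ordered = sorted(buckets)
    for (t_lo, p_lo), (t_hi, p_hi) in zip(ordered, ordered[1:]):
        if p_lo >= 0.5 > p_hi:
            frac = (p_lo - 0.5) / (p_lo - p_hi)
            log_t = math.log2(t_lo) + frac * (math.log2(t_hi) - math.log2(t_lo))
            return 2 ** log_t
    return None

if __name__ == "__main__":
    # Hypothetical data: success drops as tasks get longer.
    sample = [(4, 0.9), (16, 0.7), (64, 0.3)]
    print(f"Estimated 50% horizon: {fifty_percent_horizon(sample):.0f} minutes")
```

Working on a log scale matters here because task lengths span several orders of magnitude, from seconds to hours.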

Figure: Length of tasks AIs can do is doubling every 7 months (source: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/)

Here is what they found:

- The time horizon of frontier AI models has been doubling every 7 months since 2019.
- In 2019, models could reliably complete tasks lasting only a few seconds.
- Today’s top models (like Claude 3.7 Sonnet) can complete tasks that take 50 to 60 minutes of human effort about half the time.
- Performance on tasks longer than 4 hours still drops below 10 percent, but the growth trend is clear.
- If this curve continues, we could see agents capable of managing week-long projects in 2 to 4 years, and month-long projects by the end of the decade.
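The extrapolation is simple compound-growth arithmetic, and it is worth running yourself. The sketch below assumes a constant 7-month doubling time, a starting horizon of 1 hour, and that a "week-long" or "month-long" project means about 40 and 167 working hours respectively. Those working-hour figures are my assumptions, not numbers stated above.

```python
import math

DOUBLING_MONTHS = 7.0     # doubling time reported by the study
WORK_WEEK_HOURS = 40.0    # assumption: one working week
WORK_MONTH_HOURS = 167.0  # assumption: roughly 40 hours x 52 weeks / 12 months

def projected_horizon_hours(current_hours: float, months_ahead: float,
                            doubling_months: float = DOUBLING_MONTHS) -> float:
    """Horizon after `months_ahead` months, if the doubling trend holds."""
    return current_hours * 2 ** (months_ahead / doubling_months)

def months_until(target_hours: float, current_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow from current to target."""
    return doubling_months * math.log2(target_hours / current_hours)

if __name__ == "__main__":
    for label, hours in [("week", WORK_WEEK_HOURS), ("month", WORK_MONTH_HOURS)]:
        months = months_until(hours, current_hours=1.0)
        print(f"{label}-long tasks: about {months / 12:.1f} years away")
```

Under these assumptions the week-long mark lands roughly three years out and the month-long mark a little over four, which fits the ranges above. Change the doubling time or the starting horizon and the dates move quickly, so treat this as a planning aid rather than a forecast.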

Why This Matters for Agent Builders

If you are building AI agents today, this is not just an interesting research finding. It is a strategic roadmap.

Planning Horizons Define Reliability

Many agent failures we see today are not because models lack skills or knowledge.
They fail because the task requires sustained coherence across more steps than the model can currently handle.

When agents exceed their planning horizon, errors multiply:

- Memory inconsistencies
- Forgotten goals
- Hallucinated intermediate steps
- Loss of coherence across long action chains

Understanding and respecting the current time horizon (about 1 hour today, at a 50 percent success rate) is crucial if you want agents that actually work in production.

Growth Is Exponential, Not Linear

This exponential growth curve means that agents you build today could handle tasks twice as long within the year simply by swapping in newer models, even without changing your architecture.

Chart: Models are succeeding at increasingly long tasks

A 1-hour-capable agent today could become a 2-hour-capable agent around the start of 2026, and a 4-hour-capable agent by about mid-to-late 2026.
This is why designing agents for modular extensibility is key.

What You Should Do Next

- Anchor on the current horizon. Build agents that perform well under today’s approximately 1-hour task limit.
- Design for modularity. Architect your agents to extend their capabilities as model horizons grow.
- Segment complex workflows. Break large tasks into smaller phases that align with current agent capabilities.
- Prepare for agent collaboration. Multi-agent systems are the clearest path to scaling beyond individual model limits.
- Monitor the curve. Track horizon progress and adjust your strategy accordingly. This is a moving target.
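Segmenting a workflow can be sketched in a few lines. The example below is one possible approach, not a prescribed method: it greedily packs consecutive steps into phases whose estimated human time stays within a fraction of the model's horizon. The `Step` type, the `segment_into_phases` function, and the 0.5 safety factor are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    name: str
    est_human_minutes: float  # rough estimate for a skilled human

def segment_into_phases(steps: List[Step], horizon_minutes: float = 60.0,
                        safety_factor: float = 0.5) -> List[List[Step]]:
    """Group consecutive steps into phases that fit a per-phase time budget.

    The budget is horizon_minutes * safety_factor, so each phase sits well
    inside the horizon. A single step larger than the budget still gets
    its own phase and should be split further by hand.
    """
    budget = horizon_minutes * safety_factor
    phases: List[List[Step]] = []
    current: List[Step] = []
    used = 0.0
    for step in steps:
        # Close the current phase if adding this step would exceed the budget.
        if current and used + step.est_human_minutes > budget:
            phases.append(current)
            current, used = [], 0.0
        current.append(step)
        used += step.est_human_minutes
    if current:
        phases.append(current)
    return phases

if __name__ == "__main__":
    workflow = [Step("gather requirements", 20), Step("draft schema", 15),
                Step("write migration", 10), Step("implement API", 25),
                Step("smoke test", 5)]
    for i, phase in enumerate(segment_into_phases(workflow), start=1):
        print(i, [s.name for s in phase])
```

Because the horizon is a parameter, the same workflow regroups into fewer, larger phases as models improve, which is the modularity point made above.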

Why Agent Orchestration Is Now Critical

This METR finding is a strong argument that agent orchestration patterns will be a key engineering focus for the next wave of AI builders.

Here is why:

- Single-agent limits. No single model will reliably handle week-long projects alone anytime soon. Orchestration lets you combine specialised agents into more capable systems.
- Coherence engineering. Orchestration patterns help preserve coherent long-term state across many agent interactions, enabling sustained multi-day workflows.
- Recovery and resilience. Proper orchestration enables retries, fallback strategies, phased memory management, and structured error handling. All of these are essential when operating near or beyond current horizons.
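As a rough illustration of the recovery point, here is a minimal retry-and-fallback sketch. It assumes each agent is just a callable that takes a task string and either returns a result or raises an exception. The `run_with_recovery` name and its parameters are hypothetical, and a real orchestrator would add logging, timeouts, and state checkpoints.

```python
import time
from typing import Callable, List, Sequence

Agent = Callable[[str], str]  # assumption: an agent maps a task to a result

def run_with_recovery(task: str, agents: Sequence[Agent],
                      max_retries: int = 2,
                      backoff_seconds: float = 0.0) -> str:
    """Try each agent in order, retrying failures before falling back.

    Each agent gets 1 + max_retries attempts. If every agent fails, a
    RuntimeError is raised that lists every recorded failure.
    """
    errors: List[str] = []
    for agent in agents:
        name = getattr(agent, "__name__", "agent")
        for attempt in range(max_retries + 1):
            try:
                return agent(task)
            except Exception as exc:  # structured handling: record and move on
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all agents failed:\n" + "\n".join(errors))

if __name__ == "__main__":
    def primary(task: str) -> str:
        raise TimeoutError("lost track of the goal")

    def fallback(task: str) -> str:
        return f"completed '{task}' with the fallback agent"

    print(run_with_recovery("summarise report", [primary, fallback]))
```

Ordering the agents from most to least capable keeps the common path cheap while still giving the workflow somewhere to go when a phase fails.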

What Is Coming Next from Me

This is why my upcoming writing is focused on agent orchestration patterns.

I will be sharing:

- The key design patterns for robust orchestration across agents
- Strategies for memory consistency and continuity across long workflows
- Recovery and fallback patterns that enable reliable long-horizon performance
- Architectures for multi-agent collaboration and phased task execution

If the METR curve holds, orchestration will soon matter more than model size alone. The agent systems that win will be the ones that can manage complexity, memory, and collaboration across expanding time horizons.

If you want to follow this work, stay tuned. I will be publishing the first post in this series soon.

TL;DR

The time horizon of AI agents, meaning the length of human-time tasks they can reliably complete, is doubling every 7 months.
Current frontier agents can complete approximately 1-hour tasks about half the time.
This makes agent orchestration the next engineering frontier:
- To extend capabilities beyond individual model limits
- To build systems that scale with the rapid growth of agent horizons
- To enable robust, production-grade agentic workflows today and tomorrow

If you have been waiting to invest in agent architecture, this is your wake-up call. The window for building differentiated agent systems is wide open. The bar is about to rise fast.

It is time to build.

- Anup

References

[1] METR, "Measuring AI Ability to Complete Long Tasks" (March 2025). Full paper linked from https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
