Choosing the Right Models for Agentic Development

Introduction

Building autonomous AI agents to handle coding tasks is the holy grail of current engineering. However, a common mistake is to assume that a single, monolithic frontier model is the right answer for every step of the process. An effective agentic workflow assigns different models to different kinds of work.

To build a high-performance agent, you need a multi-model strategy that balances cognitive load, speed, and cost. In this post, I break down a typical model configuration stack and explain when to deploy each model, along with the nuances that determine whether the pipeline succeeds.

Key Concepts

The Personas: Building Your AI Dev Team

I classify modern models into four primary personas for agentic workflows. The examples below reflect configurations used when developing with Google Antigravity. In these systems, the handoff between Architect and Workhorse is typically a hardcoded pipeline step, though more advanced agent managers can dynamically route based on task complexity scores.

1. The Architects (Strategic Thinking & Complexity)

These models are your “Lead Engineers.” Use them sparingly for planning, system design, or resolving complex bugs.

Claude Opus 4.6 (Thinking): Your ultimate escalation model for deep synthesis and complex multi-file systems.
Gemini 3.1 Pro (High): Your massive context synthesizer for ingesting entire repositories and sweeping refactors.

2. The Workhorses (Daily Execution)

These are your “Mid-Level Developers,” balancing intelligence and speed for standard feature execution.

Claude Sonnet 4.6 (Thinking): Often the best default for primary coding agents, handling Jira tickets and API integrations efficiently.
Gemini 3.1 Pro (Low): Ideal for simple, direct tasks that still require top-tier knowledge.

3. The Sprinters (Rapid Tool-Use & Iteration)

Built for sheer speed and high-volume throughput.

Gemini 3 Flash: Crucial for the “inner loop”—generating boilerplate, fixing syntax errors, and acting as a fast “critic.”

4. The Local Specialist (Security)

GPT-OSS 120B (Medium): For when privacy is non-negotiable and data must remain internal.
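
To make the stack concrete, here is a minimal sketch of how the four personas could be written down as configuration and selected by a task complexity score. Everything in it is an assumption for illustration: the `Persona`, `ModelConfig`, `STACK`, and `route` names are hypothetical, the labels are the display names from the list above rather than real API model identifiers, and the 0.3 and 0.7 thresholds are arbitrary placeholders you would tune yourself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Persona(Enum):
    ARCHITECT = "architect"
    WORKHORSE = "workhorse"
    SPRINTER = "sprinter"
    LOCAL = "local"


@dataclass(frozen=True)
class ModelConfig:
    label: str             # display label from this post, not an API model ID
    effort: Optional[str]  # effort setting as named in this post, if any


# First entry per persona is treated as the default pick (an assumption).
STACK = {
    Persona.ARCHITECT: [
        ModelConfig("Claude Opus 4.6", "thinking"),
        ModelConfig("Gemini 3.1 Pro", "high"),
    ],
    Persona.WORKHORSE: [
        ModelConfig("Claude Sonnet 4.6", "thinking"),
        ModelConfig("Gemini 3.1 Pro", "low"),
    ],
    # No effort level is given for this model in the list above.
    Persona.SPRINTER: [ModelConfig("Gemini 3 Flash", None)],
    Persona.LOCAL: [ModelConfig("GPT-OSS 120B", "medium")],
}


def route(complexity: float, sensitive: bool = False) -> ModelConfig:
    """Pick a model from a hypothetical complexity score in [0, 1]."""
    if sensitive:
        persona = Persona.LOCAL       # data must remain internal
    elif complexity >= 0.7:           # placeholder threshold
        persona = Persona.ARCHITECT
    elif complexity >= 0.3:           # placeholder threshold
        persona = Persona.WORKHORSE
    else:
        persona = Persona.SPRINTER
    return STACK[persona][0]
```

With these placeholder thresholds, `route(0.9)` selects the Architect default and `route(0.1)` selects the Sprinter, while `sensitive=True` overrides the score and keeps the task on the local model. A hardcoded pipeline would skip `route` entirely and index `STACK` by persona at each step.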

The Core Nuance: Why Use “Low” Effort for Execution?

You may initially ask: why not always use the model with the most “thinking” capability? In automated agentic loops, giving a model too much cognitive budget can actually derail the workflow.

Speed and Latency: Models running at “Low” effort begin streaming results much sooner, preventing bottlenecks in the automated system.
Preventing Over-Engineering: Frontier models at “High” effort may hallucinate edge cases or introduce unnecessary complexity for simple tasks.
Pipeline Stability: Predictable, snappy response times ensure automated loops don’t time out.
Token Economics & Cost: Deploying a Sprinter for the inner loop (boilerplate, unit testing, error fixing) drastically reduces overall API costs at scale compared to defaulting to an expensive Architect for every minor iteration.
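
The cost point is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical `loop_cost` helper and placeholder prices per million tokens; the numbers are NOT real vendor pricing and exist only to show the shape of the calculation, so substitute your provider's current rates.

```python
# Placeholder prices in USD per million tokens. These are NOT real vendor
# prices; they only illustrate the relative gap between tiers.
PRICE_PER_MTOK = {
    "architect": {"input": 10.0, "output": 50.0},
    "sprinter": {"input": 0.5, "output": 2.0},
}


def loop_cost(tier: str, iterations: int,
              input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of running an inner loop on one tier.

    Assumes every iteration sends `input_tokens` and receives
    `output_tokens`, which real loops only approximate.
    """
    price = PRICE_PER_MTOK[tier]
    tokens_cost = input_tokens * price["input"] + output_tokens * price["output"]
    return iterations * tokens_cost / 1_000_000


if __name__ == "__main__":
    # 1,000 inner-loop iterations of 4,000 input and 500 output tokens each.
    for tier in PRICE_PER_MTOK:
        print(tier, loop_cost(tier, 1000, 4000, 500))
```

Under these made-up prices, the same 1,000 iterations cost 65.0 on the Architect tier and 3.0 on the Sprinter tier. The exact ratio depends entirely on the prices you plug in; the point is that the gap compounds with every iteration of the inner loop.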

The Strategy: Use “High” for Exploration, and “Low” for Execution.

Conclusion

An efficient agentic pipeline follows a multi-step pattern:

Planning: Architect (e.g., Claude Opus) generates the high-level plan.
Implementation: Workhorse (e.g., Claude Sonnet or Gemini Pro Low) implements the changes.
Iteration: Sprinter (e.g., Gemini Flash) runs tests and fixes syntax errors.
Fallback Loop: If the Workhorse or Sprinter becomes stuck in a logic loop or execution failure, the Architect is re-engaged to provide the deep reasoning required to unblock the pipeline.
Final Review: Architect performs a final quality assessment.
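
The five steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not a production orchestrator: each "model" is just a callable from prompt string to response string, `tests_pass` stands in for whatever test runner you use, and `run_pipeline`, `max_fixes`, and `max_escalations` are hypothetical names I chose for the example.

```python
from typing import Callable, Tuple

# A "model" is any callable from prompt to text; real API clients are
# out of scope for this sketch.
Model = Callable[[str], str]


def run_pipeline(task: str, architect: Model, workhorse: Model,
                 sprinter: Model, tests_pass: Callable[[str], bool],
                 max_fixes: int = 3,
                 max_escalations: int = 1) -> Tuple[str, str]:
    """Return (code, review) or raise if the pipeline stays stuck."""
    # Planning: the Architect produces the high-level plan.
    plan = architect(f"Plan the work for: {task}")
    code = ""
    for escalation in range(max_escalations + 1):
        if escalation > 0:
            # Fallback loop: re-engage the Architect to unblock execution.
            plan = architect(
                f"Execution is stuck on: {task}\n"
                f"Revise the plan. Last attempt:\n{code}"
            )
        # Implementation: the Workhorse turns the plan into code.
        code = workhorse(f"Implement this plan:\n{plan}")
        # Iteration: the Sprinter gets a bounded number of fix attempts.
        for attempt in range(max_fixes + 1):
            if tests_pass(code):
                # Final review: the Architect assesses the result.
                review = architect(f"Review this change:\n{code}")
                return code, review
            if attempt < max_fixes:
                code = sprinter(f"Fix the failing tests in:\n{code}")
    raise RuntimeError("pipeline still failing after escalation")
```

Bounding both the fix attempts and the escalations is a deliberate choice in this sketch: it keeps a stuck Sprinter from looping indefinitely, and it caps how often the expensive Architect is called before the failure is surfaced to a human.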

By treating your AI models as specialized team members rather than general-purpose tools, you can build agentic workflows that are faster, more reliable, and more cost-effective.
