Practice 03 · AI Blog

Multi-agent systems: when to orchestrate vs. consolidate into one agent

The AI architecture slide that shows up most in 2026 has 6–8 connected boxes, each with a fancy name: orchestrator, researcher, writer, critic, executor. It's the multi-agent era. The promise: specialized agents collaborate, each does what it does best, the emergent result beats any monolithic agent. In some cases, true. In most cases, operational fantasy sold as sophistication.

This text is about when multi-agent pays off, when it's expensive overhead in disguise, and the rule to decide before spending three months building orchestration that should have been a single agent.

What separates real multi-agent from theater

A multi-agent system isn't "an agent calling another agent". By that definition, any LLM with tool use becomes multi-agent — and the term loses meaning. The useful definition is narrower: a system where multiple agents maintain their own state, reason independently, and coordinate shared decisions.

Under that definition, three things have to be present to deserve the name:

  • Real specialization. Each agent does something qualitatively different — not an arbitrary task split.
  • Independent state. Each agent maintains its own context, memory, perspective. It isn't just parallel fan-out of calls.
  • Non-trivial coordination. There's a communication protocol, rules of who decides what, and the result depends on the interaction. Not an orchestrator calling functions.

Without these three, the system is a monolith with prompt chain — cheaper, faster, easier to debug. Calling it multi-agent is jargon.

Multi-agent is one of the most sophisticated — and most frequently misapplied — tools of 2026. For most cases, a single agent with well-designed tools solves better.

Three cases where multi-agent pays

The contexts where the architecture makes sense. All have something in common: coordination cost is less than specialization gain.

  1. Long tasks with inherently different perspectives. Code generation + critical review + testing. Research + writing + editing. Here it makes sense for one agent to generate and another to criticize — because the critical perspective is different from the generative. A single agent doing both has bias (likes its own answer). Coordination cost is paid by quality gain.
  2. Technical domains with heterogeneous expertise. A system that needs legal reasoning + financial analysis + commercial writing. Each demands different prompt, knowledge base, tone. Single agent is mediocre at all; specialist agents shine each in their domain, orchestrator coordinates.
  3. Workflows with human decision between steps. Systems where a human approves between steps, or where different humans interact with different agents. Natural multi-agent structure reflects the human structure — doesn't try to hide it.

Outside these three, multi-agent tends to be over-engineering. And over-engineering in AI costs more than in traditional systems, because latency compounds (each agent adds inference time) and cost multiplies (each agent runs an LLM, each coordination costs tokens).

The two most expensive antipatterns

Where multi-agent becomes theater, the mistake has a name.

Antipattern 1: split a single task into N agents "because it looks organized". Team builds an agent that classifies intent, an agent that extracts entities, an agent that searches context, an agent that generates the answer. Each one is an LLM call. The system gets 4× slower, costs 4× more, and quality doesn't improve — because the split is arbitrary. A single agent with clear instruction and tool calling would solve the same problem in one call.

Antipattern 2: orchestrator + identical workers. "Let's have a master agent that distributes tasks to 5 parallel workers". If the workers do the same thing, that's fan-out — not multi-agent. It can be useful for parallelizing, but don't call it multi-agent. And in 80% of cases where this appears, the parallelization gain is smaller than the coordination overhead.

These two cover most systems that call themselves "multi-agent" in conference presentations and in practice are expensive prompt chains.

The honest rule before deciding

Before approving multi-agent architecture, five questions separate technical decision from fashion.

  1. Is the split between agents qualitative or just task-level? Different perspectives, expertise or states = qualitative, multi-agent makes sense. Same thing in pieces = monolith is better.
  2. How much does it cost in total latency? Each agent adds 1–5s of inference. A 5-agent system can be unusable in synchronous UX. Pays off in async workflow; kills real-time chatbots.
  3. How much does it cost in total tokens? Coordination consumes tokens. A typical multi-agent system costs 3–10× more than equivalent monolith. Calculate before signing an annual contract.
  4. How's the debugging? Multi-agent is multiplicatively harder to debug — you need to trace state in N agents + communication protocol. Without serious evaluation, the system becomes a black box in 6 months.
  5. Is there a single agent that resolves to 70%? If yes, start there. Climb to multi-agent only after exhausting. Inverting is expensive.

Whoever answers the five clearly knows if multi-agent is necessary or ornament. Whoever answers "it depends" on three or more doesn't have a defined use case for any architecture.

The parallel with LLM as internal agent

As I argued about LLM as internal agent, the most common mistake in enterprise AI is treating the tool as a universal solution. Multi-agent systems carry the same mistake at another layer: the temptation to use sophisticated architecture because it's available, even when the use case doesn't call for it.

The technical gain of multi-agent is real where applicable. The cost is high where it isn't. The difference between a paying project and AI theater is, again, scope discipline.

The decision for 2026

If your company is debating multi-agent, three honest moves before committing architecture:

Start with a single agent and well-designed tools. Covers 70–80% of the use cases that show up as "we need multi-agent". Tool calling with clear instruction substitutes orchestration in almost everything.

Pilot multi-agent only where specialization is qualitative. Generation + critique is the most classic case worth it. Mixed heterogeneous technical domains, second. Other cases, go back to single agent.

Measure latency and cost from the first prototype. Multi-agent scales poorly in both. Discovering this in production is expensive; in pilot, cheap.

The worst enterprise AI decision in 2026 is confusing sophistication with necessity. Multi-agent systems are powerful where they fit. Applying everywhere is the most expensive way to do with five agents what one would do better. (For those who want to see this in real operation, with 5 agents in production for 90 days and the errors named, the field diary is here.)

Keep the conversation going

Want to discuss this topic with a partner?

Initial diagnostic with no commitment. We assess your scenario in a week and send back a report.

Talk to a partner