AAAI 2027 Submission

Learning Drift-Aware Coordination in LLM-Based Multi-Agent Systems

A coordinator-driven framework that detects and mitigates coordination drift in LLM-based multi-agent systems through shared context, targeted intervention, and preference-based optimization.

Anonymous Author(s)


Overview

How can LLM-based multi-agent systems maintain efficient and stable coordination over long horizons in interactive environments?

DriCo introduces an explicit coordinator that aggregates agent-level information into shared context, assigns roles, and intervenes when conflict, redundancy, or loops emerge. Agents use a hierarchical policy: a planner generates team-aware sub-goals, while an actor executes primitive actions with value-aware guidance.

Figure: Overview of the DriCo framework, showing coordinator-mediated shared context, planning, and action execution.

Contributions

What DriCo adds

Coordination drift

A measurable abstraction for long-horizon team-level misalignment, covering conflicts, redundant sub-goals, and looping behaviors.

Shared-context coordination

A coordinator constructs team-level shared context and role assignments from dispersed local observations and sub-goals.

Drift-aware intervention

When drift exceeds a threshold, the coordinator revises context and selectively asks affected agents to regenerate sub-goals.

Hierarchical agents

Each agent uses a planner for high-level sub-goal generation and an actor for low-level action execution.

Preference-based learning

Coordinator and actor policies are optimized using preference objectives that favor lower drift and higher-value actions.

LLM-Overcooked

A language-oriented Overcooked-AI extension with separate training/evaluation environments, diverse layouts, recipes, and held-out task compositions.

Method

Coordinator-driven long-horizon execution

Collect local information

Agents send local observations and candidate sub-goals to a communication buffer. The coordinator uses this buffer as the team-level coordination state.

Construct shared context

The coordinator summarizes team state into shared context and assigns roles. Planners condition on role, local observation, and shared context to generate sub-goals.

Intervene on drift

If the trigger drift exceeds a threshold, the coordinator updates the shared context and asks the affected agents to revise their sub-goals, reducing conflict and redundancy and helping the team recover from loops.
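The three steps above can be sketched as a single coordinator round. This is a minimal illustration under stated assumptions, not the DriCo implementation: the `Message` and `Coordinator` classes, the `duplicate_rate` stand-in for the drift score, the threshold value of 0.25, and the rule for picking affected agents are all hypothetical, and the string join stands in for what would be an LLM summarization call.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Message:
    """One buffer entry: an agent's local observation and candidate sub-goal."""
    agent_id: str
    observation: str
    sub_goal: str


def duplicate_rate(messages: list[Message]) -> float:
    """Stand-in drift score: fraction of sub-goals that repeat an earlier one."""
    if not messages:
        return 0.0
    counts = Counter(m.sub_goal for m in messages)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(messages)


@dataclass
class Coordinator:
    threshold: float = 0.25  # assumed value, not taken from the paper
    buffer: list[Message] = field(default_factory=list)

    def collect(self, message: Message) -> None:
        # Step 1: agents push local information into the communication buffer.
        self.buffer.append(message)

    def build_context(self) -> str:
        # Step 2: summarize team state (a plain join here, an LLM call in practice).
        return "; ".join(f"{m.agent_id}: {m.sub_goal}" for m in self.buffer)

    def step(self) -> tuple[str, list[str]]:
        """Return the shared context and the agents asked to revise sub-goals."""
        context = self.build_context()
        to_revise: list[str] = []
        # Step 3: intervene only when the drift score exceeds the threshold.
        if duplicate_rate(self.buffer) > self.threshold:
            seen: set[str] = set()
            for m in self.buffer:
                if m.sub_goal in seen:
                    to_revise.append(m.agent_id)
                seen.add(m.sub_goal)
        self.buffer.clear()
        return context, to_revise


coordinator = Coordinator()
coordinator.collect(Message("agent_0", "near onion crate", "fetch onion"))
coordinator.collect(Message("agent_1", "near onion crate", "fetch onion"))
context, to_revise = coordinator.step()
print(context)
print(to_revise)
```

In this sketch only the agent that duplicated an already-claimed sub-goal is asked to revise, which mirrors the selective intervention described above while leaving the first claimant's plan untouched.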

Figure: DriCo coordinator, planner, and actor modules, detailing shared context construction, context-aware planning, and preference-based action execution.

Coordination drift objective

DriCo scores candidate shared contexts by the negative trigger drift they induce. Lower drift means more coherent team-level behavior, so preference learning increases the likelihood of contexts that improve downstream coordination.

Conflict: incompatible sub-goals under the current shared context.
Redundancy: duplicated normalized task-level sub-goals.
Loop: repeated action patterns in recent action history.
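One way to make the three components and the negative-drift score concrete is sketched below. Everything here is an assumption made for illustration: the function names, the pairwise `conflicts` set, the lowercase normalization, the fixed loop window, the equal weights, and the choice of the lowest-drift and highest-drift candidates as the preference pair are not specified in the text above.

```python
from collections import Counter
from itertools import combinations


def normalize(sub_goal: str) -> str:
    """Hypothetical normalization: trim whitespace and lowercase."""
    return sub_goal.strip().lower()


def conflict_rate(sub_goals: list[str], conflicts: set[frozenset]) -> float:
    """Fraction of sub-goal pairs listed as incompatible in `conflicts`."""
    pairs = list(combinations([normalize(g) for g in sub_goals], 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in conflicts for p in pairs) / len(pairs)


def redundancy_rate(sub_goals: list[str]) -> float:
    """Fraction of sub-goals that duplicate another after normalization."""
    if not sub_goals:
        return 0.0
    counts = Counter(normalize(g) for g in sub_goals)
    return sum(c - 1 for c in counts.values()) / len(sub_goals)


def loop_rate(actions: list[str], window: int = 2) -> float:
    """1.0 if the last `window` actions repeat the `window` before them."""
    if len(actions) < 2 * window:
        return 0.0
    return float(actions[-window:] == actions[-2 * window:-window])


def trigger_drift(sub_goals, actions, conflicts, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three components (equal weights are an assumption)."""
    w_c, w_r, w_l = weights
    return (w_c * conflict_rate(sub_goals, conflicts)
            + w_r * redundancy_rate(sub_goals)
            + w_l * loop_rate(actions))


def preference_pair(drift_by_context: dict[str, float]) -> tuple[str, str]:
    """Return (chosen, rejected) contexts, scoring each by negative drift."""
    score = {ctx: -d for ctx, d in drift_by_context.items()}
    return max(score, key=score.get), min(score, key=score.get)


# Context A leads to duplicated sub-goals and a wait-and-place loop.
drift_a = trigger_drift(["fetch onion", "Fetch Onion "],
                        ["wait", "place", "wait", "place"], conflicts=set())
# Context B leads to distinct sub-goals and no repeated action pattern.
drift_b = trigger_drift(["fetch onion", "use oven"],
                        ["move", "pick", "move", "place"], conflicts=set())
chosen, rejected = preference_pair({"context A": drift_a, "context B": drift_b})
print(drift_a, drift_b, chosen, rejected)
```

The resulting (chosen, rejected) pair is the kind of input a preference objective could consume, so that contexts inducing lower drift become more likely.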

Benchmark

LLM-Overcooked

Why a new benchmark?

DriCo is evaluated on LLM-Overcooked, an LLM-oriented extension of Overcooked-AI designed for language-conditioned goals, structured recipes, and long-horizon multi-agent coordination.

Separate training and evaluation environments.
Diverse layouts and recipe compositions.
Reference trajectory datasets for learning coordination policies.
Held-out configurations for in-benchmark generalization.

Results

DriCo improves stability on harder tasks

55/70

Best average success/progress with Qwen-2.5 7B

DriCo reaches 55% average success and 70% progress completeness on Forced Coordination with Qwen-2.5 7B, outperforming all listed baselines in the same setting.

Figure: Performance comparison for DriCo on selected Llama-3.1 8B settings.

Lower drift

Conflict, redundancy, and loop reduction

Drift analysis indicates that the coordinator keeps conflict, redundancy, and loop occurrence rates lower, especially on harder task levels where loops accumulate without shared-context intervention.

Qualitative recovery behavior

In the Baked Eggplant case study, DriCo detects loop states and restores progress through targeted coordination updates, while baseline agents exhibit conflict, redundancy, or repeated wait-and-place behaviors.

Figure: Qualitative case study on the Baked Eggplant task, showing conflict, redundancy, loop, and DriCo recovery behavior.

Coordination drift analysis

DriCo maintains lower conflict, redundancy, and loop occurrence rates, especially on harder tasks where the baseline accumulates loop-driven drift.

Figure: Coordination drift occurrence rates across task difficulty levels, decomposed into conflict, redundancy, and loop components.

Intervention effect analysis

Intervention-triggered shared-context updates help DriCo continue improving progress completeness instead of plateauing early.

Figure: Effect of intervention on progress completeness over the episode horizon, with versus without intervention.

Analysis	Main takeaway
Overall performance	DriCo obtains the strongest success rate and progress completeness across task layouts and backbone models.
Coordination drift	The coordinator reduces the targeted failure modes: conflicts, redundant sub-goals, and loops.
Ablation	Removing shared context, intervention, or RL training degrades performance; shared context produces the largest drop.
Qualitative cases	On Baked Eggplant, DriCo detects loop states and restores progress through targeted coordination updates.