Open agentic model routing for coding agents
A router that chooses the most suitable LLM for the current coding task in an agent workflow, verifies the outcome, and carries that experience into the next task.
Static routers decide from frozen priors. ACRouter learns from execution signals across context, action, feedback, and memory.
Orchestrator: combines priors, task metadata, retrieved neighbors, and a compact policy model.
Verifier: aggregates AST checks, execution, prompt tests, and rule signals into feedback.
Memory: stores task embeddings, the chosen model, observed quality, cost, and the verification trace.
A streaming benchmark for model selection that covers single-turn coding tasks and an out-of-distribution (OOD) agentic-programming stream.
Nine in-distribution (ID) coding dimensions, plus OOD agentic programming.
Measures the relative lift from giving the router performance statistics.
Routers are compared by cumulative regret over realistic streams.
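Cumulative regret can be sketched as a running sum of per-task gaps between an oracle, which picks the best available model for each task, and the router's choice. This minimal sketch assumes per-task scores in [0, 1]. `cumulative_regret` is a hypothetical helper, and the paper's exact scoring may differ.

```python
def cumulative_regret(oracle_scores: list[float], router_scores: list[float]) -> list[float]:
    """Running total of per-task regret: oracle score minus router score."""
    if len(oracle_scores) != len(router_scores):
        raise ValueError("score streams must have the same length")
    curve: list[float] = []
    total = 0.0
    for best, chosen in zip(oracle_scores, router_scores):
        # Each increment is >= 0 when the oracle picks the per-task best model.
        total += best - chosen
        curve.append(total)
    return curve


# Hypothetical three-task stream: the router misses the second task entirely.
print(cumulative_regret([1.0, 1.0, 0.5], [1.0, 0.0, 0.5]))  # [0.0, 1.0, 1.0]
```

Under this definition, the Oracle's regret is 0 by construction. The in-distribution rows of the main table roughly fit this definition: (57.00 - 49.98)% × 2,919 tasks ≈ 204.9, close to ACRouter's reported 205.5.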
Main experiment
Routers are grouped by component-configuration taxonomy. The left block measures in-distribution coding tasks; the right block measures real-world OOD agentic programming.
ACRouter leads all non-oracle routers in AvgPerf and cumulative regret on 2,919 in-distribution tasks.
ACRouter also keeps the lowest non-oracle regret on 176 held-out agentic-programming tasks.
Baselines that are cost-efficient in-distribution can be cheap, but their OOD quality drops sharply.
| Taxonomy | Router | ID AvgPerf (%) | ID CumReg | ID Perf / USD | OOD AvgPerf (%) | OOD CumReg | OOD Perf / USD |
|---|---|---|---|---|---|---|---|
| Upper bound | Oracle | 57.00 | 0 | 8.20 | 75.89 | 0 | 2.32 |
| Agent-as-a-Router | ACRouter (ours) | **49.98** | **205.5** | 3.79 | **62.50** | **17.0** | 1.18 |
| Dynamic: Online Bandit | LinTS | 46.48 | 307.4 | 4.49 | 46.43 | 35.9 | 0.75 |
|  | LinUCB | 46.84 | 296.9 | 4.38 | 49.82 | 31.1 | 0.96 |
| Static: Heuristic | DimensionBest | 47.50 | 277.4 | 3.69 | -- | -- | -- |
|  | kNN Retrieval | 47.18 | 286.7 | 6.07 | 14.29 | 66.7 | *1.45* |
| Static: Trained Policy | LogReg | 47.26 | 284.4 | 6.27 | 19.64 | 61.8 | 1.17 |
|  | RouteLLM-BERT | 47.22 | 285.5 | 6.22 | 21.43 | 59.4 | 1.30 |
|  | TF-IDF+MLP | 46.97 | 292.8 | 6.11 | 13.39 | 67.9 | 1.17 |
|  | Qwen3.5-0.8B-Finetuned | 46.41 | 309.1 | 6.82 | 55.36 | 27.2 | 0.74 |
|  | RouteLLM-MF | 46.16 | 316.5 | 6.19 | 8.93 | 72.7 | 0.94 |
| Single-Model Baselines | Always-Opus 4.6 | 43.83 | 387.1 | 1.29 | 57.14 | 26.7 | 0.64 |
|  | Always-Kimi-K2.5 | 36.66 | 593.3 | *12.62* | 18.75 | 62.3 | 1.22 |
|  | Always-Qwen3.5-Plus | 37.16 | 580.2 | 2.05 | 2.68 | 80.1 | 0.19 |
|  | Random | 38.75 | 533.6 | 2.48 | 31.25 | 50.4 | 0.85 |

Bold marks the strongest non-oracle quality/regret result; italics mark the strongest non-oracle cost-efficiency (Perf / USD) result. DimensionBest is not applicable to OOD because unseen agentic-programming tasks have no predefined dimension-to-model mapping.
Evidence views
Three compact views show model complementarity, regret over task streams, and the cost-performance frontier behind the headline numbers.
Performance varies by coding dimension, while cost and AvgPerf per dollar expose why a single premium model is not always the right deployment choice.
Static routers accumulate regret faster on in-distribution tasks and collapse on OOD tasks, while ACRouter keeps lower regret as verified memory accumulates.
ACRouter extends the deployable frontier upward in both ID and OOD, with higher AvgPerf at lower cost than always choosing a premium model.
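The frontier claim can be checked against the main table with a little arithmetic. This sketch assumes that Perf / USD means AvgPerf divided by total USD spent on the stream. That reading is an assumption, so the implied spend is only relative. `implied_cost` and `dominates` are hypothetical helpers; the numbers are copied from the table.

```python
# (AvgPerf %, Perf / USD) copied from the main table.
TABLE = {
    "ID": {"ACRouter": (49.98, 3.79), "Always-Opus 4.6": (43.83, 1.29)},
    "OOD": {"ACRouter": (62.50, 1.18), "Always-Opus 4.6": (57.14, 0.64)},
}


def implied_cost(avg_perf: float, perf_per_usd: float) -> float:
    """Spend implied by the table, ASSUMING Perf / USD = AvgPerf / total USD spent."""
    return avg_perf / perf_per_usd


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if (perf, cost) point a is no worse than b on both axes and strictly better on one."""
    (pa, ca), (pb, cb) = a, b
    return pa >= pb and ca <= cb and (pa > pb or ca < cb)


for split, rows in TABLE.items():
    points = {name: (perf, implied_cost(perf, ppu)) for name, (perf, ppu) in rows.items()}
    for name, (perf, cost) in points.items():
        print(f"{split:>3}  {name:<16} AvgPerf={perf:6.2f}  implied spend={cost:6.2f}")
    print(f"{split:>3}  ACRouter dominates Always-Opus 4.6:",
          dominates(points["ACRouter"], points["Always-Opus 4.6"]))
```

Under this assumption, ACRouter's implied spend is about 13.2 versus 34.0 in-distribution and 53.0 versus 89.3 on OOD. In both splits it dominates the always-premium baseline: it has higher AvgPerf and lower spend.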
An ARA-style artifact entry is ready. It covers ACRouter, the benchmark splits, score matrices, verifier traces, and the held-out agentic stream.
Updated OOD results: ACRouter keeps the lowest regret among routers, while a standalone GPT-5.4 backend resolves 75.00% of the same 176 agentic-programming tasks.
@article{agent2026zhou,
title = {Agent-as-a-Router: Agentic Model Routing for Coding Tasks},
author = {Pengfei Zhou and Zhiwei Tang and Yixing Ma and Jiasheng Tang and Yizeng Han and Zhenglin Wan and Fanqing Meng and Wei Wang and Bohan Zhuang and Wangbo Zhao and Yang You},
journal = {arXiv preprint arXiv:2606.22902},
year = {2026},
archivePrefix = {arXiv},
eprint = {2606.22902},
url = {https://arxiv.org/abs/2606.22902},
}