Papers
arxiv:2511.20975

Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows

Published on Nov 26, 2025
Authors:
,
,
,

Abstract

Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request must pass through. Configuration selection, or the cost-aware assignment of workflow agents to specific LLMs, can reduce these costs, but existing approaches bind configuration decisions before request execution, making them ill-suited for the heterogeneous and lengthy execution of workflows. Specifically, system loads can fluctuate rapidly and substantially during a request's lifetime, causing fixed configurations to quickly become suboptimal. We present Aragog, a system that progressively adapts a request's configuration throughout its execution to match runtime dynamics. To make this practical despite the massive space of workflow configurations, Aragog decouples the problem into two core elements -- a one-time routing step that identifies all accuracy-preserving configurations, and a cheap per-stage scheduler that selects among them using up-to-date system observations -- and introduces novel strategies to accelerate each. Across diverse workflows and model families, Aragog increases maximum serving throughput by 50.0--217.0\% and reduces median latency by 32.5--78.9\% at peak request rates, while maintaining accuracy comparable to the most expensive configurations.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2511.20975 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2511.20975 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2511.20975 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.