Routing to models is genuinely hard. It means mapping each task to the right model, which requires benchmarking models against your product's specific tasks and dialing in the quality/cost trade-off. And there is an opportunity in that difficulty.

Here is the progression I saw with enterprises while on Gemini.

Phase 1 (2024): Default to the "it" model. Everybody used GPT regardless of task, because it was the shiny new thing.

Phase 2 (early 2025): Over-optimize. Teams over-corrected, hunting for the smallest, cheapest model for each task without evals sophisticated enough to map tasks to models. They ended up burning cycles and shipping slower.

Phase 3: Nuanced routing. The industry’s eval muscle and model diversity reached the point where the most sophisticated AI-native startups could break their products into sub-agents and route each task to the right model, e.g. the hardest reasoning to Claude and the simplest tasks to Gemini Flash-Lite or open-weight models. And like most product patterns, enterprises followed the AI-native builders 6-9 months later.
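To make "mapping each task to the right model" concrete, here is a minimal Python sketch of one possible routing rule: for each task, pick the cheapest model whose eval score clears a quality bar. Everything in it is an assumption for illustration: the model names, prices, scores, and thresholds are made-up placeholders, not benchmark results, and a real router would get its scores from your own evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price, any consistent unit


def pick_model(task, eval_scores, models, min_quality):
    """Return the cheapest model whose eval score on `task` meets `min_quality`.

    If no model clears the bar, fall back to the best-scoring model.
    `eval_scores` maps (task, model_name) -> score in [0, 1].
    """
    scored = [
        (m, eval_scores[(task, m.name)])
        for m in models
        if (task, m.name) in eval_scores
    ]
    if not scored:
        raise ValueError(f"no eval results for task {task!r}")

    good_enough = [m for m, score in scored if score >= min_quality]
    if good_enough:
        return min(good_enough, key=lambda m: m.cost_per_1k_tokens)
    return max(scored, key=lambda pair: pair[1])[0]


# Illustrative placeholders only, not real prices or benchmark numbers.
MODELS = [
    ModelOption("large-reasoning", 15.0),
    ModelOption("mid-tier", 3.0),
    ModelOption("small-fast", 0.3),
]
EVAL_SCORES = {
    ("multi_step_reasoning", "large-reasoning"): 0.92,
    ("multi_step_reasoning", "mid-tier"): 0.78,
    ("multi_step_reasoning", "small-fast"): 0.55,
    ("intent_classification", "large-reasoning"): 0.97,
    ("intent_classification", "mid-tier"): 0.96,
    ("intent_classification", "small-fast"): 0.94,
}
THRESHOLDS = {"multi_step_reasoning": 0.90, "intent_classification": 0.90}

if __name__ == "__main__":
    for task, bar in THRESHOLDS.items():
        chosen = pick_model(task, EVAL_SCORES, MODELS, bar)
        print(f"{task} -> {chosen.name}")
```

The Phase 2 failure mode shows up here as missing or unreliable entries in `EVAL_SCORES`: without trustworthy per-task scores, the "cheapest model that is good enough" rule has nothing to stand on.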