🐦 X · Post · Madhu Guru @realmadhuguru · June 6, 2026 · 164 words · ~1 min read

Madhu Guru · @realmadhuguru

Routing tasks to models is genuinely hard. It means mapping each task to the right model, which requires benchmarking models against your product's specific tasks and dialing in the quality/cost trade-off. And there is an opportunity in that difficulty.

Here is the progression I saw with enterprises while I was on Gemini.

Phase 1 (2024): Default to the "it" model. Everybody used GPT regardless of task, because it was the shiny new thing.

Phase 2 (early 2025): Over-optimize. Teams over-corrected, hunting for the smallest/cheapest model for each task, but did not have evals sophisticated enough to map tasks to models. They ended up burning cycles and shipping slower.

Phase 3: Nuanced routing. The industry's eval muscle and model diversity matured to the point where the most sophisticated AI-native startups succeeded in breaking their products into sub-agents and routing each task to the right model, e.g. the hardest reasoning to Claude and the simplest tasks to Gemini Flash-Lite or open-weight models.

And like most product patterns, enterprises followed the AI-native builders 6-9 months later.
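
The mapping step the post describes (benchmark candidate models on each product task, then trade off quality against cost) can be sketched as a small routing table. This is a minimal illustration under stated assumptions, not the author's method: the per-task eval scores, prices, quality bars, task names, and model names (`large-model`, `small-model`) are all hypothetical placeholders, and the selection rule (cheapest model that clears a per-task quality bar, falling back to the best-scoring model) is just one simple way to dial in the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    name: str                  # hypothetical model identifier
    quality: float             # hypothetical eval score on one task, 0..1
    cost_per_1k_tokens: float  # hypothetical price, any consistent unit


def pick_model(candidates: list[ModelStats], min_quality: float) -> ModelStats:
    """Cheapest model whose eval score meets the bar; else the best-scoring model."""
    if not candidates:
        raise ValueError("no candidate models for this task")
    qualifying = [m for m in candidates if m.quality >= min_quality]
    if qualifying:
        return min(qualifying, key=lambda m: m.cost_per_1k_tokens)
    # No model clears the bar: prefer quality over cost.
    return max(candidates, key=lambda m: m.quality)


def build_routing_table(
    evals: dict[str, list[ModelStats]], bars: dict[str, float]
) -> dict[str, str]:
    """Map each task (e.g. one sub-agent's job) to the model chosen for it."""
    return {task: pick_model(models, bars[task]).name for task, models in evals.items()}


# Hypothetical eval results for two sub-agent tasks.
evals = {
    "multi_step_reasoning": [
        ModelStats("large-model", 0.91, 15.0),
        ModelStats("small-model", 0.62, 0.1),
    ],
    "classify_intent": [
        ModelStats("large-model", 0.97, 15.0),
        ModelStats("small-model", 0.95, 0.1),
    ],
}
bars = {"multi_step_reasoning": 0.9, "classify_intent": 0.9}

routing = build_routing_table(evals, bars)
print(routing)
```

With these made-up numbers, the hard reasoning task goes to `large-model` (the only one above the bar), while intent classification goes to the far cheaper `small-model`, which clears the same bar. The quality of the whole scheme depends on the evals behind the scores, which is the post's point about Phase 2.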

Source: https://x.com/realmadhuguru