BuildSpeak: daily builder digest
🐦 X · Post · Madhu Guru @realmadhuguru · June 6, 2026 · 164 words · about 1 min

Routing to models is genuinely hard. It means mapping each task to the right model - which requires benchmarking models against your product's specific tasks and dialing in the quality/cost trade-off. And there is an opportunity in that difficulty.

Here is the progression I saw with enterprises while on Gemini.

Phase 1 (2024): Default to the "it" model. Everybody used GPT regardless of task, because it was the shiny new thing.

Phase 2 (early 2025): Over-optimize. Teams over-corrected, looking for the smallest/cheapest model for their task, but did not have evals sophisticated enough to map tasks to models. They ended up burning cycles and shipping slower.

Phase 3: Nuanced routing. The industry's eval muscle and model diversity got to a point where the most sophisticated AI-native startups succeeded in breaking their product into sub-agents and routed each task to the right model - e.g. hardest reasoning to Claude, simplest to Gemini Flash-Lite or open-weight models.

And like most product patterns, enterprises followed the AI-native builders 6-9 months later.
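One way to picture the "map each task to the right model" step is as a small lookup built from your own eval results. The sketch below is only an illustration under stated assumptions: the model labels, task names, quality scores, and per-call costs are all made up, and the selection rule (cheapest model that clears a per-task quality bar) is one plausible reading of the quality/cost trade-off, not something the post specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model: str            # hypothetical model label, not a real product name
    task: str             # hypothetical task name from your own product
    quality: float        # score from your own eval suite, 0.0 to 1.0
    cost_per_call: float  # illustrative cost in dollars, made up for this sketch


def build_routing_table(results, min_quality):
    """Pick, per task, the cheapest model whose eval score clears that task's bar.

    If no model clears the bar, fall back to the highest-scoring model.
    """
    by_task = {}
    for r in results:
        by_task.setdefault(r.task, []).append(r)

    table = {}
    for task, rows in by_task.items():
        bar = min_quality.get(task, 0.0)
        passing = [r for r in rows if r.quality >= bar]
        if passing:
            # Cheapest first; ties broken by higher quality.
            best = min(passing, key=lambda r: (r.cost_per_call, -r.quality))
        else:
            best = max(rows, key=lambda r: r.quality)
        table[task] = best.model
    return table


# Made-up eval numbers, purely for illustration.
RESULTS = [
    EvalResult("large-reasoning-model", "multi_step_planning", 0.92, 0.040),
    EvalResult("small-fast-model", "multi_step_planning", 0.61, 0.002),
    EvalResult("large-reasoning-model", "intent_classification", 0.97, 0.040),
    EvalResult("small-fast-model", "intent_classification", 0.95, 0.002),
]
MIN_QUALITY = {"multi_step_planning": 0.85, "intent_classification": 0.90}

ROUTES = build_routing_table(RESULTS, MIN_QUALITY)
print(ROUTES)
```

With these invented numbers, the hard planning task goes to the larger model and the simple classification task goes to the cheaper one. The table is only as good as the evals behind it, which is the gap the post describes in Phase 2.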
Original post: https://x.com/realmadhuguru