This is a critical post to read if you’re building an applied AI company right now. “An application earns its place in the untrainable corner by doing unglamorous work: arranging a company's private reality so a model can act on it, handing the model the tools to act, working with the customer to change the reality of its workforce. A company that brings the translation is tough to copy – and the translation never ends. Integration and maintenance run as long as the relationship does, won by teams that put domain-specialized engineers and tools next to the customer.” There’s still an insanely large gulf between model capabilities and what it takes to apply them to specific corporate workflows. Some of that is technology that still needs to be built, a lot is access to (and formatting of) the right data, and a ton more is the change management and hands-on implementation work (FDEs, etc.) it takes to make AI work in any specific corporate setting. Two things can be true at once: frontier models and labs will continue to grow an incredible amount, and a vast ecosystem of software and services companies will emerge to bring the power of these models to real enterprises. This makes room for new infrastructure providers, applied AI companies in every vertical, new versions of system integrators, and more players. Incredibly exciting time on all fronts.
If you thought AI progress was slowing down, well, here's the immediate answer to that. Huge jump in capability across the board. This is going to deliver major improvements to agents across almost all knowledge work categories.
Great post. So much about model performance is a function of how much compute you spend at inference time. This means compute-normalized benchmarks are the only logical path forward. The challenge is that this is a lot harder than it seems: how much compute to apply is a subjective choice, models behave differently at different thresholds (simplistically, model X’s min thinking may beat model Y’s min thinking, while the order reverses at high thinking), and there is a near-infinite set of thresholds you could choose. Either way, moving more in this direction would be great for better understanding AI progress.
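To make the threshold problem concrete, here is a minimal Python sketch of one way a compute-normalized comparison could work. It assumes each model has been scored at a few inference-compute budgets and ranks models by the best score each reaches without exceeding a given budget. The function names, compute units, model names, and scores are all hypothetical, invented only to illustrate how a ranking can flip as the budget changes.

```python
from typing import Dict, List, Tuple

# Hypothetical measurement format: (inference compute spent, benchmark score).
# Compute units are arbitrary here (e.g. thousands of thinking tokens).
Curve = List[Tuple[float, float]]


def best_score_within_budget(curve: Curve, budget: float) -> float:
    """Best score a model reaches without exceeding the compute budget."""
    eligible = [score for compute, score in curve if compute <= budget]
    return max(eligible) if eligible else float("-inf")


def ranking_at_budget(curves: Dict[str, Curve], budget: float) -> List[str]:
    """Model names ordered best-first when every model gets the same budget."""
    return sorted(
        curves,
        key=lambda name: best_score_within_budget(curves[name], budget),
        reverse=True,
    )


# Made-up numbers (not real results) chosen so the ranking flips:
# model_x wins at the lowest budget, model_y wins at higher budgets.
curves = {
    "model_x": [(1.0, 0.60), (4.0, 0.66), (16.0, 0.70)],
    "model_y": [(1.0, 0.55), (4.0, 0.68), (16.0, 0.78)],
}

for budget in (1.0, 4.0, 16.0):
    print(budget, ranking_at_budget(curves, budget))
```

Each budget yields its own leaderboard, so a single headline number hides the choice of threshold; reporting the ranking across several budgets is one way to surface that choice.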