BuildSpeak: Daily Builder Digest

Aaron Levie

@levie ↗

ceo @box - your business lives in content. unleash it with AI

2 latest · 67 total · 45 issues
June 23, 2026 · 2 posts

Almost all AI model and agent progress is downstream from evals. Open weights post training for specific domains comes down to evals. Agent improvements in the applied AI layer are all about evals. Agentic enterprise deployments that can actually augment work are all about evals. It’s all evals. This will become a core competency of any enterprise in the future. The companies that are able to best understand their own (and/or their customers’) workflows and how well agents participate in that work will be in the best position to actually drive real automation.

几乎所有 AI model 和 agent 的进展,都是由 evals 驱动的下游结果。针对特定领域的 open weights 后训练,归根结底也取决于 evals。应用层 AI 中 agent 的改进,核心也全是 evals。真正能够增强工作的 agentic enterprise 部署,同样全都离不开 evals。一切都是 evals。未来,这将成为任何 enterprise 的核心能力。那些最能理解自身(和/或客户)工作流,以及 agent 在这些工作中参与效果的公司,将最有机会真正推动实际的自动化。

♥ 308 · ↻ 29 · 💬 33 · 6/23 01:17 · x.com ↗
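To make the point about evals concrete, here is a minimal sketch of what a task-level eval loop might look like. Everything in it is an assumption for illustration: `EvalCase`, `run_eval`, and `toy_agent` are hypothetical names, and exact-match scoring is only one of many possible graders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the agent's answer matches exactly."""
    if not cases:
        return 0.0
    passed = sum(1 for c in cases if agent(c.prompt).strip() == c.expected)
    return passed / len(cases)


# Hypothetical stand-in for a real agent call.
def toy_agent(prompt: str) -> str:
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


cases = [
    EvalCase("2+2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("3*3", "9"),
]
score = run_eval(toy_agent, cases)
print(f"pass rate: {score:.2f}")
```

In practice the cases would come from a company's own workflows, which is where the understanding described in the post would show up.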

We heard that HTML is a big deal again. You can now preview, edit, manage versions, and securely share any HTML based content on Box. Great for being able to work with any agent produced content immediately.

我们听说 HTML 又重新变得非常重要了。现在,你可以在 Box 上预览、编辑、管理版本,并安全地分享任何基于 HTML 的内容。这对于立即处理任何由 agent 生成的内容非常有帮助。

♥ 225 · ↻ 12 · 💬 20 · 6/22 19:28 · x.com ↗
June 22, 2026 · 2 posts

Another new idea to push the state of AI architectures forward. Sakana released a model that effectively uses a mixture of models to get work done. You get a single API but then the work gets farmed out to the model that best performs the task. “Fugu manages model selection, delegation, verification, and synthesis automatically. It solves tasks directly when that is enough, or coordinates a team of expert models when a problem calls for more. The complexity of a multi-agent system never reaches your code.” This is generally how applied AI products are building their agent harnesses at this point, but the idea of making this an LLM that any developer can interact with is also a great idea. As we get more innovation with both frontier closed and OSS models, there’s going to be a ton of value produced for the layer that can route the best.

又一个推动 AI 架构发展的新想法。Sakana 发布了一个模型,它通过有效使用 mixture of models(模型混合)来完成工作。你得到的是一个单一的 API,但随后工作会被分发给最擅长该任务的模型。“Fugu 会自动管理模型选择、任务委派、验证和综合。当直接解决任务已经足够时,它就直接处理;而当问题需要更多能力时,它会协调一支由专家模型组成的团队。多 agent 系统的复杂性永远不会传达到你的代码中。” 目前,应用型 AI 产品大体上都是这样构建它们的 agent harness(agent 编排/承载层)的,但把这件事做成一个任何开发者都能交互的 LLM,同样是个很棒的想法。随着 frontier closed 和 OSS 模型两端都持续创新,能够把请求路由给最佳模型的那一层将会创造出大量价值。

♥ 206 · ↻ 23 · 💬 25 · 6/22 04:41 · x.com ↗

Agents will use software 100X more than people. When that happens, there’s a huge need for guardrails on what the agents are doing so they don’t leak data or change the wrong information, authoritative sources of truth for them to work with, logging and auditing of what they’re doing, the ability to collaborate with people through these systems, and more. A simple query on any given agentic task could pull in more data than a user touches in a month. As a result, there are lots of categories of software where usage and value go up substantially once they go headless. Agents will end up using our CRM data, documents and corporate knowledge, analytics data, and other information far more than people ever did. The platforms that can move toward the model of powering these headless interactions, and have a business model and technology strategy to support this, will be in the best position in the future.

Agents 使用软件的频率将会是人的 100 倍。当这种情况发生时,就会非常需要 guardrails(护栏/约束机制)来限制 agents 在做什么,以防它们泄露数据或修改错误的信息;还需要它们可以依赖的权威 truth source(事实来源),对其行为进行 logging 和 auditing(日志记录与审计),以及通过这些系统与人协作的能力,等等。对任何一个给定的 agentic task(agent 式任务)做一次简单查询,调取的数据量都可能比一个用户一个月接触的数据还多。因此,很多类别的软件一旦变成 headless(无界面/无头)模式,其使用量和价值都会显著提升。Agents 最终会比人更多地使用我们的 CRM 数据、文档和企业知识、分析数据以及其他信息。那些能够转向为这些 headless 交互提供底层能力的平台,并且拥有与之配套的商业模式和技术战略的公司,未来将处于最有利的位置。

♥ 228 · ↻ 44 · 💬 34 · 6/22 00:20 · x.com ↗
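The guardrails and audit logging mentioned in the post above could be sketched roughly as follows. This is a toy illustration under stated assumptions, not any platform's actual API: the allow-list contents, `guarded_call`, and the log record shape are all hypothetical.

```python
import time

# Hypothetical allow-list: the only actions agents may perform.
ALLOWED_ACTIONS = {"read_document", "search_crm"}

# In-memory audit trail; a real system would use durable storage.
audit_log: list[dict] = []


def guarded_call(agent_id: str, action: str, target: str, handler):
    """Record every attempted action, then run it only if it is allowed."""
    allowed = action in ALLOWED_ACTIONS
    audit_log.append({
        "ts": time.time(),
        "agent": agent_id,
        "action": action,
        "target": target,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{action} is not permitted for agents")
    return handler(target)


result = guarded_call("agent-1", "read_document", "q3.pdf",
                      lambda t: f"contents of {t}")

try:
    guarded_call("agent-1", "delete_record", "account-42", lambda t: None)
    blocked = False
except PermissionError:
    blocked = True

print(result, blocked, len(audit_log))
```

Logging before the permission check means denied attempts are auditable too, which seems closer to what the post describes than logging successes only.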
June 21, 2026 · 1 post

Pretty remarkable what’s happening with open weights AI right now. We’re seeing models achieve SOTA results on specific tasks, and getting close to frontier on some areas of coding and other domains. The more that open weights is able to maintain only a marginal gap from the frontier, instead of a widening gap, the more value that can be created with AI. Incidentally, this is actually fine for the frontier labs as well; if we can lower the cost of an overall task then AI usage goes up in general. You’re still likely using frontier models for planning, orchestration, reviewing, and other parts of work. But this is all very good for the applied layer of AI, which is now in a great position to cost optimize workloads with cheaper models or use tailored open models post-trained for specific tasks to improve performance.

当前 open weights AI 的发展相当令人瞩目。我们看到,一些模型已经在特定任务上取得了 SOTA(state-of-the-art,当前最先进)结果,并且在 coding 等一些领域以及其他领域中,正逐渐接近 frontier(前沿)水平。open weights 与 frontier 之间如果能始终保持只是边际差距,而不是差距不断拉大,那么 AI 所能创造的价值就会越大。顺带一提,这其实对 frontier labs 也同样有利;如果我们能降低一个整体任务的成本,那么 AI 的总体使用量就会上升。你仍然很可能会在 planning、orchestration、reviewing 以及工作的其他环节中使用 frontier models。但这一切都对 AI 的 applied layer(应用层)非常有利,因为它现在处在一个很好的位置:既可以用更便宜的模型来优化 workload(工作负载)成本,也可以使用针对特定任务进行 post-trained(后训练)的定制 open models 来提升性能。

♥ 434 · ↻ 45 · 💬 46 · 6/20 20:41 · x.com ↗
June 20, 2026 · 1 post

The main variable in getting success with agents is whether you can get the agent the context it needs to do its work; and a major factor in that is if you can create a shared working area for that agent that a human can understand as well. This is one of the reasons why agents using file systems is such a big deal. It creates a unified system that both the person and the agent can work within to pass around data. “What they need is a working set: plans, notes, task lists, policies, drafts, summaries, logs, corrections, decisions, etc. For that layer, a filesystem-shaped interface tends to be more legible to both the model and the humans supervising it.” It turns out giving agents access to the systems we already know how to use, but in a way that is best optimized for them, is the perfect primitive for agents to work.

决定 agent 能否成功的主要变量,是你是否能为它提供完成工作所需的 context(上下文);而其中一个关键因素,则是你是否能为这个 agent 创建一个人类也能理解的共享工作区。这也是为什么使用 file systems(文件系统)的 agents 如此重要的原因之一。它创造了一个统一的系统,让人和 agent 都能在其中协作并传递数据。“它们需要的是一个 working set(工作集合):plans、notes、task lists、policies、drafts、summaries、logs、corrections、decisions 等等。对于这一层,一个类似 filesystem 的接口,往往对 model 和监督它的人类来说都更清晰易读。” 事实证明,让 agents 访问我们已经知道如何使用的系统,但以最适合它们的方式进行优化,正是 agents 开展工作的完美 primitive(基础构件)。

♥ 308 · ↻ 30 · 💬 46 · 6/19 20:27 · x.com ↗
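As a rough illustration of the shared working area described above, the sketch below lays out a workspace as plain directories and text files that both an agent and a human could read. The folder names follow the quoted "working set" loosely; `init_workspace`, `append_log`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def init_workspace(root: Path) -> None:
    """Create a small, human-readable folder layout for a task."""
    for name in ("plans", "notes", "logs"):
        (root / name).mkdir(parents=True, exist_ok=True)


def append_log(root: Path, message: str) -> None:
    """Append one line to a shared log that agent and human both read."""
    with (root / "logs" / "agent.log").open("a", encoding="utf-8") as f:
        f.write(message + "\n")


workspace = Path(tempfile.mkdtemp())
init_workspace(workspace)

# The agent writes its plan as a plain checklist a person can edit.
(workspace / "plans" / "task.md").write_text(
    "- [ ] summarize contract\n", encoding="utf-8"
)
append_log(workspace, "agent: started task")
append_log(workspace, "human: approved plan")

log_lines = (workspace / "logs" / "agent.log").read_text(
    encoding="utf-8"
).splitlines()
print(log_lines)
```

Plain files keep the state legible without special tooling, which is the property the quoted passage emphasizes.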
June 19, 2026 · 2 posts

The fact that open weights models are being discussed credibly at this level of capability should be a huge update for many. The implications of open models getting to frontier performance ensures that you can always have sovereign AI, have the ability to post train for your specific workflows, cost optimize for various workloads, and actually afford to do much more with AI (which opens up meaningfully different applications). Huge win for the applied AI layer.

在这样的能力水平上,open weights models 被认真地纳入讨论,这一事实本身对很多人来说都应当是一次巨大的认知更新。open models 达到 frontier performance(前沿性能)所带来的影响在于:你将始终可以拥有 sovereign AI(主权 AI),能够针对自己的特定工作流进行 post train(后训练),为不同工作负载做成本优化,并且真正负担得起用 AI 做更多事情(这会打开一些明显不同的新应用)。这对 applied AI layer(应用层 AI)来说是一次巨大胜利。

♥ 319 · ↻ 49 · 💬 39 · 6/19 04:09 · x.com ↗

This is a good update for getting access to Fable. It also gives us a view into what the future is likely going to look like with AI regulation. The government will have frameworks that are used to determine future model releases past a certain threshold of capability or compute levels. Given all the constituents involved, and the economic and societal significance of AI, this was practically an inevitability. It may seem small but the implications are massive. It will mean that each model update will go through an extensive review, testing, and feedback process. And in that process lots of groups will weigh in on the risk of the model, and there will be lots of subjectivity on what the actual risks are or the practicalities of exploiting those risks. A positive potential future here would be that we still get massive model progress, but it just happens in bigger jumps at once, where the labs pack in major improvements since the cost and slowdown of each review stack up. On the other side, the risk is that past a certain threshold we may not get to see the rapid back and forth of model progress that we’ve gotten used to, which can have negative compounding effects. Hoping for the former outcome.

这对获得 Fable 的访问权限来说是一个不错的进展。这也让我们得以一窥未来 AI 监管很可能会是什么样子。政府会建立一些 framework(框架),用来判定未来模型发布是否越过了某个 capability(能力)或 compute(算力)水平的阈值。考虑到其中牵涉的所有利益相关方,以及 AI 在经济和社会层面的重要性,这几乎是不可避免的。这件事看起来可能不大,但其影响极其深远。这意味着此后的每一次模型更新,都将经历广泛的审查、测试和反馈流程。而且在这个过程中,很多群体都会对模型的风险发表意见,对于实际风险究竟是什么、以及利用这些风险在实践中是否可行,也会存在大量主观判断。这里一种积极的可能未来是:我们仍然能看到模型取得巨大的进步,只不过这些进步会以更大的跳跃一次性出现,因为每次审查带来的成本和放缓会不断累积,于是实验室会把重大改进打包进去。另一方面,风险在于,一旦越过某个阈值,我们可能就看不到已经习以为常的那种模型快速往复迭代了,而这可能带来负面的复合效应。希望结果会是前一种。

♥ 151 · ↻ 9 · 💬 15 · 6/19 02:52 · x.com ↗
June 18, 2026 · 1 post

The past couple months we may be witnessing what the Applied AI layer will look like at scale. Despite some of the initial critique that this would just be a thin layer on the LLM, it’s turning out that actually driving agentic workflows in an enterprise is far more complex. And anywhere there’s complexity you generally gain a moat and value over time. Here are a few of the components that appear to make up the playbook based on the examples we’re collectively seeing in coding, legal, healthcare, customer support, financial services and other fields:
* Build the features that bridge the gap between the intelligence and the workflow. Some workflows can be automated by simply going to a general purpose interface, but others need tuned interfaces and features tied to the work they’re augmenting or automating. They need features that are specific to capturing the kind of data that’s needed as context for the agent. And they need a variety of bespoke tools for the agent to use, and unique interfaces for the human-in-the-loop UX. Going far deeper than just presenting the output tokens is clearly critical, and the more depth there is here definitionally the more sustaining value.
* Act as the model router balancing frontier intelligence with cheaper models. A natural advantage that any model neutral platform has is that it can naturally (in a business model-aligned way) leverage whatever level of intelligence is necessary for the workflows they’re automating to get done. There are plenty of scenarios where you need GPT-5.5 or Fable level capability, and also lots of workloads where a more efficient closed or open weights model does the trick. Only the companies that have deep evals on specific tasks across all models, and the ability business-model-wise to leverage them, are in a great position.
* Drive the actual implementation and change management via FDE or equivalent. A big reason the applied layer works at scale is that most enterprises need some degree of help and support with change management in implementing agents for their workflows. Data has to be cleaned up and moved to modern systems, processes have to be re-engineered and documented, workflows have to be evaled, SLAs have to get achieved, and so on. All of this is going to be unique for every type of process that gets implemented, which means the companies that have expertise in a given domain and come with all the relevant best practices will be in a strong position.
* Implement domain specific GTM that creates expertise in that field. Beyond FDEs the companies that can build sales and GTM motions aligned to their domains also have a natural advantage. Most IT and line of business leaders have too many things to do in any given day; so if you’re not on their agenda, likely someone else is. Depending on the industry, there are entirely different sets of language you use, ways of working through security and compliance, regulatory controls you have to support, industry events that companies convene at, different system integrator and consulting partners you need to work with, and so on. The more generalized this gets the less you can speak the customer’s language, which is where the applied layer has a leg up.
A final note. There remains a view that a lot of this is all mitigated by model intelligence alone, and the bitter lesson solves all of this in the limit. That’s possibly true, but enterprises need help changing *today*. And many aspects of how to bring intelligence to real world work don’t only depend on the axis of the pure capability of the model, so most of what you’re doing now to win ends up being important no matter how good the models get.

过去几个月里,我们或许正在见证大规模 Applied AI(应用层 AI)会是什么样子。尽管一开始有一些批评认为,这不过是叠在 LLM 之上的一层很薄的外壳,但事实证明,要真正把 agentic workflows(agent 驱动的工作流)推进到企业环境中,复杂得多。而凡是存在复杂性的地方,通常就会随着时间形成 moat(护城河)和价值。基于我们在 coding、legal、healthcare、customer support、financial services 及其他领域共同看到的案例,下面是一些似乎正在构成这套 playbook(方法论)的组件:* 构建能够弥合 intelligence 与 workflow 之间差距的功能。有些工作流只需要进入一个通用界面就能自动化,但另一些则需要经过调优的界面,以及与其所增强或自动化的工作紧密绑定的功能。它们需要专门用于采集 agent 所需上下文数据的功能;也需要各种定制工具供 agent 使用,以及面向 human-in-the-loop UX 的独特界面。显然,仅仅呈现输出 token 远远不够,必须做得更深;而这里做得越深,从定义上讲,可持续的价值也就越大。* 充当 model router(模型路由器),在前沿 intelligence 与更便宜的模型之间做平衡。任何 model neutral platform(模型中立平台)的一个天然优势在于,它可以很自然地——并且以符合其商业模式的方式——调用完成其所自动化工作流所需的 intelligence 水平。有很多场景确实需要 GPT-5.5 或 Fable 级别的能力,但也有大量工作负载使用更高效的 closed 或 open weights 模型就足够了。只有那些对所有模型在具体任务上做了深入 evals(评测),并且在商业模式上也有能力去灵活利用它们的公司,才真正处于有利位置。* 通过 FDE 或同类角色推动实际实施与 change management(变更管理)。应用层之所以能够大规模发挥作用,一个重要原因在于,大多数企业在为其工作流部署 agents 时,都需要某种程度的变更管理帮助与支持。数据必须被清洗并迁移到现代系统中;流程必须被重新设计并文档化;工作流必须经过 eval;SLA 必须达成,等等。所有这些对每一种被实施的流程都会是独特的,这意味着,那些在特定 domain(领域)中拥有专业能力,并且自带相关最佳实践的公司,将处于强势位置。* 落地 domain specific GTM(领域特定的 Go-To-Market),从而在该领域形成专长。除了 FDE 之外,那些能够建立与其所在领域相匹配的销售和 GTM 动作的公司,也拥有天然优势。大多数 IT 和业务条线负责人每天都有太多事情要处理;所以如果你不在他们的议程上,很可能别人就在。视行业而定,你需要使用完全不同的话语体系,采用不同的安全与合规推进方式,支持必须满足的监管控制,参加公司会聚集的行业活动,并与不同的 system integrator 和咨询伙伴合作,等等。越是做得泛化,你就越难真正说客户的语言,这正是应用层占优的地方。最后补充一点。仍然有人认为,这一切最终都会被模型 intelligence 本身所化解,bitter lesson 会在极限情况下解决所有这些问题。这或许没错,但企业需要的是今天就获得变革帮助。而且,将 intelligence 引入现实世界工作中的许多方面,并不只取决于模型纯能力这一条轴线,所以你现在为了赢而做的大部分事情,不管模型未来变得多强,最终都会依然重要。

♥ 241 · ↻ 20 · 💬 36 · 6/18 03:53 · x.com ↗
June 17, 2026 · 2 posts

One of the biggest questions in AI is how far behind open weights models remain from closed models at any given time. There are huge differences in market structures depending on whether open weights models remain 3 or 6 months behind, or if they fall behind by years. The answer to this will determine how the chip stack plays out, where inference can be run, what sovereign AI looks like, what happens at the applied AI layer, what the margin structure looks like in AI, how much companies can afford to spend on AI, and more. At the moment the open weights players appear to be holding up at keeping close to frontier levels of capability. Will be fun to see how this plays out.

AI 领域中最大的问题之一,是 open weights models(开放权重模型)在任意一个时间点究竟会落后 closed models(闭源模型)多远。市场结构会因此出现巨大差异:到底是 open weights models 只落后 3 个月或 6 个月,还是会落后几年。这个问题的答案将决定 chip stack(芯片栈)会如何演变、inference(推理)可以在哪里运行、sovereign AI(主权 AI)会是什么样子、applied AI(应用层 AI)会发生什么、AI 的利润率结构会是什么样、公司能负担多少 AI 支出,等等。目前看来,open weights 阵营似乎仍能把能力维持在接近 frontier(前沿)水平的位置。接下来会怎么发展,应该会很有意思。

♥ 206 · ↻ 13 · 💬 25 · 6/17 02:24 · x.com ↗

The Cursor deal is symbolically quite significant. It was effectively the first mega success in the applied layer of AI. They firmly proved out the value proposition of having a deep domain focus, the role you play as a model router, when to lean into frontier models vs. when to train your own, and the role of applied AI GTM and distribution to make sure you’re actually taking advantage of the market opportunity. Every aspect of their business was tuned to carve out ground and keep doubling down in a highly competitive space. This is really the first at scale template for how to execute this playbook.

Cursor 这笔交易在象征意义上相当重要。它实际上是 AI 的 applied layer(应用层)中第一个真正意义上的超级成功案例。他们非常扎实地验证了这些价值主张:深耕垂直领域的价值、你作为 model router(模型路由器)所扮演的角色、何时应该依赖 frontier models(前沿模型)而不是自己训练,以及 applied AI 的 GTM(go-to-market,市场进入)和 distribution(分发)在确保你真正抓住市场机会方面所起的作用。他们业务的每一个方面都经过精细调校,以便在一个竞争极其激烈的领域中占据地盘,并持续加码。这确实是第一个大规模展示如何执行这套 playbook(打法手册)的模板。

♥ 736 · ↻ 38 · 💬 46 · 6/16 15:37 · x.com ↗
June 16, 2026 · 3 posts

Key post that gives a bit of insight into what the future of AI could look like. “The most interesting thing happening in AI isn't that one model is getting smarter. It's that intelligence is becoming increasingly customizable. The companies that win won't necessarily be the ones with the biggest models. They'll be the ones that turn intelligence into something uniquely their own.” The ability to combine your unique data, workflows, and a layer that can route intelligence to whatever model best performs the task is clearly the future.

这篇关键帖子让人稍微看清了 AI 的未来可能会是什么样子。“AI 里正在发生的最有趣的事情,并不是某一个 model 变得更聪明了,而是 intelligence(智能)正变得越来越可定制。最终胜出的公司,未必是那些拥有最大 model 的公司,而会是那些把 intelligence 变成自身独特能力的公司。” 能够把你独有的数据、workflows(工作流),以及一层可以把 intelligence 路由到最适合完成任务的 model 的系统结合起来,这显然就是未来。

♥ 178 · ↻ 23 · 💬 20 · 6/16 04:13 · x.com ↗

It’s very easy to say “we need an FDA for AI” or some equivalent government agency. Well this is what that would look like. The capabilities of AI models have near infinite permutations. It’s going to be very hard to have a purely objective set of metrics that can be universally applied before every model release, without extended back and forth, research, and debate between model labs, academics, and the government. Now, imagine this same process with every country that you’re doing business with, globally, for every single model release. Add in a backlog of dozens of AI model releases, and you can quickly see how in the limit this will dramatically slow down all AI progress. This is why we need to primarily focus on regulating the applied uses of AI, where the risk actually shows up.

说一句“我们需要一个 AI 版的 FDA”,或者某个类似的政府机构,是非常容易的。好,这就是那种东西实际会是什么样子。AI model 的能力几乎有无限多种排列组合。想要在每次 model 发布之前,都制定出一套纯粹客观、还能被普遍适用的指标,而不经历 model labs、学术界和政府之间反复拉扯、研究和辩论,这将会非常困难。现在,再想象一下:对于你在全球开展业务所涉及的每一个国家,在每一次 model 发布时,都要走一遍同样的流程。再加上积压的几十个 AI model 发布,你很快就能看出,如果走到极限,这会显著拖慢整个 AI 的进展。这也是为什么我们需要主要去监管 AI 的 applied uses(实际应用场景),因为风险真正出现的地方是在那里。

♥ 151 · ↻ 18 · 💬 37 · 6/15 16:10 · x.com ↗

Open source going to win big

Open source 会大获全胜。

♥ 413 · ↻ 28 · 💬 48 · 6/15 14:22 · x.com ↗
June 15, 2026 · 2 posts

Great post. The companies that are able to get their unique IP, institutional knowledge, and data into a format and architecture that lets them capture all of the gains and progress in AI are going to be in the best position in the future. “the real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound. You can offload a task, or even a job, but you can never offload your learning. The future of the firm is the ability to compound that learning across people and AI. This requires a new architectural approach where every business is able to build agentic systems that improve over time, while still retaining control over their IP. A company should be able to switch out a “generalist” model without losing the “company veteran” expertise built into their learning system.” We’re all collectively figuring out the right architecture for the future of AI. But it’s clear that so much of the power and value will accrue to wherever can best leverage any AI system against their information. This is also why the applied AI layer will also gain so much value over the coming years.

很棒的文章。那些能够把自己独特的 IP、institutional knowledge(机构知识)和数据整理成一种格式与架构,从而抓住 AI 带来的全部收益与进展的公司,未来会处于最有利的位置。“真正的机会不在于挑选最好的 model,而在于在 model 之上构建一个 learning loop(学习闭环),让 human capital(人力资本)和 token capital(token 资本)实现复利增长。你可以外包一个 task,甚至一份工作,但你永远无法外包自己的学习。企业未来的核心,是能够让这种学习在人与 AI 之间持续复利。这需要一种新的架构方法,让每一家企业都能构建会随时间不断改进的 agentic systems(agent 系统),同时仍然保有对自身 IP 的控制。公司应该能够替换掉一个‘generalist’ model,而不会失去其学习系统中沉淀下来的‘company veteran’级专业经验。” 我们都还在共同摸索适用于 AI 未来的正确架构。但很明显,巨大的能力与价值将会积累到那些最善于让任何 AI system 结合自身信息发挥作用的地方。这也是为什么在未来几年里,applied AI 这一层也会获得如此巨大的价值。

♥ 712 · ↻ 74 · 💬 58 · 6/14 19:13 · x.com ↗

The big winner in all of this is going to be open weights models. This is a huge win for the field, as a risk that was entirely theoretical and untested 2 days ago (that a model could be pulled back) now has a new precedent that’s been set. The game theory the US should highly consider, and the risk with regulating AI at the model layer vs. applied layer, is that other countries now have even more incentive to develop sovereign AI. If at any moment a model can become unavailable to your country’s users or businesses, this poses very real risk on relying on technology from a particular country. As a result, it forces major countries to chart their own path on AI development, which reduces America’s leadership role in this tech stack over time. The most likely solution that other countries will rely on is open weights models, which currently are generally not coming from the US. America should be considering all of these downstream implications as it decides how and where in the stack to be regulating AI. At the same time, we should be doing a ton more OSS innovation.

这整件事里的最大赢家将会是 open weights models。这对整个领域来说是一次巨大的利好,因为一个在 2 天前还完全停留在理论层面、从未被验证过的风险(即一个 model 可能会被撤回),现在已经有了新的 precedent(先例)。美国应当高度重视其中的 game theory(博弈逻辑),以及在 model layer(模型层)而不是 applied layer(应用层)监管 AI 所带来的风险:其他国家现在会有更强的动力去发展 sovereign AI(主权 AI)。如果一个 model 在任何时刻都可能对你国家的用户或企业变得不可用,那么依赖来自某一个特定国家的技术就会构成非常现实的风险。因此,这会迫使主要国家在 AI 发展上走自己的道路,而这会随着时间推移削弱 America 在这套 tech stack(技术栈)中的领导地位。其他国家最有可能依赖的解决方案是 open weights models,而目前这类模型通常并不是来自美国。America 在决定应当如何以及在这套 stack 的哪些位置监管 AI 时,应该考虑所有这些下游影响。同时,我们也应该大幅增加 OSS 创新。

♥ 564 · ↻ 57 · 💬 76 · 6/14 14:35 · x.com ↗
June 14, 2026 · 3 posts

The layer that can route to the best AI model for the particular job is going to increase in value substantially. There are at least 3 big reasons:
* Cost optimization: there are plenty of use cases where you need frontier intelligence for some tasks and something far cheaper for others. Even in the same task you may use frontier intelligence for planning and review of the work, but an OSS or cheaper model for the bulk of the workload. This is going to be standard across large buckets of work going forward.
* Capability maximization: despite the bitter lesson and models generally getting better in the same direction, there are still lots of differences between models. Some are better at tool use, others better at coding, and others again better at certain domains of knowledge work. The ability to route between these at different times is a huge advantage.
* Risk mitigation: while the Fable situation is somewhat of a black swan, it’s possible we’re heading toward a regulatory environment where governments may restrict models at different times based on their approval mechanisms or new things they discover. This means you’re going to want flexibility in being able to deploy workloads across different providers as a form of risk mitigation.
Ultimately, it’s going to increasingly be a strategic advantage for the applied AI layer that they can effectively route between models. Will be very interesting to see how this evolves.

那个能够为特定工作路由到最佳 AI model(模型)的层,其价值将会大幅提升。至少有 3 个重要原因:* 成本优化:有很多 use case(用例)里,你会需要在某些任务上使用 frontier intelligence(前沿智能),而在另一些任务上使用便宜得多的模型。即便是在同一个任务里,你也可能会用 frontier intelligence 来做规划和工作审查,但把大部分工作负载交给 OSS 或更便宜的模型。今后这会在大类工作中成为标准做法。* 能力最大化:尽管 bitter lesson(苦涩教训)依然成立,而且 models(模型)总体上是在朝同一方向变强,但模型之间仍然存在大量差异。有些更擅长 tool use(工具使用),有些更擅长 coding(编程),还有些则更擅长某些知识工作领域。在不同时间点在这些模型之间进行路由的能力,是一个巨大的优势。* 风险缓释:虽然 Fable 这次的情况某种程度上算是 black swan(黑天鹅)事件,但我们也有可能正在走向一种监管环境:政府可能会基于自己的审批机制,或基于他们新发现的情况,在不同时间限制不同模型。这意味着,你会希望自己具备灵活性,能够把工作负载部署到不同 provider(提供方)上,把这作为一种风险缓释手段。归根结底,对于 applied AI(应用层 AI)来说,能否有效地在不同模型之间进行路由,将越来越成为一种战略优势。这个方向接下来会如何演化,会非常值得关注。

♥ 209 · ↻ 16 · 💬 27 · 6/14 02:47 · x.com ↗
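The three reasons in the routing post above (cost, capability, risk) can be sketched as a single selection rule: among available models that clear an eval threshold for the task type, pick the cheapest. This is a minimal sketch under assumptions. The model names, costs, and eval scores are made-up placeholders, and `ModelOption` and `route` are hypothetical helpers, not any product's API.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    cost_per_task: float           # placeholder units
    eval_scores: dict[str, float]  # task type -> score in [0, 1]
    available: bool = True         # False if a provider is restricted


def route(task_type: str, min_score: float,
          models: list[ModelOption]) -> ModelOption:
    """Pick the cheapest available model that meets the eval threshold."""
    candidates = [
        m for m in models
        if m.available and m.eval_scores.get(task_type, 0.0) >= min_score
    ]
    if not candidates:
        raise LookupError(f"no available model meets {min_score} on {task_type}")
    return min(candidates, key=lambda m: m.cost_per_task)


# Placeholder catalog; all numbers are invented for illustration.
models = [
    ModelOption("frontier-model", 1.00, {"planning": 0.95, "extraction": 0.97}),
    ModelOption("open-weights-model", 0.10, {"planning": 0.70, "extraction": 0.92}),
]

planner = route("planning", 0.9, models)    # capability: needs the frontier model
worker = route("extraction", 0.9, models)   # cost: cheaper model is good enough

models[1].available = False                 # risk: a provider becomes unavailable
fallback = route("extraction", 0.9, models)

print(planner.name, worker.name, fallback.name)
```

The per-task scores are the piece that depends on having evals across models, which ties this back to the earlier posts in the feed.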

Everyone thinks this is some kind of 4D chess or conspiracy. But it’s quite standard to try and jailbreak AI models, and by definition they would share that research with the government given that’s the whole point. I don’t think Amazon assumed this would be the next move.

每个人都觉得这是什么 4D chess(四维棋)或者阴谋论。但尝试 jailbreak AI models(越狱 AI 模型)其实是相当标准的做法,而且按定义来说,他们既然做这项研究,本来就会把研究结果分享给政府,因为那本来就是整个事情的目的。我不认为 Amazon 预料到这会是下一步动作。

♥ 195 · ↻ 8 · 💬 30 · 6/14 01:08 · x.com ↗

This whole Fable export control situation is actually net positive to regulation discourse. It’s an early peek into what AI regulation would end up looking like at scale when enacted at the model layer instead of the specific application of the AI. The government would have sole discretion over when a model can be released to the public, based on a bunch of factors that they inherently control. In this case, based on the available reporting, the risk is that the model can be jailbroken to deliver increased cyber exploit capabilities. The issue is that actually you want models to be able to have those capabilities on the defense side of cyber as well, and for all intents and purposes, by Anthropic’s own response, you can execute these capabilities today in other models. So the whole challenge will be that you’re debating with the government, over months and months, with every model release, what these models are actually capable of and what their risks are. Inherently, there’s not only a lot of subjectivity in determining those risks, but there are also many other factors that go into the risks being practical in the first place. The net result is that we would end up with a backlog of AI releases, progress in the market inherently would dramatically slow down, and AI would start to look more like any other sclerotic industry. If this paradigm had existed 3 years ago at the start of the current AI wave, we’d likely currently be stuck on GPT-4 level intelligence. This is why, wherever possible, we should be regulating the applied use of AI. We should continue to study and enforce the dangerous use of AI in cyber attacks, financial services risks, fraud, biowarfare, and other spaces. AI safety is incredibly important, but slowing down progress this early in the development of AI I suspect is net harmful.

整个 Fable export control(出口管制)事件,实际上对监管话语来说是净正面的。它让我们提前看到了一眼:如果 AI regulation(AI 监管)是在 model layer(模型层)而不是针对 AI 的具体应用来实施,那么大规模落地后会是什么样子。政府将对一个模型何时可以向公众发布拥有完全裁量权,而依据则是一系列本质上由他们掌控的因素。在这个案例里,根据现有报道,风险在于这个模型可能被 jailbreak(越狱),从而提供更强的 cyber exploit(网络利用/攻击)能力。问题在于,你其实也希望模型在 cyber(网络安全)的防御侧具备这些能力,而且就实际效果而言,按照 Anthropic 自己的回应,今天你已经可以在其他模型上执行这些能力了。所以,真正的挑战将会是:你要在每一次模型发布时,花上数月又数月与政府争论这些模型到底具备什么能力、其风险又是什么。这里面不仅在风险认定上天然存在大量主观性,而且风险之所以会在现实中成立,本身还取决于许多其他因素。最终结果就是,AI 发布会出现积压,市场进展会不可避免地显著放缓,而 AI 行业会开始看起来像其他任何僵化的行业一样。如果这种范式在 3 年前、也就是这一轮 AI 浪潮开始时就已经存在,那么我们现在很可能还卡在 GPT-4 水平的智能上。这就是为什么,只要有可能,我们就应该监管 AI 的应用使用层。我们应该继续研究并执法打击 AI 在 cyber attacks(网络攻击)、financial services risks(金融服务风险)、fraud(欺诈)、biowarfare(生物战)以及其他领域中的危险使用。AI safety(AI 安全)极其重要,但在 AI 发展这么早的阶段就放慢进展,我怀疑总体上是有害的。

♥ 318 · ↻ 34 · 💬 63 · 6/13 17:02 · x.com ↗
June 13, 2026 · 3 posts

This is a big turning point for AI regulation. The government is starting to deem some models too powerful for certain uses, which creates a precedent for a range of possible controls in the future. I’m in the camp that this is unnecessary and we should be primarily regulating the use of AI, as opposed to the underlying models. But, equally, there are plenty of people that actually prefer this outcome. Either way, it’s unlikely that we’re going back to a world where the government doesn’t have far more meaningful involvement in the rate of AI progress.

这对 AI 监管来说是一个重大的转折点。政府开始认定某些 model(模型)对于特定用途来说过于强大,这为未来一系列可能的管控措施开创了先例。我属于这样一种观点:这没有必要,我们主要应该监管的是 AI 的使用,而不是底层 model(模型)本身。但与此同时,也有很多人实际上更倾向于这样的结果。不管怎样,我们不太可能再回到一个政府对 AI 进展速度没有更深度、也没有更实质性介入的世界。

♥ 451 · ↻ 31 · 💬 77 · 6/13 02:05 · x.com ↗

This is pretty freaking cool

这真他妈挺酷的。

♥ 384 · ↻ 12 · 💬 9 · 6/13 00:39 · x.com ↗

Incredible. Congrats to @elonmusk and the entire SpaceX team on the 25 years of blood, sweat and tears to build a world-defining company. Amazing to have examples like this that push the future forward. The downstream implications of this are enormous.

难以置信。祝贺 @elonmusk 和整个 SpaceX 团队,在 25 年的 blood, sweat and tears(心血与艰辛付出)中打造出一家定义世界的公司。能有这样推动未来向前发展的榜样,实在太了不起了。这件事带来的下游影响将是巨大的。

♥ 276 · ↻ 25 · 💬 13 · 6/12 16:20 · x.com ↗
June 12, 2026 · 1 post

At Box, we just surveyed 1,640 IT leaders across the US, Japan, and Europe about agentic AI adoption. Many standout findings, but a big one was that the companies that adopted AI the most are planning to grow headcount the most. Obviously lots of ways you can read that data and variables mixed in, but it’s actually quite intuitive that the companies that become most productive want to (and are able to) reinvest back into the business to keep getting the gains going. The narrative of jobs being wiped out assumes that companies will take a fixed approach to what they want to be able to work on. What’s happening in practice is it’s causing companies to want to light up more engineering projects, sell to more customers, automate more processes to give time back, and more. That all leads to more work to be done by people.

在 Box,我们刚刚就 agentic AI 的采用情况,调查了来自美国、日本和欧洲的 1,640 位 IT 领导者。结果中有很多突出的发现,但其中一个重要结论是:采用 AI 最多的公司,也计划扩大最多的员工规模。显然,这组数据可以有很多种解读方式,其中也混杂着各种变量;但实际上,这一点很符合直觉:那些生产力提升最多的公司,会希望(也有能力)把收益重新投入业务中,以持续扩大这些增益。“工作岗位会被消灭”的叙事,假设了公司对于自己希望完成的工作范围会采取一种固定不变的思路。而现实中正在发生的是,这会促使公司启动更多 engineering 项目,向更多客户销售,自动化更多流程以腾出时间,等等。这一切最终都会带来更多需要由人来完成的工作。

♥ 124 · ↻ 24 · 💬 23 · 6/12 04:16 · x.com ↗
June 11, 2026 · 1 post

Lots of evidence of huge jumps in capability for Fable across coding (and related) tasks. It’s also a major jump in accuracy and success in complex knowledge work tasks. In our Box AI Complex Work Eval, we tested the model against Opus 4.8 and saw huge boosts across almost every industry. For our eval we give the Box AI Agent, using Fable, a set of hard real world knowledge work problems that deal with enterprise documents. Then we score how the agent performs the tasks. The main differentiators for Fable vs Opus 4.8 are that it doesn't take shortcuts on complex reasoning, it gets multi-step calculations right, and it's significantly more consistent across runs. We saw the biggest leaps in Media & Entertainment (78% vs 61%), Technology (81% vs 73%), Financial Services (89% vs 83%), and Healthcare (66% vs 60%). Here are some specific examples:
* Legal M&A due diligence: On a task reviewing NDA terms against a semiconductor company's contracting policy, Fable correctly identified that a joint-ownership clause violates exclusivity requirements while a liability cap is permitted under a Super Cap exception. Fable scored 100% vs Opus's 78%.
* Healthcare: On a clinical radiology error audit across 12 reports, Fable precisely categorized each error by severity grade and correctly concluded no Grade 3 errors existed. Opus prematurely escalated a case to "major error requiring immediate departmental review" when the evidence didn't support it. Fable scored 63% vs Opus's 41%.
* Media & Entertainment: On a genre profitability projection task, Fable correctly recognized that a 20% Argentine tax deduction was already embedded in the source spreadsheet figures and didn't double-apply it. Opus applied it again on top, a compounding error across 4 genre calculations that took its score negative on the task vs Fable's 74%.
* Retail analytics: On a task analyzing high-growth product articles against an investment benchmark, Fable correctly computed each article's growth rate individually and identified that only 2 of 5 exceeded the threshold. Opus confused "high growth relative to average" with "above the benchmark", scoring 61% vs Fable's 94%.
* Financial Services: On a 5-year debt facility projection, Fable correctly applied interest to opening balances and used the right capex figure. Opus applied interest to the total facility amount and computed tax from the wrong base, two compounding errors. Fable scored 83% vs Opus's 62%.
* Technology: On a SaaS feature valuation requiring computation of a Feature Value Index across multiple regions, Fable applied the formula correctly and got exact values for the markets. Opus got the arithmetic wrong on multiple criteria. Fable scored 100% vs Opus's 74%.
Overall, huge step change in complex analysis, work that requires analytical reasoning, and deep domain understanding. Fable will be available shortly in the Box AI Studio for customers to build agents with.

有大量证据表明,Fable 在 coding(编码)及相关任务上的能力实现了巨大跃升。在复杂知识工作任务中的准确性和成功率也有显著提升。在我们的 Box AI Complex Work Eval 中,我们将该模型与 Opus 4.8 对比测试,看到它在几乎所有行业中都有大幅提升。在这项 eval 中,我们让使用 Fable 的 Box AI Agent 处理一组与企业文档相关、具有真实世界难度的知识工作问题,然后根据 agent 完成任务的表现进行评分。与 Opus 4.8 相比,Fable 的主要差异在于:它在复杂推理上不会走捷径,能够正确完成多步骤计算,并且在多次运行中的一致性显著更高。我们看到提升最大的领域包括 Media & Entertainment(78% 对 61%)、Technology(81% 对 73%)、Financial Services(89% 对 83%)和 Healthcare(66% 对 60%)。以下是一些具体例子:
* Legal M&A due diligence:在一项根据某 semiconductor company 的 contracting policy 审查 NDA 条款的任务中,Fable 正确识别出 joint-ownership clause 违反了 exclusivity requirements,而 liability cap 在 Super Cap exception 下是被允许的。Fable 得分 100%,Opus 为 78%。
* Healthcare:在一项针对 12 份报告的临床 radiology 错误审计任务中,Fable 准确按严重等级对每个错误进行了分类,并正确得出不存在 Grade 3 错误的结论。Opus 在证据并不支持的情况下,过早将一个案例升级为“major error requiring immediate departmental review”。Fable 得分 63%,Opus 为 41%。
* Media & Entertainment:在一项类型片盈利能力预测任务中,Fable 正确认识到 20% 的 Argentine tax deduction 已经包含在源 spreadsheet 数据中,因此没有重复应用。Opus 又额外应用了一次,这个在 4 类题材计算中不断累积的错误,使其在该任务上的得分变成负分,而 Fable 为 74%。
* Retail analytics:在一项对照投资基准分析高增长商品(product articles)的任务中,Fable 正确地分别计算了每个商品的增长率,并识别出 5 个中只有 2 个超过阈值。Opus 将“相对平均值的高增长”和“高于 benchmark(基准)”混淆了,得分 61%,Fable 为 94%。
* Financial Services:在一项 5 年期债务融资 projection 任务中,Fable 正确地将利息应用于期初余额,并使用了正确的 capex figure。Opus 则将利息应用于整个融资额度,并从错误的基数计算税额,这是两个会叠加放大的错误。Fable 得分 83%,Opus 为 62%。
* Technology:在一项需要跨多个地区计算 Feature Value Index 的 SaaS feature valuation 任务中,Fable 正确应用了公式,并得出了各市场的精确数值。Opus 在多个指标上都算错了。Fable 得分 100%,Opus 为 74%。
总体而言,这是在复杂分析、需要分析性推理的工作以及深度领域理解方面的一次巨大跃迁。Fable 很快将在 Box AI Studio 中上线,供客户用来构建 agents。

♥ 177 ↻ 19 💬 26 · 6/11 · 04:08 · x.com ↗
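The Financial Services example in the post above turns on which base the interest is applied to. A minimal sketch of that difference, assuming a simple facility that is drawn down in equal yearly steps; the function name `project_interest` and every number here are hypothetical illustrations, not figures from the Box eval:

```python
def project_interest(facility, opening_drawn, annual_draw, rate, years):
    """Yearly interest computed two ways: on the opening balance (the
    approach the post calls correct) and on the total facility (the
    mistake it describes). All inputs are hypothetical."""
    on_opening, on_facility = [], []
    balance = opening_drawn
    for _ in range(years):
        on_opening.append(balance * rate)    # interest on what is drawn at start of year
        on_facility.append(facility * rate)  # mistaken base: includes the undrawn amount
        balance = min(facility, balance + annual_draw)  # draw more, capped at the facility
    return on_opening, on_facility


on_opening, on_facility = project_interest(
    facility=100.0, opening_drawn=20.0, annual_draw=20.0, rate=0.05, years=5
)
print(round(sum(on_opening), 2), round(sum(on_facility), 2))
```

With these made-up inputs the five-year interest is about 15 on opening balances versus about 25 on the full facility, so the wrong base overstates cost in every year before the facility is fully drawn.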
2026 年 6 月 10 日 · 3 条 →

This is a critical post to read if you’re building an applied AI company right now. “An application earns its place in the untrainable corner by doing unglamorous work: arranging a company's private reality so a model can act on it, handing the model the tools to act, working with the customer to change the reality of its workforce. A company that brings the translation is tough to copy – and the translation never ends. Integration and maintenance run as long as the relationship does, won by teams that put domain-specialized engineers and tools next to the customer.” There’s still an insanely large gulf between model capabilities and what it takes to apply them to specific corporate workflows. Some of that is technology that needs to be built, a lot is access to (and formatting of) the right data to work with, and a ton more is on the change management and specific implementation work (FDEs, etc.) it takes to make AI work in any specific corporate setting. Two things can be very true at once: frontier models and labs will continue to grow an incredible amount, and there will be a vast ecosystem of software and services companies that emerge to bring the power of these models to real enterprises. This makes room for new infrastructure providers, applied AI companies in every vertical, new versions of system integrators, and more players. Incredibly exciting time on all fronts.

如果你现在正在打造一家 applied AI 公司,这是一篇必须读的重要文章。“一个应用之所以能在不可训练的角落里占据一席之地,靠的是做那些并不光鲜的工作:整理一家公司的私有现实,让模型能够基于它采取行动;把模型执行行动所需的工具交到它手里;与客户合作,改变其劳动力体系的现实。能够完成这种翻译的公司很难被复制——而且这种翻译永远不会结束。只要合作关系还在,集成与维护就会持续进行,而胜出的是那些把领域专精的工程师和工具放在客户身边的团队。” 模型能力与将其应用到特定企业工作流所需的一切之间,仍然存在大得惊人的鸿沟。其中一部分是尚需构建的技术,很大一部分是获取正确数据并将其格式化以供使用,更多则在于让 AI 在任何特定企业环境中真正发挥作用所需的变革管理与具体实施工作(FDEs 等)。有两件事可以同时成立:frontier models 和 labs 会继续实现惊人的增长;同时,也会涌现出一个庞大的软件与服务公司生态,把这些模型的力量带给真实企业。这为新的基础设施提供商、各个垂直领域的 applied AI 公司、新版本的 system integrators 以及更多参与者留出了空间。这对所有方向而言,都是一个令人无比兴奋的时刻。

♥ 278 ↻ 23 💬 28 · 6/10 · 04:45 · x.com ↗

If you thought AI progress was slowing down, well here's the immediate answer to that. Huge jump in capability across the board. This is going to deliver major improvement in agents across almost all knowledge work categories.

如果你以为 AI 的进展正在放缓,那么这就是对此最直接的回答。整体能力出现了巨大跃升。这将给几乎所有知识工作类别中的 agents 带来重大改进。

♥ 638 ↻ 68 💬 72 · 6/9 · 17:18 · x.com ↗

Great post. So much about model performance is a function of how much compute you’re doing at inference time. This means compute-normalized benchmarks are the only logical path forward. And yet, the challenge is it’s a lot harder than it seems given it’s subjective how much compute to apply, which means models behave differently at different thresholds (simplistically, model X’s min thinking may beat model Y’s min thinking, but be reversed at high), and there is a near infinite set of thresholds you could choose to set. But either way, moving more in this direction would be great for better understanding AI progress.

很棒的一篇文章。模型性能在很大程度上取决于你在 inference time 投入了多少 compute。这意味着,按 compute 归一化的 benchmarks 才是唯一合乎逻辑的前进路径。不过,难点在于,这件事比看上去要难得多,因为该投入多少 compute 本身就带有主观性,这意味着模型在不同阈值下会表现不同(简单说,model X 的低思考量版本可能胜过 model Y 的低思考量版本,但在高阈值下结果可能反过来),而你几乎可以设定出无限多个不同的阈值。无论如何,朝这个方向多推进一些,都会非常有助于我们更好地理解 AI 的进展。

♥ 79 ↻ 9 💬 17 · 6/9 · 16:08 · x.com ↗
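The threshold problem described in the post above (model X ahead at minimal thinking, model Y ahead at high) can be sketched as a table of scores per compute budget. This is only an illustration: the model names, budgets, and accuracies below are invented, and `winner_at` and `rankings` are hypothetical helpers rather than part of any benchmark suite.

```python
# Hypothetical accuracy at each inference-time compute budget (thinking tokens).
SCORES = {
    "model_x": {1_000: 0.62, 8_000: 0.70, 64_000: 0.74},
    "model_y": {1_000: 0.55, 8_000: 0.71, 64_000: 0.83},
}


def winner_at(budget: int) -> str:
    """Return the model with the highest score at one compute budget."""
    return max(SCORES, key=lambda model: SCORES[model][budget])


def rankings() -> dict:
    """Map every shared budget to its winning model."""
    budgets = sorted(next(iter(SCORES.values())))
    return {budget: winner_at(budget) for budget in budgets}


print(rankings())
```

With these invented numbers the leader flips between the lowest and the higher budgets, which is why a single headline score hides the comparison the post is asking for.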
2026 年 6 月 9 日 · 1 条 →

There’s no amount of intelligence that can get packed into AI models that replaces the need for context. For any sufficiently general purpose AI, you will always have to guide it in the direction you want as it has an infinite range of directions it can go in. As long as the same model is used by a lawyer, an engineer, a financial analyst, or a healthcare professional, and as long as you’re trying to do anything uniquely differentiated or specific, then instructions, domain context, and proprietary data will always need to get into the context window for the model to be useful. This is partly why AI automation doesn’t come for free, and why there’s still a wide spectrum of who’s getting the largest gains from AI and who’s not. You have to put in real work, and you get real value on the other end. This is one of the advantages that applied AI will also have in the market. Any layer of abstraction above just the raw intelligence that can meaningfully get you off to the races faster will likely continue to be valuable.

无论往 AI model 里塞进多少智能,都无法取代对 context(上下文)的需求。对于任何足够通用的 AI,你始终都必须把它引导到你想要的方向,因为它可能前往的方向实际上是无限的。只要同一个 model 被 lawyer、engineer、financial analyst 或 healthcare professional 共同使用,并且你想做的事情具有任何独特差异化或具体性,那么 instruction(指令)、domain context(领域上下文)和 proprietary data(专有数据)就始终需要进入它的 context window(上下文窗口),这个 model 才会真正有用。这也是为什么 AI automation(自动化)并不是免费获得的部分原因之一,也说明了为什么在“谁从 AI 中获得最大收益、谁没有”这件事上,至今仍存在很大的光谱差异。你必须投入真正的工作,而另一端你也会获得真正的价值。这也是 applied AI(应用型 AI)在市场中的优势之一。任何建立在纯粹原始智能之上的 abstraction layer(抽象层),只要能切实帮助你更快起步并进入状态,就很可能会持续具备价值。

♥ 216 ↻ 25 💬 51 · 6/9 · 03:24 · x.com ↗
2026 年 6 月 8 日 · 3 条 →

The numbers may be a bit extreme here, but unquestionably use-cases have to stratify in the next year or two between model families. We’ll see a split between frontier intelligence for high end tasks and work, and much cheaper models for high volume workloads that can sufficiently be peeled off to cheaper models. Frontier will still be far bigger than today because the use-cases will demand it, but the low-end will get quite a bit larger as well. The big update here is that the layer that can efficiently route the workload to the right model will then become increasingly valuable since that becomes one of the new hard problems in AI agents. Agent orchestration that can cost optimize while still performing the task successfully will be in a strong position.

这里的数字也许有点极端,但毫无疑问,在未来一两年里,各类 use-case(使用场景)必须在不同的 model family(模型家族)之间分层。我们会看到一种分化:一边是用于高端任务与工作的 frontier intelligence(前沿智能),另一边则是便宜得多的模型,用于那些量很大、且能被充分剥离给更便宜模型处理的 workload(工作负载)。Frontier 的规模仍会比今天大得多,因为 use-case 会需要它;但低端市场也会扩大不少。这里最大的变化是:能够高效地把 workload 路由到正确模型的那一层,将变得越来越有价值,因为这会成为 AI agent(智能体)中的新难题之一。能够在成功完成任务的同时优化成本的 agent orchestration(agent 编排),将处于非常有利的位置。

♥ 141 ↻ 6 💬 37 · 6/8 · 04:09 · x.com ↗

This is what the market got wrong about AI eating enterprise software. Building good software in the past was very hard. Yes, AI has made that a bit easier, though it’s still hard to build something that’s got good taste, differentiated, high quality, secure, and so on. But nevertheless, that’s only one component of building a platform that enterprises rely on. The plurality of costs in most enterprise software companies is actually on GTM, because at scale most enterprise software categories are tough to break into and need a heavy amount of consultative selling and support for implementation and integration of solutions. AI hasn’t reduced the need for that, and in many cases requires it even more now, as landscapes get even more busy and complicated for buyers to navigate through. If you make one thing cheaper and more abundant (development of software) then the new problem of discoverability and market differentiation (GTM) becomes the hardest part.

这正是市场在“AI 会吞掉 enterprise software(企业软件)”这件事上判断错的地方。过去,构建优秀的软件非常困难。没错,AI 确实让这件事稍微容易了一些,尽管要做出有品味、差异化、高质量、安全等等的东西,仍然很难。但无论如何,那只是构建一个企业所依赖的平台的其中一个组成部分。大多数 enterprise software 公司的成本大头,其实是在 GTM(go-to-market,市场进入/销售转化)上,因为一旦到了规模化阶段,多数 enterprise software 品类都很难打入,必须依赖大量顾问式销售,以及为解决方案的实施与集成提供支持。AI 并没有减少这方面的需求,而且在很多情况下,现在反而更需要它,因为买家需要穿越的市场格局变得更加拥挤和复杂了。如果你让一件事变得更便宜、更充足(software 的开发),那么新的难题——可发现性和市场差异化(GTM)——就会成为最难的部分。

♥ 378 ↻ 43 💬 67 · 6/7 · 22:53 · x.com ↗

Box now has a markdown editor on the web. Full CLI support. Commenting. Full version history. Box Drive also lets you connect to any desktop client as a mounted drive, so you instantly work with all your files in Claude Cowork, Codex, Obsidian, Cursor, or any other app.

Box 现在在 web 上有了一个 markdown editor(Markdown 编辑器)。完整的 CLI 支持。评论功能。完整的版本历史。Box Drive 还允许你把它作为挂载盘连接到任何 desktop client,这样你就能立即在 Claude Cowork、Codex、Obsidian、Cursor 或任何其他 app 中处理你的所有文件。

♥ 251 ↻ 21 💬 27 · 6/7 · 15:49 · x.com ↗
2026 年 6 月 7 日 · 1 条 →

Token costs are becoming one of the hottest topics for any enterprise I talk with right now. It’s very bullish for AI in general because it means these systems are being used at a scale that wasn’t contemplated before. It also gives way to another form of differentiation that will emerge for the applied AI layer, which is model routing. As tokens take on a significant amount of the cost of any given workflow, companies will inevitably want to ensure that their dollars go into the most efficient use of tokens for the particular job at hand. Frontier intelligence will always be relevant at the high end of tasks, like coding, legal and financial analysis, healthcare, and more. And dollars spent here will only go up over time. But, equally, you can peel off individual tasks to lower cost models (whether they’re from open weights vendors or the major labs) and deliver a more efficient end outcome. To do this effectively, the applied AI layer needs to understand the workflows in their domain better than anyone else, and be able to mix and match models to different jobs. If you’re doing document extraction, you need to know which models perform better or worse for any given document type. If you’re doing legal analysis, you want to know which models perform various types of tasks best. And so on. This will become one of the bigger differentiation points over time. The companies with the best evals, the best ability to route the workloads, and those that have business models directly aligned to customers’ financial goals, will be in a great position.

Token 成本正迅速成为我现在与几乎所有企业交流时最热门的话题之一。从整体上看,这对 AI 非常 bullish(看涨),因为这意味着这些系统正以此前未曾设想过的规模被使用。这也为 applied AI(应用层 AI)带来了另一种将会出现的差异化形式,也就是 model routing(模型路由)。随着 token 在任何给定 workflow(工作流)的成本中占据相当大的比重,企业必然会希望确保自己的每一美元,都被用于针对当前具体任务最有效率的 token 使用方式。对于编码、法律与金融分析、医疗健康等高端任务,frontier intelligence(前沿智能)始终都会具有相关性。而花在这些场景上的资金只会随着时间推移而不断增加。但与此同时,你也可以把其中的单个任务剥离出来,交给成本更低的模型(无论它们来自 open weights vendors,还是 major labs),从而交付一个整体上效率更高的最终结果。要想高效做到这一点,applied AI 层必须比任何其他人都更理解其所在领域的 workflow,并且能够针对不同任务灵活组合不同模型。如果你在做 document extraction(文档提取),你就需要知道,对于任何给定的文档类型,哪些模型表现更好,哪些更差。如果你在做法律分析,你就会想知道,哪些模型最擅长完成不同类型的任务。其他领域也是如此。随着时间推移,这将成为更重要的差异化因素之一。那些拥有最佳 evals(评测)、最强 workload(工作负载)路由能力,以及其商业模式与客户财务目标直接对齐的公司,将处于非常有利的位置。

♥ 404 ↻ 63 💬 73 · 6/6 · 18:02 · x.com ↗
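The routing idea in the post above can be sketched as a rule: send each task type to the cheapest model whose eval score clears a quality bar, and fall back to the strongest model when none does. This is a minimal sketch under stated assumptions. The model names, prices, eval scores, and the `route` helper are all hypothetical and do not describe Box's or any vendor's actual router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_mtok: float  # hypothetical dollars per million tokens
    eval_scores: dict     # task type -> score in [0, 1] from your own evals


def route(task_type: str, models: list, min_score: float) -> Model:
    """Pick the cheapest model whose eval score for task_type meets min_score.

    If no model clears the bar, fall back to the highest-scoring one."""
    eligible = [m for m in models if m.eval_scores.get(task_type, 0.0) >= min_score]
    if eligible:
        return min(eligible, key=lambda m: m.cost_per_mtok)
    return max(models, key=lambda m: m.eval_scores.get(task_type, 0.0))


# Invented catalog for illustration only.
MODELS = [
    Model("frontier-large", 15.0, {"extraction": 0.97, "legal_analysis": 0.91}),
    Model("mid-tier", 3.0, {"extraction": 0.95, "legal_analysis": 0.78}),
    Model("small-open-weights", 0.4, {"extraction": 0.88, "legal_analysis": 0.55}),
]

print(route("extraction", MODELS, min_score=0.90).name)
print(route("legal_analysis", MODELS, min_score=0.90).name)
```

The design choice worth noting is that the router is only as good as the per-task eval scores it reads, which is the post's point about evals and routing being linked.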
2026 年 6 月 6 日 · 1 条 →

Coding is basically the pinnacle of what you could reasonably automate with AI, and yet we still need human engineers to oversee agents for them to be effective. The AI models are trained on an incredible amount of sophisticated code. The users are highly technical and can use the latest tools quickly. The work is “verifiable” because you can test an app. The outcomes are often removed from the quality of the code (you can have sloppy code but the app can still work). And the context for the agent is often already digitized and sitting in the codebase. That’s an incredible amount of benefits that AI coding agents get to work with. Some of those apply to knowledge work, but most don’t in areas where the work needs to be fully reviewed to be useful, or where data isn’t as abundantly digitized. This makes the job for agents in knowledge work more complicated. So if with all of that, engineers still remain in very high demand, the risks are going to be less than what’s perceived for other areas of knowledge work. Agents will let people do far more than they did before, but the people don’t go away.

编程基本上已经是你可以 reasonably(相当合理地)用 AI 自动化的顶峰了,但即便如此,我们仍然需要人类工程师来监督 agent,agent 才能真正有效。AI 模型是在数量惊人的复杂代码上训练出来的。用户本身也高度技术化,能够快速使用最新工具。这项工作是“verifiable”(可验证的),因为你可以测试一个 app。最终结果往往与代码质量并不完全绑定(代码可以写得很 sloppy,但 app 仍然能运行)。而且,agent 所需的上下文常常早已数字化,并且就存在于 codebase 中。这些都是 AI coding agent 能够利用的巨大优势。其中有些优势也适用于知识工作,但在大多数领域并不成立,尤其是那些必须经过完整审查才有用的工作,或者数据还没有被如此充分数字化的领域。这就让 agent 在知识工作中的任务变得更加复杂。所以,如果在具备所有这些有利条件的情况下,工程师依然处于非常高的需求之中,那么其他知识工作领域面临的风险就会比人们想象的小。agent 会让人们做到比以前多得多的事,但人并不会消失。

♥ 283 ↻ 31 💬 41 · 6/6 · 00:28 · x.com ↗
2026 年 6 月 5 日 · 1 条 →

Good thought provoking post from Anthropic. I think this paragraph points to the key element of the optimistic scenario of AI: “There has been an explosion of new ideas, initiatives, tools, and simulations, as a result of Anthropic employees working with highly capable models—far more than we have the capacity to pursue. The rate at which organizations can spot and fix these bottlenecks may be a skill that improves over time, and it may become the most important skill for any organization.” AI lowers the barrier dramatically to allowing us to do more. As a result of that, we have far more ideas than we can pursue, and for the ones that we want to pursue we’re ultimately limited by our ability to go take on the surrounding work to execute those ideas. There’s almost no amount of AI progress that can happen where that goes away. AI is going to let us build much more software, launch more marketing campaigns, research more drugs, and so on. All of this work, even when augmented by agents, still ultimately requires people to manage.

Anthropic 这篇帖子很发人深省。我认为,这一段指出了 AI 乐观情景中的关键要素:“随着 Anthropic 员工与能力极强的 model(模型)协作,新的想法、倡议、工具和模拟大量涌现——多到远远超出我们能够推进的容量。组织识别并修复这些瓶颈的速度,可能会成为一种会随着时间推移而提升的能力,并且可能成为任何组织最重要的能力。”AI 大幅降低了让我们做更多事情的门槛。结果就是,我们拥有的想法远多于能够真正推进的数量;而对于那些我们确实想推进的想法,我们最终仍然受限于自己是否有能力去承担并完成其周边所需的工作,从而把这些想法执行出来。几乎无论 AI 取得多大进展,这一点都不会消失。AI 将使我们能够构建更多软件、发起更多营销活动、研究更多药物,等等。所有这些工作,即使有 agent(智能体)增强,最终仍然需要由人来管理。

♥ 195 ↻ 18 💬 33 · 6/5 · 02:48 · x.com ↗
2026 年 6 月 4 日 · 2 条 →

The jobs data coming out continues to suggest the opposite of what a lot of people had thought would happen. Just take engineering, as the prime example of the area with greatest AI impact (and perceived risk). Most companies now have far more software projects than ever before because of AI, and effectively only engineers are going to be the ones doing that work. You can get by for a while by being non-technical building software, but eventually someone has to understand what the thing is that got built, has to maintain it, has to fix security issues that come up, upgrade the systems beneath it, and so on. That’s all jobs. Now apply that to a number of other job functions. AI is going to cause companies to hire more in sales because agents can let them process more leads and do more customer research. AI will cause an explosion of new marketing roles because of how much more efficient it is to launch campaigns and target. The list goes on. AI is going to have the opposite effect that lots of people thought on jobs.

持续公布的就业数据表明,情况恰恰与很多人原先以为会发生的相反。就拿 engineering 来说,它是 AI 影响最大(也是大家感知中风险最高)的领域中的典型例子。现在,大多数公司因为 AI 而拥有比以往多得多的软件项目,而实际上,真正会去完成这些工作的只会是 engineers。你也许可以在一段时间里以非技术身份去构建软件,但最终总得有人理解被构建出来的东西到底是什么、负责维护它、修复出现的安全问题、升级其底层系统,等等。这些全都是工作岗位。再把这个逻辑应用到许多其他职能上。AI 会让公司在 sales 上招聘更多人,因为 agents(智能体)能让他们处理更多 leads(销售线索),并做更多客户研究。AI 还会催生大量新的 marketing 岗位,因为它让发起 campaign(营销活动)和做 target(定向投放)变得高效得多。这样的例子还可以一直列下去。AI 对就业的影响,将会与很多人原先所想的完全相反。

♥ 358 ↻ 49 💬 49 · 6/4 · 00:49 · x.com ↗

Even with employer caps, the spend on AI tokens dramatically exceeds any other historical spend on software. Typically, companies maybe would spend on the order of $10-50 for a software license per month per employee, but now will pay hundreds or thousands on tokens to augment their productivity. This shows you how big the TAM for intelligence is in the enterprise. The markets for AI are going to dramatically expand the size of the traditional software markets over time.

即便存在 employer caps(雇主支出上限),企业在 AI tokens(令牌)上的花费,仍然显著超过历史上任何其他软件支出。通常来说,公司过去给每位员工支付的软件 license(许可证)月费,大概也就是 10 到 50 美元这个量级;但现在,它们会为 tokens 花费数百甚至数千美元,以提升员工生产力。这说明,在企业市场中,intelligence(智能)的 TAM(Total Addressable Market,潜在市场总规模)有多么庞大。随着时间推移,AI 市场将显著扩大传统软件市场的规模。

♥ 219 ↻ 21 💬 57 · 6/3 · 21:10 · x.com ↗
2026 年 6 月 3 日 · 1 条 →

As token budgets take on a larger part of operating expenses over time, model routing is the inevitable conclusion. This is also one of the biggest areas of differentiation for the applied AI layer over time. By understanding the different work patterns in your domain, and having strong evals for that domain, you’ll be able to cost/performance optimize effectively. We’re still likely at the point where most use-cases will need frontier performance for the foreseeable future; but soon you will be able to peel off individual use-cases and send them to lower cost models once the quality is sufficient for the task. It will likely not be possible for enterprises to each figure this out themselves at scale, so the products that can intelligently route these workflows to the right tier of model will be in a strong position to aggregate more demand.

随着 token 预算在运营支出中长期占据越来越大的比例,model routing(模型路由)将成为不可避免的结论。这也将是 applied AI layer(应用型 AI 层)长期最重要的差异化领域之一。通过理解你所在领域中的不同工作模式,并为该领域建立强有力的 evals(评估),你就能够有效地在成本与性能之间做优化。我们目前很可能仍处在这样一个阶段:在可预见的未来,大多数 use-case(用例)仍然需要 frontier performance(前沿性能);但很快,一旦质量足以胜任任务,你就可以把单独的 use-case 剥离出来,发送给成本更低的模型。各家 enterprise(企业)如果想各自独立地、大规模地把这件事摸索清楚,可能并不现实,因此,能够智能地将这些工作流路由到正确模型层级的产品,将处于能够聚合更多需求的有利位置。

♥ 299 ↻ 25 💬 45 · 6/3 · 00:52 · x.com ↗
2026 年 6 月 1 日 · 1 条 →

This is effectively the #1 problem for AI agents in the enterprise. As we go from agentic coding (where a large amount of context is in the code base, and users are technical enough to get the rest to the agent easily) to a world of knowledge work agents, the context problem becomes much more acute. We see this every day with customers at Box. For existing digital knowledge, it’s often fragmented across legacy systems or environments that don’t play nice with agents, and have access controls that don’t map to the real work that needs to be done, which become a huge hurdle for getting agents the context they need. This has to all get moved to modern, secure cloud environments. But also, companies often haven’t captured and digitized some of the critical context that agents need to work with. Decisions, processes, and workflows often live in people’s heads and tribal knowledge that need to get turned into unstructured data for agents. This is actually one of the biggest points of leverage for applied AI companies, because they can work to specialize in getting agents exactly the information and domain expertise they need. But it’s also one of the reasons why FDEs and new system integrator plays will also work so well right now. The companies that figure this out will be able to get the most out of AI going forward.

这实际上是企业中 AI agents 面临的头号问题。随着我们从 agentic coding 走向知识工作 agents 的世界,context(上下文)问题会变得更加尖锐——在 agentic coding 里,大量 context 都在代码库中,而用户也足够懂技术,能比较容易地把其余信息提供给 agent。我们在 Box 与客户的日常接触中每天都能看到这一点。对于现有的数字化知识,它往往分散在各类 legacy systems(遗留系统)中,或者存在于那些无法很好配合 agents 的环境里;同时,这些环境的访问控制也常常无法映射到实际工作真正需要完成的方式,这就成为让 agents 获得所需 context 的巨大障碍。这些内容都必须迁移到现代化、安全的云环境中。但除此之外,很多公司其实还没有把 agents 开展工作所需的一些关键信息捕捉下来并完成数字化。决策、流程和工作流往往存在于人的脑子里,或者存在于 tribal knowledge(组织内的隐性经验)中,而这些都需要被转化为 agents 可用的 unstructured data(非结构化数据)。这其实是 applied AI companies 的一个最大杠杆点之一,因为它们可以专门去解决如何把 agents 真正需要的信息和领域专长准确地提供给它们。但这也是为什么 FDEs 以及新型 system integrator 模式在当下会如此有效的原因之一。那些把这个问题解决好的公司,未来将能够从 AI 中获得最大的收益。

♥ 479 ↻ 54 💬 65 · 6/1 · 00:44 · x.com ↗
2026 年 5 月 31 日 · 1 条 →

Again, maybe counterintuitive, but in the majority of conversations I have with CIOs, CTOs, and CEOs in large enterprises, they are either growing due to AI (in new job functions like FDEs, engineering, etc.) or at a minimum reinvesting efficiency savings back into the business in new areas (sales, marketing, etc.). David Solomon, CEO of Goldman Sachs, articulated this perfectly in a NYTimes OpEd last week. The AI boom is both creating all new jobs in the build out of AI systems and the implementation across sectors, but also freeing up dollars to invest in areas that have been underfunded or have more demand now because of AI. Most businesses have been constrained by how much software they can produce at a given cost, how many sales reps they can hire, how many marketing campaigns they can run, how they can do outbound customer success motions with enough tailoring, how they can find more risk in their business and prevent it, and 100s of other things. When AI makes it possible to do more of this, investment goes back into the business. The companies that better serve their customers win over the long run, and those that just try and find savings end up doing worse.

这点再次说明,也许有些反直觉,但在我与大型企业 CIO、CTO 和 CEO 的大多数交流中,他们要么正因为 AI 而扩张(新增了像 FDE、engineering 等新的岗位职能),要么至少会把效率提升节省下来的资金重新投入业务中的新领域(如 sales、marketing 等)。Goldman Sachs 的 CEO David Solomon 上周在一篇 NYTimes OpEd 中对此做了非常到位的阐述。AI boom 既在 AI 系统建设以及跨行业落地实施过程中创造出全新的工作岗位,也释放出更多资金,用于投资那些过去资金不足、或因 AI 而出现更多需求的领域。大多数企业一直受限于:在既定成本下他们能开发多少 software(软件)、能雇用多少 sales reps、能开展多少 marketing campaigns、能以足够个性化的方式推进 outbound customer success、能在业务中发现并防范多少更多风险,以及其他数百项类似事项。当 AI 让企业有可能在这些方面做得更多时,投资就会重新回流到业务中。从长期来看,那些更好服务客户的公司会胜出,而那些只想着节省成本的公司最终往往会表现更差。

♥ 154 ↻ 19 💬 25 · 5/31 · 03:17 · x.com ↗
2026 年 5 月 30 日 · 1 条 →

The app layer couldn’t get a better advertisement than a company spending $500M to build their own version of it. Obviously lots of nuance here that can’t be captured in the headline, but this should make you very bullish on software.

对 app layer(应用层)来说,没有比一家公司花 5 亿美元去构建它们自己的版本,更好的广告了。显然,这里面有很多无法在标题里体现的细微差别,但这应该会让你对 software(软件)非常看多。

♥ 348 ↻ 18 💬 32 · 5/30 · 00:54 · x.com ↗
2026 年 5 月 28 日 · 1 条 →

A meaningful portion of enterprises I talk to outside of Silicon Valley generally are looking to hire while also adopting agents. There’s a huge wave of technical and engineering talent needed inside organizations, building software or acting as FDEs for agents. And as AI drives efficiency in areas like the customer lifecycle, companies are leaning in even more heavily to client-facing jobs. In a world where AI did everything for you with no human oversight needed, maybe we’d be having a different conversation. But that’s not how AI works. Even for the areas that have the most automation potential, agents are automating tasks, not whole jobs. As they automate tasks, the agents need to be steered, their work reviewed, the outputs incorporated and more. All of this is requiring people to do the work. And for the areas that have less automation potential, companies are freeing up dollars from efficiency gains elsewhere to hire in those areas now. Yes, maybe AI lets you respond to front line support tickets automatically, but the companies (instead of just dropping the profit to the bottom line) will go and invest in new areas of sales and customer success that will add more differentiation for clients. Companies don’t remain static. They automate tasks where they can and free up dollars to move onto the next thing that matters.

我在 Silicon Valley 以外接触到的相当一部分企业,普遍都在一边招聘、一边采用 agents(智能体)。各类组织内部对技术和工程人才有着巨大的需求,无论是用来构建软件,还是作为 agents 的 FDEs。随着 AI 在 customer lifecycle(客户生命周期)等领域推动效率提升,企业也更加重仓面向客户的岗位。如果存在这样一个世界:AI 能在完全不需要人工监督的情况下替你完成一切,那么我们讨论的也许会是另一回事。但 AI 的实际运作方式并不是这样。即便是在那些自动化潜力最大的领域,agents 自动化的也是任务,而不是整份工作。随着它们自动化任务,这些 agents 需要被引导、它们的工作需要被审查、它们的输出需要被整合,等等。所有这些都仍然需要人来完成。而对于那些自动化潜力较低的领域,企业则会把在其他地方通过效率提升省下来的预算,转而用于现在在这些领域招聘。没错,也许 AI 可以让你自动回复一线支持工单,但企业并不会只是把这部分利润直接沉到底线(bottom line),而是会去投资 sales(销售)和 customer success(客户成功)等新领域,为客户创造更多差异化价值。企业不会停滞不变。它们会在能自动化任务的地方推进自动化,并释放出资金,投入到下一个真正重要的事情上。

♥ 173 ↻ 15 💬 35 · 5/27 · 03:50 · x.com ↗
2026 年 5 月 21 日 · 1 条 →

Great post on FDEs. Everyone should read it if you’re interested in this job category. This is a job that is going to be around as long as AI keeps changing rapidly, which it inevitably will. People often wonder why isn’t this like just deploying other forms of technology in the past, like cloud. Because something like cloud adoption affected a fairly concentrated set of users (developers and IT), and generally didn’t require a fundamental change to the workflows of employees to get the benefits of the new service being delivered on the cloud. At best you went to one training session and you were done. With agents, the work to implement them is not only highly technical, but they directly impact the underlying workflows that people participate in. This means there’s a ton of technical work and change management that comes with it. Further, the pace of change of cloud wasn’t nearly as quick, so there was a lot more time for best practices to propagate. Now, every model change means either something new can be done that wasn’t possible before, or some piece of scaffolding is now redundant or holding you back. This is why it’s commonly easier for a vendor or partner that’s seen the implementation hundreds or thousands of times help do the work, even with internal support from the customer. So, this job isn’t going away any time soon, and will be a great path for a lot of technical talent, especially early career.

关于 FDEs 的这篇帖子很棒。如果你对这一类岗位感兴趣,人人都应该读一读。只要 AI 继续快速变化——而它几乎必然会如此——这类工作就会一直存在。人们常常会问,为什么这不像过去部署其他技术形式,比如 cloud(云)?因为像 cloud adoption(云采用)这样的事情,影响的是一组相对集中的用户(developers 和 IT),而且通常并不需要从根本上改变员工的工作流程,就能获得把新服务部署到云上的收益。很多时候,你最多参加一次培训就结束了。但对于 agents(智能体)来说,实施它们的工作不仅技术性很强,而且它们会直接影响人们参与其中的底层工作流程。这意味着随之而来的不仅有大量技术工作,还有变更管理。此外,cloud 的变化速度根本没有这么快,所以最佳实践有更多时间传播。现在,每一次 model(模型)变化都意味着:要么出现了以前做不到的新能力,要么某些 scaffolding(脚手架式支持结构)已经变得多余,甚至开始拖后腿。这就是为什么,即使客户内部也有支持,通常还是由那些已经看过数百次甚至数千次实施过程的 vendor 或 partner 来协助完成这项工作会更容易。所以,这类工作短期内不会消失,而且会成为很多技术人才——尤其是职业早期人才——的一条很好的发展路径。

♥ 556 ↻ 62 💬 37 · 5/21 · 04:19 · x.com ↗
2026 年 5 月 20 日 · 2 条 →

Token costs will become a dominant topic in enterprises going forward with AI. Just got out of a dinner with many Fortune 500 enterprise CIOs and this was the most heated topic. A mix of strategies is being employed, but basically no one feels like they have the right solution. A mix of: figuring out how to prioritize workloads to different models, giving out access to better or worse agents by user type, setting different spend caps by team, having teams justify AI by their use-case, and some just having unfettered access. Everyone is trying to figure out a semi-predictable model right now in a world where the underlying tech and cost models are constantly evolving.

随着 AI 在企业中的推进,token 成本将成为一个主导性话题。刚参加完一场与许多 Fortune 500 企业 CIO 共进的晚餐,这就是现场讨论最激烈的话题。大家正在采用各种策略,但基本上没有人觉得自己已经找到了正确解法。做法包括:研究如何把不同工作负载优先分配给不同模型;按用户类型提供能力更强或更弱的 agent(智能体)访问权限;按团队设置不同的支出上限;要求团队根据其 use-case(使用场景)为 AI 的投入作出论证;还有一些公司则是完全不设限制地开放访问。眼下,所有人都在试图摸索出一种大致可预测的模式,而底层技术和成本模型却一直在不断演变。

♥ 281 ↻ 20 💬 40 · 5/20 · 05:08 · x.com ↗

Gemini 3.5 Flash is out, and it's a major jump over Gemini 3 Flash in model capability for knowledge work. We've been evaluating it on our Box AI Complex Work Eval in early release, and the model delivers a 12 percentage point jump on complex document tasks. For testing this model, we give the Box AI Agent (using Gemini 3.5) complex problems to solve that represent common but difficult knowledge worker tasks in banking, consulting, public sector, healthcare, and other industries. These tasks can be things like drafting reports, doing due diligence, and more, given a set of relevant documents. In our tests, Gemini 3.5 Flash delivered jumps across every industry, including:
* Financial services: 81% vs 73% (+8pp)
* Public sector: 76% vs 59% (+17pp)
* Healthcare: 73% vs 51% (+22pp)
* Life Sciences: 67% vs 47% (+20pp)
Incredible to see the continued performance gains. Gemini 3.5 Flash will be available soon in Box AI Studio and through the Box API. The Box MCP Server will soon be available in the Gemini app with more details to come.

Gemini 3.5 Flash 已发布,相比 Gemini 3 Flash,它在知识型工作方面的模型能力有了重大跃升。我们在早期版本中,使用 Box AI Complex Work Eval 对它进行了评估,该模型在复杂文档任务上的表现提升了 12 个百分点。在测试这个模型时,我们让 Box AI Agent(使用 Gemini 3.5)去解决复杂问题,这些问题代表了银行、咨询、公共部门、医疗保健及其他行业中常见但困难的知识工作者任务。给定一组相关文档后,这些任务可能包括起草报告、开展 due diligence(尽职调查)等。在我们的测试中,Gemini 3.5 Flash 在各个行业都实现了提升,包括:
* Financial services:81%,对比 73%(+8pp)
* Public sector:76%,对比 59%(+17pp)
* Healthcare:73%,对比 51%(+22pp)
* Life Sciences:67%,对比 47%(+20pp)
持续的性能提升令人惊叹。Gemini 3.5 Flash 很快将在 Box AI Studio 中以及通过 Box API 提供。Box MCP Server 也将很快在 Gemini app 中可用,更多细节即将公布。

♥ 177 ↻ 17 💬 27 · 5/19 · 18:29 · x.com ↗
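The per-industry figures in the post above are percentage-point gains, which differ from relative improvement. A small sketch that recomputes both from the scores quoted in the post; the `deltas` helper is a hypothetical name, and the relative figures are derived here rather than stated in the post:

```python
# (new score %, old score %) as quoted in the post.
RESULTS = {
    "Financial services": (81, 73),
    "Public sector": (76, 59),
    "Healthcare": (73, 51),
    "Life Sciences": (67, 47),
}


def deltas(new: int, old: int) -> tuple:
    """Return (percentage-point gain, relative gain in percent)."""
    return new - old, (new - old) / old * 100


for industry, (new, old) in RESULTS.items():
    pp, rel = deltas(new, old)
    print(f"{industry}: +{pp}pp, {rel:.1f}% relative")
```

The percentage-point column matches the post's own figures; the relative column is larger for low starting scores, which is why the two should not be quoted interchangeably.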
2026 年 5 月 19 日 · 1 条 →

This is true of all agents, not just coding agents. Probably the biggest challenge that most companies run into in their agent strategy is getting agents the right constrained context to work with for a task. Too much information or conflicting sources, and the agent can easily draw from the data and produce the wrong result. Conflicting sources of truth for documents, data sources that haven’t been kept up to date, knowledge management systems that rely on tribal knowledge to navigate, and so on. On the other end, of course, too little information and the upside of agents is highly limited in the first place. Thus, a lot of challenges with AI strategies are actually data strategy challenges in disguise. This is why there’s such a significant premium on getting structured and unstructured data environments set up properly so agents can work with information effectively. Critical for any large enterprise adopting agents, and also a clear benefit in some cases to startups that can be designed this way from scratch.

这一点适用于所有 agent,不只是 coding agent。大多数公司在其 agent 战略中遇到的最大挑战,很可能是如何为 agent 提供适合完成任务、且受到恰当约束的上下文(context)。信息太多,或者信息来源彼此冲突,agent 就很容易基于这些数据得出错误结果。比如文档存在相互冲突的事实来源(sources of truth),数据源没有持续更新,知识管理系统依赖 tribal knowledge(仅靠组织内部心照不宣的经验)才能导航,等等。另一方面,当然,如果信息太少,那么 agent 本身能够带来的上行空间也会非常有限。因此,AI 战略中的许多挑战,其实是披着外衣的数据战略挑战。这也就是为什么,正确搭建 structured 和 unstructured data 环境会有如此显著的溢价,因为这样 agent 才能有效地处理信息。这对于任何采用 agent 的大型企业都至关重要;而对于能够从零开始按这种方式设计的 startup,在某些情况下也显然是一种优势。

♥ 132 ↻ 14 💬 25 · 5/19 · 03:17 · x.com ↗
2026 年 5 月 18 日 · 2 条 →

Right now there’s a temporary mismatch between the jobs that used to be sought after in some fields and the new jobs that are becoming in demand in those fields. For instance, if you studied CS, for years the general direction of travel was often to join a tech company and build customer-facing software in some form. A significant portion of the CS pipeline from college to hire was built for this. When you realize that AI is going to make coding abundant, you realize everyone will need technical talent to implement agentic systems. This means the types of roles engineers should be thinking about radically expand. I was talking to a Fortune 500 pharma CEO a week ago who commented on how much more technical talent they need right now. The job may be different from what it was 5 years ago when thinking about tech, but the demand for the skills is still there. And this is what I’m hearing from every CIO and CEO across nearly every industry right now. We definitely need colleges to wake up to this; but we equally need companies to think about how they craft pipelines into these jobs.

眼下,在某些领域里,过去热门的岗位与这些领域中正在变得抢手的新岗位之间,存在一种暂时性的错配。比如,如果你学的是 CS,很多年来,大方向往往是加入一家 tech company,去构建某种面向客户的软件。大学到招聘之间,相当大一部分 CS 人才输送链路都是围绕这个目标建立的。当你意识到 AI 会让 coding 变得极其充裕时,你也会意识到:每个组织都将需要 technical talent 来实施 agentic systems(智能体系统)。这意味着,工程师应该考虑的岗位类型会急剧扩展。一周前我和一位 Fortune 500 pharma CEO 聊天时,他提到他们现在需要多得多的 technical talent。这个岗位也许已经不同于 5 年前人们谈论 tech 工作时的样子,但对这些技能的需求依然存在。而且,这也是我现在从几乎每个行业的 CIO 和 CEO 那里反复听到的。我们当然需要 colleges 尽快意识到这一点;但同样也需要 companies 去思考,如何为这些岗位打造人才输送 pipeline。

♥ 275↻ 24💬 375/18 · 03:45x.com ↗

One of the best things students and colleges can do is not bail on learning and teaching the fundamentals of any given domain. AI will trick you into thinking you don’t need to go deep in a particular area, but that’s wrong. The expert with AI is always going to be far more capable than the novice. Those that can steer AI agents properly, figure out how to evaluate their work, fix their mistakes, and incorporate their work into a workflow will always be the most potent users of these tools. The experienced software developer that’s built and scaled complex systems using agents will outrun someone just vibe coding. The designer that uses AI will build far better products and campaigns than anyone else. The banker or analyst that understands financial models will be able to pull off far more with agents. Despite some of the rhetoric in the valley that this is less important now, that couldn’t be further from the case. Don’t give up on going deep in your craft.

学生和 colleges 最该做的事情之一,就是不要放弃学习和教授任何特定领域的 fundamentals(基本功)。AI 会诱使你以为自己不需要在某个具体方向上钻得很深,但这是错的。掌握专业能力又会使用 AI 的 expert,能力始终会远远强于 novice(新手)。那些能够正确引导 AI agents、知道如何评估它们的工作、修正它们的错误,并把它们的产出纳入工作流的人,永远都会是这些工具最有威力的使用者。一个使用 agents 构建并扩展过复杂系统的资深 software developer,会把那些只会 vibe coding 的人远远甩在后面。会使用 AI 的 designer,会做出比其他人好得多的产品和 campaign。理解 financial models 的 banker 或 analyst,将能够借助 agents 完成多得多的事情。尽管 valley 里有一些论调好像在说,深耕专业现在没那么重要了,但事实恰恰完全相反。不要放弃在你的手艺上深耕。

♥ 425↻ 53💬 415/17 · 16:38x.com ↗
2026 年 5 月 16 日 · 2 条 →

I’m fully forward deployed engineering pilled specifically because AI simply is not the same as software. In software, you deliver a stable piece of technology to a customer and they adopt it and that’s that (extreme oversimplification). In AI, you’re delivering something that is constantly evolving, both due to the nature of the new capabilities and best practices that emerge and because the underlying models change so much that they can meaningfully change the workflow as a result of their upgrades. For this reason it’s far more logical that one vendor can share best practices across thousands of companies more efficiently than every single company can learn and manage these best practices themselves. Further, the learnings from those customers should go right back into the core product as a result. As we go from chat systems that anyone can relatively easily adopt to agentic systems that require more meaningful efforts to manage and update, the FDE model (or equivalent) essentially becomes a core competency for anyone deploying AI at scale.

我之所以完全相信 forward deployed engineering,特别是因为 AI 确实不同于传统 software。在 software 领域,你把一项相对稳定的技术交付给客户,客户采用之后,事情基本也就到此为止了(这是极度简化的说法)。而在 AI 领域,你交付的是一种持续演进的东西:一方面,新能力和最佳实践(best practices)会不断出现;另一方面,底层模型变化非常大,以至于它们的升级会实质性地改变工作流(workflow)。正因如此,由一个 vendor 在成千上万家公司之间共享最佳实践,显然比让每一家公司都自己去学习和管理这些最佳实践要高效得多。进一步说,从这些客户那里得到的经验,也应该因此直接回流到核心产品中。随着我们从任何人都相对容易采用的 chat systems,走向需要投入更多实质性精力去管理和更新的 agentic systems,FDE model(或同类模式)本质上会成为任何大规模部署 AI 的组织的一项核心能力。

♥ 220↻ 18💬 375/16 · 04:13x.com ↗

Headless software is the future

Headless software 是未来

♥ 251↻ 17💬 345/15 · 18:40x.com ↗
2026 年 5 月 15 日 · 2 条 →

We’re in a period where everything feels like it’s getting jumbled up across roles because AI lets you explore the adjacencies of other functions more easily. We all collectively have to figure out the new definition of what these jobs look like in a world of agents, and certainly many will look different from what they did before. But there are some immutable laws that will eventually re-emerge over time and become clear again. As an example, when you’re scaling, product managers should be spending an insane amount of time with customers and getting feedback on the product and thinking through what to build next, how to design it so it’s usable, and so on. Engineers should be understanding the business objectives, and building systems that scale and are secure, even as feature velocity increases by 10X. Now both can do a bit more of the other’s role, and this can temporarily get conflated as doing the whole thing, but eventually the work adds up to be enough that it makes sense to specialize again. Similarly, in GTM, the product marketer can certainly generate a working design and video for a launch, but the specialist is always going to (or should) have an eye for quality that delivers a better outcome. My bet is that AI enhances specialization even further, even if a few roles collapse into each other, and the future toolchain and craft of the specialist will be much higher leverage and output far greater than anyone else as a hobbyist in that function.

我们正处在这样一个时期:由于 AI 让你更容易探索其他职能的相邻领域,一切都仿佛在不同角色之间被打乱、混杂在一起。我们所有人都必须共同摸索,在一个由 agent(智能体)构成的世界里,这些工作的新定义会是什么样子;当然,其中许多岗位看起来都会和过去不同。但随着时间推移,总有一些不可改变的规律会重新浮现,并再次变得清晰。举个例子,当你处在规模化扩张阶段时,product managers 应该花大量时间与客户交流、获取产品反馈,并思考下一步该构建什么、该如何设计才能让它易于使用,等等。Engineers 则应该理解业务目标,并在功能迭代速度提升 10X 的同时,构建可扩展且安全的系统。现在,双方都能多做一点对方的工作,这可能会暂时被混同为“把整件事都做了”,但最终工作量会累积到足以让再次专业化变得合理。同样地,在 GTM 中,product marketer 当然可以为一次发布生成一个能用的设计和视频,但 specialist(专家)总会——或者说应该会——对质量有更敏锐的把握,从而带来更好的结果。我的判断是,AI 会进一步强化专业化,即便有少数角色会相互合并;而未来 specialist 的工具链和技艺,其杠杆效应会高得多,产出也会远远超过任何只是把该职能当作爱好的业余者。

♥ 182↻ 18💬 405/15 · 04:28x.com ↗

He just spent a year building scaffolding for his agent harness. Now release a new model update that makes all of it obsolete.

他刚花了一年时间为自己的 agent harness 搭建脚手架。结果一个新的 model 更新发布,就让这一切全都过时了。

♥ 592↻ 39💬 375/15 · 04:00x.com ↗
2026 年 5 月 11 日 · 1 条 →

As advanced agents move from coding to the rest of knowledge work, it takes a real amount of work and know-how to get right. You need to ensure agents have the right context and data to work with, wire up systems to agents in a safe and secure way, ensure that the agents are producing quality output, design the end-state workflow where and how humans will be in the loop, maintain the agents when there are model and system upgrades, and more. This isn’t a side project or something you can just do on nights and weekends. You need to design and develop robust agents that will be used in mission critical workflows. It’s a highly technical job, very much akin to a forward deployed engineer for internal functions. This is why, at Box, we’re starting to hire for AI automation engineering roles. This is a technical role that will partner with the business directly and help augment how they work to drive even more output, and deliver better experiences for employees and ultimately customers. This is just one example of the kind of role that AI will start to open up in the future. I expect most companies will have many flavors of this going forward.

随着先进的 agent 从 coding 扩展到其余各类知识型工作,要把这件事真正做好,需要投入相当多的工作和专业 know-how。你需要确保 agent 拥有合适的 context(上下文)和 data(数据)可供使用,以安全可靠的方式把各类系统接入 agent,确保 agent 产出高质量的 output(输出),设计最终态的 workflow(工作流)以及人类将在哪些环节、以何种方式参与 in the loop,在 model(模型)和系统升级时持续维护这些 agent,等等。这不是一个 side project,也不是你靠晚上和周末随便就能做好的事。你需要设计并开发稳健的 agent,用于 mission critical 的 workflow 中。这是一项高度技术性的工作,非常类似于面向内部职能的 forward deployed engineer。这也是为什么在 Box,我们开始招聘 AI automation engineering 相关岗位。这是一个技术岗位,将直接与业务团队合作,帮助增强他们的工作方式,以推动更高产出,并为员工以及最终的客户带来更好的体验。这只是 AI 在未来将会催生的这类岗位中的一个例子。我预计,今后大多数公司都会拥有这种岗位的多种形态。

♥ 298↻ 28💬 335/11 · 03:06x.com ↗
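The "ensure that the agents are producing quality output" step above is often approached with a small eval suite that is re-run whenever the model or system changes. A minimal sketch, assuming the agent can be called as a plain function from task text to output text; `EvalCase`, `run_evals`, and `stub_agent` are hypothetical names, and the stub only stands in for a real agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    task: str                      # the instruction handed to the agent
    passes: Callable[[str], bool]  # domain-specific check on the agent's output


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and return the pass rate (0.0 to 1.0)."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.passes(agent(case.task)))
    return passed / len(cases)


def stub_agent(task: str) -> str:
    """Stand-in for a real agent call; returns canned text for one known task."""
    canned = {"Summarize contract A": "Term: 12 months. Renewal: auto."}
    return canned.get(task, "")
```

Tracking the pass rate before and after a model upgrade gives an early signal of whether a workflow has regressed.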
2026 年 5 月 10 日 · 1 条 →

For everything we’ve seen about agents so far, it’s clear that they will make it far easier for people to get into previously extremely complicated fields. That will most certainly mean far more people will build software, explore creative work, research spaces they couldn’t do before, and so on. Yet, equally, we’ve seen that people with experience in every one of those fields have a huge edge with the right judgment and historical context to leverage these tools in ways that exceed the output of the novices (if they choose to). They know when the agents are making catastrophic mistakes, can give the agents the right context to do the job better than they otherwise would have, and so on. The combination of these two facts essentially means that we will continue to get the same lift as we’ve seen in any other technological revolution. More democratization, but similarly greater output from the experts. This then makes the experts continue to be in higher demand because over time our expectation for what we can get out of any field will just go up. This is going to be true in essentially every important field. You’ll trust a lawyer using an agent for legal advice over someone who’s never had to experience how well a contract holds up. You’ll trust an engineer developing and running software over someone who’s never seen a production system. You’ll rely on the important instincts of a designer using agents over the average prompter. The quality and volume of output we expect from these functions will certainly go up meaningfully, but the person with experience will always have a leg up, which is why the jobs don’t go away.

就我们目前看到的关于 agent 的一切而言,很明显,它们将让人们更容易进入那些过去极其复杂的领域。这几乎肯定意味着,会有更多人构建软件、探索创意工作、研究他们以前无法涉足的领域,等等。但与此同时,我们也看到,在这些每一个领域里,有经验的人如果具备恰当的判断力和历史背景,就会拥有巨大的优势,能够以超过新手产出的方式来利用这些工具(如果他们愿意的话)。他们知道 agent 什么时候会犯下灾难性的错误,也能为 agent 提供正确的上下文(context),让它们比原本做得更好,等等。这两个事实结合在一起,基本上意味着,我们会继续获得和以往任何技术革命中相同的提升:更强的 democratization(普及化),但专家的产出也会同样更高。因此,专家会继续处于更高需求之中,因为随着时间推移,我们对任何领域能够产出什么的预期只会不断提高。这一点几乎会在每一个重要领域都成立。相比一个从未真正经历过合同到底有多经得起考验的人,你会更信任一位使用 agent 提供法律建议的律师。相比一个从未见过 production system(生产系统)的人,你会更信任一位开发并运行软件的工程师。相比普通的 prompter(提示词使用者),你会更依赖一位使用 agents 的设计师所具备的重要直觉。我们对这些职能的产出质量和产出规模的预期,当然都会显著提高,但有经验的人始终都会更占优势,这也是为什么这些工作不会消失。

♥ 321↻ 41💬 535/10 · 00:13x.com ↗
2026 年 5 月 9 日 · 1 条 →

A common trend in larger enterprises is token budgeting emerging as a major topic. As agents can do more and more long running tasks, and thus take vastly more compute, allocation of tokens across teams becomes a very real thing in the enterprise. Companies spend a meaningful amount of time deciding how much to spend on talent, marketing campaigns, events, laptop setups, and even the cost of lunches. Tokens will be no different. Tokens will similarly need to be excruciatingly well-managed because you’ll need to ensure you don’t blow up your budget, and you’ll need to ensure that the tokens are flowing to the highest-value and most useful parts of work. You don’t want to find out you burned your monthly budget on something relatively low value and then be blocked on the much higher value task later. Doing this at large company scale is extremely hard as you have layers of abstraction on data and little visibility into the digital work being done by agents in any central way. This is going to mean that agentic spend will increasingly expand beyond the confines of the IT budget, and end up in organizational budgets like other expenses. Ultimately team and org leaders will have to be given budgets for this, but even they don’t have adequate visibility and controls in most cases. We’ll need all new software just to solve this problem, and it’s probably an opportunity for startups in its own right. Going to be an all new era of enterprise resource allocation, especially while we’re compute constrained.

在较大型企业中,一个正在显现的普遍趋势是:token budgeting(token 预算)正成为一个重要议题。随着 agent 能执行越来越多的长时任务,因此也会消耗多得多的 compute(算力),企业内部按团队分配 token 会变成一件非常现实的事。公司本来就会花相当多时间决定在人才、营销活动、线下活动、笔记本电脑配置,甚至午餐成本上各花多少钱;token 也不会例外。token 同样需要被极其严密地管理,因为你必须确保不会把预算烧穿,也必须确保 token 流向那些价值最高、最有用的工作环节。你肯定不想发现自己把整个月预算烧在了某个相对低价值的事情上,结果之后面对高得多价值的任务时反而被卡住。在大型公司规模下做到这一点极其困难,因为你会面对多层数据抽象,而且几乎无法以某种中心化方式看清 agent 所完成的数字化工作。这意味着,agentic spend(agent 相关支出)将越来越多地突破 IT 预算的边界,像其他费用一样进入组织层面的预算中。最终,团队和组织领导者都必须为此拿到预算,但即便是他们,在大多数情况下也并没有足够的可见性和控制能力。我们将需要一整套全新的软件,仅仅为了处理这个问题,而这本身很可能就是创业公司的一个机会。这将开启一个全新的企业资源分配时代,尤其是在我们仍然受制于算力约束的时候。

♥ 328↻ 39💬 665/9 · 00:06x.com ↗
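The budgeting concern above (not burning the month's tokens on low-value work and then being blocked on a high-value task) can be sketched as a per-team ledger that holds back a reserve for high-value work. Everything here is an illustrative assumption: the class names, the 20 percent reserve, and the idea that a caller labels work as high value.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a charge would exceed what the team may currently spend."""


@dataclass
class TeamBudget:
    monthly_limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        return self.monthly_limit - self.used


@dataclass
class TokenLedger:
    reserve_percent: int = 20  # share of each budget held back for high-value work
    teams: dict[str, TeamBudget] = field(default_factory=dict)

    def add_team(self, name: str, monthly_limit: int) -> None:
        self.teams[name] = TeamBudget(monthly_limit)

    def charge(self, team: str, tokens: int, high_value: bool = False) -> int:
        """Record token spend and return what is left; low-value work cannot touch the reserve."""
        budget = self.teams[team]
        reserve = budget.monthly_limit * self.reserve_percent // 100
        available = budget.remaining if high_value else budget.remaining - reserve
        if tokens > available:
            raise BudgetExceeded(f"{team}: requested {tokens}, available {available}")
        budget.used += tokens
        return budget.remaining
```

A hard reserve is only one possible policy; priority queues or approval workflows for large jobs would serve the same goal.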
2026 年 5 月 8 日 · 1 条 →

When AI makes one thing easy to do, it’s always good to assume that will equally be the case for everyone else. If it’s the case for everyone else, then that means competitive forces will ensure that resources move to new or other areas that create differentiation. If AI makes building software easier, then there will be a relative increase in resources going into sales, marketing, and customer success, because standing out or going deeper with customers becomes even more important. This will also apply to lots of other areas of work. If you automate getting financial advice and insights, then the differentiation is in client engagement. And on and on. Just ask yourself: if everyone else does exactly what I do with this technology, how will I stand out from everyone else? That’s what happens next.

当 AI 让某件事变得更容易做时,最好总是假设:对其他所有人来说,同样也会如此。如果对所有人都是这样,那就意味着竞争力量会确保资源流向新的或其他能够创造差异化的领域。如果 AI 让构建 software(软件)变得更容易,那么投入到 sales、marketing 和 customer success 的资源就会相对增加,因为在那种情况下,脱颖而出或与客户建立更深层次的联系会变得更加重要。这一点也适用于许多其他工作领域。如果你把获取 financial advice 和 insights 自动化了,那么差异化所在的就是 client engagement。而且这种情况会不断重复。你只需要问自己:如果其他所有人都用这项技术做着和我完全一样的事,我要怎样才能从所有人中脱颖而出?接下来发生的就是这个。

♥ 167↻ 8💬 305/8 · 01:50x.com ↗
2026 年 5 月 6 日 · 1 条 →

Both Anthropic and OpenAI have new initiatives to help enterprises deploy AI agents within their organizations. This is a trend that’s early but going to get very big fast. As agents enter knowledge work beyond coding, there is very real work to upgrade IT systems, get agents the context they need, modernize the workflows to work with agents, figure out the human-agent relationship in the workflow, drive adoption and do change management, and much more. While AI models have an incredible amount of capability packed into them, there’s no shortcut to getting that intelligence applied to a business process in a stable way. This is creating tons of opportunities across the market for new jobs and firms, and the labs are equally recognizing the criticality here.

Anthropic 和 OpenAI 都推出了新计划,帮助企业在其组织内部部署 AI agents(智能体)。这是一个尚处早期、但很快会变得非常庞大的趋势。随着 agents 进入编码之外的知识工作领域,企业确实需要完成大量现实工作:升级 IT 系统,为 agents 提供它们所需的上下文,改造工作流以便与 agents 协同运作,厘清工作流中人与 agent 的关系,推动采用并进行变革管理,等等。虽然 AI models(模型)本身封装了惊人的能力,但要把这种智能以稳定的方式真正应用到业务流程中,并不存在什么捷径。这正在为整个市场创造大量新的岗位和公司机会,而这些 labs(实验室)也同样在认识到这里的关键性。

♥ 836↻ 110💬 1245/4 · 16:54x.com ↗
2026 年 5 月 4 日 · 2 条 →

Whether it’s existing consulting firms, new ones that emerge, FDEs from agent vendors, or new internal agent engineering roles, the amount of work that is going to be created to implement agents in enterprises will exceed anything we imagine today. The complexity of implementing agents in any existing organization is very real. When I talk to large enterprises, as you move from a chat paradigm to agents that participate in meaningful workflows, there are a number of things they need to do. First, you have to get agents to be able to talk to your data securely across your systems. In many cases, enterprises have decades of legacy infrastructure that contains the valuable context for AI agents. That’s going to take a ton of work to go modernize and move to systems that work well with agents. Then, you need to ensure that you’ve implemented agents with the right access controls and entitlements, the right scopes to be safely used, and have ways of monitoring, logging, and securing the work that they do. Next, you need to actually document the processes in the organization in a way that agents can utilize for doing the work. You also need to figure out what the new workflow looks like when agents and people are working together on a process, and who steps in where. Just replicating the old workflow will mute the gains. Oh and you likely need to create evals for your top new end-state processes. Finally, you have to keep up with a rapidly changing set of best practices and architectural shifts happening in the agent space. While it’s fun for people to change their personal productivity tools on a dime, it’s 100X harder to do this in a business process. The speed of change is a blessing and a curse right now for anyone trying to keep a stable system design. All of this means that individuals and companies that develop expertise on the above set of components (and more) are going to be needed to help organizations actually implement agents at scale. 
This is also the rationale for vertical AI agents right now that can go in deep on a business domain and help bring automation to it. This is a huge opportunity right now whether you’re doing this internally or as an external business provider.

无论是现有的咨询公司、新出现的公司、来自 agent vendor 的 FDE,还是新的内部 agent engineering 岗位,为了在企业中落地 agent 而将被创造出来的工作量,都会超过我们今天能想象的任何程度。在任何现有组织中实施 agent 的复杂性都非常真实。当我与大型企业交流时,我发现一旦你从 chat 范式走向能够参与有意义工作流的 agent,他们就需要做很多事情。首先,你必须让 agent 能够在你的各个系统之间安全地访问并对话你的数据。在很多情况下,企业拥有数十年的 legacy infrastructure,其中包含了 AI agent 所需的宝贵上下文。要把这些基础设施现代化,并迁移到能与 agent 良好协作的系统中,将需要海量工作。接着,你需要确保已经为 agent 实施了正确的 access controls 和 entitlements,设定了可被安全使用的合适 scopes,并且具备监控、记录日志以及保障其工作安全的方式。然后,你还需要以 agent 能够利用的方式,真正把组织中的流程文档化,好让它们能据此执行工作。你还需要弄清楚,当 agent 和人一起参与某个流程时,新的 workflow 应该是什么样,谁在什么环节介入。仅仅复制旧的 workflow,会削弱收益。哦,而且你很可能还需要为你最重要的新终态流程创建 evals。最后,你必须跟上 agent 领域中快速变化的一整套 best practices 和架构演进。人们随手就能更换自己的个人生产力工具,这当然很有趣,但要在业务流程里这么做,难度要高 100 倍。对于任何试图维持稳定系统设计的人来说,当下这种变化速度既是祝福,也是诅咒。所有这些都意味着,那些在上述这些组成部分(以及更多方面)发展出专长的个人和公司,将会被需要来帮助组织真正大规模实施 agent。这也是为什么当下 vertical AI agents 有其存在逻辑:它们可以深入某个业务领域,并帮助将自动化带入其中。无论你是在企业内部做这件事,还是作为外部业务服务提供商来做,这都是一个巨大的机会。

♥ 1.3K↻ 162💬 1025/3 · 21:53x.com ↗
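The access-control step described above (right scopes, plus monitoring and logging of what agents do) might look roughly like the wrapper below, which checks a scope before every tool call and records each attempt. This is a sketch under assumptions: the scope strings, `AgentIdentity`, `AuditLog`, and `run_tool` are hypothetical names and do not correspond to any specific vendor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


class ScopeError(PermissionError):
    """Raised when an agent calls a tool it has not been granted."""


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    scopes: frozenset[str]  # e.g. {"contracts:read"}; scope names are illustrative


@dataclass
class AuditLog:
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, agent: str, scope: str, allowed: bool) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "scope": scope,
            "allowed": allowed,
        })


def run_tool(agent: AgentIdentity, required_scope: str,
             tool: Callable[..., Any], audit: AuditLog, *args: Any) -> Any:
    """Log every attempt, then run the tool only if the agent holds the scope."""
    allowed = required_scope in agent.scopes
    audit.record(agent.name, required_scope, allowed)
    if not allowed:
        raise ScopeError(f"{agent.name} lacks scope {required_scope}")
    return tool(*args)
```

Logging denied attempts as well as allowed ones is a deliberate choice: refused calls are often the most useful signal when reviewing what an agent tried to do.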

In general, we should treat AI like a utility, not like a being. The more we confuse what AI is the more we will make ourselves go crazy with analogies that will never fully hold true.

总的来说,我们应该把 AI 当作一种 utility,而不是当作一种存在。我们越是混淆 AI 到底是什么,就越会让自己陷入各种永远不可能完全成立的类比之中,最终把自己逼疯。

♥ 245↻ 21💬 565/3 · 18:41x.com ↗
2026 年 5 月 3 日 · 1 条 →

If you think AI replaces software engineers, here’s a quick thought experiment. Imagine you’re a life sciences company. 10 years ago you want to invest heavily in lab automation, processing data at scale, and other software. You look at the cost of doing so and realize you can’t compete with tech for as many engineers as you need, so you pare down your goals and do what you can. Every new software project has a fixed cost of a certain sized team, so you can only do so much given budgets, ability to compete for talent, and other trade-offs. Now, AI comes along. And all of a sudden you have the *exact same* output tokens as the best tech companies in the world. Your engineers are using the same AI models as the tech industry, which means you have just boosted your engineering team by some meaningful amount, while also neutralizing your differences with tech. Do you continue with your pared down approach, or do you start to hire more engineers because each engineer is 2X or 5X more capable than before? In almost every company I’m talking to, they’re doing the latter. Now extrapolate this to every bank, manufacturer, industrial company, retailer, and on and on. And extrapolate it not to just large enterprises, but also every SMB up and down the stack of these value chains. Oh, and also extrapolate this to other job functions, not just engineers. Resource scarce domains in marketing, legal, finance, design, and so on. If you’re wondering why new jobs show up because of AI this is the reason. Any other view of what happens doesn’t contemplate the variety of unmet needs there are in the economy.

如果你认为 AI 会取代 software engineers,不妨做个简短的思想实验。假设你是一家 life sciences 公司。10 年前,你想在 lab automation、以规模化方式处理数据以及其他 software 方面进行大量投入。你评估这样做的成本后发现,你无法像 tech 行业那样招到所需数量的 engineers 来参与竞争,于是你只能下调目标,在能力范围内尽量去做。每个新的 software 项目都有一个固定成本,需要一支达到特定规模的团队,所以受限于预算、争夺人才的能力以及其他权衡,你能做的事情始终有限。现在,AI 出现了。突然之间,你拥有了与全球最优秀 tech 公司*完全相同*的 output tokens。你的 engineers 使用的是与 tech 行业相同的 AI models,这意味着你的 engineering team 在产能上得到了相当可观的提升,同时你与 tech 公司之间的差距也被抹平了。那么,你会继续维持原先那个下调后的方案,还是会开始招聘更多 engineers,因为现在每位 engineer 的能力都变成了过去的 2X 或 5X?几乎我接触到的每一家公司,选择的都是后者。现在,把这个逻辑外推到每一家 bank、manufacturer、industrial company、retailer,等等。再把它外推到不仅仅是大型 enterprise,还包括这些价值链上下游的每一家 SMB。哦,对了,还要把它外推到其他 job functions,而不只是 engineers。那些资源稀缺的领域,比如 marketing、legal、finance、design 等等。如果你在想,为什么 AI 会催生新的 jobs,原因就在这里。任何其他关于未来会如何发展的看法,都没有充分考虑到经济中存在着多少尚未被满足的需求。

♥ 301↻ 52💬 575/2 · 21:09x.com ↗
2026 年 5 月 1 日 · 1 条 →

As agents become the biggest users of software, then all software has to be available in a headless fashion. Agents won’t be using your UI, they’ll be talking to your APIs. So the question becomes what is the business model of software and this headless approach in the future? Here are a few thoughts on how everything plays out based on what we’re seeing and doing at Box, but also conversations with other platforms. 1) Seats don’t go away for *people*. Seats are still a convenient and efficient way to have a customer use technology predictably for a set of users within a baseline set of usage. The key, though, is that when the customer pays for a seat, it has to come with a set of usage of APIs on behalf of that user that the agent can use on their behalf. The user will need to be able to interact with their data and the underlying tool via any agent they work with, and an embedded amount of usage will come with the seat. I would imagine most software -Box included- will enable seats to work with their data at a relatively high volume via systems like ChatGPT, Codex, Claude, Gemini, Cursor, Copilot, Perplexity, Factory, Cogniton, et al. quite seamlessly. If you don’t do this, you’re DOA. 2) Agents may have “seats” if they are doing stateful work in the system, but they will be priced very differently than people. Seats (or the equivalent) can make sense when you have an agent that has its own workspace, stores its own data, needs a different set of permissions compared to the user, and so on. If a company wants this agent to be around for a long period of time, that may very well look like another “user” in the system. Openclaw-style agents highlight what this future could look like. The only issue on pricing here is that one customer could decide to do all their work in 1 agent, and another might split it into 1,000 agents. 
So pricing like a human seat is nearly impossible and impractical; each company will have a different approach for this as it gets tricky trying to perfectly capture all the value within an agent seat. 3) The dominant pricing for headless use that goes above the seat allotment, or when an agent is firmly acting on their own, will be a consumption model. Many enterprise software platforms have previously operated like this with PaaS options, and agents will look like another machine user of their system. In some cases the APIs might get priced just as they did previously, but in other cases there may need to be new types of APIs that represent the work an agent would do in one go -more akin to an outcome- instead of a series of API calls. This is especially germane when the headless software also has an agentic use-case embedded within it, such as orchestrating the process within their own system via AI. Overall the growth of this usage pattern is effectively unbounded as the use-cases for agents operating on data in these systems will dramatically exceed what people do with their data and tools today. Every platform that goes headless (which will be anyone that wants to take advantage of agents) will need to adopt a model like this. Some may fight it initially but it’s an inevitability as there will always be more and more agents outside your platform than people. Overall, there’s a lot of really interesting changes left to come in software due to headless use of these systems. Early days.

随着 agent 成为软件最大的使用者,所有软件都必须能以 headless(无界面)方式提供。agent 不会使用你的 UI,它们会与你的 API 对话。所以问题就变成了:未来软件以及这种 headless 方式的商业模式会是什么?基于我们在 Box 看到和正在做的事情,以及与其他平台的交流,下面是我对事情将如何发展的几点看法。1)面向*人*的 seat 不会消失。对于让客户在一组基线使用量范围内、以可预测的方式让一组用户使用技术而言,seat 仍然是一种方便且高效的方式。不过关键在于:当客户为一个 seat 付费时,它必须附带一组代表该用户进行 API 使用的额度,供 agent 代其使用。用户需要能够通过他们合作的任何 agent 与自己的数据和底层工具交互,而 seat 中也会包含一部分内嵌的使用量。我猜想大多数软件——包括 Box——都会让 seat 能够通过 ChatGPT、Codex、Claude、Gemini、Cursor、Copilot、Perplexity、Factory、Cogniton 等系统,以相对高的吞吐量、几乎无缝地处理其数据。如果你不这么做,你就是 DOA。2)如果 agent 在系统中执行的是 stateful(有状态)工作,那么它们也可能会有“seat”,但其定价方式会与人非常不同。当一个 agent 拥有自己的 workspace、存储自己的数据、需要与用户不同的一组权限等等时,seat(或等价形式)是有意义的。如果一家公司希望这个 agent 长期存在,那它很可能就会看起来像系统中的另一个“user”。Openclaw 风格的 agent 突出了这种未来可能的样子。这里定价上的唯一问题在于:一个客户可能决定把所有工作都放进 1 个 agent,另一个客户则可能拆分成 1,000 个 agent。所以像人类 seat 那样定价几乎不可能,也不切实际;随着要准确捕捉一个 agent seat 中的全部价值变得棘手,每家公司都会对此采取不同做法。3)对于超出 seat 配额的 headless 使用,或者当 agent 明确地在自主行动时,占主导地位的定价将会是 consumption(按量消费)模型。许多企业软件平台此前已经通过 PaaS 选项这样运作,而 agent 看起来会像它们系统中的另一类机器用户。在某些情况下,API 的定价方式可能与以前一样,但在另一些情况下,可能需要新的 API 类型,用来表示 agent 一次性完成的工作——更接近某种 outcome(结果)——而不是一连串 API 调用。当 headless 软件本身还内嵌了 agentic(代理式)使用场景时,这一点尤其贴切,比如通过 AI 在其自身系统内编排流程。总体而言,这种使用模式的增长实际上是无上限的,因为 agent 在这些系统中基于数据进行操作的使用场景,将会大幅超出今天人们对其数据和工具的使用方式。每一个走向 headless 的平台(也就是任何想利用 agent 的平台)都需要采用类似这样的模型。有些平台起初可能会抗拒,但这是不可避免的,因为你的平台之外的 agent 数量总会越来越多,超过人。总的来说,由于这些系统的 headless 使用,软件领域还会出现许多非常有意思的变化。现在还只是早期。

♥ 357↻ 26💬 615/1 · 03:15x.com ↗
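The pricing structure outlined in points 1 and 3 above (a seat that includes some agent API usage, with consumption pricing above the allotment and for autonomous agents) can be made concrete with a small calculation. All prices, allotments, and names below are made-up assumptions for illustration, not Box's or any other vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatPlan:
    seat_price_cents: int         # monthly price per human seat
    included_calls_per_seat: int  # agent API calls bundled with each seat
    metered_cents_per_call: int   # consumption price beyond the bundle


def monthly_bill_cents(plan: SeatPlan, seats: int,
                       calls_on_behalf_of_users: int,
                       autonomous_agent_calls: int) -> int:
    """Seats cover a pooled allotment; overage and autonomous agent calls are metered."""
    included = seats * plan.included_calls_per_seat
    overage = max(0, calls_on_behalf_of_users - included)
    metered_calls = overage + autonomous_agent_calls
    return seats * plan.seat_price_cents + metered_calls * plan.metered_cents_per_call


plan = SeatPlan(seat_price_cents=2000, included_calls_per_seat=1000,
                metered_cents_per_call=1)
# 10 seats, 12,000 calls made for users, 5,000 calls by agents acting on their own.
print(monthly_bill_cents(plan, 10, 12_000, 5_000))
```

Pooling the allotment across seats is itself a design choice; a per-user cap would bill heavy users sooner. Outcome-based APIs, as mentioned in point 3, would replace the per-call meter with a per-task price.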
2026 年 4 月 30 日 · 1 条 →

Starting to hire and retrain for new agent engineering roles for *internal* functions to help get more powerful agents working well on critical business processes. I expect this type of role to be a very big deal over time at Box and other companies. It looks something like an internal FDE, whose job it is to wire up internal systems and get agents working with them effectively. The person will be extremely technical and capable of building secure, governed agents for internal workflows that connect to business systems (like Box, Salesforce, Workday, etc.), and codify workflows in skills. In some cases this person may understand the business process well enough to do it fully, but in most cases I expect them to work with the business directly in an embedded fashion. Ironically, that may introduce another new role on the business side that is more akin to agent product management for internal processes. The key is that you need technical + process people that can span multiple teams or functions in an organization. It’s not about bringing automation to a job, but bringing automation to a process. This is going to be a very big trend in most companies going forward. Fun to watch the early innings of what this will look like.

开始为新的 agent engineering 角色招聘并再培训人才,以支持*内部*职能,帮助更强大的 agent 在关键业务流程中良好运作。我预计,随着时间推移,这类角色在 Box 和其他公司都会变得非常重要。它有点像内部的 FDE,其工作是把内部系统连接起来,并让 agent 能有效地与这些系统协同工作。这个人需要具备很强的技术能力,能够为连接业务系统(如 Box、Salesforce、Workday 等)的内部工作流构建安全、可治理的 agent,并把工作流编码为 skills。在某些情况下,这个人对业务流程的理解可能足够深入,能够独立完成全部工作;但在大多数情况下,我预计他们会以嵌入式的方式直接与业务团队合作。具有讽刺意味的是,这可能还会在业务侧催生出另一种新角色,更接近于面向内部流程的 agent product management。关键在于,你需要既懂技术又懂流程的人,能够跨越组织中的多个团队或职能协同工作。这不是把自动化带到某个岗位上,而是把自动化带到某个流程中。这将会成为未来大多数公司里的一个非常大的趋势。能看到这件事在早期阶段会呈现出什么样子,很有意思。

♥ 226↻ 10💬 324/30 · 04:56x.com ↗
2026 年 4 月 29 日 · 2 条 →

Will keep saying this, but software jobs aren’t going away. Agents are the single biggest form of leverage for anyone technical in history. Probably has never been a better time to be technical in terms of being able to accomplish something solo, in a team, or company. We think that most of the world’s software has already been built and that agents will just reduce work from an existing pie. In fact, we are about to experience 100X more software than before. Think about how many apps you regularly use that need to get better. How many legacy on-prem systems that have to get replatformed for the cloud. How many SMBs never could hire developers. How many security issues are about to be uncovered and need to get patched. How many IT organizations are about to bring automation to workflows they never could have automated. How much data is about to be processed and connected in most organizations. This is all what the agents will be working on. And every one of those agents will need a person to kick them off, manage their work, orchestrate them, and get their output into a workable and useful form. That person will generally need to be technical (or become technical quickly), and this will create a huge amount of opportunity for anyone up to the task.

我还是会继续这么说:software 工作岗位不会消失。agent 是历史上对任何技术人员而言最大的一种 leverage(杠杆)形式。就个人、团队或公司能够独立做成事情的能力而言,现在大概从来没有比这更适合做技术的时候了。我们常以为,世界上大部分 software 都已经被构建出来了,而 agent 只是把既有这块蛋糕中的工作量减少一点。事实上,我们即将经历比以前多 100 倍的 software。想想你经常使用的多少 app 还需要变得更好;多少 legacy on prem system 必须重新迁移到 cloud;多少 SMB 根本没法雇佣 developer;多少 security 问题即将被发现并需要打 patch;多少 IT 组织即将把 automation 引入过去根本无法自动化的 workflow;以及大多数组织中有多少 data 即将被处理并连接起来。这些,都会是 agent 将要处理的工作。而且每一个这样的 agent,都需要有人来启动它、管理它的工作、编排它们,并把它们的输出整理成可实际使用、真正有用的形式。这个人通常需要具备技术能力(或者迅速变得具备技术能力),而这将为任何能够胜任的人创造大量机会。

♥ 300↻ 42💬 334/29 · 03:44x.com ↗

Agentic coding is a huge boon for software developers that want to get far more done, great for IT people to build vastly more custom systems internally, great for domain experts that want to automate workflows or wire systems together, and absolutely fantastic for anyone curious to learn how to start coding. What it’s less great for is casually building complex software that you have to maintain on an ongoing basis and take on all the risk for. Upgrades, maintenance, keeping up to date with latest security issues, and so on, are taxes most knowledge workers aren’t familiar with or prepared for. Net net: we’re going to get 100X more software and vastly more software developers in the future. But that’s different from *everyone* rolling their own.

Agentic coding 对想要完成更多工作的 software developer 来说是巨大的 boon(助益);对想在内部构建大量更多定制系统的 IT 人员来说非常有利;对想要自动化 workflow 或把各类 system 连接起来的领域专家来说也很好;而对任何好奇并想学习如何开始 coding 的人来说,更是极其棒的工具。它没那么适合的场景,是随意去构建复杂 software,然后还要持续维护它,并自己承担全部风险。升级、维护、跟进最新的 security 问题等等,都是大多数 knowledge worker 不熟悉、也没有准备好应对的“税”。Net net(总的来说):未来我们会拥有多 100 倍的 software,也会有多得多的 software developer。但这和让 *每个人* 都自己从头做一套,是两回事。

♥ 419↻ 33💬 494/28 · 16:28x.com ↗
2026 年 4 月 28 日 · 1 条 →

[1]

♥ 402↻ 39💬 434/28 · 02:22x.com ↗
2026 年 4 月 27 日 · 1 条 →

Great read. AI lets you get tremendous leverage that wasn’t available before in almost any domain. That means we’re at a unique moment in history where anyone with a high level of ambition and core skills in any area can overcome a lot of historical experience requirements for their role. This can apply to anyone who’s junior or senior, but it’s pretty sweet that as a newer employee you can do far more than you could have accomplished even a couple years ago. The people that take advantage of this will get ahead massively. And the companies that find this talent within or outside should put them in key positions to get as much out of them as possible. These people will seem strange and from the future, but they will help you figure out where things are going. Every company should be doing whatever they can to find them.

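Well worth a read. AI gives you enormous leverage (a force-multiplying effect) in almost any domain, of a kind that was not available before. This means we are at a unique moment in history: anyone with strong ambition and core skills in some area can get past much of the experience their role used to require. This applies to junior and senior people alike, but what is especially striking is that, as a newer employee, you can now do far more than you could have even two years ago. Those who take advantage of this will gain a massive lead. And companies that find this kind of talent, inside or outside the company, should put them in key positions and get as much value from them as possible. These people may seem a little strange, as if they came from the future, but they will help you judge where things are heading. Every company should do everything it can to find them.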

♥ 396 ↻ 40 💬 39 · 4/26 · 02:10 · x.com ↗
April 22, 2026 · 1 post →

The jump from working with a chatbot to having an agent that actually helps automate a process requires a real amount of work. Most companies will need to have dedicated people that are responsible for bringing automation to their teams, instead of leaving this up to every individual employee. Partly because the work is more technical than we imagine today, and partly because it’s just hard to do this as a side project. The job spec is to map out new workflows with agents, implement new systems to deploy agents, make sure the agent has all the right (up-to-date) context to work with, wire up internal systems to connect to the agents, create evals for the agents, figure out where the human is in the loop, manage the system when there are new upgrades, help with the change management of the existing business process, and so on. These jobs may come from IT or engineering, or live directly in the business function itself. They’ll be called different things depending on the company, and in some sense it’s the future of software engineering, which you’ll see grow hugely in non-tech companies. Most companies will have to be hiring for this now or in the future, and it’s another example of the kind of new jobs that will be created in AI.

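Going from using a chatbot to having an agent that actually helps automate a process takes a considerable amount of real work. Most companies will need dedicated people responsible for bringing automation to their teams, rather than leaving it to each employee individually. Partly this is because the work is more technical than we imagine today, and partly because it is simply hard to push forward as a side project. The responsibilities of this role include: mapping out new agent-based workflows, implementing new systems for deploying agents, making sure the agent has all the correct and up-to-date context it needs to do its work, connecting internal systems to the agents, creating evals (evaluations) for the agents, deciding where the human in the loop (the point of human involvement) should sit, managing the system when new upgrades arrive, helping with change management for existing business processes, and so on. These roles may come from IT or engineering, or sit directly inside the business function itself. Different companies will call them different things; in a sense this is also the future shape of software engineering, and you will see it grow enormously in non-tech companies. Most companies will have to hire for these roles now or in the future, and this is one more example of the new kinds of jobs AI will create.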

♥ 499 ↻ 59 💬 55 · 4/21 · 01:17 · x.com ↗