🎙 Podcast · No Priors · May 1, 2026 · 7,934 words · about 40 minutes

Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

Speaker 1 00:05 - 00:28

Hi, listeners. Today, Elad and I are here with Tuhin Srivastava, the founder and CEO of Baseten, the inference cloud. We're here to talk about capacity constraints for AI compute, why inference is the last market, how the workload is changing, the open source and perhaps multi-chip future, and what 30x scale in a year looks like. Tuhin, welcome back.

Speaker 2 00:28 - 00:30

Hi. Good to see you.

Speaker 3 00:30 - 00:31

Thanks for having me.

Speaker 1 00:31 - 00:41

All right. You are in one of the craziest markets, AI inference. It's very important. There's a lot going on. You guys have grown 30x over the last year.

Speaker 1 00:42 - 00:46

And I think I can say you're expecting to do more than a billion dollars in revenue this year.

Speaker 3 00:46 - 00:46

Mhmm.

Speaker 1 00:47 - 00:49

What's going on? Tell us about scale.

Speaker 3 00:49 - 01:09

Yeah. No, it's been nuts. I think what's happened over the last, honestly, twenty-four months, and it just kinda keeps getting bigger and bigger, is that everyone is realizing that you can put AI everywhere. You have all these great options available, from closed source to open source models.

Speaker 3 01:09 - 01:54

The open source models have crossed some sort of chasm in terms of their baseline capability. And then I think RL techniques and post-training for specialized models have become mainstream enough, and, you know, there are enough examples of it working. Customers are realizing they can, you know, kind of own their inference more and more. And what that's meant for us is more of, you know, the long tail of models coming true, customers in-housing a lot of that intelligence themselves. As the application layer just gets bigger and bigger and bigger, we just kind of index on that, and we've been around to be able to collect the demand.

Speaker 1 01:55 - 02:07

There's an existential question in here that I think everybody is continually asking: does the independent application layer get to exist at all versus the labs? You have to believe this. Why do you believe it?

Speaker 3 02:07 - 02:15

Yeah. Look, I think it'd be a sad thing if it didn't exist in general, and I think that's, like, my... but, you know, sadness is fine. The

Speaker 2 02:16 - 02:17

I'm sad all the time.

Speaker 3 02:17 - 02:41

Oh, yeah. Sadness is fine. But, like, that's not the reason why I think the application layer will exist. I think the application layer will exist for a number of reasons. One is, you know, this idea that what is valuable to a company is the user signal that they can gather, that only they can gather.

Speaker 3 02:42 - 03:11

And to the extent that that is encoded in a model, I think a lot of their business will be at risk, but to the extent that it is encoded in workflows, that is where they will be able to develop a moat. So I think a good example of that is, say, a company like Abridge, where it's the clinician's edits of the notes and what they do with those notes after the fact, and the thing that happens inside the EMR three steps down, and that becomes a workflow that only

Speaker 1 03:11 - 03:13

Can you explain what Abridge does?

Speaker 3 03:13 - 03:28

Sorry. Abridge is an ambient scribe that is used by physicians in, like, you know, almost all hospitals in the US. I think Elad's an investor. Shiv's amazing. Great, great company.

Speaker 3 03:28 - 04:01

Great team. Great product. And, you know, they've basically got this very, very deep integration into hospitals and into clinician workflows. And my argument here would be that, actually, it's very, very hard for a frontier model company to be able to eat that up, because they just don't have access to that user signal. And what will happen over time is the folks who have access to that user signal can start to post-train models on that reward signal and start to get long-horizon agentic models running on that.

Speaker 3 04:01 - 04:34

And I think to the extent that that is possible, and that signal is differentiated and unique and is somewhat rare to get access to, there will be an application layer. And I think, you know, a support company is another example of that, where a support task isn't one-shotted. Usually, at a company like Baseten, when a ticket comes in, there's like, what, one, two, ten, twenty actions that get taken, and that is where, you know, someone can develop a specialized model.

Speaker 2 04:34 - 04:57

So there's almost two versions of this then. There's new companies like Abridge or Decagon or some of these other things that you mentioned that are doing these new types of applications that are using AI and they sell it to customers. The other is enterprises building things in house or building their own models. What proportion of the market today do you think is these new application companies versus enterprises adopting AI? And how do you think that looks in a couple years?

Speaker 3 04:57 - 05:03

Yeah. I think you asked me the same question two years ago. Oh, yeah. On the file of the Internet.

Speaker 2 05:03 - 05:04

I have to be repetitive.

Speaker 3 05:04 - 05:05

It is crazy

Speaker 2 05:05 - 05:06

At inconsistent.

Speaker 3 05:06 - 05:25

The answer is just that, it's crazy that the answer is still... I think if you look by inference count, it'd be 99% the former. Yeah. And that kind of represents the scope of the opportunity here, which is that the majority of the market hasn't come online and added AI into

Speaker 2 05:25 - 05:33

the system. Enterprise adoption is well ahead of us, and I think that's one of the very exciting things about AI. Yeah. There's just so much still to come, and people are underestimating that, I think.

Speaker 3 05:33 - 05:40

100%. But what's cool is that we're seeing the transition happen. Right? Before, it was like, hey, are they using AI tools?

Speaker 3 05:41 - 05:54

Mhmm. I don't think that was immediately obvious two years ago, and I think it's obvious now that, yes, they are. Are they using closed source model APIs? I think they're starting to get there. I think once you do that and then you kind of see what is possible, then comes the whole custom model adoption.

Speaker 3 05:54 - 05:57

I think that is all that is ahead of us today.

Speaker 1 05:57 - 06:09

So if the majority of your customer base today is, as you described, the former, like, application companies, AI natives, the fast-growing... I mean, a lot of them are at considerable scale now, like the Abridge, Cursor

Speaker 3 06:09 - 06:10

Open Evidence.

Speaker 1 06:10 - 06:21

Open Evidences of the world. What do they teach you? What does that push the company to do? How do you think about serving them versus evolving for the enterprise?

Speaker 3 06:21 - 06:53

Yeah. I think, firstly, you just learn a lot by building with the companies at the greatest scale, doing the most interesting things. We think of it two ways. There's the most obvious way, which is just build for the highest scale, you know, the customers that will push you the most technologically, and everything kinda will fall into place. I think the Stripe evolution as a company showed that. Stripe now serves so many enterprises.

Speaker 3 06:53 - 07:19

But twelve years ago, that wasn't the case. They just built for the frontier and kind of went with them. I think the second way we think about this is to just think about building for companies that are serving enterprises. So, yes, we don't serve the enterprise, but our customers serve enterprises. Abridge, Pfizer, Open Evidence, Decagon, Ryta, Gamma, Clay, all these companies serve enterprises en masse.

Speaker 3 07:19 - 07:36

And what we actually get is, like, a translation of the requirements from them, which is, you know, they're like, hey, we need this sort of data retention. This is where these models need to be deployed. These are the types of GPUs or the latencies they're okay with. These are the model requirements, from, like, a transparency perspective, that they care about.

Speaker 3 07:36 - 07:55

And so I think the more nuanced answer is that if you listen to what their needs are, we actually get a full translation of what the enterprise will require. Like, I would say that by serving companies like Abridge and Open Evidence and Latent Health, we're probably pretty well suited to go serve the health care system, given that they are selling to them.

Speaker 2 07:55 - 08:16

How much of a shift are you seeing in terms of the types of open source models that are being used? I think we've seen an evolution where two, three years ago, the main thing was kind of Mistral and then a few other things, and then Meta kind of came along with Llama, and then it kind of really shifted, in that the most performant models are of Chinese origin in different ways. Do you see that sort of mix reflected in terms of what's being used by your customers?

Speaker 3 08:16 - 08:39

Yeah. I think customers, at least the customers we are serving, and these are, like, the fastest growing AI companies in the world, are very forward thinking. They wanna use the best model, and they are optimizing. I think there's a subset of tasks, which I think is small today, where people really start with cost. Mhmm.

Speaker 3 08:39 - 09:22

But everyone comes from capability first, because that's really where the economic growth is being unlocked, where the value is being delivered, and then they optimize. And so with that in mind, you name it, everything from GPT-OSS all the way to Moonshot's models to DeepSeek to Canopy's Orpheus, which are, like, really good text-to-speech models, customers generally wanna use whatever's at the frontier. And I think the difference has just been, first, that we have a lot more visibility into how to run these and how to run these really well. And secondly, that they're good now.

Speaker 2 09:22 - 09:44

There have been a number of different concerns raised about the use of Chinese models, in particular security: is there something embedded in the models, or Trojan horses, or other things? A, do you think there's any real concern there? And B, people often talk about how there should be US counterweights to this. From a geopolitical perspective, do you think that's something that's legitimate or something we should be worried about? Or how do you think about the... yeah.

Speaker 2 09:44 - 09:46

Sort of origins of these models versus their uses?

Speaker 3 09:46 - 09:53

Yeah. Look, I think these models, firstly, are fantastic. They're amazing. We work with these teams.

Speaker 3 09:53 - 10:25

They're truly awesome. I'd say, look, it is hard for me to see, and I could be wrong, but, like, you know, if I network-bound these models, they're not magically gonna be able to cross those network boundaries, mhmm, into the data center. And we've never seen any real evidence, except from some very early models that I think people picked up on very quickly, that there is some agenda or bias built into these.

Speaker 3 10:26 - 11:03

I do think that, to some extent, there is importance to the US that we develop our own models. I think that would be a massive loss, if there are five companies, you know, five different labs in China that are creating open source models, and we're struggling to get one set up. So it's necessary. I also think it's inevitable. And, you know, with the DeepSeek moment a year ago, I remember someone saying something to me that I thought was very well said, and the world's changed a lot, but they said, hey.

Speaker 3 11:03 - 11:13

You know, we should kinda just forget, mhmm, that this is a Chinese model. We should just act like this came from, mhmm, from Meta, and build with that in mind. Mhmm.

Speaker 3 11:13 - 11:28

It's like, you know, I think you're kinda missing the forest for the trees. Like, there's two scenarios. Right? Either America does not ever come up with good open source models, mhmm, and there's probably a fundamental problem there, or we will get there, and we need to be ready for that world.

Speaker 2 11:28 - 11:53

Yeah. That makes sense. It's interesting because, like you, I think it's very important for the US to have a strong open source footprint here. At least for now, it looks like effectively the Chinese government is subsidizing at least a large subset of these models, and that subsidy or surplus is effectively just being passed on to US enterprises who are adopting these models. In other words, it's a way for the Chinese government to effectively subsidize US enterprise in an indirect manner, and I think that's a little bit lost right now.

Speaker 2 11:54 - 11:59

But it's always interesting to weigh that against some of the other concerns that are raised. I appreciate your comments on this.

Speaker 3 11:59 - 12:25

Well, and I think the concern also just becomes, like, what happens if we aren't able to... like, if you think of the economics here, which is, DeepSeek, by most accounts, DeepSeek's a very good model. Mhmm. You know? And, like, you can argue whether it's at the absolute frontier or not, but, like, let's go back three months and it's there. And we were doing a whole lot of things three months ago.

Speaker 3 12:25 - 12:42

Yeah. Yeah. And so let's just think about it that way. You know, you could run DeepSeek at probably 20% of the cost of running OpenAI or Anthropic models in production, with comparable or better latency, probably better reliability. If we don't have access to that intelligence

Speaker 2 12:42 - 12:43

Mhmm.

Speaker 3 12:43 - 12:56

In that form, I think it's just a massive loss. And as a country, we won't be able to innovate as fast, because, like, the cost of intelligence going down and control of intelligence, what we have seen is that it just means more intelligence. Yeah. Intelligence being embedded in more places.

Speaker 2 12:56 - 13:06

Yeah. An important note here that we didn't mention explicitly is that the state-of-the-art models, the ones that are most far ahead on the frontier, are actually still the closed source ones: Anthropic, OpenAI, Google, etcetera.

Speaker 1 13:06 - 13:21

Yeah. Has been... Actually, maybe you can just characterize the workload a little bit. Like, of the tokens being served on Baseten, how many of them are from custom models of some kind versus, like, vanilla open source today?

Speaker 3 13:21 - 13:24

It is all custom. It's basically, yeah. Like, it

Speaker 1 13:24 - 13:26

is... So, like, 95% plus?

Speaker 3 13:26 - 13:34

90, 95%. And I think that's really cool, to be honest. I mean, look, we have two businesses. We have three businesses.

Speaker 3 13:35 - 13:36

Yeah. Should we

Speaker 1 13:36 - 13:36

help you count?

Speaker 3 13:36 - 13:48

No. No. So we have, like, dedicated inference, which is basically custom model inference. Your SLA is your SLA. Then we have shared inference, with a shared inference endpoint, shared SLAs, and then we have a training business.

Speaker 3 13:51 - 14:18

I'd say 95% of the tokens today are on the first business. And for almost all of them, the customer is making some modifications to the model with their own data, specialized for the use case. And I think what's even more important is they might be compiling it in different ways. No one is just running the vanilla open source weights.

Speaker 3 14:18 - 14:22

Like, you might be customizing it for quality, but you also might be customizing it for performance.

Speaker 1 14:22 - 14:34

You made an acquisition of a research team a few months ago. You've mentioned post-training customization. What was the rationale behind the acquisition? What is that team doing today?

Speaker 3 14:34 - 15:09

Yeah. So the rationale around the acquisition was, you know, we are infrastructure and product people. We are product people who are now really good infrastructure people. And we didn't have much of a research capability ourselves. And what we saw was the market moving more and more heavily, like, that we could accelerate the market itself with post-training resources, either as product types or even just as resources for that market.

Speaker 3 15:10 - 15:39

So Parsed was a company that was a Baseten customer. They were post-training models and running them on Baseten. And I think what they realized was that they would eventually need to become an inference company. And what we realized was, like, hey, we really needed that expertise, because it represents a way for us to get closer to the customer earlier and be able to support them all.

Speaker 3 15:39 - 16:14

And it just made sense pairing them together. And just as I said in the opening statement here, as more and more post-trained models have come up, we've realized that the demand for either software loops to do post-training or for post-training expertise is very high, and we're really, really investing in that. They're also a bunch of Australians. You know, I like to think that we had a bit of alpha there. But, yeah, that's been fantastic.

Speaker 3 16:14 - 16:53

They're working with all sorts of customers. And it's also very interesting, you know, we were doing a lot of research on the performance side and less so on the post-training side. As we've started to do a lot more research on the post-training side, you start to see how linked inference and post-training are. And, like, you know, even when you think about stuff like quantization and when you should do that, and how you train the model affects how you need to quantize for inference, how paired these problems are, mhmm, has become, like, very apparent.

Speaker 3 16:53 - 17:10

And more and more, we realize that post-training and inference are kind of both sides of the same problem. Because inference will ideally beget more post-training: inference creates data, you do evals, you can now post-train on that reward function that you found with those evals, and hopefully just set up an entire loop.
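
A minimal sketch of the loop described above (inference creates data, evals score it, the scored data feeds post-training). Everything here is an assumption for illustration: the `Interaction` record, the `user_accepted` signal, and the 0/1 reward are hypothetical stand-ins, not Baseten's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged inference call plus a hypothetical user signal."""
    prompt: str
    response: str
    user_accepted: bool  # assumed signal, e.g. the user kept the output


def reward(interaction: Interaction) -> float:
    """Toy eval: 1.0 if the user accepted the response, else 0.0."""
    return 1.0 if interaction.user_accepted else 0.0


def mean_reward(logs: list[Interaction]) -> float:
    """Average reward over the logged interactions (the eval step)."""
    return sum(reward(i) for i in logs) / len(logs)


def build_post_training_set(logs: list[Interaction], min_reward: float = 0.5):
    """Keep only (prompt, response) pairs whose reward clears the bar."""
    return [(i.prompt, i.response) for i in logs if reward(i) >= min_reward]


# Inference produces logs; evals score them; the filtered set is what a
# post-training job would consume before the new model serves traffic again.
logs = [
    Interaction("reset password", "Sent reset link", True),
    Interaction("refund order", "Cannot help", False),
    Interaction("update address", "Address updated", True),
]
dataset = build_post_training_set(logs)
```

In a real system the reward would come from a richer eval than a single boolean, but the shape of the loop is the same.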

Speaker 1 17:10 - 17:42

Plenty of folks from Ant and OpenAI, Sam, Greg, etcetera, have said in recent months that, like, inference is super strategic, inference talent is strategic, capacity is strategic. So between that and post-training, these are very difficult to gather, like, capabilities. Yep. I imagine that lots of your customers go to you guys for advice on, like, how to do this progression of moving to custom models. Like, what do you tell people about the life cycle and when they should invest in that?

Speaker 3 17:42 - 18:02

Yeah. I think it's, hey, go prove to yourself with the best-in-class model that you have something worth optimizing. And I think, you know, if a customer comes to us... there was that meme, it was like two years ago, it was like, no GPUs pre-product-market fit.

Speaker 3 18:02 - 18:07

It's like, no post-training pre-product-market fit, is what I know. Yeah. Yeah. Yeah. It is what I'd say.

Speaker 3 18:07 - 18:07

So people that you're working

Speaker 1 18:07 - 18:10

with here are very, very at scale first.

Speaker 3 18:10 - 18:25

Yeah. They have a user signal that they know how to optimize, and they've shown that they can, you know, serve customer value, and that they have something special around that value. And once you have that value, it's like, okay, now how can I do that better, faster, and cheaper? With the idea being that, hey.

Speaker 3 18:25 - 18:35

If you need to be very good at customer support, you maybe don't need to be that good at coding, and a specialized model might be a better fit for that problem, and you can do it better, faster, cheaper.

Speaker 118:35 - 18:47

What about the capacity side? You started with unifying capacity across all the clouds and neo clouds. How do you think about this when everybody keeps talking about a supply crunch and a multiyear supply crunch?

Speaker 118:35 - 18:47

那 capacity（算力容量）这一侧呢？你一开始提到要把所有 clouds 和 neo clouds 的 capacity 统一起来。当大家都在谈 supply crunch（供应紧张），而且还是一个会持续多年的 supply crunch 时，你是怎么思考这件事的？

Speaker 318:47 - 19:18

I think, you know, there's so much narrative around the supply crunch. And no matter like, as much as we hear about it, I don't think people realize how bad it really is. Like, there is, you know, there is very, very little slack compute available. You know, we we run pretty large clusters ourselves, and we run them at, like, uncomfortably high utilization. You know, we when I'm saying we're, mid nineties utilization most of the time.

Speaker 318:47 - 19:18

我觉得，关于 supply crunch 的叙事实在太多了。而且不管我们听到多少，我都不觉得人们真正意识到情况到底有多糟。现在确实几乎没有多少空闲的 compute（算力）可用。我们自己也在运行相当大的 clusters（集群），而且利用率高得让人不太舒服。我的意思是，我们大多数时候的 utilization（利用率）都在 95% 上下。

Speaker 319:19 - 20:10

We sit in 18 different clouds now. We have 90 clusters around the world across 18 different clouds. And, like, you know, initially, we built this technology to be able to, like, kinda create one runtime fabric that spans all these different clouds and try to abstract that away from our customers as a way to think about reliability, latency, failover, all these things that we think are gonna be very important for very mission critical use cases. That same technology, just our ability to get compute wherever humanly possible, has been really, really helpful in our ability to get supply. And what I mean by that is we can be introduced to a new provider in a different country and have it up and running with the whole Baseten inference stack.

Speaker 319:19 - 20:10

现在我们已经部署在 18 个不同的 clouds 里了。我们在全球有 90 个 clusters，分布在 18 个不同的 clouds 上。最开始，我们打造这项技术，是为了某种程度上构建一个横跨所有这些不同 clouds 的统一 runtime fabric（运行时网络层），并尽量把这些底层复杂性从客户面前抽象掉，把它作为思考 reliability（可靠性）、latency（延迟）、failover（故障切换）这些问题的方式——而我们认为，对于很多 mission critical（关键任务型）的 use cases（使用场景）来说，这些都会非常重要。同样是这套技术，让我们能够在一切人类可能做到的地方拿到 compute，这对我们获取 supply 真的非常非常有帮助。我的意思是，我们可以被介绍给另一个国家的新 provider（供应商），然后迅速让它连同整套 Baseten inference stack（推理技术栈）一起上线运行。

Speaker 120:10 - 20:11

As part of the fabric. Yeah.

Speaker 120:10 - 20:11

作为这个 fabric 的一部分。对。

Speaker 320:11 - 20:29

Part of the fabric in half a day, maybe less. And that gives us enormous flexibility. Even for us, it is hard to grow. We have a, I think it's, yeah, I'll say it.

Speaker 320:11 - 20:29

纳入这个 fabric，半天就够了，也许连半天都不用。这给了我们极大的灵活性。即便对我们来说，要扩张也依然很难。我们有一个——我想，嗯，还是说吧。

Speaker 320:29 - 21:17

We have a 4PM standing meeting for the company where we basically ask, like, how do we manage capacity for the demand right now? I think the second part that people don't really understand is that there are also a lot of suppliers right now that are kinda grifty. You know, like, I think they haven't run data centers before. You know, they don't understand SLAs, especially for inference. And so, like, you know, even when there is capacity available, there's a lot of, like, there's probably... we run a lot more than this, and we have redundancy, and so it's fine.

Speaker 320:29 - 21:17

我们公司有一个固定在下午 4 点的 standing meeting（固定例会），基本上就是在讨论：眼下这种 demand（需求）之下，我们到底该怎么管理 capacity？我觉得第二个部分，也是人们其实不太理解的地方，是现在也有很多 suppliers（供应商）带点 grifty（投机、靠不住）的味道。我的意思是，他们以前没有运营过 data centers（数据中心），也不理解 SLA（服务等级协议），尤其是在 inference（推理）场景下更是如此。所以，即便市面上有 capacity 可用，其中也有很多——大概可以这么说，我们实际运行的远不止这点，而且我们有 redundancy（冗余），所以问题不大。

Speaker 321:17 - 21:39

But if you you know, there's probably like a dozen good clouds, and I'd probably put three or four of them in the gold tier. And I think that just means that not only are we supply crunched, we're supplier and operationally crunched onto people who can run these data centers as well.

Speaker 321:17 - 21:39

但如果你真要看，真正好的 clouds 可能也就大概十来家，其中我大概会把三四家放进 gold tier（黄金档）。我认为这意味着，我们不仅仅是 supply crunched（供应受限），在 supplier（供应商）层面和 operationally（运营能力）层面也同样受限——因为真正会运营这些 data centers 的人，也很稀缺。

Speaker 221:39 - 21:48

How far ahead can you actually buy capacity right now? In other words, like, is there any any slack in the market if you buy two years ahead or five years you know,

Speaker 221:39 - 21:48

你现在实际上最早能提前多久买到 capacity（算力容量）？换句话说，比如说，如果你提前两年或者五年去买，市场上还有没有什么 slack（余量）？

Speaker 321:48 - 21:54

What do you mean, contract length or actually, like, "Hey, I want this in January '28"?

Speaker 321:48 - 21:54

你是指 contract length（合约期限），还是更像“我想在 28 年 1 月拿到这个”这种？

Speaker 221:54 - 21:59

Yeah. Either one. Yeah. Yeah. I mean, it's more the I want this in January 28, at least I have some visibility into

Speaker 221:54 - 21:59

对，哪种都行。对，对。我的意思是，更偏向于“我想在 28 年 1 月拿到这个”，至少这样我对

Speaker 321:59 - 22:14

my future supply. Yeah. You could buy that, but you gotta also remember how quickly the market is moving. And, like, you know, that gets balanced somewhat by, like, the fact that the H100 is such a great chip. Yeah.

Speaker 321:59 - 22:14

我未来的 supply（供应）有一些可见性。对。你可以买这个，但你也得记住，市场变化得有多快。而且，你知道，这在某种程度上也会被平衡掉，因为 H100 是一款非常出色的 chip（芯片）。对。

Speaker 322:14 - 22:23

And, like, you know, it's crazy. It's four, four and a half years old, and the price is still going up. Yeah. Maybe it has a useful life of nine years. Yeah.

Speaker 322:14 - 22:23

而且，然后你知道，这很疯狂。它都已经四年、四年半老了，价格居然还在上涨。对。也许它的 useful life（有效使用寿命）有九年。对。

Speaker 322:23 - 22:53

So, you know, that's good. But at the same time, you know, yes, you can do that, but, you know, you're making a lot of bets Yeah. as part of that. And then in terms of... I think the big thing that's changed over the last six months is that the term length that people want has just gone up. So if you wanted a thousand, 1,024 B200s Mhmm.

Speaker 322:23 - 22:53

所以，你知道，这当然是好事。但与此同时，你知道，是的，你可以这么做，但你也确实是在做很多 bet（押注）。对。这其中的一大变化，我觉得是过去六个月里最明显的，就是人们想要的 term length（期限）明显变长了。所以如果你想要一千张、1,024 张 B200，嗯哼。

Speaker 322:54 - 23:20

Which is, you know, from a good cloud, right now, you're not getting that on less than a three to five year contract Mhmm. right now, with probably a 20 to 30% TCV prepay. So, like, actually, what becomes important when acquiring capacity is you need to have enough demand to fill it and serve it, and then you also need, like, a low cost of capital, which is actually changing the dynamic pretty significantly.

Speaker 322:54 - 23:20

也就是，你知道，从一个不错的 cloud（云服务商）那里，现在你拿这个基本不可能少于三到五年的 contract（合同），嗯哼。而且现在大概率还要预付 20% 到 30% 的 TCV（total contract value，合同总价值）。所以，实际上，在获取 capacity（算力容量）时，真正变得重要的是：你需要有足够的 demand（需求）来消化并服务这部分供给，同时你还需要较低的 cost of capital（资金成本）；而这一点其实正在相当显著地改变整个市场动态。
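
To make the capital requirement above concrete, here is a rough back-of-the-envelope sketch. The cluster size (1,024 GPUs), the three-year term, and the 25% prepay fall within the ranges mentioned in the conversation; the hourly rate of $3.00 per GPU is purely a hypothetical placeholder, not a figure from the interview.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # ignores leap days for simplicity


@dataclass(frozen=True)
class CapacityContract:
    gpus: int
    years: int
    usd_per_gpu_hour: float  # hypothetical rate, NOT from the conversation
    prepay_fraction: float   # share of total contract value paid up front

    def total_contract_value(self) -> float:
        """TCV = GPUs x contract hours x hourly rate."""
        return self.gpus * self.years * HOURS_PER_YEAR * self.usd_per_gpu_hour

    def prepay(self) -> float:
        """Cash due up front under the prepay clause."""
        return self.total_contract_value() * self.prepay_fraction


contract = CapacityContract(gpus=1024, years=3, usd_per_gpu_hour=3.0, prepay_fraction=0.25)
print(f"TCV:    ${contract.total_contract_value():,.0f}")
print(f"Prepay: ${contract.prepay():,.0f}")
```

Under these assumed inputs the contract is worth roughly $80.7M and about $20.2M is due before any revenue is served on it, which is why cost of capital enters the picture.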

Speaker 223:20 - 23:24

Does that does that impact how you think about going public as a company? Because arguably

Speaker 223:20 - 23:24

这会不会影响你对公司 going public（上市）的看法？因为从某种意义上说

Speaker 323:24 - 23:27

Yeah. I think you'd go sooner. Yeah. Exactly. Yeah.

Speaker 323:24 - 23:27

对。我觉得你会更早去做。对。没错。对。

Speaker 323:27 - 23:59

I I think you need like, I I think the and I think there is demand for that. But I think, you know, the pull the you know, and also, you know, one of our one of the one of the realizations that we had recently and we're we're software people. And so we don't we don't think like this all the time is that, you know, our business has, like, very interesting working capital requirements. Like, we don't and and I and I think, you know, even and that as a result of that, it has very interesting financing Yeah. Yeah.

Speaker 323:27 - 23:59

我觉得你需要那样的东西，而且我觉得确实有这方面的需求。但我认为，你知道，驱动力，那个——你知道，还有一点，我们最近意识到的一件事是，我们毕竟是 software（软件）从业者，所以我们平时并不会总是这样思考。就是，你知道，我们的业务在 working capital（营运资金）需求方面非常有意思。我们并不——而且我觉得，你知道，甚至——也正因为如此，它在融资方面也有一些非常有意思的特点。对。对。

Speaker 323:59 - 24:03

Requirement. And we're not at least right now, we're not even going down to the debt.

Speaker 323:59 - 24:03

是资金需求。而且至少就现在来说，我们甚至还没有走到 debt（债务融资）那一步。

Speaker 224:03 - 24:06

There's also other things we could do in terms of debt or other structures. Yeah.

Speaker 224:03 - 24:06

在 debt（债务）或者其他结构方面，我们其实也还可以做别的事情。对。

Speaker 324:07 - 24:10

Yeah, I've learned a lot about debt recently.

Speaker 324:07 - 24:10

对，我最近学了很多关于 debt（债务）的东西。

Speaker 124:10 - 24:35

Given the supply crunch, and inference being one of, you know, the top couple markets you've been going after, you have plenty of people who understand this problem and therefore, you know, some competition. How do you think about, like, what are the factors that create a dominant player here or a winning player? Is it, as you mentioned, cost of capital? Is it access to supply? Is it software?

Speaker 124:10 - 24:35

考虑到 supply crunch（供应紧张），而 inference（推理）又是大家一直在重点争夺的头部市场之一，显然有很多人理解这个问题，因此也就存在一些竞争。你们是怎么思考这个问题的——也就是说，什么因素会造就这里的 dominant player（主导者）或者 winning player（胜出者）？是像你提到的那样，cost of capital（资本成本）吗？是 access to supply（获取供应的能力）吗？还是 software（软件）？

Speaker 124:35 - 24:39

Is it demand? Yeah. Is it just being excellent at everything?

Speaker 124:35 - 24:39

是 demand（需求）吗？对。还是说，单纯就是要把每一件事都做到极致？

Speaker 324:39 - 24:44

Yeah. Look. I I I think what's so interesting about inference is

Speaker 324:39 - 24:44

对。你看。我我我觉得 inference 最有意思的地方在于：

Speaker 124:44 - 24:45

Is it operations? I guess,

Speaker 124:44 - 24:45

是 operations（运营）吗？我猜，

Speaker 324:45 - 24:51

such as cloud. Think so. Yeah. I think, like, GPUs as a service is not sticky. I think that's been seen.

Speaker 324:45 - 24:51

比如 cloud。我也这么想。对。我觉得，像 GPUs as a service 并不具备粘性。我觉得这一点已经被证明了。

Speaker 324:51 - 25:12

Like, customers generally just see that as a commodity. Inference with the software layer included is incredibly sticky. You know, like, none of our top 30 customers have ever churned. You know, we're talking like 400% annual NDR Mhmm. around our business.

Speaker 324:51 - 25:12

比如说，客户通常就是把那看成 commodity（大宗商品/标准化商品）。而 inference（推理）加上软件层之后，粘性就高得惊人。你知道，就像——我们前 30 大客户里，没有一个流失过。我们说的是大约 400% 的 annual NDR（net dollar retention，净收入留存率）。嗯哼。就是我们业务大概是这种水平。

Speaker 325:12 - 25:43

And so it's, like, very, very sticky. So I think that software layer is very important. The optimist in me is like, oh, there's so much value in the software, and we will build the best software layer for inference that exists. I think, you know, as I think it's becoming clear now, access to inference compute is a strategic advantage. And I think that is the strategy that even the labs are going after, which is like, if we have all the compute, good luck running inference.

Speaker 325:12 - 25:43

所以它就是，非常——非常非常有粘性。所以我觉得那个软件层非常重要。我内心乐观的一面会想，哦，软件里有这么多价值，而我们会打造出目前最好的 inference（推理）软件层。我想，你知道的，正如我觉得现在已经越来越清楚的那样，获取 inference compute（推理算力）的能力是一种战略优势。而我认为这也是连那些 labs 都在追求的策略：如果我们掌握了全部 compute（算力），那你想跑 inference，就祝你好运吧。

Speaker 225:43 - 25:52

Yeah. Yeah. And in a world of constrained compute, the number one thing to own is compute. Yeah. And so, you know, just owning it in and of itself is an asset, and I think people underappreciate that.

Speaker 225:43 - 25:52

对，对。在一个 compute（算力）受限的世界里，最应该掌握的东西就是 compute。对。所以，你知道，仅仅是拥有它本身就是一种资产，而我觉得这一点被很多人低估了。

Speaker 325:52 - 25:55

Yeah. You can't make a good hot chocolate without milk. Unless

Speaker 325:52 - 25:55

对。没有牛奶，你做不出好喝的热巧克力。除非——

Speaker 225:57 - 25:57

you're vegan.

Speaker 225:57 - 25:57

你是 vegan。

Speaker 325:57 - 26:01

Unless you're a vegan. No one wants a vegan inference. Yeah.

Speaker 325:57 - 26:01

除非你是个 vegan。没人想要 vegan inference。对。

Speaker 126:02 - 26:28

Well, I got to ask you, people might want alternative milk, right? So, okay, when you, the H100 is a great chip, people, you know, want a B200, they want GB200, they want, of course, tons and tons of Nvidia. When you think about making a bet, you know, several years in the future, do you believe that there's a, like, multi chip world? Like, what do you what do you think happens from a compute perspective on the chip side?

Speaker 126:02 - 26:28

嗯，我得问你一个问题，人们可能会想要替代牛奶，对吧？所以，好吧，当你看时，H100 是很棒的 chip（芯片），大家，你知道，会想要 B200，他们想要 GB200，他们当然还想要大量大量的 Nvidia。那当你考虑做一个押注，比如押注几年的未来时，你是否相信会出现一种多 chip 的世界？比如，从计算的角度看，在 chip 这一侧你觉得会发生什么？

Speaker 326:29 - 26:41

Yeah. I think, you know, like, diversification everywhere is a Mhmm. Same way I want a world of many models, I think, you know, we want a world of many of most things. And I think

Speaker 326:29 - 26:41

对。我觉得，你知道，像是到处都要做多样化，这是一种——嗯哼。就像我希望这个世界上有很多 model（模型）并存一样，我觉得大多数东西我们都希望有很多种选择。我觉得——

Speaker 126:41 - 26:42

You'd be sad if it didn't happen.

Speaker 126:41 - 26:42

如果这没有发生，你会很失望。

Speaker 326:42 - 26:54

Yeah. And I and I think everyone would be sad. I I will say to some extent, which is yeah. And I think there will be inference specific chips. I think you have, like, decode specific chips.

Speaker 326:42 - 26:54

对。而且我也觉得每个人都会失望。我要说的是，在某种程度上，确实如此。对。而且我认为会出现专门用于 inference（推理）的 chip。我觉得你还会有那种专门用于 decode（解码）的 chip。

Speaker 326:54 - 26:55

Think and we're we're looking at that.

Speaker 326:54 - 26:55

我是说，我们正在看这个方向。

Speaker 126:55 - 26:56

And NVIDIA said this.

Speaker 126:55 - 26:56

而且 NVIDIA 也这么说过。

Speaker 326:56 - 27:15

Yeah. Yeah. I mean, that was the whole Groq LPU thing. It's like, you know, I think that is very straightforward and makes sense. I think people really, really, really underestimate supply chain stuff with NVIDIA, like how good they are at that, how good CUDA is, the developer ecosystem around it.

Speaker 326:56 - 27:15

对，对。我的意思是，那整个就是 Groq LPU 那件事。就像，你知道，我觉得那非常直接，而且也讲得通。我认为人们真的、真的、真的低估了 NVIDIA 在供应链方面的东西，比如他们在这方面有多强，CUDA 有多强，以及围绕它的 developer ecosystem（开发者生态系统）有多强。

Speaker 327:16 - 27:52

And, you know, we it the ability like, to me, like, one of the most important things as an infrastructure company in this moment is how fast you can move. And you can move fastest with NVIDIA today. And I think that is the reality. Like, it just, like, given the scale that they operate at given the scale that they operate at, it's it's hard to it's hard to see it's hard to see the the and I'm not saying it won't happen, like, the short term like, in the next couple years, how anyone's gonna be able to compete with that. Especially with, you know, so much of the other the other players.

Speaker 327:16 - 27:52

而且，你知道，在我看来，作为一家基础设施公司，在这个时刻最重要的事情之一，就是你推进的速度能有多快。而今天你用 NVIDIA 能移动得最快。我觉得这就是现实。就像，考虑到他们运营的规模，考虑到他们运营的规模，很难——很难看出，虽然我不是说这不会发生——在短期内，比如未来几年里，会有谁能够与之竞争。尤其是在其他那些参与者也都存在的情况下。

Speaker 327:52 - 28:19

Like, what you need to be able to compete here is the ecosystem to form around you. And if you tie up all your supply with one buyer, which, you know, a bunch of the other chip providers have done, it's actually hard for that ecosystem to form. You know? Like, if you think about it, if you're a big lab and you have a proprietary deal with one chip type where you get 90% of the supply, it's actually in your best interest to make sure you get 95% of the supply, so that everything that's built for you, no one else could ever use.

Speaker 327:52 - 28:19

比如说，你要想在这里具备竞争力，真正需要的是一个围绕你形成的 ecosystem（生态系统）。如果你把所有 supply（供应）都绑定给一个 buyer（买方），而你也知道，其他很多 chip（芯片）供应商就是这么干的，那这个 ecosystem 其实就很难形成。你明白吗？比如，如果你想想看，假设你是一个大型 lab（实验室），并且你和某一种 chip 签了专有协议，拿到了 90% 的 supply，那实际上你最符合自身利益的做法，就是确保你拿到 95% 的 supply，这样一来，所有为你构建的东西，别人就根本用不了。

Speaker 128:19 - 28:37

When you think about reacting to the market, what do you think is like happening with the actual workloads that you have to go invest in? Right? Like obviously code agents and long horizon agents over time have become a big deal. People talk a lot more about CPU compute, video inference is different.

Speaker 128:19 - 28:37

当你考虑如何对市场作出反应时，你会怎么看那些你实际必须去投资的 workloads（工作负载）正在发生什么变化？对吧？显然，code agents（代码 agent）和 long horizon agents（长周期 agent）随着时间推移已经变得非常重要。现在人们也更多在谈 CPU compute（CPU 计算），video inference（视频推理）也不一样。

Speaker 328:37 - 28:38

Yep.

Speaker 328:37 - 28:38

对。

Speaker 128:38 - 28:42

I don't know if it's that sandbox. Just like what what's important for you guys to invest in now?

Speaker 128:38 - 28:42

我不知道是不是就那个 sandbox（沙箱）的问题。更直接地说，你们现在需要重点投资的到底是什么？

Speaker 328:42 - 28:54

Yeah. Look, I I I think there's for for us, all the runtime stuff is obviously very important. And what that means is like what chips we run on, how we run, what kind of workloads we support. Like, do we get very good at diffusion transformers? Yes.

Speaker 328:42 - 28:54

对。你看，我我我觉得，对我们来说，所有 runtime（运行时）相关的东西显然都非常重要。这意味着，比如我们跑在什么 chips（芯片）上、我们怎么运行、我们支持什么 kinds of workloads（类型的工作负载）。比如，我们要不要把 diffusion transformers 做得非常强？要。

Speaker 328:55 - 29:17

Coding agents need sandboxes. We should go build sandboxes. There's all sorts of new speculation techniques to get faster inference. We need to do that. Even stuff like KV cache-aware routing, and, you know, that stuff's a bit old now, but, like, continuing to be very good at that, and somewhat disentangling prefill and decode and starting to treat them as separate problems.

Speaker 328:55 - 29:17

Coding agents（编码 agent）需要 sandboxes（沙箱）。那我们就应该去做 sandboxes。还有各种新的 speculation techniques（推测技术）可以让 inference（推理）更快。我们需要去做。甚至像 KV cache-aware routing（感知 KV 缓存的路由）这样的东西，你知道，这类东西现在算是有点老了，但我们还是要继续把这件事做得非常好，并且在一定程度上把 prefill 和 decode 解耦，开始把它们当作两个独立问题来处理。
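
As an illustration of what KV cache-aware routing can mean, here is a minimal sketch: send a request to the replica that already holds the longest cached prefix of its tokens, and break ties by load. This is not Baseten's implementation; the class name `PrefixAwareRouter`, the block size, and the hashing scheme are all assumptions made for the example, and eviction is left out entirely.

```python
import hashlib

BLOCK = 4  # tokens per cache block (hypothetical; real systems use larger blocks)


def block_hashes(tokens: list[int]) -> list[str]:
    """Chain-hash each full block so identical prefixes give identical hashes."""
    hashes, running = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        running.update(str(tokens[i:i + BLOCK]).encode())
        hashes.append(running.copy().hexdigest())
    return hashes


class PrefixAwareRouter:
    """Toy router: tracks which prefix blocks each replica is assumed to cache."""

    def __init__(self, replicas: list[str]):
        self.cached: dict[str, set[str]] = {r: set() for r in replicas}
        self.load: dict[str, int] = {r: 0 for r in replicas}

    def _hit_blocks(self, replica: str, hashes: list[str]) -> int:
        """Count leading blocks already cached on this replica."""
        hits = 0
        for h in hashes:
            if h not in self.cached[replica]:
                break
            hits += 1
        return hits

    def route(self, tokens: list[int]) -> str:
        hashes = block_hashes(tokens)
        # Prefer the most reusable prefix blocks, then the least-loaded replica.
        best = max(self.cached, key=lambda r: (self._hit_blocks(r, hashes), -self.load[r]))
        self.cached[best].update(hashes)
        self.load[best] += 1
        return best


router = PrefixAwareRouter(["replica-a", "replica-b"])
shared_prompt = list(range(1, 9))
print(router.route(shared_prompt))                # cold cache, first replica wins the tie
print(router.route(list(range(9, 17))))           # unrelated prompt goes to the idle replica
print(router.route(shared_prompt + [99, 100, 101, 102]))  # reuses the cached prefix
```

The point of the design is that prefill work for a shared prefix (a system prompt, a long agent history) is skipped when the request lands where that prefix is already cached.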

Speaker 329:17 - 29:58

I think that's, you know, something we are very focused on, and we're seeing massive gains there. That's at the runtime level. I'd say, you know, beyond that, you know, everything we think about is how to create more of that loop between inference and post-training, because we think that just begets more inference. And so, like, we will build or partner in almost everything there. So, you know, we're gonna work with, you know, the best evals company in the world, like Braintrust, to make sure that's very well integrated into and around Baseten. You know, on the sandbox side, we will partner and build the best sandbox experience that will exist.

Speaker 329:17 - 29:58

我觉得这是我们非常关注的事情，而且我们在这方面看到了巨大的提升。这是 runtime 层面的事。我会说，再往外看，我们思考的一切基本都围绕着如何在 inference 和 post-training（后训练）之间建立更多那种 loop（闭环），因为我们认为那会自然带来更多 inference。所以，在这方面几乎所有事情上，我们要么自建，要么合作。所以，我们会和世界上最好的 evals（评测）公司合作，比如 Braintrust，确保它与 Baseten 里里外外都实现非常好的集成；在 sandbox 这一侧，我们也会通过合作加自建，打造出最好的 sandbox 体验。

Speaker 329:58 - 30:21

And then we'll create the best training APIs to make it so continual learning becomes somewhat of a solved problem. It's not just like a discrete thing. That's, I think, the core Baseten product thesis. It's like, how do we build that loop? Then everything else around that becomes, how do we make sure that we can do everything we can to ensure that gets as big as possible.

Speaker 329:58 - 30:21

然后我们会打造最好的 training APIs（训练 API），让 continual learning（持续学习）在某种程度上变成一个基本被解决的问题，而不只是一次性的、离散的事情。我认为，这就是 Baseten 产品的核心 thesis（产品论点）。也就是：我们如何把那个 loop 建起来？而围绕它展开的其他一切，都是在回答：我们怎样确保自己能做所有能做的事，让这个东西尽可能做大。

Speaker 330:22 - 30:47

That's access to compute. That's on infrastructure. Make sure we can get compute anywhere. Make sure we have access to our own compute. And then I think it's all the primitives that come off of that just that just become incredibly, like, margin accretive both for us and our customers, which is, you know, stuff like, you know, sandboxes and, like, the async batch inference, like, how do we drive utilization by having a first class batch inference experience.

Speaker 330:22 - 30:47

这包括获取 compute（算力）的能力，包括 infrastructure（基础设施）层面的建设。要确保我们在任何地方都能拿到 compute，也要确保我们能获得自己的 compute。然后我觉得，接下来就是从这些基础之上衍生出来的各种 primitives（基础能力），它们会同时为我们和客户显著提升 margin（利润率），比如 sandboxes，还有 async batch inference（异步批量推理）——比如我们要如何通过提供 first class（一级、原生级）的 batch inference 体验来提升 utilization（利用率）。
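
One way to picture how batch inference drives utilization is a scheduler that always serves latency-sensitive requests first and then fills whatever slots are left with queued batch work. The sketch below is only a toy model of that idea; the function names, the slot-based capacity model, and the request labels are assumptions for illustration, not a description of any real scheduler.

```python
from collections import deque


def schedule_tick(capacity: int, realtime: deque, batch: deque) -> tuple[list, list]:
    """Serve realtime requests first, then fill leftover slots with batch jobs."""
    served_realtime = [realtime.popleft() for _ in range(min(capacity, len(realtime)))]
    slack = capacity - len(served_realtime)
    served_batch = [batch.popleft() for _ in range(min(slack, len(batch)))]
    return served_realtime, served_batch


def utilization(capacity: int, served_realtime: list, served_batch: list) -> float:
    """Fraction of slots doing useful work in this tick."""
    return (len(served_realtime) + len(served_batch)) / capacity


realtime_queue = deque(["chat-1", "chat-2", "chat-3"])
batch_queue = deque(f"batch-{i}" for i in range(5))

rt, bt = schedule_tick(4, realtime_queue, batch_queue)
print("realtime only:", utilization(4, rt, []))
print("with batch backfill:", utilization(4, rt, bt))
```

With four slots and three realtime requests, backfilling one batch job takes the tick from 75% to 100% utilization without delaying any realtime request.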

Speaker 330:47 - 31:07

To me, this is what an inference cloud looks like. It's that you are very good at inference, and then you start to do all the things tangential to, or that loop into, inference, and partner where necessary and build where necessary. But we really do wanna, like, start with that core inference story and then go down to unblock supply, accrete margin, and go up the stack to unlock value.

Speaker 330:47 - 31:07

对我来说，这就是 inference cloud（推理云）应有的样子。也就是说，你首先非常擅长 inference（推理），然后再去做所有那些与推理相邻、或会回流到推理的事情；该合作的地方就合作，该自建的地方就自建。但我们确实想要掌握的，是先从那个核心的 inference 叙事出发，再向下打通 supply（供给）、积累 margin（利润率），并向上走到 stack（技术栈）更高层去释放价值。

Speaker 131:08 - 31:31

What would surprise people about some of the issues you discover only at scale? I'll give you an example. I was surprised when you guys ran into scale limitations, like fundamental limitations with some of the hyperscaler products that you were consuming. Yeah. And I'm because I kind of think of, you know, the AWS GCPs of the world as supporting infinite scale.

Speaker 131:08 - 31:31

在你们只到了 scale（规模化）之后才发现的一些问题里，有什么是会让大家感到意外的吗？我举个例子。让我意外的是，你们居然会碰到 scale 的限制，比如你们正在使用的一些 hyperscaler（超大规模云服务商）产品本身的基础性限制。对。因为在我看来，像 AWS、GCP 这样的公司，应该是可以支持近乎无限规模的。

Speaker 331:31 - 31:39

Yeah. I mean, I I think you just and, like, again, like, I think very, very large companies that run services of big scale is probably the same stuff.

Speaker 331:31 - 31:39

对。我的意思是，我觉得你只是——而且再次强调一下，我觉得那些运行超大规模服务的非常非常大的公司，大概也都会遇到同样的事情。

Speaker 231:39 - 31:39

Mhmm.

Speaker 231:39 - 31:39

嗯。

Speaker 331:39 - 31:42

Is that all the edge cases just become

Speaker 331:39 - 31:42

就是所有那些 edge case（边缘情况）都会开始变成

Speaker 131:43 - 31:44

actually experience them.

Speaker 131:43 - 31:44

你会真的遇到它们。

Speaker 331:44 - 32:12

You experience them. And, like, you know, I'll give you a few examples here. Like, you start seeing, you know, yesterday, for the first time ever, we saw a kernel panic. And that only happened because some Fluent Bit worker was creating too many logs, and the scale was too big, and it was all going into one node. And it was happening twice at the same time, from two different workers.

Speaker 331:44 - 32:12

你会亲身遇到它们。而且，你知道——我可以举几个例子。比如说，你会开始看到——就拿昨天来说，我们第一次遇到了 kernel panic（内核恐慌）。而那之所以发生，只是因为某个 fluent bit worker 产出了太多日志，规模又太大，结果全都打到同一个 node（节点）上。而且当时是两个不同的 worker 同时在发生这件事。

Speaker 332:12 - 32:54

So you see all, like, the systems-level and kernel-level problems. But then you start to see, I think the craziest stuff is that you start to see with LLMs that these runtimes are pretty immature. Even how we use KV cache is, you know, probably a little less sophisticated than most people think. We are starting to see the limitations of the current set of primitives, and the next set of primitives that need to be built from a scale, security, and performance perspective. But I think it's really at the runtime level and the systems level, and the edge cases are, I'd say, a lot more systems level than they are LLM specific.

Speaker 332:12 - 32:54

所以你会看到各种 system level（系统层）和 kernel level（内核层）的问题。但接着你会开始看到，我觉得最疯狂的是，在 LLM（大语言模型）这边，这些 runtime（运行时）其实非常不成熟。甚至连我们使用 KV cache（键值缓存）的方式，你知道，可能都还没有大多数人想象得那么复杂、那么成熟。我们正在开始看到当前这套 primitive（基础原语）的局限，也看到从规模、安全和性能角度出发，下一批必须被构建出来的 primitive 会是什么。但我觉得问题确实主要还是出在 runtime 层和 system 层；而那些 edge case，我会说更多是系统层面的，而不是 LLM 特有的。

Speaker 132:54 - 32:56

What are the things that keep you up at night?

Speaker 132:54 - 32:56

有哪些事情会让你夜里睡不着？

Speaker 332:56 - 32:58

Capacity. Think I I think, you know

Speaker 332:56 - 32:58

产能。我想，我是说，我觉得，你知道——

Speaker 132:58 - 32:59

Quick answer.

Speaker 132:58 - 32:59

简短回答。

Speaker 332:59 - 33:22

Yeah. Yeah. I think capacity. I I think the other one is probably just this market's so big, and it's so like, it it represents a moment when you should be as aggressive as possible. And, you know, really, you know, we've we've grown a ton obviously over the last twelve months, the last few months, but the answer is always just go, you know, go bigger, go faster.

Speaker 332:59 - 33:22

对，对。我觉得是产能。我觉得另一个可能就是，这个市场实在太大了，而且它非常像是一个你应该尽可能激进推进的时刻。而且，你知道，过去十二个月、过去几个月里，我们显然已经增长了很多很多，但答案永远都是继续推进，你知道，做得更大，做得更快。

Speaker 333:22 - 33:45

And I think that's really, really fun. It's also a little exhausting, and it's also like we are we are all in somewhat uncharted territory in terms of how fast and how big you can go and how things can get. But I but I think the big one is compute. I think, like, there's no world in which there's enough compute to, you know, get the amount of the amount of value that we wanna get out of LLMs in the next five to ten years.

Speaker 333:22 - 33:45

我觉得这真的非常非常有趣。它也有点让人疲惫，而且也意味着，我们在“到底能多快、多大规模地推进，以及事情会发展到什么程度”这件事上，多少都处在某种未知领域里。但我觉得最大的问题还是 compute（算力）。我觉得，在未来五到十年里，不存在一个世界能有足够的 compute，让我们从 LLMs（大语言模型）中获得我们想要获得的那种规模的价值。

Speaker 133:45 - 33:47

We have to invent a lot of new stuff.

Speaker 133:45 - 33:47

我们必须发明很多新东西。

Speaker 333:47 - 33:47

Yeah.

Speaker 333:47 - 33:47

对。

Speaker 133:48 - 34:19

Maybe if we just talk a little bit about what you're learning scaling. You know, 30x is like an aggressive thing to go through as a company. You've brought in a lot of really amazing talent like Danny and Samir and Stephen Day, folks on both the technical and the go to market side. Like, what do you what do you think is working about how you are recruiting and scaling, or or what's your philosophy on that?

Speaker 133:48 - 34:19

也许我们可以稍微谈谈，你在 scaling（规模化扩张）这件事上学到了什么。30 倍增长，对一家公司来说是一个相当激进的过程。你们招来了很多非常厉害的人才，比如 Danny、Samir 和 Stephen Day，既有技术侧的，也有 go to market（市场推进）侧的。所以，你觉得你们在 recruiting（招聘）和 scaling（扩张）方面，哪些做法是有效的？或者说，你在这方面的理念是什么？

Speaker 334:19 - 34:36

We were very, very flat, like, until, I don't know, twelve to eighteen months ago. I remember I went on a walk with Elad actually, and Elad was like, you just need leaders. And, like, it's actually, like, so contrary to everything. You know? As engineers, you're like, oh.

Speaker 334:19 - 34:36

一直到大概十二到十八个月前，我们的组织都非常非常扁平。我记得我其实是和 Elad 一起去散步时聊到的，Elad 当时的意思就是：你就是需要 leaders（管理者/领导者）。而且，这其实和一切原本的直觉都很相反。你知道吧？作为工程师，你会觉得，哦。

Speaker 334:36 - 34:41

It's all overhead. Everything is overhead. And I... You

Speaker 334:36 - 34:41

全都是 overhead（管理／协调等间接成本）。一切都是 overhead。而且我……你

Speaker 134:41 - 34:48

once told me, I think, that you you didn't you're like, hey, Sarah. Sarah, what about we just have engineers instead of salespeople? Yeah.

Speaker 134:41 - 34:48

你以前好像跟我说过，我记得，大意是你当时会说，嘿，Sarah。Sarah，要不我们干脆只要 engineers（工程师），不要 salespeople（销售）？对。

Speaker 334:48 - 34:51

Yeah. Yeah. That. It'd a

Speaker 334:48 - 34:51

对，对，就是那个意思。那会是个——

Speaker 234:51 - 34:52

Everybody learns it. Everyone learns

Speaker 234:51 - 34:52

每个人都会学到这一点。所有人都会学到

Speaker 334:52 - 35:13

the same. And we've all done that. But I remember, like, you know, you said it so clearly at the time, Elad, and I think that's what we've noticed. Just, like, actually having a leadership team that you can trust is so important. I think the two or three things that I'll say is, like, you want people where you can give them whole problems.

Speaker 334:52 - 35:13

都是一样的。我们也都经历过。但我记得，当时你真的把这件事讲得特别清楚，我觉得这也是我们观察到的：真正拥有一个你能信任、而且确实值得你信任的 leadership team（领导团队）是非常重要的。我想我会说两三点：你需要的是那种你可以把完整问题直接交给他们的人。

Speaker 335:14 - 35:45

And to like, you know, if you feel like you are micromanaging, if you feel like you need if you feel like, you know, you have to be involved in everything, I think that's a bit of a cop out as a founder because you're just like, I just need to be involved in everything. It's like, no. You you probably don't have the right people. I think the second thing is be very, very clear what you're optimizing for. Because I think when you're very, very clear what you're optimizing for, the people on and, like, if it's something generic, like, we want the smartest, hardworking people, like, you can't do much with that.

Speaker 335:14 - 35:45

而且，你知道，如果你觉得自己在 micromanaging（事无巨细地管理），如果你觉得自己必须——如果你觉得，凡事你都得参与，我认为作为 founder（创始人），这多少有点像是在给自己找借口，因为你其实只是在说：我就是得参与所有事情。其实不是这样。更可能是你没有找对人。我觉得第二点是，要非常、非常清楚你在优化什么。因为当你非常、非常清楚自己在优化什么时，团队里的人——如果那只是一些很泛的话，比如“我们想要最聪明、最勤奋的人”，那其实没什么可操作性。

Speaker 335:45 - 36:02

Like, with us, what we cared about was, hey. Actually, we don't care a lot about people who have done this before. We care about people who think from first principles. Work has to be a high priority, but they also have to be very kind and nice and, you know, care about the collaborative environment. We don't have a hero culture.

Speaker 335:45 - 36:02

比如对我们来说，我们在乎的是：嘿，实际上，我们并不在乎很多“以前做过这件事”的人。我们在乎的是那种从 first principles（第一性原理）出发思考的人。工作必须是他们的高优先级事项，但他们也必须非常善良、友好，并且在乎协作环境。我们没有 hero culture（英雄文化）。

Speaker 336:03 - 36:41

You know, very low ego. And, you know, if you need a manager, like, it's probably not the right place to be. But I think when you have that clear rubric, the people that will fit into it become very apparent, and the people that don't fit into it also become very apparent. And, like, we've hired amazing people like you mentioned, but I think what's a lot more interesting is, like, I think we haven't had a ton of, like, turnover there unnecessarily. Like, people tend to work out because we are very clear on what we want.

Speaker 336:03 - 36:41

你知道，就是 ego（自我）要很低。而且，如果你需要一个 manager（管理者）来一直带着你，那这里可能就不是合适的地方。但我认为，一旦你有了这样清晰的 rubric（评判标准），哪些人适合它就会变得非常明显，哪些人不适合也会变得非常明显。而且我觉得，更有意思的是，就像你提到的，我们确实招到了很棒的人，但我认为更值得一提的是，我们并没有出现很多不必要的 turnover（人员流失）。招进来的人往往都能做得下来，因为我们对自己想要什么非常清楚。

Speaker 336:41 - 36:44

It took us a while to get there though.

Speaker 336:41 - 36:44

不过，我们确实花了一段时间才走到那一步。

Speaker 136:44 - 36:55

What about the idea of, like, an operations culture? You know, we were talking to Alyssa and Henry about this, and she's like, well, the hard thing about cloud is actually just operations. I slept with a pager under my pillow for a decade.

Speaker 136:44 - 36:55

那你怎么看“运营文化（operations culture）”这个想法？你知道的，我们之前在和 Alyssa 还有 Henry 聊这个，她就说，cloud 真正难的其实就是 operations。我有整整十年都把 pager 放在枕头底下睡觉。

Speaker 336:55 - 36:56

I don't

Speaker 336:55 - 36:56

我不——

Speaker 136:56 - 36:58

think I've seen you detached from your Slack channel.

Speaker 136:56 - 36:58

觉得我见过你脱离自己的 Slack 频道的时候。

Speaker 336:58 - 37:03

Yeah. My phone is buzzing right now. Yeah. I'm kidding. That's just not a step one.

Speaker 336:58 - 37:03

对。我手机现在就在震。对，我开玩笑的。但那绝不是第一步。

Speaker 337:03 - 37:05

Yeah. I'm anxious. So

Speaker 337:03 - 37:05

对，我会焦虑。所以——

Speaker 137:06 - 37:12

And you've been concerned before. Like, do people get it? Like, know, what is distinctive about that?

Speaker 137:06 - 37:12

而且你之前也担心过。比如，人们真的理解这一点吗？比如，他们知道这里面真正与众不同的地方是什么吗？

Speaker 337:12 - 37:37

I think, like, one, I think if you've worked at an infrastructure company... like, we were once in a meeting with a bunch of AWS execs, and this was, you know, like, very senior AWS folks. All their pagers went off multiple times during our forty-five minute meeting. You know? Like, I think it's very much just a cultural thing. But, yeah, like, I don't, you know, our, like, inference can go down.

Speaker 337:12 - 37:37

我想，我想，首先，我觉得如果你在一家基础设施公司（infrastructure company）工作过的话——我们有一次和一群 AWS 高管开会，那次你知道，来的都是非常资深的 AWS 人士。在我们那场四十五分钟的会议里，他们所有人的 pager 都响了好几次。你知道吧？这——我我我觉得，这在很大程度上就是一种文化上的事情。不过，是的，我我不知道——我们的 inference（推理）确实也可能会挂掉。

Speaker 337:37 - 37:58

And, like, you know, we you know, you learn to like, you know, what's this? Like, I think, Amir, my cofounder, when his pager goes off, his seven year old said, is that a p zero? Oh, that is that is that a p zero? And so, you know, I I think that is you just have to get used to it. That's the culture you live in.

Speaker 337:37 - 37:58

而且，就像，你知道的，我们会、你知道，你会慢慢学会去适应这种事，比如，这是什么？我觉得，Amir，我的 cofounder（联合创始人），有一次他的 pager 响了，他七岁的孩子就说，这是个 p zero 吗？哦，那是、那是个 p zero 吗？所以，你知道，我觉得这就是你必须去习惯的。这就是你所处的文化。

Speaker 337:58 - 38:10

It just changes the speed, but it also it's you know, it becomes like a, you know, a cultural thing. I I think it's very, very it rejects people that don't fit into it very, very quickly.

Speaker 337:58 - 38:10

它改变的不只是速度，而且它也会、你知道，变成一种、你知道，一种文化层面的东西。我觉得它会非常、非常快地排斥掉那些不适应这种文化的人。

Speaker 138:10 - 38:13

Like engineers who avoid pager duty.

Speaker 138:10 - 38:13

比如那些回避 pager duty（值班呼叫）的 engineers（工程师）。

Speaker 338:13 - 38:20

Yeah. You know, when we have p zeroes, we're like, everyone's on call. Like, you know, there's been a joke that there may as well be a siren that goes off in the office when there's an incident. So

Speaker 338:13 - 38:20

对。你知道，当我们遇到 p zero 的时候，我们就会说，所有人都算 on call（值班）。就像，你知道，大家还开玩笑说，办公室里干脆应该在出事故的时候直接响起警报器算了。所以——

Speaker 138:21 - 38:26

So people have been talking ad nauseam in the AI community about Jevons paradox

Speaker 138:21 - 38:26

所以，AI 社区这些天一直在没完没了地谈论 Jevons paradox。

Speaker 338:26 - 38:26

Yeah.

Speaker 338:26 - 38:26

对。

Speaker 138:27 - 38:51

Where if you decrease the cost of it's really a question around price elasticity and availability. If you decrease the cost of a good, say intelligence as a good, people actually consume more of it. Yeah. Like the personal or business ROI of it, the demand for it goes up. Do you not see this? And are you are you working against yourself trying to make these models more efficient?

Speaker 138:27 - 38:51

这里其实是一个关于 price elasticity（价格弹性）和 availability（可获得性）的问题。如果你降低一种商品的成本，比如把 intelligence 当作一种商品，人们实际上会消费得更多。对吧。比如它对个人或企业的 ROI（投资回报）提高了，对它的需求就会上升。你们没有看到这种现象吗？而且你们在努力让这些 model（模型）更高效的时候，难道不是在某种程度上和自己对着干吗？

Speaker 138:51 - 38:54

Do people just use them more or less?

Speaker 138:51 - 38:54

人们到底只是会更多地用它们，还是更少地用它们？

Speaker 338:54 - 39:16

Yeah. I I think you think about this from a developer's perspective and a consumer perspective. I think, like, I think consumers just want the best answers and the and the and the best experience that's somewhat governed by, yeah, more intelligence to some extent. I think when you go to the developers from the developer's perspective, they would insert more intelligence if you make it cheaper. Like, that's yeah.

Speaker 338:54 - 39:16

对。我觉得你要从 developer（开发者）的视角和 consumer（消费者）的视角分别来看这个问题。我觉得，像 consumer 一样想的话，他们只是想要最好的答案，以及最好的体验，而这在某种程度上，没错，是由更强的 intelligence 来驱动的。我觉得如果从 developer 的视角看，如果你让它更便宜，他们就会往里面加入更多的 intelligence。对，差不多就是这样。

Speaker 339:16 - 39:30

And they will they will they will insert more intelligence anyway. Mhmm. But if you make it cheaper, they'll they'll insert a hell of a lot more intelligence. You see this with agents. Agents are just longer running now.

Speaker 339:16 - 39:30

而且无论如何，他们都会加入更多 intelligence（智能）。嗯。但如果你把它做得更便宜，他们就会加入多得多的 intelligence。你在 agent（智能体）上已经看到了这一点。现在的 agent 只是运行时间更长了。

Speaker 339:30 - 40:12

And I think that's what we have seen with the cost of inference going down, which is, you know, folks are just like, okay. We can we can run this for longer or we can make it do a bit more work, and we'll get to a larger end. I think, like, if like, compute scales from an inference perspective as well. And, you know, I think we are seeing that with almost all our customers, which is, you know, they either they either start with, like, this is the quality of answer I I need to get to, and this is the amount of inference I need to do to get there, or this is the base level model that I can start with that I can work with to get there. And I think the more we drive down the costs, what they realize is more intelligence just means better using I just want

Speaker 339:30 - 40:12

我觉得这就是我们在 inference（推理）成本下降时看到的情况。你知道，人们会觉得，好，那我们可以让它运行得更久一点，或者让它多做一点工作，这样就能达到更大的结果。我认为，compute（算力）从 inference 的角度看同样是会扩展的。而且，你知道，我觉得我们几乎在所有客户身上都看到了这一点：他们要么是先从“这是我需要达到的答案质量，这是我为达到它所需要做的 inference 量”出发；要么是从“这是我能用来起步、并最终达到目标的基础模型”出发。我觉得，随着我们不断压低成本，他们意识到的是，更多 intelligence 其实就意味着更好地使用——我只是想要

Speaker 140:12 - 40:12

a better answer.

Speaker 140:12 - 40:12

一个更好的答案。

Speaker 340:12 - 40:29

Answers, better experiences, more dollars, more dollars, more revenue. Yeah, I think inference going down just begets more. But it is truly I think we're kind of in a world that it is is the last market. Right? Even if there's AGI, all that's left is inference.

Speaker 340:12 - 40:29

更好的答案、更好的体验、更多 dollars（美元）、更多 dollars、更多 revenue（收入）。对，我觉得 inference 成本下降只会催生更多需求。但我认为这确实是——我们某种程度上处在一个“这是最后一个市场”的世界里。对吧？即使有 AGI，最后剩下的也只是 inference。

Speaker 140:29 - 40:38

Yeah. So you do not see in your customers a, this answer is enough and this action is enough dynamic?

Speaker 140:29 - 40:38

对。所以你并没有在客户身上看到一种“这个答案已经够了”“这个动作已经够了”的动态，是吗？

Speaker 240:38 - 40:40

No. Yeah. It's gonna keep going for

Speaker 240:38 - 40:40

没有。对。这种趋势还会持续

Speaker 340:40 - 40:41

a long time.

Speaker 340:40 - 40:41

很长时间。

Speaker 240:41 - 41:01

How do you view all this evolving towards the future? So basically, seems like it's gonna be one of the biggest markets of all times. We have this massive shift where we're moving from software and seats and digitization into actual intelligence, selling units of cognition, selling agentic workflows. What does this all look like in a couple years? What is your view of this future world?

Speaker 240:41 - 41:01

你怎么看这一切在未来会如何演变？基本上，这看起来会成为有史以来最大的市场之一。我们正在经历一个巨大的转变：从 software（软件）、seats（席位）和 digitization（数字化），走向真正的 intelligence，走向售卖 cognition（认知）单位、售卖 agentic workflows（智能体工作流）。再过几年，这一切会是什么样子？你对这个未来世界的看法是什么？

Speaker 341:01 - 41:20

I think for the for consumers, it's the best possible thing. Right? Like, every everything is somewhat smarter. You get better care because your doctors have access to better tools. There's all this stuff about there being less software engineers, and I think we just build more software.

Speaker 341:01 - 41:20

我觉得对消费者来说，这是尽可能最好的事情。对吧？比如说，所有东西都会变得更聪明一些。你会获得更好的照护，因为你的医生能够使用更好的工具。还有很多人在说软件工程师会变少，但我觉得我们只是会构建更多软件。

Speaker 341:21 - 41:34

I think we just build a ton more software. Like, you know, I see, you know, we're not slowing down hiring software engineers. We're just building more things. And I think that for the consumers, that just means better tools, more software, all those all those good things.

Speaker 341:21 - 41:34

我觉得我们就是会构建大量更多的软件。比如，你知道，我看到的情况是，我们并没有放慢招聘软件工程师的速度。我们只是正在构建更多东西。而我觉得，对消费者来说，这就意味着更好的工具、更多的软件，以及所有这些好处。

Speaker 241:34 - 41:41

It's almost like everybody has their own team for everything. Right? You have an agent which helps with your doctor. You have an agent that helps you learn stuff. You have an agent that helps you organize your life.

Speaker 241:34 - 41:41

这几乎就像每个人在每件事上都有属于自己的团队。对吧？你有一个 agent（智能代理）帮你处理和医生相关的事。你有一个 agent 帮你学习东西。你有一个 agent 帮你整理生活。

Speaker 341:41 - 41:42

It's concierge. It's a concierge

Speaker 341:41 - 41:42

它就像 concierge（礼宾服务）。

Speaker 241:42 - 41:44

for everything. Yeah. Concierge everything for everyone.

Speaker 241:42 - 41:44

涵盖一切的 concierge。对。为每个人的每件事都配上 concierge。

Speaker 341:44 - 41:52

Yeah. And and and I think, like, what that means well, that that's amazing. I think that's great. And I think the the the and education, same thing. You have concierge education.

Speaker 341:44 - 41:52

对。而且，而且，而且我觉得，这意味着——嗯，那真的很了不起。我觉得这很棒。我也觉得，教育也是一样。你会有 concierge education（礼宾式教育）。

Speaker 341:52 - 42:16

Like, you get personalized access to everything. I think then you go one step back and how it affects developers. I think, you know, and and companies, I think if you don't embrace this Mhmm. I think it's the extinction moment for a bunch of folks, which is like, you know, everything needs and I don't think that means that, you know, forward design needs Figma. I don't think that's a thing.

Speaker 341:52 - 42:16

比如，你会以个性化的方式获得对一切的访问。我觉得再退一步，看它会如何影响开发者。我觉得，你知道，还有公司，我认为如果你不拥抱这个趋势，嗯哼，我觉得对很多人来说这会是一个 extinction moment（灭绝时刻）。也就是说，一切都需要……而且我不认为那意味着，比如说，forward design 需要 Figma。我不觉得是那回事。

Speaker 342:16 - 42:33

I think like what's more interesting is just like, you know, all these workflow and software companies need to figure out what are the intelligent or intelligence-inserted versions that drive all that user value for those end consumers that we talked about.

Speaker 342:16 - 42:33

我觉得更有意思的是，这些 workflow（工作流）公司和软件公司都需要弄清楚，什么样的 intelligent（智能）版本，或者嵌入了 intelligent（智能能力）的版本，才能驱动我们刚才谈到的那些终端消费者用户价值。

Speaker 242:33 - 42:36

Yeah. Very exciting. Thank you so much for joining us today. Yeah.

Speaker 242:33 - 42:36

是的。非常令人兴奋。非常感谢你今天来参加我们的节目。是的。

Speaker 342:36 - 42:37

Thanks, guys.

Speaker 342:36 - 42:37

谢谢，各位。

Speaker 142:39 - 42:55

Find us on Twitter @nopriorspod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at nopriors.com.

Speaker 142:39 - 42:55

欢迎在 Twitter 上关注我们：nopriorspod。如果你想看看我们的真人出镜，也欢迎订阅我们的 YouTube 频道。也请在 Apple Podcasts、Spotify 或你平时收听的平台上关注这档节目。这样你每周都会收到新一期节目。你还可以在 nopriors.com 订阅邮件，或查看每一期节目的文字稿。

原文 ↗https://www.youtube.com/watch?v=XAbKflCncDo