🎙 Podcast: Unsupervised Learning · May 22, 2026 · 11,291 words · about 56 minutes

Ep 87: Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning

Speaker 1 · 00:00 - 00:28

Oriol Vinyals is the co-lead of Gemini alongside Noam Shazeer and Jeff Dean. He's had an incredible career in AI, pioneering many of the breakthroughs in deep learning over the last decade, and it was a ton of fun to sit down with him after Google I/O. If you've been following Google I/O, they shipped a bunch of products across a ton of interesting surface areas in AI, and Oriol and I hit all of them. We talked about what's required for further advances in multimodal models and what's going to make these world models actually usable.

Speaker 1 · 00:29 - 00:55

We talked about the increase in memory, the importance of memory, and how advances there might look like those in reasoning over the next few years, as well as what Oriol thinks the path forward is. We also hit on the state of scaffolding today, what folks are building, and what Oriol thinks will persist. It's a ton of fun to take all the top questions that founders and investors are thinking through and pose them to Oriol. So I think folks will really enjoy this conversation. Without further ado, here he is.

Speaker 1 · 00:57 - 00:59

Oriol, thanks so much for coming on the podcast.

Speaker 2 · 00:59 - 01:02

Yeah. It's great to be here. Thanks, Jacob.

Speaker 1 · 01:02 - 01:36

Yeah. Very exciting to have you a day after I/O. I know things have been busy, but I've been really excited for this, because you're one of the people most directly shaping the frontier of models today through your work at Google. And the releases that happened yesterday at I/O hit on pretty much all the themes people in the space are thinking about: where these products and models are going. So our goal today is to talk through the research behind those announcements, where this is all headed, the future path of RL and post-training, and to get your read on the space as a whole.

Speaker 1 · 01:36 - 01:57

I figured I'd start with world models, because I think that was a really impressive part of yesterday, and also an area where Google is pretty distinct from a lot of the rest of the field. You shipped this incredibly impressive world model in Omni yesterday, and Demis has talked a lot about seeing world models as a path to AGI. And it's interesting, right?

Speaker 1 · 01:57 - 02:11

Because it seems like other labs are maybe more focused on code and on getting to recursive self-improvement. So I'm wondering whether that's a fair characterization, and why you think you, the team, and Google have been somewhat uniquely focused on the world model space.

Speaker 2 · 02:11 - 02:55

First of all, I guess the coding or self-improvement angle is at a bit of a different layer, right? You can certainly bet and believe that these models can reprogram and improve themselves, and it's something I'm actually working on quite actively at the moment. But then there's the object they improve, the model: whether it's multimodal, or closer to a world model as we call it, and even how to define that is a bit abstract. Since day one, and actually well before the Gemini program started, we were working not just on language but on understanding the visual world, jointly modeling words in the context of vision, like video, etcetera.

Speaker 2 · 02:56 - 03:22

So I think that part has been at the core of Gemini, and of our research before it. Maybe one way to characterize it is this: with language, there's clearly a lot of information that we collectively wrote about the world, and that has clearly paid off big time. In a way, we've distilled all the knowledge that has been written, and that is being written at the moment, into these weights.

Speaker 1 · 03:22 - 03:24

It's definitely convenient that we put it all on the Internet, too.

Speaker 2 · 03:24 - 03:35

Yes, exactly, right? And now, with users, there's obviously also a flywheel effect. But at the same time, there is lots of knowledge in videos and images.

Speaker 2 · 03:36 - 04:21

I'd say this has kind of happened already, but softly, and I think there might still be a big moment. The question is how you would extract all the knowledge you would acquire by looking at all the videos and images, which we certainly use in our training mixtures. Could that knowledge somehow add value and efficiency to the language component? I think we've seen constructive transfer learning, let's say, from one to the other. We see that, and we see generalization. But what I would characterize as the GPT moment of video and images, I'm not sure we've quite seen that.

Speaker 1 · 04:21 - 04:30

Do you have any thoughts on what that GPT moment might be for video and images, given your intuition that it hasn't been reached yet?

Speaker 2 · 04:30 - 04:46

Yeah. At the moment, we train on all the modalities. We mix them, and we keep enhancing the recipe. Omni is a good way to see that progress. We don't only input videos and images, where we have amazing capabilities with long context, understanding, etcetera.

Speaker 2 · 04:46 - 05:27

But we're now also able to output video, and to interact with it very naturally through language: editing it, combining the modalities in a way that feels almost magical, right? So that progress is absolutely there. But one of the deep learning dreams, and it might be an original dream from well before large language models, would be: can I train on all the image data, perhaps without text as a hard challenge, and still somehow extract all the meaning and nuance from that modality, or set of modalities, and those vast amounts of data?

Speaker 2 · 05:27 - 05:57

Right? So could we train on all the videos and images ever produced and reach the same level of understanding that language models clearly reach using language? That understanding is probably slightly superficial, with some missing links around cause and effect and so on, which Demis, for instance, often talks about. So that is the moment. Have I seen it? Probably not. We most likely have the most advanced, or one of the most advanced, multimodal recipes, one that mixes everything.

Speaker 2 · 05:57 - 06:05

But that pure transfer is, I think, one of the core quests of machine learning for the last decade plus.
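
Oriol describes training on all modalities and mixing them into one recipe. One simple way to picture a training mixture is that each training example's modality is drawn in proportion to a mixture weight. This is a minimal sketch under that assumption. The modality names and weights are hypothetical placeholders, and the conversation does not describe the actual Gemini recipe.

```python
import random
from collections import Counter

def sample_modalities(weights: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw the modality of each of n training examples in proportion to mixture weights."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)

# Hypothetical mixture weights, for illustration only.
mixture = {"text": 0.6, "image": 0.25, "video": 0.15}
draws = sample_modalities(mixture, n=10_000)
counts = Counter(draws)
print({name: round(counts[name] / len(draws), 2) for name in mixture})
```

Fixed weights keep the sketch simple. A real recipe could vary them over the course of training.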

Speaker 1 · 06:05 - 06:18

To the extent you can talk about it, could you give our listeners some context on the key problems that still need to be solved here? Or, as you think about it, what types of problems are you working on to advance this further?

Speaker 2 · 06:18 - 06:54

It's hard to describe the solution space, but one idea that's used often is this: you could imagine observing or learning from all the video data and then somehow deriving the rules of gravity, right? How could you precisely describe how the world works based only on images? The issue is that linking language, or these concepts, as we sometimes call them, to what you see in the image without an explicit language linkage is fairly tricky.

Speaker 2 · 06:54 - 07:39

Right? So what you end up doing is explicitly creating datasets where there's some correlation or connection between the images and video and some language, like labels or descriptions. But, of course, the amount of data at your disposal is then much smaller, because we haven't described and transcribed every single piece of media out there. So extracting those concepts in their purest form, and not just through some language we associate with what we see, would be very, very powerful. There's lots of early research on discrete representations and representation learning, and that's one of the things I would say is still fairly research-stage.

Speaker 2 · 07:39 - 07:51

So it's not something we can scale up at this point, and I'm not even sure it's needed. Whether we agree on that or not is another question, but if it were unlocked, it would be massive.
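
Vector quantization, as in VQ-VAE, is one common family of discrete representations. A continuous feature vector is replaced by the index of its nearest entry in a codebook, which turns images or frames into sequences of discrete tokens. The sketch shows only that lookup step and assumes a tiny hand-picked codebook. In real systems the codebook is learned jointly with an encoder, and nothing here describes how Gemini represents video.

```python
import math

def quantize(vector: list[float], codebook: list[list[float]]) -> int:
    """Return the index of the codebook entry nearest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

# Hypothetical 2-D codebook: each entry stands for one discrete "concept" token.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.1, 0.8], [-0.2, 0.1]]  # made-up encoder outputs
tokens = [quantize(f, codebook) for f in features]
print(tokens)  # → [1, 2, 0]
```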

Speaker 1 · 07:51 - 08:10

You mentioned the term world model and how it gets thrown around a lot. Omni was positioned as a world model, and I'm curious how you thought about that categorization, given that you've had really good video models for a while, right? What makes Omni a world model, and how is it different from the generation of video models you've been working on?

Speaker 2 · 08:10 - 08:44

I guess a pure aspect of a world model would be representation learning, right? You could imagine taking these modalities, like videos, which are sequences of images, or even just images, and compressing them into a set of concepts: the movements, the objects, etcetera, within them. That's called representation learning, and it models the world in a very compact way that compresses away what's probably not relevant, right?

Speaker 2 · 08:44 - 09:28

That one is probably the more classical view, but also probably not exactly what we mean, see, or feel when we interact with Omni, right? What you see there is more about being able to really change how the video behaves, or the kinds of videos you get out of an initial image that you ask it to animate. You explicitly ask for the movements, or even for actions like "move forward," and you can see them being simulated fairly precisely. So it's more that the world model itself is acting as a renderer of the world, one you can change just through language.

Speaker 2 · 09:29 - 10:04

And having that object is, of course, a cool product to play with; we love generating all sorts of movements and situations very richly. But it could also meaningfully add a dimension of simulation, letting us use, for example, prediction before acting in the world. And, of course, the obvious applications for these kinds of 3D or video world models would be self-driving cars or robotics.
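
One standard technique for "prediction before acting" is random-shooting model-predictive planning. You sample candidate action sequences, roll each one out inside the world model, and keep the sequence whose imagined outcome lands closest to the goal. In this sketch, `world_model` is a hypothetical one-dimensional toy (position plus action) that stands in for a learned video or 3D model. It illustrates the general technique, not how Omni is used.

```python
import random

def world_model(state: float, action: float) -> float:
    """Hypothetical toy world model: predicts the next 1-D position from state and action."""
    return state + action

def plan(state: float, goal: float, horizon: int = 5,
         n_candidates: int = 500, seed: int = 0) -> list[float]:
    """Random-shooting planner: imagine many action sequences, return the best one."""
    rng = random.Random(seed)
    best_seq: list[float] = []
    best_cost = float("inf")
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        predicted = state
        for action in seq:  # the rollout happens only inside the model; nothing is executed
            predicted = world_model(predicted, action)
        cost = abs(goal - predicted)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

actions = plan(state=0.0, goal=2.0)
# In a real control loop, only actions[0] would be executed before re-planning.
print(len(actions), round(sum(actions), 2))
```

Re-planning after every executed step (a receding horizon) is the usual way to absorb errors in the model's predictions.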

Speaker 1 · 10:04 - 10:33

It seems so relevant to robotics, and it feels like everyone is still trying to figure out the right data mix of simulation data versus teleop data and egocentric video data. But as these simulations keep getting better, they become a more and more compelling thing to put into the data mix. So I'm curious: does this work directly intersect with the broader robotics work you're all doing, and what do you think is actually required to append robotic actions onto these types of models?

Speaker 2 · 10:33 - 11:23

There's also a beautiful connection here. If we get more data captured from robots, which we're certainly investing in, even though it's obviously more expensive and time-consuming, that data could make it into the model and enhance the world model's capabilities themselves. The other direction, which is perhaps what you're asking about, is: okay, now we can simulate, and we could create lots of different scenarios in which these robots, or whatever other agents, could train without the cost and latency of the physical world, right? For the latter to work better, though, it's still a very open problem.

Speaker 2 · 11:23 - 12:23

There are also all sorts of issues with transfer. But the more powerful these models get, the clearer it is that there's an inflection point where things start to be worth doing, and we might indeed see an acceleration in robotics. We're definitely seeing lots of investment in the hardware space, so things are accelerating and picking up there. But for world models to be useful, the precision required is very high, at least from my limited knowledge, though of course I've been able to interact with these systems and see them. Even grasping an object is demanding, something we take for granted as humans. The visuals, exactly how the object would feel in your hand (a modality we obviously don't even have data for right now), the exact forces, and how things would move all need to be very, very accurate, right? So that's where there's a gap, and where some creativity and research, along with lots of investment in robotics over the years, are perhaps still required.

Speaker 2 · 12:23 - 12:37

But it's promising. At some level, maybe not precise motor control but planning and the gross level, we're going to start seeing these models accelerate our progress in the quest for robotics.

Speaker 1 · 12:37 - 12:53

A huge part of these models is implicitly learning physics by consuming lots of video data, and you mentioned gravity, which is the canonical example of what people look for. Being so close to these models, do you have any gut sense of when that will just be a solved problem within world models?

Speaker 2 · 12:53 - 13:01

Yeah, it's a good question. Actually, you made me think about evaluation, right? Like, how would you evaluate it if you trained a very good video... Yeah.

Speaker 1 · 13:01 - 13:03

How do you evaluate physics in a model?

Speaker 2 · 13:03 - 13:22

Yeah, it is a good question, right? The problem is that as soon as you add language, that knowledge is suddenly there, in the way. If you ask basic questions about gravity, of course the model would answer them just from having read explanations online and so on.

Speaker 2 · 13:22 - 14:09

So you would need to somehow connect the concept of gravity, which may or may not be present in a world model, and then decode that into an explanation that would satisfy you. Initially it might be some basic explanation; later on, it could even derive the equations and so on. That's how you could build an eval. To my knowledge, we haven't been thinking about this from this point of view. There's definitely lots of early work on unsupervised machine translation, where you try to translate into a language you never see during training, and you can align the representations. So there are probably some ideas there: you get a language model that can speak, or that you can decode from.

Speaker 2 · 14:09 - 14:30

You get these world models that create this kind of conceptual-level understanding, and you align both. There are some papers; these are old papers. The one I remember, I think, was Stephan Gouws et al., from 2014. Then you could start decoding from that, and converting it into an eval seems like a trivial step.
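
Oriol points to aligning one representation space with another so that one can be decoded through the other. The cross-lingual embedding literature commonly does this with an orthogonal map fitted by orthogonal Procrustes. This two-dimensional sketch has a closed-form solution and assumes that a few paired anchor points are already known. Fully unsupervised methods first have to bootstrap those pairs. The embeddings are made-up numbers that stand in for the "language" and "world model" spaces.

```python
import math

Point = tuple[float, float]

def rotate(p: Point, theta: float) -> Point:
    """Rotate a 2-D point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def fit_rotation(src: list[Point], dst: list[Point]) -> float:
    """2-D orthogonal Procrustes: the angle minimising sum ||R(theta) x - y||^2.

    Expanding the loss leaves cos(theta) * dot + sin(theta) * cross to maximise,
    and that is maximised at atan2(cross, dot).
    """
    dot = sum(x0 * y0 + x1 * y1 for (x0, x1), (y0, y1) in zip(src, dst))
    cross = sum(x0 * y1 - x1 * y0 for (x0, x1), (y0, y1) in zip(src, dst))
    return math.atan2(cross, dot)

# Made-up "concept" embeddings in space A; space B is space A rotated by 40 degrees.
space_a = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -1.0)]
space_b = [rotate(p, math.radians(40)) for p in space_a]

theta = fit_rotation(space_a, space_b)
print(round(math.degrees(theta), 6))  # → 40.0
```

Once the angle is fitted, any new point from space A can be mapped into space B with `rotate(point, theta)`.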

Speaker 2 · 14:30 - 14:47

But, again, these evals then need to be meaningful from an application point of view. So ultimately, you could also say: look, we have a world model. Can we decode, or, I don't know, induce movement in a complex system from its representation, for example, right?

Speaker 2 · 14:47 - 14:51

That would be another indirect eval. So, many ideas, but, yeah, evals are always so important.
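
Here is one concrete version of such an indirect physics eval. You track an object falling in a generated video and check whether its trajectory implies a plausible gravitational acceleration. The sketch assumes a tracker (not shown) has already extracted per-frame heights, and it uses synthetic heights in place of real model output. The function names, frame rate, and 5% tolerance are illustrative choices, not an established benchmark.

```python
def estimate_gravity(heights: list[float], dt: float) -> float:
    """Estimate downward acceleration from per-frame heights via second differences.

    For free fall y(t) = y0 + v0*t - 0.5*g*t**2, every second difference
    y[i+1] - 2*y[i] + y[i-1] equals -g * dt**2.
    """
    if len(heights) < 3:
        raise ValueError("need at least three frames")
    second_diffs = [heights[i + 1] - 2 * heights[i] + heights[i - 1]
                    for i in range(1, len(heights) - 1)]
    return -(sum(second_diffs) / len(second_diffs)) / dt**2

def obeys_gravity(heights: list[float], dt: float, g: float = 9.81, rel_tol: float = 0.05) -> bool:
    """Pass/fail check: is the implied acceleration within rel_tol of g?"""
    return abs(estimate_gravity(heights, dt) - g) <= rel_tol * g

# Synthetic stand-in for tracked heights (metres) of a dropped ball at 30 fps.
dt = 1 / 30
heights = [10.0 - 0.5 * 9.81 * (k * dt) ** 2 for k in range(30)]
print(round(estimate_gravity(heights, dt), 2), obeys_gravity(heights, dt))  # → 9.81 True

# A "floaty" clip that falls at half the expected rate fails the check.
floaty = [10.0 - 0.5 * 4.9 * (k * dt) ** 2 for k in range(30)]
print(obeys_gravity(floaty, dt))  # → False
```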

Speaker 1 · 14:51 - 15:19

Shifting gears to some of the other things you shipped yesterday: I definitely want to talk about agents. You shipped some really interesting consumer agents in Spark as part of I/O. I think that's so interesting because, at least from the outside, it seems like a really improved version of some of what you had explored in Project Mariner in 2024 and in some of the other computer-use work. So it does feel like there's been a real step change in capabilities.

Speaker 1 · 15:19 - 15:26

So I'd love to hear you riff on the research breakthroughs that enabled that, and on how people should think about what these agents can and can't do today.

Speaker 2 · 15:26 - 16:15

We knew that actions were going to be a very important modality, right: acting and changing the state of, let's say, a digital computer. And as you evolve the model and make it better, you start to see the path. You get the model really good, then you focus on building a system around the model, then you optimize the system and the model jointly as much as you can, and so on. So in terms of what creates the delta, the increase in capability, it's mostly about sequencing releases. Also, in some sense, the model's capability needs to reach a certain level before you can dream about the next stage of capability, about what the model might do next.

Speaker 1 · 16:15 - 16:39

Yeah. One thing that's so interesting about the consumer footprint is that there's such a broad array of things people want to do with it. So I wonder, both to date and as you see this evolving, how bespoke that model-plus-system work is to subcategories of the problems people want to solve. Or is it incredibly general, where you're just optimizing a system and model combination that works across pretty much everything you might want to do in Spark?

Speaker 2 · 16:39 - 17:39

There's always a sequencing: first you specialize to something that feels controllable and is already very useful. If you look at Spark, it has access to the information it needs to help you schedule and organize your day, and even to think about how you should tackle different problems, because it has this very rich context. So it's useful to build the system slightly more narrowly around something you care deeply about. But if you look at the history of machine learning and deep learning, we always move toward the components we build being general. And there's a big hypothesis, which actually goes back a bit to the world model point, that training on everything jointly must be better than focusing narrowly on one domain. So from the modeling perspective, at least, that is very clear.

Speaker 2 · 17:39 - 18:19

But the same holds from the systems perspective. You build a system that is fairly generic, and then, based on how you instruct it or interact with it, it can reason: hey, this user wants to do this, and I have all these capabilities, so let me figure out which ones to use, kind of at train time. You're not necessarily building it for that task; you're building something generic. The specialization then happens through a layer of intelligence, right: the model's intelligence plus the generality of the system. I think that's fairly clearly already here, and sometimes in practice it still makes sense to specialize, to limit the scope or to make it more efficient.

Speaker 2 · 18:19 - 18:36

But the move from specialist to general keeps happening, even on the architecture side. I mean, the transformer was a machine translation neural net, right? And now it does everything from Omni to controlling your computer.

Speaker 2 · 18:36 - 18:39

So, yeah, I think that's a step I expect.
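
This is a minimal sketch of the "generic system plus a layer of intelligence" pattern. The system exposes a registry of capabilities, and the model decides which one a request needs. Everything in it is hypothetical. `choose_capability` is a keyword rule that stands in for a model call so the example runs, and the capability names are invented, not Spark's.

```python
from typing import Callable

# Hypothetical capabilities a general-purpose assistant system might expose.
CAPABILITIES: dict[str, Callable[[str], str]] = {
    "calendar": lambda req: f"scheduled: {req}",
    "search": lambda req: f"searched: {req}",
    "summarize": lambda req: f"summary of: {req}",
}

def choose_capability(request: str) -> str:
    """Placeholder for the model's routing decision.

    A real system would ask the model; this keyword rule exists only so the sketch runs.
    """
    text = request.lower()
    if "meeting" in text or "schedule" in text:
        return "calendar"
    if "summarize" in text:
        return "summarize"
    return "search"

def handle(request: str) -> str:
    """Generic entry point: route the request, then invoke the chosen capability."""
    return CAPABILITIES[choose_capability(request)](request)

print(handle("Schedule a meeting with Sam on Friday"))
# → scheduled: Schedule a meeting with Sam on Friday
```

To add a capability, you add a registry entry. The system stays generic, and the routing decision stays with the model.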

Speaker 1 · 18:39 - 18:53

You've been vocal about the Bitter Lesson over the years. As you look out at the field, are there places where you think it's not currently being followed? Basically, places where you see structure or clever scaffolding that you think scale is eventually just going to wash out?

Speaker 2 · 18:54 - 19:37

Yeah, I think so. One area I find exciting, and there's already some published research on it, is this. In the limit, the system we now build by coding, sometimes a complex scaffold around the model (multi-agents, sub-agents, delegation, very long-running work), is itself a piece of code that the model could eventually write on the fly, right? So you could imagine not just having a very general system, but maybe having no system at all: just a model able to write those pieces depending on what it's being asked to do.
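
This sketch makes "the model writes the scaffold on the fly" concrete. The model returns its scaffold as data, a JSON plan of sub-agent steps, and a tiny generic interpreter executes it. The JSON schema, `run_sub_agent`, and the example plan are all hypothetical. The research Oriol alludes to is not specified, and a real system might emit executable code rather than a plan.

```python
import json

# Pretend this JSON came back from the model: a scaffold it wrote for this task.
# The schema ("steps", "id", "agent", "input", "uses") is hypothetical.
model_output = """
{"steps": [
  {"id": "a", "agent": "search", "input": "flight prices to Lisbon"},
  {"id": "b", "agent": "search", "input": "hotel prices in Lisbon"},
  {"id": "c", "agent": "summarize", "uses": ["a", "b"]}
]}
"""

def run_sub_agent(agent: str, text: str) -> str:
    """Stand-in for invoking a sub-agent (in practice, another model call)."""
    return f"{agent}({text})"

def execute_scaffold(spec: str) -> dict[str, str]:
    """Run each step in order, feeding earlier results into steps that 'use' them."""
    results: dict[str, str] = {}
    for step in json.loads(spec)["steps"]:  # steps are assumed to be in dependency order
        text = step.get("input") or " + ".join(results[d] for d in step.get("uses", []))
        results[step["id"]] = run_sub_agent(step["agent"], text)
    return results

print(execute_scaffold(model_output)["c"])
# → summarize(search(flight prices to Lisbon) + search(hotel prices in Lisbon))
```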

Speaker 2 · 19:37 - 19:38

Like, the...

Speaker 1 · 19:38 - 19:45

...most token-efficient, highest-quality set of sub-agents, or whatever it is, around a set of problems.

Speaker 2 · 19:46 - 20:28

Yeah, exactly. We've seen this in one of the paradigm shifts of the last year and a half or so, the reasoning models, which can reason for a long time in token space. But eventually what becomes more important is whether you should reason and for how long. Adding that level of intelligence, based on the complexity of what a user is asking, will make it more efficient. So around these systems, I think there's going to be some layer that is smart enough to create the right scaffold for the right task. I'm not sure exactly whether it will be written from scratch or come from some automation.
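
Deciding "whether you should reason, and for how long" can be pictured as a policy that maps estimated task complexity to a reasoning-token budget. This sketch is purely illustrative. The complexity score, the 0.2 threshold, the linear scaling, and the 8,192-token cap are all assumptions. In the systems discussed, the model would learn this decision rather than have it hand-coded.

```python
def thinking_budget(complexity: float, max_tokens: int = 8192, threshold: float = 0.2) -> int:
    """Map an estimated task complexity in [0, 1] to a reasoning-token budget.

    Requests below the threshold get no reasoning at all; harder ones get a
    budget that grows linearly up to max_tokens.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if complexity < threshold:
        return 0  # answer directly, no reasoning tokens
    fraction = (complexity - threshold) / (1.0 - threshold)
    return min(max_tokens, round(max_tokens * fraction))

for c in (0.1, 0.5, 0.9):
    print(c, thinking_budget(c))
```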

Speaker 219:46 - 20:28

对，没错。我的意思是，我们也看到了这一点。过去大约一年半里，我们见到的一种范式转变当然就是 reasoning models（推理模型）：它们可以在 token 空间里进行长时间推理。但当然，最终更重要的是：你是否应该推理、应该推理多久；并且要根据用户所提问题的复杂度加入这一层智能，这样系统才会更高效。所以我认为，围绕这些系统你所做的事情里，会有一个层面——我不完全确定最终会是从零手写搭建，还是某种自动化——它会足够聪明，能为合适的任务创建合适的 scaffold（脚手架/框架）。
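
Deciding whether to reason, and for how long, can be sketched as a function from estimated task difficulty to a thinking-token budget. This is a hypothetical hand-written rule for illustration only. The difficulty score, the `thinking_budget` name, and the thresholds are all assumptions, and a production system would more likely learn this decision than hard-code it.

```python
def thinking_budget(difficulty: float,
                    max_tokens: int = 32_768,
                    skip_below: float = 0.2) -> int:
    """Map an estimated difficulty in [0, 1] to a reasoning-token budget.

    Requests judged easier than `skip_below` get no reasoning at all; above
    that, the budget grows linearly until it reaches `max_tokens`.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp the estimate to [0, 1]
    if d < skip_below:
        return 0  # answer directly, without a chain of thought
    fraction = (d - skip_below) / (1.0 - skip_below)
    return int(fraction * max_tokens)


for difficulty in (0.05, 0.5, 0.95):
    print(difficulty, thinking_budget(difficulty))
```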

Speaker 120:28 - 20:44

On the agent side, I think everyone's messing around and experimenting with building these long-running agents. And obviously they run into all sorts of issues trying to keep them stable across hundreds of steps. How do you think about what's required to get further agentic reliability?

Speaker 120:28 - 20:44

在 agent 这边，我觉得现在有很多人都在折腾、都在实验，试着构建这种长时间运行的 agents。而且我想，很显然，他们在试图让这些 agent 跨越数百个步骤保持稳定时，会遇到各种各样的问题。你是怎么理解要进一步提升 agentic reliability（agent 式可靠性）所需要的东西的？

Speaker 220:44 - 21:10

Yeah. I mean, I think the most obvious answer to these questions is improving both the scaffold around the model and the model itself. If you think of how you train a neural network, it trains on some distribution of tasks or modalities, or on how to connect words to video, or whatnot. Right? All of this is about how you train, pretrain, or post-train these weights.

Speaker 220:44 - 21:10

对。我的意思是，对这些问题，最直接的答案某种程度上就是：同时改进模型周围的 scaffold。如果你想想训练一个 neural network（神经网络）的方式，它是在某种任务分布或模态分布上训练的，或者说学习如何把不同的词和视频之类的东西连接起来，对吧？这些本质上都和你如何训练、预训练或后训练这些 weights（权重）有关。

Speaker 221:11 - 22:06

So suppose there's a new type of work or modality that requires these very long-running systems, which somehow need to learn from very long context. That's something we've always innovated on and pushed; Gemini 1.5 was our long-context breakthrough. Then it becomes obvious that the model will also catch up to meet users and these futuristic use cases, right? And that's part of the researcher's challenge. You predict what will be possible, and then you focus not only on building a system that is robust to it, but also on making the weights comfortable, rather than unhappy, when you push all that context and all these crazy things into them. The alternative is just hoping for generalization from a prompt that induces the behavior, so to speak.

Speaker 221:11 - 22:06

所以如果你去想，出现了一种新的工作类型或模态，需要这种运行时间很长的系统，而且这些系统还必须以某种方式从非常长的上下文中学习——而这也是我们一直在创新和推进的方向，1.5 某种意义上就是我们在 long context（长上下文）上的突破——那么就很明显，模型本身也会跟上，对吧，去满足用户需求以及那些更偏未来的 use cases（使用场景）。而这有一部分就是研究者面临的挑战，对吧？要去预测什么是可能实现的，然后不仅专注于构建一个对此足够稳健的系统，还要思考：当你塞入海量上下文、做各种疯狂操作时，如何让这些 weights 对此没那么“难受”，甚至更“适应”；而不是只寄希望于通过 prompt 诱导出的那种行为所带来的泛化，差不多是这个意思。

Speaker 122:06 - 22:16

One thing everyone's trying to figure out is memory, right? And how to solve it across these agents. Do you have any gut instincts on where that ultimately gets solved?

Speaker 122:06 - 22:16

大家都在试图搞清楚的一个模式就是 memory（记忆），对吧？以及如何在这些 agents 之间、跨这些 agents 来解决这个问题。你有没有什么直觉，觉得这件事最终会怎么被解决？

Speaker 222:17 - 22:43

Yeah. I mean, memory is fascinating, and has been since the very early days. The way I characterize it is probably biased by, actually, having had a PhD in memory systems in the brain, right? There are a few ways to think about memory, but the simpler one that I like starts with working memory.

Speaker 222:17 - 22:43

对，我觉得 memory 很有意思。我想从很早的时候开始，你就可以某种程度上这样去理解它。我觉得最初我们对它的刻画——这大概也带有一些我的偏见，毕竟我博士研究做的其实就是大脑中的 memory systems（记忆系统）——不过，关于 memory 有几种不同的思考方式，但我比较喜欢一个更简单的说法，就是 working memory（工作记忆）。

Speaker 222:43 - 23:20

Right? Things that are very present because of what we're doing or talking about. Then there's what's called episodic memory, which is a kind of retrieval system you can access. It's probably less precise, but it covers a longer context, potentially all the context that you or I care to remember, holistically, with all our accumulated experience. Now, there aren't only two levels of memory, but it's useful to think in terms of levels. Computers have the same thing with caches: L1, L2, and so on.

Speaker 222:43 - 23:20

对吧？就是那些因为我们正在做的事情或正在讨论的话题，而当下非常鲜明、非常在场的东西。然后还有所谓的 episodic memory（情景记忆），它某种程度上是一种你可以访问的 retrieval system（检索系统）；它可能没那么精确，但当然它对应的是更长的上下文，或者潜在地包含了你我愿意记住的全部上下文——也就是我们积累下来的、整体性的全部经验。实际上，memory 不只有两个层级，但用这种“层级”的方式来思考 memory 是有帮助的。计算机也是一样，有 cache、L1、L2，等等。
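
The working-memory / episodic-memory split can be illustrated with a toy two-level store. A small, exact working set uses least-recently-used eviction, and a larger episodic log is searched by fuzzy keyword overlap. This is an analogy sketch under stated assumptions: the class name, the capacity, and the word-overlap scoring are hypothetical. It is not how Gemini or any transformer implements memory.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TwoLevelMemory:
    """Toy analogy: a small, exact working memory (like an L1 cache) backed by
    a larger, less precise episodic store that is searched by word overlap."""

    def __init__(self, working_capacity: int = 4) -> None:
        self.capacity = working_capacity
        self.working: "OrderedDict[str, str]" = OrderedDict()
        self.episodic: List[Tuple[str, str]] = []

    def remember(self, key: str, value: str) -> None:
        self.working[key] = value
        self.working.move_to_end(key)  # mark as most recently used
        while len(self.working) > self.capacity:
            # Evict the least recently used item and consolidate it into episodic memory.
            old_key, old_value = self.working.popitem(last=False)
            self.episodic.append((old_key, old_value))

    def recall(self, query: str) -> Optional[str]:
        if query in self.working:  # fast, exact lookup
            self.working.move_to_end(query)
            return self.working[query]
        # Slower, fuzzier lookup: pick the episodic entry sharing the most words.
        words = set(query.lower().split())
        best, best_score = None, 0
        for key, value in self.episodic:
            score = len(words & set(key.lower().split()))
            if score > best_score:
                best, best_score = value, score
        return best


memory = TwoLevelMemory(working_capacity=2)
memory.remember("user home city", "Barcelona")
memory.remember("tax year", "2025")
memory.remember("moving date", "June")  # pushes "user home city" out of working memory
print(memory.recall("moving date"), memory.recall("home city"))
```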

Speaker 223:20 - 24:18

So when it comes to models, I think that for working memory, thanks to transformers and so on, we have a very powerful mechanism. We have hundreds, thousands, even millions of tokens at our disposal to modify that memory and then do amazing things with it: proving complex theorems, gold-medal-level math, and so on. Where I'm seeing a lot of momentum is in how we consolidate what happened either earlier, in different interactions, or during an interaction that is longer than you could possibly hold in working memory. How do we store that knowledge? Through different experiments, people have arrived at what I think is now the standard name, what we call skills. But it's more general than that.

Speaker 223:20 - 24:18

所以，说到模型，我认为在 working memory（工作记忆）方面，因为有了 transformer 等机制，我们已经有了一个非常强大的方式来使用这类记忆。我们手头可以动用数百、数千、甚至数百万个 token，用它们来修改这部分记忆，并基于此做出很惊人的事情，比如证明复杂的定理、达到金牌级别的数学水平，等等。而我认为现在很有 momentum（发展势头）的方向是：我们怎样把先前在不同交互中发生的事情，或者一次长到超出 working memory 所能记住范围的交互过程中发生的事情，进一步整合起来。我们该如何存储这些知识？通过各种不同的实验，大家摸索出了现在的标准叫法，也就是所谓的 skills，但其实它比那个概念更宽泛。

Speaker 224:18 - 25:04

Because it's an agent, we do have access to a memory system, which is the computer itself. So you can start thinking about writing your thoughts into files and structuring them into directories or folders. You can do that as you interact with the same user across multiple episodes, or within one very, very long episode. The mechanism that works fairly well at the moment is adding this kind of knowledge base to a file system, or to any storage format that you can modify and read from with some basic retrieval mechanism. Again, though, I don't think the weights of the model have caught up to this yet. It's already very powerful, but I think there's still a lot left untapped there.

Speaker 224:18 - 25:04

因为它是一个 agent，所以我们确实可以访问一个 memory system（记忆系统），也就是计算机本身。因此你可以开始考虑把你的“想法”写进文件里，把它组织成目录或文件夹，并且在你与同一个用户进行多次 episode（交互回合）时，或者在一次非常非常长的 episode 中，这样去做。所以目前一个相当不错的机制——不过我还是觉得模型的 weights（权重）还没有跟上——就是把这类 knowledge base（知识库）加入文件系统里，或者说加入任何一种你可以修改、读取，并配有一些基础 retrieval（检索）机制的存储格式中。这已经非常强大了，不过我认为这里仍然还有很多潜力尚未被挖掘。
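
Here is a minimal sketch of the file-system-as-memory pattern. It assumes notes are stored as Markdown files grouped into topic folders and retrieved by a plain keyword scan. The `FileMemory` class and its methods are hypothetical; real agent frameworks differ in layout and in how they retrieve.

```python
import tempfile
from pathlib import Path
from typing import List


class FileMemory:
    """Notes saved as Markdown files in topic folders, found by keyword scan."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write_note(self, topic: str, title: str, text: str) -> Path:
        folder = self.root / topic
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{title}.md"
        path.write_text(text, encoding="utf-8")  # overwriting a note updates it
        return path

    def search(self, keyword: str) -> List[str]:
        """Return relative paths of notes containing `keyword` (case-insensitive)."""
        hits = []
        for path in sorted(self.root.rglob("*.md")):
            if keyword.lower() in path.read_text(encoding="utf-8").lower():
                hits.append(path.relative_to(self.root).as_posix())
        return hits


with tempfile.TemporaryDirectory() as tmp:
    mem = FileMemory(Path(tmp))
    mem.write_note("user_prefs", "units", "User prefers metric units.")
    mem.write_note("project", "status", "Migration step 3 is done.")
    hits = mem.search("METRIC")
print(hits)
```

Because the knowledge lives outside the weights, it can persist across episodes and be edited or inspected like any other file.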

Speaker 225:04 - 25:54

I think many of us call this a form of continual learning. The mechanism I want to see work, or rather the one that is clearly going to work better and better, is this file-system-style, nonparametric approach. It's more convenient than integrating that knowledge back into the weights, even from a practical point of view, because we try to serve one model at scale. It would be really painful to serve one model with different memories for different users. So, even from a practical point of view, I think we'll see better evaluations of how these models accumulate knowledge as they interact, and better ways for them to do it. And I think that's probably paradigm-shifting as well, similar to how we saw reasoning arrive a year and a half or so ago.

Speaker 225:04 - 25:54

我想，我们很多人都把这称为 continual learning（持续学习）的一种形式，但我认为我想推动的机制——或者说，我的意思是，它显然会越来越有效——是这种文件系统风格的、类似 nonparametric（非参数）的方式。相比把这些内容重新整合进权重里，这样做要方便一些；哪怕从非常实际的角度看也是如此，因为我们会尝试大规模提供同一个模型服务。所以，如果必须为不同用户提供带有不同记忆的同一个模型，那会非常——怎么说——很痛苦。因此即使从实践层面看，我也认为我们会看到更好的评估方式，以及这些模型在交互过程中积累知识的更好方法。而我觉得这在某种意义上可能也会带来 paradigm shifting（范式转变），有点类似于我们大约一年半前看到 reasoning（推理）能力发展时的情况。

Speaker 125:54 - 26:03

Does that look like everyone having their own distinct file system, or do you think that over time people will have models whose weights differ based on what they've done?

Speaker 125:54 - 26:03

这是否会变成每个人都拥有各自的模型，也就是说，文件系统本身彼此独立；还是说你认为随着时间推移，人们会拥有权重也会因为他们做过什么而不同的模型？

Speaker 226:03 - 26:07

Well, as I said, having the weights differ would be a challenge.

Speaker 226:03 - 26:07

嗯，正如我刚才说的，让它在权重层面变得不同会是一个挑战。

Speaker 126:07 - 26:08

Yeah. Hard to serve. Yeah.

Speaker 126:07 - 26:08

对。很难提供服务。对。

Speaker 226:08 - 26:54

It would be... I mean, if it's the best way, then we'll find a way, right? That includes hardware: of course, we also invest a lot in hardware design, which could let you have more personal weights, so to speak. But at the very least, you will have your own knowledge base that is personal to you. You've already seen many examples of this realized in the LLM space over the last few years. And then perhaps there's another layer of knowledge, common to all the users of a given model, that the model could access to enrich or enhance its capabilities without touching the weights. That's very interesting, and getting there would be awesome.

Speaker 226:08 - 26:54

这会——我的意思是，如果那是最好的方式，那我们总会找到办法，对吧？包括通过硬件来实现，当然，我们也在 hardware design（硬件设计）上投入了很多，这将使你能够拥有某种意义上更个性化的权重。但至少，你当然会有属于自己的 knowledge base，可能是只对你个人而言的。其实在过去几年里，甚至就在 LLM 领域，你已经能看到很多这类东西被实现出来的例子了。然后，也许还会有另一层知识，它对某个给定模型的所有用户都更通用；你可以设想模型能够访问这层知识，并通过它来丰富或增强模型能力，而无需改动权重。所以这非常有意思，如果能做到那一步会很棒。
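
The serving argument can be sketched as one shared model plus per-request context assembled from a common knowledge layer and a personal one. `LayeredKnowledge`, `answer`, and the prompt layout are hypothetical. The point is only that the weights (`model`) are the same for every user while the retrieved context differs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LayeredKnowledge:
    common: List[str] = field(default_factory=list)               # shared by all users
    personal: Dict[str, List[str]] = field(default_factory=dict)  # keyed by user id

    def add_personal(self, user: str, fact: str) -> None:
        self.personal.setdefault(user, []).append(fact)

    def build_prompt(self, user: str, question: str) -> str:
        lines = ["[common]", *self.common, "[personal]", *self.personal.get(user, [])]
        lines += ["[question]", question]
        return "\n".join(lines)


def answer(model: Callable[[str], str], kb: LayeredKnowledge, user: str, question: str) -> str:
    # The same weights serve every user; only the assembled context differs.
    return model(kb.build_prompt(user, question))


def echo_model(prompt: str) -> str:
    return prompt  # stand-in that returns its input, so the context is visible


kb = LayeredKnowledge(common=["Style guide: answer concisely."])
kb.add_personal("alice", "Prefers examples in Python.")
print(answer(echo_model, kb, "alice", "How do I sort a list?"))
```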

Speaker 126:54 - 27:30

I feel like continual learning has been the topic du jour; everyone's talking about it. And there are now a few interesting, high-profile examples of people spinning out of OpenAI or other places and saying: sure, you can keep scaling what we're doing now, and no one's denying that those scaling laws hold. But it feels like you almost need a new research bet to achieve real continual learning, and maybe it makes sense to pursue that outside the path of continually improving these core LLMs. I'm curious what you make of that whole dynamic, and your reflections on it.

Speaker 126:54 - 27:30

我感觉 continual learning（持续学习）已经成了当下最热的话题，几乎每个人都在谈它。你现在也看到了一些有意思的例子，而且还是一些高关注度的例子：有人从 OpenAI 或其他地方出来创业，然后说，嘿，当然，你可以继续扩展我们现在正在做的东西，而且我觉得没人否认这些 scaling laws（缩放定律）是存在的；但他们的意思是，你似乎几乎需要一种新的 research bet（研究押注），才能真正实现 continual learning，而且，也许把这件事放到不断改进这些核心 LLM 的路径之外去推进，会更合理一些。我很好奇你怎么看待整个这种动态，也想听听你对此的整体看法。

Speaker 227:30 - 28:09

I was at Google Brain in the very early days and then moved to DeepMind in 2016. At the moment, I think there's both a challenge and an opportunity. You obviously want to investigate some research questions that might not be ripe for "hey, in the next three months this makes it into the next training run." But at the same time, that research can't be very disconnected from the frontier, right, from where the LLMs are moving. I mean, we're improving Gemini. It's fascinating to see Flash outperform the Pro of only a few months ago, and that keeps happening.

Speaker 227:30 - 28:09

我很早期就在 Google Brain，后来在 2016 年转到 DeepMind。眼下我觉得，显然，既有挑战也有机会：你当然会想去研究一些问题，而这些问题未必适合被说成“好，接下来三个月内它就会进入下一轮 training run（训练轮次）”。但与此同时，这些研究也不能和前沿脱节，对吧，不能脱离 LLMs（大语言模型）正在前进的方向。我的意思是，我们一直在改进 Gemini。看到 Flash 超过几个月前的 Pro，确实很让人着迷，而且这种事还在不断发生。

Speaker 228:09 - 28:52

So it's about staying at the frontier of capability, which might enable or rule out certain research, while protecting research. Of course, that's not multiyear anymore; things are moving fast. But combining these two is kind of the magic of building these organizations. All of us have different angles and can figure out how to bridge the two and identify the opportunity. That's a bit of what it takes, right? You can't have full visibility, because this is too large an organization. But you have some intuitions, and sometimes you pull in these ideas eagerly because it feels like the right thing to do.

Speaker 228:09 - 28:52

所以，一方面要尽量站在 capability（能力）前沿，因为这可能会决定某些研究能不能做；另一方面又要给研究保留空间。当然，这已经不是那种按多年周期推进的事了。事情变化得非常快，但把这两者结合起来，某种程度上就是构建这类组织的魔力所在。当然，我们每个人看问题的角度都不同，也都能想办法去搭起这两者之间的桥梁，并识别机会。这在某种程度上正是所需要的能力，对吧：并不是说你能拥有完整的可见性——组织太大了，不可能——而是你要有一些直觉，然后有时还要积极地把这些想法拉进来，因为那感觉像是正确的事。

Speaker 228:52 - 29:31

So, from a research perspective, that's really what defines organizations at that level. I can see everything from investment in robotics, to the peak of the LLMs, to research that either has made it through or will, right? But it's challenging. Resources are constrained, so it's an interesting trade-off, and not one you always get right. I think it's a fascinating, different angle on research: not just which idea will make it into the next paper or into the model, but how to organize this whole organization.

Speaker 228:52 - 29:31

所以从 research（研究）的角度看，这其实正是这种层级组织的定义性特征。我能看到的范围，从对 robotics（机器人）的投入，到当然还有 LLMs 的最前沿，再到那些已经落地或将会落地的研究成果。对吧？但这很有挑战。这是一个——资源是受限的，所以这确实是个很有意思的 trade-off（权衡），而且也不是每次都能做对。但我觉得，这是一种非常迷人的、关于 research 的不同维度：不只是“哪个想法能进下一篇 paper（论文）”或者“现在能进模型”，而是实际上要思考，究竟该如何组织这个——对，整个组织本身。

Speaker 229:31 - 29:32

It's fascinating.

Speaker 229:31 - 29:32

这很迷人。

Speaker 129:32 - 30:08

And this feels like one of the most interesting questions for someone in a role like yours. It's hard not to be excited about how many things you can advance with these models today, and there's obviously so much going on. Take even an organization like OpenAI. They've kind of oscillated between "hey, there are just so many low-hanging fruits and things to go do on the AI side" and this more focused moment now, where it's like, "God, we've just got to really nail code and catch up to Claude Code." I'm wondering how you think about the trade-offs of focusing on one thing and having the whole org rowing toward it, versus a broader surface area, all of which is super interesting.

Speaker 129:32 - 30:08

这感觉像是一个对你这种角色的人来说最有意思的问题之一，因为你很难不为这些模型今天已经能推动的那么多事情而感到兴奋，而且显然现在有太多事情同时在发生。我觉得，哪怕拿 OpenAI 这样的组织来说，他们也在某种程度上来回摆动过：一方面会觉得，AI 这边有太多 low hanging fruits（低垂果实，容易摘的机会）和值得去做的事；另一方面现在又进入了一个更聚焦的阶段，像是，“天啊，我们真的必须把 code（代码能力）做好，并追上 Claude Code。” 我很好奇，你是怎么思考这种 trade-off（权衡）的：是聚焦在一件事上，让整个 org（组织）都朝同一个方向划桨；还是保持一个更宽的 surface area（覆盖面），尽管其中每一块都同样非常有意思。

Speaker 230:08 - 30:27

Google is in a unique place for a couple of reasons. First, we do have a lot of surface area in Gemini at the moment, right? It's literally powering everything. But we have the advantage that the other parts of the organization are already completely bought into the LLM era.

Speaker 230:08 - 30:27

Google 之所以处在一个独特的位置，有几个原因。第一，Gemini 目前的确有很大的 surface area（覆盖面）。对吧？它实际上正在为一切提供动力。但我们的优势在于，组织里的其他部分，已经可以说是完全接受了 LLM era（大语言模型时代）。

Speaker 230:27 - 30:59

So in a way, they take the model and then do something with it. And if you feel that isn't the next way to advance frontier capabilities, you can rely on there being a very good group that will take the model where it needs to go, right? At the same time, we have stability from hardware procurement, and obviously also from capital investment, given that we're very end to end in terms of revenue streams and so on.

Speaker 230:27 - 30:59

所以从某种意义上说，他们会拿到模型，然后基于它去做一些事情。但如果你觉得那并不是推进 frontier capabilities（前沿能力）的下一种方式，那么你知道，你完全可以依赖一个非常优秀的团队，他们会把模型带到它需要去的地方。对吧？与此同时，我们在 hardware procurement（硬件采购）方面也有稳定性，完全是这样。显然，还有 capital（资本）投入方面的稳定性，因为我们在 revenue streams（收入来源）等方面是非常 end to end（端到端）的。

Speaker 230:59 - 31:37

So you can probably push risk-taking a little further in certain research areas, which also needs to be done with taste. It's not focused, but it's scalable because of how Google is organized. And you can still invest in innovation, which is at the very core of what we've always done, right? Look at Brain and DeepMind, the two organizations I've been part of, now together called Google DeepMind, a name I appreciate given I've been in both at different times. I think it's in our DNA to keep innovating.

Speaker 230:59 - 31:37

所以你大概就可以在某些 research areas（研究领域）上把 risk taking（风险承担）再往前推一点——当然这同样需要判断力。于是你会拥有这样一种状态：它不是那种单点聚焦，但由于 Google 的组织方式，它是可扩展的。然后你仍然可以继续投资 innovation（创新），而这恰恰一直是我们所做之事的核心。对吧？比如说，如果我回看 Brain 和 DeepMind——这两个我都待过的组织，现在统称为 Google DeepMind，而我也很认同这个名字，毕竟我确实在不同时间段待过这两边——那么我会觉得，持续创新就是写在我们的 DNA 里的。

Speaker 231:38 - 32:19

But at the same time, I think what Gemini created is a focused and unifying force, which was fascinating to build. It helped a lot that Jeff and I had known each other for many years and had gone on trips together, just for fun, right? That time was very special. The Gemini core modeling effort sits at the center, very focused on frontier capability, with these inputs and outputs around it. I think that's a fairly reasonable way to stay focused while still leveraging a bit of exploration, which may or may not turn out to be needed, right?

Speaker 231:38 - 32:19

但与此同时，我认为 Gemini 所创造出来的是一种有聚焦性、也有整合力的力量，这一点很让人着迷。很有帮助的一点是，我和 Jeff 认识很多年了，还一起出去旅行过，就是单纯为了好玩。对吧？所以我觉得，那段时间非常特别，而我认为其中的核心，就是以 Gemini 这种核心建模工作为中心，非常专注于 frontier capability（前沿能力）；然后再配上一些这种输入和输出，这其实是一个相当——你知道——合理的做法：既能保持聚焦，同时也能利用一点探索性的东西，对吧，尽管这种探索最终可能仍然是需要的，也可能不需要。对吧？

Speaker 232:19 - 32:30

I mean, do we need world models? If we make them work, we'll definitely need them. If we don't, maybe it's okay, you know? But it's good to have those bets placed as well.

Speaker 232:19 - 32:30

我的意思是，我觉得，我们需不需要 world models（世界模型）？我的意思是，如果我们能把它做成，那我们肯定会需要它；如果做不成，也许也没关系。你知道吧？但把不同的赌注也合理地布好，终究是件好事。

Speaker 132:30 - 33:07

On the model side, maybe switching gears to Gemini models and the path forward. I think you've previously called post-training still a total greenfield, and we've clearly seen incredible progress from post-training RL in coding and math. I think a new math problem was solved just hours before we came on this podcast. What everyone's trying to figure out, and I'm curious for your intuitions here, is what characterizes the next set of domains where we'll see RL really take off. It feels like we're on a crazy exponential path on the coding and math side, so what makes other domains good fits?

Speaker 132:30 - 33:07

说到模型这一侧，也许换个话题，谈谈 Gemini 模型本身，以及，你知道，接下来的路径。我记得你之前说过，post training（后训练）某种程度上仍然是一片 total greenfield（几乎完全未开垦的新领域）；而我的感觉是，从我们目前看到的情况来说，你知道，很明显，在 coding（编程）和 math（数学）上，post training RL（后训练强化学习）已经取得了惊人的进展。我想，就在我们录这期播客前几个小时，又有一个新的数学问题被解决了。现在每个人都在努力弄明白的一点——我很好奇你直觉上怎么看——是：接下来我们会看到 RL（强化学习）真正爆发的下一批领域，会具有什么样的特征。感觉 coding 和 math 这边正走在一条疯狂的指数级增长路径上，所以我很好奇，你对于什么样的其他领域会是好的适配对象，有什么直觉判断。

Speaker 233:07 - 33:18

Yeah, it's a good question. I mean, one must be quite humble here, because the models are really good at many things. So it's very hard to say

Speaker 233:07 - 33:18

对，我是说，这是个好问题。我——我是说，在这件事上人必须相当谦逊，因为模型在很多事情上都已经非常擅长了。所以，所以这很难说。

Speaker 133:18 - 33:19

Insanely good.

Speaker 133:18 - 33:19

强得离谱。

Speaker 233:19 - 33:22

Oh, yeah. You know? Like, "this doesn't work at all." Right?

Speaker 233:19 - 33:22

哦，对。你知道吗？就像是在说“这完全行不通”。对吧？

Speaker 233:22 - 34:07

Like, I mean, almost just by prompting, with a bit of smart prompting and maybe the right system built around it, you can do lots of amazing things, at least in the digital world. That's what I call digital AGI, if you will, and it's very impressive. When I said that post-training is a greenfield, I didn't mean there's a capability that is very far from an acceptable level, as in "hey, this is fairly intelligent and fairly advanced." I meant it more mechanistically: look at how some other efforts have leveraged imitation learning, or pretraining plus post-training. Right?

Speaker 233:22 - 34:07

我的意思是，几乎只靠 prompting（提示），再加上一点 smart prompting（聪明的提示设计），也许再搭一个合适的 system（系统），就已经能做出很多惊人的事情了。至少在我所说的 digital world（数字世界）里（如果你愿意的话，也可以叫 digital AGI，数字 AGI），这些表现都非常令人印象深刻。我想，当我说 post training 是 greenfield 的时候，我想表达的，与其说是我觉得某种 capability（能力）距离“可接受水平”还很远，也就是那种“嘿，这已经相当智能、相当先进了”的水平，不如说更多是在机制层面上看：其他一些工作是如何利用 imitation learning（模仿学习），或者 pretraining（预训练）加上 post training 的。对吧？

Speaker 234:07 - 34:45

Like, how much compute those efforts invested in post-training, versus the relatively smaller amount today's models currently use. The reason is fairly clear, though I'm not sure it's easy to fix. Take even a very narrow domain like Go. As you play Go with reinforcement learning, right, you now have a system that can play it. It places a few moves, and a few moves into the game, that game is already unique: you've never seen that particular configuration before.

Speaker 234:07 - 34:45

比如说，以及从算力投入的角度看，post training 上投入了多少，相比之下，如今的模型目前在这方面使用的量其实还相对较小。我的意思是，这里面的原因其实很清楚，我也不确定它是否容易解决；但事实是，即使你拿一个非常狭窄的领域，比如 Go（围棋），当你在 reinforcement learning（强化学习）里下 Go 的时候，对吧，你现在会有一个能下棋的系统。它下出几步棋，而在几步之后，这个局面、这盘棋，就已经变成独一无二的了。我的意思是，你以前从没见过那个特定的配置。

Speaker 234:45 - 34:57

So, as you play, the environment's complexity makes training data effectively infinite and free. Right? You play a few moves, and now you're in a new situation, so you can learn from it.

Speaker 234:45 - 34:57

所以，随着你去玩，环境的复杂性会让训练数据几乎可以免费地变成无限的。对吧？比如说，你走了几步之后，现在就进入了一个新的情境。于是你就可以从中学习。
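
A quick back-of-the-envelope calculation shows why a Go game becomes unfamiliar within a few moves. The sketch counts ordered sequences of k stones placed on distinct points of a 19×19 board (361 intersections), ignoring passes, captures, and illegal moves. Because of transpositions, there are fewer distinct board positions than sequences, so treat this as a rough illustration of growth rather than an exact count of reachable positions. The helper name is hypothetical.

```python
from math import perm  # available in Python 3.8+

BOARD_POINTS = 19 * 19  # 361 intersections on a standard Go board


def opening_sequences(k: int, points: int = BOARD_POINTS) -> int:
    """Ordered ways to place k stones on distinct points: 361 * 360 * ..."""
    return perm(points, k)


for k in (1, 2, 4, 8):
    print(f"{k} moves: {opening_sequences(k):.2e} sequences")
```

Four moves in, there are already 361 × 360 × 359 × 358 = 16,702,719,120 such sequences.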

Speaker 234:57 - 35:28

And the more you play, the more hours you put into your RL algorithm, the more knowledge you gain, right? That's what we saw with games in the reinforcement learning era. In LLMs we are data-limited, and it's not so clear what the source of infinite complexity is. I mean, there are some ideas, but I think cracking that recipe could be big, at least in terms of the beauty of the algorithm.

Speaker 234:57 - 35:28

而且你玩得越多，往你的 RL（强化学习）algorithm（算法）里投入的时间越多，你获得的知识也就越多。对吧？这有点就是我们在强化学习时代的游戏里看到的情况。而在 LLMs（大语言模型）里，我们是受数据限制的，至于那种无限复杂性的来源到底是什么，还不是那么清楚。我是说，也有一些想法，但我觉得如果能破解那个配方，意义会很大，至少就 algorithm（算法）的美感而言是这样。

Speaker 235:28 - 35:54

Knowing how this worked in the past, it would be much more satisfying to see it work now in LLMs. But is it needed? Are the capabilities not there without it? That's hard to say. Since you asked which capabilities, the ones in these models that fascinate me most are what I call meta-capabilities.

Speaker 235:28 - 35:54

如果知道这件事过去是怎样起作用的，然后现在又看到它在 LLMs（大语言模型）里起作用，那会让人感觉满足得多。那现在它是必需的吗？是不是能力还没到？这很难说。但既然你问的是哪些能力，我觉得最让我着迷的，其实是模型所展现出的那类能力，我把它们叫作 meta capabilities（元能力）。

Speaker 235:54 - 36:12

They're not math or coding. They're more like: what are the traits or attributes of intelligence, and can these models exhibit them? Right? So one would be the ability to learn continually, or to learn from experience very efficiently. That's in-context learning; we used to call it meta-learning, or whatever. Right?

Speaker 235:54 - 36:12

它们不是数学或编程能力。它们更像是：智能有哪些特质或属性，这些模型能不能做到。对吧？所以，持续学习的能力，或者说非常高效地从经验中学习的能力，就是其中之一。你知道的，in-context learning（上下文学习），我们过去也叫它 meta learning（元学习），诸如此类。对吧？

Speaker 236:12 - 37:04

This is a capability that I can sort of measure or feel, and it's probably not super good yet, for example. Of course, you could argue that instruction following is the ultimate capability, because if I ask a model to "be AGI," it either follows that instruction or it doesn't. Right? But I try to look at capabilities that are less about one particular domain or vertical and more about "okay, that is intelligent behavior." The ability to learn and adapt, rather than the ability to be a professional player or an IMO gold medalist or whatnot, is what fascinates me most. I look for it in new releases and in the models we get our hands on every time we train a new one.

Speaker 236:12 - 37:04

这是一种我多少能够测量或感受到的能力，而它大概还没有特别特别好，对吧，比如说。当然，instruction following（指令遵循）也是一种能力，而且你可以说它是终极能力，因为如果我让一个模型“be AGI”，它要么遵循了这个指令，要么没有。对吧？所以我的意思是，我想看的是那些不那么局限于某一个特定 domain（领域）或 vertical（垂直赛道）的能力，而更像是：好，这就是智能行为。因此，相比于成为职业选手、IMO 金牌得主之类的能力，那种学习和适应的能力，才是每次我们看到新发布、每次我们拿到新模型、每次我们训练出一个新模型时，最让我着迷的东西。

Speaker 137:04 - 37:06

Do you have a go to way to, like, test that?

Speaker 137:04 - 37:06

你有没有一种自己常用的方法来测试这一点？

Speaker 237:06 - 37:20

I like games. I usually define a new game in context, which is a fairly classic way to do it. Of course, you need to be careful, because if the game is in the weights

Speaker 237:06 - 37:20

我喜欢游戏。所以我通常会在上下文里临时定义一个新游戏，对吧，或者说，你知道，这其实是一个相当经典的做法。当然，你得小心，因为如果这个游戏已经进了 weights（模型权重）

Speaker 137:20 - 37:24

If anyone else has put that game on the Internet, you're you're in trouble.

Speaker 137:20 - 37:24

如果其他任何人已经把那个游戏放到 Internet 上了，那你就麻烦了。

Speaker 237:24 - 37:27

Yes. But I remember there was an eval. It's not exactly how I do it, but yeah.

Speaker 237:24 - 37:27

对。但我记得，我觉得当时有一个 eval（评测）。它并不完全是我现在会采用的那种做法。对。

Speaker 137:27 - 37:34

Actually, I realize I'm being rude by asking you to talk about it, because then this podcast will be out there and the next models will know how to do it. No problem.

Speaker 137:27 - 37:34

其实，我意识到我这么问你有点不礼貌，因为如果你谈这个，这期 podcast 就会被放出去，然后下一代 models（模型）就会知道该怎么做了。没事。

Speaker 237:34 - 37:42

Yeah. Maybe. Hopefully not, unless we need to crack world models first, right? Or unless it's fully transcribed, which I'm sure it will be. So maybe we don't even need that.

Speaker 237:34 - 37:42

对，也许吧。对。希望我们不需要去破解 world models（世界模型），对吧，除非它是——你知道——被完整转录出来，而我相信肯定会这样。所以也许我们甚至都不需要那个。

Speaker 237:43 - 38:04

But there's one eval I really like. I think it's actually very old, way older than LLMs; probably from before 2015. The eval was simple: you give the model the instruction manual for, I think it was, Civilization, the game.

Speaker 237:43 - 38:04

但我真的很喜欢一个 eval（评测）。我觉得这个 eval 实际上非常老了，而且，我是说，比 LLMs（大语言模型）早太多了。它大概得是——比如说——2015 年之前，很可能在 2015 年以前。这个 eval 很简单：你把我记得应该是游戏 Civilization 的说明手册给它。

Speaker 238:05 - 38:26

And then you're meant to be able to play, right? I like that style of eval. You can construct it in different ways, but it's one test I like to use on the models, and they're not that good at it, especially when the game is something I just invented. The ability being tested there is twofold.

Speaker 238:05 - 38:26

然后，按理说，你就应该能够玩这个游戏了，对吧？所以我喜欢这种风格的 eval（评测）。我是说——你当然可以用不同方式来构造它——但这是我很喜欢用来测试 models（模型）的一种测试，而且它们做得并没有那么好，尤其当游戏变成某种我刚发明出来的东西之类的时候。这里的能力是双重的。

Speaker 238:26 - 38:56

Right? First, can you understand the instructions and then follow them to play the game? But there's another aspect: as you play the game, you learn to play it better. Do you see that happening in practice? It's impressive, but if you go far out of distribution, to a game that could be real but still isn't in the training data, this particular test is not an easy one for the models to pass.

Speaker 238:26 - 38:56

对吧？你可以想象，第一步是：你能不能理解说明；然后在此基础上，按照说明去玩这个游戏。但还有另一方面，就是你在玩游戏的过程中，会学着把它玩得更好。所以你会不会看到这种事情在实践中发生？这很令人印象深刻，但是，如果你把一个游戏设定得非常 out of distribution（分布外）——它可能看起来像真实存在的游戏，但仍然不在训练数据里——那么特别是这一类测试，对 models（模型）来说并不容易通过，比如说。

Speaker 238:56 - 39:11

Right? There are many others, but I really like this one. It brings in games in a useful way, yet you don't train on the game at all. It's not like Go, where you only train on Go; it's the opposite. I like this kind of thinking from a capabilities point of view.

Speaker 238:56 - 39:11

对吧？当然还有很多别的测试，但这个我特别喜欢。它以一种有用的方式把 games（游戏）带了进来，同时你又根本不会在这个 game（游戏）本身上训练。它不是像 Go 那样，你只在 Go 上训练；它更像是相反的路子。但从 capabilities（能力）视角来看，我喜欢这种思路。
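
The two abilities described here, following an in-context rulebook and improving while playing, can be measured separately with a small harness. Everything below is a hypothetical sketch, not the Civilization eval mentioned in the conversation. The invented game "Triad", its hidden opponent cycle, and the scripted `pattern_agent` standing in for a model are all assumptions. In a real eval, an LLM would receive `RULES` and `history` in its prompt.

```python
from typing import Callable, Dict, List, Tuple

RULES = (
    "Invented game 'Triad': each round, output one integer 1, 2 or 3. "
    "Your move m beats the opponent's move o when m == o % 3 + 1. "
    "The opponent's strategy is not disclosed."
)

History = List[Tuple[int, int]]        # (agent_move, opponent_move) per round
Agent = Callable[[str, History], int]  # sees the rulebook and the full history


def opponent_move(round_idx: int) -> int:
    return [2, 3, 1][round_idx % 3]  # hidden cycle the agent can discover by playing


def evaluate(agent: Agent, rounds: int = 30) -> Dict[str, float]:
    history: History = []
    illegal, wins = 0, []
    for t in range(rounds):
        move, opp = agent(RULES, list(history)), opponent_move(t)
        if move not in (1, 2, 3):
            illegal += 1  # instruction-following failure
            wins.append(0)
        else:
            wins.append(int(move == opp % 3 + 1))
        history.append((move, opp))
    half = rounds // 2
    return {
        "illegal_moves": illegal,
        "early_win_rate": sum(wins[:half]) / half,           # first half of play
        "late_win_rate": sum(wins[half:]) / (rounds - half),  # second half of play
    }


def pattern_agent(rules: str, history: History) -> int:
    """Scripted stand-in for a model: assumes a period-3 opponent once it has data."""
    if len(history) < 3:
        return 1
    predicted = history[-3][1]
    return predicted % 3 + 1


def constant_agent(rules: str, history: History) -> int:
    return 1  # follows the rules but never adapts


print(evaluate(pattern_agent))
print(evaluate(constant_agent))
```

Comparing early and late win rates within one context separates "can follow the manual" (no illegal moves) from "gets better by playing" (a late win rate above the early one).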

Speaker 139:11 - 39:43

I mean, obviously there's been a lot of effort here. Games were kind of the canonical first example of a verifiable domain, and now you've had the same with coding and math. I wonder whether a big outstanding question in the field is the extent to which we'll see generalization from RL, right? Sometimes these models climb incredibly well on the domain we're RL-ing on, and you'd have better insight than me into whether that then flows through to other aspects of the model. In some ways, it's almost an interesting contrast with what we talked about, the most general, bitter-lesson-type moments.

Speaker 139:11 - 39:43

我的意思是，很明显，感觉这个方向已经投入了很多努力。你知道，games（游戏）某种程度上算是 verifiable domain（可验证领域）的典型早期例子，而现在 coding（编程）和 math（数学）里你也看到了这一点。我在想，这个领域一个很大的悬而未决的问题，可能是我们会在多大程度上看到 RL（强化学习）上的泛化。对吧？感觉有时候这些 models（模型）会在我们正在做 RL 的那个 domain（领域）上爬升得特别好，而你大概会比我更清楚，这种提升是否会进一步流向模型的其他方面。但我觉得，从某种意义上说，这几乎也是个很有意思的问题——你知道，我们刚才谈到过那种最普遍意义上的 bitter lesson（苦涩教训）时刻。

Speaker 139:43 - 39:57

This is a moment of finding data in a particular domain, RL-ing against that data, and improving the model on that one thing. I'm curious: does that feel like a fair characterization of what's happening today? And have you seen signs of generalization?

Speaker 139:43 - 39:57

这有点像是这样一个时刻：在某个特定领域里找到数据，然后针对这些数据做 RL，把模型在这一件事上进一步改进。我很好奇，这样描述今天正在发生的事情，算不算公平？还有，你知道，你是否已经看到了那种泛化（generalization）的迹象？

Speaker 239:57 - 40:48

Yeah. You look hard for sources of hard problems that will induce deep reasoning, and we do actually see that reasoning generalize. Reasoning models learn to reason mostly on, let's say, coding and math, but then you see how they reason about questions on all sorts of topics. I recently moved back to the US, so I asked a lot of questions about moving, taxes, and whatnot. You can see the reasoning is pretty good, and it's very hard to believe it was trained on that kind of question. So we're definitely seeing generalization, and you get creative about finding more data that induces deep reasoning and also deep agentic behavior.

Speaker 239:57 - 40:48

对。你会努力去寻找那些真正困难问题的来源，因为它们确实会诱导出深度推理（deep reasoning），而我们实际上也确实从中看到了泛化。比如，推理模型（reasoning models）主要是在 coding 和 math 上进行推理训练，但随后你会看到它们如何去推理各种别的问题。像我最近刚搬回 The US，所以我问了很多关于搬家、taxes 之类的问题，你能看出它的推理其实相当不错，而且很难相信它曾经专门在这类问题上受过训练。所以我们肯定看到了泛化，而你也会有创造性地去想办法获得更多能诱导出深度推理，以及更深层 agentic behavior（能动式行为）的数据。

Speaker 240:48 - 41:49

Right? Part of the recent improvements we're seeing comes simply from finding those sources. But being limited to verifiability is definitely unsatisfying. For most of the things I want the model to do, I wouldn't be able to write a verifier even if I had all the time in the world, right? Still, there seems to be an asymmetry between creating a solution and evaluating it: evaluating a solution is indeed simpler than creating one. Think of the arguments about, for example, NP-hard problems, where solutions are very hard to create but trivial to verify. That gives me hope that the models themselves will be able to judge, even when there's no fully verifiable way to decide whether a piece of code creates a beautiful or engaging game, and all these kinds of things.

Speaker 2 · 41:50 - 42:15

And I think that's very interesting research, and in practice we're already seeing lots of impact from these kinds of ideas. So the more we do that, the more domains we can train on. The question is, do you even need that, or is focusing on certain math and coding problems enough to induce this meta-capability of being intelligent at problem solving? Right? I don't know.
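One way to picture "training on more domains" is a reward function that uses an exact verifier where a domain has one and falls back to a judge where it does not. This is a hedged sketch: `Task`, `reward`, and `toy_judge` are hypothetical names, and in practice the judge would be a model scoring the response; the length-based `toy_judge` is only a placeholder so the code runs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Judge = Callable[[str, str], float]  # (prompt, response) -> score


@dataclass
class Task:
    prompt: str
    # Exact checker, when the domain has one (unit tests, a known math answer, ...).
    verifier: Optional[Callable[[str], bool]] = None


def reward(task: Task, response: str, judge: Judge) -> float:
    """Scalar reward in [0, 1] for one sampled response."""
    if task.verifier is not None:
        # Verifiable domain: binary, exact signal.
        return 1.0 if task.verifier(response) else 0.0
    # Non-verifiable domain: fall back to the judge, clamped to [0, 1]
    # so a miscalibrated judge cannot emit out-of-range rewards.
    return min(1.0, max(0.0, judge(task.prompt, response)))


def toy_judge(prompt: str, response: str) -> float:
    """Placeholder standing in for a model-based judge; scores by length only."""
    return len(response) / 100


math_task = Task("What is 6 * 7?", verifier=lambda r: r.strip() == "42")
game_task = Task("Write an engaging game.")  # no verifier exists for this one
print(reward(math_task, " 42", toy_judge), reward(game_task, "x" * 50, toy_judge))
```

The design question the speaker raises is whether the judged branch is needed at all, or whether training only on the verified branch already generalizes.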

Speaker 2 · 42:15 - 42:17

I mean, I think it could go either way.

Speaker 1 · 42:17 - 42:19

Do you have a gut instinct one way or another?

Speaker 2 · 42:19 - 42:55

I want to believe that you need to train on a broad distribution, and that should help the model. But the amount of generalization you get, possibly through pretraining, is very strong. So maybe it depends on the level of ambition: superhuman performance, or whatever the upper bound is that these models can achieve. But ultimately, I feel like training as much in distribution as possible seems desirable in machine learning. So that's one of the quests for researchers to crack in the next few months and years.

Speaker 1 · 42:55 - 43:29

One thing a lot of our listeners who are founders or building companies are thinking through is the extent to which they should be doing work at the model layer versus purely building the application on top. There's obviously been a trend of some companies doing their own RL on top of models and saying, hey, there's this specific class of problem we can go solve. Maybe most notably, Cursor, in the coding space, has said, we need to go train our own base model. I'm curious about your intuition on when that makes sense and when it doesn't.

Speaker 2 · 43:29 - 44:29

What I would tell folks is, and we discussed this a little bit, the value of evaluations and of data, basically. Those two are very tied to each other, and there is a huge amount of value there. So even if you don't build your own model, because maybe it's very early stage or you just can't get access to the talent, the resources, all the things, thinking very carefully about how to evaluate progress on whatever you're trying to do will actually be very valuable, and it might even become a standard eval that folks like ourselves adopt or monitor. And, of course, the value of data is immense, given what we were discussing about post-training in particular and the scarcity of enough data to run the kind of months-long Go training that we happily did a few years ago.
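A minimal sketch of what "evaluating progress on whatever you're trying to do" can look like in code, assuming a task with reference answers and exact-match scoring: a fixed eval set that every model version is scored against in the same way. The names and toy models are hypothetical; real evals usually need task-specific scoring rather than exact match.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
EvalSet = List[Tuple[str, str]]  # (input, expected output), kept fixed across runs


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_eval(model: Model, eval_set: EvalSet,
             score: Callable[[str, str], float] = exact_match) -> float:
    """Mean score of one model on a fixed eval set."""
    if not eval_set:
        raise ValueError("eval set is empty")
    return sum(score(model(x), y) for x, y in eval_set) / len(eval_set)


def compare(models: Dict[str, Model], eval_set: EvalSet) -> Dict[str, float]:
    """Score every model on the same items so numbers are comparable over time."""
    return {name: run_eval(m, eval_set) for name, m in models.items()}


eval_set = [("2 + 2", "4"), ("Capital of France?", "Paris")]
answers = dict(eval_set)
models = {
    "always_4": lambda x: "4",
    "lookup": lambda x: answers.get(x, ""),
}
print(compare(models, eval_set))  # {'always_4': 0.5, 'lookup': 1.0}
```

Keeping the item set fixed is what ties evals to data in the speaker's sense: the same curated items that measure progress are also candidate training or post-training data.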

Speaker 2 · 44:30 - 45:21

So I would say that's where the opportunity is, and I know there's a lot of energy in the space in terms of people building. At the same time, building on top of a model, even though model capabilities will keep moving (and, again, I'm obviously not a professional investor or a product person), just focusing on something you truly believe in might create an opportunity for you to have that space at your disposal, understand it, get the users, and reach critical mass. And if it's something that others, let's say the big players, are not focused on, I feel there's a lot of value to be created by specializing the product, even if you don't do any of the other things.

Speaker 1 · 45:21 - 45:56

Though it seems like, certainly in the early days, you specialize the product, build on top of the models, get to some level of scale, and learn the evals. And I think a lot of these companies are starting to try to figure out whether they then use that to post-train a model or to do something else. The obvious trade-off is that as these models generalize and their capabilities improve, these companies are never going to be training across the broad swath of things the largest labs do. So you're probably on a treadmill every two or three months: even if you get slightly ahead of the state of the art, you probably have to keep redoing it.

Speaker 2 · 45:56 - 46:15

Yeah. Perhaps the angle here, and again, it's another topic we discussed: as these models become more capable of learning continually, or of using a knowledge base that is possibly very complex, building that knowledge base for a certain application is another option. It's not like training weights; it's a bit more efficient.
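The contrast between building a knowledge base and training weights can be sketched as retrieval plus prompting: new knowledge is an appended document, not a gradient step. This is a hedged illustration; bag-of-words cosine similarity stands in for the learned embeddings a real system would likely use, and `KnowledgeBase`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Application-specific knowledge kept outside the model's weights."""

    def __init__(self) -> None:
        self.docs: List[str] = []

    def add(self, doc: str) -> None:
        # "Learning" a new fact is an append: no gradient step, no retraining.
        self.docs.append(doc)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = _bag_of_words(query)
        ranked = sorted(self.docs, key=lambda d: _cosine(q, _bag_of_words(d)), reverse=True)
        return ranked[:k]

    def build_prompt(self, query: str, k: int = 2) -> str:
        # The retrieved documents are prepended as context for the model.
        context = "\n".join(f"- {d}" for d in self.retrieve(query, k))
        return f"Context:\n{context}\n\nQuestion: {query}"


kb = KnowledgeBase()
kb.add("Refunds are processed within 14 days.")
kb.add("The office is closed on public holidays.")
kb.add("Premium users get priority support.")
print(kb.build_prompt("How many days until refunds are processed?", k=1))
```

The "uniqueness" the speaker mentions would live in what goes into `docs` and how it is organized for the model, which a competitor cannot copy just by using the same base model.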

Speaker 2 · 46:16 - 46:40

But there might be a lot of uniqueness you can add to it that might protect you from, let's say, someone who hasn't spent a lot of time thinking carefully about how it interacts with current models. And I think that capability will only get better. So perhaps that angle is also a bit more scalable for early players in the game.

Speaker 1 · 46:40 - 46:53

I guess it seems there's such a compelling path forward on so many of the research directions we've talked about. What's the capability you're least sure how to get to from here? Where do you not yet see the research path, even though you think it's pretty important?

Speaker 2 · 46:53 - 47:44

I think I see the research path for quite a few capabilities. The one that's fascinated me the most over the years, especially since I joined DeepMind in 2016, is meta-learning, the ability of models to learn. That's such a beautiful capability when you work on machine learning. That one I feel has a path: there's some base now, and it will keep improving. But perhaps another one, where there might be a path but I'm not sure how practical it is at the moment, is what people mention: can these models truly innovate? I think that part is important because, for instance, when you work on, hey.

Speaker 2 · 47:44 - 48:12

Like, can you come up with new ideas in machine learning, and then we implement them (coding is excellent), deploy them, and so on? Right? Many folks are experimenting with this quite a bit. Truly taking all the knowledge we have now and innovating with taste is hard to come by even for humans; it is fairly special and, to be honest, sometimes random. It's not like, oh, this person is so smart.

Speaker 2 · 48:12 - 48:37

Look, 10,000 people are trying, and you obviously pick the one who was right and then glorify them. Right? So I think that ability is probably quite important for certain things like self-improvement, and yet it's obviously difficult even to evaluate. And when something is hard to evaluate, it probably means it's also hard to climb on.
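The link between "pick the one who was right" and "hard to evaluate means hard to climb on" can be shown with a best-of-N simulation: selection is only as good as the evaluator doing the selecting. This sketch assumes uniformly distributed true quality and Gaussian evaluator noise; `pick_best` is a hypothetical name.

```python
import random
from typing import List


def pick_best(true_quality: List[float], noise: float, rng: random.Random) -> float:
    """Select the candidate with the highest *observed* score; return its *true* quality.

    Observed score = true quality + Gaussian evaluator noise (an assumption).
    """
    observed = [q + rng.gauss(0.0, noise) for q in true_quality]
    best = max(range(len(observed)), key=observed.__getitem__)
    return true_quality[best]


rng = random.Random(0)
attempts = [rng.random() for _ in range(10_000)]  # 10,000 tries, as in the anecdote

perfect = pick_best(attempts, noise=0.0, rng=random.Random(1))  # exact evaluator
noisy = pick_best(attempts, noise=5.0, rng=random.Random(1))    # noise far larger than quality spread
print(f"perfect evaluator picks {perfect:.3f}; noisy evaluator picks {noisy:.3f}")
```

With an exact evaluator the pick is the true best of the 10,000; as noise grows, the pick drifts toward an essentially random attempt, so there is little signal to optimize against.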

Speaker 2 · 48:37 - 48:48

So the ability to innovate in any area, but specifically in science, for example, is a good one where I think more progress is required.

Speaker 1 · 48:48 - 49:02

Obviously, I feel like Move 37 was a canonical example of this in the previous world. Is there anything you've seen recently that feels closest to this? Even before we started recording, we mentioned that OpenAI talked about this combinatorial geometry problem they just solved.

Speaker 2 · 49:02 - 49:43

If I look inward at machine learning, that's kind of the point. I don't think I've yet seen truly outstanding ideas that a model has generated, but I'm sure I will very soon, because there are some insights, and ways in which the models understand how, let's say, a model is being trained, that feel superhuman: mechanistically, these models have access to a bandwidth of information we don't. So that part has been impressive. But I would like to see that level of impressiveness at the idea level as well. And machine learning is the obvious thing I can...

Speaker 1 · 49:43 - 49:43

Of course.

Speaker 2 · 49:43 - 49:48

More accurately evaluate. Right? So, yeah, more to do.

Speaker 1 · 49:48 - 50:10

Yeah. How do you reason about what happens when we get to this level of genuine insight into machine learning research, and this world of recursive self-improvement? I'm curious how you think about what that looks like over time. Even basic questions, like: does the bitter lesson still hold? What happens when we get into that world?

Speaker 1 · 50:10 - 50:12

I'd love to hear you just riff on that.

Speaker 2 · 50:12 - 50:27

There's a certain level of efficiency that will probably be enhanced. I mean, there's a level at which you, as the researcher or engineer, use these tools to enhance your own productivity. We've seen that a lot.

Speaker 1 · 50:27 - 50:37

No, it's always impressive to talk to someone at the cutting edge of their field, and they always say, yeah, the numbers vary, but there's a pretty large percentage improvement in productivity across the board.

Speaker 2 · 50:37 - 50:55

Yeah. So I think that one is already happening and is obviously very powerful, but there are going to be certain, almost physical, limitations on how far this process can keep going. Right? Because models need to be trained, and there are energy and hardware limitations.

Speaker 2 · 50:56 - 51:32

So I'm definitely very keen to see what kinds of problems can be automated, enhanced, and done more autonomously. But at the same time, there's probably going to be a natural limit on the speed at which things can happen, and also a natural upper bound. Right? Certain things... I mean, this was already more than a year ago: someone reflected something back to me that I now feel very much, which is that at the point a model writes English better than you, maybe that's too good, and it shouldn't be that good.

Speaker 2 · 51:32 - 52:14

Right? And I'm like, okay, that's an interesting realization: even if you could improve that capability, and maybe there's no ceiling or the ceiling is still far away, we might not even need to see that ceiling. So there's the performance of the whole system overall, which is already very impressive, and there might be upper bounds, obvious upper bounds in some cases. But yeah, there are physical limits on the models and how you train them. Even if we knew the recipe exactly and could iterate very quickly to train the next generation of models, there is some acceleration, but there are upper bounds and rate limits that are still fairly fundamental.

Speaker 1 · 52:14 - 52:28

Well, I always like to end my interviews with a quick-fire round where I basically stuff in all the broad questions I haven't had time to fit in elsewhere. So maybe to kick things off: what's one thing you've changed your mind on in AI in the last year?

Speaker 2 · 52:28 - 52:57

What I've changed my mind on: even though I want to believe that training on a broad distribution is probably going to enhance the model, training on narrow points of great difficulty, like maths or coding, creates this generalization. That is not something I predicted would work as well as it did.

Speaker 1 · 52:57 - 53:05

I think Demis said at IO that we're at the foothills of the Singularity and that AGI could come in the next few years. Do you feel similarly?

Speaker 2 · 53:06 - 53:21

I feel similarly. Yeah. And I'll say more. Right? Even for someone in the field, close to these models and neural nets in general: if seven years ago... and I'm deliberately using a time that is clearly before everything that happened with LLMs.

Speaker 2 · 53:21 - 53:35

If seven years ago I had experimented with a model like the ones we have now, would I have declared it AGI? Right? I would say probably yes. I mean, it's an ever-moving definition. The progress is very impressive.

Speaker 2 · 53:35 - 54:06

Yeah. So now that we're seeing it closer, it's a good thing to be more ambitious about what it is that we're building. But, again, based on different definitions, or perhaps even the expectations we had about what AGI meant only a few years ago, I would say that in some way AGI is here. Right? All I'm saying is, I don't think it is here in the way I want to see it, but it is fairly close.

Speaker 2 · 54:07 - 54:22

And maybe this ability of the models to truly learn from experience is what's missing in my mind, but everyone will have their own test or bias, I guess, about where the models still have capability gaps.

Speaker 1 · 54:22 - 54:45

And we'll get there, and then we'll move the goalposts again and find some other reason. Yeah. I think one huge advantage you all have is that, besides being incredibly bullish on the models you're building, you have your own hardware. And there's a question I know a bunch of my listeners will have in the back of their heads, so I'll ask it: one thing you've done that a lot of people were curious to better understand was taking some of the compute you have and selling it to Anthropic.

Speaker 1 · 54:45 - 54:55

Right? And there's been this narrative on Twitter: if you're so bullish on models and the research, why not just keep all the compute yourself? I'm sure our listeners would love to hear your perspective on that.

Speaker 2 · 54:56 - 55:14

Yeah. How to invest compute is a question even within ourselves. Compute is used to serve; we train small models, even smaller models, and then we try to train frontier models. I think this is all a fine equation to balance.

Speaker 2 · 55:14 - 56:06

And I think, just in general, one way to think about Alphabet is that there are things that create revenue and economic impact that you can then reinvest. So it's not just being greedy about, hey, what should we do now, take all these things together, and that's it. Right? I think the strategy, which I think about often, is multipronged. And on timelines, although we are of course bullish on the technology advancing, you also think about revenue streams and so on. Hardware is a very important asset, and there's probably a trade-off in which you don't use it all yourself but use it strategically, basically to reinvest.

Speaker 2 · 56:06 - 56:25

Right? And I think that's what currently seems to make sense. Obviously, the calculations behind these decisions are complex, so I'm not going to go into exactly the rationale, but in general it's a strategic choice to have different levels of investment and timelines in mind.

Speaker 1 · 56:25 - 56:42

What's so interesting about your position is that you're the only frontier model provider with your own cutting-edge, state-of-the-art chip. What does that collaboration actually look like? Because it's such a unique motion. Right? Obviously, NVIDIA works closely with other labs, but they're not sitting under the same company.

Speaker 1 · 56:42 - 56:44

And so what does that look like when it works really well?

Speaker 2 · 56:44 - 57:09

As I was explaining before, I reflected on several moments. Right? This was the early days; even deep learning internally at Google still had to be proven out. But I remember it must have been 2013, maybe 2014, when a bunch of us, I think it was me, Geoff Hinton, Jeff Dean, and Ilya, were in a room trying to decide, hey.

Speaker 2 · 57:09 - 57:37

What should the servers have? At the time we obviously had some CPUs and some GPUs, and you're trying to make a guess based on what you know about the research and where the models are going, and you can literally have that impact. But of course there are delayed rewards, because this is just an investment, and it will materialize in data centers only in a few months, if not years. And I thought that was amazing. Right?

Speaker 2 · 57:37 - 58:16

Obviously, it's hard to answer the question. You try to predict what's going to happen in research, and in the early days that was even harder. But I think it's a very privileged position to be able to really influence this, and we certainly do that, especially with Jeff, who's obviously been thinking about the infrastructure quite a bit for basically the whole existence of Google. It's very interesting to think: these models are going this way, and because these investments have a certain latency, being under the same roof and seeing what we see really, really helps.

Speaker 2 · 58:16 - 58:34

And again, I've seen it in the very scrappy early days, and it keeps happening and getting better. And of course, in some way that makes the job easier, but it's still a fascinating choice that has deep consequences for the fate of the company, etcetera.

Speaker 1 · 58:34 - 58:49

Well, this has been a fascinating conversation. I feel like I could talk to you for a long time, but I'd be delaying our path toward AGI. So I want to make sure to leave the last word to you. Anything you'd like to share with our listeners, research you'd like to point them to, anything from IO: the floor is yours.

Speaker 2 · 58:49 - 59:07

I think it's a fascinating time to be doing anything in AI. So if you're a user, use the models. If you're a builder, use the models to build anything you do, even if you think there's no remote connection to AI. So please, play with these models. They're amazing, and they will only get better.

Speaker 1 · 59:07 - 59:23

Awesome. Well, thank you so much. It's been an awesome conversation. I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models, and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this.

Speaker 1 · 59:23 - 59:40

It's a nights-and-weekends project, in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. That's really what ultimately makes this whole thing work. So please consider doing that, and thank you so much for your support and for listening. We'll see you next episode.

Source: https://www.youtube.com/watch?v=NQczevdpxq0