🎙 Podcast: Unsupervised Learning · June 3, 2026 · 13,059 words · about 65 minutes

Ep 89: AI Research Legend’s Honest Assessment of Where We Are

Speaker 1 00:00 - 00:04

Is reasoning enough to get to generalization, or is another method needed?

Speaker 2 00:04 - 00:08

It does feel like there is something else that possibly could generalize much better.

Speaker 1 00:08 - 00:12

Why do you think Anthropic was the first to be, like, really successful on the coding side?

Speaker 2 00:12 - 00:21

Anthropic made this very good decision to focus on coding. OpenAI was like, we're doing ChatGPT. Part of why Anthropic made this decision was that they just could not compete.

Speaker 1 00:21 - 00:26

What's your kind of gut intuition on the gap we'll see between closed source and open source models and whether that widens or shrinks in the next few years?

Speaker 2 00:26 - 00:28

I think it's a fair question, but

Speaker 1 00:28 - 00:56

Łukasz Kaiser is one of the authors of the transformer paper and has had amazing roles at both Google and OpenAI. On Unsupervised Learning, I got to ask him all the top-of-mind questions about what's happening in AI today. Of course, we had to talk about the transformer and how he thinks about its persistence, whether it will remain the dominant architecture, and what its shortcomings are. We also got his thoughts on what changed in the fall to really make coding models so much better and why Anthropic was really first to code. We talked about the future research directions that he's really excited about.

Speaker 1 00:56 - 01:20

And we also hit on a bunch of things around how he thinks the ecosystem will evolve, from open versus closed source models to application companies. I think folks will really enjoy this episode with a top researcher whose research really set off a lot in the space. Without further ado, here's Łukasz. It's a pleasure to have a transformer paper coauthor on the podcast. I feel like you've been at the forefront of so many major changes in the AI world.

Speaker 1 01:20 - 01:27

And, you know, our goal is really to get your thoughts on all the questions around the AI frontiers today. So I really appreciate you coming on the podcast.

Speaker 2 01:27 - 01:29

Thank you very much. Thank you for having me.

Speaker 1 01:29 - 01:51

I can think of kind of no better place to start than generalization. Right? It feels like that's the question in the air right now. And I think in November, I heard you say, you know, basically, this big question of: is reasoning enough to get to generalization, or is another method needed? I'm wondering, I guess you said that, you know, maybe six months ago now, which is, you know, dog years in the AI world, so years ago.

Speaker 1 01:51 - 01:54

How has your thinking on that question evolved since then?

Speaker 2 01:54 - 02:28

If we take the current transformers with reasoning, right, and agents, and they have access to a shell and stuff, they can do amazing things. Right? It's incredible how far we've gotten compared to even two years ago, not to mention before transformers. I would have never believed that, you know, you just take this next-word predictor, give it chain of thought and RL and tools, and that it will... I now spend hours every day talking to Codex, in my case, or other people just do, and it works. Right?

Speaker 2 02:28 - 03:03

You talk to it about hard problems at work, and it makes sense, and it implements things. So that's incredible. On the other hand, there is this feeling that it is not quite us, that it's not quite at the edge of what's... that we all feel it possibly should be even better. Right? That we can generalize from less data, like, somehow make bigger leaps, get these concepts from way less.

Speaker 2 03:04 - 03:25

I recently have this saying that, you know, people say, like, Americans will do the right thing after exhausting all other options, and, like, LLMs, they will learn the concept. They will learn it. But after exhausting all other options. You need this trillion tokens. You need to, like, learn all the surface level things, and only when that doesn't explain something, they will finally learn the concept.

Speaker 2 03:26 - 03:46

That's not how we learn. We just get concepts from, like... sometimes we make them up and they're not great. But it does feel like there is something else that possibly could generalize much better, that could possibly have this, like, a bit of a different form of understanding, more like long term. But it's a feeling. Right?

Speaker 2 03:46 - 04:12

And every time we try to put our thumb on it, it seems to evaporate, or more like, maybe it doesn't even evaporate, but it's like the transformer just catches up. Right? So both sides in this time have grown. Right? Transformers have gotten even better, but the case for something else has also gotten even better, I would say.

Speaker 2 04:14 - 04:37

There's now, like, a number of labs that pursue post-transformers, and people see interesting results. There are certainly interesting things out there. So, you know, who wins? I still don't know, to be honest. I think there's good arguments for both sides, and it will be extremely interesting to see how this goes.

Speaker 1 04:37 - 04:56

I think it'll be interesting for our listeners. You know, you obviously, I think, at a talk [unclear] more recently, like, alluded to this whiff in the air. Right? That there's something that's happening in progress that's inspired, like, these neo labs and other folks to spin out and, you know, work on things that are maybe alternatives to the dominant architectures that are being worked on within the labs. What is that feeling?

Speaker 1 04:56 - 05:03

Is it seeing some of these early results, or what is it like, or is it just, like, researchers' intuition? Like, maybe making it a little more concrete for our listeners.

Speaker 2 05:03 - 05:26

I think a lot of this is intuition. And, you know, you need to be aware, because a lot of this happens in San Francisco at parties and, like, people talk to each other. Or, like, on podcasts. So it may be that it's self-fueled to some extent. But I think there is a part of it that's very fundamental.

Speaker 2 05:26 - 06:12

I mean, Yann LeCun has been saying something like this for years, way before now, which is: the models we have, in the long, long history, they're called neural networks because they were meant to imitate our brain, but they don't really. They were quite different, even if they may have some similarities. Right? And if you look at how humans learn, what we can do, it is quite hard not to say that from much less data, we can do much more than our current models. So it feels like there is this fundamental ability that we as learning machines have that our models currently don't.

Speaker 2 06:12 - 06:43

Fundamentally, there should be something there, not just a vibe. Now you can say as a counterargument that these models always had a trillion tokens to train on and people never do, so we just didn't optimize them for training with less. And if you, you know, if you had the same amount of compute but limited data, you can tweak transformers to do much better than they do today. So, you know, it's like some people say, why would you? Right?

Speaker 2 06:43 - 07:11

We have the data. Now it's a big enterprise, but it does feel, even when we try to push with as little data as people... Well, it's also like we get a lot of data from visual things, from moving in the world, we take actions. It's very different kinds of data. It's not truly comparable. That's why it's hard to make a very firm scientific statement about it.

Speaker 2 07:13 - 07:34

There is this feeling that we have not exploited all that is there in machine learning. Obviously, the exciting feeling is that maybe if we find out what's out there, it could make what we have even more amazing. Maybe not. Maybe it vanishes when you have that much data. Who knows?

Speaker 2 07:34 - 07:45

Right? But it's definitely extremely interesting. To me as a researcher, and I think to many people, it's like... I mean, transformers were fascinating. Right? They're great.

Speaker 2 07:45 - 08:11

Reasoning is... I mean, it can solve research math problems. I'm sure you've heard about the recent Erdős things, of course. I was a mathematician before in my life, so this is extremely exciting. I never thought a computer in this timeframe would talk to me about mathematics at a high level, as a real researcher. That exists now, and this is insane.

Speaker 2 08:11 - 08:26

But then as an ML researcher, I'm like, okay. But we haven't really figured out this learning. There is this feeling, right, that it learns, certainly, but it needs so much data. It needs so much compute. This feels like we're not quite there yet.

Speaker 2 08:26 - 08:37

Now is this only a feeling? Is this a vibe? It seems to be reality to some extent. Right? But we'll need to see.

Speaker 2 08:37 - 08:39

The research appeal of figuring that

Speaker 1 08:39 - 08:55

out makes a ton of sense, and I think other folks might look at it and be like, well, so what if it's not like people? Right? Like, we have the data. We have a method that works. You know, obviously, there's gonna be some areas where there is limited data, like, you know, medic... like, you know, drug development and other things where learning from more limited data would be really helpful.

Speaker 1 08:55 - 09:04

But so many problems that exist in the world actually aren't that data-constrained. Right? Sometimes I feel like these sides almost, like, talk past each other. Right? Like, people at the labs will roll their eyes at Yann LeCun or something like that.

Speaker 2 09:04 - 09:51

I think this is fair to say. But on the other hand, given how quickly, and with the whole investment in AI, the problems that are not data-limited get solved very rapidly. So very soon, all bottlenecks that remain will be quite data-limited, or already are becoming so. In particular, it does feel that to work well in the physical world, you do need to solve some part of it at least, because in the physical world, if you train on one robot hardware, it doesn't quite scale data the way that the virtual or text worlds or Internet worlds do. So, and the physical world is a sizable chunk of... Yeah.

Speaker 1 09:51 - 09:56

So people are certainly trying, right, with simulation data and with egocentric video data or cheaper sources. Yeah.

Speaker 2 09:56 - 10:04

I mean, you know, I'm a huge fan of Waymo's. Right? I always have this joke. Like, people say, where are my self-driving cars? Well, I drive them.

Speaker 2 10:04 - 10:31

They're here. But then they just canceled the highway driving, right, because they couldn't deal with some construction zone again. It feels almost like, you know, they have had these construction zone things for years. I'm sure there's just, like, millions of miles in simulation and quite some in real driving, and it still can't generalize to the construction zone on a highway. This feels just off.

Speaker 2 10:31 - 10:52

I don't know what exactly didn't work there, but I certainly know no teenager has this problem, or no human. We have many other problems, but not that we can drive in a construction zone in the city but not on a highway. A construction zone is just a construction zone. And do

Speaker 1 10:52 - 11:03

you think that, like, some of this stuff will be, or could be, solved within the transformers? And, I guess, what are you kind of looking for in the next few years to get a better answer to this question?

Speaker 2 11:04 - 11:29

The exciting part in ML research is that it is so broad. Right? You never know whether you need to tweak the architecture or do you need to tweak the data or do you need to tweak the loss or do you need to tweak the optimization process. And there are fair arguments for all. And on top of that, it might turn out that you need to tweak all of them to some extent.

Speaker 2 11:29 - 11:43

Right? It's like, the transformer is great, but it's also great with the next-word prediction loss. Right? Or you can make it work with RL, but you need the chains of thought. It's like these puzzle pieces only work when you click them together.

Speaker 2 11:43 - 12:04

So it is possible that, like, if there is a new thing, there might need to be tweaks to everything. But it's also possible that parts of transformers will survive, for example. Probably attention will be somewhere there. Right? But maybe you need to add other things to it.

Speaker 2 12:04 - 12:33

Yeah. Like, maybe... I started my machine learning life with RNNs, and I certainly hold recurrence deep in my heart. I like it as a construct. It feels... and reasoning brought it back, because every new token you produce, it's the same weights that produce it. So in some sense, it's back, but it does feel that this RL has very sparse losses and you do so much.

Speaker 2 12:34 - 12:55

But it works. Every time we try to do recurrence in other ways, it somehow does not seem to click yet. But there is always the question, how hard have we tried? Right? I don't know if you or the audience know there are models like TRM and HRM.

Speaker 2 12:55 - 13:12

It's like very small models that turned out to do very well on problems like Sudoku, but also ARC-AGI. So they're a little bit toy tests, but they do quite well. I think a lot of the post-transformer architectures are trying to merge this with LLMs. It's interesting, certainly. Right?

Speaker 2 13:12 - 13:48

It's like the pure transformer can't do so well on it, but you add some recurrence, you add some bit of architectural tweaks, maybe a little different loss, it does really well. So even on the small scale, you can do a lot, but then will it generalize to the language and give you the things you want? It will be very interesting to see. Luckily, there are a number of labs that are trying. The other thing, though, is this year we have the agents.

Speaker 2 13:48 - 13:59

To me, this is totally... it's the biggest change in the way I work as an ML researcher in, I would say, the last fifteen, twenty years probably.

Speaker 1 14:00 - 14:03

I don't know if you try to quantify it, but how much more productive do you think it makes you?

Speaker 2 14:04 - 14:43

Oh, I can fairly well quantify it, because I tried recently, just on a private machine, to reproduce a bunch of papers, like old papers that I was always quite interested in. Even some of my papers that I lost code for. At least one of them I tried to reproduce before, and I knew it took me about three weeks to get to a runnable state. With Codex, I could get there in two days. So it's about, let's say, a week to a day. Whether that's a 10x or a 5x, maybe I could have been faster back then.

Speaker 2 14:45 - 15:26

But it certainly changes your rhythm, because you can just take on things. It's also that I can just start three things in parallel and let it go. While when I was doing it myself, I would be able to just do one thing usually. So it both makes it faster and makes it more parallel. But, like, I mean, when I do, like, private things, not in the production repo, I basically stopped looking at the code. And a friend asked me, like, do you think you're less sharp now?

Speaker 2 15:26 - 16:04

And I gave it some thought, and I think it's actually to the contrary. Because I don't look at every class name, every small function, but I still know that these agents can go off the rails. Like, once, with this paper, it ran something, and there were some aux losses, and it just thought it should have another auxiliary loss. It was totally off the charts and out of place. So you need to have full control in your head of what exactly it is doing. What is the loss? What is the... But you don't need to have control of, like, what's the name of the class?

Speaker 2 16:04 - 16:52

You know, what are the exact words and the function? It's quite impressive that you can trust the agents to be, you know, trustworthy and, like, that they're really implementing what you think. I mean, sometimes we check, and they are. But since you need to have full control in your head of what's actually running machine learning wise, what are the losses, what are the batches, I feel it gives me actually more mental control over what I'm doing than I had before. Because before, I would implement it, but, you know, sometimes in this time before running it, I would have to, like, forget a little bit about what the big picture was exactly and focus on the little things, debug, then go back to the big picture.

Speaker 2 16:52 - 17:10

By that time, maybe I forgot some detail, and then I would remember it when it was wrong. Now it's this beautiful thing where you can just be in this flow. Like, you just think, machine learning wise, what's supposed to happen. You tell it, verify it, and it's happening. Yeah.

Speaker 2 17:10 - 17:21

So it's not just about the time saved. It just makes the work so nice. It's a mild psychosis, I guess, among researchers. These things, we just can't stop.

Speaker 1 17:21 - 17:34

OpenAI very, you know, publicly said, hey, our goal is kind of, I think, a research-level intern by November of this year. You know, as someone who plays around with Codex all the time in your research, does it feel like you're close to that, or how are you feeling about that milestone?

Speaker 2 17:34 - 17:58

It does feel, like, close to an intern, but you need to be very carefully checking. Like, as I said, it can just add a loss that you did not ask for because it seems reasonable to it. I don't know if interns do that. Maybe sometimes, I guess, when they're creative. But, like, I try sometimes.

Speaker 2 17:58 - 18:17

You know? It's like, I will just let it go for the night, I give it the goal, you know, make a better model for this, lower perplexity. That never works. It will just start doing some very trivial tweaks that are not really interesting or useful. So it's certainly not at the level of a researcher.

Speaker 1 18:18 - 18:20

Yeah. Dude, what's the path forward to make it better there?

Speaker 2 18:20 - 18:41

It goes back to our question. Right? For a long time, I worked on long context in machine learning, even before transformers, you could say, on memory and so on. And then we worked on it with transformers, and, you know, the context got longer. We got, like, a million tokens, which is huge given what attention does.

Speaker 2 18:43 - 19:17

But now with agents, it really does feel like grep, or ripgrep, excuse me, is, you know... our solution to long context is: let's write a bunch of stuff in files and give it access to grep so it can find things, and, you know, tell it to write index files, and it's like a little library. Of course, to me as an ML researcher, if you had told me this five years ago, I'd have said that's not a solution. That's a hack. Right? But, you know, machine learning... I think it's a hack of sorts.
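
As a rough illustration of the "files plus grep plus an index file" idea described in this answer, here is a minimal Python sketch. It is not anything shown or described in detail in the episode: the function names `write_notes` and `grep`, the `INDEX.txt` file name, and the one-summary-line-per-note index format are all hypothetical choices made for the example, and a real agent harness would call an actual tool such as grep or ripgrep rather than this pure-Python stand-in.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple


def write_notes(root: Path, notes: Dict[str, str]) -> None:
    """Store each note as its own file, plus a hypothetical INDEX.txt
    that lists every file name with the first line of its content."""
    index_lines = []
    for name, text in notes.items():
        (root / name).write_text(text, encoding="utf-8")
        first_line = text.splitlines()[0] if text else ""
        index_lines.append(f"{name}: {first_line}")
    (root / "INDEX.txt").write_text("\n".join(index_lines) + "\n", encoding="utf-8")


def grep(root: Path, pattern: str) -> List[Tuple[str, int, str]]:
    """Pure-Python stand-in for grep: return (file name, line number, line)
    for every line in every file under root that matches the regex."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, lineno, line))
    return hits
```

The point of the sketch is only that nothing here touches the model's context window: the notes live on disk, and the agent pulls back just the matching lines when it needs them.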

Speaker 2 19:17 - 19:25

So dropout is a hack. Right? Like, we don't judge. Right? We take what works, and it works amazingly.

Speaker 2 19:25 - 19:42

It works. And you add a little bit of RL, like, for example, for compaction. Like, if there is one reason I like Codex over Claude Code, it is compaction. It can go on with the thread, and it's good at compacting. Why is it good at compacting?

Speaker 2 19:42 - 20:15

I don't think there's anything, like, very mysterious. Right? People prompted it well and then put some RL on it to just make... And if you had told this to me some years ago, that for long context, well, you just RL it a bit so that it can use tools and find stuff in files and then summarize well enough to keep the... I would say, okay, that's a Band-Aid. It doesn't feel like this deep thing, but we don't judge solutions by how they look. We judge them by how they work, and it works really well.
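
For readers unfamiliar with the term, compaction here roughly means replacing the older part of a long thread with a summary so the conversation can keep going. The sketch below is only an assumption-laden illustration of that general idea, not how Codex or any other product implements it: the `compact` function, the word-count budget, the `keep_recent` parameter, and the pluggable `summarize` callback (which in a real system would be a model call) are all hypothetical.

```python
from typing import Callable, List


def compact(
    messages: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """If the thread exceeds `budget` words, replace everything except the
    last `keep_recent` messages with a single summary message."""
    total_words = sum(len(m.split()) for m in messages)
    if total_words <= budget or len(messages) <= keep_recent:
        return messages  # still fits, or nothing old enough to summarize
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent
```

In this framing, the quality of the whole scheme rests on `summarize` keeping the details that matter later, which is the part the speaker attributes to good prompting plus some RL.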

Speaker 2 20:18 - 20:34

To that point of, can it become a researcher? Well, some people would say, well, maybe no. Maybe you'll need this new architecture. Maybe you'll need a post-transformer thing that has concepts that are bigger and follows goals. And it's a fair argument.

Speaker 2 20:34 - 21:03

Right? Currently, it feels like it can't. But then there are other people who say, well, you'll have your conversations with Codex for a month, and then you're gonna prompt it to go over them and find meta-patterns, write those to some files, and just think how it can use them. And maybe if you have some data from over a thousand people and do some RL on it, it will start behaving like a researcher. In some ways, this is how researchers learn.

Speaker 2 21:03 - 21:07

We look at how other people do research. We do our trials, see what works better.

Speaker 121:07 - 21:10

Why doesn't that work today? Like, I'm sure people have tried that.

Speaker 121:07 - 21:10

为什么这在今天行不通呢？我是说，我敢肯定人们已经试过了。

Speaker 221:10 - 21:27

Oh, hey. No. I don't think people have tried very hard yet. You know, it does work, because some people do some problems and they work for them. It is important to... I mean, to me, the Codex era started, like, this year, or at Christmas.

Speaker 221:10 - 21:27

哦，嘿，不。我觉得大家其实还没有非常认真地去尝试。你知道，这之所以会这样，是因为有些人会处理某些问题，而且那对他们来说是有效的。我的意思是，对我而言，Codex 时代差不多是从今年，或者说从 Christmas 开始的。

Speaker 221:27 - 21:34

Right? I mean, Codex existed before, and we used it, and Claude Code existed, and we also used parts of it. But

Speaker 221:27 - 21:34

对吧？我的意思是，Codex 以前就存在了，我们也用过；Cloud Code 也存在过，而且我们也用过其中的一些部分。但是——

Speaker 121:35 - 21:36

I think everyone felt it at Christmas.

Speaker 121:35 - 21:36

我觉得每个人在 Christmas 的时候都有那种感觉。

Speaker 221:36 - 22:01

But it seems like only the newer... but it's not just the models. It's also the harness and some tweaks. So, yeah, it's barely half a year. And there are still many people, if you go a bit outside of, you know, our SF AI bubble, that totally don't get it. They're like, you know, you're a little psychotic, but why?

Speaker 221:36 - 22:01

但看起来好像只有较新的那些——不过这不只是 model（模型）的问题，也包括 harness 以及一些 tweaks（微调）。所以，是的，这其实才不过刚刚半年多一点。而且如果你稍微走出一点，你知道的，我们这个 SFAI 小圈子，仍然有很多人完全不理解这件事。他们会觉得，你知道，你有点精神不太正常，但为什么呢？

Speaker 222:01 - 22:42

Right? I think it's a fair question, but it started working very recently. We don't truly understand it. It was not like a big pre-training changed it that much, even though big pre-trainings came too. When we went from RNNs to transformers, it was very easy to attribute the change to this. Then there was reasoning, which clearly is important, and RL. But the change last winter, or last Christmas, is a little hard to pin down.

Speaker 222:01 - 22:42

对吧？我觉得这是个很合理的问题，但这件事真正开始起作用，其实就是非常最近的事。我们并不真正理解原因。并不是说某一次大规模 pre-training（预训练）让它发生了那么大的变化，尽管大规模 pre-training 也确实来了；但当我们从 RNNs 转向 transformers 时，很容易把这种变化归因到这一点上。可现在我的感觉是，后来出现了 reasoning，这显然很重要，Narel。不过去年冬天或者说上一个 Christmas 发生的变化，就有点难以准确说清到底是由什么导致的。

Speaker 222:42 - 23:01

I mean, the harness changed, and there was a little post-training change, and then new pretrained models came, which, of course, made things better. But it felt like a big jump, and it's not that easy to pin down what did it. So it's a little bit messy. Right? We improve everything all the time.

Speaker 222:42 - 23:01

我的意思是，harness 变了，再加上一些很小的 post-training（后训练）改动，然后新的 pretrained models（预训练模型）也出现了，当然，这些都让事情变得更好了。但是它给人的感觉像是一次很大的跃升，可又没那么容易准确指出到底是什么带来了这种变化。所以这件事多少有点混乱。对吧？我们一直都在不断改进所有东西。

Speaker 223:02 - 24:22

But then because it works and it feels so important, there is also the necessity to just bring it to people, make it work everywhere, promote it. There is competition going on. I think in all of this, people did not truly have the time yet to think, how do you really do this meta level? People are starting. But also, because the meta level is something like you research for a week and then you get some patterns and start applying them, it feels like a rollout for this needs to take weeks. Our current reinforcement learning methods unluckily need to run basically all rollouts on this. If your rollouts are weeks long, then your training starts to be months long, and that all becomes a little impractical, which maybe is an argument for the post-transformer view, that there is something to learn from the human side, because clearly humans can do research over years, and they do this once in their life, or twice.

Speaker 223:02 - 24:22

但随后，因为它确实有效，而且感觉又如此重要，于是也就产生了一个必要性：把它带给更多人，让它在各处都能工作，并去推广它。竞争也在发生。我觉得在这一切之中，人们其实还没有真正来得及去思考：你到底要如何在这个 meta level（元层面）上做这件事？人们开始在想了，但同时它也会给人一种感觉：meta level 好像是那种你研究一周，得到一些模式，然后开始应用它们的东西；可这又让人感觉，这种东西其实需要花上好几周。我们当前的 reinforcement learning（强化学习）方法，不幸的是，基本上需要在这上面跑完所有 rollouts（展开轨迹）。如果你的 rollouts 一跑就是几周，那么训练就会变成几个月长，这一切就会显得有点不切实际。这或许可以被看作一个论据：在 post-transformer（后 Transformer）时代，human side（人类这一侧）可能有一些东西值得学习，因为很明显，人类可以花上数年做研究，而且他们一生中可能也就这样做一次或两次。
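
The point about rollout length can be made concrete with a back-of-the-envelope sketch. It assumes (hypothetically) that policy updates are strictly sequential and that each update has to wait for its rollouts to finish; the function name and all numbers are illustrative and do not come from the conversation.

```python
def training_wall_clock_days(rollout_days: float, sequential_updates: int) -> float:
    """Wall-clock days when each update must wait for its rollouts to complete.

    Assumption (hypothetical): rollouts inside one update run fully in
    parallel, but the updates themselves cannot overlap.
    """
    if rollout_days < 0 or sequential_updates < 0:
        raise ValueError("inputs must be non-negative")
    return rollout_days * sequential_updates


# Illustrative numbers only: 10-minute coding rollouts vs. 2-week research rollouts.
short_task = training_wall_clock_days(rollout_days=10 / (24 * 60), sequential_updates=100)
long_task = training_wall_clock_days(rollout_days=14, sequential_updates=10)

print(f"10-minute rollouts, 100 updates: {short_task:.2f} days")
print(f"2-week rollouts, 10 updates: {long_task:.0f} days")
```

Even with ten times fewer updates, the week-scale rollouts push the run from under a day to several months, which is the impracticality described above.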

Speaker 224:22 - 24:47

Some mathematicians spend twenty years on one problem, that's their magnum opus, and that's it. So they did not have 200 problems twenty years long before to learn from, and somehow they manage. How does this work? It's a fascinating question. Clearly, with some relevance to this, we haven't figured it out.

Speaker 224:22 - 24:47

有些数学家会在一个问题上花二十年，那就是他们的 magnum opus（代表作），然后也就到此为止了。所以，他们在此之前并没有做过 200 个各花二十年的问题来从中学习，但他们 somehow 还是能做到。这是怎么运作的？这是个非常迷人的问题。很显然，我们还没有把这件事搞明白，而它与这里讨论的内容是有一定相关性的。

Speaker 224:47 - 25:05

But on the other hand, now, since a lot of people work with it, we'll gather a lot of data on, like, weeks-to-months-long human work. Someone will run this RL, and it may just turn out that it gets you, yeah, further.

Speaker 224:47 - 25:05

但另一方面，现在既然已经有很多人在做这个，我们会收集到大量这种持续数周到数月的人类相关数据。总会有人去跑这个 RL（强化学习），而结果也可能就是——它确实能把你再往前推进一些。是的，更进一步。

Speaker 125:05 - 25:31

So it's such an interesting point because, basically, you know, like you're saying, as folks were scaling pretraining or as folks were scaling the original sort of reasoning models, it was straightforward, or at least it made sense, like, the vector you were scaling on. And then with this kind of big advance we've had in Codex and Claude Code over Christmas, you don't actually know what the source of that is, or you're not fully crystal clear on it, so it's very hard to then determine what you should be pushing on to continue to improve these capabilities.

Speaker 125:05 - 25:31

所以，这真是个特别有意思的点，因为基本上，就像你说的，当人们在扩展 pretraining（预训练）时，或者在扩展最初那类 reasoning models（推理模型）时，你大致是清楚的——或者至少是说得通的——知道自己是在沿着哪个 vector（方向）去扩展。然后，像我们这次在 Christmas 期间在 codex 和 clogged code 上看到的这种重大进展，它的来源你其实并不真正清楚，或者说你并没有完全看得非常透彻；这样一来，就很难判断接下来到底应该推动什么，才能继续提升这些能力。

Speaker 225:31 - 25:59

Yes. It's a little confusing. But I mean, the fact that I don't know doesn't mean nobody knows. I think maybe some people have stronger opinions on what exactly pushed it through, but I don't think it's that clear at this point. Yeah. I mean, it's been improving for a while, but something happened.

Speaker 225:31 - 25:59

对，是有一点让人困惑。但我的意思是，我不知道并不代表没人知道。我想，也许有些人对到底是什么把它推进过去这件事有更强烈的看法；只是我觉得，就现阶段而言，这还没有那么清楚。是的。它——我的意思是，另外一点是——它已经持续改善了一段时间了，但确实发生了某件事。

Speaker 225:59 - 26:07

Yeah. Because, yeah, it did not feel possible to do this, and now it does. On this kind

Speaker 225:59 - 26:07

是的。因为，是的，之前感觉这件事根本做不到，而现在看起来可以了。在这种——

Speaker 126:07 - 26:41

of current scaling regime on the RL side, I think one question a lot of folks have is, we've seen, you know, obviously, tons of coding improvement, and math, and these kind of, like, verifiable domains. And I feel like the two big questions around RL continue to be, like, you know, how well is this gonna work on the nonverifiable side? And then also, you know, the extent to which we'll get generalization and not have to keep, you know, doing tons of data in each space. Maybe we'll take them one at a time. But starting with the first, you know, how do you think about the problems that need to be solved on the nonverifiable domain side and, you know, any inklings as to which spaces might be next beyond code and math?

Speaker 126:07 - 26:41

——当前的 scaling regime（扩展范式）下，在 RL（强化学习）这一侧，我觉得很多人都有一个问题：我们已经看到，显然，coding（编程）方面有大量提升，math（数学）以及这类可验证的 domain（领域）也一样。我感觉围绕 RL 的两个大问题仍然是：第一，在 nonverifiable（不可验证）这一侧，它到底会有多好用？第二，我们究竟能在多大程度上获得 generalization（泛化），而不必在每一个空间里都持续投入海量数据。也许我们可以一个一个来谈。但先从第一个开始，你怎么看 nonverifiable domain 这边需要解决的问题？以及，你是否对除了 code 和 math 之外，接下来可能推进到哪些领域，有一些初步判断？

Speaker 226:41 - 27:05

I do think there's been fair progress on the nonverifiable side. If you look, for example, at things like Harvey in law, or things in medicine, they're not verifiable, but there are a lot of parts of them that are verifiable. Right? So there's been good progress on that. And I think, I mean, GDPval is one benchmark that... Yeah.

Speaker 226:41 - 27:05

我确实认为，在 nonverifiable（不可验证）这一侧已经有了相当不错的进展。比如说，如果你看 Harvey 和 Law，或者医学里的事情，它们并不是可验证的，但其中有很多部分其实是可验证的。对吧？所以在这方面已经有了不错的进展。而且我觉得——我的意思是，GDP value 是一个 benchmark（基准）——对。

Speaker 227:05 - 27:22

In some sense it benchmarks things like that too. And I do think there is really good progress, and there are really good incentives to make progress in these domains. I'm not sure if it's fully fair to call them nonverifiable.

Speaker 227:05 - 27:22

从某种意义上说，benchmarks（基准测试）之类的东西也在做这件事。而且我确实认为，在这些领域已经有非常好的进展，也有很强的激励去推动进展。我不确定把它们称为“不可验证”是否完全公平。

Speaker 127:22 - 27:26

They're certainly not as perfectly set up as coding and math. Right?

Speaker 127:22 - 27:26

它们当然没有像 coding 和 math 那样被设置得那么完美，对吧？

Speaker 227:26 - 27:38

They're not coding and math. Right? But I think people overstate how verifiable math is. Like, coding is fairly verifiable in the sense that, like, programming competitions are verifiable. Yeah.

Speaker 227:26 - 27:38

它们确实不是 coding 和 math，对吧？但我觉得，人们有点夸大了 math 的可验证性。比如说，coding 在某种意义上确实比较容易验证，像 programming competitions 就是可验证的。对。

Speaker 227:38 - 28:11

Once you go to, like, front-end coding and stuff, it's also not that verifiable. But still, the math, the proofs are not that easy or clean. I mean, you can do Lean, but most of the math, at least from GPTs, is not formalized, so it's not that verifiable. So it's a spectrum, and then things get less and less verifiable. I had this pet project of translating poetry into Polish, which seems fairly not verifiable. But then you run these models as verifiers, and, you know, they get a fair bit of stuff.

Speaker 227:38 - 28:11

但一旦到了 front end coding 之类的东西，它其实也没那么容易验证。不过 math 也是，proofs 并没有那么简单或干净。我的意思是，你可以用 Lean，但大多数 math，至少从 GPTs 产出的这些来看，并没有被 formalized（形式化），所以也没那么可验证。所以这是一个 spectrum（光谱），然后事情会变得越来越不可验证。我之前有个 pet project，就是把诗歌翻译成 Polish，这看起来就相当不可验证。但你也可以让这些 models 充当 verifiers（验证器），然后你会发现，它们其实能抓住不少东西。

Speaker 228:12 - 28:45

They get, like, rhyme and things, and they can get cultural references. So it turns out once you read how people have verified things before, you can get to some level of verifiability. But then, I mean, I think what this poetry thing was also meant to show is that you can verify a lot of things and still have kind of no taste. And that, I mean, since it's not verifiable, it's not so easy. You know, if it were easy to describe in words

Speaker 228:12 - 28:45

它们能抓到 rhyme（押韵）之类的东西，也能识别 cultural references（文化典故）。所以结果是，一旦你去看人们以前是怎么验证这些东西的，你就能把事情推进到某种程度的 verifiability（可验证性）。但我的意思是，我觉得这个 poetry 的例子，以及它本来想说明的点，也在于：你可以验证很多东西，但仍然可能完全没有 taste（审美）。而这一点——因为它不可验证——就没那么容易处理。你知道，如果它能很容易用语言描述出来，

Speaker 128:45 - 28:46

Yeah. Then it would be verifiable.

Speaker 128:45 - 28:46

对，那它就会是可验证的。

Speaker 228:46 - 29:15

But it doesn't mean it isn't there. Right? You read this and there is something in your brain that reinforces this idea that there is something they're missing. You know, we have driven ourselves into this hole basically on purpose, because what is reinforcement learning? It tells you, whenever you have a teacher, a validator, someone telling you this is good, this is bad, I can train against it, and I'm gonna get good.

Speaker 228:46 - 29:15

但这并不意味着它不存在，对吧？你读到这些东西时，你的大脑里确实会有某种反应，强化这样一种想法：它们就是缺了点什么。你知道，我们其实是几乎故意把自己逼进了这个坑里，因为 reinforcement learning（强化学习）是什么？它告诉你，只要有一个 teacher、一个 validator（验证者）、一个会告诉你这好、那不好的角色，我就可以针对它训练，然后我就会变强。

Speaker 229:16 - 29:38

And that's what the models do. So, you know, whenever I come and say, look, I don't think it does this very tastefully in something, then someone will say, okay, show me, and then it will nail it. I think some people even run runs that go basically against... like, for image generation, you can ask, is this beautiful or not? Okay.

Speaker 229:16 - 29:38

而 models 做的正是这个。所以你知道，每当我站出来说，你看，我觉得这东西在某方面做得不够有 tastefully（品味/审美感），就会有人说，好，那你证明给我看，然后它往往就能做得非常到位。我觉得有些人甚至在做基本上类似这样的 runs（实验轮次）：比如在 image generation 里，你可以直接问，这张图美不美？好。

Speaker 229:38 - 29:51

Not verifiable. But you just get a bunch of people who during training click, this is beautiful, this is not. And lo and behold, the images start to be more beautiful. So the verifiability thing is very weak.

Speaker 229:38 - 29:51

不可验证。但你基本上只是让一群人在训练过程中不断点击“这个很美”“这个完成得好”，结果果不其然，图像就开始变得更漂亮了。所以所谓可验证性这件事其实是很弱的。
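
One way to picture the click-based training signal described above is as a scalar reward built from rater votes. The sketch below is hypothetical: the function name, the additive smoothing, and the vote counts are illustrative assumptions, not a description of how any lab actually aggregates such feedback.

```python
def click_reward(likes: int, dislikes: int, prior: float = 1.0) -> float:
    """Smoothed fraction of raters who clicked 'beautiful', strictly between 0 and 1.

    The additive prior (a hypothetical choice) keeps images with very few
    votes near 0.5 instead of jumping straight to 0 or 1.
    """
    if likes < 0 or dislikes < 0 or prior <= 0:
        raise ValueError("votes must be non-negative and prior positive")
    return (likes + prior) / (likes + dislikes + 2 * prior)


# Illustrative votes: an unrated image, a single like, and a well-rated image.
no_votes = click_reward(0, 0)
one_like = click_reward(1, 0)
many_likes = click_reward(9, 1)
print(no_votes, one_like, many_likes)
```

An image with a single vote stays closer to the neutral value than one with ten votes, so sparsely rated samples carry less signal than heavily rated ones.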

Speaker 229:51 - 30:04

Right? It's just a very sparse signal when you ask people, is this nice? Is this not nice? Which then, you know, why do I think this is not very tasteful? Right?

Speaker 229:51 - 30:04

对吧？你能得到的其实只是一个非常稀疏的信号：你可以问人们，这个好吗？这个不好吗？可接下来你就会发现，我们该怎么理解，比如说，为什么我会觉得这个东西不太有品位？对吧？

Speaker 230:04 - 30:40

Clearly, some of my experiences, and some way that I have processed them, make me say this statement. Now, so why does the model not say it? There are two possibilities. One is that it hasn't seen enough experience that would make it do it, and the other is that it's not processing it in the right way. I believe in both, actually, but even with the way it is processing, if you just put in more experience, you ask a thousand people to tell it, then it gets better.

Speaker 230:04 - 30:40

很明显，是我的一些经历，以及我处理这些经历的某种方式，让我会说出这样的判断。那为什么模型说不出来呢？有两种可能。一种是它还没有见过足够多能让它形成这种判断的经验；另一种是它没有用正确的方式去处理这些经验。其实这两点我都相信。不过即便在它现有的处理方式下，如果你只是给它更多经验，让一千个人去告诉它，那它也会变得更好。

Speaker 230:41 - 31:04

So every hole you have, you can kind of plug by hammering on it. But it would be so nice if you didn't have to. Right? Because also, every hole you plug stops being a bottleneck, and then the bottlenecks that emerge are again the holes that you have not plugged. And we're in this interesting circle.

Speaker 230:41 - 31:04

所以，你现有的每一个漏洞，某种程度上都可以靠反复硬砸的方式把它补上。但如果根本不用这么做，那该有多好，对吧？因为每补上一个漏洞，它就不再是瓶颈了，接着冒出来的新瓶颈，又会变成那些你还没有补上的漏洞。于是我们就陷在这样一个很有意思的循环里。

Speaker 231:05 - 31:16

But if we had this method, this brain like method that would just not have so many holes that need plugging, wouldn't this be great?

Speaker 231:05 - 31:16

但如果我们有一种这样的方法，一种像大脑一样的方法，不会有这么多需要去补的漏洞，那不是很好吗？

Speaker 131:16 - 31:42

Does that kind of imply that any problem area that someone does focus on under the current architectures can be figured out? It's just that, to your point, it probably requires curated data and is far more manual than, you know, a potentially more beautiful way of doing things down the line. But there's not, like, a set of problems or a set of domains where you're like, god, under the current RL methods, you know, that would be too hard for the models.

Speaker 131:16 - 31:42

这是否有点意味着，在当前这些 architecture（架构）下，只要有人愿意专注去做，任何问题区域其实都能被解决？只是按你的说法，这大概率会需要更多精心策划的数据（curated data），也会更加依赖人工，而不是说，未来可能会出现一种更优美的做法。但并不是说会有某一类问题、或者某一些领域，会让你觉得，天啊，在当前的 RL（强化学习）方法下，这对模型来说会难到做不了。

Speaker 231:42 - 31:56

It does not feel so. But yeah, you do need to take economics into account. Right? I mean, currently, to make these models work really well, you need to start from a fairly strong model, which is fairly big and expensive.

Speaker 231:42 - 31:56

我的感觉倒不是这样。不过，是的，你确实得把 economics（经济性）考虑进去。对吧？我的意思是，目前要让这些模型真正表现得很好，你得先从一个相当强的模型开始，而这样的模型通常都相当大，也相当昂贵。

Speaker 231:57 - 32:18

On top of this, it's usually closed, so you can't really do it. I mean, there is the RL fine-tuning API, which I quite like, from OpenAI, and some similar ones, but you don't truly have full access to it. Even with the API, it can be a little hard. And even on top of that, the investment you'll need to make into the data and things, it's substantial. Right?

Speaker 231:57 - 32:18

除此之外，它通常还是 closed（封闭）的，所以你实际上并不能真正自己去做。我的意思是，虽然有 RL fine-tuning API，我还挺喜欢 OpenAI 的这个，还有一些类似的产品，但你并没有真正获得对它的完全访问权限。即使用 API，做起来也还是会有点难。再加上你还需要在数据以及其他方面投入不少资源，这个成本是相当可观的，对吧？

Speaker 232:18 - 32:38

You couldn't do this by yourself. You'd need a company. You'd need some contracts. Which, you know, if that's important enough, it's a fair method. But then, right, wouldn't it be great if you could just talk to the model and it would work on its own?

Speaker 232:18 - 32:38

你、你在美国没法这样做。你、你需要一家公司。你需要一些合同。你还需要——你知道的——如果这件事足够重要的话，那也是一种合理的方法。但接着，对吧，如果、如果、如果你可以直接跟 model（模型）说话，而它自己就能运转起来，那不是很好吗？

Speaker 132:38 - 33:04

Does it feel like there are any signs of, you know, general capability improvement? You could imagine a world where it's like, okay, we'll start with code, and then we'll do math, and then we'll, you know, do this for legal and health care. And you could tackle each of these one by one even if you're not getting any sort of generalization across. Or, you know, ideally, I think that maybe the hope would be that at some level of having done reinforcement learning on a bunch of different domains, maybe similar to pretraining, at some level, like, generalization emerges or something.

Speaker 132:38 - 33:04

你觉得有没有任何迹象表明，整体能力在提升？因为你、你、你可以想象这样一个世界：好吧，我们先从代码开始，然后做数学，然后再把这个用到法律和医疗保健上。即使完全没有任何跨领域的泛化，你也可以把这些一个一个攻下来。或者，理想情况下，我想也许大家的希望是：在很多不同 domain（领域）上做过 reinforcement learning（强化学习）之后，也许在某个地方——甚至追溯到 pretraining（预训练）的某个层面——泛化会涌现出来，之类的。

Speaker 233:04 - 33:07

But I think generalization emerges in reinforcement learning.

Speaker 233:04 - 33:07

但我认为，泛化确实会在 reinforcement learning（强化学习）中涌现。

Speaker 133:07 - 33:09

So you think already the models get better across the board?

Speaker 133:07 - 33:09

所以你的意思是，模型其实已经在各个方面都变得更好了？

Speaker 233:09 - 33:42

Oh, yes. They certainly do. Like, I think law is simply not in the RL pipeline at all. And you talk to Harvey or someone and they say, like, it either emerges or they need, like, a little training, like, just a few touches on top of it, and it suddenly catches it. So there is definitely generalization, but the generalization doesn't seem to go as far as... or, like, yeah.

Speaker 233:09 - 33:42

哦，是的。它们、它们、它们、它们当然是会的。比如说，如果你看——我觉得 law（法律）根本就不在 RL pipeline（强化学习流程）里。但你去和 Harvey 或其他人聊，他们会说，这种能力要么会自己涌现出来，要么他们只需要做一点点训练——就是在上面稍微补几下——它就突然会了。所以，泛化肯定是存在的，但是、但是这种泛化似乎并没有走得那么远，或者说，像、对。

Speaker 233:42 - 33:57

It just, like, works in not the ways that we would hope. Yeah. Sometimes, like, it doesn't generalize even from math to other areas of math. If you look at even the IMO. Right?

Speaker 233:42 - 33:57

它就是、就是会以一种并非我们所希望的方式起作用。对。有时候，它甚至不会从数学泛化到数学里的其他领域。你看，哪怕是 IMO（International Mathematical Olympiad，国际数学奥林匹克）也是这样。对吧？

Speaker 233:57 - 34:14

Like, now it seems so long ago that models aced the IMO. But there would be, like, types of exercises. For a long time, it was geometry that it just couldn't crack. It would solve very hard problems in other domains. But in geometry, we were like, oh, okay.

Speaker 233:57 - 34:14

比如，现在看起来，model（模型）离 IMO 还很遥远。但它、它、它会有某些类型的题目。很长一段时间里，geometry（几何）就是它始终攻不破的。它能做非常难的——它能解出其他领域里非常难的问题。但一到几何，我们就会说，哦，好吧。

Speaker 234:14 - 34:41

It has no spatial understanding. Then it just saw more data and started cracking it, but not spatial understanding data or physical data, just smart geometry problems. But it has this jaggedness. Right? It will generalize from here to here, but not to something that seems very close, but that somehow, in this representation of these chains of thought, is not. Like, it's close to me, but it's not close to the model.

Speaker 234:14 - 34:41

它没有 spatial understanding（空间理解）。后来它只是看了更多数据，就开始攻克了，但不是空间理解数据，也不是 physical（物理）方面的数据，就只是聪明的几何题而已。但是、但是它有这种锯齿感。对吧？它会从这里泛化到那里，却不会泛化到某个看起来非常接近的东西；但不知为何，在这些 chains of thought（思维链）的表征里，那并不接近——对我来说它很接近，但对 model（模型）来说并不接近。

Speaker 234:41 - 35:08

Right? So it's not like it's not generalizing. It's generalizing, but in its weird alien way, and that just doesn't cover some ways that I can generalize. And you know, it's possible that with more data, it would just cover more of this space. But I also understand people who say, like, you know, when it's like that, it's very hard to trust, like, to commit to it.

Speaker 234:41 - 35:08

对吧？所以这并不是说它不能泛化（generalize）。它会泛化，只是用一种很古怪、很像外星人的方式来泛化，而那种方式就是覆盖不到我能够泛化的一些情况。你知道，也有可能如果给它更多数据，它就会覆盖这个空间里更多的部分。但我也理解那些会说“像这样的话，就很难信任它、很难真正投入使用”的人。

Speaker 235:08 - 35:38

Like, because you know there may be the spike that it just hasn't gotten, so you need to be on the lookout for problems. And as I use it, as a researcher, I think it keeps me very honest, it keeps me sharp, so maybe this is good in this way. But it's not good in the capabilities way, because you just hope that it doesn't have these sharp edges. Right? For now,

Speaker 235:08 - 35:38

比如说，可能只是因为你知道，它存在某个还没学到的尖峰情况，所以你必须时刻留意问题。对我来说，在使用它时，作为研究者，我觉得它会让我保持非常诚实，我觉得自己需要这样；它也会让我保持敏锐，所以也许从这个角度看这是件好事，但从能力（capabilities）的角度看这并不好，因为你真正希望的是它没有这些尖锐边缘（sharp edges）。对吧？至少目前，

Speaker 135:38 - 36:06

it does. You mentioned some of the application companies, obviously, that benefit from these models getting better. And I think there's, like, this big question of, if you're an application company right now, should you be working super closely with one of the labs and sharing kind of all these evals and, like, the understanding you have of the domain? Or, you know, are you actually better off kinda building almost your own model based on that information versus, you know, sharing it back? I'm curious how you think about, like, the room for, you know, applications on top of the core models.

Speaker 135:38 - 36:06

它是有的。你提到了一些应用公司（application companies），显然，这些公司会从模型变得更强中受益。我觉得这里有一个很大的问题：如果你现在是一家应用公司，你是不是应该和某一家 lab 非常紧密地合作，分享各种 evals（评测）以及你对这个领域的理解？还是说，实际上你更适合基于这些信息自己去构建几乎属于自己的模型，而不是把这些东西再反向分享回去？我很好奇你怎么看：在核心模型（core models）之上做应用，这里面到底还有多大空间。

Speaker 236:06 - 36:37

For now, what is certainly true is that the bigger and better your pretrained model is, the fewer of these sharp edges you get and generally the easier all of your life becomes. Whether you do RL on it or fine-tuning, on whatever bigger model, things just get easier. It's insane how this has continued to be the case. I don't know if you remember, like, a year ago, two years, people were saying, oh, SLMs are the future.

Speaker 236:06 - 36:37

目前可以确定的一点是：你的 pretrained model（预训练模型）越大、越好，这类尖锐边缘就越少，而且总体上你的所有事情都会变得更容易。不管你是在上面做 RL（强化学习），还是在某个更大的模型上做 fine-tuning（微调），事情都会变得更简单。很疯狂的是，这一点居然一直都还成立。我们，我们也不知道。你记得吗，大概一年前、两年前，人们还在说，哦，LLMs（大语言模型）已经那样了，SLMs（小语言模型）才是未来。

Speaker 236:37 - 36:49

Small models. And we have amazing small models. Like, the ones that recently were, like, a few billion. I remember with GPT-3 people said, oh, you get no zero-shot learning under 100 billion. No.

Speaker 236:37 - 36:49

小模型。我们现在也确实有很棒的小模型。比如最近那些，大概只有几十亿参数。我记得在 GPT-3 的时候，人们还说，哦，你要做 zero-shot learning（零样本学习），模型规模得不到 100,000,000,000 是不行的。不。

Speaker 236:49 - 37:11

Yeah. We have, like, you know, 3B models that are so... so that's all amazing. But if you really want to solve big problems easily, adjust to your data and context, yeah, there just doesn't seem to be anything like a really, you know, elephant model. But they're, of course, expensive and hard to use and even harder to train.

Speaker 236:49 - 37:11

对。我们现在有，比如说，3B 模型，而且它们真的非常非常——所以这当然都很了不起。但如果你真的想轻松解决大问题，并且适配你的数据和上下文，嗯，似乎还是没有什么东西能替代那种真正的、你知道、elephant model（大象级模型）。但它们当然也很昂贵，而且很难用，甚至更难训练。

Speaker 237:11 - 37:11

The one thing

Speaker 237:11 - 37:11

有一件事——

Speaker 137:11 - 37:40

I think would be interesting for our listeners is, I think something that's maybe less obvious to folks outside the cutting edge is just what's enabled by new generations of hardware. Right? And so I wonder if you could speak a little bit to that. I mean, you know, obviously, it seems like for certain things, you know, as we waited for, like, Blackwell chips to come online, it's like, hey, they came online and, like, the models got better. And it's always hard to tell how much of that is just, yes, you could now do lots of things on the hardware you couldn't do before, and how much of that was just timing correlation.

Speaker 137:11 - 37:40

我觉得对我们的听众来说，一个可能会很有意思的点是：对前沿领域之外的人来说，也许没那么显而易见的一件事，就是新一代硬件到底带来了什么能力，对吧？所以我想请你稍微谈谈这个。我的意思是，你知道，很显然，对某些事情来说，随着我们等待像 Blackwall chips 这样的芯片上线，感觉就像是：嘿，它们一上线，模型也变好了。但一直很难分辨，这里面到底有多少真的只是因为，现在你终于可以在这些硬件上做很多以前做不了的事；又有多少其实只是时间上的相关性而已。

Speaker 137:40 - 37:47

But maybe just speak to that, and I think it's kinda relevant to this conversation around, like, are these architectures just gonna get better as the hardware just gets better?

Speaker 137:40 - 37:47

不过也许你可以谈谈这个，我觉得这和我们刚才讨论的内容挺相关的，比如说：随着硬件不断变强，这些架构是不是也只会越来越好？

Speaker 237:47 - 38:03

I mean, hardware gets better, and hardware, you know, it's easy. It's flops and memory access. Right? So you need memory fast enough to feed the flops. But it's a very simple, call it, performance.

Speaker 237:47 - 38:03

我的意思是，硬件确实在变好，而硬件这件事，你知道，其实很直接。就是 flops 和 memory access（内存访问），对吧？所以你需要足够快的内存来喂给这些 flops。但这本质上是一个非常简单的、姑且称之为性能的问题。

Speaker 238:03 - 38:30

And recently I got a personal computer, one for myself, and I bought a 5090 GPU. And it felt like, oh, you know, it's one GPU and, like, it's under your desk. What can this do? So it can do a little bit of some tests, and it's just insane to think. So the 5090, it's about 200 teraflops.

Speaker 238:03 - 38:30

我最近给自己买了一台 personal compute computer（个人计算机），然后买了一张 fifty ninety GPU。感觉就像，哦，你知道，就一张 GPU，而且就在你桌子底下。这能干什么呢？结果它其实能做一些测试，而真正让人觉得疯狂的是：这张 fifty ninety，大概有 200 teraflops。

Speaker 238:30 - 39:18

I mean, it says 400, but some are turned off on BF16. So the GPUs we researched the transformer on, they had nine teraflops. We had eight-GPU machines, so in absolute scaling, you could say it would be like seventy, eighty teraflops for real on a machine. So now I have under my desk something that's like five of these machines in one GPU, which is much more convenient. But I think we used around 10 or something. So you could do all of the transformer research on this few-thousand-dollar GPU under your desk that, you know, you could have in your kitchen. Like, it's a normal little tower.

Speaker 238:30 - 39:18

我的意思是，参数上写的是 400，但在 BF16 上有一部分是关掉的。所以当年我们用来研究 transformer 的那些 GPU，单卡只有 9 teraflops，我们当时用的是 8-GPU 的机器，所以按绝对规模算，一台机器真实大概也就是 70、80 teraflops。可现在我桌子底下这一台设备，单张 GPU 的算力就差不多相当于那样 5 台机器，而且方便得多——不过我记得我们当时好像总共用了 10 台左右。所以你现在完全可以用一张几千美元、放在桌子底下的 GPU 来做当年全部的 transformer 研究，甚至你把它放在厨房里都行。就是，它真的只是一个普通的小塔式主机。
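
The comparison above is simple arithmetic, and a small sketch makes the ratios explicit. It uses only the round peak-throughput figures quoted in the conversation; the variable names are illustrative, and real sustained throughput would differ from these peak numbers.

```python
# Peak-throughput figures as quoted in the conversation, in teraflops.
OLD_GPU_TFLOPS = 9             # per GPU used for the original transformer research
GPUS_PER_MACHINE = 8           # "eight-GPU machines"
DESK_GPU_BF16_TFLOPS = 200     # "about 200" for the desktop GPU in BF16
DESK_GPU_LISTED_TFLOPS = 400   # the listed "400" figure

machine_tflops = OLD_GPU_TFLOPS * GPUS_PER_MACHINE

# How many old 8-GPU machines one desktop GPU corresponds to, under each figure.
ratio_bf16 = DESK_GPU_BF16_TFLOPS / machine_tflops
ratio_listed = DESK_GPU_LISTED_TFLOPS / machine_tflops

print(f"old machine: {machine_tflops} TFLOPS")
print(f"desktop GPU vs. one machine (BF16 figure): {ratio_bf16:.1f}x")
print(f"desktop GPU vs. one machine (listed figure): {ratio_listed:.1f}x")
```

On these numbers, one old machine comes to 72 teraflops, in line with the "seventy, eighty" estimate. The "five of these machines" remark matches the listed 400 figure more closely than the 200 BF16 figure, which gives a ratio under three.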

Speaker 239:19 - 39:35

And, oh, okay, it's a few years. It's not even a decade, though. It's quite amazing what they can do. Now we run everything in BF16, but of course you can go even lower in precision, especially with MoEs, and then you pack more.

Speaker 239:19 - 39:35

而且，哦，好吧，这才过去几年，甚至还不到十年。他们能做到这些真的相当惊人。现在我们什么都用 BF16 来跑，但当然你还可以把 precision（精度）继续压低，尤其是在 MOEs 上，这样你还能塞进更多内容。

Speaker 239:36 - 40:11

In inference, this is amazing. Our ability to run these models has dramatically increased, and it increases the things you can research. You can now run them in so many interesting ways. Now, it does give you the ability to just... oh, and also there are more GPUs in the world, like, the big labs are building out. You can train huge models on a huge number of very fast GPUs.

Speaker 239:36 - 40:11

在 inference（推理）方面，这太惊人了。我们运行这些模型的能力已经大幅提升，而这也扩大了你能研究的东西。现在你可以用很多非常有趣的方式去运行它们。当然，这也确实给了你那种“直接就能上手”的能力——而且世界上的 GPU 数量也更多了，比如那些大型实验室都在持续扩建。你可以用大量速度极快的 GPU 来训练超大的模型。

Speaker 240:12 - 40:49

NVIDIA has kept the pace, TPUs at Google have kept the pace. They're really speeding up very quickly, and their numbers are growing, and then it's a very parallelizable process. So we can now train much bigger models much faster. That is amazing. I do still think that the even more interesting thing is that we can do more research. And, like, I remember when I was joining Google, people were talking about, like, how many flops do you need to do something like the brain.

Speaker 240:12 - 40:49

NVIDIA 一直在跟上这个节奏，Google 的 TPU 也是一样。它们的速度提升得非常快，数量也在增长，而且这本来就是一个非常 paralyzable（可并行化）的过程。所以现在我们可以更快地训练大得多的模型。这很了不起。不过我仍然觉得，更有意思的其实是我们现在能做更多研究了。就像——我记得当年我刚加入 Google 的时候，人们还在讨论，类似要做出像大脑那样的东西，到底需要多少 flops。

Speaker 240:49 - 41:11

Right? Yeah. And it's a very vague question, because to really simulate a neuron in the brain, it's maybe impossible, maybe it still takes very much. But people for decades have been doing these estimates, and they always felt like somewhere between one and 100 petaflops. And I remember back then we were like, okay, so this is going to take a few decades for us to get there.

Speaker 240:49 - 41:11

对吧？是的。而这其实是个很模糊的问题，因为如果真要精确模拟大脑里的一个回路，也许是不可能的，也许仍然还差得很远。但几十年来，人们一直在做这类估算，而结果似乎总是在 1 到 100 petaflops 之间。我记得那时候我们还在想，好吧，那看来我们还得再过几十年才能达到这个水平。

Speaker 241:11 - 41:34

Now you can buy a single GPU. So that is quite insane. You have this one thing, and then of course you can, on the cloud, get machines with many of them. So potentially, you can, you know, run like a year's worth of human processing in a day, at a cost. Right?

Speaker 241:11 - 41:34

现在你甚至可以买到一张单独的 GPU。所以这相当疯狂。你手上有这样一个设备，当然你也可以在 cloud 上租到装有很多张 GPU 的机器。所以理论上，你可以用某种成本，在一天之内跑完相当于人类一年处理量的计算。对吧？

Speaker 241:34 - 41:48

But it's not a cost of millions. Right? It's a cost of hundreds to thousands of dollars. If you believe you can maybe figure out this algorithm, like, I mean, it's questionable whether we have the data that people have. Some people are trying to do recordings of kids.

Speaker 241:34 - 41:48

但这个成本不是几百万美元，对吧？而是几百到几千美元。如果你相信自己也许能想出这个 algorithm（算法），我是说，问题在于我们是否拥有人类真正具备的那些 data（数据）还很可疑。有些人在尝试做儿童的记录采集。

Speaker 241:48 - 42:35

Right? There's a question of how well... there are a lot of questions, but we're getting to this level where someone at a university will be basically able to run, like, a childhood. You know, if you have an idea for how the brain learns, you'll be able to run the whole ten years of learning of a human being in a few days and see if it works or doesn't, maybe, if you know how to evaluate it. I think this is even more powerful than the fact that we can build these huge models, which is also powerful because they will help you implement this all. And we're getting this loop where... I always felt limited with RNNs, for example, because they're very sequential. So if you just run them like in Torch, they're very, very slow.

Speaker 241:48 - 42:35

对吧？这里面有个问题，到底效果会有多好——其实有很多问题——但我们正在接近这样一个阶段：大学里的人基本上也能跑一遍“童年”。就是说，如果你对大脑如何学习有一个想法，你将能够在几天内把人类完整十年的学习过程跑完，看看它是否有效，或者无效，前提也许是你知道该如何评估它。我觉得这甚至比我们能够构建这些巨型 model（模型）更有力量——那当然也很强大，因为它们会帮助你把这一切实现出来。我们正在进入这样一个循环：比如以前我一直觉得自己受限于 RNNs，因为它们非常 sequential（串行）。所以如果你只是用 Torch 去跑它们，它们会非常、非常慢。

Speaker 242:35 - 42:55

Right? But you can write a special CUDA kernel that makes them very fast, but writing CUDA kernels is awful. Right? You really don't want to do this, except when you can have a unit test that does exactly the same thing as your slow thing and an agent that writes them for you. And they're not yet amazing at it, but they already do it.

Speaker 242:35 - 42:55

对吧？不过你可以写一个专门的 CUDA kernel，让它们变得非常快，但写 CUDA kernel 很痛苦。对吧？你真的不会想做这件事，除非你有一个 unit test（单元测试），它和那个慢版本做的事情完全一样，再加上有一个 agent（智能体）替你来写。它们现在在这方面还不算惊艳，但已经能做了。
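
The workflow described above, a slow reference implementation plus a test that any fast replacement must pass, can be sketched in plain Python. Everything here is a hypothetical stand-in: `rnn_reference` is a toy linear recurrence rather than a real RNN, and `rnn_candidate` plays the role of the optimized kernel an agent might produce; no actual CUDA is involved.

```python
import math
import random


def rnn_reference(xs, a):
    """Slow, obviously correct version: h_t = a * h_{t-1} + x_t, starting from h = 0."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + x
        out.append(h)
    return out


def rnn_candidate(xs, a):
    """Stand-in for an optimized kernel: every h_t is computed independently.

    Uses the unrolled form h_t = sum over k <= t of a**(t - k) * x_k, which has
    no step-to-step dependency and so could in principle run in parallel.
    """
    return [sum(a ** (t - k) * xs[k] for k in range(t + 1)) for t in range(len(xs))]


def matches_reference(candidate, trials=50, seed=0):
    """Return True if the candidate agrees with the reference on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(rng.randint(0, 32))]
        a = rng.uniform(-0.9, 0.9)
        expected, got = rnn_reference(xs, a), candidate(xs, a)
        if len(expected) != len(got):
            return False
        if not all(math.isclose(e, g, rel_tol=1e-9, abs_tol=1e-9)
                   for e, g in zip(expected, got)):
            return False
    return True


print(matches_reference(rnn_candidate))
```

The test never needs to understand how the candidate works; it only checks agreement with the slow version, which is what makes it safe to hand the rewriting to an agent.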

Speaker 242:55 - 43:19

And, you know, a bigger model will probably be so good that you'll just say, you know, use this hardware as best as it can be used, and come back a few hours later, and here it is. So the bottlenecks that were there, like, because the hardware did not fit your idea, well, the hardware is still the way it is. Right? It can't do anything you want. It still needs to be parallel.

Speaker 242:55 - 43:19

而且，你知道，更大的 model 很可能会好到这样一种程度：你只需要说一句，“尽可能把这套硬件用到极致”，几小时后再回来，结果就已经在那里了。所以那些过去的 bottleneck（瓶颈）——比如硬件不适配你的想法——嗯，硬件本身还是那个样子，对吧？它不可能做到你想要的任何事。它仍然需要 parallel（并行）。

Speaker 243:19 - 43:26

But it can do much more than it could do before, right, because you can have, like, agents write kernels for you.

Speaker 243:19 - 43:26

但它能做的已经比以前多得多了，对吧，因为你可以写出像好几页的 prompt，让它替你写 kernels。

Speaker 143:26 - 43:39

Yeah. It's so interesting because some people will say, god, you know, without the scale of compute that exists in only a few places, it's so hard to do. Maybe you can do basic research, but, like, ultimately, the rubber hits the road on seeing whether these techniques scale. Right?

Speaker 143:26 - 43:39

对，这很有意思，因为有些人会说，天哪，如果没有那种只有少数几个地方才拥有的大规模 compute（算力），这件事就太难做了。也许你可以做一些基础研究，但最终，真正见真章的地方在于：这些技术到底能不能 scale（扩展）。对吧？

Speaker 143:39 - 43:53

And and you need to be in a lab to to kind of experience that. But it's awesome to hear your kind of bullishness around the opportunity for academia and hobbyists and folks that are just messing around with with single GPUs to be able to to to contribute here.

Speaker 143:39 - 43:53

而且你得身处实验室里，某种程度上才能真正体会到这一点。但很棒的是，听到你对 academia（学术界）、hobbyists（业余爱好者）以及那些只是在摆弄单张 GPU 的人也能够在这里作出贡献这件事，持这么 bullish（乐观积极）的看法。

Speaker 243:53 - 43:59

Well, I I think especially if you believe that there are some radical changes that you should do.

Speaker 243:53 - 43:59

嗯，我觉得，特别是如果你相信确实存在一些你应该去做的激进变化的话。

Speaker 143:59 - 44:02

Do you think it's more likely than not that that's the case? Like, I guess

Speaker 143:59 - 44:02

你觉得这种情况发生的可能性是不是大于不发生？比如说，我猜——

Speaker 244:02 - 44:25

It's it it depends on the day. On my on my positive days, I do. It it, you know, research has always brought us beautiful things. There is no reason to think it won't. But then the techniques we have also seem to work so well that that that that that it just feel feels mind blowing too.

Speaker 244:02 - 44:25

这这这要看是哪一天。在我比较乐观的那些日子里，我是这么觉得的。你知道，research 一直都给我们带来了很多美好的东西，没有理由认为它以后不会继续如此。但与此同时，我们现有的这些技术看起来也确实效果好得惊人，这这这这这也让人觉得非常震撼。

Speaker 244:25 - 44:43

It it will be a big mistake to not push on those two. But luckily, there's, you know, there's enough labs. I I feel the whole thrill of being an academic, I was in academia before I I joined the labs, is that you can go wild with your ideas. Right? You you can't go you can't scale up that much.

Speaker 244:25 - 44:43

不去推进那两条路线会是一个很大的错误。不过幸运的是，你知道，现在有足够多的 labs。我我觉得，做 academic 的那种全部魅力——我在加入这些 labs 之前是在 academia——就在于你可以让自己的想法尽情驰骋。对吧？你你没法把规模扩得那么大。

Speaker 244:43 - 45:09

But but on the lower scale, which now is not that low, you can go really wild. You you you can try, you know, beautiful ideas that that are totally out of the current paradigm, and you should. That that that's you know, that that's the fun of being a researcher. Then, well, you know, not not many won't work. Some will work in the small scale and not scale up.

Speaker 244:43 - 45:09

但是在较小的规模上——而现在这个“较小”其实也不算那么小了——你可以真正地放开手脚。你你你可以去尝试，你知道，那些非常美、而且完全跳出当前 paradigm（范式）的想法，而且你也应该这么做。这这这就是，你知道，这就是做 researcher 的乐趣所在。然后，嗯，你知道，不是很多想法都会成功。有些会在小规模上奏效，但没法 scale up。

Speaker 245:09 - 45:35

But, I mean, at the scale the current, like, eight-GPU machines are, I mean, sure. There will always be ideas that, you know, work up to a certain scale and don't work further. But I think you're at a much higher level now than it was, like, five years ago. Because five years ago, it was, like, really, like, MNIST-y tiny things. There were a lot of tweaks that were just really small-scale tweaks.

Speaker 245:09 - 45:35

但是，我的意思是，以当前这种规模来看，比如八块 GPUs 的机器，我是说，当然，总会有一些想法，你知道，只在某个规模之前有效，再往上就不行了。但我觉得你们现在所处的层级已经比比比五年前高得多了。因为五年前，很多东西真的是那种极其微小的东西。有很多 tweaks（小调整）都只是非常小规模的 tweaks。

Speaker 245:35 - 45:56

Now, even on one machine, you're getting to a scale where it's not tweaks anymore. It's like, you know, if you've used nanochat from Andrej. Yeah. It's a GPT-2 level model that you get in a few hours on one box. Right?

Speaker 245:35 - 45:56

现在即使是在一台机器上，你也已经到了一个不再只是 tweaks 的规模了。这就像是，你知道，如果你用过 Andrej 的 nanochat。对，它是一个 GPT-2 级别的 model，你只需要几个小时，就能在一台机器、一个 box 上跑出来。对吧？

Speaker 245:58 - 46:14

These boxes have got a bit more expensive these days, unluckily, but, you know, a new generation of GPUs will come. The older ones will get cheaper. Yeah. It's just quite astounding what you can actually do. Yeah.

Speaker 245:58 - 46:14

这些 box 现在确实变贵了一点，这这这点很不幸，不过不过不过你知道，新的 GPUs 一代还会出来，旧的就会变便宜。是的。你实际上能做到的事情，真的相当惊人。是的。

Speaker 246:14 - 46:22

And, yes, not all of this will scale, but but the fun you can have on the way is is Yeah. Totally.

Speaker 246:14 - 46:22

而且，是的，这里面并不是所有东西都能 scale（扩展），但一路上你能获得的乐趣，确实是——对，完全如此。

Speaker 146:23 - 46:38

I guess one more research frontier, you know, I'd love to hear your take on before we shift gears is, you know, multimodal models. And I think on a previous podcast you said, you know, we haven't made a ton of progress there. Do you still feel that's the case, and what's your kind of current state of the union on the multimodal world?

Speaker 146:23 - 46:38

我想还有一个研究前沿，在我们转换话题之前，我很想听听你的看法，就是 multimodal models（多模态模型）。我记得你之前在一个 podcast 里说过，我们在这方面还没有取得特别大的进展。你现在还这么认为吗？你对整个 multimodal 领域的最新整体判断是什么？

Speaker 246:38 - 47:16

So people are certainly making progress. Maybe this goes a little bit towards JEPA, but, like, the way we do multimodal in transformers or even with diffusion models, it's like in the end, you predict, like, every pixel of these things around. And if you think of me being here in the environment, I think humans sense an amazing amount of information every second or less. But we can't act that fast; our neurons are slow. Right? They have, like, hundreds of milliseconds of processing, but we get all these sensors everywhere all the time.

Speaker 246:38 - 47:16

所以，人们当然是在取得进展的。也许这和 JEPA 稍微有点关系，但比如说，我们现在用 transformers（Transformer）做 multimodal，甚至用 diffusion models（扩散模型）做 multimodal，归根结底都像是在预测周围这些东西的每一个 pixel（像素）。如果你想想我现在身处环境之中，我觉得人类每秒钟，甚至远小于微秒的时间尺度内，都在感知惊人的信息量；但我们没法同样快地行动，因为我们的 neurons（神经元）很慢。对吧？它们的处理大概是几百毫秒级别，但我们始终在从四面八方接收所有这些传感器信息。

Speaker 247:18 - 47:56

If we somehow manage to learn from this insane stream without maybe, you know, like predicting every pixel autoregressive thing. Like, it's both, like, way more parallel and, like, much larger. So I feel like the models we have, they have not truly done justice to this yet. Maybe it needs new research. Maybe, I mean, but they're also very simple. Like, I think Thinking Machines has recently had this, like, multistream transformers, and it feels so easy.

Speaker 247:18 - 47:56

如果我们 somehow（以某种方式）能学会从这种疯狂的信息流中学习，而不必去做那种预测每一个 pixel 的 autoregressive（自回归）式事情。因为这种流本身既更加并行，而且规模也大得多。所以我觉得，我们现有的模型还没有真正对这一点给予应有的处理。也许这需要新的研究。也许——我的意思是——不过它们也非常简洁，比如我觉得 Thinking Machines 最近做过这种 multistream transformers（多流 Transformer），而且感觉好像很容易。

Speaker 247:56 - 48:22

Right? I mean, in the transformer, you pay attention to the previous tokens. You could have a bunch of streams that do this. Right? That feels like an easy tweak to the architecture, but, you know, maybe it's an easy tweak, but just an amazing tweak, because when I work with, like, Codex and, you know, I just forget something, I say it, but then it's executing some bash command, so it needs to wait for my thing to steer it, and it takes three minutes.

Speaker 247:56 - 48:22

对吧？我的意思是，在 transformer 里，你会去关注前面的 tokens（词元）。你完全可以有一堆 stream（流）也这样做，对吧？这听起来像是对架构一个很简单的 tweak（调整），但你知道，也许这是个简单调整，却是个了不起的调整。因为我每次和 Codex 打交道时——或者说，我有时说完一句话之后，它却在执行某个 bash command（bash 命令），于是它就得等我进一步引导它，而这要花三分钟。

Speaker 248:22 - 48:40

And I'm like, this is just so not interactive. Like, it should just and then you can have the side thing. And, like, there's a bunch of hacks again that kind of make it feel better, but it feels like, of course, everything happens everywhere all at once for us. Right? We see, hear, talk all at the same time.

Speaker 248:22 - 48:40

我就会想，这也太不 interactive（交互式）了。它本来就应该——然后你还可以有旁路的东西。再说，当然也有一堆 hack（权宜办法）能让体验稍微好一点，但感觉上，对我们来说，一切本来就是 everywhere all at once（同时在各处发生）的，对吧？我们是同时在看、在听、在说话的。

Speaker 248:41 - 48:57

That should be how our models behave. Now that there is a bigger lab putting pressure on that, maybe it will come. But it yeah. It it it has to be like like, we do multimodal without it. Right?

Speaker 248:41 - 48:57

我们的模型也应该这样运作。现在既然有更大的实验室在推动这件事，也许它会到来。但确实——它必须得像——我们的确是在没有这些东西的情况下也做出了 multimodal，对吧？

Speaker 248:57 - 49:20

With with without all of these, like, truly architectural changes to be to be parallel and and absorb, like you know, transformer can't currently at at the speed it does absorb a high resolution image every, like, millisecond. Right? It it just because it splits them and is so sequential in this that that it just doesn't work. That feels somehow wrong. Right?

Speaker 248:57 - 49:20

也就是，在没有这些真正属于架构层面的改动——让系统变得并行、并能够吸收信息——的情况下。比如你知道，transformer 目前不可能以它现在这种速度，每隔大约一毫秒就吸收一张高分辨率图像。对吧？因为它会把图像切分开，而且在这方面又是如此 sequential（顺序式）的，所以根本行不通。这在某种意义上让人觉得不对劲，对吧？
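
Editor's sketch: a rough back-of-the-envelope calculation shows why absorbing a high-resolution image every millisecond strains a patch-based transformer. The numbers are assumptions for illustration only (a 1024x1024 image, 16x16 pixel patches in the style of vision transformers, one image per millisecond); none of them come from the conversation, and the function names are hypothetical.

```python
def patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens for one image, rounding partial patches up."""
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    return rows * cols


def tokens_per_second(height: int, width: int, images_per_second: int,
                      patch: int = 16) -> int:
    """Patch tokens the model would have to ingest each second."""
    return patch_tokens(height, width, patch) * images_per_second


if __name__ == "__main__":
    per_image = patch_tokens(1024, 1024)  # 64 * 64 patches
    per_second = tokens_per_second(1024, 1024, images_per_second=1000)
    print(per_image, per_second)  # prints: 4096 4096000
```

Under these assumed numbers, a single visual stream alone would produce about four million tokens per second, before any audio or other sensor input is added.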

Speaker 249:20 - 49:42

It's like we shouldn't be putting these tiny patches there; it should just go in, be processed somehow. So I don't think we have, on this deeper level, gotten there yet. But on the other hand, it feels like a lot of people are working on it. So yeah. Then for coding, I mean, does it matter all that much?

Speaker 249:20 - 49:42

这就像是，我们不该把这些很小的 patch（补丁）塞在那里；它们本来就应该直接进去，然后以某种方式被处理掉。所以我觉得，在更深一层的层面上，我们还没有真正走到那一步。不过另一方面，我感觉确实有很多人在做这件事。所以，嗯，是的。然后说到 coding（编程），我是说，这真的有那么重要吗？

Speaker 249:42 - 49:45

Harder to say. Totally. Well, we'll I I'm sure it will come.

Speaker 249:42 - 49:45

这就更难说了。完全同意。嗯，不过我我相信它会到来的。

Speaker 149:46 - 50:12

I'd love to kinda switch gears and maybe just talk a little bit about your time at OpenAI and your your kind of journey there, because, obviously, it's been quite the eventful past years. And, you know, maybe, you know, there there's there's a few moments everyone kinda thinks about, and so I'm curious to get your perspective. But maybe just like on the on the on the OpenAI side, the company's had some very public, like, moments. And I'm wondering, like, what were some of the difficult decisions that really, like, defined the company, I guess, in your in your time there?

Speaker 149:46 - 50:12

我很想稍微换个话题，也许聊聊你在 OpenAI 的那段时间，以及你在那里大致的经历，因为显然，过去这些年真的是非常多事。然后，你知道，可能有那么几个时刻是大家都会想到的，所以我很好奇想听听你的视角。不过也许先从 OpenAI 这边说起，这家公司经历过一些非常公开、非常受关注的时刻。我在想，在你在那里期间，哪些艰难的决策，算是真正定义了这家公司？

Speaker 250:12 - 50:44

So so I I, you know, I wasn't there for the earliest things. I I I think for my time there, there there was this big question at some point whether to pivot to reasoning. And I feel it was very brave of the company and and and and the leadership and all of us to actually take this plunge and say, reasoning will be as important as pretraining. Our models will be reasoning models. They will be launched.

Speaker 250:12 - 50:44

所以，我我，你知道，最早期的那些事情我并不在场。我我我觉得，就我在那里的那段时间而言，某个时点上有一个很大的问题，就是要不要转向 reasoning（推理）。而我觉得，公司、领导层以及我们所有人，真的决定纵身一跃，说 reasoning 会和 pretraining（预训练）一样重要，我们的模型会是 reasoning 模型，而且它们会被发布出来——这是非常勇敢的。

Speaker 250:45 - 51:03

And, you know, at the beginning, it was like the reasoning models were not that chatty. Somehow personality was harder. They were slow; they still are to some extent. And it was like, you know, should you ever do it? Like, maybe people just prefer chat models.

Speaker 250:45 - 51:03

而且，你知道，在一开始，reasoning 模型并没有那么 chatty（健谈、互动感强）。不知怎么地，personality（人格感、个性）更难做出来；它们很慢，到现在某种程度上也还是慢。于是当时就会有一种想法：你们到底该不该这么做？也许人们就是更喜欢 chat model（聊天模型）。

Speaker 251:03 - 51:13

Yeah. But OpenAI was, yeah, very good at taking this hard bet and saying, yes. We're gonna launch it. We're gonna go this way. We'll try to figure out how to manage.

Speaker 251:03 - 51:13

对。但是，OpenAI 在做这种艰难押注、并且说“对，我们要发布它。我们要往这条路走。再去想办法看看怎么管理好”这件事上，做得非常好。

Speaker 251:14 - 51:23

There were two lines of models at the same time. That that's obviously awful. Right? You you want to unify this. The unification took a lot of time because everything is moving.

Speaker 251:14 - 51:23

当时同时存在两条模型线。这显然很糟，对吧？你会想把它们统一起来。而这个统一花了很多时间，因为一切都在变化。

Speaker 251:23 - 52:01

It's a very hard decision, but, no, we wouldn't possibly have all of these amazing things we have if it didn't push on it. And it feels like, you know, even some bigger labs still have trouble catching up to the RL quality, so there is some win that you get when you commit to things. And, you know, I wonder these days, you know, OpenAI has since then grown probably, like, 20 times or something like this, become a much bigger company. All of the labs have. I mean, Google was big even before, but everyone now, like, Anthropic has become big.

Speaker 251:23 - 52:01

这这这这真的是一个非常艰难的决定，但是，如果当时没有强力推动这件事，我们根本不可能拥有现在这些——对，这些了不起的成果。而且感觉上，你知道，即便是一些更大的 lab（实验室），现在在追赶 RL（reinforcement learning，强化学习）质量这件事上也仍然有困难。所以当你真正 commit（投入、押注）到某件事上时，确实会得到某种胜利。我现在也会想，OpenAI 从那以后大概增长了 20 倍之类，已经变成一家大得多的公司。所有 lab 都是——我的意思是，Google 甚至在那之前就已经很大了——但是现在每一家，像 Anthropic，也都变大了。

Speaker 252:02 - 52:23

Having been at Google before for a long time, I think it's much harder for a big company to take wild bets like that. Right? Because you have so much more to lose because you have processes. Like, it's it's just harder. I just hope OpenAI retains this ability and the other labs too because yeah.

Speaker 252:02 - 52:23

我以前在 Google 待过很长时间，所以我觉得，大公司要做那种激进的豪赌会难得多。对吧？因为你有更多东西可能会失去，而且你有各种流程。就是——这确实更难。我只是希望 OpenAI 还能保留这种能力，其他 labs 也是，因为，是的。

Speaker 252:23 - 52:39

Like, the current techniques are amazing. Right? They get us very far, but but if there were, you know, sparks of post transformer world, would these labs be able to jump on it, or or would they be on the more conservative side now?

Speaker 252:23 - 52:39

比如说，当前这些技术已经非常惊人了。对吧？它们已经把我们带得很远，但如果——你知道——已经出现了 post-transformer 世界的一些火花，这些 labs 会不会有能力立刻扑上去，还是说它们现在会变得更保守一些？

Speaker 152:39 - 52:49

It feels that with reasoning, there were some, like, early sparks, but obviously not a ton of data. And then, you know, that it was kind of a almost a I've heard it articulated as, like, a religious belief that this is just gonna work if we if we double down on it.

Speaker 152:39 - 52:49

感觉在 reasoning（推理）这件事上，之前是有一些早期火花的，但显然数据并没有多到那个程度。然后，你知道，这几乎有点——我听别人把它表述成一种宗教式信念：只要我们继续加码，它就一定会成功。

Speaker 252:49 - 53:10

And and we don't have, like, the successor yet or at least I don't know about it. But but with the hope that it will appear, will you need a new lab to to push on it, or or or will will I mean, I I think if anything, OpenAI is is good at, you know, wild bets.

Speaker 252:49 - 53:10

而且我们现在还没有看到那个 successor（继任者／下一代范式），至少我不知道。但如果抱着它终将出现的希望，到时候是不是需要一个新的 lab 来推动它，还是——或者——我的意思是，我觉得如果说 OpenAI 有什么特别擅长的，那就是去下注这种激进的赌注。

Speaker 153:10 - 53:22

It's obviously interesting to see this whole trend of neolabs, right, and folks like Jerry Tworek spinning out and saying that, like, it's almost, you know, easier to do this work outside of a large lab, right, and make one kind of strong convicted bet.

Speaker 153:10 - 53:22

很明显，看到整个 neolabs 这股趋势很有意思，对吧？还有像 Jerry Tworek 这样的人出来单干，去说，像这种工作几乎——你知道——在大型 lab 外部做反而更容易，对吧？然后去做一个强烈、坚定的单点押注。

Speaker 253:22 - 53:39

Yeah. It's a fair point too. Right? But then, you know, you start looking at the GPU numbers, and it's a little sad when you're outside of the lab. It's hard to get them, and they're very expensive.

Speaker 253:22 - 53:39

对。这确实是个很公允的观点，是个很公允的角度。对吧？但接着你就会开始看 GPU 的数量，然后当你身处 lab 之外时，就会有点沮丧。很难拿到它们，而且它们非常贵。

Speaker 253:39 - 53:56

So so so but but but then GPUs are not everything. It it's a it's it's quite nice to have this whole ecosystem. Right? You have both these little labs now and the big labs. Yeah.

Speaker 253:39 - 53:56

所以——不过——GPU 也不是全部。有这样一个完整生态其实挺好的。对吧？现在你同时有这些小 labs，也有大 labs。是的。

Speaker 253:56 - 54:33

We'll we'll you know, it I I it's it's so fun because being in this this AI little bubble here, you clearly see that that there is a ton of competition, that change is coming, that, you know, We have not exhausted even on the current paths. There's still a lot of techniques to do. There's a lot of data and improvements and bigger models to train in. Then there's all these new things that are bubbling. Maybe they're not ready, but they're very actively pursued with good resources.

Speaker 253:56 - 54:33

我们会——你知道——这真的很有意思，因为身处这里这个小小的 AI 泡泡里，你会非常清楚地看到，竞争是非常激烈的，变化正在到来，而且——你知道——即使沿着当前这些路径，我们也还远远没有走到尽头。还有很多技术可以做，还有很多数据、很多改进，还有更大的模型可以训练。与此同时，还有所有这些正在冒出来的新东西。也许它们还没准备好，但人们正在用很好的资源非常积极地推进它们。

Speaker 254:34 - 55:00

Then I feel like you step outside of San Francisco and people treat AI basically as if it was from the last year before Codex and would never change again. And that is a wrong way of treating it. I mean, to me, the coding agents have been such a reveal that it's hard to get over it. I call it AGI. You know?

Speaker 254:34 - 55:00

然后我感觉，你一走出 San Francisco，人们对待 AI 的方式基本还像它停留在 codex 之前的去年，而且仿佛它此后永远不会再变化。但现在这么看就是错的。我是说，对我来说，coding agents 的出现实在太震撼了，很难不被它彻底改变看法。我——我把这叫作 AGI。你知道吧？

Speaker 255:00 - 55:10

Yeah. Everyone should call AGI what they want. I we we may get one day past AGI the way we got past the Turing test. Right? We we don't really argue about the Turing test anymore.

Speaker 255:00 - 55:10

对。每个人都应该可以按自己想要的方式去称呼 AGI。我们——我们也许有一天会像跨过 Turing test 一样，跨过 AGI 这个说法。对吧？我们现在其实已经不怎么争论 Turing test 了。

Speaker 255:11 - 55:25

Is it passed? Is it not passed? Right. Who cares? These things that code are clearly intelligent, and coding is the indisputable side.

Speaker 255:11 - 55:25

它到底通过了没有？到底没通过？对吧，谁在乎呢？这些——这些会写代码的东西，显然就是智能的，而且它们在 coding 这件事上——任何可争议的那一面都已经没什么意义了。

Speaker 255:25 - 55:26

You know?

Speaker 255:25 - 55:26

你知道吧？

Speaker 155:26 - 55:42

You know, obviously, the AI coding wars are, like, quite fierce right now. Like, you know, what do you think ultimately will determine, you know, which of these AI coding products ends up, you know, like, how do they become better than each other? And, you know, how do you see the next frontiers for, like, Codex and Claude Code?

Speaker 155:26 - 55:42

你知道，显然，现在 AI coding 的战争相当激烈。比如说，你觉得最终会由什么来决定，这些 AI coding 产品里哪一个最后会胜出？还有，你觉得它们彼此之间会怎样变得比对方更强？以及，你怎么看 Codex 和 Claude Code 的下一步前沿会是什么？

Speaker 255:43 - 56:17

You know, I think the coding market is big enough to have two players in it. I think the bigger question will be how well do they go to other fields. Right? I mean, coding is great and it's important for us, but you could do the work of many people. And currently, Codex, I tried to recommend it to some friends, but, you know, it used to start with the question, what is your GitHub repo?

Speaker 255:43 - 56:17

你知道，我觉得 coding 市场足够大，完全容得下两个大玩家。它——它——我——我——我觉得更大的问题会是，它们进入其他领域的能力到底有多强。对吧？我的意思是，coding 很棒，也对我们很重要，但你其实可以用它去完成很多人的工作。可现在，比如说 Codex，我——我之前试着把它推荐给一些朋友，但你知道，它以前一上来问的就是：你的 GitHub repo 是什么？

Speaker 256:17 - 56:28

Well, that, right, puts off a lot of people. Now it's a little bit friendlier, but it's still called Codex. You know? Like, even so people kind of don't hear this is your accountant tool.

Speaker 256:17 - 56:28

嗯，嗯，对，这一下就把很多人挡在门外了。现在它稍微友好一点了，但它毕竟还是叫 Codex。你知道吧？所以即便如此，人们还是不太会把它听成“这是给你的 accountant 用的工具”。

Speaker 256:28 - 56:53

Right? And, totally. In contrast to ChatGPT, where you just said something, I think Codex takes a little bit of getting used to. And Claude even more, I would say, if you go on the code side. So I think that there is some question, how do you get this power to people in other occupations and places?

Speaker 256:28 - 56:53

对吧？完全同意。相比之下，ChatGPT 是你直接说点什么就行；而我觉得 Codex 还是需要一点时间去适应。至于 Claude，我会说更是如此，尤其是如果你走到 code 这一侧的话。所以——所以——所以我觉得确实存在一个问题：你要怎样把这种力量带给其他职业、其他场景中的人？

Speaker 256:53 - 56:55

And that may be the more important question.

Speaker 256:53 - 56:55

而这也许才是更重要的问题。

Speaker 156:55 - 57:03

Anthropic's tried this with Claude Cowork and, like, basically making a friendlier version of the core code products.

Speaker 156:55 - 57:03

Anthropic 用 Claude Cowork 做了这样的尝试，基本上是在把那些核心 code 产品做成一个更友好的版本。

Speaker 257:03 - 57:15

I certainly feel like the abilities are there. Right? As an ML person, I feel like these models obviously can do these things. They obviously can do Excel. They obviously can do this or that or that.

Speaker 257:03 - 57:15

我当然觉得这些能力是已经具备了，对吧？作为一个 ML（机器学习）从业者，我觉得这些模型显然能做这些事。它们显然能做 XO。它们显然能做这个，或者那个，或者那个。

Speaker 257:17 - 57:43

But then, of course, again, I watch them like a hawk. I know how to, like, there is some level of skill that you need to put in to get this. Now this is totally a learnable skill, but I understand that people are busy in their lives and don't necessarily want to learn this, so you need to smooth it in some way. There are some fundamental things that I don't think will allow you to, like, just let it run unwatched. Yeah.

Speaker 257:17 - 57:43

但当然了，我还是会像鹰一样盯着它们。我知道该怎么做；要得到这个结果，确实需要投入某种程度的技巧。这个技巧当然是完全可以学会的，但我也理解大家生活都很忙，不一定想学这个，所以你必须在某种程度上把它打磨得更顺滑一些。有些根本性的东西让我觉得，你没法就那样让它在无人看管的情况下自己跑。对。

Speaker 257:43 - 57:58

You don't think you'll wanna do this. But on the other hand, I I don't think you would even want to do this even if it was super good at first. Right? You need to gain some trust. And so so the question becomes, how do you convince people to start putting some effort into gaining this trust?

Speaker 257:43 - 57:58

你不会觉得自己会想这么做。但另一方面，我也觉得哪怕它一开始就做得特别好，你也未必会想这么做，对吧？你需要先建立一些信任。所以问题就变成了：你要怎样说服人们，开始投入一些精力去建立这种信任？

Speaker 257:58 - 58:01

It it will pay back, but but but but there is a hump.

Speaker 257:58 - 58:01

它最终会有回报，但中间确实有一道坎。

Speaker 158:01 - 58:07

On the coding side, like, why do you think Anthropic was the first to be, like, really successful on the coding side?

Speaker 158:01 - 58:07

在 coding 这边，你为什么觉得 Anthropic 是第一个在 coding 方面真正取得成功的？

Speaker 258:07 - 58:20

I think Anthropic made this very good decision to focus on coding. Right? This was at a time when OpenAI was like, we're doing ChatGPT, and, you know, great. I mean, ChatGPT is great.

Speaker 258:07 - 58:20

我觉得 Anthropic 做了一个非常正确的决定，就是专注于 coding，对吧？那时候 OpenAI 更像是在说，我们在做 ChatGPT，当然，这也很好。我是说，ChatGPT 很棒。

Speaker 258:21 - 58:47

But I think part of why Anthropic made this decision was that they just could not compete in chat, but they made a very good decision on what else to do. And this goes back to, you know, AI goes through these upheavals. Right? You need to put a bet on something that is not what is today even though the things today, like, isn't ChatGPT amazing? Of course.

Speaker 258:21 - 58:47

不过我认为，Anthropic 之所以做出这个决定，部分原因是他们在 chat 上确实就是竞争不过，但他们对接下来该做什么的判断非常好。这也回到了一个老问题：AI 会经历这种剧烈动荡，对吧？你必须把赌注押在某个“不是今天主流”的方向上，哪怕今天这些东西——比如说，ChatGPT 难道不惊艳吗？当然惊艳。

Speaker 258:47 - 59:09

Right? It it was the most amazing AI of 2025, but clearly not of 2026. We and maybe in 2027, we'll have we'll have another thing. So so things change quickly if you put a good bet on on something else. You can and it's not like OpenAI didn't do coding.

Speaker 258:47 - 59:09

对吧？它也许是 2025 年最惊艳的 AI，但显然不会是 2026 年的。到了 2027 年，也许我们又会有另一种新东西。所以如果你把赌注下在别的方向，而且这个判断是对的，情况变化会非常快。也不是说 OpenAI 没有做 coding（编程）这件事。

Speaker 259:09 - 59:21

It did. Right? And and that's why it could catch up reasonably quickly, but it was just not the focus. I mean, you know, you these companies are tiny. You grow to a billion users.

Speaker 259:09 - 59:21

它做了，对吧？这也是为什么它后来能相当快地追上来，只不过那不是它的重点。我的意思是，你知道的，这些公司其实规模都很小。等你增长到十亿用户的时候，

Speaker 259:21 - 59:26

You you you have stuff to do. Right? So, you know, some just fall apart.

Speaker 259:21 - 59:26

你手上有一大堆事情要处理，对吧？所以有些方向就会自然散掉。

Speaker 159:26 - 59:50

You mentioned this kind of, like, almost tension between, you know, nailing the stuff that's working today and then, you know, keeping other areas open so that if there's a glimmer of hope in a different area, you kind of double down on that bet. And I'm wondering what you make of that. You know, obviously, OpenAI, I think, very publicly has gone in this, like, focusing moment now. Right? And you've seen it in the results of Codex and, you know, maybe slashing Sora and some of the other things that were different.

Speaker 159:26 - 59:50

你提到了这样一种几乎可以说是张力：一方面，要把当下已经奏效的东西做到极致；另一方面，又要把其他一些方向保留开放，这样如果某个不同领域里出现一丝希望，你就会加倍押注那个方向。我想知道你怎么看这个问题。显然，OpenAI 我觉得现在就非常公开地进入了这样一个“聚焦时刻”，对吧？而且你已经从 Codex 的结果里看到了这一点，以及，可能还有对 Sora 和其他一些不同项目的大幅削减。

Speaker 159:50 - 59:59

Yeah. How do you think about, like, kind of navigating that tension of, like, really nailing the here and now versus, like, keeping these other embers alive that could potentially be really interesting down the line?

Speaker 159:50 - 59:59

是啊。你会怎么理解、或者说怎么处理这种张力：一边是真正把眼前正在发生的事情做到极致，另一边又要把这些火种保留下来，因为它们未来可能会变得非常有意思？

Speaker 2 | 1:00:00 - 1:00:06 It's a matter of culture and size and and money and perspective. Famously, Google. Right? Google is the lab that will keep

Speaker 2 | 1:00:00 - 1:00:06 这取决于 culture（文化）、size（规模）、money（资金）和 perspective（视角）。最典型的就是 Google。对吧？Google 是那种会一直保留

Speaker 1 | 1:00:07 - 1:00:15 all its I think some people have been quite critical of Google for this. Right? Missing, you know, missing your your invention, not being the ones to capitalize on it.

Speaker 1 | 1:00:07 - 1:00:15 所有这些项目的实验室。我觉得有些人其实一直都很批评 Google，对吧？说你发明了东西，却错过了它，没能成为真正把它商业化、把它价值吃到手的人。

Speaker 2 | 1:00:15 - 1:00:27 You know? But then it it works for them. Right? It it works be because it it whatever good comes out, it's very easy to catch up because you already have a strong team in in in it. Right?

你知道吗？但后来那套方式对他们确实有效，对吧？之所以有效，是因为不管最后冒出什么好结果，都很容易追上来，因为你在这方面本来就已经有一支很强的团队，对吧？

Speaker 1 | 1:00:27 - 1:00:32 Yeah. Do you think they've caught up? I feel like there's a lot of discourse claiming that, you know, they're still a bit behind.

对。你觉得他们已经追上了吗？我感觉现在有很多讨论都在说，你知道，他们还是稍微有点落后。

Speaker 2 | 1:00:32 - 1:00:40 Do you think they've caught up in the ChatGPT world? They haven't yet caught up in the, I mean. Yeah. I don't know if you've seen Antigravity too. Yeah. Yeah.

你是说在 ChatGPT 这个世界里他们追上了吗？他们还没有追上——我是说，对。我不知道你有没有看过 Antigravity。对，对。

Speaker 2 | 1:00:40 - 1:00:44 I opened it after I/O, and I couldn't tell which one is Codex and which one is Antigravity.

我在 I/O 之后打开了它，我都分不出来哪个是 Codex，哪个是 Antigravity。

Speaker 1 | 1:00:45 - 1:00:47 Lot of lot of funny funny tweets about that.

关于那个，有很多很多很好笑的 tweets。

Speaker 2 | 1:00:47 - 1:01:04 So that's great. I tried to do some of my Codex things with the new 3.5 Flash, and it just doesn't work. Right? The barrier that was at Christmas, I feel like it hasn't crossed it yet to me, but it will. Right?

所以，所以，所以这很棒。我试着用新的 3.5 Flash 去做一些我平时用 Codex 做的 coding 的事，但就是不行，对吧？它还没有达到那个门槛——那个门槛以前还不是这样——我感觉对我来说它还没跨过去，但它会的，对吧？

Speaker 2 | 1:01:06 - 1:01:47 So, like, if you're very broad, it can make it safer later if you need to catch up, but then you may not get the immediate win of, like, you know, like, Anthropic in coding. You're just the first to nail it. And, you know, great that there are labs that just go and are the first to nail it. That's exciting, and I feel like that's how it should be. OpenAI had a good culture of making bets, but now it is also a bigger thing, and ChatGPT has a billion users.

所以，所以，所以，所以如果你铺得非常广，那么如果以后你需要追赶，这会让你更安全；但这样一来，你可能就拿不到那种立刻见效的胜利，比如说，你知道，像 Anthropic 在 coding 上那样。就是你率先把它真正做好了。而且，你知道，有一些 labs 就是会直接冲过去，成为第一个把它做好的人，这很棒。那很令人兴奋，而且我觉得事情本来就该这样。OpenAI 以前有一种很好的文化，就是敢下注，但现在它也已经变成一个更庞大的东西了，而且 ChatGPT 也有十亿用户。

Speaker 2 | 1:01:47 - 1:01:58 It's important for many people in the world. You should and Google search has 3 billion users. Right? It's important for many people in the world. You don't want these things to be hampered

这对世界上很多人都很重要。你应该——而且 Google search 有 3,000,000,000 用户，对吧？这对世界上很多人都很重要。你不会希望这些东西受到拖累。

Speaker 1 | 1:01:58 - 1:01:59 and Totally.

Speaker 1 | 1:01:58 - 1:01:59 对，完全同意。

Speaker 2 | 1:01:59 - 1:02:10 You you should go fast, but but this breaking things is is is is not so good, and and I actually feel like it's quite good if the labs don't break everything on the way.

Speaker 2 | 1:01:59 - 1:02:10 你确实应该快一点推进，但这种“打破一切、边做边砸”的方式并不那么好；而且我其实觉得，如果这些 labs（实验室）在推进过程中不要把所有东西都搞坏，那反而挺好的。

Speaker 1 | 1:02:10 - 1:02:42 I guess, you know, a lot of people wonder about the kind of gap between closed source models and open source models, and it feels like there's two, you know, distinct things pulling in different directions. One is it feels relatively easy to distill models, and, you know, you've seen a lot of claims around folks doing that on the Chinese open source side with the closed source providers. Then on the other hand, it feels like more and more of these models, even in the big labs, are getting too big to serve, so they have to be distilled within the big labs themselves. What's your kind of gut intuition on the gap we'll see between closed source and open source models and whether that widens or shrinks in the next few years?

Speaker 1 | 1:02:10 - 1:02:42 我想，很多人都在想 closed source models（闭源模型）和 open source models（开源模型）之间的那种差距，而且感觉有两股不同的力量在朝相反方向拉扯。一方面，distill models（模型蒸馏）似乎相对容易；你也看到过很多说法，尤其是在 Chinese open source 这边，大家会拿 closed source providers（闭源提供方）的模型来做这件事。另一方面，又感觉越来越多模型——哪怕是在那些 big labs（大实验室）里——都大到难以部署和提供服务，所以它们在实验室内部也不得不先做蒸馏。你直觉上怎么看未来几年 closed source 和 open source 模型之间的差距？它会扩大还是缩小？

Speaker 2 | 1:02:43 - 1:02:59 Yeah. It it is not that easy to to predict, I feel. It so bigger models are better. You can distill them, but the distilled models are never quite like, they're great. Right?

Speaker 2 | 1:02:43 - 1:02:59 是啊，我觉得这其实没那么容易预测。更大的模型就是更好。你可以把它们 distill（蒸馏）出来，但蒸馏后的模型始终不完全是那个水平——当然，它们也很棒，对吧？

Speaker 2 | 1:02:59 - 1:03:19 Especially if you need a model for less money. But they're not quite as good as the big models. You know, like, the 3.5 Flash, I could not quite feel it's on par with 5.5. Maybe because it's a distilled pro. Right?

Speaker 2 | 1:02:59 - 1:03:19 特别是如果你需要的是一个更省钱的模型时，这类模型会很有用。但它们终究没有大模型那么好。比如说，像 3.5 Flash，我就不太觉得它能和 5.5 相提并论。也许是因为它本质上是个 distilled pro（由 pro 版本蒸馏出来的模型），对吧？

Speaker 2 | 1:03:19 - 1:03:36 Maybe we just need to wait for the pro. So even within the, like, I, for example, I don't remember when I have used the mini model. The mini models, I think they're very good. They're very useful. I just haven't used them in a while.

Speaker 2 | 1:03:19 - 1:03:36 也许我们只是需要等 pro 版本出来。所以即便在同一体系里，比如我自己，就已经不记得上一次用 mini model（迷你模型）是什么时候了。那些 mini——我觉得它们很好，也很有用，我只是有一阵子没用了。

Speaker 2 | 1:03:36 - 1:04:04 Right? So whenever I use them, they're fine until they trip and cost me so much time, I go back to the big one. And so you can distill things. And open source can distill or not distill. I mean, labs just try to not make you distill everything naturally, but I think they also don't fight you to death.

Speaker 2 | 1:03:36 - 1:04:04 对吧？所以每次我用它们的时候，一开始都还行，直到它们出岔子、让我多花很多时间，我就又会回到那个大模型上。所以，东西当然可以 distill（蒸馏）；open source 也可以做蒸馏，或者不蒸馏。我的意思是，labs 自然会尽量不让你把所有东西都蒸馏走，但我觉得他们也不会拼命阻止你。

Speaker 2 | 1:04:04 - 1:04:29 Yeah. It would be very sad if open source had models that are very, very far behind, but I don't think there is a risk of that. There is enough companies and now there's notions of I also very much understand, like, if you're a country. Right? Do you want to depend, like, say, police stations or hospitals run AI to, like, help you do administration?

Speaker 2 | 1:04:04 - 1:04:29 是啊，如果 open source 的模型远远落后，那会非常可惜，但我不觉得这有那种风险。现在已经有足够多的公司参与进来，而且也越来越能理解这样一种想法：如果你是一个国家，对吧？你会愿意让像警察局或医院这样的机构，在行政管理上运行的 AI 完全依赖别人吗？

Speaker 2 | 1:04:30 - 1:05:00 Maybe you don't want to rely on one company that may just have an outage or you understand. There'll be a lot of people who want sovereign, they say, models. Even if they're slightly weaker, maybe the tasks are not so hard. So so I think there will be enough incentives to have open models that they will exist, and there will be a very good incentives for the labs to still keep ahead. So so so, you know, people people keep paying for for for this.

也许你不想依赖某一家公司，因为它可能会宕机，或者你明白的。会有很多人想要他们所说的 sovereign（主权、自主可控）models。即使它们稍弱一点，也许任务本身并没有那么难。所以我觉得，支持 open models（开放模型）存在下去的激励会足够强，它们会存在；同时，对这些 labs（实验室）来说，继续保持领先也会有非常强的激励。所以，你知道，人们还是会继续为此付费。

Speaker 2 | 1:05:00 - 1:05:14 So it feels like a state that should persist for a while, but, you know, it's famous last words. In AI and tech, you can say things, they may turn out. I don't want to make future predictions.

所以这感觉像是一种会持续一段时间的状态，不过，你也知道，我这话也可能变成 famous last words（事后被证明打脸的话）。在 AI 和 tech（科技）领域，你可以说很多事，最后结果未必如此。我不想对未来做预测。

Speaker 1 | 1:05:15 - 1:05:26 Of course. But what job is a podcast if not to try and force you into them? But, no. That all makes a ton of sense. You know, we always like to end interviews with kind of a quick fire round where we stuff in a bunch of broad questions at the end.

当然。但如果 podcast（播客）的工作不是尽量逼你做这种预测，那还能是什么呢？不过没有，你说的这些都非常有道理。你知道，我们总喜欢在采访快结束时来一轮 quick fire round（快问快答），把一堆比较宽泛的问题塞进去。

Speaker 1 | 1:05:26 - 1:05:31 And so maybe to start, I'd just love to hear: what's one thing you've changed your mind on in the AI world in the last year?

那也许先从这个开始吧：在过去一年里，在 AI 这个世界里，你改变看法的一件事是什么？

Speaker 2 | 1:05:31 - 1:05:56 Well, definitely, I did not believe that they would be like an intern kind of thing so fast, and I have definitely changed my mind. I actually used to not talk to AI very much every day. People were always like, "So how do you use ChatGPT?" And I was like, "Yeah, I don't know. I asked it one query yesterday and one three days ago."

嗯，肯定是，我之前并不相信它们会这么快就变得像 intern（实习生）那样好用，而我现在绝对改观了。其实我以前并不会每天都跟 AI 说很多话。大家总会问我，“那你平时怎么用 ChatGPT？”我就会说，“嗯，我也不知道。昨天我问了它一个问题，三天前问过一次。”

Speaker 2 | 1:05:57 - 1:06:08 But I was always like, I'm not gonna talk to my computer very much. And now I do, about work. Yeah. And so I've... yeah. It's, like, yeah.

但我以前总觉得，我不会经常跟我的电脑说很多话。而现在，为了工作，我确实会这么做。对。所以，是的，我已经，嗯，确实变了。就是，嗯，对。

Speaker 2 | 1:06:08 - 1:06:21 I also did not think I'm gonna, like, not use an editor for programming, and now I don't. I just tell it to change the code. And, yeah, that was a big update.

我以前也没觉得自己会不再用 editor（编辑器）来写程序，但现在我确实不用了。我就是直接告诉它改代码。对，这算是一个很大的变化。

Speaker 1 | 1:06:21 - 1:06:33 That's awesome. I guess, you know, as you've worked on these models more closely these past years, have your concerns around, like, the existential risk, the safety around these models, have they gone up or down these past years?

太棒了。我想，随着你过去这些年更近距离地接触这些 models（模型），你对这些模型在 safety（安全）方面、比如 existential risk（生存性风险）这类问题的担忧，这几年是增加了还是减少了？

Speaker 2 | 1:06:33 - 1:07:12 I don't think they have changed very much for me. I was always on the, you know, "not too worried, but also we should not be complacent" side. I still feel, with all the skills they have now with programming and so on, I still feel the small risks, the risk that they will hack some of our systems, make the grid go down or things like that. I still feel these are the risks I would focus on right now. Not to say that the extinction risks don't... you know, it's good that there are people thinking about it.

我觉得对我来说，它们并没有发生太大变化。我一直都是那种——你知道，不算太担心，但也不该自满——的立场。我现在仍然觉得，就凭它们现在具备的这些能力，尤其是编程之类的能力，我还是认为风险主要是一些较小的风险，比如它们会入侵我们的某些系统，让电网瘫痪之类的事情。我现在仍然觉得，这些才是我会重点关注的风险。倒不是说那些 extinction risks（生存性风险）不重要——有人在思考这些问题是件好事。

Speaker 2 | 1:07:12 - 1:07:31 It's good to have some guardrails. It's in the end good to, you know, we should be able to turn off these data centers if we so decide and have control over all of that. But even though the models have become much better, I don't feel any threat from them.

有一些 guardrails（防护栏、约束机制）是好事。归根结底，能够在我们决定这么做时关闭这些 data centers（数据中心），并且对这一切保持控制，是件好事。但是我——即便这些模型已经变得强得多了——我也并不觉得它们对我构成什么威胁。

Speaker 1 | 1:07:32 - 1:07:43 On the lab side, it feels like the buzzy news of the last week was that, like, you know, Andrej Karpathy was going to Anthropic to work on RSI, right, as a team there. Like, what'd you make of

说到 lab（实验室）这边的动态，感觉上周最热的话题新闻是，你知道，Andrej Karpathy 要去 Anthropic 做 RSI，对吧，算是那边的一个团队。你怎么看这件事

Speaker 2 | 1:07:43 - 1:07:53 that? You know, I am part of the psychosis. Right? It's like you can do so much research with this assistant. Right?

——这个？你知道，我自己也算是这种 psychosis（近乎狂热的状态）的一部分，对吧？因为用这个 assistant（助手）你真的可以做非常多研究，对吧？

Speaker 2 | 1:07:53 - 1:08:13 And it's amazing, and you should, and you can make many parts of the systems better too, like much faster. So that's certainly true. On the other hand, like, you think about these post-transformer things and, you know, the space of ideas is vast. And, luckily, most of them are wrong. That's why it's called research.

而且这很惊人，你也确实应该这么做——你还可以把系统中的很多部分做得更好，比如快得多。所以这一点当然是真的。另一方面，你想想这些 post-transformers（后 Transformer）之类的东西，那个想法空间是极其巨大的。幸运的是，其中大多数都是错的。这也正是它之所以叫 research（研究）的原因。

Speaker 2 | 1:08:13 - 1:08:33 Right? And you need enormous luck and skill, but also luck, to happen upon the right one. And we kind of feel like maybe it's somewhere there in the air, but it's research. It may be years away. And even with the best AGIs of the world, you know, they're like human level.

对吧？你既需要巨大的运气，也需要技能，但——也需要运气——才能刚好撞上那个正确的想法。我们多少会觉得，也许答案就在那里，就飘在空气中，但这终究是 research（研究）。它也许还要很多年。而且即便是世界上最好的 AGI（通用人工智能），你知道，它们也只是人类水平。

Speaker 2 | 1:08:33 - 1:08:56 They're maybe researcher level. Maybe they'll be like a 10x researcher. But for years, there was a huge community of researchers trying to crack these things, and they didn't. So it may be just very hard. And, yeah, like, we understand very little about the human brain yet, and we cannot connect it to our ML in any great way yet.

它们也许能达到 researcher（研究者）水平。也许会像一个 10x researcher（十倍效率研究者）。但是这么多年来，一直都有一个庞大的研究者社区在试图攻克这些问题，而他们并没有做到。所以这件事可能就是非常难。而且，是的，我们现在对 human brain（人脑）的理解仍然非常有限，也还无法以什么很有力的方式把它和 RML 联系起来。

Speaker 2 | 1:08:56 - 1:09:20 So I'm like, on the one hand, I think it's great. I think we'll see the current things getting better. But if you're thinking of a research breakthrough, it may just require something that, even when you have this, even if you're searching in a very efficient way, and even if you're searching some interesting ideas, that still doesn't mean you're gonna find it. Right?

所以——所以——所以我的看法是，一方面，我觉得这很棒。我觉得我们会看到现有这些东西继续变得更好。但如果你想的是某种 research breakthrough（研究突破），那它可能就是需要某种东西——即便你已经能像这样以非常高效的方式去搜索，即便你也在搜索一些有趣的想法，这仍然不意味着你就一定能找到它，对吧？

Speaker 1 | 1:09:20 - 1:09:21 Yes.

是的。

Speaker 2 | 1:09:21 - 1:09:36 Just that the space of all ideas is so vast that even very efficient searches can just not get there. So I'm not that worried existentially about this. One thing I found interesting is I think, you know,

只是因为所有想法构成的空间实在太过巨大了，以至于即便是非常高效的搜索，也可能根本到不了那里。所以我——我在这个问题上并没有那么强的存在主义层面的担忧。我觉得有一件事挺有意思的，我想，你知道，

Speaker 1 | 1:09:36 - 1:09:49 if I'm correct, like, all of your transformer paper coauthors have gone on to start companies. Right? And I'm wondering if that was ever something you thought about or... Well, I was certainly asked about it many, many, many times.

如果我没记错的话，像，你那篇 transformer 论文的所有合著者后来都去创办公司了，对吧？我在想，这件事你有没有想过，或者—— 嗯，关于这个问题，的确一直有很多很多很多人问过我。

Speaker 2 | 1:09:49 - 1:10:09 Yeah. Well, I'm very happy that I didn't so far. I thought both my time at Google and my time at OpenAI have been great. It was a privilege to be there and to be able to do the work. I love technical work.

对。我，我目前为止都很庆幸自己这么做了。我觉得我在 Google 的时间，还有我在 OpenAI 的时间，都非常棒。能在那里、能做这些工作，是一种荣幸。我热爱技术工作。

Speaker 2 | 1:10:10 - 1:10:30 Everyone who started a company has thought maybe they won't need to spend so much time on the company work, but it feels like they had to. Totally. But then, sometimes companies do amazing things. That's a

每个创办过公司的人，大概都想过，也许自己不用在公司事务上花那么多时间，但感觉他们最终都不得不这么做。完全是。不过话说回来，公司有时确实能做出很了不起的事情。这是个——

Speaker 1 | 1:10:31 - 1:10:39 It's been a fascinating conversation. I wanna make sure to leave the last word to you. Anything you wanna point our listeners to, or thoughts you wanna leave them with? The mic is yours.

这是一场非常精彩的对话。我想确保把最后的发言留给你。有什么想让我们的听众关注的内容，或者想留给他们的想法吗？麦克风交给你。

Speaker 2 | 1:10:39 - 1:11:16 Thank you. I think I said it already, but I just want to repeat. I feel this time now, when you have, like, powerful GPUs that you can put under your desk and coding agents that can really help you push them to their limits, and the time where, you know, all the big things are pushing the transformers, and great that they are, because they're amazing, but there is this whiff of possibly other things. I think it is still, and again, the most exciting time to be a researcher in machine learning.

谢谢你。我，我，我只是想——我想我已经说过了，但我还是想再重复一遍。我觉得现在这个时代，你有强大的 GPU（图形处理器）可以放在桌子底下，还有 coding agents（编程 agent）能真正帮助你把它们的能力推到极限。与此同时，现在所有大事物都在推动 transformer，这当然很棒，因为它们确实很惊人，但——但同时也隐约让人感觉，可能还存在别的东西。我认为，现在依然——而且我再说一次——是成为 machine learning（机器学习）研究者最令人兴奋的时代。

Speaker 2 | 1:11:17 - 1:11:50 I want to encourage everyone to just go and try their ideas, to learn from others. If anything, I feel like we should publish more wild things. I always feel a little sad when so many papers are about, "Oh, we took a pre-trained model and RL'ed it in a slightly different way." I mean, it's good, but you know, you don't need to catch up with what is there. You can just do new things, even if they'll start smaller, even if, you know, maybe it won't work the first time.

我想鼓励每一个人都去尝试自己的想法，去向别人学习。要我说的话，我甚至觉得我们应该多发表一些更“野”的东西。每当我看到那么多论文都在写，“哦，我们拿了一个 pre-trained model（预训练模型），然后用稍微不同一点的方式做了 RL（强化学习）”，我总会有点难过。我的意思是，这当然很好，但你知道，你不需要只是去追赶现有的东西。你完全可以去做新的东西，即便它们一开始规模更小，即便——你知道——第一次也许并不会成功。

Speaker 2 | 1:11:51 - 1:12:17 You know, nobody talks to me about the paper I had before "Attention Is All You Need", which is "you don't need attention". I had the paper the year before. It replaces it with active memory. Well, it wasn't quite good advice, but you need to explore the wrong things because they may lead you to the right thing. And this is also what models are still so bad at, which I think Jerry is trying to push.

你知道，在《attention is all you need》之前，我还有一篇论文，但没人来跟我聊那篇；那篇的意思其实是“你不需要 attention”。我是前一年发的。它是用 active memory 来替代 attention。嗯，那并不算是一个特别好的建议，但你确实需要去探索那些错误的东西，因为它们可能会把你带到正确的东西上。这也是 models 现在仍然很不擅长的一点，我觉得 Jerry 正在努力推动这个方向。

Speaker 2 | 1:12:17 - 1:12:44 Models are very bad at, like, learning from a totally wrong direction to actually twist it to a right one. That's what we humans can still do very well, so we should do more of it. We should just do wild explorations even if they fail. I feel now that, you know, if you put in a lot of your own effort without an agent, it's very hard, too, when it fails. I think with agents, it's even easier.

models 很不擅长的一点是：从一个完全错误的方向里学习，然后真的把它拧到一个正确的方向上。这是我们人类现在仍然很擅长做的事，所以我们应该更多去做。我们就该大胆探索，哪怕会失败。我现在的感觉是，如果你投入很多自己的努力、又没有 agent 的话，失败时会特别难受。我觉得有了 agents，这甚至会更容易一些。

Speaker 2 | 1:12:44 - 1:12:55 So I want to encourage everyone to, you know, do research explorations, fail when it comes to this. This is how we can get to interesting things.

所以我想鼓励大家去做研究探索，在这个过程中失败；这正是我们通往有趣事物的方式。

Speaker 1 | 1:12:55 - 1:13:00 I love that. Well, I feel like that's the perfect note to end on. Thank you so much for coming on the pod. This was super fun.

我太喜欢这句话了。我觉得这简直是最完美的收尾。非常感谢你来上这个播客。这次聊得特别开心。

Speaker 2 | 1:13:00 - 1:13:01 Thank you so much for having me.

非常感谢邀请我。

Speaker 1 | 1:13:01 - 1:13:27 I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this. It's a nights and weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. It's really what ultimately makes this whole thing work.

我是 Jacob Efron，这里是 unsupervised learning，一档播客。在这里，我可以和 AI 领域最聪明的人交流，问他们一大堆关于 models 正在发生什么、以及这对现实世界中的企业意味着什么的问题。希望大家已经能听出来，我做这件事非常开心。除了我在 Redpoint 做投资人的本职工作之外，这还是一个我利用晚上和周末来做的项目。但我们之所以能请到这些了不起的嘉宾，确实离不开像你们这样订阅播客、并把它分享给朋友的人。归根结底，正是这些让整件事运转起来。

Speaker 1 | 1:13:27 - 1:13:32 And so please consider doing that, and thank you so much for your support and listening. We'll see you next episode.

所以，也请你考虑这样做，非常感谢你的支持和收听。我们下期节目再见。

原文 ↗https://www.youtube.com/watch?v=N1geOimmdDo