🎙 Podcast: Unsupervised Learning · May 15, 2026 · 14,433 words · about 72 minutes

Ep 86: Yann LeCun on Leaving Meta, Breaking The LLM Paradigm, & Why Hinton is Wrong

Speaker 1 · 00:00 - 00:03

You're one of the godfathers of AI. What's your view of the path of progress here?

Speaker 2 · 00:03 - 00:11

Five years, complete world domination. The best way to get breakthrough research is you hire the best people, and you get the fuck out of the way. Pardon my French.

Speaker 1 · 00:11 - 00:14

You shared the Turing Award with two others. When did your views start diverging?

Speaker 2 · 00:14 - 00:15

In 2023.

Speaker 1 · 00:15 - 00:19

How did you know it was time to leave Meta? It sounds like you were thinking through some of these things over a period of time.

Speaker 2 · 00:19 - 00:25

There was a big misconception about my role, my relationship with Alex, and how AI was run at Meta.

Speaker 1 · 00:25 - 00:27

What's one thing you've changed your mind on in the last year?

Speaker 2 · 00:27 - 00:28

I mean, the whole idea of...

Speaker 1 · 00:28 - 01:00

Yann LeCun is one of the godfathers of AI. He's an absolute legend in the field, someone I've admired for a long time, so it was such a treat to get him on Unsupervised Learning. He's been a noted skeptic of LLMs in many ways, so we dug into what LLMs can and can't do, some of the limitations he sees, and why he ultimately decided to pursue a different architecture. We also talked about his time at Meta: the things he's proud of in setting up FAIR, how the last few years unfolded, and what ultimately led him to spin out and start his own company, AMI.

Speaker 1 · 01:00 - 01:26

I think it's just fascinating to get Yann's thoughts on everything happening in the AI ecosystem today: the tension between basic research and pushing LLMs forward, how that's playing out in a bunch of organizations today, and where the whole space is headed. He's an absolute giant in the field, and when I started this podcast, I hoped we'd get guests like him. So it's such a treat. I think folks will really enjoy hearing the conversation we had. Without further ado, here's Yann.

Speaker 1 · 01:28 - 01:37

Yann, this is such a pleasure. You're one of the godfathers of AI. When I started doing this podcast years ago, I was really hoping we might one day get someone like you on.

Speaker 2 · 01:37 - 01:43

You know, I don't like that term, because I live in New Jersey. When you're a godfather in New Jersey, it doesn't mean the same thing.

Speaker 1 · 01:43 - 02:15

Very fair. Very fair. Obviously, your bet on neural nets when everyone doubted them is legendary, and I feel like today you're making a similar bet in many ways against LLMs and the predominant generative architectures that so many believe in. You've recently started a new company around this theme. So our goal in the conversation today is to leave our listeners with a lot more information about AMI, what you're doing there, some of your work at Tapestry, why you think the rest of the field is pointed in the wrong direction with some of these generative models, and also to get your reflections on the way the field has unfolded, your time at Meta, and all that.

Speaker 1 · 02:16 - 02:38

Modest goals for a single podcast episode. I figured it'd be great to start with AMI, because the company feels like the clearest statement of your technical thesis going forward. You recently launched the company. It's focused on world models and scaling the JEPA architecture, which you pioneered at Meta. I'm wondering if you could talk a little about the origins of that architecture and the extent to which you drew inspiration from the human brain and the way it works.

Speaker 2 · 02:38 - 03:03

So first of all, I want to say there's nothing wrong with LLMs, in the sense that LLMs are the basis for a lot of very useful AI products that all of us use, including me. Yeah. They're great for what they do. They're just not a path towards human-level or human-like intelligence, or even animal-like intelligence. So that's my claim.

Speaker 2 · 03:03 - 03:09

Okay? I'm not saying they're useless. Right? I'm just saying they're not a path towards...

Speaker 1 · 03:09 - 03:11

I mean, you helped build some of the first major open-source ones.

Speaker 2 · 03:12 - 03:25

Right. Absolutely. So what is AMI? AMI stands for Advanced Machine Intelligence. And the subtitle, the motto if you want, is AI for the real world.

Speaker 2 · 03:26 - 03:43

So basically, a lot of the AI techniques that people know about today are good for language manipulation, whether human language, computer code, mathematics, or legalese, which barely qualifies as human language.

Speaker 1 · 03:43 - 03:45

Unfortunately, a lot of human language is used for it.

Speaker 2 · 03:45 - 04:07

Right. Sadly. Language is very special in a way, and it's particularly well suited to the type of architectures that have been so successful recently: large language models, GPT-style architectures. But what about the real world? What about understanding the physical world?

Speaker 2 · 04:08 - 04:21

It turns out reality is way more complicated than language, because it's high-dimensional. It's continuous. It's noisy. It's messy. And training a system to understand the real world is much, much harder.

Speaker 2 · 04:21 - 04:52

So that's really what we are after. That's what I've been after for most of my career, working on it in an accelerated fashion over the last five or six years and making significant progress over the last two years. So it made sense to do a startup around it and go into high gear pushing it. And it became clear by the end of last year that Meta was really not the right place for that, which is why I left and started AMI Labs.

Speaker 1 · 04:52 - 05:20

I think it's an interesting trend that we're seeing across the board, right? Many folks are spinning out of large companies or research labs with a particular research direction they're excited about. And you have such an interesting vantage point on this from your time at FAIR: the tension in these companies between pursuing as many different research directions as possible versus, "Hey, something's really working. This is the thing we're going to sell for the next six to twelve months. Go focus on that."

Speaker 1 · 05:20 - 05:23

I'm curious about your thoughts on that and what you've seen in the industry at large.

Speaker 2 · 05:23 - 05:32

Well, it's a strange trade-off. There are really two modes, right? There's a lot of exploratory research, a lot of different research directions. Right?

Speaker 2 · 05:32 - 05:49

And sometimes something seems to work, and you need to push it further. Then it's not research anymore. I mean, the people working on it are researchers, or at least they're called researchers in the press. But really, it's becoming more engineering and pushing for products. Right?

Speaker 2 · 05:49 - 06:42

That happened a number of times at Meta with things that were started at FAIR. It happened in early 2023, essentially, when Llama 1, which was developed at FAIR, was very promising. Meta created a whole organization, GenAI, to turn it into something real and into a series of products, and it produced Llama 2, Llama 3, and Llama 4, which was a bit of a disappointment. Because Mark Zuckerberg was disappointed by it, he rebooted the entire organization, reorganized it, hired new people, and so on. But what also happened over the last year is that Meta realized it had fallen behind a little bit.

Speaker 2 · 06:42 - 07:34

So that refocused the strategy on trying to catch up with the industry. And the sad side effect is that a lot of the exploratory research basically wasn't given high priority anymore. That didn't concern the stuff I was working on, world models, because Mark himself, Andrew Bosworth, the CTO, and a bunch of other people in the company were really interested in that project and really believed in its long-term impact. But the rest of the company was entirely focused on LLMs, and that made it clear to me that Meta was really not the right place to push that project anymore. Then we started to get good results, so it was clear that we had to make the transition from research to actually developing the technology, scaling it up, and building products out of it.

Speaker 2 · 07:34 - 07:51

We also realized that most of the applications were probably in areas Meta was not particularly interested in. A lot of the applications of the kind of stuff we've been working on are in industry, like manufacturing and things like that.

Speaker 1 · 07:51 - 08:09

Obviously, you're pursuing world models in that broader world. And I think other people have come at the world-model space from a more generative approach. You've got folks at Google with Genie and the video models. You've got folks building VLAs on the robotics side. You've got Fei-Fei and the 3D spatial models.

Speaker 1 · 08:09 - 08:22

As you think about the body of evidence that got you excited about JEPA models, how do you compare them to what the generative folks have done? Where do you think we are today in comparing these architectures and approaches?

Speaker 2 · 08:22 - 08:29

Okay. So "world model" is quickly becoming a buzzword right now. Right? Certainly in research, but also in industry to some extent.

Speaker 2 · 08:29 - 08:45

And then there are two factions, if you want. I'm not going to talk about VLAs, because VLAs are now clearly seen as not going anywhere. They're really not working. VLAs are vision-language-action models. Right?

Speaker 2 · 08:45 - 09:09

Basically, you use LLM technology to train a system to produce actions, for controlling a robot or something like that. Right? So you have vision in, language in, action out, maybe language out too. And that's now pretty much seen as a failure: not reliable enough, requiring too much training data, things like that. Okay.

Speaker 2 · 09:09 - 09:22

Then there are world models. Okay. So what is a world model? At a very general level, a world model is something that allows an agentic system to anticipate the consequences of its own actions. Okay?

Speaker 2 · 09:23 - 09:41

To predict the consequences of its own actions. From my point of view, I cannot imagine how you could even think of building an agentic system without that system having the ability to predict the consequences of its actions. That's pretty essential. Right? When we act in the world, we have this ability.

Speaker 2 · 09:42 - 10:05

And when we take an action without thinking about the consequences, we are taking a big risk. Very often, other people think we're an idiot. We have plenty of examples on the international political scene at the moment of people who have no ability whatsoever to predict the consequences of their actions. So that's the world model. That's all it is.

Speaker 2 · 10:05 - 10:31

Right? The ability to predict the consequences of your own actions. If you have this ability, then you can plan a sequence of actions to accomplish a task, to satisfy a goal. And you do this by planning, by reasoning, through a process of search and optimization. You don't do it by predicting one action after another autoregressively, the way a VLA does.

Speaker 2 · 10:32 - 10:55

You do it by searching for a sequence of actions that will accomplish the task you set for yourself. So the blueprint for this is completely different from what LLMs can do at the moment. LLMs do not have the ability to predict the consequences of their actions, and they do not have any planning abilities, because inference works by predicting the next token. Right? It's not by search.
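A minimal sketch of planning with a world model, in Python. It assumes a hand-written one-dimensional stand-in for a learned world model (`world_model`, hypothetical) and uses random-shooting search, one of the simplest optimization-based planners; it is an illustration of the idea described here, not AMI's or Meta's method. The planner imagines the outcome of many candidate action sequences with the model and keeps the best one, instead of emitting one action after another as an autoregressive policy would.

```python
import random


def world_model(state: float, action: float) -> float:
    """Hypothetical world model: predicts the next state from the current state
    and an action. A real one would be learned; this toy just moves a point."""
    return state + action


def predict_final_state(state: float, actions: list[float]) -> float:
    """Imagine the consequences of a whole action sequence without acting."""
    for action in actions:
        state = world_model(state, action)
    return state


def rollout_cost(state: float, actions: list[float], goal: float) -> float:
    """Score a candidate plan: distance of the predicted outcome from the goal,
    plus a small penalty on effort."""
    final = predict_final_state(state, actions)
    return abs(final - goal) + 0.01 * sum(a * a for a in actions)


def plan(state: float, goal: float, horizon: int = 5,
         samples: int = 2000, seed: int = 0) -> tuple[list[float], float]:
    """Random-shooting planner: sample action sequences, evaluate each one with
    the world model, and return the lowest-cost sequence and its cost."""
    rng = random.Random(seed)
    best_actions: list[float] = []
    best_cost = float("inf")
    for _ in range(samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(state, actions, goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


if __name__ == "__main__":
    actions, cost = plan(state=0.0, goal=3.0)
    print("planned actions:", [round(a, 2) for a in actions])
    print("predicted final state:", round(predict_final_state(0.0, actions), 2))
```

Replacing the random sampler with a gradient-based or cross-entropy-method optimizer changes how the search is done, not the structure: a model that predicts consequences, a cost on the predicted outcome, and an optimizer over action sequences.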

Speaker 2 · 10:55 - 11:29

Okay? So right there, you have the two characteristics that I think are essential for intelligent behavior: first, the ability to predict the consequences of your actions, and second, the ability to plan by optimization, by search, finding a good sequence of actions that will produce the correct outcome. And then there is a third characteristic, which is how you predict the consequences of your actions. Okay. So, you know, I have a water bottle in front of me. I realize some people will just be listening to this and won't have the picture.

Speaker 2 · 11:29 - 11:58

So I have an open, uncapped water bottle in front of me. If I push at the bottom, it's going to slide on the table. If I push near the top, it's probably going to flip. We can't predict exactly how the bottle will fall, or in which direction. We can't predict exactly how it's going to slide, how the water will spill, whether the table is tilted one way so the water will flow in one direction or another.

Speaker 2 · 11:58 - 12:07

There's no way we can predict this at the pixel level. So our mental model of the world does predict, but at an abstract level of representation.

Speaker 1 · 12:07 - 12:13

So as you were working on this architecture, was a lot of it inspired by the human brain? Obviously, the way you're articulating things is exactly how we do things.

Speaker 2 · 12:13 - 12:22

Right. Or at least by cognitive science. Right? Whether you can translate this into a neural architecture and things like that, there's a big gap there. Okay.

Speaker 2 · 12:22 - 13:19

So certainly cognitive science was a bit of a motivation, or what psychologists call System 2: the idea that in deliberate, reflective behavior, you imagine and predict the consequences of your actions and plan accordingly, as opposed to System 1, where you just act reactively and instinctively. So yes, there is an inspiration. But there is also a lot of empirical evidence that you don't want to generate pixels. Okay? I've been really interested in the problem of learning models of the world by prediction for a very long time, and then I had an epiphany about five years ago, realizing that all of the architectures that have been successful at learning representations of images and videos are non-generative architectures.

Speaker 2 · 13:19 - 13:46

And all the generative ones have basically been failures. Right? So VAEs, variational autoencoders, or autoencoders more generally, are a natural way to think about learning abstract representations of inputs. Right? You put an image at the input of a neural net, and then you train it to reproduce the input at its output, now with a big neural net.

Speaker 2 · 13:46 - 13:54

Now, if you just do it this way, your neural net will not do anything interesting. It just learns the identity function. Yeah. Completely uninteresting. It doesn't work.

Speaker 2 · 13:54 - 14:12

If you train a VAE to learn representations of images, you get something, but it's really not that great. Same with sparse autoencoders. Then you have another set of techniques, derived from something called the denoising autoencoder. The masked autoencoder is a version of this. BERT is a version of this for NLP.

Speaker 2 · 14:12 - 14:44

So you take the image, you corrupt it in some way, and then you train this big neural net to recover the original image. There was a huge project at FAIR on this called MAE, masked autoencoder. It was very disappointing: a lot of computation and not really great or satisfying results. Simultaneously, some of the same people working on MAE, and some other people in Paris and in New York, were working on other techniques using non-generative architectures, joint embedding architectures.

Speaker 2 · 14:44 - 15:00

So you take an image, corrupt it in some way, run the two images through encoders, and then try to predict the representation of the original image from the representation of the corrupted one. That's JEPA. Yeah. Okay? JEPA means Joint Embedding Predictive Architecture.

Speaker 2 · 15:00 - 15:28

Right? So you have one encoder that makes an observation and another encoder that makes a different observation. You try to predict the representation of the first one from the second one with a predictor. And those techniques turned out to work much better for representing images and video. So things like DINO (DINO v1, v2, v3), a project that is still going on at FAIR in Paris, and projects like I-JEPA and then V-JEPA.

Speaker 2 · 15:28 - 15:55

And before that, there were SimSiam and MoCo and a whole bunch of different techniques, mostly from Meta, and a bunch of others from other groups. But that turned out to be a much better way of learning representations of images than predicting pixels. Yeah. And so it just clicked in my mind, and not just mine, that this was the way to go, and that predicting pixels was a losing proposition.
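A minimal sketch of where the two families of objectives measure their error, in plain Python. It assumes tiny linear maps (weight matrices stored as lists of rows) in place of the large encoders, decoders, and predictors being discussed; every function name is hypothetical and there is no training loop. The reconstruction loss compares a decoded output with the original input values ("pixels"); the JEPA-style loss compares a predicted representation with the representation of the uncorrupted view. The two printed cases show the trivial solution each objective admits.

```python
import random


def linear(weights, x):
    """Apply y = W x, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def corrupt(x, mask_ratio, rng):
    """Masking corruption: replace a random subset of input values with 0."""
    return [0.0 if rng.random() < mask_ratio else v for v in x]


def reconstruction_loss(x, x_corrupt, encoder, decoder):
    """Generative (denoising / masked-autoencoder style) objective: encode the
    corrupted input, decode back to input space, compare with the original."""
    return mse(linear(decoder, linear(encoder, x_corrupt)), x)


def jepa_loss(x, x_corrupt, context_encoder, target_encoder, predictor):
    """Joint-embedding predictive objective: predict the representation of the
    original view from the representation of the corrupted view. The error is
    measured in representation space; the input is never regenerated."""
    target = linear(target_encoder, x)
    prediction = linear(predictor, linear(context_encoder, x_corrupt))
    return mse(prediction, target)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for an image
    x_masked = corrupt(x, mask_ratio=0.5, rng=rng)

    # Trivial solution 1: with no corruption and no bottleneck, the identity
    # function reconstructs perfectly while learning nothing useful.
    print(reconstruction_loss(x, x, identity(8), identity(8)))  # 0.0

    # Trivial solution 2: encoders that output a constant also drive the
    # JEPA loss to zero ("collapse").
    print(jepa_loss(x, x_masked, zeros(4, 8), zeros(4, 8), zeros(4, 4)))  # 0.0
```

Masking the input, as MAE does, rules out the first case, because the identity no longer recovers the hidden values. The second case is why joint-embedding training needs an anti-collapse mechanism that this sketch leaves out, such as the exponential-moving-average target encoder used in I-JEPA.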

Speaker 1 · 15:55 - 16:20

It feels like there are all these robotics demos being released by some of the model companies that feel increasingly impressive and maybe seem to resemble planning and reasoning: the robots maybe haven't seen a room or a specific version of a task before and are still able to execute it. What would you say to listeners who observe that and feel like we're trending toward real progress with some of the generative approaches?

Speaker 2 · 16:20 - 16:39

Well, there is real progress, and some of those demos are really impressive. But they are trained with enormous amounts of data collected either from teleoperation or just from humans performing actions with things they hold in their hand. Yeah. Yeah. Grippers.

Speaker 2 · 16:39 - 17:01

Grippers, you know? You collect the data with those, or you just track the hands and fingers of a person and translate this into commands for a robot. So those things are trained mostly with imitation learning. Right? And a little bit with reinforcement learning for fine-tuning, mostly in simulation.

Speaker 2 · 17:01 - 17:54

So the issue is that you need a lot of data to train those systems through imitation. It becomes expensive, and it's a little brittle in the sense that you need to collect lots of data for every task you want the robot to solve. Whereas if the system had a world model that allows it to predict the outcome of an action, it could just plan actions to solve a new task without having been trained to accomplish that task. So the degree of generalization you would get with a world-model-based system is much, much larger: a wider spectrum of tasks, with less training data required, than a system trained with imitation learning and fine-tuning.

Speaker 1 · 17:54 - 18:10

No doubt those approaches require data. I guess this question of generalization really is the big question, right? Some folks have shown results where getting better at task A helps with task B, but there still feels like a big unanswered question around those architectures.

Speaker 2 · 18:10 - 18:32

I mean, you get this synergy between tasks. The more tasks you train the system to solve, the more tasks it will be able to acquire with a small amount of data, regardless of what technique you use. But the hope with world models is that the system can solve new tasks zero-shot, which humans are completely capable of doing. Right? And many animals as well.

Speaker 2 · 18:32 - 19:04

So that's really the hope: solving a lot more problems with either a small amount of training data or no training data at all, plus maybe a little RL-style fine-tuning. Yeah. Like, how is it that a 17-year-old can learn to drive in a dozen hours, or maybe twenty hours? We have millions of hours of training data of people driving cars, and we still don't have level-five self-driving cars.

Speaker 2 · 19:04 - 19:09

Right? So imitation learning obviously doesn't work, even for just the task of autonomous driving.

Speaker 1 · 19:09 - 19:32

Yeah. I guess it'll be a race between developing some of those capabilities, which may take time and lots of data, and this kind of architecture. There's this dream of using video models to generate tons of synthetic data for simulation: even if these video models aren't perfect from a physics perspective, they're helpful enough to improve robotics in the underlying physical world. What do you make of some of those approaches?

Speaker 1 · 19:32 - 19:35

Obviously, I think NVIDIA has been focused there. Google seems to be going down that road.

Speaker 2 · 19:35 - 19:54

I'm sort of asking you the question again: how can a 17-year-old learn to drive in twenty hours? You don't need millions of hours of demonstrations, and you don't need synthetic data. You don't need any of that. So I want a system that can learn as fast as that. If we crack that, then we don't need generated data.

Speaker 2 · 19:54 - 20:08

Right? I mean, we might need to train a system in simulation, but not with the same amount of time or trials as current systems require. It's really a question of data efficiency.

Speaker 1 · 20:08 - 20:32

You know, I interviewed Jerry Tworek on the podcast. He was at OpenAI and left to start his own lab, and you could sense a similar tension. I think he might actually agree that if you continued scaling RL the way we're scaling it, you'd keep getting very impressive results. But I think he felt, God, there's just got to be a way more efficient way to do this. It's an interesting tension, because you could imagine that if you're OpenAI and you know something will keep working, that you can continue scaling it and it will keep getting better,

Speaker 1 · 20:33 - 20:37

there's not necessarily a ton of incentive, from a business perspective, to do something more data efficient.

Speaker 2 · 20:37 - 20:50

Right. And there's no incentive for the other companies to do anything different either, because they're all chasing the same thing; they can't afford to fall behind the others. Right? So they all work on the same thing.

Speaker 2 · 20:50 - 21:03

And there's a bit of herd behavior, mostly in Silicon Valley, where everybody is digging the same trench.

Speaker 1 · 21:03 - 21:04

Yeah.

Speaker 2 · 21:04 - 21:19

And so I purposely set up the headquarters of AMI Labs in Paris, with the American office in New York, not Silicon Valley.

Speaker 1 · 21:19 - 21:46

It's really interesting, because I think it points to a tension that exists in the broader ecosystem today. You could imagine the other side saying: sure, maybe there are more data-efficient methods out there, but who cares, because we can keep scaling what we have to better and better results. And obviously, both for the net new things you could accomplish with these models and for the sheer joy of being a researcher and finding new things, I get why there's such an attraction to these other architectures as well.

Speaker 2 · 21:46 - 21:51

And it's a bet. But we're pretty confident, because we have results already.

Speaker 1 · 21:52 - 22:03

Yeah, actually. As you think about the initial spaces you're most excited about for the AMI technology, where do you think the technology goes, and what are you most excited about?

Speaker 2 · 22:03 - 22:14

Well, I mean, AI for the real world. Like, where is your domestic robot? Where is your level five self-driving car? Where is... and that's

Speaker 1 · 22:14 - 22:16

When am I gonna get a domestic robot? I'm excited about this.

Speaker 2 · 22:17 - 22:29

So this is several years down the line. Okay? Despite the fact that a huge number of companies are building robots, none of those companies actually has any idea how to make them smart enough to be useful.

Speaker 1 · 22:29 - 22:31

Right? Or to be trusted around a baby in the house, or something.

Speaker 2 · 22:31 - 22:45

Certainly not that. But even for relatively narrow manufacturing tasks, right? None of them really knows how to do this reliably, other than by imitation learning for a small number of tasks.

Speaker 2 · 22:47 - 23:29

So how do we make those things useful? That's a relatively long-term objective. In the shorter term, there is a huge number of applications in industry where you need an intelligent system that can predict what's going to happen if I change this control variable on this complex system, be it a jet engine, a chemical plant, a power plant, some manufacturing line, a patient, or a human cell. Right? Those are systems that are sufficiently complex that you can't model their behavior with a small number of equations.

Speaker 2 · 23:29 - 24:02

Right? So the traditional way of modeling does not work. What you need to do is train a neural net, a deep learning system, to model the dynamics of that system from data. What you get at the end is a phenomenological model of that process, of that system. And if it's action-conditioned, then you basically get a world model of that system that allows you to control it optimally for whatever purpose you have.

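To make "learn the dynamics from data, then use the action-conditioned model to control the system" concrete, here is a minimal Python sketch. It is an illustration under loudly stated assumptions, not AMI Labs' method: a two-parameter linear model `x_next ≈ a*x + b*u`, fit by least squares, stands in for the neural net, and planning is plain random-shooting model-predictive control (sample action sequences, roll them out in the learned model, execute only the first action of the cheapest sequence, then replan). The simulated plant `true_step` (with a = 0.9, b = 0.5), the noise level, the cost weights, the horizon, and every function name are hypothetical.

```python
import random

def true_step(x, u):
    # Hypothetical "real" plant; the controller only ever sees logged data from it.
    return 0.9 * x + 0.5 * u

def collect_data(n=200, seed=0):
    # Log (state, action, next_state) triples with a little sensor noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, u = rng.uniform(-2, 2), rng.uniform(-1, 1)
        data.append((x, u, true_step(x, u) + rng.gauss(0, 0.01)))
    return data

def fit_linear_dynamics(data):
    # Least-squares fit of x_next ~ a*x + b*u by solving the 2x2 normal equations.
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def plan_first_action(model, x0, target, horizon=3, samples=2000, u_max=1.0, rng=None):
    # Random shooting: imagine rollouts in the learned model and keep the cheapest one.
    a, b = model
    rng = rng or random.Random(0)
    best_cost, best_u0 = float("inf"), 0.0
    for _ in range(samples):
        seq = [rng.uniform(-u_max, u_max) for _ in range(horizon)]
        x, cost = x0, 0.0
        for u in seq:
            x = a * x + b * u  # prediction from the learned model, not the real plant
            cost += (x - target) ** 2 + 0.01 * u * u
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

def run_mpc(model, x0=0.0, target=1.0, steps=20, seed=1):
    # Receding horizon: apply only the first planned action to the plant, then replan.
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = true_step(x, plan_first_action(model, x, target, rng=rng))
    return x

if __name__ == "__main__":
    model = fit_linear_dynamics(collect_data())
    print("learned (a, b):", model)
    print("state after 20 MPC steps:", run_mpc(model))
```

Replacing the least-squares fit with a trained network, or random shooting with a smarter optimizer, would keep the same structure: a model that predicts the consequence of an action, and a planner that searches over actions using those predictions.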
Speaker 2 · 24:02 - 24:08

And I think the number of applications of this in industry is mind-boggling.

Speaker 1 · 24:08 - 24:16

Where do you think we'll be with JEPA models over the next couple of years? Are there milestones you'd point to? Or what's your view of the path of progress here?

Speaker 2 · 24:16 - 24:20

Okay. A couple of years is a little short. Like, five years: complete world domination, essentially.

Speaker 1 · 24:21 - 24:25

Okay. So somewhere on the path to world domination in five years.

Speaker 2 · 24:25 - 24:36

I mean, this is kind of a joke, obviously, but it's a quote from Linus Torvalds. Right? When people asked him what his goal with Linux was, he said total world domination. And he actually managed to do that.

Speaker 1 · 24:36 - 24:37

Yeah. Very fair.

Speaker 2 · 24:37 - 25:00

To a first approximation, every computer in the world runs Linux. Right? So that's kind of a joke. But in the end, I think this is the blueprint for the intelligent systems of the future. There will still be a small place for LLMs, basically as a language interface, but what we're designing are systems that are capable of thinking.

Speaker 2 · 25:00 - 25:11

They may not be capable of talking or listening initially, but they'll do the thinking. And then you can add the talking and listening on top of that.

Speaker 1 · 25:11 - 25:23

I'm sure you and the team are eagerly working to get the early proof points of this, and obviously you've already done some of the work. How do you think about the interim steps, what you'll be able to show on that path to five-year world domination?

Speaker 2 · 25:23 - 26:14

Well, I think within a year or so, we'll have a general methodology to train hierarchical world models on a very wide variety of modalities. We know we can do a good job on video with some techniques that we're not completely happy with, because they have some shortcomings. And we have a small-scale demonstration of a methodology that we think is really what we want. So we need to scale that one up and get it to the same level of performance as the other, less satisfying techniques on things like video, but also on other types of datasets that we would get from industry partners. Okay?

Speaker 2 · 26:14 - 26:34

So we'll have demonstrations that we can train world models, perhaps action-conditioned world models, that allow us to plan for a number of different use cases. Some of them would be robotics. Some of them will be industrial process control of various types. Maybe some of them will be in health care as well, because we are partners in that

Speaker 2 · 26:34 - 26:57

domain. And that should be within a year to eighteen months. Then we'd push this methodology and those models into those use cases with partners, some of which are already investors in our company, and gain experience on how to essentially build a somewhat universal world model, if you want.

Speaker 1 · 26:57 - 27:20

I mean, you've obviously had this experience before, of making a really contrarian bet on neural nets and being proven abundantly right in the history books. As you think about this bet, which I think the majority of people at the cutting edge of various parts of AI would still call contrarian today, in what time frame do you think it will become apparent that this was right?

Speaker 2 · 27:20 - 27:49

I think it'll happen faster than expected, perhaps, because you can see that world model is already becoming a buzzword. Right? At least at a research level, and it's starting to permeate into the industry. And a lot of people are realizing that VLAs suck, and that LLMs don't work for real-world data. Industry has realized this already, certainly on the user side.

Speaker 2 · 27:50 - 28:13

And I think because of the importance of the robotics industry, a lot of people are trying to figure out how we get there: how do you make those robots useful? So I think the realization that you need to change paradigm is happening as we speak. It will become completely obvious to people by early 2027, I think.

Speaker 2 · 28:13 - 28:18

Now, that doesn't mean we'll have a solution by then. We hope we will, but we'll see.

Speaker 1 · 28:18 - 28:25

I guess, switching gears to the LLM side, you mentioned some of the work you're doing with Tapestry, which I think would be really interesting for our listeners. So maybe just speak to that a little bit.

Speaker 2 · 28:25 - 28:29

Okay. So this is a little bit orthogonal to AMI Labs.

Speaker 1 · 28:29 - 28:31

As if that wasn't enough to keep you busy.

Speaker 2 · 28:32 - 29:20

Well, it's an idea I've been forming over the last three years or so, based on the fact that people increasingly use AI assistants for various things. Right? You see a decrease in the use of traditional search engines; you just ask your favorite AI assistant a question. And if the plan that Meta and others are developing, of having smart devices like smart glasses, is realized, you'd basically just be talking to your AI assistant by voice, through your smart glasses or some other smart device. So your entire information diet will be mediated by AI assistants.

Speaker 2 · 29:22 - 30:21

And if you are someone somewhere in the world, let's say outside the US or China, and you have an AI assistant that was built in California, or in Beijing, Shanghai, or Shenzhen, it's not good for you. You may speak a language that those systems haven't been trained to handle particularly well. You may have a culture that is not particularly well understood by people in Silicon Valley and China, and not well represented in the training data publicly available on the Internet. You may have a value system that is absolutely not represented by the people building those models. And you'll almost certainly have political opinions that are absolutely not represented by the handful of AI assistants you might get from the West Coast tech companies or from Chinese companies. So what is the solution to this?

Speaker 2 · 30:21 - 31:26

Like, how do you serve a farmer in India, or even a philosopher in France or Germany? What you need is a platform that is basically an open, free foundation model, LLM-style, that anyone can fine-tune to cater to the interests of people speaking a particular language, having a particular culture, particular value systems, political biases, creeds, whatever it is. So what you need is a wide diversity of AI assistants. There are a lot of countries around the world that are neither the US nor China that absolutely want some level of sovereignty for AI, not just for the industry, but also for their citizens. They don't want their citizens to get brainwashed by a Chinese model or a Californian model, actually.

Speaker 2 · 31:28 - 32:08

And so they want sovereignty. How do you get that? The way you get an open platform like this to the frontier is to train it on more, and higher-quality, data than the proprietary systems. If you talk to people in India, France, Vietnam, Morocco, Switzerland, Korea, Japan, Kazakhstan, basically everyone wants sovereignty. And you tell them: you guys have been training your models locally.

Speaker 2 · 32:08 - 32:40

You don't have to share your data. So that's the cultural aspect of Tapestry. You would have international contributors to Tapestry contributing to training a global model that would basically constitute a repository of all the world's knowledge and culture, if you want. The contributors would contribute data and computing resources, but they would keep control over their data. They would not have to share their data with the other contributors.

Speaker 2 · 32:40 - 33:03

What they would contribute is parameter vectors. So it would be a kind of federated-learning-style thing where you have a bunch of data centers. They get the parameter vector from the global consensus of a model; think of it as an average of all the parameter vectors of all the contributors.

Speaker 2 · 33:03 - 33:28

Right? So all the contributors periodically tell everyone else, maybe through a central server: here is my parameter vector; what is yours? Okay? So you exchange parameter vectors like this, and whenever a local worker updates its parameter vector, it also tries to make it as close as possible to the global consensus vector.

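The exchange described here resembles consensus-style (or elastic-averaging) distributed training. Below is a minimal Python sketch under stated assumptions; it is not Tapestry's actual protocol. Each hypothetical contributor holds a private quadratic loss whose `target` stands in for its private data, the consensus is the plain element-wise mean of all parameter vectors (the job a central server would do), and each local step descends the contributor's own loss plus a penalty of strength `rho` pulling it toward the consensus. Only parameter vectors cross contributor boundaries; the targets never do. All names and values are illustrative.

```python
import random

def consensus(vectors):
    # Element-wise mean of all contributors' parameter vectors (the "central server").
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def local_update(w, target, z, lr, rho):
    # One gradient step on 0.5*||w - target||^2 + 0.5*rho*||w - z||^2.
    # Only z (the shared consensus) comes from outside; target stays private.
    return [wi - lr * ((wi - ti) + rho * (wi - zi)) for wi, ti, zi in zip(w, target, z)]

def train(targets, rounds=200, lr=0.1, rho=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(targets[0])
    workers = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in targets]
    for _ in range(rounds):
        z = consensus(workers)  # contributors publish vectors and receive the mean
        workers = [local_update(w, t, z, lr, rho) for w, t in zip(workers, targets)]
    return workers, consensus(workers)

if __name__ == "__main__":
    private_targets = [[1.0, 0.0], [3.0, 2.0], [-1.0, 4.0]]  # hypothetical private data
    workers, z = train(private_targets)
    print("consensus:", [round(v, 4) for v in z])  # prints: consensus: [1.0, 2.0]
```

With these quadratic losses the consensus converges to the mean of the private targets, which is the same minimizer you would get by training on the pooled data, while each contributor's own vector settles between its target and the consensus. A larger `rho` pulls the contributors closer together.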
Speaker 2 · 33:28 - 34:20

So as the training progresses, all those parameter vectors converge towards a consensus model, essentially, which is a kind of repository of all human knowledge. Now you have an open model that is as good as if it had been trained on all the data in the world, and you can fine-tune it for your own purposes: your own political, cultural, and linguistic biases, or centers of interest, whatever you want. And I think there is a natural force for this to happen, because most countries that are neither the US nor China want sovereignty, but also because AI is fast becoming a platform, and platforms have a natural tendency to become open. That's what happened with Linux. Right?

Speaker 2 · 34:20 - 34:33

And that's what happened with the software infrastructure of the Internet, or of wireless networks. It's all open source. It was proprietary initially, but that was all wiped out.

Speaker 1 · 34:33 - 34:53

That's a really clever way to get around what would seem to be a trend of decreasing open source. And obviously, there have been many fears that as the closed-source models get better, they'll be held back and used to train the next generation, creating almost an escape scenario where closed-source models get so much better than their open-source counterparts.

Speaker 2 · 34:53 - 35:19

So remember who the big players of the Internet infrastructure were in 1996? Sun Microsystems, HP, Dell, and a few others. Sun Microsystems was selling you Solaris with their proprietary hardware, HP with HP-UX. They were claiming Unix is so much more reliable than Windows; you're not gonna run a web server on Windows.

Speaker 2 · 35:19 - 35:33

Dell was doing this with Windows NT, but who is running Windows NT as a web server now? All of this was totally wiped out by Linux. The entire Internet runs on Linux. Even Azure. Right?

Speaker 2 · 35:33 - 35:48

Even Microsoft uses Linux. So basically, OpenAI, Anthropic, etcetera, today are the Sun Microsystems and HP-UX of yesterday.

Speaker 1 · 35:49 - 36:03

Yeah. I guess implicit in that is your view of the limitations of these models: they can only get so good, and so over time it'll be possible for the open-source folks to catch up.

Speaker 2 · 36:03 - 36:15

They've already run out of data. Right? The openly, publicly available text data has already all been used. There's no more of it. Right?

Speaker 2 · 36:15 - 36:25

So what those companies are doing is licensing commercial copyrighted data, or training on synthetic data.

Speaker 1 · 36:25 - 36:38

And I'm curious, because obviously there have been some impressive results in the last few years that they've been able to drive after these large-scale pretraining runs. You know, IMO gold; the METR task-horizon benchmark keeps going up.

Speaker 2 · 36:38 - 36:42

Okay, that's very interesting. Now think about those two domains. Right? Mathematics and code.

Speaker 2 · 36:43 - 37:08

Those are two domains where language itself is the substrate of reasoning. It's not the only substrate of reasoning, but a lot of it is: when you do mathematics the formal way, on a piece of paper, not the intuitive stuff, humans manipulate language. Right? And LLMs are really good at this. So proving theorems and stuff like that is what LLMs are really good at.

Speaker 2 · 37:08 - 37:19

They're not so good at coming up with good concepts and definitions and things like that. It's more like: here is a problem, solve it. They're problem solvers. Mathematics is not just problem solving.

Speaker 2 · 37:19 - 37:32

Right? Most of it is actually a creative act that those things don't do. And the same goes for code. LLMs are good programmers. They're not software architects.

Speaker 2 · 37:33 - 37:47

They're not computer scientists. Right? But they can program for us. So they're not in a state where they can just replace humans entirely. It changes the world of humans.

Speaker 2 · 37:47 - 38:10

So humans now go one level up in the abstraction hierarchy, and our role is to decide what to build. For actually building it, you can get help from LLMs. But okay, the important point is that LLMs are particularly successful in domains where language itself is the substrate of reasoning, and not in anything else.

Speaker 1 · 38:10 - 38:14

Yeah. What would an LLM need to do to convince you otherwise?

Speaker 2 · 38:14 - 38:23

It would be something like a zero-shot agentic system. Right? You have an agentic system, and you give it a new problem that it's not been trained to solve.

Speaker 2 · 38:23 - 38:52

It doesn't have a script for it. Is it going to be able to accomplish this task that it's never been trained to solve? Unless the system has the ability to predict the consequences of its actions and then use that for planning, it's not going to be able to do it. And you're not going to do this with an LLM. You're going to do this, perhaps, with a significantly augmented LLM that is capable of search and planning, blah blah blah.

Speaker 2 · 38:52 - 39:12

And currently, LLMs that do math and code actually do this. Right? Because they search for sequences of tokens that accomplish a particular task, and they can run the code, or verify that the proof is correct, or whatever. So you have a way of checking whether what's produced is correct.

Speaker 2 · 39:13 - 39:29

But it's not a very efficient way of doing planning. And it only works in domains where this type of search can be performed in token space. What I'm talking about with JEPA is that you don't do this in token space. You do it in an abstract thought space.

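The "search over token sequences, then check" loop can be illustrated with a toy. In the hypothetical Python sketch below, the proposer is brute-force enumeration rather than an LLM, the vocabulary is just five tokens, and the verifier evaluates each candidate expression and compares it with a target value, standing in for running code or checking a proof. The number of candidates of a given length grows as `len(VOCAB) ** length`, which hints at why searching in token space is an expensive way to plan. All names here are illustrative assumptions.

```python
from itertools import product

VOCAB = ["1", "2", "3", "+", "*"]  # hypothetical toy token vocabulary

def verify(tokens, target):
    # External checker: does this token sequence evaluate to the target value?
    try:
        return eval("".join(tokens), {"__builtins__": {}}) == target
    except SyntaxError:
        return False  # malformed sequences simply fail verification

def search(target, max_len=6):
    # Enumerate candidate sequences shortest-first; return the first one that verifies.
    checked = 0
    for length in range(1, max_len + 1):
        for tokens in product(VOCAB, repeat=length):
            checked += 1
            if verify(tokens, target):
                return "".join(tokens), checked
    return None, checked

if __name__ == "__main__":
    found, checked = search(7)
    print(found, "found after checking", checked, "candidates")
```

A learned proposer would rank candidates instead of enumerating them blindly, but the structure is the same: generate in token space, keep only what an external checker accepts.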
Speaker 1 · 39:29 - 39:39

And I'm sure some people listening might think: hey, even if it's inefficient and only works for things that are done in token space, that's still a large part of the economy that...

Speaker 2 · 39:39 - 39:52

Well, I mean, if it works, it's fine. Again, there's nothing wrong with using it for what it's good at. It's just not a path towards, you know, whatever they're aiming for. You're missing a huge domain.

Speaker 1 · 39:52 - 39:56

You seem to be saying, hey, it's gonna tap out before it can become a software architect, whereas I'm sure

Speaker 2 · 39:56 - 40:30

It's not gonna tap out. It's just going to have a limited ability to be deployed: it's going to become increasingly difficult to deploy it for an increasingly large number of use cases, because you're going to have to collect tons of training data for each of those use cases. And basically, you're not going to be able to make those systems completely reliable, without hallucinations or dangerous behavior, etcetera, unless those systems have the ability to predict the consequences of their actions, which means they're going to have to have explicit world models.

Speaker 1 · 40:30 - 40:36

Yeah. So I guess the bet is against reaching 100% accuracy, and also against generalization across different tasks.

Speaker 2 | 40:36 - 40:37

Right.

Speaker 1 | 40:37 - 40:53

I guess, you know, one thing that's so interesting about the way the field has developed is that, obviously, you shared the Turing Award with two others, and they seem much more convinced of the power, or the potential threats and safety risks, of LLMs over time. I'm wondering, when did your views start diverging?

Speaker 2 | 40:53 - 40:55

In 2023.

Speaker 1 | 40:55 - 40:57

And what drove that, in your mind?

Speaker 2 | 40:57 - 41:09

I didn't change my mind. They changed their mind. Okay? And at just about the same time. It was basically GPT-4. I mean, Jeff basically was not connected to any of that.

Speaker 2 | 41:09 - 41:34

He was never really interested in LLMs, and he discovered GPT-4 in 2023, when it came out. And basically, he had an epiphany and said, oh my god, those systems are really close to human-level intelligence, and possibly they have subjective experience. And he did a quick calculation, saying, okay, the human cortex has about 16 billion neurons.

Speaker 2 | 41:34 - 42:11

If you want to do something like backprop... okay, the brain doesn't do backprop directly. But if it does something like backprop, some sort of gradient estimation for some objective function, you probably need a network of a few neurons to reproduce the functionality of a virtual neuron in a neural net. As I said, let's assume you need a circuit of 10 actual neurons to reproduce what a backprop neuron does. Then all of a sudden, your cortex is only 1.6 billion neurons.
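
The back-of-envelope arithmetic attributed to Hinton above can be written out explicitly. This is a minimal sketch that uses only the two figures quoted in the conversation: about 16 billion cortical neurons, and a hypothetical circuit of 10 biological neurons per artificial unit. The 10:1 ratio is the stated assumption, not a measured value, and the function name is illustrative.

```python
# Back-of-envelope estimate quoted in the conversation (illustrative only).
CORTICAL_NEURONS = 16_000_000_000  # "about 16 billion neurons" in the human cortex
NEURONS_PER_UNIT = 10              # assumption: 10 biological neurons emulate one backprop unit


def effective_units(neurons: int, neurons_per_unit: int) -> int:
    """Artificial-neuron equivalents under the assumed neurons-per-unit ratio."""
    return neurons // neurons_per_unit


units = effective_units(CORTICAL_NEURONS, NEURONS_PER_UNIT)
print(f"{units:,} effective units")  # 1,600,000,000 effective units
```

The conclusion depends entirely on the assumed ratio. Doubling `NEURONS_PER_UNIT` halves the estimate, which is one reason LeCun treats the comparison with GPT-4 as weak evidence.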

Speaker 2 | 42:11 - 42:19

Oh my god, GPT-4 is really close to this. Okay? So maybe it's as smart... you know, it's gonna get as smart as humans. I do not believe in this claim at all.

Speaker 2 | 42:19 - 42:43

This is kind of Jeff's way of saying, okay, basically, I can retire. I can declare victory. You know, I searched for the learning algorithm of the cortex my whole career. Maybe I didn't discover what it really was, but backprop seems to be a good substitute for it. It works really well.

Speaker 2 | 42:44 - 43:20

And so maybe that's what we need. So I can retire, go around the world, and give talks about the potential promises and dangers of AI. That's basically what I think his intellectual trajectory has been. He's much less vocal about the potential dangers now than he was a year or two ago. He kind of realized there's probably a way to design safe intelligent systems.

Speaker 2 | 43:20 - 43:49

So first of all, he realized that current LLMs are not that smart. Second, that there's probably a need for a few breakthroughs, conceptual breakthroughs, before we get to human-like intelligence. And third, that the blueprint of those systems will be quite different from LLMs, and that we probably have a way of making them controllable and things like that. Yeah. I've been saying this for years, but okay.

Speaker 2 | 43:49 - 44:24

He sort of discovered this recently. Yeah. There's a similar thing with Yoshua. I think what they are both worried about is the ability of society and the political system to make sure that the benefits of AI are maximized, and that AI will not just make rich people even richer, accentuate inequalities, and cause major catastrophes because of bad usage. Okay?

Speaker 2 | 44:24 - 44:31

This is not the doomers' scenario of AI taking over the world. It's more about bad uses. Right.

Speaker 1 | 44:31 - 44:33

Which seems possible with the LLMs of today.

Speaker 2 | 44:33 - 44:55

Which is a danger, but I don't think it's as apocalyptic as what some people have claimed. Certainly not as apocalyptic as what even Anthropic has claimed, and has used to lobby governments, scaring governments into regulating AI because of that. I don't subscribe to this at all.

Speaker 1 | 44:55 - 44:57

They seem to genuinely believe it.

Speaker 2 | 44:57 - 45:13

I think they genuinely believe it, but I also think there are some good commercial reasons for them to believe that, and to kind of brainwash some people and governments into thinking their systems are dangerous.

Speaker 1 | 45:13 - 45:38

And it sounds like, with these other architectures... obviously, as bearish as you are on LLMs being the end state of everything, you also have some pretty ambitious timelines for these new architectures. And so, yeah, it doesn't seem like you think we're particularly far away from some very compelling capabilities. How do you think about the safety around that, if these breakthroughs end up coming from newer architectures, and should that make us rest easier or not?

Speaker 2 | 45:38 - 45:53

I'm gonna say something that, again, might be controversial, and certainly some of my colleagues at Meta didn't like me saying this. But I think LLMs are intrinsically unsafe. I don't think they can be made reliable and safe. Okay?

Speaker 2 | 45:53 - 46:05

They cannot be made reliable, because you can't stop them from hallucinating. And if they are agentic, you cannot guarantee they're not going to take an action whose outcome they didn't predict, and that, I mean...

Speaker 1 | 46:05 - 46:09

Has it surprised you that they can do these, like, fifteen-hour coding tasks, given the concerns around reliability?

Speaker 2 | 46:09 - 46:29

Well, coding is something where you can actually verify that the code you generate satisfies your specification. But not everything is coding. And there are examples of coding agents wiping out your... yeah, definitely... your hard drive.
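
The point about coding being verifiable can be illustrated with a minimal sketch. Here the "specification" is a hypothetical list of input/output examples, and each candidate function stands in for model-generated code. The names `satisfies_spec`, `good_square`, and `bad_square` are illustrative assumptions. Passing a finite set of examples shows that the code is consistent with those examples. It does not prove full correctness.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical spec format: (input, expected_output) pairs a candidate must satisfy.
Spec = Iterable[Tuple[int, int]]


def satisfies_spec(candidate: Callable[[int], int], spec: Spec) -> bool:
    """Return True only if the candidate matches every example in the spec."""
    for arg, expected in spec:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            # A crash also counts as a specification violation.
            return False
    return True


def good_square(x: int) -> int:
    """Hypothetical generated code: a correct squaring function."""
    return x * x


def bad_square(x: int) -> int:
    """Hypothetical generated code: agrees with squaring only at x = 0 and x = 2."""
    return x + x


square_spec = [(0, 0), (2, 4), (3, 9), (-4, 16)]
print(satisfies_spec(good_square, square_spec))  # True
print(satisfies_spec(bad_square, square_spec))   # False
```

If the spec contained only `(0, 0)` and `(2, 4)`, `bad_square` would pass. The check is only as strong as the specification, and outside coding an equivalent check is usually unavailable.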

Speaker 2 | 46:29 - 47:20

Right? Or doing stupid things that make you lose a lot of money or data or whatever. So I think LLMs in their current form are intrinsically unsafe, because they cannot predict the consequences of their actions, and because the task they accomplish is determined by their training. You give them a prompt, and they will accomplish a task that corresponds to that prompt only to the extent that their training has conditioned them to actually do the right task for this prompt. But there's no hardwired constraint that will force them to accomplish this task and to predict that the task will be accomplished properly.

Speaker 1 | 47:20 - 47:25

Yeah. I mean, famously, in the early days, right, you'd ask them a question and they'd keep asking the question. Right?

Speaker 2 | 47:25 - 47:30

Right, for example. Well, I mean, they also don't have common sense. Right? Right.

Speaker 2 | 47:30 - 47:48

So, I mean, there's the joke that was circulating about a month ago: I need to wash my car, and the car wash is 100 yards from my house. Should I walk? I tried it again maybe two weeks ago. They all say yes, you should walk, except Gemini.

Speaker 2 | 47:49 - 47:49

Gemini. So...

Speaker 1 | 47:50 - 47:53

they're training on your video of you having given that speech before?

Speaker 2 | 47:53 - 47:56

It was not my video, because I didn't come up with this, or whoever...

Speaker 1 | 47:56 - 47:57

came up with it.

Speaker 2 | 47:57 - 48:13

Yeah, right. Whoever came up with it. But there are instances, right, where I said an LLM can't do this, and then six months later it was capable of doing it. And it's simply because, as soon as people watch the podcast of me saying an LLM can't do this, they of course type it into ChatGPT.

Speaker 2 | 48:13 - 48:25

So now it becomes part of the training set. Right. And now, of course, the next version has that thing in the fine-tuning set, and of course it can answer the question. But it's not because it suddenly became smart.

Speaker 2 | 48:25 - 48:47

It's just because it was explicitly trained on that question. So LLMs are intrinsically unsafe. I don't think there is any way to fix that in the current paradigm. What I've been proposing, the architecture I've been talking about, is objective-driven AI. Basically, you give an objective to an AI system, which is: accomplish this task.

Speaker 2 | 48:47 - 49:30

Now, how does the system know it will accomplish this task? It has a world model, and it predicts the outcome of a sequence of actions it imagines taking. A cost function describes to what extent the task has been accomplished or not. If the system works by optimization, finding a sequence of actions that accomplishes the task by minimizing this cost according to its model, then it can do nothing else. Yeah. Okay?
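
The planning loop described here can be sketched in a few lines. This toy assumes a one-dimensional world. The action set, `world_model`, and `task_cost` are hypothetical stand-ins, and exhaustive search over short action sequences replaces whatever optimizer a real objective-driven system would use. It illustrates "imagine, predict, score, pick the cheapest plan". It is not AMI's or LeCun's implementation.

```python
import itertools
from typing import List, Tuple

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def world_model(state: int, action: int) -> int:
    """Toy (assumed) world model: predicts the next position from the current one."""
    return state + action


def rollout(state: int, plan: Tuple[int, ...]) -> List[int]:
    """Imagine the trajectory a plan would produce, without acting in the world."""
    states = [state]
    for action in plan:
        state = world_model(state, action)
        states.append(state)
    return states


def task_cost(states: List[int], goal: int) -> int:
    """Cost: distance between the predicted final state and the goal (0 = done)."""
    return abs(states[-1] - goal)


def plan(start: int, goal: int, horizon: int) -> Tuple[int, ...]:
    """Choose the action sequence whose predicted outcome minimizes the task cost."""
    candidates = itertools.product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda p: task_cost(rollout(start, p), goal))


best = plan(start=0, goal=3, horizon=4)
print(best, rollout(0, best))  # (0, 1, 1, 1) [0, 0, 1, 2, 3]
```

Every action comes from minimizing a cost under the model's predictions. The weak points LeCun lists next are visible in the code: a wrong `world_model` or a badly chosen `task_cost` produces a confidently wrong plan.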

Speaker 2 | 49:30 - 49:50

And of course, there are many things that can go wrong there. In particular, the cost function might be inaccurate: you might think it is actually measuring to what extent the task has been accomplished, but perhaps it's not. Okay? The world model might be inaccurate.

Speaker 2 | 49:50 - 50:22

So the prediction the system makes is actually not the right one: its prediction of what was gonna happen as a consequence of its action wasn't right. Okay? So the system can still make mistakes, but it can predict the consequences of its actions to some extent, which I think is indispensable for any agentic system. Now, what you can add to that system is not just a cost function that guarantees the task has been accomplished. You can also add a bunch of other objective functions, other cost functions, or even constraints that are safety constraints.

Speaker 2 | 50:23 - 50:40

Let's say: okay, don't hurt anybody on the way. Right? You cannot specify this at an abstract level, but you can have low-level objective functions that, put together, will guarantee that the system will not be dangerous. And the system cannot violate those things by construction.
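
One way to read "cannot violate those things by construction" is to treat safety as a hard constraint on the search space. Plans whose predicted trajectory breaks the constraint are discarded before optimization, so the optimizer can never return them. The minimal sketch below uses a hypothetical one-dimensional world with a forbidden cell. `FORBIDDEN`, `is_safe`, and `safe_plan` are illustrative assumptions, and real systems would more likely use constrained or penalized optimization than brute-force filtering.

```python
import itertools
from typing import List, Optional, Tuple

ACTIONS = (-1, 0, 1)  # hypothetical actions: step left, stay, step right
FORBIDDEN = {2}       # hypothetical safety constraint: never occupy cell 2


def rollout(state: int, plan: Tuple[int, ...]) -> List[int]:
    """Predicted trajectory under a toy (assumed) world model: next = state + action."""
    states = [state]
    for action in plan:
        state += action
        states.append(state)
    return states


def is_safe(states: List[int]) -> bool:
    """Hard constraint: no predicted state may fall in the forbidden set."""
    return not any(s in FORBIDDEN for s in states)


def safe_plan(start: int, goal: int, horizon: int) -> Optional[Tuple[int, ...]]:
    """Minimize distance-to-goal over constraint-satisfying plans only; None if none exist."""
    feasible = [p for p in itertools.product(ACTIONS, repeat=horizon)
                if is_safe(rollout(start, p))]
    if not feasible:
        return None
    return min(feasible, key=lambda p: abs(rollout(start, p)[-1] - goal))


p = safe_plan(start=0, goal=3, horizon=4)
# Reaching cell 3 requires passing through cell 2, so the planner settles at
# cell 1 (task cost 2) instead of violating the constraint.
print(p, rollout(0, p)[-1])
```

The constraint here is only as good as the world model's predictions. A trajectory that is safe in the model but not in reality would still slip through, which matches the caveat that the system "can still make mistakes".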

Speaker 2 | 50:40 - 50:52

It will have to satisfy those conditions. That's not the case for an LLM. The LLM can always escape. There's a gap between your training error and your test error. There's always gonna be a prompt where the system is gonna do really stupid things.
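
The gap between training error and test error can be shown with a deliberately extreme toy. A "model" that only memorizes exact training prompts has zero training error but fails on paraphrases it never saw. Adding one specific question to the training set fixes that question without closing the gap. The data, the fallback answer, and the function names below are hypothetical. Real LLMs generalize far better than a lookup table, so this illustrates only the concept, not how LLMs work internally.

```python
from typing import Dict

# Hypothetical training and test sets: same questions, different wording.
TRAIN: Dict[str, str] = {
    "should i walk to the car wash 100 yards away to wash my car?": "no, drive the car there",
    "what is 2 + 2?": "4",
}
TEST: Dict[str, str] = {
    "the car wash is 100 yards from home; should i walk there to wash my car?": "no, drive the car there",
    "what is two plus two?": "4",
}


def memorizing_model(prompt: str) -> str:
    """Answer correctly only for prompts seen verbatim during 'training'."""
    return TRAIN.get(prompt.lower(), "yes, you should walk")  # confident wrong fallback


def error_rate(data: Dict[str, str]) -> float:
    """Fraction of prompts the model answers incorrectly."""
    wrong = sum(memorizing_model(q) != a for q, a in data.items())
    return wrong / len(data)


print("train error:", error_rate(TRAIN))  # train error: 0.0
print("test error:", error_rate(TEST))    # test error: 1.0
```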

Speaker 1 | 50:52 - 51:09

To talk through one specific space around LLMs: I think you're obviously really excited about AMI and health care, and people have been using LLMs in health care for all sorts of things. And so, sure, I'm curious how you think about the set of things where LLMs are just not gonna work in health care and you need a model that understands the world better.

Speaker 2 | 51:09 - 51:39

So, I mean, designing a course of treatment for a chronic disease, for example, or even a non-chronic disease, for a particular patient who may not completely fit the templates you've observed before. If you have a good mental model of the dynamics of the patient's physiology, then you might design a course of treatment that will actually bring the patient to a good state. Yeah. And when I say a patient, it can be a cell. Okay.

Speaker 2 | 51:39 - 52:05

How do you tell a stem cell to turn into a pancreatic beta cell that produces insulin? Okay. You have a patient with type 1 diabetes, and the immune system basically eats up their own beta cells, right? That's autoimmune. How do you keep making beta cells? Can you send a message?

Speaker 2 | 52:06 - 52:16

Do you have a model of a human cell that will allow you to figure out what sequence of messages you need to send to a stem cell so that it turns into a beta cell?

Speaker 1 | 52:16 - 52:40

The less LLM-pilled camp and the LLM-pilled camp talk past each other, but I think it's actually very possible that both are right. There's what LLMs can do, which is maybe taking the treatment you get from a top doctor, or at the top place, and scaling that around the world. That's an unbelievable potential impact, right, if you're able to do it. And then there's what you're talking about, which is certainly still to come for a lot of these things: okay, even better than the top doctor.

Speaker 1 | 52:40 - 52:42

Like, how do you go do that?

Speaker 2 | 52:42 - 53:18

But it's more than just a top doctor. Right? Because what an LLM can do well is sort of regurgitate knowledge that you can mostly read in books. If medicine were only about accumulating declarative knowledge that exists in books, you could be a doctor just by reading books, and you can't be a doctor by reading books. You have to do residency, and actually listen to the heart, press on the belly, and things like that, to diagnose appendicitis or whatever it is.

Speaker 1 | 53:18 - 53:25

Yeah. Right? It's interesting. We're very curious to see whether LLMs themselves can provide top-quality health care... right... globally.

Speaker 1 | 53:25 - 53:41

We'll have to check back in on that one. It seems like it's pretty close. I definitely also want to hit on your time at Meta, because you spent over a decade building one of the most respected research labs in the world, and obviously you recently left. As you reflect back on your time there, what do you think you got most right and most wrong while running it?

Speaker 2 | 53:41 - 54:01

So the thing we got right is building a top research lab that really innovated, and produced a lot of the basic methods, science, and tools, like PyTorch, that are useful to the entire industry. Right? I mean, the entire industry is built...

Speaker 1 | 54:01 - 54:02

100%.

Speaker 2 | 54:02 - 54:24

...basically, except for a few people at Google. And I think a culture of openness and scientific process, which I think is necessary for breakthrough innovation. Yeah. Because there's a whole chain of innovation. Right?

Speaker 2 | 54:24 - 54:42

You have blue-sky research, new concepts. A lot of that takes place in universities. Some of it takes place in advanced research labs in industry, which can be counted on the fingers of one hand. Google is a good one. FAIR was a good one.

Speaker 2 | 54:42 - 55:00

Hopefully it still will be; I'm not sure. And a few others. Then you have the stage of: okay, this is a good idea, let's push it forward and see if it can be made useful, but still at the research level, in the sense that we're not gonna fool ourselves.

Speaker 2 | 55:00 - 55:42

We're not gonna just find a solution that works for this one problem. We're going to see if this technique, which we imagined or picked up from other people in the community, can actually be pushed and made practical. Not as a product, but in the sense that we can show it beats some record on some task or benchmark. The next stage is for the company that hosts the research lab to say, okay, now we're gonna push the button, devote a big engineering effort to that vision, and push it forward. That is where a lot of projects fail.

Speaker 2 | 55:42 - 55:57

That's where a lot of companies fail to pick up. Meta was actually pretty good at this. Okay? But far from perfect. It was not a textbook example of how to do it wrong, like Xerox PARC totally missing out on... yeah.

Speaker 2 | 55:57 - 56:29

...the GUI, the mouse, and windowing systems. Right? Meta essentially missed a few steps. And that's partly just organizational. You need an organization that is pretty close to research, but not completely, to take the relay and push a technology a little further: not making products with a three-month deadline, but pushing things forward.

Speaker 2 | 56:30 - 56:32

And we had that at one point

Speaker 1 | 56:33 - 56:33

Yeah.

Speaker 2 | 56:33 - 57:16

at Facebook and Meta, and then we lost it. FAIR was basically isolated within the company, and had lots of ideas that nobody picked up on. Then in 2023, the GenAI organization was created, initially by taking about 60 or 70 scientists and engineers from FAIR, and then it built up. But it was under so much short-term pressure that GenAI basically didn't have time to talk to FAIR. And so instead of being at the forefront and innovating in LLMs, GenAI had to focus on short-term things and became very conservative.

Speaker 2 | 57:17 - 57:22

And so there was a gap, basically an impedance mismatch, between research, yeah, and the...

Speaker 1 | 57:22 - 57:24

Is that kind of what happened with Llama 4?

Speaker 2 | 57:24 - 57:40

Yeah. Well, even starting with Llama 3. So Llama 1 was a small project within FAIR. In 2022, early 2023, GenAI was created, and the Llama people were basically moved to GenAI.

Speaker 2 | 57:41 - 57:59

They started working on Llama 2, and then a bunch of them realized, like, I could do a startup. So that was the genesis of Mistral. Yeah. Okay? Two of the authors of Llama 1 basically created Mistral with another guy from Google.

Speaker 2 | 57:59 - 58:34

And a few people left and did other things. It was not a happy time at Meta, for various reasons, and a bunch of people left. And then the organization that took over Llama 2 to some extent, and then Llama 3 and 4, was under so much short-term pressure that it became very conservative. It's a combination of the groups, but also pressure from the leadership.

Speaker 2 | 58:35 - 58:43

And, I mean, there are many ways things can go wrong, and you can't blame anyone in particular. But yeah, that's kind of what happened.

Speaker 1 | 58:43 - 59:15

I mean, a lot of these organizations are obviously under short-term pressure right now, because there's just an incredible race going on. And so I'm curious: there was the FAIR setup you had, a similar one at Google for many years, and certainly many researchers at OpenAI and Anthropic trying many different things. Do you think that is still possible going forward? Or is one of the only paths to leave and start your own company? Or are there still places within the industry that you think have the original ethos of FAIR, even amid the race dynamics that are happening?

Speaker 2 | 59:15 - 59:32

I think there are a few places within Google Research and DeepMind where people actually do research. But increasingly, the industry has become more closed. Right? I mean, Google certainly clammed up. And Meta, and even FAIR, is going a bit in the same direction.

Speaker 2 | 59:32 - 59:57

There are more restrictions on publication now. So it's less appealing for people who really want to do breakthrough research, and they don't get as many resources. If they do something that is relevant in the medium term, they are told not to talk about it. So I think it's not a good atmosphere for breakthroughs. It's not conducive.

Speaker 2 | 59:57 - 1:00:24 You know? I mean, basically, the best way to get breakthrough research of the type we were getting in the early days of FAIR, at Bell Labs in the good days, and at Xerox PARC, is that you hire the best people, people who have a good nose for what to work on and which projects to attack. You give them the means to succeed, and you get the fuck out of the way. Alright? Pardon my French.

Speaker 1 | 1:00:27 - 1:00:37 Yeah. I'm curious what impact that ends up having on the broader research community. Obviously, one of the legacies of FAIR is that you trained so many researchers. Right? And they're all throughout the ecosystem.

Speaker 1 | 1:00:37 - 1:00:57 And it feels like now the equivalents of the people who joined FAIR early in their careers are joining labs with shorter-term priorities and focus. So I'm wondering: in the current ecosystem, where a lot of younger people entering the field are thrust much more into these short-term dynamics, does that change anything about the way the ecosystem evolves?

Speaker 2 | 1:00:57 - 1:01:06 Well, I mean, the people who tend to want to work with me are generally people who are sufficiently crazy to do it, first

Speaker 1 | 1:01:06 - 1:01:08 of all.

Speaker 2 | 1:01:08 - 1:01:25 And/or people who subscribe to the whole idea that in academia, during your PhD, you should work on the next generation of AI systems, not the current generation. Yeah. If you work on LLMs in academia now, it's incredibly boring. At least to me, it's boring.

Speaker 2 | 1:01:25 - 1:01:41 It's basically studying how and why LLMs work, explaining why they work or what their limitations are. It's descriptive science. It's not really very creative. I don't find that particularly interesting. It's useful.

那基本上是在研究 LLM 是如何以及为什么起作用的，解释它们为什么有效，或者它们的局限是什么。这有点像描述性科学。它其实并不是那种——你知道——特别有创造性的工作。像这种事，我个人并不觉得特别有意思。当然，它是有用的。

Speaker 2 | 1:01:41 - 1:01:55 Yeah. And, you know, if you really want to show how to do new things with LLMs, you're not going to have the GPUs you need for that. Totally. So forget that. Don't work on it if you're doing a PhD.

对。而且，你知道，如果你真的想展示怎么用 LLM 做一些新的事情，你也不会有为此所需的 GPU。完全同意。所以，干脆别想这个了。就是说，如果你是在读 PhD，就别做这个。

Speaker 2 | 1:01:55 - 1:01:57 Like, there's no point. You cannot contribute.

就是说，这没意义。你没法做出贡献。

Speaker 1 | 1:01:57 - 1:02:04 How did you know it was time to leave Meta? It sounds like you were thinking through some of these things over a period of time. Was there a moment when it crystallized? Or

你是怎么知道该离开 Meta 了？听起来像是，嗯，你在一段时间里一直在思考其中一些事情。有没有某个瞬间让这件事一下子明朗了？还是说——

Speaker 2 | 1:02:04 - 1:02:20 Well, it was a combination of things. Right? So first of all, you have to understand: a lot of people have a completely wrong idea of what my role at Facebook and Meta was. I joined in late 2013 and really started in early 2014.

嗯，这是多种因素共同作用的结果。对吧？首先，你得明白，很多人对我在 Facebook 和 Meta 的角色有一种完全错误的理解。所以我是在 2013 年末加入的，真正开始算是 2014 年初。

Speaker 2 | 1:02:20 - 1:02:41 For the first four and a half years, I was director of FAIR. I built the FAIR organization, set up the culture, hired the key people, and managed it. After four and a half years, I stepped down from that role for a number of reasons, and I became Chief AI Scientist. Okay?

前四年半里，我是 FAIR 的 director。所以我建立了 FAIR 这个组织，设定了它的文化，招募了关键人物，并且大致负责管理它。四年半之后，出于一些原因，我从那个职位上退了下来。然后我成了 chief AI scientist。明白吧？

Speaker 2 | 1:02:41 - 1:03:16 The reason is, first of all, I was getting close to 60; I was 58. And I just don't want to do management. Okay? I was willing to do it for a while to get the organization started, but I'm just not good at it. It's not my thing. I'm more of a scientific or technical visionary, an engineer and a scientist.

所以，原因是，你知道，首先我那时基本上快 60 岁了，58 岁。而且我就是不想做管理了。明白吧？我的意思是，为了把这个组织启动起来，我愿意做一段时间，但我就是不擅长这个。这不是我真正适合做的事——我更像是那种，怎么说，科学或技术上的 visionary（愿景型人物），也是 engineer 和 scientist。

Speaker 2 | 1:03:16 - 1:03:52 Other people are much better at management than I am, so I stepped down. Two other people, Joelle Pineau and Antoine Bordes, took over the directorship of FAIR, and I became Chief AI Scientist. I was reporting to the CTO, and my role was basically to restart a research project that I thought was necessary, because the ambition of FAIR was always to build intelligent systems.

所以其他人在管理方面比我强得多。于是我基本上就退下来了。另外两个人，Joelle Pineau 和 Antoine Bordes，接手了 FAIR 的 director 职位，而我成了 chief AI scientist。所以我向 CTO 汇报，我的职责基本上是重启一个我认为有必要推进的研究项目，因为 FAIR 的雄心一直都是构建智能系统。

Speaker 2 | 1:03:52 - 1:04:27 Right? I had put my own research on hold while I was running FAIR; I just didn't have the time. And I thought it was important to design the architecture of human-level, human-like AI systems. I had come up with the concept that this would be based on self-supervised learning and on prediction from sensory signals like video, things like that.

对吧？而且我觉得，你知道，在我管理 FAIR 期间，我基本上把自己的研究先放在一边了。我就是没有时间。而我认为，去设计那种人类水平、类人的 AI 系统架构，基本上是很重要的。并且，你知道，我已经形成了一个概念：这将建立在 self-supervised learning（自监督学习）之上，以及建立在对感官信号——比如 video——进行预测之上，诸如此类。

Speaker 2 | 1:04:27 - 1:04:50 I mean, these are old ideas, and so are world models. I actually gave a keynote at NeurIPS in 2016 where I said this is the way AI research should go: world models that predict the consequences of your actions, and planning. And I said RL is not the thing that will take us there, because it's too inefficient, and supervised learning has shown its limits.

我的意思是，这些其实都是老想法——还有 world models（世界模型）。我实际上在 2016 年的 NeurIPS 做过一次 keynote，当时我说，AI 研究应该朝这个方向走。比如说，world models 会预测你行为的后果，并据此进行规划。而我当时也说过，RL（强化学习）不是能把我们带到那一步的东西，因为它效率太低了。supervised learning（监督学习）也已经显现出它的局限。

Speaker 2 | 1:04:50 - 1:05:14 So the future is self-supervised learning and world models. How do we do self-supervised learning and world models? I started a few projects on this, with a few avenues that didn't pan out: some projects on video prediction and things like that. Then I came up with the concept that you could do self-supervised learning from video, but you have to train the system to make predictions in representation space. That's the idea of JEPA.

所以，未来属于 self-supervised learning（自监督学习）和 world model（世界模型）。那么我们该如何做 self-supervised learning 和 world model 呢？我当时围绕这个方向启动了几个项目，尝试了几条路，但都没有成功，也做过一些 video prediction（视频预测）之类的项目。后来我提出了这样一个概念：你可以从视频中做 self-supervised learning，但必须让系统在 representation space（表征空间）中进行预测。这就是 JEPA 的核心想法。

Speaker 2 | 1:05:14 - 1:05:28 Yeah. And if you have a JEPA, you can turn it into a world model by making it action-conditioned, and then you can use it for planning. I had this idea around 2020, and in 2022 I wrote a long vision paper. I said, I'm just going to write a paper with my entire vision.

对，而且如果你有了 JEPA，就可以通过让它变成 action-conditioned（动作条件化）的系统，把它转化成一个 world model，然后你就能用它来做 planning（规划）。我大概在 2020 年左右形成了这个想法。到 2022 年，我写了一篇很长的 vision paper。所以我当时想，我干脆就把自己完整的设想都写成一篇论文。
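The action-conditioned world model described above can be illustrated with a toy planner that rolls candidate action sequences forward entirely in representation space and keeps the one whose predicted end state is closest to the goal. This is a sketch under loud assumptions: `encode`, `predict`, the 2-D grid state, and the exhaustive search are hypothetical stand-ins, not the JEPA architecture or AMI's planner. A learned system would replace both functions with neural networks and use a sampling- or gradient-based optimizer instead of enumerating every sequence.

```python
from itertools import product

# Hypothetical discrete action set: moves on a 2-D grid.
ACTIONS = {"right": (1.0, 0.0), "left": (-1.0, 0.0), "up": (0.0, 1.0),
           "down": (0.0, -1.0), "stay": (0.0, 0.0)}

def encode(observation):
    """Hypothetical encoder: observation -> representation.
    Here the observation is already an (x, y) position, so it is just copied."""
    return (float(observation[0]), float(observation[1]))

def predict(state, action_name):
    """Hypothetical action-conditioned predictor: next representation
    given the current representation and an action."""
    dx, dy = ACTIONS[action_name]
    return (state[0] + dx, state[1] + dy)

def cost(state, goal):
    """Squared distance to the goal, measured in representation space."""
    return (state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2

def plan(observation, goal_observation, horizon):
    """Roll every action sequence of length `horizon` through the predictor and
    return the first one whose final predicted state is closest to the goal."""
    start, goal = encode(observation), encode(goal_observation)
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        state = start
        for a in seq:
            state = predict(state, a)  # imagined rollout; no real environment step
        c = cost(state, goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq, best_cost

print(plan((0, 0), (2, 1), horizon=3))
```

Here `plan((0, 0), (2, 1), horizon=3)` returns `(('right', 'right', 'up'), 0.0)`. The design point is that the cost is computed on predicted representations, so the planner never has to generate raw observations.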

Speaker 2 | 1:05:28 - 1:05:36 Okay? Spill all my secrets. I don't care. But maybe it will rally a bunch of people to that vision. And, boy, did it work.

好吗？把我所有的秘密都抖出来。就像，我无所谓。但也许这样能把一大批人凝聚到这个愿景周围。结果，天啊，真的奏效了。

Speaker 2 | 1:05:38 - 1:06:05 Because not only did I rally a bunch of students who came to work with me at NYU or in Paris because they wanted to work on this, but also a whole team at FAIR who said, this sounds great, that's what we want to work on. And then Joelle Pineau said, well, maybe this should be a major mission of FAIR. We called it Advanced Machine Intelligence. That was the internal name of the project.

因为我不仅吸引来了一批学生，他们来到 NYU 或 Paris 跟我一起工作，就是因为他们想做这个；还有 FAIR 的整个团队也说，这听起来太棒了，这正是我们想做的事。然后 Joelle Pineau 说，也许这应该成为 FAIR 的一个主要使命。我们把它叫作 advanced machine intelligence，那是这个项目内部的名字。

Speaker 1 | 1:06:05 - 1:06:07 Interesting. Okay. And they let you leave with it.

有意思。好吧。所以他们还让你带着这个方向离开了。

Speaker 2 | 1:06:07 - 1:06:35 And now it's the name of the company. Mark Zuckerberg read that paper, knew what it was about, and subscribed to the project. So did Andrew Bosworth, the CTO, and Mike Schroepfer, the previous CTO. Chris Cox, my direct manager and the chief product officer, also loved the idea. So there was a lot of support in the leadership for this project, which we internally called AMI.

现在这成了公司的名字。而且，Mark Zuckerberg 读过那篇论文，知道它是关于什么的，也支持这个项目。CTO Andrew Bosworth 也是，前任 CTO Mike Schroepfer 也是。Chris Cox（我的直接上级、chief product officer）也很喜欢这个想法。所以你知道，领导层里有很多人都支持这个项目，我们内部把它叫作 AMI。

Speaker 2 | 1:06:35 - 1:07:09 And it started really working for video. But then the company refocused all of its effort on LLMs. Despite support from Mark and from Andrew, Boz (we call him Boz), all the layers below didn't see the point, I think. So politically, it became a little difficult.

而且，这件事一开始在视频方向上确实开始奏效了。但后来，公司把所有精力都重新集中到 LLM 上。尽管 Mark 和 Andrew，也就是 Boz（我们都这么叫他），是支持的，但再往下的各层管理者，我觉得，并没有看出这里面的意义。所以在政治层面上，这件事就开始变得有点困难了。

Speaker 2 | 1:07:10 - 1:07:47 As I said, there are applications of JEPA world models in wearable agents and things like that, and in robotics. But Meta chose to get rid of its entire robotics AI group, which was led by Jitendra Malik, who is now at Amazon. So clearly, it wasn't the right environment anymore. Most of the applications were in industries that Meta had no interest in. And FAIR was increasingly under pressure to basically help MSL with LLMs.

正如我说过的，JEPA world model 的应用场景是存在的，比如 wearable agents（可穿戴 agent）之类的东西，还有 robotics（机器人）。但是 Meta 选择裁掉整个 robotics AI 团队，那个团队原本由 Jitendra Malik 领导，他现在去了 Amazon。所以很明显，那已经不再是合适的环境了。大多数应用都在 Meta 并不感兴趣的工业领域。而且 FAIR 也越来越多地受到压力，基本上要去协助 MSL 做 LLM。

Speaker 2 | 1:07:48 - 1:08:14 So, yeah, that made it clear. And that, you know, throat ramming, worked really well with investors too. Yeah. Because when I had to raise money for AMI, everybody knew my story. Many investors, staff at various VCs, had read my paper or listened to my talks and had bought my story.

所以，是的，你知道，这一点变得很明确。而且，你知道，那种强力推进的方式，对投资人也非常有效。对。因为当我必须为 AMI 融资时，所有人都知道我的故事。而且你知道，很多投资人、各家 VC 的员工，都读过我的论文，或者听过我的演讲，也都买账我的叙事。

Speaker 2 | 1:08:14 - 1:08:24 They were realizing, you know, LLMs had limitations and, you know, were kind of, yeah, interested by the idea of, like, building the next generation AI systems.

他们开始意识到，你知道，LLM（大语言模型）是有局限性的，而且，算是吧，也对构建下一代 AI 系统这个想法很感兴趣。

Speaker 1 | 1:08:24 - 1:08:29 I guess, was the Scale acquisition part of the catalyst for the pure LLM focus internally?

我猜，Scale 的收购是不是也算是推动内部转向纯 LLM 聚焦的一个催化因素？

Speaker 2 | 1:08:29 - 1:08:45 Yeah. Definitely. I mean, there are probably some other reasons too. I think, and I don't have any inside information to comment on this, but it's possible that Mark sees in Alex a potential successor to himself, like a younger version of himself.

对，绝对是。我的意思是，我觉得这里面可能还有一些别的原因。你知道，也许——我没有任何内部信息可以就此发表评论——但也有可能 Mark 在 Alex 身上看到了某种潜在接班人的影子，像是一个更年轻版本的自己。

Speaker 1 | 1:08:45 - 1:08:56 Yeah. I feel like a lot of the popular narrative, or what's in the media, has been: oh, when Alex comes in, it gets harder to run a research organization. You know? I don't know to what extent you felt that. Or

对。我感觉很多流行叙事，或者说媒体上的说法，都是那种：哦，你知道，当 Alex 进来之后，运营一个研究型组织就变得更难了。你知道？我不知道你是不是也在某种程度上有这种感受。或者——

Speaker 2 | 1:08:56 - 1:09:21 Well, okay. So here's a big misconception about my role, my relation to Alex, and how AI was run at Meta. I had zero technical contribution to Llama, none whatsoever. My one contribution to Llama was to argue for open-sourcing Llama 2, because there was a big internal debate about whether we should open source it. The legal department was against it.

好，那我来澄清一个关于我的角色、我和 Alex 的关系，以及 Meta 内部 AI 是如何运作的重大误解。我对 Llama 的技术贡献是零，完全没有。我对 Llama 唯一的贡献，是主张把 Llama 2 开源，因为当时内部就是否应该开源有过非常大的争论。比如，法务部门就反对。

Speaker 2 | 1:09:22 - 1:09:36 The policy department was kind of against it. The comms department was for it. The whole engineering side was for it. Boz was for it. So there were enormous internal discussions at a very high level.

政策部门也有点反对。传播部门支持。整个工程团队都支持。Boz 也是支持的。所以，当时内部在非常高的层级上进行了极其大量的讨论。

Speaker 2 | 1:09:36 - 1:10:13 You know? Forty people, from Mark Zuckerberg down, met every week for two hours, for months. So it really was a big debate internally. And I really, really pushed and argued, and Boz was also very vocal about it, that the safety risks were basically overblown, that the opportunity to create an industry was extremely strong, and that we were going to jumpstart the AI industry by open-sourcing Llama 2.

你知道吗？从 Mark Zuckerberg 往下，大概 40 个人，每周开两小时，连续开了好几个月。所以这确实是一次内部非常重大的争论。而我当时非常、非常积极地推动并论证这一点（Boz 也公开而强烈地支持）：所谓的安全风险基本上被夸大了，创造一个产业的机会非常大，我们会通过开源 Llama 2 来启动整个 AI 产业。

Speaker 2 | 1:10:13 - 1:10:28 And in fact, that's exactly what happened. But I had zero contribution to Llama, positive or negative. I didn't do anything to stop it or slow it down. There were a lot of people working on LLMs within FAIR, and that was fine. Yeah.

事实上，事情也的确就是这样发生的。不过我对 Llama 没有任何贡献，无论正面还是负面。比如，我没有做任何事去阻止它、拖慢它，或者诸如此类。FAIR 内部有很多人在做 LLM（large language model，大语言模型），这没问题。对。

Speaker 2 | 1:10:28 - 1:10:42 I never said anything against it. Okay? Other than saying this is not a path to human-level intelligence. But it's fine; it's useful. Same thing for speech recognition or translation.

我从来没有说过反对它的话，明白吗。除了我说过，这不是通向 human-level intelligence（人类水平智能）的一条路径，但没关系，它是有用的。就像 speech recognition 或 translation 一样。

Speaker 2 | 1:10:42 - 1:11:11 Right? And particularly since 2018, when I stepped down as director of FAIR, I didn't have any direct influence on what people were working on, other than publishing my vision and rallying people around my project. But they were working with me because they wanted to, not because I was their boss. I wasn't telling them to work with me.

对吧？尤其是从 2018 年开始，在我不再担任 FAIR 的 director 之后，除了发表我自己的愿景、然后围绕我的项目去凝聚大家之外，我对人们在做什么工作并没有任何直接影响。但他们是因为自己愿意，才和我一起工作，不是因为我是他们的老板。我并没有命令他们和我一起做事。

Speaker 1 | 1:11:13 - 1:11:13 And

然后——

Speaker 2 | 1:11:16 - 1:11:57 So I had no positive or negative influence on LLMs within Meta, okay? I had some influence on strategy, but it was more about the long term, like how you maintain a research lab and things like this. And in the last year, I mean, starting maybe in early 2024 and certainly in 2025, the direction in which FAIR was moved and managed basically did not correspond to what I thought was necessary to preserve innovation, research, and breakthroughs, and to keep the good people. A lot of good people have left already.

所以，在 Meta 内部，我对 LLM 没有任何正面或负面的影响，明白吗。我对 strategy 有一些影响，但更多是长期层面的，比如你如何维持一个 research lab，以及类似这样的事情。而在过去一年里，你知道，我是说，也许从 24 年初开始，当然到了 25 年，FAIR 被推动和管理的方向，基本上并不符合我认为为了保持 innovation、research 和 breakthrough，并留住优秀人才所必需的做法。比如，已经有很多优秀的人离开了。

Speaker 1 | 1:11:57 - 1:12:05 Yeah. And I guess, you know, it was probably harder to get people to work on the stuff you were working on internally, and I'm sure there was pressure on you yourself to work on a lot of the LLM stuff.

对。我想，你知道，在内部可能更难让别人去做你当时在做的那些东西，而且我也相信，你自己应该也承受着压力，要去做很多 LLM 相关的东西。

Speaker 2 | 1:12:05 - 1:12:08 Yeah. Yeah. No. But a lot of other people also have left.

对，对，不。不过也有很多其他人离开了。

Speaker 1 | 1:12:08 - 1:12:28 Right? No, it's fascinating. The one thing I'm struck by throughout our whole conversation is that you've had a remarkably consistent point of view on the space for a long time; you can go back to a bunch of the earlier talks you referenced. Obviously, it's a fast-moving space, and a ton of interesting things have happened in the last year.

对吧？不，这很有意思。贯穿我们整个对话，有一点让我印象很深：我感觉你的观点一直都异常一致，好像你多年来在这个领域里的看法都相当稳定，而且你也可以回头去看你之前提到的很多更早期的 talks。当然，这显然是一个变化非常快的领域，而过去一年里也发生了大量有趣的事情。

Speaker 1 | 1:12:28 - 1:12:30 What's, like, one thing you've changed your mind on in the last year?

比如说，过去一年里，有哪一件事是你改变了看法的？

Speaker 2 | 1:12:30 - 1:13:06 I mean, the whole idea of what we used to call unsupervised learning and now call self-supervised learning. You know, until about 2003, there was the whole idea of unsupervised pre-training, where you get a good representation of the input data and then fine-tune the model with a little bit of supervised, labeled data, and that gave us some evidence that this whole technique could work. I tried to apply this to video because, ultimately, what I wanted to do was train a system to understand how the world works just by watching the world go by. Yeah. Right?

我的意思是，就是我们过去叫作 unsupervised learning（无监督学习）、现在叫作 self-supervised learning（自监督学习）的整个理念。你知道，大概到 2003 年之前，unsupervised pre-training（无监督预训练）的整个思路是：先为输入数据学到一个好的 representation（表征），然后再用少量带标签的 supervised（监督）数据去 fine-tune（微调）模型；这算是给了我们一些证据，说明这整套技术是可行的。我当时试着把这个方法用到 video 上，因为归根结底，我想做的是训练一个系统，让它仅仅通过观察世界的流动，就能理解世界是如何运作的。对吧？

Speaker 2 | 1:13:06 - 1:13:20 I mean, that's the basic idea. And so I started to argue for this in the sort of, you know, early 2010s. Did some work on simple video prediction. We didn't have GPUs. Okay.

我的意思是，这就是最基本的想法。所以我在 2010 年代初期就开始为这个方向发声了，也做过一些简单的 video prediction（视频预测）工作。那时候我们还没有 GPUs。好吧。

Speaker 2 | 1:13:22 - 1:13:58 And then I did this more seriously after the creation of FAIR, doing pixel-level video prediction, realizing that wasn't working, but then arguing for self-supervised learning. Okay? This whole idea of training a system generically, not to solve a task but basically just to predict, and then using the representation learned this way as input to a downstream task that you can train with supervised learning, reinforcement learning, or whatever. That was roughly the topic of the second half of my keynote at NIPS in 2016. It was still called NIPS at the time

然后，在 FAIR 创建之后，我开始更认真地做这件事，比如做 pixel level（像素级）video prediction，并意识到那样行不通；但与此同时，我开始主张 self-supervised learning。好吗？整个思路就是，以通用的方式训练一个系统，不是为了解决某个特定任务，而是让它从根本上只做 prediction（预测）；然后把它以这种方式学到的 representation，用作 downstream task（下游任务）的输入，而这些下游任务你可以用 supervised、reinforcement（强化学习）或者别的方式来训练。所以这也是我在 2016 年 NIPS keynote（主题演讲）后半部分的一个主题。当时它还叫 NIPS。

Speaker 1 | 1:13:58 - 1:13:58 Yeah, of course.

对，当然。

Speaker 2 | 1:13:58 - 1:14:23 In 2016. And then I kept pushing this idea and tried to discover some methods to get it to work. What surprised me is that it became incredibly successful, but not for video: for language. LLMs are basically a blindingly successful example of self-supervised learning.

那是在 2016 年。之后我就一直在持续推动这个想法，也试着去发现一些能让它真正奏效的方法。让我惊讶的是，它后来取得了惊人的成功，但不是在 video 上，而是在 language 上。LLMs 基本上就是 self-supervised learning 极其成功、成功得耀眼的一个例子。

Speaker 1 | 1:14:23 - 1:14:41 No, that they are. Well, I feel like that's almost the perfect note to end on, but I want to make sure to leave the last word to you. All our listeners are very familiar with you, but I want to at least give you the mic to point them to anything they should check out, some of the new stuff you're doing or any of your work you want to point to. The mic is yours.

没错，确实如此。我觉得这几乎已经是一个完美的收尾点了，不过我还是想确保把最后的话留给你。我想，我们所有听众都非常熟悉你了，但我至少还是想把麦克风交给你，让你告诉大家，有没有什么你觉得他们应该去关注的内容，比如你最近在做的一些新东西，或者我不知道，任何你想特别指出来的工作。现在由你来讲。

Speaker 2 | 1:14:41 - 1:15:13 Okay. Let me tell you one thing. An LLM works because when you have a sequence of discrete symbols, making predictions is easy, because there's only a finite number of possible symbols in your language, 100,000 possible tokens or something like that. Right? You can have your neural net produce a probability distribution over all possible tokens, sample from that distribution, shift the token into the input, and then produce the next token: you can do autoregressive prediction.

好，我来说一件事。LLM 之所以有效，是因为当你处理的是一串 discrete symbols（离散符号）时，做 prediction 很容易，因为你的语言里可能的符号数量是有限的，也就是大约 100,000 个可能的 tokens 之类的。对吧？你可以让 neural net（神经网络）输出一个针对所有可能 token 的 probability distribution（概率分布），然后从这个分布里 sample（采样），把这个 token 移入输入中，再生成下一个 token，于是你就可以做 autoregressive prediction（自回归预测）。
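The autoregressive loop described above (score every token, sample one, feed the sample back in) can be sketched in plain Python. The toy model below is a hypothetical stand-in for a neural net; only the loop structure is the point.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits, rng):
    """Draw one token index from softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(model, prompt, n_new, seed=0):
    """Autoregressive loop: score every token, sample one, append it, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(n_new):
        logits = model(tokens)                    # one score per vocabulary entry
        tokens.append(sample_next(logits, rng))   # the sample becomes part of the input
    return tokens

# Hypothetical toy "model": 4-token vocabulary that strongly favors (last token + 1) mod 4.
VOCAB_SIZE = 4

def toy_model(tokens):
    last = tokens[-1]
    return [5.0 if t == (last + 1) % VOCAB_SIZE else 0.0 for t in range(VOCAB_SIZE)]

print(generate(toy_model, [0], n_new=8))
```

Because the vocabulary is finite, the full distribution over the next symbol can be written out and sampled directly; this is the property the speaker says continuous signals such as video lack.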

Speaker 2 | 1:15:13 - 1:15:34 Okay? So that's a special case. If you have the real world, you can't use a generative model. So now you have to train a system that learns a representation and makes predictions in representation space. There's a big issue with this, which, until about five years ago, I didn't think was easily solvable, even though I had invented one technique to solve it decades before that.

好吗？所以这是一个特殊情况。如果你面对的是真实世界，就不能使用 generative model（生成式模型）。于是你就必须训练一个系统，让它学会一种 representation（表征），并在这个 representation space（表征空间）里做 prediction（预测）。这里有个大问题，大约五年前之前我一直觉得它不太容易解决，尽管我其实早在几十年前就发明过一种解决它的技术。

Speaker 2 | 1:15:35 - 1:16:10 And it's the problem that if you take two inputs, say the initial segment of a video and the continuation of that video, or one image and a corrupted version of it, run them both through an encoder, and train a predictor to predict the representation of one from the representation of the other, there is a very simple solution where the system basically predicts a constant representation, and the prediction problem becomes trivial. That's called collapse, representation collapse. So the big question of self-supervised learning for JEPA, the joint embedding architecture, is: how do you prevent collapse?

这个问题是：如果你拿两个输入，比如一段视频的起始片段和这段视频的后续内容，或者你拿一张图像和它的一个受损版本，把它们都送进一个 encoder（编码器），然后训练一个 predictor（预测器），根据其中一个的 representation 去预测另一个的 representation，那么有一种非常简单的“解法”，就是系统基本上总是预测一个常数 representation。这样一来，整个预测问题就变得微不足道了。这就叫 collapse（坍塌），也就是 representation collapse（表征坍塌）。所以，对于 JEPA，也就是 joint embedding architecture（联合嵌入架构）的 self-supervised learning（自监督学习）来说，核心问题就是：你要如何防止 collapse？
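A few lines are enough to show why collapse is tempting: when the prediction loss is computed in representation space, an encoder that ignores its input scores a perfect zero. Everything here (`joint_embedding_loss`, the toy encoders, the data) is a hypothetical illustration, not code from any JEPA paper.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_embedding_loss(encoder, predictor, x, y):
    """Predict the representation of y from the representation of x and
    compare the two in representation space (not in input space)."""
    return mse(predictor(encoder(x)), encoder(y))

def identity_predictor(s):
    return s

def identity_encoder(x):
    # Keeps all the information in the input.
    return list(x)

def collapsed_encoder(x):
    # Ignores its input: every input maps to the same vector.
    return [0.0, 0.0]

# Hypothetical input pairs, e.g. (start of a clip, its continuation) as toy feature vectors.
pairs = [([1.0, 2.0], [1.5, 2.5]), ([-3.0, 0.0], [4.0, 9.0])]

informative = [joint_embedding_loss(identity_encoder, identity_predictor, x, y) for x, y in pairs]
collapsed = [joint_embedding_loss(collapsed_encoder, identity_predictor, x, y) for x, y in pairs]
print(informative)  # nonzero: the prediction task is real
print(collapsed)    # all zeros: "perfect" prediction while encoding nothing
```

Minimizing this loss alone therefore cannot distinguish a useful encoder from a useless one; something else has to rule out the constant solution.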

Speaker 2 | 1:16:10 - 1:16:33 Yep. The solution that I came up with many years ago, in 1993, is contrastive learning. Basically, you have examples of things that should be predictable from one another and examples of things that should not be predictable from one another. It turns out this method works, but it doesn't scale well with dimension.

对。很多年前，1993 年，我想到的解决方案是 contrastive learning（对比学习）。基本上，你会有一些样本，它们彼此之间应该是可预测的；也会有一些样本，它们彼此之间不应该是可预测的。事实证明，这个方法是有效的，但它在维度升高时无法很好扩展，scaling（可扩展性）不太行。
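As a rough illustration of how contrastive training blocks collapse, here is a standard margin-based contrastive loss on a pair of embeddings. It is an assumed textbook form, not a reproduction of the 1993 objective: similar pairs are pulled together, and dissimilar pairs are pushed apart until they are at least `margin` apart.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(z1, z2, similar, margin=1.0):
    """Margin-based contrastive loss on a pair of embeddings.
    similar=True  -> pull together: penalty grows with distance.
    similar=False -> push apart: penalty only if closer than `margin`."""
    d = euclidean(z1, z2)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# A collapsed encoder maps everything to one point, so every dissimilar
# pair sits at distance 0 and pays the full margin penalty.
collapsed = [0.0, 0.0]
print(contrastive_loss(collapsed, collapsed, similar=False))  # 1.0
```

One common explanation for the poor scaling mentioned above: the dissimilar pairs have to cover the embedding space to rule out collapse, and the number needed grows rapidly with dimension.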

Speaker 2 | 1:16:34 - 1:17:17 There's another technique that was actually invented by Geoff Hinton and Sue Becker in the late nineties, late eighties, I'm sorry, where you have two networks and you try to maximize the mutual information between them. Juergen Schmidhuber is mad at me because he also came up with a version of this in 1992, and he says that's JEPA. It's not JEPA. It's just another way of preventing collapse of a joint embedding architecture, okay, which is fine, but it's not JEPA. It's a particular way of doing it, which I don't think is particularly good. So, okay, now you have this JEPA architecture.

还有另一种技术，其实是 Geoff Hinton 和 Sue Becker 在八十年代末发明的（抱歉，我先说成了九十年代末），做法是你有这两个 network（网络），然后试图最大化它们之间的 mutual information（互信息）。Juergen Schmidhuber 对我有意见，因为他在 1992 年也提出过一个版本，他说那就是 JEPA。那不是 JEPA。那只不过是防止 joint embedding architecture 发生 collapse 的另一种方式，明白吗？这当然没问题，但它不是 JEPA 本身，而只是其中一种具体实现方式，而且我并不觉得这种方式特别好。所以，好，现在你有了这个 JEPA 架构。
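The Becker and Hinton approach is usually presented with a Gaussian simplification: if two modules see different views of a shared underlying signal, the information their outputs `a` and `b` share can be estimated as 0.5 * log(Var(a + b) / Var(a - b)). The sketch below computes that quantity on hypothetical toy outputs; the `eps` term is our addition to avoid dividing by zero, and the function name is illustrative.

```python
import math

def variance(xs):
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def imax_objective(a, b, eps=1e-8):
    """Gaussian-assumption estimate of the information two modules' outputs share:
    0.5 * log(Var(a + b) / Var(a - b)). High when a and b agree on a signal
    that actually varies; eps guards against division by zero."""
    s = [x + y for x, y in zip(a, b)]
    d = [x - y for x, y in zip(a, b)]
    return 0.5 * math.log((variance(s) + eps) / (variance(d) + eps))

a = [1.0, 2.0, 3.0, 4.0, 5.0]             # hypothetical output of module A
b_agree = [1.1, 1.9, 3.2, 3.8, 5.1]       # module B tracks the same signal
b_unrelated = [3.0, 1.0, 5.0, 2.0, 4.0]   # module B tracks something else
print(imax_objective(a, b_agree), imax_objective(a, b_unrelated))
```

Constant outputs (a collapsed pair of networks) score exactly 0, so maximizing this objective pushes the two networks to agree on something that actually varies.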

Speaker 2 | 1:17:17 - 1:17:39 You have to come up with a good way of preventing collapse, and there are a couple of ways. As I already said, contrastive methods, I think, are not a good approach. There's another set of methods called distillation methods, and they do prevent collapse. We don't know why. A good example of that is DINO.

你必须想出一种好的办法来防止 collapse，而办法有几种。正如我刚才说的，contrastive methods（对比方法）我认为不是一个好路子。还有一类方法，通常被叫做 distillation methods（蒸馏方法），它们确实能防止 collapse。只是我们不知道为什么。一个很好的例子就是 DINO。

Speaker 2 | 1:17:39 - 1:17:59 Yep. That's a method using distillation. Basically, one of the encoders is used as a teacher for the other encoder. You backprop through the encoder that is being trained. You don't backprop through the other one; instead, its weights track the trained encoder's weights through an exponential moving average.

对。这是一种使用 distillation 方法的方案。基本上，其中一个 encoder 训练另一个，它像是另一个 encoder 的 teacher（教师）。对于那个被训练的 encoder，你会对它做 backprop（反向传播）；而那个不被直接训练的 encoder，你不对它做 backprop，但你会通过 exponential moving average（指数滑动平均）让它与另一个共享权重。
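The teacher-student mechanics of distillation methods like DINO and BYOL can be sketched with hypothetical one-weight "encoders": the student takes a gradient step toward the teacher's output, which is treated as a constant (stop-gradient), and the teacher changes only through an exponential moving average of the student. This shows the update rules only; it omits the predictor head, augmentations, and everything else that makes the real methods avoid collapse, which, as he says, is not well understood.

```python
def ema_update(teacher, student, tau=0.99):
    """New teacher weights: tau * old teacher + (1 - tau) * student.
    This is the teacher's only update; no gradient reaches it."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]

def train_scalar_demo(steps=100, lr=0.1, tau=0.9, x=1.0):
    """Hypothetical one-weight encoders: student output w_s * x, teacher output w_t * x."""
    w_s, w_t = 1.0, 0.0
    for _ in range(steps):
        target = w_t * x                          # teacher output, treated as a constant (stop-gradient)
        grad = 2.0 * (w_s * x - target) * x       # d/dw_s of (w_s * x - target) ** 2
        w_s -= lr * grad                          # gradient step on the student only
        w_t = ema_update([w_t], [w_s], tau)[0]    # teacher slowly follows the student
    return w_s, w_t

print(train_scalar_demo())
```

With these settings the gap between student and teacher shrinks by a constant factor each step, so the two weights end up essentially equal; the toy shows the asymmetric update pattern, not a reason why the full methods avoid collapse.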

Speaker 2 | 1:18:00 - 1:18:14 It's a collection of recipes. There's a paper from DeepMind about it called BYOL, Bootstrap Your Own Latent, which uses this trick. That trick is derived from some intuition from reinforcement learning. And somehow it prevents collapse, but we don't know why. Okay?

这更像是一套 recipe（配方）的集合。DeepMind 有一篇相关论文叫 BYOL，Bootstrap Your Own Latent，它就用了这个技巧。这个技巧源自 reinforcement learning（强化学习）中的一些直觉。不知怎么地，它确实能防止 collapse，但我们并不知道为什么。好吗？

Speaker 2 | 1:18:14 - 1:18:35 There are a few theoretical papers on it that explain why it might work in some simple cases, but it's not satisfactory. The cost function you think you're minimizing, you're not actually minimizing, so you can't monitor it. It actually goes up when you train. I mean, that makes sense. So we don't like this method, but it works.

目前有少数几篇理论论文试图解释它为什么在一些简单情形下可能有效，但这些解释并不能令人满意。你以为自己在最小化的那个 cost function（代价函数），实际上并没有真的被最小化，所以你也没法监控它。训练过程中它反而会上升。我的意思是，这在某种意义上也说得通。所以我们不喜欢这种方法，但它确实有效。

Speaker 2 | 1:18:36 - 1:19:15 And some of the models we've trained, large-scale video representation learning systems, V-JEPA, V-JEPA 2, V-JEPA 2.1, were trained using this method, and I-JEPA also. But we're moving away from this, and we now have a few recent papers on an explicit regularizer to prevent collapse, which basically tries to maximize the information content coming out of the encoder. So it's in the same family as Becker and Hinton from '89, Schmidhuber in 1992, and a bunch of others since then. And to some extent also contrastive techniques, although it's not sample-contrastive.

我们训练过的一些模型，比如大规模视频表征学习系统 V-JEPA、V-JEPA 2、V-JEPA 2.1，也包括 I-JEPA，都是用这种方法训练的。不过我们现在正逐渐离开这种方法，最近也出了几篇论文，提出一种 explicit regularizer（显式正则项）来防止这种 collapse（坍塌）；它的核心基本上是尽量最大化 encoder（编码器）输出的信息含量。所以它和 Becker and Hinton 在 1989 年的工作、Schmidhuber 在 1992 年的工作，以及此后很多相关工作，属于同一个谱系。从某种程度上说，它也和 contrastive techniques（对比式技术）有关，虽然它不是 sample contrastive（样本对比）那一类。
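VICReg, the earlier explicit regularizer named a little later in the conversation, is one concrete example of this family: it keeps each embedding dimension's standard deviation above a target and decorrelates dimensions, which rules out the constant solution. A plain-Python sketch follows; the weights (25, 25, 1), the target standard deviation of 1, and `eps` are assumed defaults, and averaging the invariance term over both batch and dimensions is our choice.

```python
import math

def _columns(batch):
    """Transpose a batch (list of n embeddings of dim d) into d columns."""
    return [list(col) for col in zip(*batch)]

def _centered(col):
    m = sum(col) / len(col)
    return [x - m for x in col]

def invariance_term(za, zb):
    """Mean squared error between paired embeddings (over batch and dimensions)."""
    n, d = len(za), len(za[0])
    return sum((x - y) ** 2 for ra, rb in zip(za, zb) for x, y in zip(ra, rb)) / (n * d)

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on each dimension's standard deviation: penalize dims with std below gamma."""
    cols = _columns(z)
    total = 0.0
    for col in cols:
        c = _centered(col)
        var = sum(x * x for x in c) / (len(c) - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / len(cols)

def covariance_term(z):
    """Sum of squared off-diagonal covariances, divided by the dimension d."""
    cols = [_centered(col) for col in _columns(z)]
    n, d = len(z), len(cols)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
                total += cov ** 2
    return total / d

def vicreg_loss(za, zb, lam=25.0, mu=25.0, nu=1.0):
    """Weighted sum of the three terms; variance and covariance apply to both branches."""
    return (lam * invariance_term(za, zb)
            + mu * (variance_term(za) + variance_term(zb))
            + nu * (covariance_term(za) + covariance_term(zb)))

collapsed = [[0.0, 0.0] for _ in range(4)]
print(vicreg_loss(collapsed, collapsed))  # invariance is 0, but the variance hinge fires
```

The invariance term alone would be happy with the collapsed batch; it is the variance hinge that makes a constant encoder expensive.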

Speaker 2 | 1:19:16 - 1:19:34 And then the question is, how do you measure information content? How do you maximize the information content coming out of a neural net? And the problem is if you want to maximize the quantity, you either need to be able to measure it, or you need to have a lower bound on it. Yep. Information content, we only have upper bounds.

接下来的问题就是，怎么衡量信息含量？怎么让一个 neural net（神经网络）输出的信息含量最大化？问题在于，如果你想最大化某个量，你要么得能测量它，要么得有它的 lower bound（下界）。对。可对于信息含量，我们手里只有 upper bounds（上界）。

Speaker 2 | 1:19:34 - 1:19:46 We cannot measure it. We can only come up with upper bounds. And so we take an upper bound and we cross our fingers. Okay? And it kind of works. The latest one is called SIGReg.

我们没法直接测量它。我们只能想办法给出 upper bound（上界）。所以我们就拿一个 upper bound 来做，然后只能碰碰运气。明白吧？不过它多少是有效的。最新那个方法叫 SIGReg。
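The upper bound actually used is not named here. As a textbook example of the kind of bound involved: among all distributions with a given variance, the Gaussian has the highest differential entropy, so 0.5 * log(2 * pi * e * variance) bounds the entropy of any 1-D output from above, and summing per-dimension bounds gives an upper bound on joint entropy. Raising such a bound does not guarantee the true quantity rises, which is the "cross our fingers" part. Function names and `eps` are illustrative assumptions.

```python
import math

def gaussian_entropy_bound(variance, eps=1e-12):
    """Differential entropy (in nats) of a Gaussian with this variance. No 1-D
    distribution with the same variance has higher entropy, so this is an upper
    bound. eps keeps log() finite for a collapsed (zero-variance) dimension."""
    return 0.5 * math.log(2.0 * math.pi * math.e * (variance + eps))

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def entropy_upper_bound(batch):
    """Upper bound on the joint entropy of d-dimensional outputs: the sum of
    per-dimension Gaussian bounds (entropy is subadditive across dimensions)."""
    return sum(gaussian_entropy_bound(sample_variance(list(col))) for col in zip(*batch))

# Check against a distribution whose entropy is known exactly:
# uniform on [0, w] has entropy log(w) and variance w**2 / 12.
w = 2.0
print(math.log(w), gaussian_entropy_bound(w ** 2 / 12))
```

The bound is cheap to compute from samples, which is exactly why it is tempting to maximize, even though it only caps the information content rather than measuring it.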

Speaker 2 | 1:19:46 - 1:20:23 That means sketched isotropic Gaussian regularization. We had a previous one called VICReg: variance-invariance-covariance regularization. And the SIGReg stuff is really cool. This is work by Randall Balestriero, who was a postdoc with me and is now an assistant professor at Brown. It basically consists of forcing the distribution of variables coming out of the encoder to be joint Gaussian, essentially to maximize information, if you want.

它的意思是 sketched isotropic Gaussian regularization。我们之前有一个叫 VICReg 的方法，意思是 variance（方差）、invariance（不变性）、covariance（协方差）regularization（正则化）。而这个 SIGReg 的东西非常酷。这是 Randall Balestriero 做的一项工作，他之前是跟我的 postdoc（博士后），现在是 Brown 的 assistant professor（助理教授）。它的基本思路是强制 encoder（编码器）输出变量的分布近似成为 joint Gaussian（联合高斯分布）；从某种意义上说，你可以把它看成是在最大化信息。
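A very rough sketch of the "sketched isotropic Gaussian" idea: project a batch of embeddings onto random unit directions and penalize how far each 1-D projection is from a standard normal. Assumptions, stated loudly: the characteristic-function distance (in the spirit of the Epps-Pulley normality test), its Gaussian weight, the integration grid, and the number of directions are illustrative choices; this is not the published SIGReg code, and in training the penalty would be added to a prediction loss and differentiated with a tensor library.

```python
import math
import random

def random_unit_directions(dim, count, rng):
    """Draw `count` random directions uniformly on the unit sphere in R^dim."""
    dirs = []
    for _ in range(count):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        dirs.append([x / norm for x in v])
    return dirs

def cf_distance_to_std_normal(xs, t_max=5.0, n_points=41):
    """Weighted squared gap between the empirical characteristic function of the
    1-D sample `xs` and that of N(0, 1), exp(-t^2 / 2), integrated over
    [-t_max, t_max] with the trapezoid rule and a Gaussian weight."""
    step = 2.0 * t_max / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        t = -t_max + k * step
        re = sum(math.cos(t * x) for x in xs) / len(xs)
        im = sum(math.sin(t * x) for x in xs) / len(xs)
        gauss_cf = math.exp(-0.5 * t * t)  # N(0, 1) characteristic function, also used as the weight
        gap = (re - gauss_cf) ** 2 + im ** 2
        end_weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += end_weight * gap * gauss_cf * step
    return total

def sigreg_like_penalty(batch, num_directions=8, seed=0):
    """'Sketched' test: average the 1-D distances over random projections.
    Near zero only if the projections look standard normal."""
    rng = random.Random(seed)
    dirs = random_unit_directions(len(batch[0]), num_directions, rng)
    scores = []
    for d in dirs:
        proj = [sum(a * b for a, b in zip(row, d)) for row in batch]
        scores.append(cf_distance_to_std_normal(proj))
    return sum(scores) / len(scores)

rng = random.Random(1)
gaussian_batch = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
collapsed_batch = [[0.0] * 4 for _ in range(500)]
print(sigreg_like_penalty(gaussian_batch), sigreg_like_penalty(collapsed_batch))
```

A collapsed batch (every embedding identical) gets a large penalty, while a batch that really is isotropic Gaussian scores near zero. Testing random 1-D projections instead of the full high-dimensional distribution is what keeps a check like this cheap.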

Speaker 2 | 1:20:23 - 1:21:02 It's just a very different way of doing it than what Juergen Schmidhuber and Sue Becker and Geoff Hinton were doing. And so it's super promising, in my opinion. We have variations of it: one that can produce sparse representations, and another that can produce isotropic representations that are not necessarily Gaussian. And we have a paper with Randall and a student at Mila, Luca Meiss, where we train a world model with this. It's still small scale, but we think it's super promising.

这和 Juergen Schmidhuber、Sue Becker、Geoff Hinton 当年做的方式非常不一样。所以在我看来，这条路线非常有前景。我们也做了一些变体，比如一种可以产生 sparse representations（稀疏表征），另一种可以产生 isotropic representations（各向同性表征），但不一定是 Gaussian（高斯）的。我们还和 Randall、Mila 的一位学生 Luca Meiss 合写了一篇论文，在里面我们用这种方法训练了一个 world model（世界模型）。目前规模还比较小，但我们觉得它非常有希望。

Speaker 2 | 1:21:02 - 1:21:10 So if you wanna read one paper, read that paper. It's Le world model, l e world model.

所以如果你只想读一篇论文，那就读那篇。名字叫 Le world model，l e world model。

Speaker 1 | 1:21:10 - 1:21:11 Awesome. We'll definitely link to it too.

太好了。我们肯定也会把它链接上。

Speaker 2 | 1:21:11 - 1:21:15 Yeah. I'm not responsible for the name. Randall picked the name.

对，那个名字不是我起的，我不负责。是 Randall 取的。

Speaker 1 | 1:21:15 - 1:21:21 Amazing. Well, Yann, seriously, thank you so much. It's such a privilege to get to spend this bit of time with you.

太棒了。Yann，说真的，太感谢你了。能和你一起度过这段时间，真的是一种荣幸。

Speaker 2 | 1:21:21 - 1:21:22 Well, thanks. Thanks for

嗯，谢谢。谢谢你——

Speaker 1 | 1:21:22 - 1:21:23 Appreciate you coming on the podcast.

感谢你来上这期 podcast（播客）。

Speaker 2 | 1:21:23 - 1:21:24 Thanks for having me. That was fun.

谢谢你邀请我。很有意思。

Speaker 1 | 1:21:24 - 1:21:50 I'm Jacob Efron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses in the world. As I hope is clear, I have a ton of fun doing this. It's a nights and weekends project in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast, sharing it with friends. It's really what ultimately makes this whole thing work.

我是 Jacob Efron，这里是 Unsupervised Learning。这是一档 podcast（播客），我会在这里和 AI 领域最聪明的人交流，问他们大量关于 models（模型）正在发生什么，以及这对全球企业意味着什么的问题。希望大家已经听出来了，我做这件事真的非常开心。除了我在 Redpoint 担任 investor（投资人）的本职工作之外，这还是一个利用晚上和周末时间在做的项目。但我们之所以能请到这些了不起的嘉宾，确实离不开像你们这样订阅这档 podcast（播客）、并把它分享给朋友的人。归根结底，正是这些支持让整件事得以运转。

Speaker 1 | 1:21:50 - 1:21:55 And so please consider doing that, and thank you so much for your support and listening. We'll see you next episode.

所以，也请你考虑这么做，非常感谢你的支持和收听。我们下期再见。

原文 ↗https://www.youtube.com/@RedpointAI