BuildSpeak: daily builder digest
🎙 Podcast · Training Data · April 30, 2026 · 5,735 words · about 29 minutes

Andrej Karpathy: From Vibe Coding to Agentic Engineering

Speaker 1 00:02 - 00:42
We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He helped co-found OpenAI right inside this office, was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift for making the most complex technical shifts feel both accessible and inevitable. You all know him for having coined the term vibe coding last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today.
Speaker 1 00:42 - 00:44
Thank you, Andrej, for joining us.
Speaker 2 00:44 - 00:46
Yeah. Hello. Excited to be here and to kick us off.
Speaker 1 00:46 - 00:59
Okay. So just a couple of months ago, you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?
Speaker 2 00:59 - 01:24
Yeah, a mixture of both for sure. Well, first of all, I guess like many of you, I've been using agentic tools like Claude Code and adjacent things for a while, maybe over the last year as they came out. They were very good at chunks of code, and sometimes they would mess up and you'd have to edit them, and it was kind of helpful. Then I would say December was this clear point. For me, I was on a break, so I had a bit more time. I think many other people were similar.
Speaker 2 01:24 - 01:43
And I just started to notice that with the latest models, the chunks just came out fine. And then I kept asking for more, and it just came out fine. And then I can't remember the last time I corrected it. I just, you know, trusted the system more and more, and then I was vibe coding. So I do think that it was a very stark transition.
Speaker 2 01:43 - 02:27
I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing, but you really had to look again, and you had to look as of December, because things have changed fundamentally, especially on this agentic, coherent workflow that really started to actually work. So I would say that, yeah, it was just that realization that really had me go down the whole rabbit hole of, you know, infinite side projects. My side projects folder is, like, extremely full with lots of random things, and I'm just vibe coding all the time. So, yeah, that kind of happened in December, I would say. And I've been looking at the repercussions of that since.
Speaker 1 02:27 - 02:49
You've talked a lot about this idea of LLMs as a new computer, that it isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules. Software 2.0 was learned weights. Software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe this?
Speaker 2 02:50 - 03:03
Right. Yeah, exactly. So Software 1.0, I'm writing code. Software 2.0, I'm actually programming by creating datasets and training neural networks. So the programming is kind of like arranging datasets and maybe some objectives and neural network architectures.
Speaker 2 03:04 - 03:52
And then what happened is that basically, if you train one of these GPT models or LLMs on a sufficiently large set of tasks, implicitly, because by training on the internet you have to multitask all the things that are in the data, these actually become kind of like a programmable computer in a certain sense. So Software 3.0 is kind of about how your programming now turns to prompting, and what's in the context window is your lever over the interpreter that is the LLM, which is interpreting your context and performing computation in the digital information space. I guess that's kind of the transition. And I think there are a few examples that really drove it home for me, and maybe those might be instructive. For example, when OpenClaw came out, when you want to install OpenClaw, you would expect that normally this is a bash script, like a shell script.
Speaker 2 03:52 - 04:29
So you run the shell script to install OpenClaw. But the thing is that in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on, these shell scripts usually balloon up and become extremely complex. But then you're still stuck in a Software 1.0 universe of wanting to write the code, and actually the OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. So basically, it's a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. And the reason this is a lot more powerful is that you're now working in the Software 3.0 paradigm, where you don't have to precisely spell out all the individual details of that setup.
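The contrast between a branching install script and a block of text handed to an agent can be sketched roughly as follows. This is only an illustration under stated assumptions: `example-tool` is a hypothetical package name, and the prompt text is invented for the sketch, not the actual OpenClaw install instructions.

```python
def install_command_v1(system: str) -> str:
    """Software 1.0 style: every platform needs its own explicit branch."""
    if system == "Darwin":
        return "brew install example-tool"             # hypothetical package
    if system == "Linux":
        return "sudo apt-get install -y example-tool"  # assumes a Debian-like distro
    if system == "Windows":
        return "winget install example-tool"
    # Each new platform or edge case adds another branch here.
    raise ValueError(f"unsupported platform: {system}")


# Software 3.0 style: the "program" is text that an agent interprets
# against whatever environment it finds. The wording is illustrative only.
INSTALL_PROMPT_V3 = """\
Install example-tool on this machine.
Inspect the operating system and available package managers first,
then pick a suitable install method. If a step fails, read the error,
fix the cause, and retry. Finish by running `example-tool --version`.
"""

print(install_command_v1("Linux"))
print(INSTALL_PROMPT_V3)
```

The first form fails on any platform its author did not anticipate, while the second leaves those details to the agent, at the cost of deterministic behavior.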
Speaker 2 04:29 - 04:48
The agent has its own intelligence that it packages up, and then it follows the instructions. It looks at your environment, your computer, and it performs intelligent actions to make things work and debugs things in the loop. It's just so much more powerful. I think that's a very different way of thinking about it. It's just like, what is the piece of text to copy-paste to your agent?
Speaker 2 04:48 - 05:15
That's the programming paradigm now. I think one more example that comes to mind, and that is even more extreme than that, is when I was building MenuGen. So MenuGen is this idea where you come to a restaurant, they give you a menu, there are usually no pictures, so I don't know what any of these things are. Usually, for like 30% of the things, I have no idea what they are, maybe 50%. I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense.
Speaker 2 05:16 - 06:05
I vibe coded this app that basically lets you upload a photo, and it does all this stuff. It runs on Vercel, and it basically re-renders the menu: it OCRs all the different titles, uses an image generator to get pictures of them, and then shows you all the items with their pictures. And then I saw the Software 3.0 version of this, which blew my mind, which is literally just: take your photo, give it to Gemini, and say, use Nano Banana to overlay the things onto the menu. Nano Banana basically returned an image that is exactly the picture of the menu that I took, but it actually rendered the different things in the menu into the pixels. And this blew my mind because, actually, all of my MenuGen is spurious. It's working in the old paradigm.
Speaker 2 06:05 - 06:32
That app shouldn't exist. Yeah, the Software 3.0 paradigm is a lot more kind of raw. Your neural network is doing more and more of the work, and your prompt or context is just the image, and the output is an image. There's no need to have any of the app in between. I think that people have to reframe: don't work in the existing paradigm of what things existed, and don't just think about it as a speed-up of what exists.
Speaker 2 06:33 - 06:52
It's actually that new things are available now. And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming and programming becoming faster. This is more general information processing that is automatable now. So it's not even just about code. Previous code worked over kind of structured data, right?
Speaker 2 06:52 - 07:20
And you write code over structured data. But for example, with my LLM knowledge bases project, basically you get LLMs to create wikis for your organization or for you personally, et cetera. This is not even a program. This is not something that could exist before, because there was no code that would create a knowledge base from a bunch of facts. But now you can just take these documents and basically recompile them in a different way, reorder them, and create something that is new and interesting as a reframing of the data.
Speaker 2 07:20 - 07:36
These are new things that weren't possible. I think this is something that I keep trying to get back to: not only what can we do that existed that is faster now, but also the new opportunities, things that just couldn't be possible before. I almost think that that's more exciting.
Speaker 1 07:37 - 08:07
I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression of programming from last October to early January and February of this year. If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?
Speaker 2 08:07 - 08:38
Well, going with the example of MenuGen, I guess, a lot of this code shouldn't exist and it's just a neural network doing most of the work. I do think that the extrapolation looks very weird, because you could imagine completely neural computers in a certain sense. Imagine a device that takes raw video or audio into basically what's a neural net and uses diffusion to render a UI that is unique for that moment in a certain sense.
Speaker 2 08:40 - 09:03
I kind of feel like in the early days of computing, actually, people were a little bit confused as to whether computers would look like calculators or computers would look like neural nets. And in the fifties and sixties, it was not really obvious which way it would go. And, of course, we went down the calculator path and ended up building classical computing. And then neural nets are currently running virtualized on existing computers. But you could imagine, I think, that a lot of this will flip and that the neural net becomes kind of like the host process.
Speaker 2 09:03 - 09:29
And the CPU becomes kind of like the coprocessor. So we saw the diagram of, you know, intelligence compute, the compute of neural networks, taking over and becoming the dominant spend of FLOPs. So you could imagine something really weird and foreign where neural nets are doing most of the heavy lifting. They're using tool use as just, you know, a historical appendage for some kinds of deterministic tasks. But what's really running the show is these neural nets that are networked in a certain way.
Speaker 2 09:30 - 09:40
So you can imagine something extremely foreign as the extrapolation, but I think we're probably gonna get there sort of piece by piece. That progression is TBD, I would say.
Speaker 1 09:40 - 10:01
I'd love to talk a little bit about this concept of verifiability, the fact that AI will automate faster and more easily the domains where the output can be verified. If that framework is right, what work is about to move much faster than people realize? And what professions do people think are safe that are actually highly verifiable?
Speaker 2 10:02 - 10:59
Yes. So I spent some time writing about verifiability. Basically, traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense. The way this works is that when frontier labs are training these LLMs, they use giant reinforcement learning environments, so the models are given verification rewards, and because of the way these models are trained, they end up progressing into these jagged entities that really peak in capability in verifiable domains, like math and code and adjacent areas, and stagnate and are a little bit rough around the edges when things are not in that space. I think the reason I wrote about verifiability is that I'm trying to understand why these things are so jagged. Some of it has to do with how the labs train the models, but I think some of it also has to do with the focus of the labs and what they happen to put into the data distribution.
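What a "verification reward" means can be illustrated with a small sketch. It assumes a task whose answer can be checked mechanically, here an integer arithmetic result. The function name and the sample outputs are hypothetical, and real training environments are far more elaborate.

```python
def exact_match_reward(model_output: str, expected: int) -> float:
    """Return 1.0 if the output parses to the expected integer, else 0.0."""
    try:
        answer = int(model_output.strip())
    except ValueError:
        return 0.0  # unparseable output earns no reward
    return 1.0 if answer == expected else 0.0


# Hypothetical task: "What is 17 * 23?" The checker needs no human judgment.
expected = 17 * 23
sample_outputs = ["391", " 391\n", "381", "about 390"]
rewards = [exact_match_reward(out, expected) for out in sample_outputs]
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
```

Domains where such a checker is cheap to write, like math and code, can absorb large amounts of reinforcement learning. Domains without one cannot, which is one plausible reading of the jaggedness described here.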
Speaker 2 11:00 - 11:34
Because some things are basically significantly more valuable in the economy and end up getting more environments, because the labs wanted to work in those settings. So I think code is a good example of that. There are probably lots of verifiable environments they could think about that happen not to make it into the mix, because they're just not that useful to have the capability around. But to me, I guess the big mystery is this: the favorite example for a while was how many letter r's are in "strawberry", and the models would famously get this wrong, and it's an example of jaggedness. The models have now patched this, I think, but the new one is: I want to go to a car wash to wash my car and it's 50 meters away.
Speaker 2 11:34 - 12:11
Should I drive or should I walk? And state-of-the-art models today will tell you to walk because it's so close. How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line code base or find zero-day vulnerabilities and yet tell me to walk to this car wash? This is insane. And to whatever extent these models remain jagged, it's an indication that, number one, maybe something is slightly off, or number two, you need to actually be in the loop a little bit, and you need to treat them as tools, and you do have to kind of stay in touch with what they're doing.
Speaker 2 12:12 - 12:40
And so, long story short, all of my writing about verifiability is just trying to understand why these things are jagged and whether there's any pattern to it, and I think it's some kind of combination of verifiable plus labs care. Maybe one more anecdote that is instructive: from GPT-3.5 to GPT-4, people noticed that chess improved a lot. And I think a lot of people thought, oh, well, it's just a progression of the capabilities. But actually, there's more to it. I think this is public information. I think I saw it on the internet.
Speaker 2 12:40 - 13:15
A huge amount of chess data made it into the pre-training set. And just because it was in the data distribution, the model improved a lot more than it would have by default. So someone at OpenAI decided to add this data, and now you have a capability that just peaks a lot more. And that's why I'm stressing this dimension: we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix, and you have to actually explore this thing that they give you that has no manual. It works in certain settings, but maybe not in others, and you have to kind of explore it a little bit.
Speaker 2 13:15 - 13:35
And if you're in the circuits that were part of the RL, you fly. And if you're in the circuits that are out of the data distribution, you're going to struggle. And you have to kind of figure out which circuits you're in for your application. And if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work, because it's not necessarily gonna come out of the LLM out of the box.
Speaker 1 13:36 - 14:04
I'd love to come back to the concept of jagged intelligence in a little bit. If you are a founder today and thinking about building a company, you are trying to solve a problem that you think is tractable, in a domain that is verifiable. But you look around and you think, oh my gosh, the labs have really, really started getting to escape velocity in the ones that seem most obvious: math, coding, and others. What would your advice be to the founders in the audience?
Speaker 2 14:05 - 14:38
So I think maybe that comes back to the previous question. Let me think. Verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it. So maybe one way to see it is that that remains true even if the labs are not focusing on it directly. So if you are in a verifiable setting where you could create these RL environments or examples, then that actually sets you up to potentially do your own fine-tuning, and you might benefit from that. But that is fundamentally technology that just works.
Speaker 2 14:38 - 15:05
You can pull a lever. If you have a huge amount of diverse datasets, RL environments, et cetera, you can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well. So I don't know what the examples of this might be, but I do think there are some very valuable reinforcement learning environments that people could think of that I think are not part of the... yeah. I don't want to give away the answer, but there is one domain that I think is very... oh, okay. Sorry.
Speaker 2 15:05 - 15:09
I don't mean to vaguepost on the stage, but there are some examples of this.
Speaker 1 15:09 - 15:13
On the flip side, what do you think still feels automatable only from a distance?
Speaker 2 15:14 - 15:41
I do think that ultimately almost everything can be made verifiable to some extent, some things more easily than others. Because even for things like writing and so on, you can imagine having a council of LLM judges and probably get something reasonable from this kind of approach. So it's more about what's easy or hard. So I do think that ultimately, yeah, I think...
Speaker 1 15:41 - 15:42
Everything.
Speaker 2 15:42 - 15:44
Everything is automatable.
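The "council of LLM judges" idea mentioned above can be sketched as a simple voting scheme. Everything here is an assumption made for illustration: the judges are plain rule-based stand-ins, whereas in practice each one would wrap a call to a model, and the speaker does not describe any particular aggregation rule.

```python
from typing import Callable, List

Judge = Callable[[str], bool]


def council_reward(text: str, judges: List[Judge]) -> float:
    """Fraction of judges that approve the text, a value in [0, 1]."""
    if not judges:
        raise ValueError("need at least one judge")
    votes = [judge(text) for judge in judges]
    return sum(votes) / len(votes)


# Stand-in judges (hypothetical): simple rules in place of model calls.
def is_concise(text: str) -> bool:
    return len(text.split()) <= 30


def ends_cleanly(text: str) -> bool:
    return text.rstrip().endswith((".", "!", "?"))


def avoids_filler(text: str) -> bool:
    return "very very" not in text.lower()


judges = [is_concise, ends_cleanly, avoids_filler]
draft = "Agents are fallible, so keep a human in the loop."
score = council_reward(draft, judges)
accepted = score > 0.5  # simple majority threshold, chosen arbitrarily
print(score, accepted)
```

Averaging several imperfect judges turns a subjective quality into a number that can serve as a reward signal, which is how a domain like writing could become verifiable "to some extent".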
Speaker 1 15:45 - 15:56
Amazing. Okay. So last year, you coined the term vibe coding. And today, we're in a world that feels a little bit more serious, more agentic engineering. What do you think is the difference between the two, and what would you actually call what we're in today?
Speaker 2 15:57 - 16:16
Yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software. So the floor rises, everyone can vibe code anything, and that's amazing, incredible. But then I would say agentic engineering is about preserving the quality bar of what existed before in professional software. So you're not allowed to introduce vulnerabilities due to vibe coding.
Speaker 2 16:18 - 16:36
You're still responsible for your software just as before, but can you go faster? And the spoiler is that you can, but how do you do that properly? And so to me, agentic engineering, I call it that because I do think it's kind of like an engineering discipline. You have these agents, which are these spiky entities. They're a bit fallible, a little bit stochastic, but they are extremely powerful.
Speaker 2 16:36 - 16:59
How do you coordinate them to go faster without sacrificing your quality bar? Doing that well and correctly is the realm of agentic engineering. So I kind of see them as different. One is about maybe raising the floor and the other is about extrapolating. And what I'm seeing, I think, is that there is a very high ceiling on agentic engineering capability.
Speaker 2 17:00 - 17:18
People used to talk about the 10x engineer previously. I think that this is magnified a lot more. 10x is not the speed-up you gain. It does seem to me that people who are very good at this peak a lot more than 10x, from my perspective right now.
Speaker 1 17:18 - 17:38
I really like that framing. When Sam Altman came to AI Ascent last year, one memorable thing he said was that people of different generations use ChatGPT differently. So if you're in your 30s, you use it as a Google search replacement. But if you're in your teens, ChatGPT is your gateway to the internet. What is the parallel here in coding today?
Speaker 1 17:38 - 17:50
If we were to watch two people code using OpenClaw, Claude Code, or Codex, one you'd consider mediocre at it and one you would consider fully AI native, how would you describe the difference?
Speaker 2 17:51 - 18:43
I mean, I think it's just trying to get the most out of the tools that are available, utilizing all of their features, investing in your own setup. So just like previously, engineers are used to basically getting the most out of the tools they use, whether it's Vim or VS Code or now Claude Code or Codex and so on. Just investing in your setup and utilizing a lot of the tools that are available to you, and I think it just kind of looks like that. I do think that a maybe related thought is that a lot of people are hiring for this because they want to hire strong agentic engineers. But what I'm seeing is that most people have still not refactored their hiring process for agentic engineering capability.
Speaker 2 18:43 - 19:17
If you're giving out puzzles to solve, then this is still the old paradigm. I would say that hiring has to look like: give someone a really big project and see them implement that big project. Let's write, say, a Twitter clone for agents and then make it really good, make it really secure, and then have some agents simulate some activity on this Twitter. And then I'm gonna use ten Codex 5.4 xhigh agents to try to break the website that you deployed. They're going to try to basically break it, and they should not be able to break it.
Speaker 2 19:17 - 19:28
Maybe it looks like that, right? Watching people in that setting, building bigger projects and utilizing the tooling, is maybe what I would look at for the most part.
Speaker 1 19:28 - 19:34
And as agents do more, what human skill do you think becomes more valuable, not less?
Speaker 219:34 - 19:57
So, yeah, it's a good question. Well, right now the answer is that the agents are kind of like these intern entities, right? So it's remarkable. You basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. Maybe one of my favorite examples of the weirdness of agents is from MenuGen.
Speaker 219:34 - 19:57
所以,是的,这是个好问题。我想,嗯,就目前来说,答案是这些 agents 有点像 intern(实习生)一样的实体,对吧?所以这很了不起。基本上你仍然得负责 aesthetics(审美)、judgment(判断力)、taste(品味),以及一点点 oversight(监督)。也许我最喜欢的一个能体现 agents 奇怪之处的例子,是在 MenuGen 里。
Speaker 219:57 - 20:30
You sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. When you purchased credits, my agent actually tried to assign them by matching the email address from Stripe to the Google email address. There wasn't a persistent user ID for people. It was trying to match up the email addresses, but you could use different email addresses for your Stripe and your Google accounts, and then it basically would not associate the funds. And so this is the kind of thing that these agents still make mistakes about.
Speaker 219:57 - 20:30
你可以用 Google account 注册,但购买 credits(额度)时用的是 Stripe account,而这两者都有 email address。我那个 agent 实际上试图这样处理:当你购买 credits 时,它会把来自 Stripe 的 email address 关联到 Google 的 email address。用户并没有一个持久的 user ID。它是在尝试匹配这些 email address,但你的 Stripe 和 Google 完全可以使用不同的 email address,这样一来资金基本上就不会被正确关联起来。所以这就是这类 agents 现在仍然会犯错的地方。
Speaker 220:30 - 20:44
It's like, why would you use email addresses to try to cross correlate the funds? They can be arbitrary. You can use different emails, etc. This is such a weird thing to do. I think people have to be in charge of this spec, this plan, and I actually don't even like the plan mode.
Speaker 220:30 - 20:44
这就像是在说,为什么你会用 email address 来尝试交叉关联这些资金呢?它们本来就是任意的。你可以用不同的邮箱,等等。这真是件很奇怪的事。我认为人还是必须负责这个 spec(规格说明)、这个 plan(方案),而且其实我甚至都不喜欢 plan mode。
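[Editor's sketch] The MenuGen bug described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not MenuGen's actual code and not Stripe's API: the `User` and `CreditLedger` classes, the `metadata` payload shape, and all method names are hypothetical. It contrasts matching a payment to a user by email with tying it to a persistent user ID that travels with the payment.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class User:
    user_id: str          # stable internal ID, independent of any email
    google_email: str     # email from the sign-in provider
    credits: int = 0


@dataclass
class CreditLedger:
    users: Dict[str, User] = field(default_factory=dict)

    def sign_up(self, google_email: str) -> User:
        user = User(user_id=uuid.uuid4().hex, google_email=google_email)
        self.users[user.user_id] = user
        return user

    def start_checkout(self, user: User, credits: int) -> dict:
        # Hypothetical payment payload: the internal user_id is attached
        # as metadata so it comes back with the payment confirmation.
        return {"credits": credits, "metadata": {"user_id": user.user_id}}

    def fulfil_by_email(self, payment_email: str, credits: int) -> bool:
        # Fragile: only works if the payment email happens to equal the
        # sign-in email, which nothing guarantees.
        for user in self.users.values():
            if user.google_email == payment_email:
                user.credits += credits
                return True
        return False

    def fulfil_by_user_id(self, payment: dict) -> bool:
        # Robust: look the user up by the persistent ID carried in metadata.
        user = self.users.get(payment["metadata"]["user_id"])
        if user is None:
            return False
        user.credits += payment["credits"]
        return True


ledger = CreditLedger()
alice = ledger.sign_up("alice@gmail.com")

# The buyer pays with a different email: the email match silently fails.
matched = ledger.fulfil_by_email("alice@work.example", 100)

# The ID-based path credits the right account regardless of email.
credited = ledger.fulfil_by_user_id(ledger.start_checkout(alice, 100))
print(matched, credited, alice.credits)
```

The design point is the one made in the talk: emails are arbitrary and can differ across services, so the spec should name one unique user ID that everything else is tied to.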
Speaker 220:46 - 21:22
Obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed, maybe basically the docs, and then get the agents to write them. You're in charge of the oversight and the top-level categories, but the agents are doing a lot of the under-the-hood work. So I think you're not caring about some of the details. As an example, with arrays or tensors in neural networks, there's a ton of little API details that differ between PyTorch, NumPy, Pandas and so on. I already forgot about keepdims versus keepdim, or whether it's dim or axis, or reshape or permute or transpose.
Speaker 220:46 - 21:22
它当然非常有用,但我觉得这里还有个更普遍的点:你必须和你的 agent 一起设计一个非常详细的 spec,也许本质上就是 docs(文档),然后让 agents 去把它们写出来。你负责 oversight 和顶层分类,但 agents 在底层做了大量工作。所以我觉得,你将不会再去关心某些细节。再比如,在 neural networks(神经网络)的 arrays(数组)或 tensors(张量)里,PyTorch、NumPy、各种 Pandas 等等之间有一大堆细节,涉及各种零碎的 API 细节。我已经记不清 keepdims 和 keepdim 的区别了,也记不清到底是 dim 还是 axis,还是 reshape、permute 或 transpose。
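[Editor's sketch] For readers who want to see the kind of API detail being referred to, here is a minimal illustration. NumPy spells the reduction arguments `axis` and `keepdims`, while PyTorch spells them `dim` and `keepdim`. The PyTorch part is guarded so the snippet still runs when `torch` is not installed; the array values are arbitrary.

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# NumPy: `axis` and `keepdims` (with an "s").
col_sums = x.sum(axis=0, keepdims=True)            # shape (1, 3)

# transpose reorders axes; reshape keeps the flat element order.
transposed = x.transpose(1, 0)                     # shape (3, 2)
reshaped = x.reshape(3, 2)                         # shape (3, 2), different contents

try:
    import torch
except ImportError:
    torch = None

torch_matches = None
if torch is not None:
    t = torch.from_numpy(x)
    # PyTorch: `dim` and `keepdim` (no "s"); permute plays the role of
    # NumPy's multi-axis transpose.
    t_col_sums = t.sum(dim=0, keepdim=True)
    t_permuted = t.permute(1, 0)
    torch_matches = bool(
        np.array_equal(t_col_sums.numpy(), col_sums)
        and np.array_equal(t_permuted.numpy(), transposed)
    )

print(col_sums.shape, transposed.tolist(), reshaped.tolist(), torch_matches)
```

Note that `transposed` and `reshaped` have the same shape but different contents, which is exactly the sort of distinction that is easy to forget and easy to delegate.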
Speaker 221:22 - 21:51
I don't remember this stuff anymore, right? Because you don't have to. These are the kind of details that are handled by the intern, because they have very good recall. But you still have to know, for example, that there's an underlying tensor, there's an underlying view, and you can manipulate a view of the same storage, or you can have different storage, which would be less efficient. And so you still have to have an understanding of what this stuff is doing and some of the fundamentals so that you're not copying memory around unnecessarily and so on, but the details of the APIs are now handed off.
Speaker 221:22 - 21:51
这些东西已经不用再记了,对吧?因为你没这个必要。这类细节就该交给 intern 去处理,因为它们的 recall(记忆/检索能力)非常好。但你还是得知道,比如说,底层有 tensor,也有底层的 view;你可以操作同一块 storage(存储)的 view,也可以使用不同的 storage,但那样效率会更低。所以你仍然必须理解这些东西在做什么,理解一些 fundamentals(基础原理),这样你才不会不必要地来回复制内存,等等;但 API 的那些细节已经可以交出去了。
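[Editor's sketch] The view-versus-storage distinction mentioned above can be checked directly. This is a minimal NumPy illustration (the talk is about tensors in general; NumPy is used here only as a convenient stand-in, and the variable names are arbitrary). A transpose is a view onto the same storage, while an explicit `.copy()`, or a reshape that cannot be expressed with strides, allocates new storage.

```python
import numpy as np

base = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

view = base.T                         # new strides, same underlying storage
copy = base.T.copy()                  # separate storage (extra memory traffic)

shares_view = np.shares_memory(base, view)
shares_copy = np.shares_memory(base, copy)

# Flattening a C-contiguous array is a view; flattening its transpose
# cannot be expressed with strides, so NumPy has to copy.
flat_view = base.reshape(6)
flat_copy = base.T.reshape(6)
flat_view_shares = np.shares_memory(base, flat_view)
flat_copy_shares = np.shares_memory(base, flat_copy)

# A write through the base array is visible in the view, not in the copy.
base[0, 1] = 99
seen_in_view = int(view[1, 0])
seen_in_copy = int(copy[1, 0])

print(shares_view, shares_copy, flat_view_shares, flat_copy_shares)
print(seen_in_view, seen_in_copy)
```

Knowing which operations alias memory and which ones copy it is the kind of fundamental that still has to live in the human's head, even if the exact function names do not.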
Speaker 221:51 - 22:13
So you're in charge of the taste, the engineering, the design, and that it makes sense, and that you're asking for the right things, and that you're saying, okay, these have to be unique user IDs that we're going to tie everything to. And so you're doing some of the design and development, and the agents are doing the fill-in-the-blanks. That's currently kind of where we are, and I think that's what everyone, of course, is seeing right now.
Speaker 221:51 - 22:13
所以你负责的是 taste、engineering(工程判断)、design(设计),以及确保事情讲得通,确保你提的要求是对的,确保你会说:好,这些必须是唯一的 user ID,我们要用它把所有东西都关联起来。所以你仍然在做一部分设计和开发,而 agents 是在负责把空白补上。这大概就是我们现在所处的位置,我想这也是现在每个人当然都能看到的情况。
Speaker 122:13 - 22:20
Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?
Speaker 122:13 - 22:20
你觉得随着时间推移,这种 taste 和 judgment 的重要性会不会下降,还是说上限只会不断提高?
Speaker 222:21 - 22:52
Yeah, it's a good question. I mean, I'm hoping that it improves. I think probably the reason it doesn't improve right now is, again, that it's not part of the RL. There's probably no aesthetics cost or reward, or it's not good enough, or something like that. When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not necessarily super amazing code all the time. It's very bloated, there's a lot of copy-paste, and there are awkward abstractions that are brittle. It works, but it's just really gross.
Speaker 222:21 - 22:52
是啊,这是个好问题。我的意思是,我希望它会变好。我觉得它现在之所以还没有改善,原因大概还是因为这不属于 RL(强化学习)的一部分。可能根本没有什么 aesthetics(美感)方面的成本或奖励,或者这部分做得还不够好,诸如此类。我确实觉得,当你真的去看代码时,有时候我会有点心脏病发作的感觉,因为那并不总是什么特别出色的代码,往往非常 bloated(臃肿),有很多 copy paste(复制粘贴),还有一些很别扭、很脆弱的 abstractions(抽象层);它能运行,但就是非常粗糙、很难看。
Speaker 222:54 - 23:15
And I do hope that this can improve in future models. A good example also is this microgpt project, where I was trying to simplify LLM training to be as simple as possible. The models hate this. They can't do it. I kept trying to prompt an LLM to simplify more and more, and it just can't. You feel like you're outside of the RL circuits.
Speaker 222:54 - 23:15
而且我确实希望未来的模型能在这方面改进。另一个很好的例子是这个 microgpt 项目,我当时想把 LLM(大语言模型)训练尽可能简化到最简单的程度。模型特别不擅长这个,它们做不到。我一直在试着 prompt(提示)一个 LLM 进一步简化、再简化,但它就是做不到,你会感觉自己像是处在 RL 回路之外。
Speaker 223:15 - 23:29
It feels like you're pulling teeth. It's not like light speed. So I do think that people still remain in charge of this, but I also think that there's nothing fundamental that's preventing it. It's almost just that the labs haven't done it yet.
Speaker 223:15 - 23:29
那种感觉很明显,就是你知道,你是在硬拔牙,不是什么光速推进。所以我确实认为,人类仍然掌握着主导权;但我也确实觉得,从根本上说并没有什么东西在阻止这件事发生。几乎只是这些 labs(实验室)还没做到而已。
Speaker 123:30 - 23:59
So I'd love to come back to this idea of jagged forms of intelligence. You wrote a little bit about this with a very thought provoking piece around animals versus ghosts. And the idea is that we're not building animals. We are summoning ghosts. And these are jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment, things that came about via evolution.
Speaker 123:30 - 23:59
所以我很想回到这个“锯齿状智能形态”的想法。你写过一点这方面的内容,其中有一篇非常发人深省,讨论的是 animals(动物) versus ghosts(幽灵)。这个想法是:我们不是在构建动物,我们是在召唤幽灵。它们是由数据和 reward functions(奖励函数)塑造出来的锯齿状智能形态,而不是由 intrinsic motivation(内在动机)、fun(乐趣)、curiosity(好奇心)或 empowerment(掌控感)这些经由进化形成的东西塑造出来的。
Speaker 124:00 - 24:07
Why does that framing matter? And what does it actually change about how you build and deploy and evaluate or even trust them?
Speaker 124:00 - 24:07
为什么这种 framing(框架化方式)很重要?它究竟会如何改变你构建、部署、评估,甚至信任它们的方式?
Speaker 224:08 - 24:43
Yeah, I think the reason I wrote about this is because I'm trying to wrap my head around what these things are, right? Because if you have a good model of what they are or are not, then you're going to be more competent at using them. I'm not sure if it actually has real power. I think it's a little bit of philosophizing, but I do think it's just coming to terms with the fact that these things are not animal intelligences. Like, if you yell at them, they're not going to work better or worse; it doesn't have any impact.
Speaker 224:08 - 24:43
是的,我觉得我之所以写这个,是因为我在试着弄明白这些东西到底是什么,对吧?因为如果你对它们是什么、或者不是什么,有一个好的模型,那么你在使用它们时就会更有能力。我不确定这是否真的具有很强的现实效力。我觉得这多少有点哲学化,但我确实认为,这本质上是在接受这样一个事实:这些东西不是动物式智能。比如说,你冲它们大喊大叫,它们并不会因此工作得更好或更差,或者说那根本不会产生什么影响。
Speaker 224:45 - 25:15
And it's all just kind of like these statistical simulation circuits where the substrate is pre-training, so, like, statistics. But then there's RL bolted on top, so it kind of increases the appendages. And maybe it's just kind of a mindset of what I'm coming into, or what's likely to work or not likely to work, or how to modify it. But I don't know that I have, like, here are the five obvious outcomes of how to make your system better. It's more just being suspicious of it and figuring it out over time.
Speaker 224:45 - 25:15
它整体上更像是一种 statistical simulation circuits(统计式模拟电路),其底层 substrate(基底)是 pre training(预训练),也就是统计规律;然后上面又 bolt on(外挂)了一层 RL,所以它有点像是长出了更多 appendages(附肢)。也许这更多只是我进入这个问题时的一种 mindset(思维方式):什么可能有效,什么不太可能有效,或者该怎么修改它。但我其实并没有那种“这里有五个显而易见的结果,能让你的系统变得更好”的结论。更多只是对它保持怀疑,然后随着时间推移一点点摸索清楚。
Speaker 125:16 - 25:24
That's where it starts. Okay. So you are so deep in working with agents that don't just chat. They have real permissions. They have local context.
Speaker 125:16 - 25:24
事情就是从这里开始的。好。那么你现在已经非常深入地在和 agents(智能体)打交道了,而且它们不只是聊天而已。它们有真实的 permissions(权限),也有本地上下文。
Speaker 125:24 - 25:30
They actually take action on your behalf. What does the world look like when we all start to live in that world?
Speaker 125:24 - 25:30
它们实际上会代表你采取行动。当我们都开始生活在那样一个世界里时,这个世界会是什么样子?
Speaker 225:31 - 25:53
Yeah. I think a lot of people here are probably excited about what this native agentic environment looks like, and everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved around. Most of the time, when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve.
Speaker 225:31 - 25:53
对。我想这里很多人可能都对这种 agentic(agent 驱动的)、原生 agentic 的环境会是什么样子感到兴奋,而且一切都得重写。现在的所有东西在根本上仍然是为人类写的,必须重新调整。我现在大多数时候在使用各种 framework、library 之类的东西时,它们的 docs(文档)本质上也还是写给人看的。这是我最受不了的一点。
Speaker 225:53 - 26:31
Like, why are people still telling me what to do? I don't want to do anything. What is the thing I should copy-paste to my agent? So every time I'm told, you know, go to this URL or something like that, it's just like, oh, you know? So everyone is, I think, excited about how we decompose the workloads that need to happen into, fundamentally, sensors over the world and actuators over the world, how we make it agent native, basically describe it to agents first, and then have a lot of automation around data structures that are very legible to the LLMs.
Speaker 225:53 - 26:31
比如,为什么人们还在告诉我该怎么做?我根本不想自己做任何事。我应该复制粘贴给我的 agent 的东西到底是什么?所以每次别人跟我说,去这个 URL 之类的,我都会觉得,啊,你知道吧?所以我想,大家现在都很兴奋的一点是:我们该如何把必须发生的工作负载拆解成面向世界的 sensors(传感器)和 actuators(执行器);我们该如何让它变成 agent native,基本上先描述给 agents 听;然后围绕那些对 LLMs 来说非常清晰易读的数据结构,建立大量自动化。
Speaker 226:32 - 27:22
So I think, yeah, I'm hoping that there's a lot of agent-first infrastructure out there. For MenuGen, famously (well, I'm not sure how famously), when I wrote the blog post about MenuGen, a lot of the trouble was not even writing the code for MenuGen, it was deploying it on Vercel, because I had to work with all these different services, I had to string them up, I had to go through their settings and menus and, you know, configure my DNS, and it was just so annoying. So that's a good example: I would hope that I could give a prompt to an LLM, "build MenuGen", and that I wouldn't have to touch anything and it would be deployed in that same way on the Internet. I think that would be a good test for whether or not a lot of our infrastructure is becoming more and more agent native. And then ultimately, I would say, yeah, I do think we're going towards a world where there's agent representation for people and for organizations.
Speaker 226:32 - 27:22
所以我想,是的,我希望外面会有很多 agent first 的基础设施。而且,你知道的,就拿 MenuGen 来说——说“著名地”可能也不一定算著名——但当我写那篇关于 MenuGen 的博客时,很多工作或者很多麻烦甚至都不在于写 MenuGen 的代码,而是在 Vercel 上部署它,因为我得和所有这些不同的服务打交道,我得把它们串起来,我得去翻它们的设置和菜单,还得配置我的 DNS,真的非常烦人。所以这是一个很好的例子:我会希望对于 MenuGen,我只需要给一个 LLM 一条 prompt,告诉它构建 MenuGen,而我什么都不用碰,它就能以同样的方式部署到 Internet 上。我觉得,这会是一个很好的测试,用来判断我们的很多基础设施是否正在变得越来越 agent native。再往后说,最终我确实认为,我们正走向一个世界:无论个人还是组织,都会有自己的 agent representation(agent 代理表示)。
Speaker 227:23 - 27:38
And I'll have my agent talk to your agent to figure out some of the details of our meetings or things like that. So I do think that that's roughly where things are going, but yeah, I think everyone here is excited about that.
Speaker 227:23 - 27:38
然后我会让我的 agent 去和你的 agent 交流,敲定一些会议细节之类的事情。所以我确实觉得,大致上事情会朝这个方向发展,不过是的,我想在场的每个人都对此很兴奋。
Speaker 127:38 - 28:05
I really like the visual analogy of sensors and actuators. I actually hadn't thought of that. That's super interesting. Okay, I think we have to end on a question about education, because you are probably one of the very best in the world at making complex technical concepts simple and deeply thoughtful about how we design education around it. What still remains worth learning deeply when intelligence gets cheap as we move into the next era of AI?
Speaker 127:38 - 28:05
我真的很喜欢你用 sensors 和 actuators 做的这个视觉类比。我之前其实没这么想过。这特别有意思。好,我想我们得用一个关于教育的问题来收尾,因为你可能是这个世界上最擅长把复杂技术概念讲简单的人之一,而且你对我们该如何围绕这些内容设计教育也有非常深入的思考。当 intelligence(智能)变得廉价、我们迈入 AI 的下一个时代时,还有哪些东西仍然值得被深入学习?
Speaker 228:05 - 28:15
Yeah. There was a tweet that blew my mind recently, I keep thinking about it like every other day. It was something along the lines of, you can outsource your thinking, but you can't outsource your understanding.
Speaker 228:05 - 28:15
对。最近有一条 tweet 让我非常震撼,我几乎隔一天就会想起它一次。它大概是这么说的:你可以外包你的思考,但你无法外包你的理解。
Speaker 128:17 - 28:19
I think that's really nicely put.
Speaker 128:17 - 28:19
我觉得这话说得特别好。
Speaker 228:19 - 28:56
Because I'm still part of the system, and information still has to make it into my brain, and I feel like I'm becoming a bottleneck of just even knowing what we are trying to build, why it is worth doing, how I direct my agents and so on. I do still think that ultimately something has to direct the thinking and the processing and so on. That's still kind of fundamentally constrained somehow by understanding. And this is one reason I was also very excited about LLM knowledge bases, because I feel like that's a way for me to process information. And anytime I see a different projection onto information, I always feel like I gain insight.
Speaker 228:19 - 28:56
因为我仍然是这个系统的一部分,信息仍然得进入我的大脑,而我感觉自己正在变成一个 bottleneck(瓶颈):哪怕只是弄清楚我们到底想构建什么、为什么这件事值得做、我该如何指挥我的 agents,等等。我依然认为,归根结底,还是需要某种东西来引导思考和处理过程等等。而这在某种根本意义上,仍然受限于理解能力。这也是为什么我也对 LLM knowledge bases 特别兴奋,因为我觉得那是我处理信息的一种方式。而且每当我看到对信息的不同投影方式时,我总会觉得自己获得了新的洞见。
Speaker 228:56 - 29:32
So it's really just a lot of prompts for me to do synthetic data generation, kind of, over some fixed data. Whenever I read an article, I have my wiki that's being built up from these articles, and I love asking questions about things. I think that ultimately these are tools to enhance understanding in a certain way, and this is still a bit of a bottleneck, because you can't be a good director if you don't understand, and the LLMs certainly don't excel at understanding. You are still uniquely in charge of that. So yeah, I think tools to that effect are incredibly interesting and exciting.
Speaker 228:56 - 29:32
所以对我来说,这其实就是基于某些固定数据,使用大量 prompts 来做一种 synthetic data generation(合成数据生成)。每当我读一篇文章时,我都会把它纳入我那个由这些文章不断构建起来的 wiki,而且我特别喜欢就这些内容提问题。我认为,归根结底,这些都是以某种方式增强理解的工具,而这仍然有点像一个 bottleneck,因为如果你还做不到真正理解,你就无法去指挥,你就不可能成为一个好的指挥者,而 LLMs 显然并不擅长理解。这方面仍然是你独特需要负责的。所以,是的,我觉得朝这个方向发展的工具都极其有趣,也令人兴奋。
Speaker 129:32 - 29:42
I'm excited to be back here in a couple of years to see if we've been fully automated out of the loop and they actually take care of understanding as well. Thank you so much for joining us, Andrej.
Speaker 129:32 - 29:42
我很期待几年后再回到这里,看看我们是否已经被完全自动化地移出了 loop(环路),以及它们是否连“理解”这件事也一并接管了。非常感谢你今天加入我们,Andrej。
Speaker 229:42 - 29:42
Thank you.
Speaker 229:42 - 29:42
谢谢。
Speaker 129:42 - 29:42
We really appreciate it.
Speaker 129:42 - 29:42
我们非常感激。
原文 ↗https://www.youtube.com/playlist?list=PLOhHNjZItNnMm5tdW61JpnyxeYH5NDDx8