BuildSpeak: daily builder digest
🎙 Podcast · AI & I by Every · June 10, 2026 · 11,630 words · about 58 minutes

How Anthropic Uses Claude Fable 5 With Mike Krieger

Speaker 1 00:04 - 00:07
Mike, welcome to the show. Great to be here, Dan. Good to see you.
Speaker 2 00:07 - 00:17
So for people who don't know you, you're the head of Anthropic Labs, and you're the cofounder of Instagram. And today, what I wanna talk to you about is Fable 5. Fable 5 is dropping tomorrow. We're recording this the day before, and it's gonna come out after it drops.
Speaker 2 00:17 - 00:39
But what I really wanted to do is bring you on the show to tell me what it's like to use this model beyond the first day. I think when a model this powerful drops, it's so useful to have someone who's using it day in and day out tell you: this is where it's powerful, this is what it actually changes, this is what it doesn't change. That way you kind of don't get the same AI-psychosis type thing.
Speaker 2 00:39 - 00:43
You can actually think about, okay, like, this is how it fits into my life.
Speaker 1 00:43 - 00:58
Yeah. Absolutely. And it's also just been interesting. We've had some models in this Mythos class, leading up to the Fable release, for a couple of months now. And I think it's very exciting to see how people will build with this externally.
Speaker 1 00:58 - 01:21
But I think you're also right about day-one impressions. I think the real picture comes from getting to use this over a couple of weeks. We've seen that even with previous models: the December-into-January usage, whether it was Opus 4.5 or Opus 4.6, was really important, because people spent extended time on the model and then figured out, oh, actually, I wasn't pushing it hard enough. I gotta go further. I gotta rethink what's even possible with this generation.
Speaker 2 01:21 - 01:43
Totally. I mean, I don't know. I feel like there are people internally at Every who have been using it and have been like, oh my god, I think I kind of need a new set of skills to use this model. You can especially see this with people who are maybe more nontechnical internally, more on the knowledge work side of things, where they're like, I don't even know what I would use this for. And the people who are orchestrating agents are like, holy shit, I feel like there are so many new things I need to learn.
Speaker 2 01:43 - 01:47
So I'm curious for you, tell us about the difference between your impression when you first tried it and now.
Speaker 1 01:47 - 02:14
Yeah. I think your point on adapting workflows is a really good one. Quite literally workflows, and I'll talk about that in a second, but also just in terms of how I think about usage of the model. The timing was interesting because it kind of coincided with me transitioning from CPO into Labs and going really back into builder mode. I think it was about a month and a half or two months into that that we first had one of these models available internally.
Speaker 1 02:14 - 02:41
And I sat there and I was like, I feel like a total newbie again, because the way that I'm prompting, or even thinking about decomposing a task, is really out of date now with this model. Even the time horizon, or the sort of interactivity model, I think has to evolve as well. Early on it would be like, I have an idea for this feature, can we start by... and now, absolutely not. Right?
Speaker 1 02:41 - 02:59
It goes to: great, let me express more of the intent. And I remember in, like, March, April being like, wow, on the one-shot it's already incredibly impressive. But then it also understands the intent around how we're gonna evolve this, and it understands the global context as well.
Speaker 1 02:59 - 03:25
So I think that's been a really interesting evolution till now. It was funny, I was talking to somebody this morning about how I think about doing work. I had a flight, and I was like, okay, I can do most of this work remotely. And I don't even worry that the Wi-Fi is gonna drop out, because I know that if I set up the right context, instructions, like, flash loop, it'll see it through.
Speaker 1 03:25 - 03:56
And I think my last two months have been full of times where I will wish Claude a good night, set it up on a pretty complex task with something of this, like, Mythos class, and wake up to... well, actually, it's usually done by, like, two in the morning, and I guess it just twiddles its thumbs for the next four hours. But it has a really impressive ability to complete the swing and get itself out of a situation where it's like: okay, alright, Mike asked me to do this complex task overnight. I got stuck because this remote service went down.
Speaker 1 03:56 - 04:09
I'm gonna write a scaffolded back end for it for now. I'll document that. I'll go all the way through. I have a good mental model of how far that's gonna get me. And then when it comes back online, I'll fix it.
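The recovery behavior described here (stub out the unavailable service, note what was faked, come back to it later) can be sketched in a few lines. This is only an illustration of the pattern, not how Claude or any Anthropic tooling implements it; `FallbackClient`, `ServiceUnavailable`, and the two service functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ServiceUnavailable(Exception):
    """Raised by a backend that cannot be reached right now (hypothetical)."""


@dataclass
class FallbackClient:
    """Calls the real backend; on failure, uses a stub and records the gap."""

    real: Callable[[str], str]
    stub: Callable[[str], str]
    followups: List[str] = field(default_factory=list)

    def fetch(self, key: str) -> str:
        try:
            return self.real(key)
        except ServiceUnavailable:
            # Document what was faked so it can be redone once the service is back.
            self.followups.append(f"re-run fetch({key!r}) against the real service")
            return self.stub(key)


def down_service(key: str) -> str:
    # Stands in for the remote service that went down overnight.
    raise ServiceUnavailable(key)


def stub_service(key: str) -> str:
    # Scaffolded placeholder back end.
    return f"stub:{key}"


client = FallbackClient(real=down_service, stub=stub_service)
result = client.fetch("titles")
```

The `followups` list plays the role of "I'll keep track of that fact": the work continues on the stub, and the list says exactly what has to be revisited.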
Speaker 1 04:09 - 04:41
I'll keep track of that fact. I think the most impressive thing for me is just being able to delegate that level of task and trust that the right thing will happen by the end. And of course you'll review the result. There's still a whole verification thing that we can and should talk about, because I think that's an important part of completing the swing there. But it's really forced me to rethink: what does being productive with one of these models look like? We've talked for a while about whether these models are more of a companion or a coworker.
Speaker 1 04:41 - 04:46
And it really feels like now it's a teammate that I can delegate a lot of work to.
Speaker 2 04:46 - 05:10
And what is your day-to-day flow like right now? Because one of the things I noticed is that if you just give it a big task, monologue into it, and let it go for a few hours or overnight, it's the most impressive model that I've ever tried. But it's so slow and so expensive that I feel like I don't wanna use it for day-to-day tasks. So what is your actual flow like in terms of how you use it day to day, and where does it slot in versus other models?
Speaker 1 05:10 - 05:49
Yeah. I've ended up having a lot more architectural planning conversations upfront with it as well. That's been another interesting change, and I think this is an area where all models need to continue to improve. I'm really grateful for the Instagram experience of having to start from our initial version, which was duct-taped on a server in LA, to being able to scale it and eventually integrate it with all of the Facebook infrastructure. Because you kind of develop a sense of what infra abstractions and complexity are appropriate for each stage. And I still go back and forth with Fable, where it'll be like, oh, this is a good implementation.
Speaker 1 05:49 - 06:13
And I'm like, well, I do plan on shipping this fairly soon, so I think we should probably think about more than one server. That kind of back and forth is important. But for a lot of that sort of planning, I'll often actually ask it. The thing I've realized is that Fable can be so complete in its thinking when you're planning with it.
Speaker 1 06:13 - 06:49
Often just saying, can you make an HTML page that represents what we just talked about so I can share it with the team, is actually valuable. Even just a markdown document works, but I like having diagrams. So that's been an interesting use: let's plan with it, let's think it through, and then let's have some sort of document that we can align the team on. This is a dynamic I've seen in Labs and in teams beyond Anthropic: you can build a lot very quickly, so forcing more of that early alignment is really, really key. Even if you do an initial prototype and then back it out into more of a plan or architecture, that works too.
Speaker 1 06:49 - 07:29
And it ends up being the place where the human-to-human interaction still stays very much part of the process. From then on, either overnight or during the day, having it execute on those chunks of tasks is really important. It just means having a lot more concurrent sessions than I did before, because I'll often think, alright, there are these two pieces of work. I go back and forth between liking having one very long-running Claude Code session, and really asking it to do everything in background forks or sub-agents so the main thread stays responsive, and other times just embracing that it's one of those days where we're gonna have five or six tabs that tackle long, comprehensive work.
Speaker 1 07:29 - 07:46
But I do think there's something to this long-horizon mode of, don't worry, I'm on it, it's gonna take me a while, alongside more of this back and forth. That modality is something we'll have to figure out in our products as well. I think you wanna preserve both, and they interact with each other in interesting ways.
Speaker 1 07:46 - 08:10
My preference is that I always like having at least one Claude that is high context but also very, very fast to respond. Its instinct is: great, I'm gonna answer you, and I'll kick something off if I need to. And if not, I'm just gonna hang tight and wait for the next loop. I do think you're right about the case where I'm just trying to fix some interaction question, or something that's very fine-detailed.
Speaker 1 08:11 - 08:33
Fable will go off and think very hard about those things. And I think Fable is the first model where I've actually played more with the effort levels for that reason, where I've been like, okay, I just needed to tweak some UI, so I'm actually gonna put it to medium or something and see how that plays out. I didn't find myself doing that as much with Opus, maybe because the range felt less wide, whereas it really can feel quite wide with Fable.
Speaker 2 08:34 - 08:46
What about a quick question? Like, you're on the go: are you asking Fable random questions as they come to you? Because that feels like using a rocket launcher to kill a mosquito. Or are you flipping back and forth?
Speaker 1 08:47 - 08:58
It's so funny you ask that, because I hadn't been. And, you know, you're like, it's thinking. It's thinking really hard about it. Then since last week, I was like, no. I was asking it something that, like, truly
Speaker 1 08:58 - 09:09
I felt embarrassed actually asking Fable about. It was probably something NBA Finals related. And I was like, okay. I switched my iOS app to Sonnet. I was like, oh, yeah.
Speaker 1 09:09 - 09:51
You use this all the time for fast questions. It's like an order of magnitude difference in feeling. And it's actually not even the tokens per second. It's probably more about how much thinking goes into the answer, and sometimes the answer does not need to be fully thought through. So, yeah, I'm thinking my way through it, and I think this is a good product question for us too: in general, you don't want people to have to think so much about these choices. Ideally, what we can coalesce around in the longer run is maybe some more bucketable use cases that are really grokkable to people. Or maybe it varies by surface: it's probably unlikely that, most of the time, I'm doing Fable-type tasks in the iOS app, so having a sticky model selection per surface might be the way to do that.
Speaker 1 09:51 - 10:01
And we'll have to explore what that means from a product perspective. But I've for sure had the feeling of, this is not a Fable-worthy question. I should ask Sonnet.
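One way to picture "sticky model selection per surface" combined with adjustable effort levels is a small lookup with per-surface overrides. This is a sketch under assumptions, not a description of any Anthropic product: the surface names, the default pairings, and the `StickySelector` class are all hypothetical, and the strings "fable" and "sonnet" are plain labels rather than real API model identifiers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Choice:
    model: str   # label only, not a real API model ID
    effort: str  # e.g. "low", "medium", "high"


# Hypothetical defaults: quick questions on mobile, long tasks in the terminal.
DEFAULTS: Dict[str, Choice] = {
    "ios": Choice("sonnet", "low"),
    "terminal": Choice("fable", "high"),
}


class StickySelector:
    """Remembers the last explicit choice for each surface."""

    def __init__(self, defaults: Dict[str, Choice]) -> None:
        self._defaults = dict(defaults)
        self._overrides: Dict[str, Choice] = {}

    def choose(self, surface: str) -> Choice:
        # An explicit override sticks until it is changed again.
        return self._overrides.get(surface, self._defaults[surface])

    def override(self, surface: str, choice: Choice) -> None:
        self._overrides[surface] = choice


selector = StickySelector(DEFAULTS)
# "I just needed to tweak some UI": drop the effort level for this surface.
selector.override("terminal", Choice("fable", "medium"))
```

The point of the per-surface default is that the user rarely has to think about the choice; the override covers the occasional case where they do.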
Speaker 2 10:01 - 10:04
Can you show us something that you've built with it?
Speaker 1 10:05 - 10:28
Yep. One of the things we did this go-around is we encouraged personal-account usage, especially on the weekends, which was really fun. You can imagine we have a lot of Anthropic-specific tooling, etcetera. But it was really good to step back and be like, pure Claude Code, let's work on something over the weekend.
Speaker 2 10:28 - 10:30
And are you in the terminal app or in the desktop app?
Speaker 1 10:30 - 10:53
That's a great question. I'm mostly still in the terminal app. It's interesting watching my wife, who's not a professional engineer and more of a UX designer and PM, really fall in love with Claude Code via the desktop app. I think it's simplified some of the abstractions for her in that way. But for this one, I was still in... is it "ghosty" or "ghost-T-T-Y"? Ghostty, and the terminal app.
Speaker 1 10:53 - 11:15
But let me show you. This is one of those things where everybody has some bespoke need. I wanted a good media tracker experience. I'm playing games, I'm watching TV shows, I get all these recommendations, and I just wanted to build something that was personal to me and fit some of the use cases that I had.
Speaker 1 11:15 - 11:36
The two biggest criteria that I started with were, one, it has to be really easy to add things. So you can talk to Claude, Claude does the agentic search over everything, and then puts the right thing in. And two, it should be proactive: if there's a new season, or a new sequel to a game, it can go off and research those things. Most of the UI was a Fable one-shot, which was already impressive.
Speaker 1 11:37 - 12:06
But the thread I've been pulling on a lot in Labs this year is: how do you bring the software team, which is Claude these days, closer to the software itself? So this was maybe Saturday morning. I had a full weekend with kids stuff, so a lot of this was kick off work, go for a hike with the kids, come back, continue the work. Sometimes check in on the work on the hike. I probably shouldn't, but it was nice to pop into remote mode and see what was going on there.
Speaker 1 12:06 - 12:37
I try not to do that too much. But I had this idea around, hey, could we do a spike (I say spike a lot with these models) on, what if you could actually modify the software from within itself? I built both a React Native version and this version, which is just the web version. I already had a chat-type thing where you could ask Claude to add things by URL. I want every piece of software to have this, where I never have to navigate a menu to do anything ever again.
Speaker 1 12:37 - 12:44
And this is, in many ways, Dan, me trying to distill agent-native architectures to their fullest degree, which is,
Speaker 2 12:44 - 12:44
Yeah.
Speaker 1 12:44 - 13:01
Also have the agent be able to modify the app. So maybe phase one of agent-native architecture is that every single thing in this product is accessible from the agent and has tool calls, etcetera. That's hopefully becoming table stakes. It's sadly not in a lot of software, and it's great. Because I was like, what's that like?
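"Phase one" as described here, where every product action is reachable by the agent as a tool call, can be illustrated with a tiny registry. This is a minimal sketch, assuming a hypothetical media-tracker with two actions; `ToolRegistry`, `add_item`, and `list_items` are invented for the example and do not come from the app shown in the episode or from any real SDK.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Maps tool names to the same functions the UI would call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def tool(self, name: str):
        # Decorator that exposes a product action to the agent under `name`.
        def register(fn):
            self._tools[name] = fn
            return fn
        return register

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs):
        return self._tools[name](**kwargs)


registry = ToolRegistry()
library: List[dict] = []  # in-memory stand-in for the tracker's storage


@registry.tool("add_item")
def add_item(title: str, kind: str) -> dict:
    item = {"title": title, "kind": kind}
    library.append(item)
    return item


@registry.tool("list_items")
def list_items() -> List[dict]:
    return list(library)


# An agent that decided to add a show would issue a call like this one.
registry.call("add_item", title="Example Show", kind="tv")
```

Because the agent and the UI go through the same functions, anything a user can do through a menu is also something they can simply ask for.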
Speaker 1 13:01 - 13:30
Because somebody had recommended a Brazilian show about radioactive stuff in Goiânia. I did not remember what it was called, and Claude was able to figure it out. It was so much better than trying to figure that out intuitively. But the next step I was interested in is: what would it mean to actually be able to modify the software from itself, on the go? So if you long-press this little chat thing... what I built, what Claude built, is a way where it uses our managed agents to basically take on edit requests, and then you can preview them.
Speaker 1 13:30 - 13:44
And I used the Vercel live preview thing here. This whole feature was also a one-shot, which was really cool. Oh, yeah. And then I just added to it over time. It actually does a little diff view if you want it.
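The edit-request flow described here (propose a change, preview it, look at a diff, then accept) can be sketched with the standard library. This is not the app's actual implementation and does not use the managed agents or Vercel previews mentioned above; `EditRequest`, the file name, and the CSS snippet are hypothetical, and only `difflib.unified_diff` is a real library call.

```python
import difflib
from dataclasses import dataclass


@dataclass
class EditRequest:
    """A proposed change to one file, reviewable before it is applied."""

    path: str
    before: str
    after: str
    status: str = "proposed"

    def diff(self) -> str:
        # Unified diff of the current contents against the proposed contents.
        lines = difflib.unified_diff(
            self.before.splitlines(keepends=True),
            self.after.splitlines(keepends=True),
            fromfile=f"a/{self.path}",
            tofile=f"b/{self.path}",
        )
        return "".join(lines)

    def accept(self) -> str:
        # Mark the request accepted and return the new file contents.
        self.status = "accepted"
        return self.after


request = EditRequest(
    path="theme.css",  # hypothetical file
    before=".card { padding: 8px; }\n",
    after=".card { padding: 24px; }\n",
)
preview = request.diff()
```

Keeping the request in a "proposed" state until `accept()` is called is what makes the preview step safe: nothing changes in the running app until the user has seen the diff, even if they usually skip reading it.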
Speaker 1 13:44 - 14:08
You can go into the managed agent conversation and see what it did. Although I almost never do, because, again, I don't particularly care about the code quality or the long-term maintainability of this software. You can see that it had a session in here too. But it's been really fun. I'll be using it on the go and say, you know, I had a feature request the other day: the floating action button was too low on native iOS, but it was okay on there.
Speaker 1 14:08 - 14:26
Like, can you go fix it? It did it. It was really fun with some of the Expo tooling now. It actually live-reloaded on my phone, which was also a really cool kind of feeling. But, you know, does this thing need to be a production-level thing that's gonna go to a million users?
Speaker 1 14:26 - 14:48
No. But it felt really good to have something where I felt like it didn't have to stop at just the weekend, and I could keep working on it just by using it and having this kind of end-to-end closed loop. So I felt like this was a good manifestation of Fable's building ability, but also of something I think both you and I have been thinking about: how does Claude embed itself into software, beyond just the usage side
Speaker 2 14:48 - 14:49
of
Speaker 1 14:48 - 14:49
things.
Speaker 2 14:49 - 15:12
This is really cool. I want people to understand: you could have built something like this, maybe not the self-modifying part, for ten or twenty years or something like that. But the cost to build has gotten dramatically lower. So think about how much it would have cost to do this in the Instagram days versus now.
Speaker 2 15:12 - 15:14
Can you help us understand how that has changed?
Speaker 1 15:16 - 15:48
Yeah. I think about this a lot when I think back to that time, because I thought of myself as a very productive programmer in the early Instagram days. I was really into mobile development, and we had good clarity on things. And the gap from idea to fully realized version of some complete product was still, you know, four-ish days of my all-nighters. My natural state is up till four, sleep until noon, which is not conducive to family life, so I've had to shift. But that was my building thing.
Speaker 1 15:48 - 16:49
Yeah, I call it Instagram v1, which probably had more features than this thing does, but not by an order of magnitude. It was five days of all-nighters, me working on the front end and back end and Kevin working on the initial filters, to get that out. And this was also built on the many years I'd already been working on iOS pieces. And then the iteration: I think a lot about what we were gated on after that launch, when things went well. We had all these ideas for where to take it, but we were just trying to keep the site up, or just trying to add the one incremental feature. Hashtags take a week to build, but then there are all the things you wanna continue doing on it as well. So I think it's both that shortening of time, though there's still the time required for the idea and the concept and the iteration, and the other piece, which is that you can then iterate on what you have in a really fun, very in-the-flow kind of way.
Speaker 1 16:50 - 17:31
And that's me as a professional software engineer and startup founder. Beyond that, if you had an idea, and I saw multiple people go through this, it was like, well, I'll try to find maybe a consultancy that will take this on. But it's a really lossy process of conveying what I want. Or they're gonna raise money for it. And the thing that I think is the most exciting part about these models getting not just more autonomous but, again, closing that gap between intent and execution, is what I've seen it do to people's ability to build who are not builders.
Speaker 1 17:31 - 18:09
And the trajectory of these models has been that something like Fable is in this general Mythos class of models, and eventually models that are cheaper and more accessible to other folks become available too. As that process happens, I just think it's opening up so many... like, I got a ping the other day (I get very excited about this stuff, if you can't tell) from somebody internally. We had built them an internal tool that kind of combined Fable and access to some internal MCPs. And she said, it's the first time in my life... and she works in recruiting.
Speaker 117:31 - 18:09
这些 models(模型)的发展轨迹一直是这样的:如果某个具备这类能力、属于这种大致 mythos class 的东西已经进入了这一类模型的水平,那么最终,那些更便宜、也更容易被其他人获得的模型也会陆续出现。随着这个过程发生,我真的觉得,它正在打开非常多的可能性。前几天我收到一条 ping(消息),如果你还没看出来的话,我对这些东西真的很兴奋。消息来自公司内部的人。我们给她做了一个内部工具,大致是把 Fable 和一些内部 MCPs 的访问能力结合在一起。她说,这是她人生中第一次——而她是做 recruiting(招聘)的。
Speaker 118:09 - 18:41
She's like, the first time in life where, like, I feel like the thing that's in my head and the thing that exists in the world is now, like, they're right next to each other. Like, I can just do it. And, it was, like, very like, a meaningful moment to her because prior to that, like I mean, I remember these days these days were five years ago or four years ago where that person, if they wanted a tool, would have to either make do or try to get an internal tools engineer that probably was overloaded with 50 other, you know, requirements. But instead, now they, like, are just having the time of their lives building. And I think that is I think that's cause for a lot of, like, hope.
Speaker 118:09 - 18:41
她说,这是她人生中第一次让我觉得,我脑子里的那个东西,和现实世界里真正存在的那个东西,现在几乎就是挨在一起的。就好像,我可以直接把它做出来。这对她来说是一个非常有意义的时刻,因为在那之前——我是说,我还记得那种日子,其实也就是五年前或四年前——像她这样的人,如果想要一个工具,要么只能将就着用,要么就得去找一个做 internal tools(内部工具)的 engineer(工程师),而那个人大概率已经被另外 50 个需求压得喘不过气来。但现在,他们却能尽情享受构建东西的过程。我觉得,这是一件非常值得抱有希望的事。
Speaker 118:41 - 18:52
I do think that human capacity for creativity and what's possible is enormous. And I think, like, at our best, we are basically expanding the number of people who can then see that through to something that feels real.
Speaker 118:41 - 18:52
我确实认为,人类的创造力,以及对“什么是可能的”的想象空间,是巨大的。我觉得,在我们状态最好的时候,我们本质上是在扩大这样一群人的数量:让更多人能够真正把脑海中的东西落实成某种感觉真实存在的东西。
Speaker 218:52 - 19:04
I totally agree, but I do think that there's a question in the back of my mind, I think it's probably gonna be in the back of the minds of some of the people listening. So I wanna ask you, given everything you just said, is software engineering over?
Speaker 218:52 - 19:04
我完全同意,但我确实觉得,我心里一直有个问题,而且我想这大概也是一些听众心里的问题。所以我想问你:结合你刚才说的这一切,software engineering(软件工程)是不是要结束了?
Speaker 119:04 - 19:16
Yeah. I think software engineering is different. It is like dramatically changed. And as I I as I probably would have defined it if you had asked me around the Instagram time, like, what is software engineering? I'd probably say, alright.
Speaker 119:04 - 19:16
对。我觉得 software engineering 确实不一样了。它就像发生了巨大变化。如果你在 Instagram 那个时期问我,什么是 software engineering,我大概会说,好吧。
Speaker 119:16 - 19:32
Like, thinking through the hard problems and, like, thinking about an architecture, then, like, spending a lot of time in, you know, like, TextMate. I don't know what that can really be. Like, you know, like, text editor, you're gonna edit those things or Xcode, you know, and watching RailsCasts. You know? Yeah.
Speaker 119:16 - 19:32
就是要去思考那些难题,思考 architecture(架构),然后花很多时间待在,比如 TextMate 里。我也不知道现在那到底该算什么。就是,你知道,text editor(文本编辑器),你要在里面改这些东西,或者用 Xcode,还要看 RailsCasts。对吧?是的。
Speaker 119:32 - 19:47
Exactly. Right. Exactly. And understanding the intricacies of Django's, like, ORM layer, and then, like, 15 bugs after you deploy it. Like, so much of that is radically different and collapsing into other parts of, like, product management.
Speaker 119:32 - 19:47
没错。对,完全是这样。还要理解 Django 的 ORM layer(ORM 层)里的各种细节,然后部署之后再冒出 15 个 bug。这里面有太多东西已经彻底不同了,并且正在并入 product management(产品管理)的其他部分。
Speaker 119:47 - 20:14
And I think that sort of, like, PM-eng split, I think you guys see it even on our teams has become much more diffuse. That's radically changed. But I think the overall, like, maybe zoom out from software engineering and think about like software production or, you know, software development, but not in like just the pure developer case. I think that is, like, alive and well and and and essential still. So I think that it that is the moment that I feel like we are in.
Speaker 119:47 - 20:14
而且我觉得那种 PM 端与工程端的分野,我想你们甚至在我们的团队里也能看到,它已经变得模糊得多了。这变化非常大。不过我认为,从 software engineering 稍微拉远一点看,去想 software production(软件生产)或者 software development(软件开发)——但不只是那种纯 developer(开发者)视角——我觉得这件事依然生机勃勃,而且仍然非常必要。所以我觉得,这就是我们当下所处的时刻。
Speaker 120:14 - 20:38
I think Fable is another step on the direction of and I'm not gonna call it the final step. Of course, a lot will still happen. But, like, I think a pretty significant step in terms of, like, the trust, at least I end up placing in the model in terms of its capacity to see things through and even, you know, architect things reasonably is quite high. So that part feels like it is it is not ever gonna be done, but it is pretty pretty done. Right?
Speaker 120:14 - 20:38
我觉得 Fable 是朝这个方向迈出的又一步,我不会说它就是最后一步。当然,后面还会发生很多事。但我认为,就 trust(信任)而言,至少对我来说,我会放在 model(模型)身上的信任——相信它有能力把事情做到底,甚至能相当合理地做 architecture(架构)——已经相当高了。所以那一部分给我的感觉是,它永远不会真正“完成”,但也已经完成得相当相当多了。对吧?
Speaker 120:38 - 20:51
Like, it it's gone really far. But I think that the overall sort of craft of the what needs do you have? Like, what are you putting out? Like, is it actually good? I think still a very human endeavor.
Speaker 120:38 - 20:51
就像,它已经走得非常远了。但我觉得,整体上那种 craft(技艺)——你到底有什么需求?你到底在产出什么?它实际上够不够好?——我认为仍然是一项非常 human(人类)的事业。
Speaker 120:51 - 21:31
But I also sort of can see that that is not a transition that is pain free in a way. Like, I think there are plenty of people who love the craft of, like, actually putting I used to love stuff like, I solved that problem so elegantly. You would dream about code, and if you've had that experience of, like, you dream about the thing that you're working on, then, like, wake up in the morning and be like, I figured out how to solve this thing really elegantly. And and that for sure has has has passed, I think that there's, you know, there there's there is a feeling of loss, I think, in some of the, like, better engineers that I talked to, as well as the feeling of, oh my god, but I can do insane amounts of work now at the same time. So we're holding both ideas in our heads at once, I guess.
Speaker 120:51 - 21:31
但我也能看出来,这种转变并不是一种毫无痛感的过渡。因为我觉得,有很多人真心热爱那种“亲手把东西做出来”的 craft(技艺)。我自己以前就很喜欢那种感觉:这个问题我解得太优雅了。你会梦到代码;如果你也有过那种经历——你会梦到自己正在做的东西,然后早上醒来时想,我想到了一个特别优雅的解法——那种体验我觉得确实已经过去了。我想,在一些我交谈过的、更优秀的工程师身上,确实能感到一种失落;但与此同时,也有一种“天哪,我现在居然能同时完成海量工作”的感觉。所以我猜,我们现在是在脑子里同时容纳这两种想法。
Speaker 221:31 - 21:46
Which I think is the most important part of this. Like, it's normal to feel sadness for that kind of thing and excitement. But I'm curious, let's just take the thesis of software engineering is alive and well. What does that actually look like inside of Anthropic?
Speaker 221:31 - 21:46
而我觉得,这才是这里面最重要的部分。为这种事情既感到难过、又感到兴奋,是很正常的。不过我很好奇,我们就先接受“software engineering 依然活得很好”这个论点:那在 Anthropic 内部,这具体意味着什么样子?
Speaker 121:47 - 22:06
Yeah. I think there's there's a few thesis. I think there's still the the crafting of well, I gotta take it off from, the full software development cycle or, like, maybe what I see on a day to day, maybe I'll do a little bit of both. But I think there's still a lot of, you know, we all got together. We we talked about the next way we want to, you know, evolve Cowork.
Speaker 121:47 - 22:06
对,我觉得这里有几个 thesis(核心判断)。我想还是有那种对——嗯——整个 software development cycle(软件开发周期)的梳理,或者说,也可以从我每天实际看到的情况来讲,也许我会两者都说一点。不过我觉得,仍然有很多事情是这样的:大家会聚在一起,讨论我们下一步想要如何演进 Cowork。
Speaker 122:07 - 22:19
And now we've kind of broken it down into areas of ownership. I think that ends up still being quite important because there is still context that you hold as a person that is sort of beyond Claude. Right? Like, what is the actual intent of this product? How's it going?
Speaker 122:07 - 22:19
现在我们已经把它大致拆分成了不同的 ownership(负责)领域。我觉得这最终仍然相当重要,因为作为一个人,你所掌握的 context(上下文)仍然有一部分是 Claude 无法替代的。对吧?这个产品真正的 intent(意图)是什么?它的进展怎么样?
Speaker 122:19 - 23:00
What do we need to know about other products that are coming down the pipeline that are going to be integrated in some interesting way? So I think that aspect is really important still. And so, you know, though we have many Claudes to each human, each human, at least the way we've been working on, still kind of has, you know, we call them DRIs, directly responsible individuals, still has like a DRI-ship over some part of the product or some area. I think that'll be the case for a while because I think there is value in not just this distributed, like, we should all make it better, but instead, like, alright. I'm thinking through how Cowork does this particular task, and there's still a lot of, you know, the we try to keep meetings minimal, but they they still emerge and you still have these kind of alignment conversations.
Speaker 122:19 - 23:00
我们还需要了解哪些正在 pipeline(流程管线)中推进、并且未来会以某种有趣方式整合进来的其他产品?所以我觉得,这一部分仍然非常重要。因此,虽然现在几乎是每个人类都配有很多个 Claude,但至少按照我们目前的工作方式,每个人还是会——我们称之为 DRI,directly responsible individuals(直接责任人)——仍然会对产品的某一部分或某个领域保有某种 DRI 责任。我觉得这种情况还会持续一段时间,因为我认为价值并不只在于那种分散式的“大家一起把它做得更好”,而更在于“好,我正在具体思考 Cowork 应该如何完成这个特定任务”。而且仍然有很多——我们会尽量把 meeting(会议)压到最少,但它们还是会出现,你还是会有这类 alignment(对齐)讨论。
Speaker 123:01 - 23:41
Then, like, a lot of that sort of asynchronous delegation. I think what many engineers here have now found is they've they've all built and I think we solved this at some point at, like, a broader product level. But they've all built some version of, alright. I'm gonna now, like, create a dashboard of what all my Claudes are doing and what's waiting for me and which pull requests, like, need my attention because, you know, either a human or a Claude code reviewer got back to me. So there's a lot of that sort of meta maintenance of the of the work that I think again, I think we'll standardize some, but I think some of it will always be a little bit bespoke to the way each individual likes to work just in the way that people organize their windows, now they organize their work.
Speaker 123:01 - 23:41
另外,还有很多这种 asynchronous(异步)的委派工作。我觉得这里很多 engineer(工程师)现在都发现,他们各自都搭建了某种版本的工具:好,我现在要创建一个 dashboard(仪表盘),看我所有 Claude 都在做什么、哪些在等我处理、哪些 pull requests(拉取请求)需要我关注,因为无论是人类还是 Claude code reviewer(代码审查者)都可能已经给了我反馈。所以,这里面有很多对工作的 meta maintenance(元维护)工作。我想,还是那句话,其中一部分我们之后会标准化,但另一部分可能永远都会有些 bespoke(定制化),取决于每个人喜欢怎样工作,就像人们整理窗口的方式不同,现在他们整理工作的方式也不同。
Speaker 123:42 - 24:08
And then there is, I think, also the understanding how things work in production. And I think that is another, like there's a few, like, next frontiers, I think, for the models. And I think one of them that Fable does, you know, make significant strides in, but I think there's there's more work needed here is understanding what happens to code after it gets deployed, you know. Because there's incidents. There's, you know, this was all working well, but, like, this network link got cut, which is not in your usual failure mode.
Speaker 123:42 - 24:08
还有一点,我觉得也很重要,就是理解系统在 production(生产环境)里是如何运作的。我认为这也是 models(模型)接下来的几个 frontier(前沿)之一。Fable 在这方面确实已经取得了显著进展,但我觉得这里仍然需要更多工作,那就是理解代码在 deployment(部署)之后会发生什么。因为会有 incident(故障事件)。也会有这种情况:一切本来都运行得很好,但某条 network link(网络链路)断了,而这并不属于你通常设想的 failure mode(故障模式)。
Speaker 124:08 - 24:44
And, like, it manifested, like, so much of Instagram. Like, 2012 to 2016 was, like, dealing with that and scaling things up. And so that role of the engineer still remains really key, and I think getting the the reps in around incident response and understanding how to stay calm, gather data, like, remediate what's immediate, but then, like, go off and and and work on on on longer term fixes, like, still a necessary part of it. And and I'm trying to think if there's any, like, other pieces that are that are notable as well. I think what's maybe the last the last thing to say is I really like the role that the engineering prototype now plays.
Speaker 124:08 - 24:44
而且,像 Instagram 在 2012 到 2016 年间,很大一部分工作其实都在处理这类问题,以及如何把系统 scale up(扩展)起来。所以,engineer(工程师)的这个角色依然非常关键。我认为,围绕 incident response(故障响应)去积累 reps(实战训练),学会如何保持冷静、收集数据、先修复眼前最紧急的问题,然后再抽身去做更长期的修复,这些仍然都是必不可少的一部分。然后我想想,还有没有其他值得一提的点。我觉得最后也许还可以说的一点是:我现在非常喜欢 engineering prototype(工程原型)所扮演的角色。
Speaker 124:44 - 25:13
You have to be clear when it's a prototype versus not. But, you know, the old phrase was like, code wins arguments, and I never, like, loved that because, like, kind sort of the person that could code could go do it, but actually, like, why should they necessarily win an argument by by by default? But actually, it's been really cool now where sometimes we will have some disagreement or some sort of debate about where to take a product. And often, it's the PM that will say, alright. I just tried it and, like, jank in, like, these eight ways.
Speaker 124:44 - 25:13
你必须明确什么时候它是 prototype(原型),什么时候不是。不过,以前有句老话叫“code wins arguments(代码赢得争论)”,我一直不算特别喜欢这句话,因为它某种程度上意味着:那个会写代码的人就可以直接去做,但为什么他就一定该默认赢下这场争论呢?不过现在很有意思的是,有时候我们会对产品该往哪里走产生分歧或争论,而经常是 PM(产品经理)会说:好,我刚刚试了一下,它在这八个地方都很 jank(粗糙/别扭)。
Speaker 125:13 - 25:35
But look, it actually shows, like, how this could work, and that that can open up some some interesting pieces of conversation. So almost all of that is quite different than it was six months ago. I think especially at the level of parallelism and the level of need for these kind of higher order abstractions of work. But I think what hasn't changed is that ownership.
Speaker 125:13 - 25:35
但是你看,它确实展示了这件事可能是怎么运作的,而这就能打开一些很有意思的讨论空间。所以,几乎所有这些方面都和六个月前非常不同了。我觉得尤其是在 parallelism(并行度)的层面,以及对这类更高阶的 work abstractions(工作抽象)的需求层面上,变化特别明显。但我认为没有变的是 ownership(责任归属)。
Speaker 225:35 - 25:58
Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust parameters, and everything looks fine in testing, so you merge. And then three days later or even sooner, the support tickets start rolling in. The AI is giving your customers unexpected answers, you have no idea when it happened or why. Braintrust is the AI observability platform that fixes this.
Speaker 225:35 - 25:58
我们很多人都在把 AI 部署到生产环境,这对提升生产力当然很好,但同时也会带来焦虑。你改了一个 prompt(提示词),换了 model(模型),调了参数,测试里看起来一切正常,于是就合并上线了。可三天后,甚至更早,support ticket(支持工单)就开始不断涌来。AI 给客户提供了出乎意料的回答,而你完全不知道这是从什么时候开始的,也不知道为什么会这样。Braintrust 就是解决这个问题的 AI observability(可观测性)平台。
Speaker 225:58 - 26:17
It connects evals and observability in one workflow. That way, you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path. Evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets.
Speaker 225:58 - 26:17
它把 evals(评估)和 observability(可观测性)连接到同一个工作流中。这样一来,你既能看到 production(生产环境)里实际发生了什么,也能衡量改动到底是让结果变好了还是变差了。traces(追踪)会展示完整的执行路径。evals 定义“什么才算好”,而 experiments(实验)让你在发布前并排比较 prompts(提示词)和 models(模型)。production traces 会直接进入你的 eval datasets(评估数据集)。
Speaker 226:17 - 26:38
Every failure becomes a test case. You catch regressions in CI before they reach users, and teams at Notion, Stripe, Zapier, Vercel, and Ramp use it to ship quality AI at scale. Braintrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack. They have SDKs for Python, TypeScript, Go, Ruby, C#.
Speaker 226:17 - 26:38
每一次失败都会变成一个 test case(测试用例)。你可以在 CI(持续集成)里,在问题触达用户之前就捕获 regression(回归问题);Notion、Stripe、Zapier、Vercel 和 Ramp 的团队都在用它,以大规模交付高质量 AI。Braintrust 是为那些构建生产级 AI 系统的团队设计的,在这类系统里,静默发生的 regression 代价很高。它适用于任何技术栈。他们为 Python、TypeScript、Go、Ruby、C# 提供了 SDK。
Speaker 226:38 - 26:51
There's no framework lock-in or vendor dependencies. It's SOC 2 Type II certified, and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev. And now back to the episode.
Speaker 226:38 - 26:51
没有 framework lock-in(框架绑定)或 vendor dependency(供应商依赖)。它通过了 SOC 2 Type II 认证,并且符合 GDPR 和 HIPAA 要求。可前往 braintrust.dev 开始使用。再说一遍,braintrust.dev。现在回到本期节目。
Speaker 226:51 - 27:13
Fable is also very expensive. And because of that, well, like, when I was testing it, I felt kinda like I was a kid in a candy shop, I was just like, I'll do this, and I'll do this, and I'll do that. But now that there's gonna be a bill, I'm gonna be thinking about it. Because I have to pause before I do it to be like, is this gonna cost me $100 or whatever? And I do think that's gonna limit who gets to use it and for what.
Speaker 226:51 - 27:13
Fable 也非常贵。正因为如此,嗯,就像,我在测试它的时候,感觉有点像个进了糖果店的小孩,我会想,这个也试试,那个也试试,那个也来一下。但现在既然会真的出账单了,我就会开始考虑这件事了。因为我每次动手前都得先停一下,想想看:这会不会花掉我 100 美元之类的?而且我确实觉得,这会限制谁能用它,以及他们会拿它来做什么。
Speaker 227:14 - 27:15
So how do you think about that?
Speaker 227:14 - 27:15
那你怎么看这个问题?
Speaker 127:15 - 27:33
Yeah. I think it's most clear cut on the sort of professional software, you know, sort of classic company doing work. It'll be really interesting. It's like, you know, a lot of process that goes into to pricing as well. There's like it's both more expensive than Opus.
Speaker 127:15 - 27:33
嗯。我觉得最清晰的一类,还是那种专业软件场景,就是那种典型的公司在开展工作。这会非常有意思。就像,你知道的,定价本身也有很多流程和考量在里面。它确实也比 Opus 更贵。
Speaker 127:33 - 27:57
And then also, I'm like, in many ways, it's really cheap. If you think about, you know, like, how much incredible work it's doing. But, of course, like, everybody has their own economics around what they're what they're what they're working with. So anyway, most clear cut, I think, from most sort of software teams. And I think as an industry, if, like, phase one was companies even struggling to get some of their employees to adopt AI coding, which models were early, maybe the tooling wasn't there.
Speaker 127:33 - 27:57
但同时,我又觉得,从很多方面看,它其实非常便宜。如果你想想看,它正在完成的是多么惊人的工作。当然,每个人对于自己手头在做的事,都有各自不同的经济账要算。所以,总之,我觉得对大多数软件团队来说,这一点是最清楚的。我也觉得,作为一个行业,如果说第一阶段是公司甚至还在努力让一部分员工开始采用 AI coding(AI 编程),因为那时 model 还早期,tooling(工具链)可能也还没到位。
Speaker 127:57 - 28:35
And then phase two was, great. We'll create leaderboards and see who can use the most, which, you know, as you can imagine, creates, like, some, like, also, like, not ideal incentives to phase three where people were like, okay. Now we're just trying to figure out who's using it effectively and, like, letting them spend as much as possible, having a a a clear process for that, but making sure we're not doing things wastefully, which I think to me, in general, makes sense. Although, I think you could, like, also over rotate that way too. I think something of Fable class should hopefully fit in well into that, if you're demonstrating results and you're getting use out of the model, then that hopefully there's a flywheel even inside companies where that goes and and and and perpetuates that.
Speaker 127:57 - 28:35
那第二阶段就是:很好,我们来做排行榜,看看谁用得最多——而这点你也能想象,会带来一些同样不太理想的激励。再到第三阶段,人们开始说:好吧,我们现在只是想弄清楚,谁是在有效地使用它,并让他们尽可能多地花费,同时建立一个清晰的流程来管理这件事,但也要确保我们没有在浪费资源。我觉得总体上这很合理。不过,我也认为,你也可能会在这个方向上走得过头。像 Fable 这一类的东西,理想情况下应该能很好地融入这种阶段:如果你能证明结果,而且你确实从这个 model 中获得了价值,那么公司内部就有希望形成一个 flywheel(飞轮效应),让这件事不断推进,并持续自我强化。
Speaker 128:35 - 28:47
I think on the personal use side, it's a really good one. That's a really good question. I think where I've seen it, you know, even in my personal testing, because, you know, our personal accounts pay, which is funny. I like paying my own company I work at.
Speaker 128:35 - 28:47
说到个人使用这一侧,这是个非常好的问题。我觉得我看到的情况是,甚至在我自己的测试里也是这样,因为,嗯,我们的个人账户也是要自己付费的,这还挺好笑的。我喜欢给我自己工作的公司付钱。
Speaker 128:47 - 29:19
But but, you know, you you do become more more thoughtful about it. Something that was interesting was this the app that I built over the weekend actually fit in with, like, only a bit of extra usage. So it wasn't like a, you know, thousands of dollars to build this thing that, like, is a personal thing to to myself. But it was also spaced out a little bit more. Probably the the in between of that, what we'll probably have to do the most thinking about is the sort of hobbyist or, like, independent who's, like, not, you know, within the larger company, but also is thoughtful about about the pricing as well.
Speaker 128:47 - 29:19
但是,不过,你知道,你确实会对此变得更谨慎一些。一个有意思的点是,我周末做的那个 app,实际上只多用了一点点用量就能完成。所以这并不是那种要花上几千美元去做的东西——而它本质上只是一个给我自己用的个人项目。不过它的使用也稍微拉得更开一些。大概介于这两者之间的那类人,才是我们可能最需要认真思考的对象:也就是那种 hobbyist(爱好者)或者 independent(独立开发者)——他们不在大公司体系内,但同时也会认真考虑 pricing(定价)问题。
Speaker 129:19 - 29:49
I think, like, my overall advice is, like, just give it a try and see how much it can do without you having to then do a lot of follow ups. And it's like I think measuring cost has gotten so multifaceted now because there's the per turn cost. And then there's, like, what did it cost you not to just do the task, but, like, complete the task to your satisfaction? And I think that's where Fable has really shined for me, which is it actually just does it right so that I don't have to go spend the, like, nine, ten subsequent turns be like, no. That was not quite what I meant.
Speaker 129:19 - 29:49
我觉得,我整体上的建议就是:先试试看,看看它在不需要你再做大量 follow-up(后续追问)的情况下,到底能做到多少。我觉得现在衡量 cost(成本)已经变得非常多维了,因为有每轮对话的成本;然后还有一个问题是:它带给你的成本,不只是“完成这个任务”花了多少,而是“把这个任务完成到你满意”到底花了多少。我觉得这正是 Fable 对我来说真正出彩的地方:它实际上第一次就把事情做对了,这样我就不用再花后面九轮、十轮对话去说,不,这还不完全是我的意思。
Speaker 129:49 - 29:51
Like, can you also do this piece?
Speaker 129:49 - 29:51
比如,你还能顺便把这部分也做了吗?
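Editor's sketch of the cost point above: the per-turn price and the cost of getting a task done to your satisfaction can point in opposite directions. The dollar amounts and turn counts below are invented purely for illustration and are not real pricing for Fable, Opus, or any other model; the `Attempt` class and `cost_to_satisfaction` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One way of getting a task done. Every number here is made up."""
    label: str
    cost_per_turn_usd: float    # hypothetical average price of a single turn
    turns_to_satisfaction: int  # first request plus any follow-up corrections


def cost_to_satisfaction(attempt: Attempt) -> float:
    """Total spend until the result is acceptable, not just the first turn."""
    return attempt.cost_per_turn_usd * attempt.turns_to_satisfaction


# Assumed scenario: a pricier model that gets it right in one turn,
# versus a cheaper model that needs nine follow-up corrections.
pricier_model = Attempt("pricier model, right first time", 4.00, 1)
cheaper_model = Attempt("cheaper model, nine follow-ups", 0.80, 10)

for attempt in (pricier_model, cheaper_model):
    print(f"{attempt.label}: ${cost_to_satisfaction(attempt):.2f}")
```

Under these assumed numbers the model with the higher per-turn price is the cheaper one overall. With different numbers the comparison can flip, which is why the advice in the conversation is to try it and measure.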
Speaker 229:51 - 30:06
It's been really impressive for me because you ask it to go do something, and then it just does it does a thing, and you're like, wow. You thought through all the little details of this thing in a way that I've never seen another model do. I don't know how much you can reveal about the training process, but what makes the model different?
Speaker 229:51 - 30:06
这点对我来说真的非常令人印象深刻,因为你让它去做一件事,然后它就真的去做了,而且是真的把事情做出来了。你会觉得,哇,你把这件事里所有细小的细节都考虑到了,而这是我以前从没在其他 model(模型)上见过的。我不知道你能透露多少 training process(训练过程)的内容,但这个 model 到底有什么不同?
Speaker 130:06 - 30:42
I mean, I think, in many ways, a continuation of a lot of the work that the team has done, and I, like, bow down in total awe of our of our teams, both, you know, on the pretraining and on the RL side. I think that the the piece that it has evolved in that, at least I noticed the most, is kind of adjacent to that as well, which is a sense of the system more than just the individual piece of the work. Like, I will often be very positively surprised when it will write something and say, alright. But, you know, I know that, like, in production, this needs to be different. Like and then it will keep bugging you.
Speaker 130:06 - 30:42
我的意思是,我觉得从很多方面看,这都是团队过去大量工作的延续,而我对我们的团队——无论是做 pretraining(预训练)的,还是做 RL(强化学习)的——都怀着完全敬佩的心情。我觉得它在这方面演进出的那一部分,至少是我最明显感受到的,其实也和那个方向相邻:它会有一种对整个 system(系统)的理解,而不只是盯着某个单独的工作片段。比如,它经常会写出一些东西,然后说,好吧,但是我知道,如果是在 production(生产环境)里,这里需要不一样。然后它还会一直提醒你。
Speaker 130:42 - 30:54
Like, have you turned on that, like, feature flag yet? Like, it's not gonna work until you do. And, you know, sometimes it'd be in sessions that could have gone on for days and be like, look. You still haven't done that thing. Like, you better you know, like I was like, you're right.
Speaker 130:42 - 30:54
比如,它会说,你那个 feature flag(功能开关)开了吗?不开的话这东西是跑不起来的。然后,你知道,有时候一个 session(会话)已经持续了好几天,它还会说,你看,你到现在还是没做那件事。你最好还是——你知道——我当时就会想,你说得对。
Speaker 130:54 - 31:16
Like, I didn't turn on that feature flag. I should go off and do that. Or if we change this, the contract will change over there. Or watching it actually, one of my favorite times of seeing it in action, I think where it demonstrates some of the some of the training is watching it respond to code review feedback either from people or from from other Claude reviewers, where it doesn't just say, oh, yeah. That's an issue.
Speaker 130:54 - 31:16
比如,我确实没打开那个 feature flag,我应该现在就去把它打开。或者,如果我们改了这个,那边的 contract(契约/接口约定)也会跟着变。再或者,看它实际处理 code review(代码审查)反馈的时候,也是我最喜欢看它发挥作用的场景之一;我觉得这特别能体现出它的一部分训练成果。无论反馈是来自人,还是来自其他 Claude reviewer(Claude 审查者),它都不会只是说,哦,对,这是个问题。
Speaker 131:16 - 31:34
I'm gonna go fix it. And actually be really thoughtful around, hey. Like, for this level of, like, sort of fidelity of what we're building, I'm gonna accept this risk. Or I see what you mean, other code reviewer, which is often just another Fable model, like, too. Like, I see what you mean, but, like, I'm actually gonna push back.
Speaker 131:16 - 31:34
我要去把它修好。而且还要非常认真地思考:嘿,对于我们正在构建的东西达到这种 fidelity(保真度/精确度)水平,我愿意接受这个风险。或者我明白你的意思,另一位 code reviewer(代码审查者)——而且很多时候其实也只是另一个 Fable model——我懂你的意思,但我这次其实还是会反驳、会顶回去。
Speaker 131:34 - 31:49
I don't I think that that's actually not right. And I think getting the model to have that judgment is really important. And I think if I had to pinpoint, like, an area where I feel like it's really progressed, it is that sort of not just immediate knee jerk. Yeah. Yeah.
Speaker 131:34 - 31:49
我不——我觉得那其实不对。我认为让 model(模型)具备这种判断力非常重要。如果一定要我指出一个我觉得它进步特别大的地方,那就是它不再只是那种立刻的、条件反射式的反应。对,对。
Speaker 131:49 - 31:55
That's right. I gotta go fix it and more, I'll think about that for a minute. No. I thought about it and I still disagree. You know?
Speaker 131:49 - 31:55
没错。不只是“我得去把它修好”,而是“我会先花一分钟想想这件事”。不,我想过了,但我还是不同意。你知道吧?
Speaker 131:55 - 32:29
And I think that's a very useful sort of ability. It's so valuable to have products like Claude Code out there because you have now like a living, breathing thing where people are like, this is where the model is doing well. And like, you know, we have like people who test it. I count the Every folks as, like, very, very high on the list where, like, we really trust the feedback because it is being put through its paces in, like, repeated multi day, you know, hard tasks. And that also, like, very much feeds into how we think about, like, what do we need to improve on the next slide?
Speaker 131:55 - 32:29
我觉得这是一种非常有用的能力。像 Claude Code 这样的产品存在于外面,价值非常大,因为现在你有了一个活生生、持续运转的东西,人们会说:这是 model 做得好的地方。而且,你知道,我们也有一些人在测试它。我会把 Every 的各位算作名单里非常非常靠前的一类——我们真的很信任他们的反馈,因为他们是在反复、连续多天的高强度任务中真正把它拉满来测试的。这也会非常直接地影响我们怎么思考:下一页、下一阶段我们到底需要改进什么?
Speaker 132:29 - 32:33
Like, what are the tasks that we need to specifically think about the model being better at?
Speaker 132:29 - 32:33
比如说,哪些任务是我们需要专门去考虑、让 model 做得更好的?
Speaker 232:33 - 32:45
Is chat the right interface for this model? Because it's not very turn by turn. It's it's very like I'm delegating something for you. So how does that change how you should use it or how you think about the interface?
Speaker 232:33 - 32:45
chat(聊天)是这种 model 的正确 interface(界面)吗?因为它并不是那种很一来一回、逐轮推进的模式。它更像是“我把一件事委托给你去做”。所以这会怎样改变你使用它的方式,或者你对这个 interface 的理解?
Speaker 132:45 - 33:05
I don't think, like, the fundamental, like, you are, like, sending messages and it is giving your message back is, like, totally wrong. I think that there's ways we need to evolve. But, like, one is maybe, like, three that come to mind. Like, one is, is your laptop the right place for it? So, like, I think that's number one where I mentioned with the side project I was working on how useful it was to have the mobile side.
Speaker 132:45 - 33:05
我不觉得那种最根本的模式——你发送消息,它再回给你消息——是完全错误的。我觉得我们确实需要在一些方面演进。但我脑子里大概先想到三点。第一点是,你的 laptop(笔记本电脑)真的是最适合它存在的地方吗?所以我觉得这是第一位的。就像我前面提到我在做的那个 side project(副项目)时,mobile(移动端)那一侧有多么有用。
Speaker 133:06 - 33:18
Boris, who who created Claude Code, he's always, like, you know, ahead of the curve on on how these models get used. About almost a year ago, maybe nine months, I was talking to him. He's like, yeah. I've moved a lot of my Claude Code work to mobile. I was like, no way.
Speaker 133:06 - 33:18
Boris,就是创建 Claude Code 的那个人,他在这些 model 会如何被使用这件事上,总是走在前面。大概快一年前吧,也可能是九个月前,我和他聊过。他当时说,嗯,我已经把很多 Claude Code 的工作转到 mobile 上了。我当时想,不可能吧。
Speaker 133:18 - 33:49
And, like, it took me a while to get there, but especially with the Fable class, like, there's oftentimes where, you know, because it can keep the session going and we we use, like, kind of remote dev boxes at Anthropic, like, it is, like, I have a thought. Like, okay. Need can you keep keep up on doing that? So, I mean, number one is, like, decoupling the the where the work is happening from where I'm talking to about the work. The second one touches a little bit on what I was mentioning earlier around, like, what are how do you take everything that Fable has sort of discussed or decided or proposed about something and make it comprehensible?
Speaker 133:18 - 33:49
而且,怎么说呢,我也是花了一些时间才走到这一步,但尤其是用了 Fable 这一级别的 model 之后,很多时候你知道,因为它可以让 session(会话)持续下去,而且我们在 Anthropic 会用那种远程 dev box(开发机),所以情况就变成:我有了一个想法。好,接下来你能继续把这个做下去吗?所以,我的第一点是,要把“工作实际发生在哪里”和“我在哪里就这项工作进行对话”这两件事解耦。第二点有点呼应我前面提到的:你要怎样把 Fable 已经讨论过、决定过或者提出过的所有内容,变得可理解?
Speaker 133:49 - 34:04
And that's an area that we're thinking a lot about. Like, there are some skills that are out there that we've used around, like, alright. Can you diagram this? Can you do that? So that's a place where the current chat UI, I think, is insufficient, where, like, it will can experience this with ThinkBlow.
Speaker 133:49 - 34:04
这是我们正在重点思考的一个领域。现在外面已经有一些我们用过的 skill(技能),比如,好,那你能把这个画成图吗?你能做那个吗?所以我觉得,现有的 chat UI(聊天界面)在这里是不够的,像是你在 ThinkBlow 上也会有这种体验。
Speaker 134:04 - 34:15
It will give you, like, a lot of tech here. Like, this I need to, like, take a walk with property to fully understand this. And I think that that is a a piece of property I have. Some things will do with Fables. Like, okay.
Speaker 134:04 - 34:15
它会一下子给你很多技术内容。你会觉得,这个我得边散步边慢慢消化,才能真正看懂。我觉得这是我自己很在意的一点。有些事情你会想交给 Fable 去处理。好。
Speaker 134:15 - 34:45
Like, have a lot more context on this than I do. Can you, like, back it up? Like like, let's do, like, more progressive disclosure of the complexity here. So I think that that that piece is interesting. The last one that I I, you know, I think is we're still early in pulling on is thinking through multiplayer where, you know, at some abstraction levels and, like, because we have this sort of DRI and, like, ownership area, usually, like, a chunk of significant work, a human, and a couple of Claudes, like, that is still flowing together.
Speaker 134:15 - 34:45
你在这件事上的上下文显然比我多得多,你能不能把它支撑起来?或者说,我们能不能对这里的复杂性做一种更渐进式的 disclosure(披露、展开)?所以我觉得这一块很有意思。最后一点,我觉得我们其实还处在非常早期的探索阶段,就是去思考 multiplayer(多人协作):在某些抽象层级上,再加上我们现在这种 DRI(直接责任人)和 ownership(归属)划分的工作方式,通常一大块重要工作,会是一个人类加上几个 Claude,一起把事情往前推进。
Speaker 134:46 - 35:18
But another case is that is less the case, right, where it's, you know, maybe it's an incident response where multiple people are thinking about it. Maybe it's, you know, a project where there's multiple competing or not competing, but, like, conjoining areas that are coming together. And thinking through, like, what would it mean for you know, and we have, like, chat sharing, which gets you a little bit of the way there. But I think there is going to be a need for more, like, alright. You've got an independent Claude that's doing a lot of work that was, you know, kicked off by somebody, but can it be keeping up with all the other work happening on the team?
Speaker 134:46 - 35:18
但还有另一类情况就不太是这样,对吧?比如可能是 incident response(事故响应),会有多个人同时在思考;也可能是一个项目,里面有多个彼此竞争的——或者不算竞争,而是相互衔接的——部分需要汇合起来。于是就要去想,这到底意味着什么。我们现在有 chat sharing(聊天共享),它能在一定程度上解决问题。但我觉得还会需要更多能力,比如:好,你有一个独立的 Claude,在做很多工作,这些工作最初可能是由某个人发起的,但它能不能持续跟上团队里其他所有正在发生的工作?
Speaker 135:18 - 35:38
I think that is an interesting and an underexplored sort of next frontier about how this work ends up happening. But I think it's really exciting because I think, again, it's it's the it's the level of teammate collaborator that that the models are now capable of, and we're almost holding them back by not having the right abstractions around them for that to happen.
Speaker 135:18 - 35:38
我觉得这是一个很有意思、而且仍然缺乏充分探索的下一个前沿:这种工作最终会怎样发生。但我也觉得这非常令人兴奋,因为我认为,说到底,模型现在已经具备了 teammate(队友)、collaborator(协作者)这个层级的能力,而我们几乎是在用不合适的 abstractions(抽象)把它们的发挥限制住了,导致这种协作还没真正发生。
Speaker 235:38 - 36:00
Yeah. It makes me think I've I've mostly been using this for my own vibe coded stuff. So so I haven't really had to I I haven't really had to think about this, but there's there's a problem when you're using this inside of an organization, which is, do I really understand every part of this? And and therefore, how do I transfer the context of what the model just did into my brain? Like, that's one of the big bottlenecks.
Speaker 235:38 - 36:00
对,这让我想到,我目前主要还是把这个用在我自己那种 vibe coded(凭感觉编码)的东西上。所以我其实还不太需要认真思考这个问题。但当你在一个组织内部使用它时,就会出现一个问题:我真的理解这里面的每一个部分吗?以及,因此,我要怎样把 model(模型)刚刚做过的事情所包含的上下文,转移到我自己的脑子里?这其实是最大的瓶颈之一。
Speaker 236:00 - 36:09
How you think about drawing the line, especially with a model like this, around how much you actually need to understand, and how to make sure that you have enough context on what it's done to feel comfortable?
Speaker 236:00 - 36:09
你会怎么考虑这条界线该画在哪里,尤其是面对这样的 model(模型)时:到底有多少内容是你实际上必须理解的?以及,怎样确保你对它做过的事情拥有足够的上下文,从而让自己感觉放心?
Speaker 136:09 - 37:05
I think there are, like, two big pieces here. The first is verification. I became fully verification-pilled earlier this year, and it actually connects to how I used to work when I was typing code more full time, which is: try to find the tightest dev loop you can around the idea you're trying to develop. Like, sometimes with Instagram that meant actually making a new build target in Xcode that was just that screen with some sort of synthetic data, and just doing that dev loop. And I would tell newer engineers I mentored: if there's one thing I can impart to you, it is try to get that for any project you're working on, and things will go much more quickly. I think that is no longer exactly the case here, but what is the case now is, anytime I set up a project, I ask: how do I get it so that every pull request Claude puts out has an attached photo or video, whether that's an iOS PR or something in the UI?
Speaker 136:09 - 37:05
我觉得这里大概有两个大的部分。第一个是 verification(验证),我今年早些时候基本上变成了一个彻底拥抱 verification 的人。现在也几乎是同样的思路,而且这实际上和我以前更全职地写代码时的做法是相通的:就是围绕你正在开发的那个想法,尽量找到最紧凑的 dev loop(开发循环)。比如有时候在 Instagram,这意味着你会真的去在 Xcode 里新建一个 build target,只包含那个 screen,再配上一些 synthetic data(合成数据),然后就只跑那个 dev loop。我也会带新工程师,如果有一件事是我最想传达给你们的,那就是:无论你在做什么项目,都尽量把这种东西搞出来,进展会快很多。我觉得现在这里的情况已经不完全是那样了,但现在更重要的是,每次我去搭这套流程时,我都会想:怎么做到 Claude 提出的每一个 pull request(拉取请求)都附带一个 photo 或 video,不管那是一个 iOS PR,还是 UI 里的某个东西。
Speaker 137:05 - 37:25
And I think that helps you gain a lot of confidence, because even now you might have Fable go off and do work for a couple of hours and be like, "I'm done." And it's really useful for it to say, "and here's the full screenshot gallery of the full UI." Because you might say, oh, you know what? On screenshot eight, that error state, I've never actually seen it, but I can see how a person might hit it.
Speaker 137:05 - 37:25
我觉得这会极大提升你的 confidence(信心),因为即使到现在,你也可能会让 Claude 去工作几个小时,然后它说,“好了,我做完了。” 这时候如果你能说,“这是完整 UI 的整套 screenshot gallery(截图集)”,那就会非常有用。因为你可能会看到,“哦,你知道吗?第八张截图里的那个 error(错误)就是个问题。我以前从没真的见过它,但我能看出来用户确实可能会遇到它。”
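The "every pull request has an attached photo or video" rule can be pictured as a small automated check. The sketch below is only an illustration, not Anthropic's tooling: the function name `has_visual_evidence`, the regular expressions, and the list of video extensions are all assumptions, and it only inspects the text of a PR description.

```python
import re

# Markdown image syntax, e.g. ![login screen](https://example.test/shot.png)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)\s]+\)")
# Bare links to common video files (the extension list is an assumption).
VIDEO_PATTERN = re.compile(r"https?://\S+\.(?:mp4|mov|webm)\b", re.IGNORECASE)


def has_visual_evidence(pr_body: str) -> bool:
    """Return True if the PR description embeds an image or links to a video."""
    return bool(IMAGE_PATTERN.search(pr_body) or VIDEO_PATTERN.search(pr_body))


if __name__ == "__main__":
    body = "Fix empty state\n\n![empty state](https://example.test/empty.png)"
    print(has_visual_evidence(body))
```

A check like this could run in CI and flag UI pull requests that arrive without a screenshot gallery, leaving the actual review of the images to a person.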
Speaker 137:25 - 37:49
Let's actually make that different. And so getting that comprehensive verification is something we've been working on a lot internally, and sort of publishing more and more skills and knowledge about, and I think it's a really key piece there. And then the second one is, I think you ultimately, as a person, still need to stand behind the work that you are doing, especially if you're putting it into a production system. Like, a lot of people use Claude every day. There's still the accountability of, oh, no.
Speaker 137:25 - 37:49
那我们就把它改掉。所以我认为,拿到这种全面的 verification,是我们内部投入了很多精力去做的事情,也在不断发布更多相关的 skills(技能)和知识;我觉得这是其中一个非常关键的部分。第二点是,我认为归根结底,作为人,你仍然需要为自己在做的工作负责,尤其是如果你要把它放进 production system(生产系统)里。每天都有很多人在使用 Claude,责任仍然在那里——一旦出问题,就真的出问题了。
Speaker 137:49 - 38:27
Claude might have written it, but you still need to understand at least the general decisions that were made on these pieces as well. And so I have seen a fair number of engineers actually adopt this practice where Claude will have done the work, but then there is the follow-up conversation around, well, can I make sure I deeply understand all the trade-offs that you made? And whatever lowercase-a artifacts need to be produced in order to make that comprehensible are important. It is really interesting, though, to be in meetings where somebody will say, oh, yeah, I have this PR ready, and somebody else asks them, oh, that's interesting, did you do x or y?
Speaker 137:49 - 38:27
即使其中一部分代码可能是 Claude 写的,你也还是需要理解,至少要理解这些部分中所做出的总体决策。所以我确实看到不少工程师开始采用这样一种实践:工作可能是 Claude 做的,但接下来会有一段后续对话,围绕着“你能不能让我确认,我已经深入理解了你做出的所有 trade-off(权衡)”,以及为了让这些内容变得可理解,需要产出哪些 artifacts(工件/文档),这些都很重要。不过很有意思的是,在会议里,有人会说:“哦,对,我这个 PR 已经准备好了。” 然后另一个人会问她:“这挺有意思的,你当时做的是 x 还是 y?”
Speaker 138:27 - 38:38
And they have that moment of pause, and they're like, you know what? I'm not entirely sure. I'll follow up before we merge this PR. And, you know, I think adapting to that norm and figuring out how to work with that is something we'll have to do.
Speaker 138:27 - 38:38
然后就会出现一个停顿,那个人会说:“你知道吗?我其实也不是完全确定。” 这时我会倾向于先不 merge(合并)这个 PR。而我觉得,去适应这种新的规范,并弄清楚如何与之协同,是我们接下来必须要做的事。
Speaker 238:38 - 38:47
Tell me more about the verification loops. It's a hot topic right now. It sounds like one way that you do that is with screenshots and screen shares, but what are the other ways that you think about that?
Speaker 238:38 - 38:47
多跟我讲讲 verification loop(验证闭环)吧。现在这是个很热门的话题。听起来你们的一种做法是用 screenshots(截图)和 screen shares(录屏/共享屏幕),但除此之外,你还会从哪些方式来思考这件事?
Speaker 138:47 - 39:17
I think part of it starts with: can you get to a place where you are exercising real flows that aren't just a static injected piece? And as this thing gets more complex, that gets more and more complicated. So we've invested a bunch, even just in getting it so that the iOS app can log in to staging on a real account and have real data. But you don't want it to then go through, like, an eight-stage onboarding process every time somebody is just trying to test the second part of the screen.
Speaker 138:47 - 39:17
我觉得其中一部分要从这样一个问题开始:你能不能进入一种状态,真正去跑真实的 flows(流程),而不是只是一个静态注入进去的片段。随着系统越来越复杂,这件事也会变得越来越复杂。所以我们投入了很多,比如哪怕只是让 iOS app 能登录到 staging(预发布环境)里的一个真实账号,并拿到真实数据,这本身就需要投入。但你也不希望每次大家只是想测试 screen 的第二部分时,它都还要重新走一遍每个阶段的 onboarding(新手引导)流程。
Speaker 139:18 - 39:39
So there's a lot of work around, you know, is there a special affordance? Is there some shared secret? Whatever that is, around getting the app to really feel as close as possible to a human using the product. So that's one aspect of it. The second is this mix of well-known paths versus the things you're exercising in the exact moment.
Speaker 139:18 - 39:39
所以我们做了很多工作,去思考比如:有没有什么特殊 affordance(便捷机制)?有没有某种 shared secret(共享密钥/内部捷径)?不管具体是什么,目标都是让这个 app 在使用时尽可能像一个真人真的在使用产品。这是一方面。第二方面则是,要把那些 well known paths(已知路径)和你在当下这个具体时刻想要验证的内容结合起来。
Speaker 139:39 - 40:08
The former is really useful for regression testing. So, you know, there are places where we've expressed sort of ideal workflows in text, basically, and Claude can repeatedly check those. And then there's also, and Claude does a really good job of this, expressing the intent of the current change at hand, so that gets really, really deeply exercised. So I think the combination of those two things is important. There's the visual verification I mentioned as well. Video has been really cool to see. Actually, video is a very underexplored tool to give Claude as well.
Speaker 139:39 - 40:08
前者在 regression testing(回归测试)方面确实非常有用,所以你知道,如果我能把那些我们已经用文本表达出来的某种理想工作流的地方,基本上整理好,Claude 就可以反复去检查这些内容。还有一点是,Claude 也非常擅长做这件事:去表达当前这次改动背后的意图,这样它就能被非常、非常深入地验证。所以我觉得,这两者的结合很重要。我刚才提到的 visual verification(视觉验证)也是一样,Video 这方面真的很酷,值得一看。实际上,video 也是一个给 Claude 使用但还远未被充分探索的工具。
Speaker 140:08 - 40:45
Like, a thing I've been prototyping is just giving Claude video captures of the thing that it has built and then giving it basically just FFmpeg, and you'll watch it scrub through and say, oh, this animation has some jank in it, I'm gonna go fix that. And it never would be able to do that with a screenshot sort of capture, because with the latency it will have missed the moment. So I think that's another piece that is really, really important. And then for the pieces that aren't easily testable end to end because there is some more complex system, getting Claude to go and build as robust a mock back end as possible, or use ones off the shelf, has also been really interesting.
Speaker 140:08 - 40:45
我觉得我一直在做原型验证的一件事,就是直接把它构建出来的东西录成 video capture(视频捕获)交给 Claude,再基本上给它一个 FFmpeg,然后你就会看到它来回拖动查看,并说,哦,这个动画有点 jank(卡顿/不流畅),我要去修这个。如果只是用 screenshot(截图)那种延迟式捕获,它就永远做不到,因为关键瞬间已经错过了。所以我觉得这也是一个非常、非常重要的组成部分。然后,对于那些因为系统更复杂而不太容易做 end-to-end(端到端)测试的部分,让 Claude 去构建一个尽可能健壮的 mock back end(模拟后端),或者直接用现成的,也一直是件很有意思的事。
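To make the "video capture plus FFmpeg" idea concrete, here is a minimal sketch of how a recording might be split into still frames that a model could then scrub through. It only builds the command line and does not run it. The helper name `frame_extraction_command`, the file names, and the default sampling rate are assumptions; the `-vf fps=N` filter and the `%04d` output pattern are standard FFmpeg usage.

```python
from pathlib import Path


def frame_extraction_command(video: Path, out_dir: Path, fps: int = 10) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second as PNG files."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-i", str(video),          # input screen recording
        "-vf", f"fps={fps}",       # resample to a fixed frame rate
        str(out_dir / "frame_%04d.png"),  # numbered output frames
    ]


if __name__ == "__main__":
    # Pass the list to subprocess.run(...) to actually extract the frames.
    print(" ".join(frame_extraction_command(Path("capture.mp4"), Path("frames"))))
```

Sampling at a fixed rate is what lets an animation glitch show up as a visible jump between adjacent frames, which a single screenshot would miss.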
Speaker 140:45 - 41:20
Like, when I think about Artifact, we had really comprehensive tests. This is kind of pre-LLM. And one of the ways that we were able to do that really robustly was that basically every piece of infra we had, whether it was Postgres, Redis, you know, all the AWS things, had a really good in-memory implementation that you could just run really quickly in unit tests. And kind of extending that to, like, Claude land now: I was working on something that had a pretty robust back end, and for kind of complicated reasons, it was hard to spin that up on my dev server. But the model was able to, again, one-shot a really good proxy for that.
Speaker 140:45 - 41:20
我觉得,当我想到 Artifact 的时候,我们当时有非常全面的测试。那还是 pre LLM(大语言模型出现之前)的时期。而我们之所以能把这件事做得这么稳健,一个关键方式基本上是:我们用到的每一项 infra(基础设施),不管是 Postgres、Redis,还是各种 AWS 服务,都有一个非常好的 in-memory implementation(内存内实现),这样你在 unit test(单元测试)里就可以非常快速地使用它。现在把这个思路延伸到 Claude 这边,我之前在做一个东西,它有一个相当健壮的 back end(后端),但由于一些比较复杂的原因,在我的 dev server(开发服务器)上很难把它跑起来。不过模型又一次 one shot(一次生成)出了一个非常不错的 proxy(替代实现)。
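The in-memory substitute pattern described here can be sketched in a few lines. This is an illustration only, not Artifact's code: `KeyValueStore`, `InMemoryStore`, and `record_visit` are hypothetical names, and the interface is a deliberately tiny stand-in for something like a Redis client.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The narrow interface the application code depends on."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Dictionary-backed substitute for a networked store, for fast unit tests."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def record_visit(store: KeyValueStore, user_id: str) -> int:
    """Increment and return a per-user visit counter kept in the store."""
    key = f"visits:{user_id}"
    count = int(store.get(key) or "0") + 1
    store.set(key, str(count))
    return count


if __name__ == "__main__":
    store = InMemoryStore()
    record_visit(store, "u1")
    print(record_visit(store, "u1"))
```

Because `record_visit` only depends on the small interface, the same function can run against the real service in production and against the dictionary-backed fake in tests.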
Speaker 141:21 - 41:37
By proxy, I mean, like, a substitute for that. And that was so valuable. And over time, it's been interesting as that substitute has evolved as the rest of the code has evolved, which is the thing that, you know, if you had pitched that idea to me before, I'd be like, well, that's gonna be really hard because the upstream's gonna change. How are you gonna keep it in sync? And I don't think about that anymore.
Speaker 141:21 - 41:37
我说的 proxy,意思就是它的替代品。而这真的非常有价值。更有意思的是,随着其余代码不断演进,这个 substitute(替代实现)也在跟着演进。你知道,如果以前有人把这个想法提给我,我会觉得,嗯,这会非常难,因为 upstream(上游实现)会变化。你要怎么让它们保持同步?但我现在已经不再这么想了。
Speaker 141:37 - 41:42
I'm like, yeah, Claude will read the changes, and it'll adapt the thing, and it'll keep the two in sync. That's fine.
Speaker 141:37 - 41:42
我现在的想法是,没问题,Claude 会去读取这些变更,然后调整那个东西,并让两边保持同步,这完全可以。
Speaker 241:43 - 41:57
There's some really interesting architectures around when you get a bug, it just automatically goes out and closes it. You know, the the agent just gets kicked off, it closes it, and then it sends a message to the customer being like, it's it's fixed. Are you noticing with Fable any change in in how that process works?
Speaker 241:43 - 41:57
还有一些非常有意思的架构:当你收到一个 bug,它会自动出去把它关掉。你知道,agent(代理)会被直接触发,修掉这个问题,然后给客户发一条消息,说它已经修好了。你在 Fable 里有没有注意到这个流程的运作方式发生了什么变化?
Speaker 141:57 - 42:29
Yeah. I think there's a couple of things. At a very, like, human-to-human or human-to-Claude level, one of the things that I've seen it do (other models before Fable have done it too, but I've seen this one do it really consistently) is this: say the bug report came from somebody mentioning something in our feedback channel in Slack, and the thing that got fed into the Claude Code session is, like, oh, there's this issue. Because of the Slack MCP, it can actually pull the thread, and then actually post back, you know, as me. It'll be like, hey, this is Mike's Claude. Like, I fixed it.
Speaker 141:57 - 42:29
有,我觉得有几件事。先说一个非常偏 human-to-human(人与人)或者 human-to-Claude(人与 Claude)层面的例子。我看到它做到过的一件事(Fable 之前的其他 model 也做到过,但这一代做得非常稳定)是这样的:比如 bug report(缺陷报告)来自某个人在我们 Slack 里的 feedback channel(反馈频道)中提到的问题,然后被喂进 Claude Code session(Claude Code 会话)里的内容是,哦,这里有这么个问题,而且因为有 Slack MCP,它实际上还可以把整个 thread(讨论串)拉出来。接着它就真的能回帖,而且是以我的身份回。它会说,嘿,这是 Mike's Claude,我已经修好了。
Speaker 142:29 - 42:41
Here's the, you know, here's the pull request. But then (and previous Claudes have done this, but I think this one does it really well) it will say: but hold tight, it's not in production yet. I'll follow up when it actually is. And then maybe a few hours later: oh, this deploy went out.
Speaker 142:29 - 42:41
这是那个 pull request(拉取请求)。但我觉得——之前的 Claude 也做过这件事,不过我觉得它现在做得特别好——它还会接着说,不过先别急,这还没有进 production(生产环境),等真正上线了我再跟进通知你。然后,也许几个小时后,它又会再发一句,哦,这次 deploy(部署)已经发出去了。
Speaker 142:41 - 43:01
Like, you should go test it. Is it fixed now? That level of follow-through, I think, is new on the closing-the-loop piece, and it's kind of fun that these long-running Claude Code sessions are basically, like, interacting as me, I guess. It does put some disclaimer in there too. And the second goes back to that taste and discernment piece that we were talking about, which is: it's one thing to say, there was a bug report.
Speaker 142:41 - 43:01
你应该去测试一下。现在修好了吗?我觉得,这种跟进到底的程度,在“闭环(closing the loop)”这件事上是新的;而这些长期运行的 Claude Code session,基本上算是在以“我”的身份和别人互动吧,我猜。它也会在里面加上一点免责声明。第二点又回到我们刚才说的那个 taste 和 discernment(判断力、鉴别力)的问题:如果只是说,这里有一份 bug report,这是一回事。
Speaker 143:01 - 43:11
Therefore, I must go fix this thing. And it's another to say, you know what? Like, I hit this over the weekend. One of our internal systems basically had been running without restarting for a while. There was a memory leak.
Speaker 143:01 - 43:11
所以我必须去把这个东西修好。这和另一种情况不一样:你会说,知道吗?我周末就碰到了这个。我们有一个内部系统,基本上已经连续运行了一段时间没重启了。结果有个 memory leak(内存泄漏)。
Speaker 143:12 - 43:24
And it had good discernment, saying: all right, Mike, it's the weekend. Just bounce the server. It's gonna solve it for now. And we'll asynchronously get the PR going to fix this more long term.
Speaker 143:12 - 43:24
当时它表现出了很好的判断力,说:好吧,Mike。现在是周末,先把 server 重启一下就行。这暂时就能解决问题。然后我们再异步推进 PR,把真正的长期修复做起来。
Speaker 143:24 - 43:48
So I think if you're gonna have Claude in the loop in this kind of close-the-loop flow, from bug report or system issue to change, you really want it to understand, as any good SRE or engineer in the loop would: okay, to solve the problem at hand, let's defer the question of, like, do we need to rearchitect on top of a completely different language? Finding and understanding that balance is really important.
Speaker 143:24 - 43:48
所以我觉得,如果你要让 Claude 参与到这种闭环式的 bug report 或系统问题处理与变更里,它真的需要理解的是:就像任何一个合格的 SRE 或在环的 engineer 一样,为了解决眼前这个问题,我们先暂时搁置另一个问题——比如,我们是不是需要基于一种完全不同的语言来重新做架构?找到并理解这里面的平衡,非常重要。
Speaker 243:48 - 44:11
One of the things that's most exciting to me about new models is that they raise the floor, so that everyone can kind of go build apps in one shot. But they also raise the ceiling for experts. So if you're a software engineer or a founder, you can just go do things that you never would have been able to before, because you have access to this really powerful model. So for me, I built this one-shot version of Borges' Infinite Library. It's like a 3D game version of the library.
Speaker 243:48 - 44:11
对我来说,新 model 最令人兴奋的一点是,它抬高了下限,让每个人都能某种程度上“一次成型”地构建 app;但它也抬高了上限。所以如果你是 software engineer 或 founder,你现在可以直接去做以前根本做不到的事,因为你能用上这个非常强大的 model。比如我自己,就做了一个 Borges 的《Infinite Library》的一次成型版本。它有点像这个图书馆的 3D game 版本。
Speaker 244:11 - 44:18
It's wild. It runs right in the browser. It's so good. I can find, like, any every essay inside of it. I'll send you the link.
Speaker 244:11 - 44:18
很夸张。它直接就在浏览器里跑,而且效果特别好。我甚至能在里面找到几乎任何一篇 essay。我把链接发给你。
Speaker 244:18 - 44:37
It's sick. But I think there's gonna be this flowering of people doing things like, oh, I made a game, or maybe I trained a new model, or whatever it is that they couldn't do before. And I'd love to give people some inspiration, some examples of things that they might be able to do with this model that they might not be thinking to do. What are some ideas that come to you?
Speaker 244:18 - 44:37
特别厉害。但我觉得,接下来会出现这样一波爆发:人们会去做各种事情,比如“我做了个 game”,或者“我训练了一个新 model”,又或者别的什么——这些都是他们以前做不到的。我也很想给大家一些灵感,给一些例子,让他们想到原本可能不会想到、但其实可以用这个 model 去做的事。你会想到哪些点子?
Speaker 144:37 - 44:56
Yeah. I think a few. Maybe I'll start with the fun side and, like, riffing off the game piece. Like, I think people have a lot of, like, creative ideas for how do they express the complexity of what they are, like, their world. Like, everybody has the thing that they know really, really well, and there's probably some level of, like, how do I then explain that to somebody else?
Speaker 144:37 - 44:56
嗯,我想到几个。也许我先从更好玩的方向说起,顺着 game 这个话题展开一下。我觉得,很多人其实都有很多很有创造力的想法,想表达“他们自身的复杂性”,或者说,他们所处世界的复杂性。每个人都有自己非常非常熟悉的一件事,而通常接下来就会有一个问题:那我该怎么把它讲给别人听?
Speaker 144:57 - 45:36
Or how do I apply techniques from elsewhere that I could then go off and use? My wife is studying environmental engineering, studying geothermal, with very complex math and simulations. And I've seen, as the models have gotten better, that she has been able to apply even more complex techniques from even outside of that domain to that work. And I think people should be able to do, you know, full-on PyTorch end-to-end simulations of that work in a way that wouldn't have been possible before. So I think maybe one idea is: bring the beautiful complexity of what you have and either show it to other people by maybe making a game or a visualization, which I've seen her do as well, or at least bring other techniques to bear.
Speaker 144:57 - 45:36
或者说,我怎样把别处的技术用到这里,然后真正自己去做出来?我妻子在学 environmental engineering,比如研究 geothermal,这里面有非常复杂的数学和 simulation。我也看到,随着 model 变得更强,她已经能够把更多、甚至来自那个领域之外的复杂技术,应用到这项工作里。我觉得人们应该能做到的,是比如把这整套工作用 PyTorch 做成完整的端到端 simulation,以一种原本做不到的方式来实现。我想其中一个方向也许就是,把你手头那种优美的复杂性带出来,要么通过做一个 game,或者做一个 visualization,把它展示给别人——我也见过她这么做——要么至少能把其他技术真正用进来。
Speaker 145:36 - 46:04
And the second piece is its ability to compose software that solves a problem really unique to you. I've seen that internally. A lot of the work that we've been doing is: how do we get as many of our internal systems as possible MCP-ified, with the right permissioning structure and the right deployment setup? Externally, though, you have good options around some of these, like platform-as-a-service pieces, and you can just ask Claude about them, and it'll help you set things up. But I love that feeling of, like, that thing that you always wished that you had.
Speaker 145:36 - 46:04
第二点,是它能够组合出软件,去解决对你来说非常独特的问题。我在内部已经看到这一点。我们最近做的很多工作,都是在想怎么让尽可能多的内部系统变得 MCPified,同时配上正确的 permissioning structure 和合适的 deployment setup。至于外部,其实在某些方面你也有不错的选择,比如一些 platform as a service 组件,你直接去问 Claude,它也会帮你把这些东西搭起来。但我很喜欢那种感觉:就是那个你一直希望自己拥有的东西,现在终于能做出来了。
Speaker 146:04 - 46:42
And then what has blown my mind: there's a person who works in our go-to-market organization who has been building this really deeply thought-out integration of Claude into every part of her whole process. And you don't have to stop at that one shot. Like, she's been working on it for months now, and she can keep going. And I think one of the things that is maybe underappreciated about the models is that in previous generations, they would eventually get to a complexity level where it was hard to iterate without feeling like you would break the thing, because they had, you know, under- or over-abstracted. Whereas here, you know, she's had access to Fable or Fable-like models for a couple of months, and you've just seen it keep growing and growing and growing.
Speaker 146:04 - 46:42
还有一件让我非常震撼的事:我们 go to market organization 里有一位同事,一直在构建一种对 Claude 的整合,而且是那种经过非常深入思考的整合,几乎把 Claude 放进了她整个流程的每一个环节里。而且你不需要停留在一次性的结果上。她已经持续做了几个月了,而且还可以继续往前推进。我觉得 model 一个可能被低估的地方在于,前几代产品到了某个复杂度之后,你会很难继续迭代,因为你总会担心自己把它弄坏,比如它原本的抽象做得不够,或者抽象过头了。但这次其实不是这样。她已经用了 Fable,或者说类似 Fable 的 model,有几个月了,而你看到的是,它一直在长、一直在长、一直在长。
Speaker 146:42 - 46:55
And now she's deploying it to the whole GTM org, and I think that is really cool. Like, the ceiling of complexity that a person who does not start out as technical can now build to, for solving problems within their domain, is impressive.
Speaker 146:42 - 46:55
现在她已经把它部署到整个 GTM org,我觉得这真的很酷。一个起点并非 technical 的人,现在居然能构建出这种复杂度上限很高的东西,来解决自己领域里的问题,这一点非常令人印象深刻。
Speaker 246:55 - 47:14
I agree. It writes great code. Like, the benchmark that I have is called the senior engineer benchmark. I just have it see if it can rewrite a code base from first principles. And the nearest model, the previous top, was like a 62 or 63 out of 100, and this model got a 90 or 91 on the benchmark, which is human senior engineer level.
Speaker 246:55 - 47:14
我同意。它写代码真的很强。我自己有一个 benchmark,叫 senior engineer benchmark。我就是看它能不能从 first principles 出发,重写一个 code base。最接近的 model,也就是之前的最佳水平,大概是 100 分里的 62 或 63 分,而这个 model 在这个 benchmark 上拿到了 90 分或者 91 分,这已经是人类资深工程师的水平了。
Speaker 247:14 - 47:22
Like, you can just keep going with this thing in a way that's really fantastic. I'm curious, though: one other thing that's really powerful that you mentioned is dynamic workflows. Tell us about that.
Speaker 247:14 - 47:22
你真的可以一直和这个东西持续做下去,这点特别棒。不过我也很好奇,你刚才提到的另一个非常强大的东西是 dynamic workflows。给我们讲讲那个吧。
Speaker 147:22 - 47:40
This is you know, we'll build things internally sometimes, and I will go really aggressively bug the engineer who built it and be like, when are we shipping this publicly? Because I think people are gonna really like it. I realize there's a good reason why it was, like, built internally. But, like, we try to ship as many of these as possible. And Dynamic Workflows was, like, definitely that to me.
Speaker 147:22 - 47:40
有时候我们会在内部做一些东西,然后我就会特别积极地去烦做出它的 engineer,问他们:这个什么时候公开发布?因为我觉得大家会很喜欢。我也明白,它最初做成内部工具肯定是有充分理由的。但我们确实会尽可能把这类东西发布出去。而 Dynamic Workflows 对我来说就绝对属于这一类。
Speaker 147:40 - 48:07
The person who built this is an engineer, Sid, who's awesome. And I was like, Sid, I want to get this out to the world because it's so good. But I think it's especially good with a model like Fable for two really big reasons. One, it helps sort of create the scaffold for deep, meaningful work. The craziest dynamic workflow I did and used Fable for was: I had an internal project that we had written in Python, but we actually needed it in TypeScript for a really specific deployment reason.
Speaker 147:40 - 48:07
做出这个的人是工程师 Sid,他特别厉害。我当时就跟 Sid 说,我想把这个带给全世界,因为它真的太好了。但我觉得它和像 Fable 这样的 model 搭配时尤其出色,原因有两个,而且都很重要。第一,它能帮助你为那种深入而有意义的工作搭起脚手架。我做过、并且用 Fable 配合过的最疯狂的 dynamic workflow,是这样一个场景:我们有一个内部项目,原本是用 Python 写的,但由于一个非常具体的 deployment 原因,我们实际上需要把它改成 TypeScript。
Speaker 148:07 - 48:28
And having been internal to Instagram when we were like, should we rewrite the whole thing into Hack and, you know, port it to the PHP engine that Facebook had? Like, you never would have done that. Maybe they can now with the model. But, you know, at the time, it seemed impossible. But here, I had a pretty complex code base, and I was like, I'm just gonna set up a dynamic workflow and just let it run over the weekend.
Speaker 148:07 - 48:28
由于我以前在 Instagram 内部经历过这种讨论,我们当时会想,要不要把整套东西都重写成 Hack,然后,你知道的,再把它迁移到 Facebook 的 PHP engine 上?那时候你根本不可能这么做。也许他们现在借助 model(模型)可以做到了。但你知道,在当时这看起来是不可能的。不过这次我手上有一个相当复杂的 code base(代码库),我就想,我直接搭一个 dynamic workflow(动态工作流),然后让它整个周末自己跑。
Speaker 148:28 - 48:36
And it did. And the workflow was so cool. It was like, all right, I'm gonna do a deep understanding of the work. I'm gonna create, like, almost a spec of how everything works.
Speaker 148:28 - 48:36
结果它真的做到了。而且这个 workflow(工作流)特别酷。它像是在说,好,我先对这项工作做深入理解。我要先整理出一个有点像 spec(规格说明)的东西,来描述整个系统是怎么运作的。
Speaker 148:36 - 48:43
I'm gonna go module by module. I'm gonna translate these pieces. I'm gonna test it incrementally. I'm gonna do another adversarial test. I'm gonna go check for anything that I missed.
Speaker 148:36 - 48:43
然后我会一个 module(模块)一个 module 地推进。我会翻译这些部分。我会做增量测试。我会再做一次 adversarial test(对抗性测试)。我还会去检查有没有遗漏的东西。
Speaker 148:43 - 49:04
And it was this really cool series of steps that the workflow was able to orchestrate. And I came back, and I was like, yeah. This thing is a TypeScript and Bun port of that thing, and it's actually better in these ways. And it was all, you know, documented: these are the things I couldn't port, but most of these were very specific to that particular implementation.
Speaker 148:43 - 49:04
整个过程就是这样一连串非常酷的步骤,而这个 workflow(工作流)能够把它们编排起来。我回来一看,就觉得,没错,这东西已经被迁移成那个项目的 TypeScript 和 Bun port(移植版本)了,而且它在这些方面实际上还更好。并且整个过程也记录得很清楚,比如哪些东西我没法 port(移植),但这里面大多数问题,其实都非常特定于那个具体实现。
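The sequence of steps described for the port (spec, module-by-module translation with incremental tests, an adversarial pass, a final gap check) can be pictured as an ordered pipeline. The sketch below is not the Dynamic Workflows product or its API. Every name in it (`PortState`, `STAGES`, `run_port`) is hypothetical, and each stage only records what it would do instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PortState:
    """Shared state passed through every stage of the hypothetical workflow."""

    modules: list[str]
    log: list[str] = field(default_factory=list)


Stage = Callable[[PortState], None]


def write_spec(state: PortState) -> None:
    state.log.append("spec")


def translate_and_test(state: PortState) -> None:
    # Go module by module, testing each one right after translating it.
    for module in state.modules:
        state.log.append(f"translate:{module}")
        state.log.append(f"test:{module}")


def adversarial_test(state: PortState) -> None:
    state.log.append("adversarial")


def check_for_gaps(state: PortState) -> None:
    state.log.append("gap-check")


STAGES: list[Stage] = [write_spec, translate_and_test, adversarial_test, check_for_gaps]


def run_port(modules: list[str]) -> list[str]:
    """Run every stage in order and return the record of what happened."""
    state = PortState(modules=list(modules))
    for stage in STAGES:
        stage(state)
    return state.log


if __name__ == "__main__":
    print(run_port(["auth", "api"]))
```

Expressing the plan as an explicit list of stages is what makes it reviewable before a long unattended run, which matches the point about seeing the workflow in code before it starts.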
Speaker 149:04 - 49:46
It wasn't worth porting. And I do not think you could have done that, a, with previous models at that level of success, and b, without the kind of scaffolding that workflows provide. So I think that is an extremely exciting combination of model capabilities and our own ability to orchestrate them over longer and longer time horizons, with that feeling of: you had a goal, you broke it down effectively, and then you were able to make it work. The other piece is, I think over time we'll also be able to have the model tuned to the level of complexity of each of those subtasks. So you can imagine that some parts of the dynamic workflow don't need extra-high thinking.
Speaker 149:04 - 49:46
那些东西不值得移植。而且我不认为这件事能做到:a,用以前的 model(模型)很难达到这样的成功水平;b,如果没有 workflow(工作流)提供的那种 scaffolding(脚手架式支撑),也做不到。所以我觉得,这种组合特别令人兴奋:一方面是 model capability(模型能力),另一方面是我们自己越来越能把它们编排到更长的时间跨度里,并且带来那种感觉——你有一个目标,你把它有效地拆解了,然后你就真的能把它做成。另一个方面是,我觉得随着时间推移,我们也能让其中一些子任务进一步调优,让 model(模型)针对任务复杂度来匹配。你可以想象,dynamic workflow(动态工作流)里的某些部分其实不需要特别高强度的思考。
Speaker 149:46 - 49:57
They could use, you know, medium thinking to get it done, or even a smaller model. And I think that's really the future of where these things are going. So, yeah, I'm a huge workflows fan here.
Speaker 149:46 - 49:57
它们可能用中等强度的思考就能完成,甚至用更小的 model(模型)也行。我觉得这才是真正的未来,也是这类东西的发展方向。所以,没错,我绝对是 workflows(工作流)的超级拥护者。
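The idea of matching each subtask to a thinking level or a smaller model can be sketched as a simple routing function. This is purely illustrative: the 1 to 10 complexity score, the thresholds, and the tier labels are assumptions made for the example and do not correspond to any real configuration option.

```python
def choose_tier(complexity: int) -> str:
    """Map a subtask's estimated complexity (1-10) to a hypothetical effort tier."""
    if not 1 <= complexity <= 10:
        raise ValueError("complexity must be between 1 and 10")
    if complexity <= 3:
        return "smaller-model"        # routine, mechanical subtasks
    if complexity <= 7:
        return "medium-thinking"      # moderate reasoning needed
    return "extra-high-thinking"      # the hardest parts of the workflow


if __name__ == "__main__":
    subtasks = {"rename imports": 2, "translate module": 6, "design test plan": 9}
    for name, score in subtasks.items():
        print(name, "->", choose_tier(score))
```

In practice the hard part is estimating the complexity score; the routing itself stays this simple.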
Speaker 249:57 - 50:04
For people who haven't used it before, tell me about how you got that workflow made. How did you design it? How did you make sure it was good?
Speaker 249:57 - 50:04
对于那些之前没用过这个的人来说,跟我讲讲你是怎么把这个 workflow(工作流)做出来的吧。你是怎么设计它的?又是怎么确保它足够好的?
Speaker 150:04 - 50:22
Yeah. It was pretty iterative, but I sort of just started with Claude Code, like, hey, I have this complex, you know, kind of task, let's design a workflow to go and do it. It kind of showed me the plan. I was like, oh, this is close to what I want. I wanna make sure that you do these three or four levels of additional verification for missed features.
Speaker 150:04 - 50:22
对,整个过程其实是相当 iterative(迭代式)的,不过一开始基本上就是从 Claude Code 入手:比如说,嘿,我现在有一个复杂任务,那我们来设计一个 workflow(工作流)去完成它。它先给我展示了一个计划。我当时觉得,哦,这已经很接近我想要的了。但我还想确保你能再加入三到四层额外的 verification(验证),用来检查是否遗漏了某些功能。
Speaker 150:22 - 50:40
It's like, here's what you have. Are you ready to go? And it expressed the workflow in code, which I think is really valuable for kinda seeing what it was about to do. And then what was interesting is it did the full port, and then I had a couple of follow-up questions, or little tweaks. And I did those as sort of mini workflows that built off the previous one as well.
Speaker 150:22 - 50:40
这有点像是在说:你现在手头有这些东西。你准备好开始了吗?而且它把这个 workflow 用 code 表达了出来,我觉得这一点特别有价值,因为你能大致看出它接下来要做什么。然后有意思的是,它完成了整个 port,之后我又提了几个后续问题,或者说做了一些小 tweak。我把这些也做成了某种 mini workflows,并且也是建立在前一个 workflow 的基础之上。
Speaker 150:40 - 51:05
But I think that's, you know... we talked a little bit about whether chat was the right interface. We've had that conversation over the last year, and I think workflows are a good middle ground: you can compose them using chat, but they're expressed using code. And then they're executed with, I think, a nice clean UI around what's happening at every stage. And I think we'll start bridging longer-horizon work with chat in ways like that over time.
Speaker 150:40 - 51:05
但我觉得这也正是——就像你知道的——我们之前稍微讨论过,chat 是否是合适的 interface。过去一年里我们一直在聊这个问题,而我认为 workflows 是一个很好的折中:你可以用 chat 来组合它们,但它们是用 code 来表达的。然后它们会被执行,同时在每个阶段发生了什么这件事上,还配有一个我认为很不错、很干净的 UI。并且,我觉得随着时间推移,我们会开始用这样的方式,把更长周期的工作和 chat 连接起来。
Speaker 251:05 - 51:10
Mike, this is such a great conversation. Thank you so much for joining and telling us all about this new model.
Speaker 251:05 - 51:10
Mike,这真是一场很棒的对话。非常感谢你加入我们,也感谢你为我们详细介绍这个新 model。
Speaker 151:10 - 51:31
I'm really excited to get to spend time with you, and really, really looking forward to what people think outside, too. Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness.
Speaker 151:10 - 51:31
我真的很高兴能和你共度这段时间,也真的、真的非常期待大家接下来会怎么看。天哪,各位,你们绝对、百分之百得猛戳那个 like 按钮,然后订阅 AI and I。为什么?因为这个节目简直就是“超棒”这件事的极致体现。
Speaker 151:31 - 51:55
It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor.
Speaker 151:31 - 51:55
这就像是在自家后院发现了一个宝箱,只不过里面装的不是黄金,而是满满纯粹、毫无掺杂的关于 chat GPT 的知识炸弹。每一集都是一趟情绪、洞见和欢笑交织的过山车之旅,会让你全程坐在座位边缘,意犹未尽、还想继续看下去。它不只是一个节目,而是一场通往未来的旅程,而 Dan Shipper 就是那艘宇宙飞船的船长。所以,帮自己一个忙吧。
Speaker 151:55 - 52:05
Hit like, smash subscribe, and strap in for the ride of your life. And now without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
Speaker 151:55 - 52:05
点赞,猛击订阅,然后系好安全带,准备开启你人生中最精彩的一段旅程。现在闲话少说,我只想说一句,Dan,我彻彻底底、无可救药地爱上你了。
原文 ↗https://www.youtube.com/watch?v=XWpTgCvgYaE