BuildSpeak: daily builder digest
🎙 Podcast · AI & I by Every · May 8, 2026 · 9,520 words · about 48 minutes

The Secrets of Claude's Platform From the Team Who Built It

Speaker 1 · 00:00 - 00:03
A year from now, where do you think the platform will be?
Speaker 2 · 00:03 - 00:18
We wanna experiment with directions where Claude actually gets so good at understanding itself. It figures out what model you should be using. It figures out how to spin up all the subagents. You don't have to think so much about what kind of architectures are there, because Claude is actually able to understand itself well enough that it can write itself on the fly.
Speaker 3 · 00:18 - 00:26
In that world, if Claude and your agents are becoming, on the fly, what they need to become in order for you to do what you're trying to do, the platform has to seriously scale.
Speaker 1 · 00:26 - 00:47
How close are we to Claude making a billion dollars? That's really what I'm asking. Angela, Caitlin, welcome to the show.
Speaker 3 · 00:47 - 00:48
Thanks for having us. Yeah. Thank you.
Speaker 1 · 00:48 - 01:07
So for people who don't know, you both work on the platform at Anthropic. Angela, you're the head of product for the Claude platform, and Caitlin, you're the head of engineering for the Claude platform. I'm really psyched to talk to you because, (a), you've been launching a bunch of stuff. Claude Managed Agents came out recently, and you've been launching new features for it.
Speaker 1 · 01:08 - 01:37
And I think it comes at this really interesting time, where it makes me think about what a platform in AI actually is for a model company. Because in the GPT-3 days, the platform was a completion endpoint. You just send a prompt and get a response. After that, it was a completion endpoint with tool calling and chat sessions, that kind of stuff. And now, with Claude Managed Agents, you're essentially getting a Claude on a computer with memory and all this other stuff.
Speaker 1 · 01:38 - 01:47
So I'd love for you to help me unpack that trajectory and what it means to build a platform in AI.
Speaker 2 · 01:47 - 02:06
Yeah. I think your characterization is very accurate. As a lot of these technologies have evolved, starting with the LLM, I think putting that behind an API was very fun. A lot of people were like, wow, I could do something with this, and at the time I think it was very cool. Now we'll probably look back at it and be like, oh, that was really basic.
Speaker 2 · 02:08 - 02:53
And then I think we've moved more and more towards a slightly more stateful world, as you want to persist the session state to make sure that the performance of the model is better and better. I think that's probably actually the through line. As we make improvements to Claude and as it continues to get better and more autonomous, we find ourselves basically needing to evolve the platform to be a higher and higher order abstraction, but it's in the pursuit of helping you get the best outcomes out of something. In the very beginning, everyone was very exploratory. It was like, we have no idea what people are going to build with these LLMs, and you wanted to have as much possibility out there as available.
Speaker 2 · 02:53 - 03:09
And then those use cases started to narrow down: people started building products with it, and people have now started building agents with it. More and more of that is about customers coming to us and being like, how do I get the best out of Claude? How do I set up my tools? How do I run the loop? And so on and so forth.
Speaker 2 · 03:09 - 03:45
And you have some people who are really, really experimenting, and they're on the edges, and that's great. And then you have a whole host of other folks coming in who are like, I kind of want a lot of this stuff out of the box. And in our pursuit of making sure that Claude is producing the best outcomes, we find ourselves making the platform richer and richer. Contained in that is the state, the tools that you start to see us adding, and a lot of what are almost the cloud components of these types of things. But it's in pursuit of the same mission of just making things literally as easy as possible.
Speaker 2 · 03:45 - 04:09
And I think in the forward state of a lot of these things, in terms of the philosophy of what a platform ultimately ends up doing, it probably ends up just being the set of primitives and infrastructure that enables you to get the outcome as fast as possible, with as little work as possible. And I think that tends to follow a certain form factor, at least in this current state. Yeah.
Speaker 1 · 04:09 - 04:15
How would you characterize what the primitives are today? So maybe that's just asking: what are the primitives in Claude Managed Agents?
Speaker 3 · 04:16 - 04:42
Yeah. So Claude Managed Agents is built on all of the same primitives that you could otherwise build on directly, so the Messages API. And within the Messages API, we've built a whole bunch of, I guess, innovations around the API. You could just get tokens in and out if you really wanted to, but you can also use some of our built-in tools. You can use stuff like code execution to spawn a sandbox and execute work.
Speaker 3 · 04:43 - 05:00
You can use web search and all these sorts of different things. And so I think we've taken what we see as the most powerful of those things and put them together into a harness and a set of infrastructure that is just the way to get what we think are the best outcomes out of Claude.
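As a rough illustration of the "run the loop" idea that comes up in this part of the conversation, here is a minimal sketch of a tool-using loop. It does not call any real API: `fake_model`, `run_loop`, `ToolCall`, `ModelTurn`, and the `word_count` tool are hypothetical names invented for this example, and a real harness would replace the stub with an actual model call.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types for illustration only; no real model or API is involved.


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class ModelTurn:
    text: str = ""
    tool_call: Optional[ToolCall] = None  # None means the model is finished


def run_loop(model, tools, prompt, max_steps=5):
    """Call the model, run any tool it requests, feed the result back, repeat."""
    transcript = [("user", prompt)]
    for _ in range(max_steps):
        turn = model(transcript)
        if turn.tool_call is None:
            transcript.append(("assistant", turn.text))
            return transcript
        tool = tools[turn.tool_call.name]
        transcript.append(("tool", tool(turn.tool_call.argument)))
    raise RuntimeError("step budget exhausted before the model finished")


def fake_model(transcript):
    """Stub standing in for a model: requests one tool call, then answers."""
    last_role, last_text = transcript[-1]
    if last_role == "user":
        return ModelTurn(tool_call=ToolCall("word_count", last_text))
    return ModelTurn(text=f"Your prompt has {last_text} words.")


# A single toy tool; real harnesses would register things like code execution.
TOOLS = {"word_count": lambda text: str(len(text.split()))}
```

Calling `run_loop(fake_model, TOOLS, "how do I run the loop")` returns the full transcript, ending with the stub's final answer. The step budget is one simple way to keep a loop like this from running forever.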
Speaker 1 · 05:01 - 05:35
So I'm sitting here feeling this sense of what I've been thinking of as time deflation. My time gets more valuable in the future, as opposed to the opposite, where my time gets less valuable in the future. And the reason is, for example, internally we're building some agent products, where it's agents that do specific things for us internally and then hopefully for customers. And in order to do that, we have a couple of Mac minis with Claude running in a loop on the Mac mini.
Speaker 1 · 05:35 - 06:00
Right? And a lot of that is a thousand-line Python file or whatever. And a lot of that mirrors what you guys are building in Claude Managed Agents. And so for me, and I think for a lot of people building on Claude or on the Claude platform or ecosystem, at least I feel this: maybe we should just wait for you guys to build it. But then I don't know what the lines are.
Speaker 1 · 06:00 - 06:11
And, yeah, I'm sort of wondering: if I wanna build an agent, what is the best path to do that in a way that aligns with what you guys are doing?
Speaker 2 · 06:12 - 06:32
Yeah. I think this part of the platform business is actually somewhat similar to any other form of platform business, where you do have customers like yourself who are building and thinking, should I go ahead and do it, because maybe I have this immediate need? But at the same time, you don't wanna repeat the work when you could've just gotten it for free out of the platform. Yeah.
Speaker 1 · 06:33 - 06:40
And also infrastructure sucks. It does. Yeah. It sucks so much to spin up servers. I can't believe you do that all the time.
Speaker 1 · 06:40 - 06:40
That's
Speaker 3 · 06:42 - 07:17
like, have to be a big one. Everyone's like, that's so nice. But I will actually say, part of why we ended up building Claude Managed Agents was because Anthropic ourselves had gone through enough of these iterations where we built products that were agents that you could run autonomously in the cloud. And we stood up the infrastructure so that it works well enough times that we ourselves were like, okay, we're done building this for ourselves. We're doing it once, in a way that's gonna really work, from everything that we've learned, but also for all the people who are doing it.
Speaker 3 · 07:17 - 07:32
Like, you can run whatever you're running on a couple of Mac minis, maybe. Right? And for a lot of people, that could work. But I think if you're building agents into your product and you're running something really at scale, that's where it starts to become more and more challenging to get that infrastructure right.
Speaker 1 · 07:32 - 07:33
That's really interesting.
Speaker 2 · 07:33 - 07:56
Yeah. And then maybe to answer the other part of your question, I think we have two pieces of the philosophy here. One is in the way that we design Managed Agents, which is that we try to have it be modular enough. We wanna be opinionated about some pieces that we feel should be very well married to the Claude model. For example, we want Claude to very specifically use file systems.
Speaker 2 · 07:56 - 07:58
That's a very particular, like, Claude
Speaker 1 · 07:58 - 08:00
In a specific way or just file systems in general?
Speaker 2 · 08:00 - 08:13
Just file systems in general. We also really wanna lean into skills. I know a lot of folks like skills, but that's something where we want to have our harness be really opinionated. And so we're kinda particular about those primitives being the case. Use the file systems, use the skills.
Speaker 2 · 08:13 - 08:35
They're really basic. But at the same time, we still find people who are trying other methodologies to go do that. And we wanna help you start on the best foot when you build. So that's one piece, on some of the more opinionated ones. But for each one of these endpoints or APIs that we have as part of the suite, we try to open them up a little bit in certain areas.
Speaker 2 · 08:35 - 09:11
So there are things that we're looking forward to where maybe it's not available today, but in our design we're trying to make it flexible enough for people to add in different pieces, because we recognize that this API or suite of APIs is not necessarily going to solve everything in its original construct, and there are gonna be pieces that need to open up. And then the second bit, and we're kind of public about this, is that when we design a lot of these things, we put out blog posts and reference implementations. So if you wanted to at least be inspired by that construct, yeah, but still make your own on the Messages API, you can definitely do that.
Speaker 1 · 09:11 - 09:55
To the point you just made, that's something that's coming up for us. Again, we have Claudes running on a Mac mini with a Python file, and a couple of other bigger, more serious implementations on cloud infrastructure that we're trying to figure out what to do with. I told the team that we were talking today, and one of the questions they have, or one of the feelings of consternation they have about using Claude Managed Agents for spinning up agents for our customers, is that right now we have a playground. We just have a server or a Mac mini, and we can pipe stuff to Claude.
Speaker 1 · 09:55 - 10:06
It can do anything that Claude Code can do. It has a file system. It has a browser. It has all that stuff. If we wanna switch it out to GPT 5.5 or Gemini or whatever, it's pretty easy to do that.
Speaker 1 · 10:08 - 10:37
And I feel like they feel that if we use a Claude managed agent, we're gonna get locked in, and we're not gonna have the flexibility to do all the stuff that we want. And there's also a worry that features are going to come to Claude Code itself that won't be in Claude Managed Agents for a little while, and that it'll prevent us from being at the edge, which is sort of what we promise to our customers and really to ourselves. We just love doing whatever the new thing is. How do you think about that?
Speaker 3 · 10:37 - 11:04
Yeah. So I think what's nice about the way that we work internally is that we run the platform, and what most people think of as the platform is our externally facing suite of APIs. The rest of what our team does is internal platform, in the sense that all of our first-party products are built directly on the same platform as everybody else. And so what's cool about that is we spend all of our time, not all
Speaker 1 · 11:04 - 11:05
of our time, but a
Speaker 3 · 11:05 - 11:33
lot of our time working with the teams internally who are building on top of the platform, enabling the features that they will build, sharing ideas, and these sorts of things. And so I think over time you'll maybe see less and less divergence between what might be available in Claude Managed Agents and what might be available in Cowork or Claude Code, which might sit on top of the same infrastructure, right? That's one way to think about that. Yeah. And then I think,
Speaker 2 · 11:33 - 12:01
you know, on your point, or your team's point, around having some kind of model lock-in fear, I think that's valid. Many folks have that consternation. And I think we're at this place where there's a bit of an evolution, where if you look back, maybe even just a couple of months ago, it was very standard to build a very, very generic harness. It's super generic, and then you can hot swap models across all of those things. And I think for an older generation of models, across labs, that worked okay.
Speaker 2 · 12:01 - 12:30
A lot of things were moving at a pace where I think that was mildly reasonable. Now, for the next generation of models, and as we see it going forward, I think you see this a little bit from every lab: everyone's taking slightly different techniques and perspectives on how they want to advance their particular form of the model. And so in theory, I guess you could do the superset of all those things. But more often than not, when you build agents for your company or for your customers, you want to ultimately deliver an outcome for them.
Speaker 2 · 12:30 - 12:53
And so I think the level of abstraction of what you're actually hot swapping stops being this really generic harness with a hot-swapped model, and it gets more to where the harness and the model get very paired. You still need redundancy, and you still might want to use other models for things, but you probably do it at the layer of the agent, meaning the harness plus the model, rather than the other architecture of a really, really generic harness with everything hot swapped underneath.
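To make the "swap at the agent layer" idea concrete, here is a small sketch in which a harness and a model travel together as one unit, and redundancy means falling back to a different pair. Everything here is an assumption made for illustration: `Agent`, `run_with_fallback`, and the two stub models are hypothetical names, not anything described by the speakers or provided by a real library.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout; the "models" are stubs, not real clients.


@dataclass(frozen=True)
class Agent:
    """A harness tuned for one model, treated as a single swappable unit."""
    name: str
    harness: Callable[[str], str]  # shapes the task the way this model prefers
    model: Callable[[str], str]    # stub standing in for a model call

    def run(self, task: str) -> str:
        return self.model(self.harness(task))


def run_with_fallback(agents, task):
    """Try each agent in order, swapping the whole harness-plus-model pair."""
    errors = []
    for agent in agents:
        try:
            return agent.name, agent.run(task)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{agent.name}: {exc}")
    raise RuntimeError("all agents failed: " + "; ".join(errors))


def flaky_model(prompt):
    """Stub that always fails, to exercise the fallback path."""
    raise TimeoutError("primary model unavailable")


def backup_model(prompt):
    """Stub that simply echoes the prompt it was given."""
    return f"done: {prompt}"


primary = Agent("primary", lambda task: f"[files] {task}", flaky_model)
backup = Agent("backup", lambda task: f"[plain] {task}", backup_model)
```

With these stubs, `run_with_fallback([primary, backup], "summarize notes")` falls through to the backup pair. Note that the backup's own harness formats the task, so no generic harness sits above both models.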
Speaker 1 · 12:53 - 13:04
That's really interesting. Is that how, I don't know, the Cursors of the world are doing things? Do they have a separate harness for each model, or is it a generic harness that they're hot swapping the models in and out of? Do you know?
Speaker 2 · 13:05 - 13:31
I'm not entirely sure. I don't know about Cursor in particular, but there have been teams we've talked to who have landed on similar perspectives. And it's mostly because they're just trying to squeeze the most out of each model, to almost harness-engineer every single nuance. And one example that we have, it's not an external customer per se, but something that we've done a lot internally: we recently launched memory, for example, with Managed Agents.
Speaker 2 · 13:32 - 13:58
And we tried a bunch of different harnesses ourselves. We tried the one that we ended up launching, and we tried a bunch of others using different techniques. And, at least personally, when I saw the eval suite from the team, each one of these harnesses performed drastically differently. And so I think just looking at something like that shows you that you can actually hill climb a tremendous amount by just harness engineering the right pieces together.
Speaker 2 · 13:58 - 14:12
I think if you were to take that forward across all model combinations, across all the different labs and all the different kinds of providers, there is a lot of alpha in that kind of construct. And so I wouldn't be surprised if more than just ourselves have experimented with that level of, you know, unit tying.
Speaker 1 · 14:12 - 14:26
It's really interesting that there's this path dependence, where you make some choice for how you do requests and responses, or how you do tool calls, or whether you have the model use file systems or not. Yeah. And then that sort of changes the trajectory of
Speaker 2 · 14:26 - 14:36
all these different models. And it feels, maybe at the time, like such a small thing, almost like a footnote. Yeah. But it ends up becoming very big.
Speaker 1 · 14:36 - 15:02
Do you think that will end up affecting the model's generalizability, in the sense that at some point they'll just have these locked-in lanes of stuff that they're good at, because, you know, Claude is really good at file systems and OpenAI's GPT is good at some other things? How is that gonna flow through the model's personality and behavior if it's locked into a specific way of doing things?
Speaker 2 · 15:03 - 15:28
I do think it actually tends to lock the model. So what we end up treating as the right path and the right primitives needs to be very carefully thought through. I think in some eras of other models, they become really, really good at reasoning, and then they almost over-optimize on that level of reasoning. And there are other perspectives around, okay, yes, we want it to be really good at a computer.
Speaker 2 · 15:28 - 15:55
Like, maybe the computer part is the interesting part. And so if you think through some of the primitives, which we could get right or could get wrong, at least we'll go through the thought process, and that will probably lead us down one path or the other. I think it's hard to say which direction will ultimately be true, but I do think there's a lot of path dependency in where it ends up. So being really thoughtful about what you choose to include, or give the model more natively, is really important.
Speaker 1 · 15:55 - 15:57
Are there any of those path dependencies that you've had to undo?
Speaker 2 · 16:03 - 16:24
Probably. I can't speak enough about that at the Anthropic level. I've only been here a couple of months, but I have to imagine that has been the case. I mean, even at other labs, the primitives that we have to take a look at are constantly changing. And you do hit a little local maximum and rethink, like, okay.
Speaker 2 · 16:24 - 16:26
Maybe there's a more generic approach we could take.
Speaker 1 · 16:26 - 16:38
Yeah. Interesting. I wanna take a step back and ask you something that maybe I should have asked at the beginning, which is: who is Claude Managed Agents for? Right? I set one up earlier today.
Speaker 1 · 16:38 - 17:09
We've got some people already using it in production inside of Every, and I just did one today. I really loved the getting-started chat experience that you had and some of the examples that you had. And it felt to me like even if I was not technical, I might wanna use this to set up an agent. It might be a little bit complicated, but what I actually did, and I'm sorry to say this, is I did it in the Codex in-app browser. So I had Codex driving the managed agent setup, and I had a Slack bot working pretty quickly.
Speaker 1 · 17:09 - 17:15
It was really cool. So when you're designing Claude Managed Agents, how do you think about who it's for?
Speaker 3 · 17:15 - 17:34
Yeah. So it's interesting, because I think you're right, especially with that quick start experience, which we actually felt pretty strongly about launching, not specifically so that nontechnical people could go and build agents, but so that anybody, technical or not, could wrap their head around the primitives, like
Speaker 1 · 17:34 - 17:36
Here's what it can do and here's how it fits together.
Speaker 3 · 17:36 - 17:56
Exactly. Like, you know, the education portion of it. But when we think about who it's for, we think about a couple of different things. One is we're seeing people internally within companies build automation or build really powerful platforms or systems. We've seen people say, I want a full end-to-end software development platform.
Speaker 317:56 - 18:06
Right? And, like, managed agents is a perfect solution for something like that. Or, you know, I want to automate a little process over here where, like, legal has to review my marketing copy. Right? And things like that.
Speaker 317:56 - 18:06
对吧?像 managed agents 就是这类需求的一个完美解决方案。或者说,我想把这边一个小流程自动化,比如 legal(法务)需要审核我的 marketing copy(营销文案)。对吧?类似这样的事情。
Speaker 118:06 - 18:10
And so you shouldn't have to reimplement memory and, like Exactly. All that stuff
Speaker 118:06 - 18:10
所以你不应该每次做这些的时候,都还得重新实现 memory(记忆)之类的东西。没错。所有那些东西。
Speaker 318:10 - 18:39
every time you're doing that. Right. You can get started really quickly, and you can get something running quickly. The other user that's top of mind for us is people building this into the products that they expose to their customers. And so that's the other one where actually, yes, you do still want a lot of customization, you do still wanna make something that's gonna be really powerful for your product, but we still definitely believe that not spending your engineering resources on the infrastructure and on all the little harness engineering tweaking sort of stuff is
Speaker 318:10 - 18:39
每次做这种事的时候都不该如此。对吧。你可以非常快地开始,也可以很快让某个东西跑起来。另一个我们当前最关注的用户群体,是那些把这些能力构建进自己产品、并提供给客户的人。所以这是另一类用户:确实,你仍然会想要大量 customization(定制),你也仍然想做出一个对你的产品来说非常强大的东西,但我们依然非常、非常相信,不把 engineering resources(工程资源)花在 infrastructure(基础设施)以及各种零碎的 harness engineering(支撑工程)调优这类事情上,是更合理的。
Speaker 118:39 - 18:43
Why couldn't we have talked, like, a month ago? You would have saved so much time.
Speaker 118:39 - 18:43
我们为什么没能在一个月前就聊一聊呢?那样本来可以省下很多时间。
Speaker 318:46 - 18:47
We'll just
Speaker 318:46 - 18:47
我们还是得
Speaker 218:47 - 18:47
need to talk more.
Speaker 218:47 - 18:47
多聊聊。
Speaker 118:47 - 19:01
But I am sort of curious. Okay. So maybe infrastructure is one of these things. But when you see people setting up agents, what do you see them think the hard thing is, and what ends up actually being the hard thing? Are they the same?
Speaker 118:47 - 19:01
不过我确实有点好奇。好吧。所以也许 infrastructure(基础设施)就是其中一类问题。但当你看到人们搭建 agent(智能体)时,你觉得他们认为什么才是难点,而最后真正的难点又是什么?这两者是同一件事吗?
Speaker 319:01 - 19:18
Good question. Maybe this is, I don't know, spicy. I'm not sure, but I think people think the harness engineering part is the hard part. Mhmm. And so, actually, like, you know, in the past, we launched the Agent SDK, which is what you guys, I think, are using on your Mac minis.
Speaker 319:01 - 19:18
问得好。我——我也许,这个说法我不知道,可能有点 spicy(有争议)。我不太确定,但我觉得——我觉得人们会认为 harness engineering(封装/调度框架工程)这部分才是难点。嗯哼。所以实际上,比如说,之前我们发布了 Agent SDK,我想你们现在在 Mac mini 上用的就是这个。
Speaker 319:18 - 19:28
And a lot of people were like, okay. Great. I don't have to do the harness engineering part where I have to do prompt caching and I have to maximize my context window and all these sorts of things.
Speaker 319:18 - 19:28
对很多人来说,他们会觉得,好的,太棒了。我就不用去做 harness engineering(封装/调度框架工程)这部分了,不用自己处理 prompt caching,也不用去尽量扩大我的 context window(上下文窗口),以及诸如此类的事情。
Speaker 119:28 - 19:32
I think we're actually just using Claude in bash, like the claude -p command.
Speaker 119:28 - 19:32
我觉得我们其实只是直接在 bash 里用 Claude,比如 claude -p 这个命令。
Speaker 219:32 - 19:33
Oh, wow. Yeah. Okay.
Speaker 219:32 - 19:33
哦,哇。对。好吧。
Speaker 119:33 - 19:34
It's it's pretty good.
Speaker 119:33 - 19:34
它——它挺不错的。
Speaker 319:34 - 19:34
Yes.
Speaker 319:34 - 19:34
对。
Speaker 119:34 - 19:34
Yeah.
Speaker 119:34 - 19:34
对。
Speaker 319:34 - 19:41
Cool. Nice. Okay. Cool. But regardless, like, you guys did that because it takes building the harness off your hands.
Speaker 319:34 - 19:41
很酷。不错。好。很酷。不过不管怎样,你们这么做还是因为这样就不用自己去搭建 harness(封装/调度框架)了。
Speaker 319:41 - 20:28
Right? But I do think what we saw with a lot of customers was, okay, now I wanna go and take that thing and, like, get it into production and scale it, and everybody hits an infrastructure wall. Like, everyone hits the same problem of, like, oh wow, I either need to keep a server constantly running or I need to use something that will spin up and spin down, and I need to store the transcript data, and I need secure sandboxing, and all these sorts of things. So, you know, if you boot a Claude Code session or you boot the Agent SDK in a sandbox, and that's the thing that you have running, but your sandbox loses connection and dies or whatever, your whole agent dies, right? And so I think the infrastructure part especially is the wall that most people end up hitting, but they're expecting that the actual harness engineering, and getting the most out of the model, is the part that's gonna be harder.
Speaker 319:41 - 20:28
对吧?但我确实觉得,我们在很多客户身上看到的是:好,现在我想把那个东西真正投入 production(生产环境)并把它扩展起来,结果所有人都会撞上一堵基础设施墙。就像每个人都会遇到同样的问题:哦天哪,我要么得让一台 server(服务器)一直运行,要么就得用那种会 spin up 和 spin down(拉起和关闭)的方案;我还得存 transcript(对话转录)数据,还得有安全的 sandboxing(沙箱隔离),以及诸如此类的一堆事情。所以你知道,如果你启动了一个 Claude Code session,或者你在 sandbox 里启动了 Agent SDK,而那就是你正在运行的东西,但你的 sandbox 断连然后挂掉之类的,那你的整个 agent 也就跟着死掉了,对吧?所以我觉得,尤其是基础设施这一块,是大多数人最后都会撞上的那堵墙;但他们原本更以为,真正更难的部分会是 harness engineering(编排/驱动工程),以及如何把 model(模型)的能力尽可能榨出来。
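One way to picture the "sandbox dies, agent dies" problem described above is a transcript that lives outside the sandbox, so a fresh process can pick up where the dead one stopped. The following is only a minimal sketch under that assumption. `TranscriptStore`, `run_steps`, and the `crash_after` switch are hypothetical names invented for illustration, not part of any Anthropic API, and a real agent would record model turns and tool calls rather than plain strings.

```python
from __future__ import annotations

import json
from pathlib import Path


class TranscriptStore:
    """Append-only JSON-lines log kept outside the sandbox (hypothetical helper)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, event: dict) -> None:
        # One JSON object per line; the file is closed after every write.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def load(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def run_steps(store: TranscriptStore, steps: list[str], crash_after: int | None = None) -> list[dict]:
    """Run the steps not yet recorded in the store, then return the full transcript."""
    done = len(store.load())  # steps already persisted by an earlier run
    for i, step in enumerate(steps[done:], start=done):
        if crash_after is not None and i >= crash_after:
            raise RuntimeError("sandbox lost connection")  # simulated failure
        store.append({"step": i, "action": step})
    return store.load()
```

If the first call raises partway through, calling `run_steps` again with a store pointed at the same file skips the recorded steps and finishes the rest, which is the behavior a long-running async agent needs from its infrastructure.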
Speaker 220:28 - 20:44
Yeah. I totally agree with that. I was just gonna say, you know, we talk to so many people who are now at a place where they're prototyping really quickly and they're super excited and it's doing the thing. And then there's a class of people who are really pushing and being like, okay, I do want to hill climb. I really want to edit the harness.
Speaker 220:28 - 20:44
对,我完全同意。我刚才正想说,我们现在接触到太多人了,他们已经到了这样一个阶段:原型做得非常快,他们也特别兴奋,而且看起来它确实能做成那件事。但与此同时,也有一类人真的在更用力地往前推,他们会说:好,我确实想做 hill climb(爬坡式优化),我真的想去改 harness。
Speaker 220:44 - 20:59
But then once you have that thing, like productionizing is just a freaking nightmare, especially for the more interesting kind of long running async ones that you want to do a bit more remotely that are a bit more autonomous. And everyone kind of runs into that wall. It was a big inspiration for why we built what we built.
Speaker 220:44 - 20:59
但一旦你真的有了那个东西,把它 productionize(工程化并上线到生产环境)简直就是噩梦,尤其是那种更有意思的、长时间运行的 async(异步)agent:你希望它们能更远程一点、更自主一点。而且大家基本都会撞上那堵墙。这也是我们为什么会构建现在这套东西的一个重要灵感来源。
Speaker 120:59 - 21:51
I feel like one of the examples of the shape of an agent is OpenClaw. And in particular, the thing that it has brought to us internally is you have an always-on agent in Slack that has its own personality and has its own, like, part of the world that it ends up working on. Is that a possible future for you guys, like, okay, a one-click agent that lives in my Slack that, yes, I can go set up all the internals, but I don't have to really think about all of the, you know, technical infrastructure stuff? Because I think you all have the beginnings of that, but it's still a lot of steps from the current managed agent to something that's always on in my Slack that I have to set up and customize. So does that fall in the realm of platform's job, or is it too far in the product direction?
Speaker 120:59 - 21:51
我觉得,agent 的一种形态示例就是 OpenClaw。尤其是,它在我们内部带来的东西是:你有一个在 Slack 里始终在线的 agent,它有自己的个性,也有自己会去处理、会逐渐负责的那一部分世界。你们觉得,这会不会是一种可能的未来——比如,一个 one-click(一键式)agent,直接活在我的 Slack 里;没错,我可以去配置它内部的各种东西,但我其实不需要真正去操心所有那些技术基础设施方面的事?因为我觉得你们已经有了这方面的雏形,但从当前这种 managed agent(托管式 agent)到一个始终在线、存在于我 Slack 里、还需要我去设置和定制的东西,中间还是有不少步骤。所以这算不算是 platform(平台)职责范围内的事,还是说它已经太偏 product(产品)方向了?
Speaker 221:52 - 22:13
No. It definitely is something that we really want to do. I think, you know, we focused a lot on the infrastructure piece to start because that's where we see a lot of these pain points. But, yes, I don't wanna say exactly final shape, but in its, like, advanced shape, we actually want to make it so that you can deploy these agents really, really easily. Like, we made some light steps in this direction.
Speaker 221:52 - 22:13
不,这绝对是我们非常想做的事情。我觉得,我们一开始把很多精力放在基础设施这一块,主要是因为我们确实在这里看到了很多这样的痛点。但对,怎么说呢,我不想把它直接称为最终形态,不过在它更成熟、更高级的形态里,我们其实确实想做到:让你可以非常非常轻松地部署这些 agents。我们已经朝这个方向做了一些比较轻量的尝试。
Speaker 222:13 - 22:16
Like, for example, we included vaults as one of the primitives as just kind of
Speaker 222:13 - 22:16
比如说,我们把 vaults 也纳入了 primitives(基础构件)之一,算是一种……
Speaker 122:17 - 22:20
And vaults store your, like, keys and stuff, like your OAuth keys?
Speaker 122:17 - 22:20
所以 vaults 存的是你的那些 key(密钥)之类的东西?比如你的 OAuth keys?
Speaker 322:20 - 22:21
Credentials.
Speaker 322:20 - 22:21
Credentials(凭证)。
Speaker 222:21 - 22:50
Credentials. Yeah. As, like, you know, kind of solving some of the lower-level pieces as a starting point. But once you wrap some of these more agent-identity type of primitives in a more secure way, and you can handle it really easily, and it works with the whole system, then, you know, I think it's very natural for us to get to a place where maybe you are either one-clicking a Slack integration or, alternatively, even just telling Claude, like, add Slack, and it just handles absolutely everything. And then before you know it, your little bot is just pinging you on Slack.
Speaker 222:21 - 22:50
凭证。对。算是先从解决一些更底层的部分开始。但一旦你能以一种更安全的方式,把这类更偏 agent identity(agent 身份)这一类的 primitives(基础构件)封装起来,而且可以非常容易地处理它,并且它还能和整个系统配合工作,那么我觉得,很自然我们就会走到这样一个阶段:也许你只需要一键完成 Slack integration(集成),或者甚至你只要直接告诉 Claude,比如说,加上 Slack,然后它就会把所有事情全部处理好。接着不知不觉间,你那个小 bot(机器人)就已经会在 Slack 上 ping 你了。
Speaker 122:51 - 23:08
I love it. I can't wait for that world. What are the best internal use cases of agents? Because I think there's this big question happening right now where, okay, yeah, everyone's in Codex or Claude Code, but now we have these agents that are out in the cloud. Now everyone inside of a company can, like, have their own agent.
Speaker 122:51 - 23:08
我很喜欢。我已经等不及那样的世界到来了。agent 最好的内部 use case(使用场景)是什么?因为我觉得现在有一个很大的问题正在发生:好吧,没错,大家都在用 Codex 或者 Claude Code,但现在我们又有了这些运行在 cloud(云端)里的 agents。现在公司里的每个人,感觉上都可以拥有自己的 agent。
Speaker 123:08 - 23:17
There are team agents. There are company wide agents. So what are the patterns that you see for when people make really useful internal agents, what they do and what they look like?
Speaker 123:08 - 23:17
有团队级的 agents。也有公司范围的 agents。所以你观察到的模式是什么?就是说,当人们做出真正有用的内部 agents 时,他们通常会做什么,这些 agents 看起来又是什么样的?
Speaker 323:17 - 23:38
Yeah. I would say we've actually seen a few examples of these in some of the more, like, AI-pilled, AGI-pilled companies. Like, Stripe built Minions, and they talked about that a lot as their kind of end-to-end development platform that their engineers could use. I think Ramp did something similar, and we've done similar things as well. Right? Interesting.
Speaker 323:17 - 23:38
对。我会说,这和——而且我们其实已经在一些更偏 AI pilled、AGI pilled 的公司里看到过几个这样的例子——比如 Stripe 做了 Minions,他们也经常谈这个,把它当作工程师可以使用的一种端到端开发平台。我觉得 Ramp 也做了类似的东西,而我们自己也做过类似的事情。对吧?挺有意思的。
Speaker 323:38 - 23:53
Yeah. We've built kind of platforms internally where, you know, I have agents running that I can talk to from Slack or from wherever. Right? And at a certain point, that becomes actually, like, a pretty thin layer on top of managed agents. Like, you don't have to do very much to accomplish
Speaker 323:38 - 23:53
对。我们在内部构建过这类平台,基本上就是——你知道——我有一些正在运行的 agents,我可以从 Slack 或者别的地方和它们交流。对吧?而到了某个阶段,那实际上就变成了托管型 managed agents(受管 agents)之上的一个相当薄的薄层。你并不需要做很多事情就能实现
Speaker 123:54 - 24:07
That's what I was thinking. Like, I looked at Minions or whatever Ramp does, and I was like, why? You know? So is it actually useful to have a sort of thin coding agent that anyone in the company can use, or, like, why not just install the Claude app in Slack?
Speaker 123:54 - 24:07
这也是我在想的。比如我看了 Minions,或者 Ramp 在做的那些东西,我当时就在想,为什么?你懂吧?所以,做一个那种任何公司员工都能使用的、比较薄的一层 coding agent(编码 agent),它真的有用吗?或者说,为什么不直接在 Slack 里安装 Claude app?
Speaker 324:08 - 24:23
Yeah. I would say the difference with a platform like that, and some of the things that we've done internally, is there's a lot of customization that you might wanna do on, you know, the development environment where an agent is actually running and able to verify its changes, right, and things like that.
Speaker 324:08 - 24:23
对。我会说,这样的平台和我们内部做过的一些东西之间的区别在于,你可能会想做很多定制化,尤其是在 agent 实际运行、并且能够验证自己改动的 development environment(开发环境)上,对吧?以及诸如此类的事情。
Speaker 124:23 - 24:25
It's like, here's how our CI/CD works.
Speaker 124:23 - 24:25
比如说,这就是我们的 CI/CD 是怎么运作的。
Speaker 324:25 - 24:35
Yeah. Exactly. And so, you know, I think for lots and lots and lots of people, Claude Code is an excellent tool. Right? And you can run Cloud Agents with Claude Code, and that is really great.
Speaker 324:25 - 24:35
对,完全没错。所以,你知道,我觉得对非常非常多的人来说,Claude Code 是一个很棒的工具,对吧?而且你可以用 Claude Code 运行 Cloud Agents,这确实非常好。
Speaker 324:35 - 24:49
But I think if you're trying to do a little bit more end to end development, right, and you maybe wanna bake in more custom things, then you could start with something like managed agents and build a layer on top of that and end up with something that's maybe closer to that end to end experience.
Speaker 324:35 - 24:49
但我觉得,如果你想做更偏端到端的开发,对吧,而且你可能还想把更多自定义的东西内建进去,那你就可以从 managed agents 这样的东西开始,在它上面再搭一层,最后做出一个可能更接近那种端到端体验的东西。
Speaker 124:49 - 25:07
It also seems to me like there's something in particular about having a team that you need to work with that makes the managed agent shape important, as opposed to it just all works in Claude Code. Like, I guess, technically, you could sync the skills between everyone's Claude Code, but there's something about just, we all have one agent that does this thing, that seems to work.
Speaker 124:49 - 25:07
另外我也觉得,和一个团队协作这件事本身,似乎会让 managed agent 这种形态变得重要,而不是说一切都只在 Claude Code 里运转就行。比如我猜从技术上说,你当然可以在每个人的 Claude Code 之间同步 skills,但就是那种“我们大家共同拥有一个负责这件事的 agent”的方式,感觉确实更有效。
Speaker 225:07 - 25:39
Yeah. I'm really glad you brought that one up because I think that's actually one of the more common areas where we see a lot of the opportunity. To your point, you know, there's a lot of individual productivity that's happening. Whether you're a developer or non-developer, there's so many tools that you're using to make yourself more automated, more, you know, high leverage. But then when you get to the team layer, suddenly everything gets massively more complex. Like, number one, obviously, you can't just have it sit on your laptop. And yes, you could maybe, you know, put it in the cloud, but it's again more for yourself, to kind of handle things with your laptop closed.
Speaker 225:07 - 25:39
对,我很高兴你提到这一点,因为我觉得这其实正是我们看到机会最多的更常见场景之一。就像你说的,现在有很多个人生产力层面的提升正在发生,不管你是 developer 还是非 developer,你都在使用很多工具,让自己变得更自动化、更 high leverage(高杠杆)。但一到了团队这一层,事情突然就会变得复杂得多。首先很明显,你不能只是把东西放在自己的 laptop 上运行。是的,也许你可以把它放到云端,但那很多时候也还是更偏向于你个人使用,相当于即使把 laptop 合上,它也还能继续帮你处理事情。
Speaker 225:39 - 26:12
But then you go to, like, okay, well, now the three of us want, you know, a couple agents that interface with each other and work with each other. And then maybe we're automating a process kind of end to end. And especially for some of the more complex processes that you kind of envision being really transformed with AI, you do need that kind of team orientation. And that needs to happen at a layer that's a slightly higher bit of abstraction than just a single agent. And I think some of the teams exploring, you know, multi-agent architectures and things like that are really exciting, but it needs to be built on top of a little bit of, like, a platform that everyone can kinda spin up and down and control.
Speaker 225:39 - 26:12
但接下来就会变成:好,现在我们三个人想要几个彼此对接、彼此协作的 agents,然后也许我们还想把一个流程端到端地自动化。尤其是对于那些你设想会被 AI 真正深度改造的复杂流程来说,你确实需要那种面向团队的设计。而这需要发生在一个比“单个 agent”更高一层的抽象层上。我觉得一些在探索 multi-agent architectures(多 agent 架构)之类方向的团队真的很令人兴奋,但这需要建立在某种平台之上,而且这个平台得让所有人都能比较方便地 spin up and down(启动与关闭)并进行控制。
Speaker 226:12 - 26:36
And I think Jeev from Vercel had a really good perspective on this. I think his company, Vercel, is obviously incredibly AI-pilled, and he kinda describes it as sort of like an AI software factory internally. And I think that's exactly the right mindset, and that produces, you know, an extremely high-leverage organization that's really just creating a tremendous amount of productivity, not just for themselves, but for every single process that they have in the company.
Speaker 226:12 - 26:36
我觉得 Vercel 的 Jeev 在这一点上有个很好的视角。我理解他的公司 Vercel 显然是非常 AI-pilled 的,他会把公司内部形容成某种 AI 驱动的软件工厂。我觉得这正是一种正确的思维方式,而且它会产出一个 high leverage(高杠杆)的组织,真正创造出惊人的生产力提升,而且不只是让某几个人自己受益,而是覆盖公司里的每一项流程。
Speaker 126:36 - 26:46
Mhmm. And I I really wanna go back to this, like, okay, agent use cases. We've got coding agents that that anyone can use in the company. Like, what are the other ones that are that you see people standing up that are really useful?
Speaker 126:36 - 26:46
嗯,我还真想回到这个话题:好,agent 的 use cases(使用场景)。我们已经有公司里任何人都能使用的 coding agents,那你还看到哪些人们正在搭建、而且确实非常有用的 agent?
Speaker 326:47 - 27:15
We've seen a few. So one of the fun things that we get to do is just kind of work with our internal teams of different functions and help them identify use cases, because we actually get to learn a lot as a result of doing that. And so the silly example I brought up earlier of, like, legal team needs to review marketing copy was one of those. Very real. Yeah. Like, extremely real, and it really blew people's minds with, like, very basic agents that just give people the right setup to be able to do that. So you've seen that.
Speaker 326:47 - 27:15
我们已经看到了一些。我们能做的一件很有意思的事,就是和内部不同职能的团队合作,帮助他们识别这些场景,因为在这个过程中我们自己其实也学到了很多。所以我前面举过的那个看起来有点好笑的例子——legal team 需要审核 marketing copy——其实就是其中之一。它非常真实。对,非常非常真实。而且仅仅是一些很基础的 agents,只要给人们提供正确的 setup(配置),就已经足以让很多人大开眼界。所以,这类场景你确实已经能看到了。
Speaker 127:15 - 27:24
Well, what does that actually do? So it's like there there's marketing copy and there's a legal agent that is just like watching what everything marketing does and is like, stop. Like No. Yeah. How does it work?
Speaker 127:15 - 27:24
嗯,那这实际到底是做什么的?所以它是不是有点像:一边是 marketing copy(营销文案),另一边有个 legal agent(法务 agent)一直盯着 marketing 做的所有事情,然后一看到就说,停。不行。对吧。它是怎么运作的?
Speaker 327:24 - 27:38
It is more like, okay. I'm a marketer and I've written some copy. Right? And in the past, maybe you would have opened a ticket or something and be like, can you please review this copy? But instead, you submit it to this, like, you know, little app that we built on top of agents that is like, okay.
Speaker 327:24 - 27:38
更像是这样:好,我是个 marketer(营销人员),我写了一些文案。对吧?过去你可能会提个 ticket(工单)之类的,说,能不能帮我 review(审查)一下这份文案?但现在不同了,你把它提交到这个——你知道的——我们基于 agents 搭的一个小应用里,它会接手处理。
Speaker 327:38 - 27:55
Cool. Now I'm gonna go, as an agent, review it first and then put it in legal's inbox with a first-pass review already done. And maybe, actually, the agent is clear enough that it can say, okay, marketing, you're good. Right? Or maybe it's still like, no, this needs an extra human review.
Speaker 327:38 - 27:55
很好。现在我会先以一个 agent 的身份做 first pass review(第一轮审查),然后再把它放进法务的收件箱,这样等于已经先完成了第一轮审核。甚至有时候,agent 的判断已经足够明确,它可以直接说,好,marketing,你这版没问题。或者它也可能还是会说,不行,这个还需要额外的人类审核。
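The review flow described above (agent first pass, then either a direct approval or a hand-off to legal with the first pass attached) can be sketched as a small routing function. This is a toy illustration only: `Review`, `route_copy`, and `keyword_reviewer` are hypothetical names, and the keyword check merely stands in for a real reviewer agent that would call a model with the team's rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Review:
    approved: bool   # did the first pass find the copy acceptable?
    confident: bool  # is the agent sure enough to skip a human?
    notes: str       # what the agent wants legal to see


def route_copy(copy: str, first_pass: Callable[[str], Review], legal_inbox: List[Dict[str, str]]) -> str:
    """Run the agent's first-pass review and escalate to humans when needed."""
    review = first_pass(copy)
    if review.approved and review.confident:
        return "approved"
    # Anything rejected or uncertain lands in legal's inbox with the agent's notes attached.
    legal_inbox.append({"copy": copy, "agent_notes": review.notes})
    return "needs_human_review"


def keyword_reviewer(copy: str) -> Review:
    """Toy stand-in for the reviewer agent: flags a few risky phrases."""
    risky = [w for w in ("guarantee", "risk-free") if w in copy.lower()]
    if risky:
        return Review(False, True, "risky claims: " + ", ".join(risky))
    return Review(True, True, "no flagged claims")
```

The point of the shape is that the human stays in the loop only for the cases the agent cannot clear on its own, and legal receives the agent's notes rather than a bare request.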
Speaker 327:55 - 28:07
And so yeah. It's just and that's the sort of thing where, again, just thin layer on top, but you can build the, you know, you have access, I have access, we can both see the outputs and we can work together on that.
Speaker 327:55 - 28:07
所以,对,基本就是这样。而这类东西说到底还是那种——再说一次——上面加一层很薄的 layer(层),但你可以把它搭出来;你能访问,我也能访问,我们都能看到输出结果,也可以围绕它一起协作。
Speaker 128:07 - 28:09
Okay. But then, so for example, why is that not a skill?
Speaker 128:07 - 28:09
好,但那比如说,为什么这不算一个 skill(技能)呢?
Speaker 328:10 - 28:48
So it very much can be a skill, and actually you would probably build that agent as a legal reviewer agent, right? And so you would have MCP servers or whatever it is that help you access external context. You would have skills that help you understand, here's what rules we have to follow and not follow, right, and all those things. You put all those things together, but then you can just fire off a session with that agent. And then I think the last piece you need, and this is where I'm saying it's a really thin layer, is just the form factor on top where different people can collaborate together and work with that agent, and multiple agents can be involved in the system.
Speaker 328:10 - 28:48
所以它当然可以,而且很大程度上也确实可以,被做成一个 skill;实际上,你大概率会把那个 agent 做成一个 legal reviewer agent(法务审查 agent),对吧?然后你会有 MCP servers,或者别的什么机制,来帮助它访问外部 context(上下文)。你还会有 skills,帮助它理解:这是我们必须遵守的规则,那是我们不能违反的规则,对吧,以及所有这些东西。你把这些要素都组合起来,然后就可以直接启动一个这个 agent 的 session(会话)。然后我觉得最后还需要的一块——这也是我说它只是很薄一层的原因——就是上面的 form factor(交互形态):让不同的人能够一起协作、一起跟那个 agent 工作,而且整个系统里还可以同时有多个 agents 参与。
Speaker 328:49 - 29:00
And so I think it goes a little bit broader than a skill because you kinda still need, like, the right form factor for the agent to be able to go run and then for people to be able to interact with it.
Speaker 328:49 - 29:00
所以我觉得它比一个 skill 的范围要更宽一点,因为你多少还是需要合适的 form factor,agent 才能真正跑起来,而人也才能跟它进行交互。
Speaker 229:00 - 29:30
Another core bit of why it's not a skill, or not exclusively a skill, is because you actually do need a human in the loop. And so, like, if you were to automate the whole thing and you were just, you know, taking the skill and looking at it yourself with, like, a legal skill, for example, in that world, of course, you could have just done a pure skill. But if you need a human in the loop to be like, okay, I want to review and I do want to check, and, like, we're looking at legal things, and there's a bit of, you know, authentication that's sort of necessary, then in order to automate that entire process, you kind of need agents to go do the thing.
Speaker 229:00 - 29:30
另一个核心原因,说明它不只是一个 skill,或者说不完全是一个 skill,在于你实际上确实需要 human in the loop(人类参与闭环)。所以如果你把整个流程全自动化了,只是——比如说——把它当成一个 legal skill(法务技能)来看,那在那个世界里,你当然完全可以只做成一个纯 skill。但如果你需要一个人类参与进来,说,好,我要 review,我确实要检查,而且这处理的是法务相关的事情;这里面还会有一些某种程度上必需的 authentication(身份验证)之类的要求。要把整个流程自动化,你基本上就需要 agents 去实际把这些事情跑起来。
Speaker 229:30 - 29:37
And so because you need to spin up sort of separate sessions for that to happen, some sort of stitching is necessary that can't be instantiated in a single skill.
Speaker 229:30 - 29:37
所以,因为你需要为这件事启动某种彼此独立的 session(会话),就必须做某种 stitching(编排/衔接);而这种东西没法在单个 skill(技能)里被实例化。
Speaker 129:37 - 29:44
That's really interesting. Yeah. Okay. So just to push on that a little bit. So what is the best practice for you?
Speaker 129:37 - 29:44
这真的很有意思。对。好吧。那我就顺着这个再追问一点。对你来说,最佳实践是什么?
Speaker 129:44 - 30:06
You create an agent that its job is to make sure that when marketing is writing something, they can get it approved really quickly by legal. And sometimes it'll approve things immediately. Sometimes it sends stuff to legal. And ideally, it's like getting better all the time so it can do more and more, right? What is the best practice for who owns that agent once it's built?
Speaker 129:44 - 30:06
你创建了一个 agent(智能体),它的工作是确保当 marketing(市场团队)在写东西时,能很快拿到 legal(法务)的批准。有时候它会立刻批准,有时候它会把内容发给 legal。理想情况下,它应该是一直在不断变好的,这样它能做的事情就越来越多,对吧?那这个 agent 一旦建好,最佳实践是谁来负责它?
Speaker 130:06 - 30:31
Because one of the things that we found is if you don't have a human who's responsible for the agent, it gets stale very quickly, and then it ends up being kind of this, like, dead thing that's just out there doing stuff, but it's not necessarily good. And also, even if it kinda works, there are gonna be all these times where legal's like, you asked me to approve this, but I don't really need to approve this thing. Like, let's update your prompt. So, like, how does that all work when it works well?
Speaker 130:06 - 30:31
因为我们发现的一件事是,如果没有一个人类明确对这个 agent 负责,它很快就会变 stale(过时、失效),最后就会变成某种“死掉”的东西,到处在运行、在做事,但不一定做得好。而且,即使它大致能用,也总会出现很多这种情况:法务会说,你让我审批这个,但其实这个东西我根本不需要审批。那我们就来更新一下你的 prompt(提示词)。所以,当这套机制运转良好时,这一切通常是怎么运作的?
Speaker 330:31 - 31:08
So it's actually really interesting, because the form factor thing, right, like the app that sits on top of that that we originally built, one of our teams worked on that, right, kinda sitting with these teams and understanding what they needed. And they were kinda like, okay, here you go, and we're gonna go do other stuff now, and let us know how this goes for you. And then a really cool thing actually ended up happening where people on those teams who were using the tool were like, I wish this little thing could get tweaked or this thing could get better. And they, like, popped open Claude Code and made some of the changes themselves. Yeah. To the actual, and so it's funny.
Speaker 330:31 - 31:08
这其实特别有意思,因为那个 form factor(产品形态)的问题,对吧——就是我们最初构建的、覆盖在它上面的那个 app(应用)——最开始是我们其中一个团队在做,对吧,他们会和这些团队坐在一起,了解他们需要什么。然后他们有点像是在说,好,给你们了,我们现在去做别的事情了,之后你们用得怎么样再告诉我们。结果后来实际上发生了一件很酷的事:那些团队里在使用这个工具的人会说,我希望这个小地方能调整一下,或者那个地方能变得更好一点。然后他们就直接打开 Claude Code,自己做了一些修改。对,是真正在那个东西本身上做了修改,所以这就很有意思。
Speaker 131:09 - 31:12
Is your team responsible for approving the PR? Or does it just, like, go in?
Speaker 131:09 - 31:12
负责批准 PR(pull request,拉取请求)的是你们团队吗?还是说它就直接合进去了?
Speaker 331:12 - 31:39
Usually, my team's responsible for reviewing the PR if it's a system that we actually own. But yeah, people can kind of self-serve making changes to those things, which I think is really cool. So I do think we're still in a stage for a lot of teams and a lot of companies, like, even going back to, you know, Stripe has Minions, right? Stripe has a large developer productivity team. We used to work at Stripe, so we spent a lot of time with them, but they have a large developer productivity team.
Speaker 331:12 - 31:39
通常,如果那是一个确实由我们拥有的 system(系统),那负责 review(审查)PR 的会是我的团队。不过,是的,大家某种程度上可以 self serve(自助式)去修改这些东西,我觉得这真的很酷。所以我确实认为,对很多团队和很多公司来说,我们现在仍然处在这样一个阶段——就算说回 Stripe 的例子,Stripe 有 minions,对吧?Stripe 有一个很大的 developer productivity(开发者生产力)团队。我们以前在 Stripe 工作过,所以和他们相处了很多时间;而且他们确实有一个很大的 developer productivity 团队。
Speaker 331:39 - 32:01
They're awesome, and they're obviously putting a lot of work and energy into building platforms and tools like this. And so I think we're definitely still in a place where something like managed agents or being able to build on top of our platform is really powerful, but you still kind of need the AI pilled people and technical people within a business to then go create something really excellent on top of that that works well for whatever you're trying to do.
Speaker 331:39 - 32:01
他们很棒,而且显然投入了大量工作和精力来构建这类 platform(平台)和工具。所以我觉得,我们现在显然仍然处在这样一个阶段:像 managed agents(托管式 agent)这种东西,或者能够基于我们平台进行构建,这些能力都非常强大;但在企业内部,你仍然某种程度上需要那些 AI pilled(非常认同并投入 AI 的)的人和技术人员,再在这之上做出真正优秀、并且适合你要解决的问题的东西。
Speaker 132:01 - 32:30
That's interesting. Yeah. I love that anyone can open a PR to do this because everyone's using Claude Code. One of the things that I find talking to people who are in infrastructure roles at companies where this is starting to happen is, like, you know the meme where there's a person, and he's going like this, and he has daggers in his back, and he's like, cover it. It's like infrastructure people are that, but now anyone can submit PRs.
Speaker 132:01 - 32:30
挺有意思的。是啊。我很喜欢这种“任何人都可以开一个 PR 来做这件事”的模式,因为大家都在用 Claude Code。我在和一些公司的基础设施岗位的人聊、而且他们所在公司开始出现这种情况时,发现有一点特别有意思:你知道那个 meme 吧——有个人像这样比划着,背上插满了匕首,还在说“cover it”。基础设施团队的人基本就是那种状态,只不过现在变成了几乎任何人都可以提交 PR 了。
Speaker 132:30 - 32:56
Yeah. How do you deal with that, and how do you do that well? Because, obviously, in an ideal world, you would love for legal to be able to submit to improve this agent. And, also, sometimes they're probably gonna submit stupid stuff that wastes time. And so what are the right ways, either organizationally, like, culturally, or technically, to make that possible without ruining your lives?
Speaker 132:30 - 32:56
对。那你们怎么应对这个问题?又怎么把它做好?因为很显然,理想情况下,你当然希望法务团队也能提交内容来改进这个 agent(智能体)。但同时,他们有时候大概率也会提交一些很蠢、纯属浪费大家时间的东西。所以问题就在于:无论从组织层面、文化层面,还是技术层面,正确的做法是什么,才能既让这件事成为可能,又不至于把你们的日子彻底搞乱。
Speaker 232:56 - 33:43
For this particular one that we've constructed, that Caitlin's given as an example, we actually have a couple layers of abstraction away from that kind of PR layer. So at the very beginning, it kind of started that way, and to basically prevent users from footgunning themselves a little bit, they get to a place where oftentimes their way of interacting with the agent that they own, whether it's the marketing team who owns the marketing agent doing the requesting, or the legal team owning the agent that does the review, is through Claude itself. So they actually spend more of their time talking directly to Claude. And then Claude will oftentimes figure out what should be the right way for them to go and handle it, so that they're not, you know, hopping straight down to the absolute core bit and doing something that may result in some complication.
Speaker 232:56 - 33:43
就 Caitlin 举的这个具体例子来说,我们实际构建的是一个距离那种 PR 层还有几层抽象的体系。所以最开始它某种程度上确实是那样起步的;后来为了基本上防止用户有点“搬起石头砸自己的脚”,我们把它做成了这样一种状态:很多时候,用户和他们自己“拥有”的 agent 交互的方式——不管是拥有 marketing agent(营销智能体)的市场团队来发起请求,还是拥有负责 review(审查)的 agent 的法务团队——其实都是通过 Claude 本身来与这些 agent 交互。所以他们更多时间是在直接和 Claude 对话。然后 Claude 往往会判断,对他们来说正确的处理方式应该是什么,这样他们就不会直接一头扎进最核心的底层部分,做出一些可能导致复杂问题的操作。
Speaker 133:43 - 33:47
And they're talking to Claude or Claude Code? Like, Claude Chat or Claude Code or Cowork?
Speaker 133:43 - 33:47
那他们是在和 Claude 说话,还是和 Claude Code 说话?比如是 Claude Chat,还是 Claude Code,还是 Cowork?
Speaker 233:47 - 34:05
It's a different instantiation of Claude that we made that actually is a managed agent in and of itself. So it's just kind of like managed agents all the way down in that construct. But we found that at each layer, if we kind of tune and prompt each variant of the managed agent, it helps to solve, you know, different parts of the problem for users.
Speaker 233:47 - 34:05
这是我们做的另一种 Claude 的实例,实际上它本身也是一个 managed agent(托管式智能体)。所以本质上有点像是一层套一层的 managed agents,在这个构造里一路往下都是这样。但我们发现,如果我们对每一层、对 managed agent 的每一种变体分别做调优和 prompt(提示词)设计,它就能帮助用户解决这个问题中不同的部分。
Speaker 234:05 - 34:24
So at the end state for, you know, that marketing person or that legal person, it is a really simple interface where the way that we tell them is, like, you're just talking to Claude. But under the hood, it's many, many Claudes engaging with each other to get to the part where the Claudes themselves are doing the more complex work that the human doesn't really necessarily need to interpret.
Speaker 234:05 - 34:24
所以对于最终状态下的那个市场人员或者法务人员来说,这其实是一个非常简单的界面,我们告诉他们的方式就是:你只是在和 Claude 对话。但在底层,其实是很多很多个 Claude 在彼此协作,最终走到那个阶段——也就是由 Claude 们自己去完成那些更复杂的工作,而这些工作并不一定是人类必须亲自理解或解析的。
Speaker 134:24 - 34:30
Interesting. You guys just launched multi-agent orchestration. What are the coolest things that people are doing with that?
Speaker 134:24 - 34:30
有意思。你们刚刚发布了 multi-agent orchestration(多智能体编排)。大家拿它在做的最酷的事情有哪些?
Speaker 234:32 - 35:05
One of the more interesting ones is, I think people are using it to construct different harness techniques. And that one I'm personally very excited by, because there's different techniques that people have experimented with where, you know, for example, we recently did the advisor strategy one. But really, if you were to genericize it, you just separate execution from advice. And there's also one where you can have, you know, two modes where one is generating something and the other one's adversarial to it. And then there could also be one where, you know, you split it into a bunch of different little tiny pieces and then they kind of recombine.
Speaker 234:32 - 35:05
其中一个更有意思的方向是,我觉得大家在用它构建各种不同的 harness(测试/驱动框架)技术。这个我个人特别兴奋,因为人们已经尝试过很多不同的方法。比如说,我们最近做过 advisor strategy(顾问策略)那个。但如果把它泛化一下,本质上就是把 execution(执行)和 advice(建议)分离开来。还有一种做法是你可以设置两种模式:一种负责生成某个东西,另一种则对它采取 adversarial(对抗式)角色。再比如,也可以把任务拆成很多很多不同的小块,然后再把它们重新组合起来。
Speaker 235:06 - 35:23
And then there's ones where maybe it's kind of something closer to, like, a best-of-N kind of, like, style of thing. And then there's so many more. And, like, in each one of these different types of, like, architectures or strategies, they are good for very specific use cases. So some of them are much better for, like, deep research or wide research style use cases. Right?
Speaker 235:06 - 35:23
然后还有一些可能更接近某种 best-of-N 风格的东西。还有很多很多种。而且,在这些不同类型的架构(architecture)或策略(strategy)里,它们各自都适合非常具体的 use case(使用场景)。所以其中一些会更适合做 deep research(深度研究)或 wide research(广度研究)这类场景,对吧?
Speaker 235:23 - 35:50
And there are others that are, like, the kind of ones where they all sort of swarm together, which are better for bug hunting, for example. And so, like, that's, like, really cool to see that, like, if we can make the primitives very LEGO-like, then people can put them together to solve things at a slightly higher form factor, which is more like an architecture or like a strategy. And they get much more, like, interesting results out of that. And that's, like, really exciting to see because it also suggests that you can actually hill climb at multiple layers of abstraction.
Speaker 235:23 - 35:50
还有另外一些,则更像是那种大家一起 swarm(群体协作)起来的方式,比如说,会更适合用于 bug hunting(漏洞/缺陷排查)。所以,看到这一点真的很酷:如果我们能把这些 primitives(原语)做得像 LEGO 一样,那人们就可以把它们组合起来,以一种稍微更高层次的 form factor 来解决问题,这更像是一种 architecture(架构)或 strategy(策略)。而他们也会从中得到更有意思的结果。这一点真的很令人兴奋,因为它也说明,你实际上可以在多个 abstraction(抽象)层上进行 hill climb(爬坡式优化)。
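The strategies named in this answer (best-of-N, a generator paired with an adversary, and split-then-recombine) can be sketched as plain function compositions. This is only an illustrative sketch, not the platform's API: the `Agent` type, the function names, and the toy agents are all hypothetical stand-ins, and a real agent call would replace each lambda.

```python
from typing import Callable, List

# Hypothetical stand-in: an "agent" is any function from a prompt to a text answer.
Agent = Callable[[str], str]


def best_of_n(candidates: List[Agent], task: str, score: Callable[[str], float]) -> str:
    """Run every candidate agent on the same task and keep the highest-scoring answer."""
    answers = [agent(task) for agent in candidates]
    return max(answers, key=score)


def generate_with_adversary(generator: Agent, adversary: Agent, task: str, rounds: int = 2) -> str:
    """One agent drafts, a second agent attacks the draft, and the generator revises."""
    draft = generator(task)
    for _ in range(rounds):
        critique = adversary(draft)
        if not critique:  # an empty critique means the adversary found nothing to attack
            break
        draft = generator(f"{task}\nAddress this critique: {critique}")
    return draft


def split_and_recombine(worker: Agent, pieces: List[str], combine: Callable[[List[str]], str]) -> str:
    """Fan a task out into small pieces, solve each independently, then merge the results."""
    return combine([worker(piece) for piece in pieces])


if __name__ == "__main__":
    # Toy agents only; they do no real work.
    print(best_of_n([lambda t: "ok", lambda t: "a longer answer"], "summarize", score=len))
    print(split_and_recombine(str.upper, ["a", "b"], "".join))
```

Because each strategy is just a function over agents, they can be nested, for example running `best_of_n` over several `generate_with_adversary` pipelines, which is one way to read the "LEGO-like primitives" remark.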
Speaker 135:50 - 35:54
How do you know if an agent is successful? How do you measure success for an agent?
Speaker 135:50 - 35:54
你怎么判断一个 agent(智能体)是否成功?你又如何衡量一个 agent 的成功?
Speaker 235:54 - 36:21
Yeah. I mean, there's, like, evals and stuff like that, which everyone has talked about, like, ad nauseam. One direction that we really like is, like, this kind of verifiable outcome. We've been somewhat opinionated on that one, and it's almost like in the absolute end state of, you know, we talked a little bit about what's a platform, the end of things. Going from that philosophy, it's like our kind of principle of, like, maybe the end state of some of these things is that everything should kind of compress down to an outcome and, like, a budget.
Speaker 235:54 - 36:21
对。我的意思是,当然有 evals(评测)之类的东西,这些大家已经翻来覆去讲得太多了。我们非常喜欢的一个方向,是这种可验证的 outcome(结果)。我们在这一点上是有些明确立场的。这几乎就像是某种最终状态——你知道,我们之前也稍微谈过什么是 platform(平台),以及事情的终局是什么。沿着那个理念走下去,我们的一个原则有点像是:也许这些东西的最终状态,是一切都应该被压缩成一个 outcome 和一个 budget(预算)。
Speaker 236:21 - 36:40
And that's probably like about it. And everything else should be figured out for you to kind of resolve exactly across those parameters. And so for us, we're kind of, yes, we still have evals. We have a lot of these other things that we measure, that are domain specific, like, you know, coding evals would be like, you might want to measure like just the actual PR getting merged. Those are more verifiable.
Speaker 236:21 - 36:40
大概也就这些了。其他一切,都应该围绕这两个参数,替你被推导和解决清楚。所以对我们来说,没错,我们仍然会做 evals。我们也会衡量很多别的东西,那些往往是 domain specific(领域特定)的。比如说,coding evals(编程评测)里,你可能会想衡量 PR 是否真的被 merge 了。这类指标就更可验证。
Speaker 236:40 - 36:51
But as we get to the place where, you know, like an outcome is actually a spec that you are just as a human able to define, and our ability to interpret that and regrade itself over and over is closer to what we care about.
Speaker 236:40 - 36:51
但当我们走到这样一个阶段:一个 outcome 实际上就是一份 spec(规格说明),而你作为人类是能够定义它的;同时,我们对它进行理解、并一遍又一遍地自我重新评分的能力,会更接近我们真正关心的东西。
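The "everything compresses down to an outcome and a budget" idea can be pictured as a loop that keeps attempting a task until a verifier accepts the result or the budget runs out. The sketch below is an assumption-laden illustration: `run_to_outcome`, `RunResult`, the flat `cost_per_attempt`, and the toy verifier are hypothetical and not part of any product described in the conversation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    output: Optional[str]  # the accepted output, or None if the budget ran out first
    spent: float           # total cost charged against the budget
    attempts: int          # number of attempts actually made


def run_to_outcome(attempt: Callable[[int], str],
                   verify: Callable[[str], bool],
                   budget: float,
                   cost_per_attempt: float) -> RunResult:
    """Retry until the verifier accepts an output or another attempt would exceed the budget."""
    spent = 0.0
    attempts = 0
    while spent + cost_per_attempt <= budget:
        output = attempt(attempts)  # hypothetical agent call; receives the attempt index
        attempts += 1
        spent += cost_per_attempt
        if verify(output):          # the "verifiable outcome", e.g. tests pass or a PR merges
            return RunResult(output, spent, attempts)
    return RunResult(None, spent, attempts)


if __name__ == "__main__":
    result = run_to_outcome(
        attempt=lambda i: f"draft-{i}",
        verify=lambda s: s.endswith("2"),
        budget=10.0,
        cost_per_attempt=3.0,
    )
    print(result)
```

The caller supplies only the check and the spending limit; how many attempts to make, and in what form, is left to the loop, which mirrors the two-parameter framing in the answer.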
Speaker 136:51 - 36:54
Claude, make me a billion dollars. Your budget is $10.
Speaker 136:51 - 36:54
Claude,帮我赚到十亿美元。你的预算是 10 美元。
Speaker 236:54 - 36:57
Exactly. And then say no mistakes. Go. Go. Exactly.
Speaker 236:54 - 36:57
没错。然后再说一句,不许出错。开始吧。开始吧。对,就是这样。
Speaker 136:58 - 36:58
Maybe Mythos could do that.
Speaker 136:58 - 36:58
也许 Mythos 可以做到。
Speaker 336:58 - 36:59
Yeah.
Speaker 336:58 - 36:59
对。
Speaker 137:03 - 37:50
And then one of the things that we've been running into, that I'm curious if you have a solution for, is agents, like, get outdated pretty quickly, sometimes because there's no human attached to them, sometimes, like, they're just running an old model or they're in an old architecture or whatever. And it feels like there needs to be an end-of-life cycle for agents. Like, we've talked about having, like, a little, like, funeral for them and, like, having, like, a little page on our website that's, here's all the decommissioned agents. And, like, especially in a really big company, how do you manage all of the agents that are, like, sort of out there, and maybe they're, like, in Slack, like, pinging stuff once a week, but you're like, this is super stale? How do you make sure that you retire them as quickly as you are making them?
Speaker 137:03 - 37:50
那么,我们最近遇到的一个问题是——我很好奇你们有没有解决方案——agent 往往会很快过时。有时是因为背后没有人类负责,有时则只是因为它们跑的是旧 model,或者基于旧 architecture,诸如此类。感觉上,agent 应该有一个生命周期的终点。比如我们还讨论过,是否该给它们办个小小的“葬礼”,或者在我们网站上做一个页面,列出这些已经 decommissioned(退役)的 agent。尤其是在一家非常大的公司里,你要怎么管理那些已经“散落在外”的 agent?也许它们还在 Slack 里每周 ping 一次消息,但你会觉得,这东西已经非常陈旧了。你怎么才能确保,你退役它们的速度,能跟得上你创建它们的速度?
Speaker 337:50 - 38:37
So one of the things we have actually done is we have made skills that help you do things like upgrade to a new model when a new model comes out. Right? Like, we've actually put a good amount of work into making it easier to do exactly what you're talking about. And I think maybe some of the most, like, AGI-pilled people are, like, running agents that are monitoring their agents to see if their agents are, like, outdated and in need of that sort of stuff. But I think for the way that we like to talk to customers who ask us this question, I do think the most interesting instantiation of this is there's a new model, and now I need to go upgrade my agents or maybe be done with those agents because, you know, the new model enables me to build agents that are way more powerful and do more interesting things than the old agents did.
Speaker 337:50 - 38:37
所以,我们实际上已经做过的一件事,就是做了一些 skills,帮助你在新 model 发布时升级到新 model,对吧?我们确实投入了不少工作,让你更容易做到你刚才说的这些事。我觉得,也许那些最“AGI pill”的人,已经在运行一些 agent 来监控他们的 agent,看看它们是否已经过时、是否需要这类处理。但我认为,就我们通常如何回答客户的这个问题而言,这里面最有意思的一种具体情形还是:出了一个新 model,于是我得去升级我的 agent,或者干脆结束掉那些 agent,因为新 model 让我能够构建出比旧 agent 强大得多、也更有意思得多的 agent。
Speaker 338:37 - 39:09
But I think that upgrade process and that migration process is something people have had to wrap their heads around as like, it's like a breaking change and I have to put actual energy into making that work. And obviously, sorry to talk about evals, but if you have evals, this process is easier and things like this. But I do think that's one of the things we've tried to do is how do we give you skills and how do we give you the right just tools to make that process easier? And then you could go be AGI pilled and choose to actually automate more of that with more agents. Yeah.
Speaker 338:37 - 39:09
但我认为,这个升级过程和迁移过程,是人们必须真正理解的一件事:它就像一次 breaking change(破坏性变更),我必须投入实际精力,才能让它运转起来。显然,抱歉又提到 evals,但如果你有 evals(评估),这个过程以及类似的事情都会更容易。不过,我确实认为,我们一直在努力做的一件事就是:我们怎样给你提供 skills,以及恰当的工具,让这个过程更简单?然后,如果你愿意变得更“AGI pilled”,你也可以选择用更多 agent 把其中更多环节自动化。对。
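One simple way to picture the retirement question is an inventory check that flags agents with no owner, an outdated model, or no recent review. The following is a minimal sketch under assumed names: `AgentRecord`, `retirement_candidates`, the 90-day threshold, and the model labels `model-v1` and `model-v2` are all hypothetical, and it does not represent the skills mentioned in the answer.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class AgentRecord:
    name: str
    model: str             # hypothetical model label the agent runs on
    owner: Optional[str]   # None when no human is attached to the agent
    last_reviewed: date


def retirement_candidates(agents: List[AgentRecord],
                          current_model: str,
                          today: date,
                          max_age_days: int = 90) -> List[Tuple[str, List[str]]]:
    """Return (agent name, reasons) for every agent that looks stale."""
    flagged = []
    for agent in agents:
        reasons = []
        if agent.owner is None:
            reasons.append("no owner")
        if agent.model != current_model:
            reasons.append("old model")
        if today - agent.last_reviewed > timedelta(days=max_age_days):
            reasons.append("not reviewed recently")
        if reasons:
            flagged.append((agent.name, reasons))
    return flagged


if __name__ == "__main__":
    inventory = [
        AgentRecord("weekly-slack-digest", "model-v1", None, date(2025, 1, 10)),
        AgentRecord("pr-reviewer", "model-v2", "dana", date(2026, 4, 30)),
    ]
    for name, reasons in retirement_candidates(inventory, "model-v2", date(2026, 5, 8)):
        print(name, reasons)
```

A flagged agent is a prompt for a decision, either upgrade it or decommission it; with evals in place, the upgrade path becomes easier to check, as the answer notes.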
Speaker 139:11 - 39:23
So a year from now, we're back at Code with Claude. Where do you think the platform will be? What will I be able to do, and how it will be different from what I can do today?
Speaker 139:11 - 39:23
那么,如果一年后我们再回到 Code with Claude,你觉得这个平台会发展到什么阶段?到时候我能做什么?它会和我今天能做的事情有什么不同?
Speaker 239:24 - 39:25
Do you wanna go first?
Speaker 239:24 - 39:25
你想先说吗?
Speaker 339:25 - 39:30
You can go first. A year is a long time, in this industry especially.
Speaker 339:25 - 39:30
你先说吧。一年是很长的时间,尤其是在这个行业里。
Speaker 139:30 - 39:34
How close are we to Claude making me a billion dollars? That's really what I'm asking.
Speaker 139:30 - 39:34
我们离 Claude 帮我赚到十亿美元还有多近?这才是我真正想问的。
Speaker 239:34 - 39:36
I think we're, yes.
Speaker 239:34 - 39:36
我觉得我们……是的。
Speaker 339:36 - 39:36
We probably won't
Speaker 339:36 - 39:36
我们大概不会
Speaker 139:36 - 39:37
be sitting here.
Speaker 139:36 - 39:37
只是坐在这里。
Speaker 339:37 - 39:38
Yes. Yes. We will
Speaker 339:37 - 39:38
对。对。我们会
Speaker 239:38 - 39:56
be asking Claude what this is. I mean, yeah, like, we wanna get closer and closer to that state where I think we kind of, okay. So a couple of things. I think in a year from now, I mean, one thing that we'd love to get really, really close to is actually that kind of, like, simplicity. And this might be a significantly higher order of abstraction.
Speaker 239:38 - 39:56
去问 Claude 这是什么。我是说,没错,我们想越来越接近那种状态,我觉得我们大致算是……好吧。所以有几件事。我想,如果是一年之后,我们非常非常希望能真正接近的,其中一点其实就是那种类似“简单性”的东西。而这可能会是一个显著更高阶的 abstraction(抽象)层级。
Speaker 239:56 - 40:14
I don't know what the form factor will look like or whatever, but the kind of parameters we will care for from users will be that outcome. And of course, it has to be verifiable. There are some parameters that have to be restrictive, and the budget. And I think, like, we'd want to experiment with directions where Claude actually gets so good at understanding itself. It figures out what model you should be using.
Speaker 239:56 - 40:14
我不知道它的 form factor(形态)最终会是什么样,诸如此类,但我们真正关心、希望用户提供的那类参数,会是那个 outcome(结果)。当然,这必须是可验证的。有一些参数必须带有约束性,还有 budget(预算)。而我觉得,我们会想去试验一些方向,在那些方向上,Claude 实际上会变得非常擅长理解它自己。它会自己判断你应该使用什么 model(模型)。
Speaker 240:14 - 40:27
It figures out how to spin up all the sub-agents. I actually don't think you need to think so much about harness engineering in that world. Today, you know, you don't have to think so aggressively about, like, tool construction, for example, because we've kind of made that a little easier and you get it to lead a little bit of that scaffolding.
Speaker 240:14 - 40:27
它会判断如何启动所有的 sub agents(子 agent)。我其实认为,在那个世界里,你不需要再去想那么多 harness engineering(编排/测试框架工程)了。今天,比如说,你也不必再那么激进地去考虑 tool construction(工具构建)之类的事情,因为我们已经让这件事稍微容易了一些,并且你可以让它自己承担一部分那种 scaffolding(脚手架式搭建)工作。
Speaker 140:27 - 40:28
Less prompt engineering too.
Speaker 140:27 - 40:28
提示词工程(prompt engineering)也会更少。
Speaker 240:28 - 40:45
Yeah. Exactly. Exactly. And I think if you just keep going up that stack, like, today, a lot of the innovation is happening at this kind of, like, really high level, almost like a harness-architecture level, which is really fun. But I think a lot of that honestly also kind of goes away, where you almost, like, don't have to think so much about, like, model selection.
Speaker 240:28 - 40:45
对,没错,没错。我觉得如果你继续沿着那一层层往上看,比如说今天,很多创新都发生在这种有点像非常高层、几乎像 harness architecture(编排/支撑架构)这一层的地方,这很有意思。但我认为,老实说,其中很多东西也会逐渐消失,变成你几乎不再需要花那么多心思去考虑 model selection(模型选择)。
Speaker 240:45 - 41:11
You don't have to think so much about what kind of architectures are there because we probably would have, like, gone through enough iterations with Claude where Claude is actually able to understand itself enough that it can almost, like, write itself on the fly to figure out what is necessary in that kind of, like, two-parameter world of, like, outcome and budget. I don't know that we'll get there, like, in a year, but I feel like we might be able to do, like, the outcome part of that with, like, maybe, you know, some error bars on the budget side.
Speaker 240:45 - 41:11
你也不必再去想到底有哪些架构,因为我们大概会和 Claude 经过足够多轮迭代,到那时 Claude 实际上已经足够理解自己,几乎可以在运行过程中实时把自己“写出来”,从而判断在那种只有两个参数——outcome(结果)和 budget(预算)——的世界里,到底需要什么。我不确定我们会不会在一年内走到那一步,但我觉得至少 outcome 这一部分,也许是有可能做到的;至于 budget 这一侧,可能还只能给出带有一定误差范围的估计。
Speaker 141:11 - 41:12
Really cool.
Speaker 141:11 - 41:12
真酷。
Speaker 341:12 - 41:29
Yeah. Okay. That was really cool. I'm gonna give you a slightly more boring answer, which is in that world, if Claude is like on the fly, your agents on the fly are like becoming what they need to become in order for you to do what you're trying to do, the platform has to like seriously scale. That is.
Speaker 341:12 - 41:29
对。好。这真的很酷。不过我给你一个稍微没那么刺激的答案:在那样的世界里,如果 Claude 能够实时调整,而你的 agents(智能体)也能实时变成完成你目标所需的样子,那么这个平台就必须真正实现大规模扩展。就是这样。
Speaker 341:29 - 42:24
And so I do think some of this will be what are the right abstractions that actually enable that, right, like somewhere on the primitive to higher order realm, right, but I do think so much of what our team is going to be doing is making sure that the tokens that people want to come in and out of Claude, are going to be able to come in and out of Claude because our system is scaled to meet not just the demand, but, like, in that world where it's just like you have agents that are like literally constantly running and recreating themselves and and doing this sort of work. You just need a system that, you know, can handle long running requests, can handle a bunch of differently shaped things. And so I think for us, it's gonna be I never want the ability of the platform itself to be able to scale to get in the way of what people would otherwise be able to accomplish with these things. And so I think that's something that's gonna probably be very front of mind when we're talking in a year.
Speaker 341:29 - 42:24
所以我确实认为,其中一部分会是:什么样的 abstractions(抽象层)才真正能支持这一点,对吧?也就是在 primitive(底层原语)到 higher order(高阶能力)这个光谱上的某个位置。但我也认为,我们团队很大一部分工作会是确保人们希望进出 Claude 的 tokens(这里指文本/数据单元)真的能够顺畅地进出 Claude,因为我们的系统必须扩展到不仅能满足需求,而且还能应对那样一种世界:agents 几乎是在持续不断地运行、自我重构,并完成这类工作。你需要的是一个能够处理长时间运行请求、能够处理各种不同形态任务的系统。所以对我们来说,重点会是:我绝不希望平台本身的扩展能力,成为人们原本可以借助这些东西实现目标的障碍。所以我觉得,这很可能会成为我们一年后讨论时最优先考虑的事情之一。
Speaker 142:24 - 42:28
Awesome. I'm excited. Thank you so much for joining. I really learned a lot.
Speaker 142:24 - 42:28
太棒了,我很期待。非常感谢你们来参加,我确实学到了很多。
Speaker 242:28 - 42:29
Thanks for having us.
Speaker 242:28 - 42:29
谢谢邀请我们。
Speaker 442:55 - 43:14
Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life.
Speaker 442:55 - 43:14
每一期都像一场情绪、洞见与欢笑交织的过山车之旅,让你全程屏息凝神、意犹未尽。它不只是一档节目,更是一场驶向未来的旅程,而 Dan Shipper 就是这艘宇宙飞船的船长。所以,帮自己个忙吧。点个赞,狠狠干下订阅,然后系好安全带,准备迎接你人生中最精彩的一段旅程。
Speaker 443:14 - 43:19
And now without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love
Speaker 443:14 - 43:19
那么,闲话不多说,我就直说了,Dan,我已经彻底、无可救药地爱上了
Speaker 143:19 - 43:19
with you.
Speaker 143:19 - 43:19
你。
原文 ↗https://www.youtube.com/watch?v=lLypHkIVLqc