BuildSpeak daily builder digest
🎙 Podcast · AI & I by Every · May 20, 2026 · 8,901 words · about 45 minutes

Inside Stainless: The Developer Tools Startup Anthropic Just Bought for $300 Million

Speaker 1 00:00 - 00:31
The internet runs on computers talking to each other, but its entire architecture was built for a pre-AI world. Now we're trying to hook AI up to the internet with MCP, Model Context Protocol, which turns any website or web service into a set of tools that an AI can use natively to get work done. And the software companies that learn how to do MCP well are going to win over the next decade. That's why I brought Alex Rattray, the founder and CEO of Stainless, onto the show. Stainless' job is to help computers talk to each other.
Speaker 1 00:31 - 00:59
They make the APIs and SDKs for all the big companies that you know about, like OpenAI and Anthropic, and they're starting to build MCP servers too. So Alex and I get into the nitty-gritty of what the future of MCP looks like, how to design good MCPs, and why MCPs are actually really hard to scale and possibly insecure. And we try to figure out together what a better model for allowing AIs to use the internet might look like. This is a great episode. Alex is a good friend of mine.
Speaker 1 00:59 - 01:15
Let's dive in. Alex, welcome to the show.
Speaker 2 01:16 - 01:18
Thanks, Dan. It's really exciting to be here.
Speaker 1 01:18 - 01:36
It's good to have you. So for people who don't know, you are the founder and CEO of Stainless, which is the API company. You make APIs for companies like OpenAI and Anthropic, and just name a big company whose API you might use: Stainless is probably behind it. Before that, you worked at Stripe doing their API.
Speaker 1 01:36 - 02:01
Surprise. And before that, most importantly, we were very good friends in college, and we've remained good friends. We were both starting companies in college, and I'm a tiny investor in Stainless. But it's been really, really fun to watch your journey and get to hang out together so much over the years. And I'm just very excited to bring you on to talk about AI and what you're doing at Stainless.
Speaker 2 02:01 - 02:12
Thanks, Dan. Yeah. It's been really fun over the years. I mean, you know, when we were in college, I was working on a startup. You were working on a startup.
Speaker 2 02:12 - 02:38
You had a conference room at a venture capitalist's office as your office, and you let me crash there with my co-founder and team. And we were just on the other side of the conference table hacking away into the evening. Very fond memories of those days. And these days, it's not every evening, but on the weekends, whatever, the same thing is still happening. You don't see that every day.
Speaker 2 02:38 - 02:46
It's really a nice feeling. And it's been great to see everything happening with Every along the way.
Speaker 1 02:46 - 03:13
Thank you. As I say, it started from the bottom, now we're here. And, yeah, I mean, the thing that I always say when I run into people and they ask me about you, in order to embarrass you, is that you're the only person I know of who has consistently run barefoot through the streets of Philadelphia. Because when we first met, you were not a fan of shoes and you were a fan of running.
Speaker 1 03:13 - 03:14
You wanna talk about that?
Speaker 2 03:15 - 03:31
Yeah. It wasn't that I didn't like the concept of shoes. It's that I couldn't find a good pair. And at a certain point, you know, I was running through Nikes and they would bust open every few months. I think what was actually going on is I had really wide feet.
Speaker 2 03:31 - 03:51
And I was probably buying narrow shoes, so the shoes would constantly get ruined. And, you know, on a college budget, it's just like, this is no good. And eventually, I decided, okay, the longer you wear your shoes, the more worn out they get. But the longer you just wear your feet, the tougher they get.
Speaker 2 03:52 - 03:53
So
Speaker 1 03:54 - 03:56
the longer you wear your feet.
Speaker 2 03:59 - 04:12
Try it out. Try this to help. What could go wrong? I actually currently have a really annoying splinter in one of my feet, and so don't actually try this at home. But... Are you still running barefoot?
Speaker 2 04:12 - 04:15
No. No. This is just from around the house.
Speaker 1 04:16 - 04:18
I see. Dangerous.
Speaker 2 04:18 - 04:28
Yeah. Yeah. But see, that's the thing. If I had been going around on the asphalt without socks on, then my feet would have been tougher, and I'd have no splinter.
Speaker 1 04:30 - 04:44
So when you're not running barefoot, you're running Stainless. And so how many people are you? You know, you're around 50, right?
Speaker 2 04:44 - 04:45
Just about. Yeah.
Speaker 1 04:46 - 05:07
That's pretty wild. And you started Stainless in a pre-AI world, and now we're in an AI world. And I think you have some ideas for what the future of AI is going to be and maybe how APIs fit into that, maybe how MCPs fit into that. Do you wanna, like, paint a little bit of a picture for us about where we're going?
Speaker 2 05:07 - 05:19
Yeah. Would love to. So to start, like, what's an API? Not everybody's familiar with that. So it stands for Application Programming Interface.
Speaker 2 05:19 - 05:21
There will not be a quiz. Right, Dan? No quizzes?
Speaker 1 05:22 - 05:23
No, no quizzes.
Speaker 2 05:23 - 05:42
Great. Basically, it's how one computer program talks to another computer program. It's how computers talk to computers, how apps talk to apps. And so APIs are the dendrites of the internet. Dendrites are where your neurons connect and actually exchange information with each other.
Speaker 2 05:42 - 06:10
So if you have, like, two neurons in your brain but they're not talking to each other, you're actually not thinking, right? There is no thought happening in a brain without connections between neurons. And if you think about the internet, if all these servers in the cloud aren't talking to each other, you wouldn't have an internet, right? Like, there's nothing going on. Internet software is doing nothing without APIs, without connections to other programs.
Speaker 2 06:10 - 06:56
And so it's really fundamental to the mesh of pretty much all modern software. Everything that we think of when we think about technology at this point, APIs are kind of at the heart and center of that, just like dendrites are the center of the mesh of the brain and how we think. And Stainless' mission from day one was to make it easier for computers to talk to computers. And it's the long-running trend of technology to have more automation, right? Automation is what we mean when we say, okay, we're going to apply technology to that.
Speaker 2 06:56 - 07:25
We're generally going to be making things more efficient. And APIs are how most business-to-business interactions, in some format or another, become real, become automated. And what we see with the rise of AI is that a new computer has entered the chat. Right? There's a new kind of system that can talk to other systems, or at least we would like it to be able to.
Speaker 2 07:25 - 07:55
You used to have either, you know, humans interacting with a computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers. Right? And what's that through? And I'm sure anyone familiar with, you know, Every, and its regular listeners, is gonna be familiar with MCP, Model Context Protocol, which is a system for connecting LLMs to computers, broadly speaking.
Speaker 2 07:56 - 08:27
And it's an area that we're investing in at Stainless. It's really, I think, part of our core mission of, like I said, making it easy for computers to talk to computers. And we've invested a lot of time. At Stainless, the core product that we first brought to market is software development kits, SDKs. And so these are ways of saying, okay, Stripe has this great REST API, you know, you can send JSON over HTTP and get back JSON over HTTP.
Speaker 2 08:29 - 09:09
And if you want that to be really convenient, you're gonna use the Stripe Python library, the Stripe Python SDK. So if you're a Python developer, you'll go pip install stripe, and then in your application code you'll write stripe.customers.create. And all of a sudden, you have a nice new customer object in, sort of, your Stripe database, and you're off to the races. Or stripe.charges.create, in the old days, to charge a credit card. And SDKs are what give developers that easy way to interface with an API. What's the thing that gives LLMs an easy way to interface with an API?
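To make the SDK idea above concrete, here is a minimal sketch of what a client library does under the hood: it turns a method call into an HTTP request. Everything in it (`MiniClient`, `Customers`, the `api.example.com` host, the `/v1/customers` path, the placeholder key) is hypothetical and is not the Stripe SDK. The request is built but never sent.

```python
import json
from urllib.request import Request


class Customers:
    """Hypothetical resource wrapper: one method per API operation."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self._base_url = base_url
        self._api_key = api_key

    def create(self, **fields) -> Request:
        # Build (but do not send) a POST with a JSON body and a bearer token.
        return Request(
            url=f"{self._base_url}/v1/customers",
            data=json.dumps(fields).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {self._api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )


class MiniClient:
    """Hypothetical SDK entry point, so callers can write client.customers.create(...)."""

    def __init__(self, api_key: str, base_url: str = "https://api.example.com") -> None:
        self.customers = Customers(base_url, api_key)


client = MiniClient(api_key="sk_test_placeholder")
request = client.customers.create(name="Dan", email="dan@example.com")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The point of the wrapper is ergonomic: the caller thinks in terms of `customers.create`, while the URL, headers, and serialization stay inside the library.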
Speaker 2 09:09 - 09:56
And you might say MCP, and in a sense, you'd be right. But what we're seeing so far, as MCP is rolling out into the world and people are experimenting with it and trying it out, is that it's not working so great. Like, it's difficult to deliver on what I see as the core vision of what's so exciting about MCP, which is: just like a dashboard and a user interface let you click around, see a bunch of stuff, fill out forms, click buttons, do things. Anything that you would do while you're interacting with the software, you generally do through the user interface. But with LLMs interacting through MCP, it tends to be much more restricted.
Speaker 2 09:56 - 10:02
You can only do a few little things. There's usually not a ton of tools that you're going to be exposing to the models.
Speaker 1 10:04 - 10:53
And just to stop you there. So I think what I'm hearing you say is that, just like a website is built for humans to use, MCP is sort of the equivalent, and you can think of it in certain ways as exposing a set of tools for the model that it can use to perform certain functions. Just like you might click a button on a website, the MCP gives the model a bunch of things it can click on or use to get work done. So an example might be, you know, a Gmail MCP has, like, a send mail tool or a compose mail tool or a read inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, the LLM is, you know, essentially logging in and using it itself, and it's a native interface for language models.
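The "set of tools" described above can be sketched as data plus a dispatcher. An MCP server lists each tool with a name, a description, and a JSON Schema for its input; the two mail tools, the in-memory inbox, and the `call_tool` helper below are made up for illustration and do not describe any real Gmail integration.

```python
# Hypothetical tool listing. The keys name / description / inputSchema follow
# the shape MCP uses for tools; the tools themselves are invented.
TOOLS = [
    {
        "name": "send_mail",
        "description": "Send an email to a single recipient.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient address."},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    {
        "name": "read_inbox",
        "description": "Return the most recent messages in the inbox.",
        "inputSchema": {
            "type": "object",
            "properties": {"limit": {"type": "integer", "minimum": 1}},
        },
    },
]

# In-memory stand-ins for a real mail backend.
INBOX = [{"from": "dan@example.com", "subject": "About those socks"}]
OUTBOX = []


def send_mail(to: str, subject: str, body: str) -> dict:
    OUTBOX.append({"to": to, "subject": subject, "body": body})
    return {"status": "sent"}


def read_inbox(limit: int = 10) -> dict:
    return {"messages": INBOX[:limit]}


HANDLERS = {"send_mail": send_mail, "read_inbox": read_inbox}


def call_tool(name: str, arguments: dict) -> dict:
    """Route a tool call (a name plus JSON arguments) to its handler."""
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    return HANDLERS[name](**arguments)


print(call_tool("read_inbox", {"limit": 1}))
```

The model only ever sees the `TOOLS` listing; the handlers are where the "button click" actually happens.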
Speaker 1 10:53 - 10:58
But you're saying that that's not working that well. Can you tell me more about that?
Speaker 2 10:58 - 11:29
Yeah. So let's actually start with what I see as the big vision of MCP, and in some sense, the big vision of agentic AI in the first place. And I'll start with the most pedestrian example you can imagine. It's gonna be funny given some of our context, which is, let's say, you know, Dan walks into my store and buys a pair of stripy socks and maybe a few other things. And then the next day, I hear back from Dan that there's something wrong, unfortunately.
Speaker 2 11:29 - 12:15
It happens, you know. And I turn to someone on my team and I say, hey, can we refund Dan for those stripy socks he bought yesterday, and send him a discount code for the next time he comes in, with a little thank-you note, because we like to take care of our customers? This is like the most normal thing to do in software, some little task like this. And what the member of my team would be doing would be opening up their internal admin and looking around for some things. They might go to the Stripe dashboard and try to look through the list of payments or the list of transactions or orders and try to find one that has someone named Dan (which Dan, I don't know, there might be a bunch of Dans), then try to look through the list of products in the order and see whether there were some stripy socks in there.
Speaker 2 12:15 - 12:24
That might be a few clicks required, depending. Find the right one. Then go to the screen where you can create a refund. Create a refund. Make sure it's the right amount.
Speaker 2 12:25 - 13:20
Then go and create that discount. And then take that discount code and send it over to some other SaaS app where you log in to send some mail automatically. And of course, if you step away from the consumer version of this to a business-to-business context, you might be going into Salesforce and sending a Slack message to an account administrator, an account manager, so on and so forth. And in the normal course of work, it's just the most normal thing in the world to have one task involve going through five different apps, each time with 15 different clicks and scrolls and loading spinners, just to do one simple thing. And the promise of agentic AI is to be able to take that same prompt I just said and type it into ChatGPT or Claude or whatever and say, hey, chatty, buddy, can you help refund my friend Dan?
Speaker 2 13:21 - 14:05
And just have the AI go off and do that. And basically go through these five different apps and the 15 different screens and the various different button presses to complete the task and then come back and say, great, it's done. Now, in order to do that, there's only so many tool calls you have to make as an AI model to perform that exact linear chain of events. It's somewhat tractable. But if you think about this in the general case, you want your agentic AI to be able to do anything that that human operator would have done.
Speaker 2 14:05 - 14:47
And you would want them to be able to do it without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe create refund tool and the Stripe list transactions tool and the Stripe list products and look up customer and, you know, create discount tool. You need not only those tools, but you need everything that you can do in the Stripe dashboard, which is basically everything that you can do in the Stripe API. And that's actually a lot. Like, there are hundreds of different endpoints that you have access to in the Stripe API.
Speaker 2 14:48 - 15:05
The Stripe dashboard is actually massive. It's a huge application. And if you were to take that list of tools today and go to an LLM and say, hey, here's our MCP definition for all of this. Here's a create refund tool. Here's a create transactions tool, so on and so forth.
Speaker 2 15:06 - 15:15
And you tell it all about those tools. Here's the description. Here's all the different request properties that you can send. Here's the response properties you can get back. Here's all the documentation for each of those things.
Speaker 2 15:16 - 15:46
Everyone listening to this should already know you've just burned through your entire context budget. That's maybe hundreds of thousands of tokens just there, pretty much translating the Stripe OpenAPI spec directly over to MCP tools. And today's models not only can't handle that amount of context, it's a poor use of context because you have a lot else going on. But it's also confusing to the model. It's just too much to hold in your brain at one time.
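A back-of-the-envelope sketch of the context budget problem described above: generate a few hundred synthetic tool definitions and estimate how many tokens their JSON would occupy. The tool shapes, the count of 300, and the four-characters-per-token rule of thumb are all assumptions; real API schemas, with nested objects and documented responses, would typically be much larger, and real tokenizers count differently.

```python
import json


def make_tool(index: int) -> dict:
    """Build one hypothetical tool definition with ten documented parameters."""
    properties = {
        f"param_{i}": {
            "type": "string",
            "description": f"Documentation for parameter {i} of operation {index}.",
        }
        for i in range(10)
    }
    return {
        "name": f"operation_{index}",
        "description": f"Description of what operation {index} does and when to use it.",
        "inputSchema": {"type": "object", "properties": properties},
    }


def estimate_tokens(text: str) -> int:
    # Crude assumption: about four characters per token.
    return len(text) // 4


tools = [make_tool(i) for i in range(300)]
total = estimate_tokens(json.dumps(tools))
print(f"{len(tools)} tools -> roughly {total} tokens of definitions")
```

Even this deliberately small schema adds up quickly once it is multiplied by hundreds of operations, and every one of those tokens is spent before the model has read the user's actual request.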
Speaker 2 15:48 - 16:05
And that's just the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction it might be a different five.
Speaker 2 16:06 - 16:46
And so if you think about every single SaaS tool that your business uses on a daily basis to get your work done, ideally you would want every single one of those tools to be exposed to your operators in their AI chat with every single tool available in there, with every single nook and cranny and corner case available so that you can do anything through AI. That's the vision. Now there's a lot of problems with that. The biggest one that I mentioned is sort of this context window limit. But you also have all sorts of security and permissions problems because you don't want the AI to color outside the lines and say, okay.
Speaker 2 16:46 - 17:00
In addition to refunding Dan's socks, I also refunded every customer for all transactions ever. You know? And then I sent, you know, a bunch of money to my own AI bank account. And so there's more to the challenge, but that's the vision I see.
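One simple way to picture the permissions concern above is a policy check that runs before any tool call is executed. This is only a sketch under assumptions: the allowlist, the per-call refund cap, and the tool names are hypothetical, and a real deployment would also need authentication, auditing, and per-user scopes.

```python
ALLOWED_TOOLS = {"lookup_customer", "create_refund"}  # hypothetical allowlist
MAX_REFUND_CENTS = 5_000                              # hypothetical per-call cap


class ToolCallRejected(Exception):
    """Raised when a requested tool call falls outside the policy."""


def check_tool_call(name: str, arguments: dict) -> None:
    """Raise ToolCallRejected unless the call is on the allowlist and within limits."""
    if name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool not allowed: {name}")
    if name == "create_refund":
        amount = arguments.get("amount_cents")
        if not isinstance(amount, int) or amount <= 0:
            raise ToolCallRejected("refund amount must be a positive integer")
        if amount > MAX_REFUND_CENTS:
            raise ToolCallRejected("refund exceeds the per-call cap")


# A refund for one pair of socks passes; draining the account does not.
check_tool_call("create_refund", {"amount_cents": 1_200})
try:
    check_tool_call("create_refund", {"amount_cents": 10_000_000})
except ToolCallRejected as error:
    print("rejected:", error)
```

The check lives outside the model, so it holds even when the model misreads the request.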
Speaker 1 17:01 - 17:13
But I think, you know, the place we started there was you said it's not working. But I don't think that that's the reason why it's not working today. Right? Or is that the reason why it's not working today?
Speaker 2 17:13 - 17:48
So what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is, generally speaking, they have an underlying API, usually a REST API. And they wrap different parts of that, different endpoints, different operations, in MCP tools. And you can kind of do that in a one-to-one mapping or you can kind of handcraft things for the MCP. And today, in order to succeed, people are finding that you really have to kind of handcraft it to the MCP, to the LLMs.
Speaker 2 17:48 - 17:56
You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description.
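The handcrafted, specialized tool described above might look something like the following sketch: one call that does the lookup and the refund together, instead of exposing "list orders" and "create refund" as separate one-to-one wrappers. The `Order` record, the in-memory `ORDERS` list, and `refund_by_description` are hypothetical and stand in for real API calls.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    customer_name: str
    items: list
    amount_cents: int
    refunded: bool = False


# Hypothetical in-memory data standing in for a payments backend.
ORDERS = [
    Order("ord_1", "Dan", ["stripy socks", "hat"], 2400),
    Order("ord_2", "Dan", ["umbrella"], 1800),
    Order("ord_3", "Alex", ["stripy socks"], 900),
]


def refund_by_description(customer_name: str, item_description: str) -> dict:
    """Find the single unrefunded order matching name and item, then refund it."""
    needle = item_description.lower()
    matches = [
        order for order in ORDERS
        if not order.refunded
        and order.customer_name.lower() == customer_name.lower()
        and any(needle in item.lower() for item in order.items)
    ]
    if len(matches) != 1:
        # Refuse to guess; report how many candidates were found instead.
        return {"status": "ambiguous", "candidates": len(matches)}
    order = matches[0]
    order.refunded = True
    return {
        "status": "refunded",
        "order_id": order.order_id,
        "amount_cents": order.amount_cents,
    }


result = refund_by_description("Dan", "stripy socks")
print(result)
```

The model makes one call with two short arguments and gets back three fields, rather than paging through orders itself.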
Speaker 1 17:57 - 18:08
So there's all these decisions that you have to make where you need to have the ergonomics of the model and how the model thinks in mind in order to make sure the model does the right thing more often than not.
Speaker 2 18:09 - 18:10
Yeah, it's hard.
Speaker 1 18:10 - 18:11
It's hard.
Speaker 2 18:12 - 18:31
Yeah, yeah. So I use this SDK analogy sometimes. It took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer wrapping an API. And I think we've cracked that nut. Stainless offers really great Python libraries, but we're building on the shoulders of giants here.
Speaker 2 18:31 - 18:58
A lot of people have done this over time. We haven't figured out how to expose an API ergonomically to an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. And that's kind of like a new research problem in a sense. And it's harder because I can go learn how to be a Python developer if I want. I can't really learn how to go think or see like an LLM.
Speaker 2 19:02 - 19:31
You know, it sure would be powerful if I could. And that makes it tricky. We do have at Stainless, I think, some things that we're cooking up to address some of these problems, including the ones that you also mentioned. Like, LLMs have a really hard time with a repeated, sustained chain of actions. You know, even if you get an API response back for, hey, like, list all the transactions, there's so much data.
Speaker 2 19:31 - 19:56
And you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the stripy socks. And that's, again, a ton of context with one or two small needles in the haystack. And LLMs are pretty good at that, but they're not perfect. And with too much hay, you know, we all kinda end up throwing up our hands, and that's true for LLMs too. So yeah.
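One way to keep the haystack out of the model's context, sketched under assumptions, is to do the page-by-page walk on the server side and hand back only the needles. The paginated `list_transactions` function and its cursor scheme are hypothetical stand-ins for a real list endpoint, and the data is synthetic.

```python
# Hypothetical in-memory stand-in for a paginated "list transactions" endpoint.
TRANSACTIONS = [
    {
        "id": f"txn_{i}",
        "customer": "Dan" if i == 137 else f"customer_{i}",
        "description": "stripy socks" if i % 50 == 37 else "other item",
    }
    for i in range(500)
]


def list_transactions(cursor: int = 0, limit: int = 100) -> dict:
    """Return one page plus the cursor for the next page (None when exhausted)."""
    page = TRANSACTIONS[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(TRANSACTIONS) else None
    return {"data": page, "next_cursor": next_cursor}


def find_transactions(customer: str, description: str) -> list:
    """Walk every page server-side and return only the matching rows."""
    matches = []
    cursor = 0
    while cursor is not None:
        page = list_transactions(cursor)
        matches.extend(
            row for row in page["data"]
            if row["customer"] == customer and description in row["description"]
        )
        cursor = page["next_cursor"]
    return matches


hits = find_transactions("Dan", "stripy socks")
print(hits)
```

Here the model would see one matching row instead of five pages of a hundred rows each.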
Speaker 2 19:56 - 19:58
So there's a lot of challenges today.
Speaker 1 19:58 - 20:22
And so when you look at, I mean, you're building MCP servers for people, but when you build them, and just generally when you see people doing it well today, what are the principles, or how do you think about making an MCP server that, one, people use, which is actually a big one, and then two, when it is used, actually does the right job?
Speaker 2 20:23 - 20:58
There have been relatively few times that I've seen it done well. I have seen it done well. We're cooking something up that I'm really excited about. But with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are and look over their shoulders as they, you know, use and operate your software, and think about what could we unlock through AI where people would be doing things that they can't really do with our software today because it just got so much easier.
Speaker 2 20:58 - 21:16
And then you have to do kind of a lot of engineering work, usually, to wrap it up in a bow that works for the models. And you have to, you know, set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor? Are they using Claude Code?
Speaker 2 21:16 - 21:44
Are they using something else? And the different models underlying all that. So you end up with this pretty crazy matrix of things that you might wanna optimize for and ways that you might wanna evaluate and make sure that what you're offering is working well. And it's also kind of a black box to get that feedback back to your servers so that you can find out, hey, we gave a tool call response here. We gave an answer of some kind.
Speaker 2 21:44 - 22:02
Was it actually any good? Did the user like it? Was the LLM able to use it? That's a problem that I think I haven't seen a lot of people solve yet as well. And so thinking about that as a first-class thing, maybe you have, like, a send feedback tool.
Speaker 2 22:02 - 22:14
That's something that we've been thinking about doing. Just so if a user, like, says out loud in the chat, oh, man, that was useless garbage, then, okay, at least the MCP server is going to find out about that.
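The send feedback idea floated above could be sketched as one more tool whose only job is to carry the user's reaction back to the server. This is a guess at the shape, not a description of anything Stainless has shipped: the tool name, its fields, and the in-memory log are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical tool definition, using the name / description / inputSchema shape.
FEEDBACK_TOOL = {
    "name": "send_feedback",
    "description": (
        "Report whether the previous tool result helped the user. "
        "Call this when the user expresses satisfaction or frustration."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "tool_name": {"type": "string"},
            "helpful": {"type": "boolean"},
            "comment": {"type": "string"},
        },
        "required": ["tool_name", "helpful"],
    },
}

FEEDBACK_LOG = []


def send_feedback(tool_name: str, helpful: bool, comment: str = "") -> dict:
    """Record one feedback entry; a real server would persist it somewhere durable."""
    FEEDBACK_LOG.append(
        {
            "tool_name": tool_name,
            "helpful": helpful,
            "comment": comment,
            "received_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return {"recorded": True}


print(send_feedback("refund_by_description", False, "that was useless garbage"))
```

Whether a model reliably calls such a tool is exactly the kind of thing the evals mentioned earlier would have to measure.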
Speaker 122:15 - 22:28
But is is there anything specific you've learned about, like, how to do it well other than, like obviously, you gotta talk to your customers, think about your use cases, but, like, more concrete, more more applicable stuff about how to design a good MCP server.
Speaker 122:15 - 22:28
但是,除了那种显而易见的原则——你得和客户沟通、思考你的 use case——之外,你有没有学到什么更具体的东西,比如到底怎样才能把这件事做好?也就是,更落地、更可操作地说,怎样设计一个好的 MCP server。
Speaker 222:28 - 22:38
You wanna keep the number of tools relatively small, relatively low. You wanna have the tool name and the description be really precise and specific.
Speaker 222:28 - 22:38
你会想把 tools 的数量控制得相对少、相对低。你还会希望 tool 的名称和 description 都非常精确、非常具体。
Speaker 122:40 - 22:42
Weren't those two things at odds?
Speaker 122:40 - 22:42
这两点不是互相矛盾吗?
Speaker 222:42 - 22:56
Yes. Good writing is hard. Yeah. I mean, you know, you can make a great tool of "look up person by name and product description and then refund them." You can make a great tool that does that.
Speaker 222:42 - 22:56
是的。好的写作很难。对,我的意思是,比如说,你完全可以做出一个很棒的 tool:按姓名和产品描述查找某个人,然后给他退款。你可以做出这样一个很棒的 tool。
Speaker 222:57 - 23:24
And you also want a small number of properties in the input schema. You want a small number of parameters and you want them concisely described but sufficiently described. This is also hard. And you want the response data to come back with a very small amount of data, only exactly what the model will need. That's also very hard because you may not know a priori which things the model is really looking for.
Speaker 222:57 - 23:24
你还会希望 input schema 里的 properties 数量尽可能少。你会希望 parameters 的数量少,而且它们的描述要简洁,但又得描述充分。这同样很难。你还会希望返回的 response data 只有非常少的数据量,只包含 model 真正需要的那些内容。这也非常难,因为你可能事先并不知道 model 真正在找的是哪些东西。
Speaker 223:25 - 23:40
And we have a technique that we use in our MCP servers today where we give the model a jq filter, which is a way of filtering JSON. And that can work pretty well. But that's kind of a special trick.
Speaker 223:25 - 23:40
我们现在在自己的 MCP servers 里用了一种技术:给 model 一个 jq filter,也就是一种筛选 JSON 的方式。这个方法效果往往还不错。不过这算是一种比较特殊的技巧。
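To make the response-trimming idea concrete, here is a small sketch of letting the caller name the fields it wants back. It is deliberately not jq: the dotted-path `project` helper is a simplified stand-in written for this illustration, and the sample payload is made up.

```python
def project(data, paths):
    """Keep only the dotted paths in `paths`; lists are projected item by item."""
    if isinstance(data, list):
        return [project(item, paths) for item in data]
    if not isinstance(data, dict):
        return data
    # Group the paths by their first segment so nested paths are handled together.
    grouped = {}
    for path in paths:
        head, _, rest = path.partition(".")
        grouped.setdefault(head, []).append(rest)
    result = {}
    for head, rests in grouped.items():
        if head not in data:
            continue
        if "" in rests:
            # An empty remainder means "keep this whole value".
            result[head] = data[head]
        else:
            result[head] = project(data[head], rests)
    return result
```

A tool could accept such a path list as an optional parameter, so the model pays context only for the fields it asked for rather than for the full API response.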
Speaker 123:40 - 23:49
Doesn't this mean that, like, MCP just needs another level of, like, a search tool function search tools? Like, find a list of relevant tools given my task?
Speaker 123:40 - 23:49
这是不是意味着,MCP 其实还需要再多一层东西,比如一个用于搜索 tools 的 search tool function?也就是:根据我的任务,找出一组相关的 tools?
Speaker 223:50 - 24:16
The tool browsing problem is definitely one very serious one. And that is one approach. And so we actually do this at Stainless today, where you can get an MCP server for your API that just has, like I was saying earlier, the very simple thing of every endpoint exposed as a tool. And if you have a small API, that works great. And you can also filter it so you expose an MCP server with only a small subset of your endpoints.
Speaker 223:50 - 24:16
tool 浏览问题绝对是一个非常严重的问题。这当然是一种处理方式。所以我们现在在 Stainless 其实就是这么做的:你可以为自己的 API 获取一个 MCP server,它就像我前面说的那样,采用非常简单的方式——把每个 endpoint 都暴露为一个 tool。如果你的 API 规模较小,这种方法效果非常好。你也可以把它筛选一下,只暴露一个只包含少量 endpoint 子集的 MCP server。
Speaker 224:16 - 24:34
That works great. You can also use kind of what we call dynamic mode, where there's three tools no matter how big your API is. One is list endpoints. The other is get endpoint and learn about it. And then the last one is execute endpoint.
Speaker 224:16 - 24:34
这效果很好。你也可以使用一种我们称为 dynamic mode 的方式,不管你的 API 有多大,都只有三个 tool。一个是列出 endpoint,另一个是获取某个 endpoint 并了解它,最后一个是执行 endpoint。
Speaker 224:34 - 24:50
And so that enables this context thing to scale really well. But it means there's three turns of the model just to do one thing. And so that gets slower. It's more expensive in another sense. And there's some lossiness.
Speaker 224:34 - 24:50
这样就能让这种 context(上下文)机制很好地扩展。但这也意味着,模型为了完成一件事,得先进行三轮交互。所以它会更慢。从另一种意义上说,成本也更高。而且还会有一些信息损耗。
Speaker 224:51 - 25:02
Performs pretty well usually, but not quite as well because the tools aren't loaded up in quite the same way.
Speaker 224:51 - 25:02
通常表现相当不错,但还没有那么好,因为这些 tool 的加载方式并不完全相同。
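The three-tool "dynamic mode" described above can be sketched as follows. The endpoint registry, names, and handlers here are hypothetical; a real server would presumably derive them from the API's own definition.

```python
# Hypothetical endpoint registry, used only for illustration.
ENDPOINTS = {
    "customers.retrieve": {
        "summary": "Fetch one customer by ID.",
        "params": ["customer_id"],
        "handler": lambda args: {"id": args["customer_id"], "name": "Dan"},
    },
    "refunds.create": {
        "summary": "Refund a charge.",
        "params": ["charge_id"],
        "handler": lambda args: {"id": "re_1", "charge": args["charge_id"]},
    },
}


def list_endpoints() -> list:
    """Tool 1: a cheap index of names and one-line summaries."""
    return [{"name": name, "summary": ENDPOINTS[name]["summary"]} for name in sorted(ENDPOINTS)]


def get_endpoint(name: str) -> dict:
    """Tool 2: the full description of a single endpoint."""
    endpoint = ENDPOINTS.get(name)
    if endpoint is None:
        return {"error": f"unknown endpoint: {name}"}
    return {"name": name, "summary": endpoint["summary"], "params": endpoint["params"]}


def execute_endpoint(name: str, args: dict) -> dict:
    """Tool 3: run the endpoint after checking its parameters."""
    endpoint = ENDPOINTS.get(name)
    if endpoint is None:
        return {"error": f"unknown endpoint: {name}"}
    missing = [p for p in endpoint["params"] if p not in args]
    if missing:
        return {"error": "missing params: " + ", ".join(missing)}
    return {"result": endpoint["handler"](args)}
```

The tool count stays at three however large `ENDPOINTS` grows, which is the context saving; the cost is that one action takes up to three model turns (list, get, execute).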
Speaker 125:02 - 25:05
Are you using MCP servers yourself?
Speaker 125:02 - 25:05
你自己也在使用 MCP server 吗?
Speaker 225:05 - 25:46
Yeah. I use MCP, actually, funnily enough, not so much on the coding side, but on the business side. So I'll use the Notion, HubSpot, and Gong MCP servers, and actually an MCP server for our database, a read-only copy of our database, and say, hey, what are the interesting customers that signed up for Stainless last week? And it'll go off and make a great query of our Postgres database. And then it can cross-reference those things in HubSpot and then look up our notes in Notion, maybe even look at transcripts in Gong, and tell me all about it.
Speaker 225:05 - 25:46
对。我确实会用 MCP——有意思的是,不太是在 coding 这一侧,而更多是在业务侧。我会使用 Notion、HubSpot、Gong 的 MCP server,还会用一个连接我们数据库的 MCP server——它对应的是我们数据库的只读副本——然后说,嘿,帮我看看上周注册 Stainless 的有意思的客户有哪些?它就会去对我们的 Postgres 数据库发起一个很不错的查询。然后它还能在 HubSpot 里交叉比对这些信息,再去 Notion 查我们的笔记,甚至还可能看看 Gong 里的 transcript(转录文本),然后把相关情况都告诉我。
Speaker 225:48 - 25:49
It's it's incredible.
Speaker 225:48 - 25:49
这真的太惊人了。
Speaker 125:49 - 26:12
Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust routers, and everything looks fine in testing, so you merge. And then three days later or even sooner, the support tickets start rolling in. The AI is giving your customers unexpected answers, and you have no idea when it happened or why. BrainTrust is the AI observability platform that fixes this.
Speaker 125:49 - 26:12
我们很多人都在把 AI 部署到 production(生产环境)中,这对提升生产力当然很好,但也会带来焦虑。你微调一个 prompt(提示词),替换模型,调整 router(路由器),测试里看起来一切正常,于是你就合并了。结果三天后,甚至更早,support ticket(支持工单)就开始不断涌来。AI 开始给你的客户一些出乎意料的回答,而你完全不知道这是从什么时候开始的,也不知道为什么会这样。BrainTrust 就是用来解决这个问题的 AI observability platform(可观测性平台)。
Speaker 126:12 - 26:31
It connects evals and observability in one workflow. That way, you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path. Evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets.
Speaker 126:12 - 26:31
它把 evals(评估)和 observability(可观测性)连接到同一个工作流中。这样一来,你既能看到 production(生产环境)里实际发生了什么,也能衡量改动到底是让结果变好了还是变差了。traces(追踪)会展示完整的执行路径。evals 定义“什么才算好”,而 experiments(实验)让你在发布前并排比较 prompts(提示词)和 models(模型)。production traces 会直接进入你的 eval datasets(评估数据集)。
Speaker 126:31 - 26:49
Every failure becomes a test case. You catch regressions in CI before they reach users. And teams at Notion, Stripe, Zapier, Vercel, and Ramp use it to ship quality AI at scale. BrainTrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack.
Speaker 126:31 - 26:49
每一次失败都会变成一个测试用例。你能在 CI(持续集成)里,在问题到达用户之前就抓住 regressions(回归问题)。而 Notion、Stripe、Zapier、Vercel 和 Ramp 的团队也在用它,以规模化方式交付高质量 AI。BrainTrust 是为那些在构建 production AI systems(生产级 AI 系统)的团队设计的,在这类场景里,silent regressions(无声回归)的代价很高。它适用于任何技术栈。
Speaker 126:49 - 27:03
They have SDKs for Python, TypeScript, Go, Ruby, C. There's no framework lock-in or vendor dependencies. It's SOC 2 Type II certified, and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev.
Speaker 126:49 - 27:03
他们提供 Python、TypeScript、Go、Ruby、C 的 SDKs(软件开发工具包)。没有 framework lock-in(框架锁定)或 vendor dependencies(供应商依赖)。它通过了 SOC 2 Type 2 认证,并且符合 GDPR 和 HIPAA。可前往 braintrust.dev 开始使用。再说一遍,braintrust.dev。
Speaker 127:03 - 27:26
And now back to the episode. And so that's one of your big use cases. Like, are you doing that every week? I'm interested, not even from an MCP perspective, but for anyone running a business that has some complexity and you're like, I wanna know what's going on in the business. Like, what are you actually doing, and what is the report that comes out, and how often are you doing that, and all that kind of stuff? Tell me so I can steal it.
Speaker 127:03 - 27:26
现在回到这期节目。所以这就是你的一个主要 use case(使用场景)之一。比如,你会每周都这么做吗,还是说频率怎样?我现在感兴趣的甚至不只是从 MCP 的角度,而是对任何经营有一定复杂度业务的人来说:如果你想知道业务里到底发生了什么,那你实际上在做什么?最后产出的 report(报告)是什么?你多久做一次?以及诸如此类的事情。这样你告诉我之后,我也可以偷学一下。
Speaker 227:27 - 27:40
Yeah. For me, it's still usually in kind of like playing around mode. One of the things is the MCP servers disconnect and then I get annoyed. And so, you know, you have to just kind of reconnect and whatever. It's it's not a huge deal.
Speaker 227:27 - 27:40
对我来说,这通常还处在一种有点像边玩边试的模式。其中一个问题是,MCP servers 会断开连接,然后我就会有点烦。所以你知道的,你就得重新连上之类的。也不是什么大事。
Speaker 227:42 - 28:35
But there are a lot of little paper cuts still, which you're going to expect in a technology this new, that can hold back some amount of your usage. One of the things that I found really helpful kind of at the meta level, and I'm sure you've had other guests talk about this, is the practice of just collecting notes for the AI, by the AI, edited and curated by yourself. So, you know, I can't remember what I call it. I think I have a notes folder, a research folder, something like that, in a special Git repo that I use just for this sort of internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation, so that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again.
Speaker 227:42 - 28:35
但像这么新的技术,仍然会有很多细小但烦人的 paper cuts(小摩擦、小障碍),这是意料之中的,而且它们确实会在一定程度上抑制你的使用频率。我发现一个非常有帮助的做法,是在更 meta(元)层面上——我相信你其他嘉宾应该也提过——就是养成这样一种实践:为 AI 收集笔记,这些笔记由 AI 生成,再由你自己编辑和整理。所以,我有一个——我有点记不清我是不是就叫它 note,我想我有一个 notes folder、一个 research folder,或者类似的东西——放在一个专门的 Git repo 里,我只把它用于这类内部用途。我会说,嘿,当你发现有趣的客户引语时,就把它们放进这个文件夹里,并附上完整 citation(引文信息),这样我下次开始问一些有意思的问题时,它就不用再去 MCP servers 里重新搜索了。
Speaker 228:35 - 28:41
It has them kind of cached just on disk in markdown files.
Speaker 228:35 - 28:41
这样它们就等于被缓存到磁盘上,以 markdown files(Markdown 文件)的形式保存下来了。
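The "cached on disk in markdown files" habit is simple enough to sketch. The file name `customer-quotes.md`, the entry layout, and both helper functions below are assumptions made for illustration; the speaker does not describe his actual folder structure.

```python
from datetime import date
from pathlib import Path


def save_quote(notes_dir, customer: str, quote: str, source: str, when: date) -> Path:
    """Append one quote, with its citation, to a markdown notes file."""
    path = Path(notes_dir) / "customer-quotes.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = f"## {customer} ({when.isoformat()})\n\n> {quote}\n\nSource: {source}\n\n"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return path


def find_quotes(notes_dir, keyword: str) -> list:
    """Return the saved entries that mention `keyword` (case-insensitive)."""
    path = Path(notes_dir) / "customer-quotes.md"
    if not path.exists():
        return []
    sections = path.read_text(encoding="utf-8").split("## ")
    return ["## " + s.strip() for s in sections if s and keyword.lower() in s.lower()]
```

Because the notes are plain files in a Git repo, a coding agent can read and grep them directly the next time a question comes up, instead of re-querying every MCP server.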
Speaker 128:41 - 28:50
Wait. That's crazy. Wait. So, like, what are you using to write into that Git repo? Like, is it Claude Code?
Speaker 128:41 - 28:50
等等,这也太疯狂了。等等,所以你是怎么把内容写进那个 git repo 的?比如,你用的是 Claude Code 吗?
Speaker 128:50 - 28:52
Are you using ChatGPT? Like, how does it get in there?
Speaker 128:50 - 28:52
你是在用 ChatGPT 吗?比如,它是怎么放进去的?
Speaker 228:52 - 28:55
Yeah. I use Claude Code these days for that kind of thing.
Speaker 228:52 - 28:55
对。现在这类事情我会用 Claude Code。
Speaker 128:55 - 29:19
And so you just have Claude Code open and running, and then a new customer testimonial comes in and you're just like, hey, can you throw this in my, like, master company Git knowledge repository, basically? And then whenever you need anything later, you're like, Claude, go search through my master repository to figure out where the best customer quote is for this.
Speaker 128:55 - 29:19
所以你就是一直开着并运行着 Claude Code,然后一有新的客户 testimonial(客户证言/推荐语)进来,你就会说,嘿,能不能把这个扔进我的、类似那个 master company Git knowledge repository(公司主 Git 知识库)里?基本上就是这样。然后以后你需要什么的时候,你就会说,Claude,去我的主知识库里搜一搜,看看这个场景下最好的客户引言是哪一条。
Speaker 229:19 - 29:20
Totally.
Speaker 229:19 - 29:20
完全是。
Speaker 129:20 - 29:24
That's fucking so cool. Can I can we see
Speaker 129:20 - 29:24
这他妈也太酷了。我们能不能看看
Speaker 229:24 - 29:32
it? No. It's too messy and probably has a lot of confidential information. The latter being more more important.
Speaker 229:24 - 29:32
它?不行。太乱了,而且里面可能有很多 confidential information(机密信息)。后者更重要一些。
Speaker 129:33 - 29:38
When you say it's messy, like, are you having Claude organize it at all? Or, like, how is it structured?
Speaker 129:33 - 29:38
你说它很乱的时候,是指你有让 Claude 帮你整理吗?还是说,它现在大概是怎么组织的?
Speaker 229:38 - 30:09
There's a lot that I want us to do here that we haven't had the chance to do yet. There's some other lower-hanging fruit that I'm working through, that our business team is working through right now, just on the basics of your kinda CRM systems and so on. And so it's not well structured now, but I think that's fine. Yeah, I don't plan to prioritize structuring it super, super well until we're using it more.
Speaker 229:38 - 30:09
这里面有很多我希望我们去做的事,只是我们还没来得及做。还有一些别的、更 low-hanging fruit(更容易先做的事情)是我现在正在处理的,也是我们业务团队现在正在推进的,就是一些基础层面的东西,比如你那种 CRM systems(客户关系管理系统)之类的。所以它现在还没有那么有条理,不过我觉得这也没关系。对,我暂时不打算把“把它整理得特别、特别完善”这件事放到很高优先级,至少要等到我们更多地用起来再说。
Speaker 230:09 - 30:35
I'm using it more broadly because, you know, I use this stuff some of the time. One of the business people on the team uses it a fair amount. I think, like, one or two of our customer support engineers use this stuff a lot, but it's not yet broader than that. And I would like it to get there. Once we see how everything's evolving, I think that's when we'll start bringing in more structure.
Speaker 230:09 - 30:35
我更广泛地在用它,因为你知道,我自己有时会用这些东西。团队里有一位业务人员用得相当多。我觉得,我们有一两位 customer support engineer(客户支持工程师)也非常常用这些东西,但目前还没有更大范围地铺开。我希望之后能发展到那一步。等我们看清楚整体是怎么演进的,我想那时候我们就会开始引入更多结构化的做法。
Speaker 230:35 - 30:45
But as it is, Claude Code can handle unstructured stuff really well. So you don't have to think about it too hard in advance, in my view. You can move things around later.
Speaker 230:35 - 30:45
但就目前来看,Claude Code 很擅长处理非结构化的东西。所以在我看来,你没必要事先想得太复杂。之后再调整、再搬动内容也可以。
Speaker 130:45 - 30:48
What else do you have in there other than customer quotes?
Speaker 130:45 - 30:48
除了客户引言之外,你们那里还放了什么?
Speaker 230:49 - 31:10
SQL queries. So, you know, I'm a software developer. I don't write a lot of code these days, but, you know, I spent a lot of time doing that. And so I might say, hey, how is our month-on-month growth of x-y-z metric over the last three months? You know, I did this recently.
Speaker 230:49 - 31:10
SQL query(SQL 查询)。所以,你知道,我是个 software developer(软件开发者)。我现在其实已经不怎么大量写代码了,但你知道,我以前在这上面花了很多时间。所以当我说,嘿,你能不能查一下,比如说,我们某个 xyz metric(指标)在过去三个月里的 month-on-month growth(环比增长)怎么样?你知道,我最近就做过这个。
Speaker 231:10 - 31:32
I did this for my last board prep. And it came out with a pretty good answer right away. And I was like, wow, this is awesome. And then I kind of looked a little bit deeper and I was like, oh, I actually want to exclude, you know, these users from this analysis and I want to filter it this way and filter it that way. And I kind of imbued more of this business context into that SQL query.
Speaker 231:10 - 31:32
我上次准备 board prep(董事会材料准备)时就这么做了。它立刻给出了一个相当不错的答案。我当时就觉得,哇,这太棒了。然后我又稍微往深里看了一点,就发现,哦,其实我想把这些用户排除在这次分析之外,还想按这种方式筛选、按那种方式筛选。于是我就把更多业务语境灌输进了那条 SQL query 里。
Speaker 231:32 - 31:53
And I iterated with Claude Code to get it to be better and better for the specific kind of metric that I was looking for, the specific kind of story that I was trying to tell. And then I got it to a good place. I was like, great. Let's dump this to, you know, an analysis folder or an analytics folder for future use.
Speaker 231:32 - 31:53
然后我和 Claude Code 反复迭代,让它越来越贴合我想看的那种特定 metric(指标),以及我想讲述的那种特定故事。最后我把它调到了一个很不错的状态。然后我就觉得,很好,把这个导出到一个 analysis folder(分析文件夹)或者 analytics folder(分析文件夹)里,留着以后用。
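A saved query of the kind described above might look like this. It is a sketch under stated assumptions: the `users` table, its `signed_up_at` and `is_internal` columns, and the exclusion rule are invented for illustration, and SQLite stands in for the Postgres database mentioned in the conversation.

```python
import sqlite3

# Hypothetical schema: users(signed_up_at TEXT ISO date, is_internal INTEGER).
MONTHLY_SIGNUPS_SQL = """
SELECT strftime('%Y-%m', signed_up_at) AS month, COUNT(*) AS signups
FROM users
WHERE is_internal = 0  -- business context: leave internal/test accounts out
GROUP BY month
ORDER BY month
"""


def month_on_month_growth(conn: sqlite3.Connection) -> list:
    """Return (month, fractional growth vs. the previous month) pairs."""
    rows = conn.execute(MONTHLY_SIGNUPS_SQL).fetchall()
    growth = []
    # Pair each month with the one before it; every grouped month has at least one row,
    # so the previous count is never zero.
    for (_, previous), (month, current) in zip(rows, rows[1:]):
        growth.append((month, (current - previous) / previous))
    return growth
```

The point of keeping the query in a file is that the exclusions and filters worked out during one board prep survive to the next one, rather than being rediscovered from scratch.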
Speaker 131:54 - 32:03
And then next time you're doing your board prep, you can be like, hey, what was that query that we did last time? And it'll presumably go get it. Yeah. That's really cool. What else?
Speaker 131:54 - 32:03
这样下次你再准备 board prep 时,就可以说,嘿,我们上次做的那个 query(查询)是什么来着?然后它大概就会去把它找出来。对,这真的很酷。还有别的吗?
Speaker 232:03 - 32:44
You know, as any software team is these days, we're using this also for, hey, a customer comes in with a question, can Claude Code just fix it? And so, in some cases, a Linear ticket is filed, and, you know, our support engineers are really very technical. But they may not have the wall-clock time to go and chase down the fix themselves for an incoming bug. They have the technical skill, but guess what? Another customer writes in two minutes later and they want to jump on that.
Speaker 232:03 - 32:44
你知道,现在的软件团队基本都这样,我们也在用这个做类似的事:比如有客户带着问题找上门,Claude Code 能不能直接把它修掉?所以在一些情况下,会先建一个 Linear ticket(Linear 工单),然后,你知道,我们的 support engineer(支持工程师)其实技术都很强。所以他们未必没有技术能力去一路追查并修复一个新进来的 bug(缺陷),而是他们往往没有那个 wall clock time(实际可用时间)亲自一路跟到底。技术上他们做得到,但问题是,两分钟后又会有另一个客户来信,他们又想立刻去处理那个。
Speaker 232:44 - 33:09
They don't want to be knee-deep in a debugger. And so something that we do sometimes is they'll file the ticket just in case, and by default maybe they intend to do it later, or some other engineer is going to be doing it later. But hey, can we see if Claude Code can just take a crack at it? Is that going to work out 100% of the time? Definitely not.
Speaker 232:44 - 33:09
他们不想一头扎进 debugger(调试器)里。所以我们有时会这样做:他们先提交 ticket(工单)以防万一,默认情况下,也许他们是打算之后自己处理,或者之后由其他 engineer(工程师)来处理。但先看看 Claude Code 能不能试着接手一下?这会 100% 成功吗?当然不会。
Speaker 233:09 - 33:24
Is that going to work out 50% of the time? Still no, to be honest with you. But can that improve the overall efficiency? Yeah, maybe. We're still, I would say, experimental there.
Speaker 233:09 - 33:24
那会有 50% 的成功率吗?老实说,也还是没有。不过,这能提高整体效率吗?对,也许可以。我会说,我们在这方面仍然处于实验阶段。
Speaker 233:24 - 33:28
But we're seeing a lot of promise.
Speaker 233:24 - 33:28
但我们已经看到了很大的潜力。
Speaker 133:28 - 33:41
That's really interesting. Okay. Well, I know you also, you know, in our in our preproduction call, you were talking about you have a big vision for the future of AI. Do you wanna do you wanna talk talk me through that?
Speaker 133:28 - 33:41
这真的很有意思。好吧。我知道你也——你知道的,在我们之前的 preproduction call(预沟通电话)里——你提到过,你对 AI 的未来有一个很宏大的愿景。你想不想展开跟我讲讲?
Speaker 233:42 - 34:02
Yeah. Yeah. I I I would love to. You know, we we talked earlier about how agentic AI can can make operators lives a lot easier by taking their day you know, certain pedestrian tasks and sort of running with it independently. And that's something that I think as an industry we're almost on the cusp of.
Speaker 233:42 - 34:02
嗯,嗯,我很愿意。你知道,我们之前谈到过,agentic AI(代理型 AI)可以通过接手 operators(运营人员)日常中的某些琐碎任务,并且独立推进它们,来让他们的工作轻松很多。而我认为,作为一个行业,我们几乎已经站在这个转折点上了。
Speaker 234:03 - 35:13
And if you start asking how you get there, and you also start asking about the steps beyond that and beyond that, a big part of the way I see things unfolding from here is, I like to say, the future of AI is cyborgs, which is sort of extra ridiculous, because what is a cyborg other than, like, already a robot? You know, cyborg, as I understand it, is a term that means you're sort of part person and part machine. And in this case, I mean that when you go and talk to an agent, what you're going to be getting is part GPT neural net LLM, part AI, and part code, where the machine, quote unquote, that I'm talking about is traditional CPU, not GPU, software. And I expect this to play out in two main ways.
Speaker 234:03 - 35:13
如果你再往前一步,去问“要怎么走到那里”,同时也开始追问“再往后一步是什么”“再往后又是什么”,在我看来,接下来事态展开的一个很重要方向是——我喜欢说,AI 的未来是 cyborgs(半机械人),这话本身听起来就格外荒谬,因为 cyborg 不本来就已经有点像 robot(机器人)了吗?据我理解,cyborg 这个词的意思是,你有一部分是人,另一部分是机器。而在这里,我的意思是,当你去和一个 agent(智能体)交互时,你得到的会是:一部分是 GPT neural net(GPT 神经网络)LLM(大语言模型),一部分是 AI,另一部分是 code(代码);而我这里所说的那个打引号的“机器”,其实是传统的 CPU,而不是 GPU software(GPU 软件)。在我看来,这大致会以两种主要方式展开。
Speaker 235:13 - 35:48
One is your kind of one off operational use cases like we were talking about a minute ago. And then the other is production software. And in the use case we were talking about a minute ago, where someone needs to kind of perform some tricky one off action with a bunch of points and clicks, and now we want an AI to just do a bunch of tool calls. The way I actually see that happening and what we're building towards is code execution. So rather than the model having a bajillion tools, model has two tools.
Speaker 235:13 - 35:48
一种是我们刚才提到的那类一次性的 operational(运维/操作)用例。另一种则是 production software(生产软件)。就像我们刚才说的那个用例:有人需要执行某个有点棘手、一次性的操作,要点很多按钮、走很多步骤,而现在我们希望 AI 直接去做一堆 tool calls(工具调用)。我实际设想的实现方式,也是我们正在构建的方向,是 code execution(代码执行)。所以,不再是给 model(模型)塞一大堆工具,而是只给它两个工具。
Speaker 235:49 - 36:16
One to execute code where it just kind of has a text box of like, hey, put in some TypeScript, and you're going to use this API's TypeScript SDK. And you're just going to write stripe.transactions.list or stripe.charges.list. And you're going stripe.customers.retrieve and stripe.refunds.create. This is really easy for models. They're really good at writing code.
Speaker 235:49 - 36:16
其中一个工具是执行代码:基本上就是给它一个文本框,告诉它,来,把一些 TypeScript 放进去,你会用这个 API 的 TypeScript SDK。然后你就直接写 stripe.transactions.list 或 stripe.charges.list,再写 stripe.customers.retrieve 和 stripe.refunds.create。这个对模型来说非常容易,它们非常擅长写代码。
Speaker 236:17 - 37:00
And if you give that tool a little bit of a README where you say, here's an example request, and here are some other resources, some other API calls that you can make, it's really good at extrapolating from patterns, if the SDK and the API are well formed and predictable. And then you give it an additional tool to search the docs and ask questions of the docs. And for anything it's not sure about or gets wrong on the first try, you give it the documentation. And what this does for that scenario that we were talking about earlier is you have very, very limited impact on the context window up front.
Speaker 236:17 - 37:00
如果你给那个工具一点类似 readme 的说明,比如说:这里有一个示例请求,这里还有一些其他资源、一些你可以调用的其他 API calls,那么只要这个 SDK 或 API 的设计足够规范、可预测,它就非常擅长根据模式进行推断。然后你再给它一个额外的工具,让它能够搜索 docs 并向 docs 提问。凡是它拿不准的,或者第一次尝试做错的地方,你就把文档提供给它。这样一来,对我们前面谈到的那个场景而言,一开始对 context window 的影响就会非常非常小。
Speaker 237:00 - 37:35
I mean, we're talking about a thousand tokens or something like that, maybe less. And the context impact of doing a whole bunch of paginated list requests? Zero. You know, the model will go look for somebody named Dan, and it'll double-check that the purchase is stripy socks, and it might write three nested for loops. But then only at the end, when it's found the right thing, it'll console.log: found Dan, customer ID, blah, blah, blah, transaction ID, blah, blah, blah. And then create refund, refund ID 123.
Speaker 237:00 - 37:35
我的意思是,我们说的大概就是一千个 token 左右,甚至可能更少。而像做一大堆分页的列表请求这种事,对上下文的影响是零。你知道,model 会去找一个叫 Dan 的人,还会再确认一下购买的东西是不是 stripy socks,它可能会写三层嵌套的 for 循环。但最后只有在它真正找到正确结果时,它才会 console.log:找到了 Dan,customer ID,什么什么,transaction ID,什么什么。然后再 create refund,refund ID 123。
Speaker 237:36 - 37:52
And the context hit coming back from all of this is going to be like 10 lines of of text. You know? It's it's really minimal. And all of this will run really, really quickly too. So you don't have a round trip to the model every time you're doing something like this.
Speaker 237:36 - 37:52
而这一整套过程返回来的上下文负担,大概也就是 10 行文字。你知道吧,这真的非常少。而且这一切运行起来也会非常非常快。所以你不需要每做一步这样的事情,都跟 model 来回 round trip 一次。
Speaker 237:52 - 38:01
It's just CPU code, and it runs in a server in the cloud right next to the Stripe API in AWS somewhere probably. And it goes super, super fast.
Speaker 237:52 - 38:01
它本质上就只是 CPU 代码,运行在云里的某台 server 上,很可能就在 AWS 里、紧挨着 Stripe API 的地方。所以它会快得非常非常夸张。
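The "find Dan, check the socks, refund" walk-through can be illustrated with the kind of script a model might submit to a code execution tool. The conversation is about TypeScript and a real SDK; this sketch uses Python and a made-up `FakeBillingClient`, so none of the method names below belong to any real library.

```python
class FakeBillingClient:
    """Hypothetical in-memory stand-in for an API SDK; not a real library."""

    def __init__(self, customers, charges):
        self._customers = customers
        self._charges = charges
        self.refunds = []

    def list_customers(self, page_size=2):
        # Yield customers one page at a time, like a paginated list endpoint.
        for start in range(0, len(self._customers), page_size):
            yield self._customers[start:start + page_size]

    def list_charges(self, customer_id):
        return [c for c in self._charges if c["customer"] == customer_id]

    def create_refund(self, charge_id):
        refund = {"id": f"re_{len(self.refunds) + 1}", "charge": charge_id}
        self.refunds.append(refund)
        return refund


def refund_purchase(client, name, description):
    """Page through customers and charges, refund the match, return one summary line."""
    for page in client.list_customers():
        for customer in page:
            if customer["name"] != name:
                continue
            for charge in client.list_charges(customer["id"]):
                if charge["description"] == description:
                    refund = client.create_refund(charge["id"])
                    return (
                        f"found {name}: customer {customer['id']}, "
                        f"charge {charge['id']}, refund {refund['id']}"
                    )
    return f"no matching purchase for {name}"
```

All the paging and filtering happens inside the script; only the single summary string would go back into the model's context, which is the saving being described.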
Speaker 138:01 - 38:49
Okay. So what I'm understanding you saying is, like, the language model has a tool where it can write code and send that code to this tool, and, you know, whoever the company is, whether it's Stripe or whatever, whoever's MCP server you're using, they'll go and execute that code, and that code is gonna interact with their API and then return the results, rather than, you know, you have 50 different possible tool calls and all that stuff. It's just: model writes API code, and API provider executes that code, runs it on their API, and returns the results. Why wouldn't my model just, like, write the code that I then run myself instead of relying on an API provider to do it?
Speaker 138:01 - 38:49
好的。所以我理解你说的意思是:language model 有一个工具,它可以写代码,然后把这段代码发给这个工具;而这个工具背后的提供方——不管是 Stripe 还是别的什么公司,也就是你正在使用其 MCP server 的那一方——会去执行这段代码。这段代码会跟他们的 API 交互,然后把结果返回回来。换句话说,并不是那种“你有 50 个不同的可能 tool calls”之类的模式。它就是 model 写 API 代码,API provider 执行这段代码,在他们自己的 API 上运行,再把结果返回回来。那我为什么不直接让我的 model 写出代码,然后由我自己来运行,而非依赖 API provider 去做这件事呢?
Speaker 238:50 - 39:45
I expect that that will happen a lot more. I expect that the code execution tool is gonna become the most widely used tool. One of the problems that we have today is that the code execution tool doesn't work so well with libraries. LLMs have a hard time working with a library and knowing exactly what version of the library they're using, using the right version, probably usually the latest version, and not hallucinating, you know, aspects of the API, and knowing how to iterate if they hallucinate wrong. And if it can't use any library off npm or, you know, the Python Package Index or anything like that really, really well, basically perfectly out of the box, then, okay, well, forget about using a library.
Speaker 238:50 - 39:45
我预计这种情况会越来越常见。我认为 code execution tool 会成为使用最广泛的工具。问题在于,我们今天面临的问题之一是,code execution tool 跟 libraries 配合得还不是很好。LLM 很难很好地处理 library:很难准确知道它在用哪个版本、是否用了正确的版本——通常应该是最新版本——也很难避免对 API 的某些部分产生 hallucination(幻觉),以及在它 hallucinate 错了之后知道该怎么迭代修正。如果它没法对 NPM、Python package index 或其他类似来源里的任意 library 都做到真正非常好用、几乎开箱即完美,那好吧,那就别指望用 library 了。
Speaker 239:45 - 40:24
At that point, you just have to hit the raw HTTP API. And at that point, in order to figure out what's in there, you need the whole OpenAPI spec, and you're back at square one because that document is massive. And furthermore, something that's really scary about that is if you don't have a typed library with static typing where the computer can say what you're trying to do is wrong, then the LLM will try to make an API request that is wrong some percentage of the time. The code execution tool can run a type checker and say, oh, you know, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API.
Speaker 239:45 - 40:24
到那一步,你就只能直接去打原始的 HTTP API。而到了那一步,为了弄清楚里面到底有什么,你就需要整个 OpenAPI spec,然后你就又回到原点了,因为那份文档非常庞大。更进一步说,这里面还有个很可怕的地方:如果你没有一个带 static typing(静态类型)的 typed library,让计算机能够明确告诉你“你正在做的事情是错的”,那么 LLM 就会有一定概率发出错误的 API 请求。code execution tool 可以运行 type checker,然后告诉你:哦,你在请求 stripe.transactions.list,但这个东西其实并不存在。Stripe 并没有 transactions API。
Speaker 240:24 - 40:37
You might want payment intents. You might want orders. You might want balance transactions. Which one do you want? And if you if the API provider is doing a great job building this tool, it'll return the documentation for all of these things in line.
Speaker 240:24 - 40:37
你可能真正想要的是 payment intents,也可能是 orders,或者是 balance transactions。你到底想要哪一个?而如果 API provider 在构建这个工具方面做得非常好,它就会把所有这些内容对应的文档内联返回给你。
Speaker 240:37 - 40:52
It might have its own AI look at what the model is trying to do and come up with a suggestion. And that sub-agent is well trained, specified, always updating, and isn't burdened with the context of the full conversation.
Speaker 240:37 - 40:52
它也许可以有自己的 AI,去观察这个 model 正在尝试做什么,并给出一个建议。而且那个 sub agent(子 agent)可以训练得很好、规格定义清晰、持续更新,同时也不会背负整段完整对话的上下文负担。
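The "that doesn't exist, did you mean one of these?" behavior can be approximated with a simple lookup. A real setup would rely on a static type checker over a typed SDK, as described above; the runtime check below is a swapped-in simplification, and the `KNOWN_RESOURCES` table is a hypothetical example rather than any provider's actual API surface.

```python
# Hypothetical resource table; not a real API surface.
KNOWN_RESOURCES = {
    "customers": ["list", "retrieve"],
    "refunds": ["create"],
    "payment_intents": ["list"],
    "balance_transactions": ["list"],
}


def check_call(path: str) -> dict:
    """Validate a 'resource.method' call before running it; suggest near matches."""
    parts = path.split(".")
    if len(parts) != 2:
        return {"ok": False, "error": "expected 'resource.method'"}
    resource, method = parts
    if resource not in KNOWN_RESOURCES:
        # Suggest known resources whose name contains, or is contained in, the bad name.
        suggestions = sorted(r for r in KNOWN_RESOURCES if resource in r or r in resource)
        return {"ok": False, "error": f"unknown resource '{resource}'", "suggestions": suggestions}
    if method not in KNOWN_RESOURCES[resource]:
        return {
            "ok": False,
            "error": f"'{resource}' has no method '{method}'",
            "suggestions": KNOWN_RESOURCES[resource],
        }
    return {"ok": True}
```

Returning the error together with candidate names (and, ideally, their documentation) lets the model correct itself in one step instead of guessing again.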
Speaker 140:53 - 40:56
What do you think of the security model?
Speaker 140:53 - 40:56
你怎么看这个 security model(安全模型)?
Speaker 240:56 - 41:25
The security model is really, really interesting. This is another area where we're really starting to think about things at Stainless, and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or wanna talk, you know, please do reach out. At the end of the day, I think the security has to take place at the API layer itself. Right now, you see people trying to implement security by sort of limiting what's exposed through MCP.
Speaker 240:56 - 41:25
这个 security model 真的、真的很有意思。我觉得这是我们在 Stainless 也开始认真思考的另一个领域,而且我对此越来越兴奋了。所以如果有听众对这个话题特别感兴趣,或者有一些想法、想聊聊的话,欢迎联系我。归根结底,我认为安全必须发生在 API layer(API 层)本身。现在你会看到,人们在尝试通过限制 MCP 暴露出的内容来实现某种安全机制。
Speaker 241:25 - 41:52
And that kind of makes sense. But at the end of the day, you could do anything that's in the API under the hood. Right? And what people should be doing is using OAuth with granular permissions, proper scopes. And at that point, the security happens at the right place, which is at the API layer.
Speaker 241:25 - 41:52
这在某种程度上是说得通的。但归根结底,在底层你其实可以做 API 里能做的任何事情,对吧?人们真正应该做的是使用带细粒度权限和合适 scopes(权限范围)的 OAuth。那样一来,安全就发生在正确的位置,也就是 API layer(API 层)。
Speaker 241:52 - 42:05
There's limitations to OAuth scopes, and it's pretty hard to build. So it'd be nice if someone made that easy. But in my view, that's kind of the that direction is sort of the right the right layer.
Speaker 241:52 - 42:05
OAuth scopes 也有局限,而且构建起来相当困难。所以如果有人能把这件事做得更简单就好了。但在我看来,这大致就是正确的方向,也是在正确的那一层上解决问题。
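Enforcing permissions at the API layer with granular scopes comes down to a check like the one below. The scope names and the operation table are hypothetical; the only standard behavior relied on is that an OAuth 2.0 `scope` value is a space-delimited string.

```python
# Hypothetical mapping from API operation to the scopes it requires.
REQUIRED_SCOPES = {
    "customers.retrieve": {"customers:read"},
    "refunds.create": {"refunds:write"},
}


def parse_scopes(scope_value: str) -> set:
    """Split a space-delimited OAuth scope string into a set of scopes."""
    return set(scope_value.split())


def authorize(operation: str, granted: set) -> bool:
    """Allow the operation only if every required scope was granted."""
    required = REQUIRED_SCOPES.get(operation)
    if required is None:
        # Deny operations that have no declared scope requirement.
        return False
    return required <= granted
```

Because the check sits in front of the API itself, it applies equally to an MCP tool call, a script in a code execution sandbox, or a direct HTTP request made with the same token.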
Speaker 142:05 - 42:37
So going back to my earlier question, I'm thinking about the idea of having a model write code that the API provider then executes to, you know, interact with their API and then returns the results. Would you ever consider just creating a tool use tool that developers use? Because, like, for example, I'm thinking about Cora. Got all these tools. Maybe Gmail is gonna build, you know, like a code use thing or whatever.
Speaker 142:05 - 42:37
所以回到我前面的那个问题,我在想这样一个思路:让 model 去写代码,然后由 API provider 来执行这些代码,以便和他们的 API 交互,再把结果返回。你们会不会考虑干脆做一个给开发者使用的 tool use tool?因为比如说,我想到 Cora。它已经有很多这种 tools。也许 Gmail 也会做一个类似 code use 的东西之类的。
Speaker 142:37 - 43:13
But, really, I would probably use what you're talking about inside of Cora, but we would need a tool use tool. Or, it's not a tool use tool, it's a computer use tool. And I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment. Like, I need a computer use tool where I control the environment, and I can install different libraries in it and call it anytime to then call any API. It has to have network access, basically. Yeah.
Speaker 142:37 - 43:13
但其实,我真正想要的是,我大概会在 Cora 里面用你刚才说的那种东西,不过我们会需要一个 tool use tool,或者说它也不算是 tool use tool。它更像是一个 computer use tool(计算机使用工具):我知道 OpenAI 有这个,但它其实并没有很好地为大量 libraries(库)之类的场景构建,也不是一个 custom environment(自定义环境)。我需要的是一个 computer use tool,让我能够控制环境,能在里面安装不同的 libraries,并且可以在任何时候调用它,然后去调用任何 API,或者说,它本质上必须具备 network access(网络访问)能力。对。
Speaker 143:13 - 43:15
You guys should build that.
Speaker 143:13 - 43:15
你们应该把那个做出来。
Speaker 243:15 - 43:16
We're working on it.
Speaker 243:15 - 43:16
我们正在做这件事。
Speaker 143:16 - 43:24
Fuck. Yeah. You're building it for developers who wanna access MCP servers or people who are providing MCP servers?
Speaker 143:16 - 43:24
靠。对啊。你们这是在给想访问 MCP servers 的开发者做,还是给提供 MCP servers 的人做?
Speaker 243:24 - 44:00
We're starting with people who are providing MCP servers. But ultimately, I think that we're going to need this to work such that you can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration and also anything else. But not too much anything else, right? And so one of the advantages of starting where we're starting of just one API provider is that you ensure that there's no network connections allowed out of that sandbox where we're running the code to anything other than, in this case, api.stripe.com. And that's that's really, really critical for security for something like this.
Speaker 243:24 - 44:00
我们目前先从提供 MCP servers 的人开始。但最终,我认为我们需要把这件事做成这样:你可以给 model(模型)一个代码执行环境,让它不仅能调用 Stripe integration,还能调用 Salesforce integration,以及其他各种东西。但也不能是什么都能调,对吧?所以,我们从现在这个起点——只支持一个 API provider——开始的一个好处,就是你可以确保:在我们运行代码的那个 sandbox(沙箱)里,不允许向外建立任何网络连接,除了这里的 api.stripe.com 之外。这一点对于这种系统的安全性来说真的、真的非常关键。
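The single-provider restriction described above amounts to an egress allowlist. The sketch below shows the rule as a URL check; the function name and the HTTPS-only requirement are assumptions added for illustration, and in a real sandbox the rule would presumably be enforced at the network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Only the one API host named in the conversation is reachable from the sandbox.
ALLOWED_HOSTS = frozenset({"api.stripe.com"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Permit a request only to an exact allowlisted host over HTTPS."""
    parts = urlsplit(url)
    # `hostname` is lowercased and excludes any port or userinfo.
    return parts.scheme == "https" and parts.hostname in allowed_hosts
```

Matching on the exact hostname, rather than on a prefix or substring, is what keeps look-alike hosts such as `api.stripe.com.evil.example` out. Extending the sandbox to a second provider would then mean adding one more entry to the allowlist.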
Speaker 2 44:00 - 44:33
And so there are ways to expand that bit by bit and keep things secure. It'll take some time. The other thing I think is worth pointing out, as you see some of these generalizations, is that it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do need. You also start to see that this is just a powerful model for AI doing stuff.
Speaker 2 44:33 - 44:53
And sometimes you realize that the thing the AI did this one time, in this one-off case, is actually enduringly useful. Maybe anytime a customer writes in to support and says, hey, my socks had holes in them, they should automatically get a refund. You know? Maybe you want that, maybe you don't.
Speaker 2 44:53 - 45:32
But there's a lot of stuff that people do one time, and then two times, and then three times, and then they say, okay, we should automate this. Right? And that's what software teams do all day, every day. And I think we're also going to see that with AI. The same code search tool that we're talking about, and all the same prompting that makes an AI really, really good at interacting with an API in one of these code sandboxes, almost, quote unquote, in its brain, where it can write code in its head, run the code in its head, see the results, and then move forward with your query, with your task, should also let it say, okay.
Speaker 2 45:32 - 45:36
Actually, this is enduringly useful code. Let me commit this to the repo.
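The "one time, then two times, then three times, then automate" pattern can be sketched in Python as a simple run counter. This is only an illustration under assumptions: the class SnippetTracker, the threshold PROMOTE_AFTER, and the idea of flagging a snippet by repeat count are all hypothetical, and in the conversation the model itself judges whether code is worth committing.

```python
import hashlib
from collections import Counter

PROMOTE_AFTER = 3  # assumed threshold: "one time, two times, three times"


class SnippetTracker:
    """Counts how often the same one-off code is run in the sandbox."""

    def __init__(self, promote_after: int = PROMOTE_AFTER) -> None:
        self.promote_after = promote_after
        self._runs: Counter = Counter()

    @staticmethod
    def _key(code: str) -> str:
        # Ignore trailing whitespace so trivially different copies match.
        normalized = "\n".join(line.rstrip() for line in code.strip().splitlines())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record_run(self, code: str) -> bool:
        """Record one execution; return True once the snippet is a commit candidate."""
        key = self._key(code)
        self._runs[key] += 1
        return self._runs[key] >= self.promote_after
```

A caller would run `record_run` each time the model executes a snippet and, when it returns True, surface the snippet for review before anything is committed to the repo.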
Speaker 1 45:36 - 45:44
Yeah. Yeah. It's like, you know, chat is a really good interface for exploring, but sometimes you just want a dashboard.
Speaker 1 45:44 - 46:14
You know? I just want to log in to my Stripe dashboard and see all the stuff without having to ask, what is my MRR? It should just show up, because I do that every day. But I want to push you, as a hashtag value-add investor, because I think there's this thing that happens in AI where, often, on the first attempt at something like this, people try to be really cautious. And I'm sure that your customers, like big enterprise customers, care about you being cautious.
Speaker 1 46:15 - 46:38
But the things that get adopted are often the ones that are willing to take the risk to be YOLO very early. So an example is DALL-E, which was totally private for a long time. People were posting some images, but you couldn't get in. And then Stable Diffusion was just like, fuck it, anyone can use this. And that really started the whole image generation wave.
Speaker 1 46:39 - 46:58
Obviously, Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Same thing for Claude Code, honestly. Codex is not like this as much anymore, but if you look at the difference between Codex CLI and Claude Code, Claude Code was just like, fuck it, YOLO mode. It's super industrious.
Speaker 1 46:58 - 47:22
It has a sandbox, but you can just do dangerously skip permissions. And Codex just fell way behind because, first, it was in the browser, and so the whole thing was locked down. And then it was in the CLI, but it was really built for pair programming, and so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things even if you did full auto mode.
Speaker 1 47:23 - 47:56
And now they've caught up, because they're like, yeah, you can just let it do whatever you want. And so I would really push you on this: there might be a version that you could do today or tomorrow, or very soon, for individual developers that would let them set up this environment, which, for example, I would use immediately. I care about security, but I care a lot less than some gigantic enterprise company. But I think the people like me who are building at this scale are eventually, hopefully, going to be the big companies.
Speaker 1 47:56 - 48:00
But we're the ones that are really doing the AI-first adoption, not the big companies.
Speaker 2 48:00 - 48:04
Well, I would love to get this in your hands. What are some of the APIs your team uses the most?
Speaker 1 48:05 - 48:51
We have a bunch of different products, but I'm thinking right now about Cora, the email assistant. Of the big APIs that it's using, it's mostly the Gmail API. And so you're interacting with the assistant over chat, and then it has a list of tools that are like, you know, archive email or draft email or send email or whatever. There's a whole categorize tool, so it categorizes your mail in certain ways. And I think we would definitely try out something like this, because if it ran the same way, it would make it much more flexible for us to make more tools and not break old ones.
Speaker 1 48:51 - 48:54
You know? It's really interesting.
Speaker 2 48:55 - 49:44
I mean, in a sense, what I actually predict is that, for people who are quote unquote building tools, once we have a code execution kind of super tool like I'm talking about, the only way you really quote unquote build a tool is with instructions, with prompts. The full power of everything you could possibly do in the API, in the Gmail API, for example, is all there in one tool. But sometimes you have specific tasks or specific categories of work that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible. And at that point, the only engineering work that you have to do is prompt engineering. We'll see if it's that quote unquote easy.
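The idea that a "tool" becomes a set of instructions layered on one code execution tool can be sketched in Python as follows. Everything here is an assumption for illustration: the tool name execute_code, the PromptTool type, and the prompt wording are hypothetical and do not come from Stainless or from any MCP server.

```python
from dataclasses import dataclass

EXECUTE_CODE_TOOL = "execute_code"  # hypothetical name for the single super tool


@dataclass(frozen=True)
class PromptTool:
    """A 'tool' that is nothing more than a name plus instructions."""

    name: str
    instructions: str


def build_system_prompt(tools: list) -> str:
    """Combine the single execution tool with task-specific instructions."""
    lines = [
        f"You have one tool, {EXECUTE_CODE_TOOL}, which runs code against the API.",
        "Follow these task-specific instructions when they apply:",
    ]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.instructions}")
    return "\n".join(lines)


# Example "tools" modeled on the email actions mentioned in the conversation.
EMAIL_TOOLS = [
    PromptTool("archive_email", "Archive the message; never delete it."),
    PromptTool("draft_email", "Write a draft and stop; do not send."),
]
```

Under this design, adding a capability means appending another PromptTool entry, so existing entries are left untouched, which is the flexibility Dan describes wanting for Cora.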
Speaker 2 49:45 - 49:48
As we all know, prompt engineering can be really tricky.
Speaker 1 49:48 - 49:49
It's hard.
Speaker 2 49:49 - 50:09
Yeah. But I think that's part of the vision. That being said, we do have some pretty nifty ways, with the MCP servers that we generate today, to help developers mix and match all the parts of the different underlying tools, all the different parts of the API, as they compose and write their own tools.
Speaker 1 50:09 - 50:16
This is awesome. So for people who are listening and want to know more from you or from Stainless, where should they find you?
Speaker 2 50:17 - 50:21
Stainless.com. That's our website.
Speaker 1 50:21 - 50:30
Awesome. Or at least visit stainless.com. Alex, great to have you on. I can't wait to do more of this when you have some of these new things launched. This is really, really fun.
Speaker 1 50:30 - 50:33
And, yeah, great to chat.
Speaker 2 50:33 - 50:34
Thanks, Dan. You too.
Speaker 3 50:41 - 50:59
Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT.
Speaker 3 50:59 - 51:19
Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, craving more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life.
Speaker 3 51:19 - 51:24
And now without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
Original ↗ https://www.youtube.com/playlist?list=PLuMcoKK9mKgHtW_o9h5sGO2vXrffKHwJL