🎙 Podcast: Training Data · May 8, 2026 · 4,943 words · about 25 minutes

ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for Everything

Speaker 1 00:02 - 00:21

So I love line charts and bar graphs as much as the next guy, probably more. But the story of ElevenLabs is also interesting from a human perspective, because you started the company with a childhood friend. So maybe take us back to 2022 or earlier and just tell the human side of the ElevenLabs story to start.

Speaker 2 00:22 - 00:46

I have the most luck in the story of ElevenLabs because, well, it started in 2022, but it feels like it started seventeen years ago when I met my co-founder, Piotr. All the names in Polish are complicated, luckily not for us. We met in high school, became best friends, took all the same classes together, and then through the years did everything together. So we traveled together, studied together, worked together, and time is on our side.

Speaker 2 00:46 - 01:01

We are still best friends. It's working out. And part of what started ElevenLabs is inspiration from where we are both from. We are both from Poland, from the suburbs of Warsaw. And there's a very peculiar thing in Poland.

Speaker 2 01:01 - 01:28

If you watch any foreign movie in the Polish language, all the voices, whether that's a male voice or a female voice, get narrated by one single person. So as you can imagine, it's a pretty terrible experience. You have literally one voice narrating everything. It is usually also kept monotone on purpose, so you are meant to interpret the emotions of that content yourself. And while we grew up with this, this is still happening today for the majority of content.

Speaker 2 01:28 - 02:08

And that kind of opened our eyes to one of the clear things across the audio domain: the future will bring this ability for everybody to speak any language with the same emotion, the same intonation. We started diving deeper into that problem and realized the problem of audio exists in so many other domains, too, whether that's narrating the content around us, whether that's books not being available in audio form, whether that's the news articles that we could read, whether that's the language barrier. And in the future, as we heard in the previous conversations, the future where humanoids, robots, are around us, voice will be the primary interface to a lot of that technology, and that's something we would love to fix and solve.

Speaker 1 02:09 - 02:35

Excellent. And ElevenLabs builds frontier models for audio. I think there's a paradigm now where, to build a frontier model, you have to start with hundreds of billions or billions of dollars and then figure out the rest later. Eleven did not take that path. Can you talk a little bit about your approach towards building this company, why this hasn't been replicated, whether that is even possible in 2026, etcetera?

Speaker 2 02:36 - 03:00

Yeah. I think that continues that great luck in timing, because we started in 2022. For those of you working in the domain at the time, that was the year of crypto and metaverse. Nobody was working on the AI side yet. Even further, people were starting to work, of course, on the text models, on the visual models, but audio as a domain was still considered a big niche.

Speaker 2 03:00 - 03:23

There were so few researchers in the space working on that. So for us, that was a good part of picking that domain: one, we were excited about where that future was going. Two, we felt the people around just didn't realize the value of that domain. And three, the requirements of what you needed to solve were very different. The audio models were smaller, so you didn't need as much compute as you need for some of the other sister domains.

Speaker 2 03:23 - 03:57

The data needs are big, but while there's a lot of audio data, we knew that to actually get that audio working, you need to figure out how to transcribe and annotate a lot of that data, which we knew we could do. And then ultimately, it all boiled down to the architectural side: can we solve that part in a good way? And here, my co-founder is one of the smartest people I know and a great researcher, and he has been able to assemble some of the best people in audio to help us. And we took a slightly untraditional approach at the time. We started in London.

Speaker 2 03:57 - 04:20

We had a lot of people between London and Warsaw and started the company in a completely remote way. We wanted to hire the best researchers wherever they were. We were going for the classic GitHub scraping, trying to reach people based on their work instead of based on their location. Based on that work, we would reach out to those people. We would always share our samples and try to get them to join the team.

Speaker 2 04:20 - 04:44

And that's how we assembled the first set of people, who we think are some of the best researchers in the audio domain. And through the years, they still help us crank a lot of those models into production. Then we launched the product. I think the slightly different approach we took was monetizing very quickly, so trying to get some of the revenue stream back so we could fund a lot of the work on the models.

Speaker 2 04:44 - 05:17

We try to stay healthy on the margins so we can continue investing, with the assumption that it's better for us to figure out that revenue stream and be able to be independent in that development. But then as the ambitions grew, we knew that we needed to train models. So we, of course, brought in a lot of money externally as well. And I think projecting to today, one thing that's clear for us is there are still so many of those niches that people don't tackle, that you can start with and then step by step start opening up.

Speaker 1 05:17 - 05:32

I think a lot of customers see ElevenLabs through their narrow needs, right? Maybe take a zoomed-out view. What is the suite of models that ElevenLabs works on? How do you prioritize them? How do you organize R&D, etcetera?

Speaker 2 05:32 - 05:53

So we started with the first text-to-speech model, the model that could finally understand the context of what's being written and, based on that context understanding, get the right emotion, the right intonation from text. So if it was a happy sentence, you get that happiness out. If it's a dialogue, it can pronounce the dialogue. And then we continuously started adding to that.

Speaker 2 05:53 - 06:16

So it started with the problem of breaking down language barriers. The things you need to solve for dubbing are transcription, so understanding, then translation, and then text to speech. So we first solved text to speech. Then we knew we needed to add a data component, which is speech to text and being able to transcribe content in a great way. Then came how we combine those models together.
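Editor's sketch: the three dubbing stages named above (transcription, translation, text to speech) can be pictured as a simple chain. This is a minimal illustration under stated assumptions, not ElevenLabs' implementation; `DubbingPipeline` and the `fake_*` stand-in functions are hypothetical names invented here, and a real system would plug actual models into each stage.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical three-stage chain: speech to text, translation, text to speech."""
    transcribe: Callable[[bytes], str]        # audio -> source-language text
    translate: Callable[[str, str], str]      # (text, target_language) -> translated text
    synthesize: Callable[[str, str], bytes]   # (text, voice_id) -> audio

    def dub(self, audio: bytes, target_language: str, voice_id: str) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, target_language)
        # Reusing the same voice_id is what keeps the original speaker's voice.
        return self.synthesize(translated, voice_id)


# Stand-ins so the sketch runs without any model; they only tag the data.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"


def fake_synthesize(text: str, voice_id: str) -> bytes:
    return f"{voice_id}:{text}".encode("utf-8")


pipeline = DubbingPipeline(fake_transcribe, fake_translate, fake_synthesize)
dubbed = pipeline.dub(b"hello", target_language="pl", voice_id="speaker-1")
```

Keeping each stage behind a plain callable means any one of them could be swapped out without touching the other two.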

Speaker 2 06:16 - 06:58

So that's kind of the first three models in the first couple of years. Then, of course, other things started happening across the space, which is that a lot of the reasoning models started becoming quick enough and smart enough at the same time that you could imagine those interactive experiences being possible. And that's where we started launching more of the real-time streaming models across audio and then combining those into conversational experiences. So we added effectively the whole stack, all the turn-taking and orchestration, to create a voice engine for a voice agent. And then on the other side, as we realized that emotionality is something we can solve, we added one of the hardest modalities in audio, which is music, and being able to produce music.

Speaker 2 06:58 - 07:13

So today, we span the entirety of audio research, whether it's text to speech, speech to text, combining those models together in localization with dubbing and in orchestration with the voice engine, and then being able to do that across music as well.

Speaker 1 07:14 - 07:24

And with all those things and all that interesting development work, was there any "oh, wow" moment in terms of what these products are capable of that you can remember?

Speaker 2 07:24 - 07:53

You know, there are so many, and the bar kind of changes for all of us. The first moment for us was, well, they always use my voice as a testing voice because it has this weird accent. And the first time we could replicate my voice based on a good sample, that was a first wow moment for myself. You always go through this moment of, this is not how my voice sounds. And then you listen to yourself side by side, and it's definitely how it sounds, unfortunately.

Speaker 2 07:54 - 08:29

Then the second moment was when we first got it to laugh, and people were like, okay, this is actually the thing that makes the whole experience more human: the laughter, the pauses, the ums, the imperfections. So we started getting those out, and that was the moment for us, because we made it to the top of Hacker News with the first AI model that can laugh, which was a very proud moment for us. And then, of course, over the years, that kind of extended. You might remember in 2023, 2024, there was a Javier Milei speech that went viral, where he could speak other languages.

Speaker 2 08:29 - 09:11

So it was translated into English. And it was the first time where you could still hear his own voice in there. So that's the kind of continuous wow moment, something that was completely impossible before. And then we saw that happen time and time again, with Narendra Modi, with President Zelensky, all the way to recently one of the, I feel like, pinnacles of voice performance: Matthew McConaughey delivering his newsletter and those iconic lines in Spanish and Portuguese, where for the first time, his family who speak those languages could hear him speak those languages too. But as for more recent pieces, there are two that we are excited about bringing to production.

Speaker 2 09:12 - 09:37

I think the first one is finally figuring out emotional intelligence in that interactive experience. So in the voice agent experience, it doesn't only get the right intonation and emotion, but can understand the other side. So if somebody is stressed, it gets that and delivers a soothing, reassuring emotion. If someone is excited, maybe it matches that. If someone speaks slowly, it makes sure to slow down.

Speaker 2 09:38 - 10:13

That emotional intelligence is something that we are finally seeing a path to solving internally, which will be just a continuous step change in what's possible. And then the second one, which will apply there but also in general audio spaces, is audio general intelligence, where you can combine audio models together in one stream. So you could theoretically have a model that narrates, then pauses, and, let's say, starts singing with that same continuous voice. That's something that's extremely hard to combine today and something that will be possible, I think, very soon.

Speaker 1 10:14 - 10:35

And you mentioned voice agents. It seems like everybody, at least on the customer side, is buying a voice agent. And I think intuitively, people think customer support, the old phone tree replacement. What's actually going on in the world of voice agents? And what do you think are the most interesting overlooked opportunities, spots where startup founders should focus?

Speaker 2 10:36 - 11:24

Yeah, of course, customer support is probably the one that everybody has heard about and knows very, very well. I think the second thread we are seeing is an increasing shift to revenue-generating opportunities, where voice agents can act in sales, whether it's inbound or outbound sales. It doesn't replace the entire experience but takes and amplifies part of that experience. Maybe a good example is Deliveroo, where Deliveroo will have voice agents that contact the restaurants to capture their opening times. And based on those opening times, they can update the riders and drivers and, of course, the people ordering on when to get there, all the way through to inbound sales, where a good example is Deutsche Telekom, which people increasingly will be contacting to inquire about the service or to buy a product.

Speaker 2 11:24 - 12:08

And instead of going for the dropdown, instead of going for the form, you can speak with the voice agent to leave that information. We do it ourselves too, so we have good metrics and an understanding of what's happening there. One, of course, it's so much simpler and quicker than going through that form. But the second thing that started happening in that inbound sales flow is that people started leaving a lot more information, because they would speak about the use case they are coming with, but also where it's not working, where it's working, and some of the other use cases they are evaluating, which we can combine to deliver a much better experience afterwards. On the overlooked side, I think my favorite examples are citizen support, education, and health care, which will completely change.

Speaker 2 12:08 - 13:03

On citizen support, all of us would benefit from just generally better government access, whether that's understanding how to file the taxes, which I think many of you went through earlier this month, all the way through to just learning what the policy for travel abroad is and how that might affect you. We've recently seen that work deployed with the government of Ukraine, who we think is one of the most advanced governments on that front. We traveled to Ukraine to work with their team. What they are trying to solve is this: they have a government app which every citizen can access to get information about what's happening. But given the war, given the front line and the lack of that access, they wanted to figure out a new channel for people to be able to call in and get that information. So they effectively created a voice agent where you can call in and get information about what's happening on the front line.

Speaker 2 13:03 - 13:47

You can get education help and some of the lectures delivered to your kids, all the way through to proactive engagement about staying safe out there. And maybe a last example on the education front, and that's probably my favorite one as I think about that changing: just how incredible would it be to have someone who is an incredible teacher available twenty-four seven, where you can ask them questions, whether it's Karpathy all the way through to Richard Feynman. And you can learn physics with them on your headphones while you are teaching or learning that subject. And that's something that we are seeing pockets of. A great example is MasterClass, where MasterClass, of course, collaborates with incredible teachers to deliver static lectures.

Speaker 2 13:47 - 14:18

But recently, they launched an interactive version of that. I don't know if that will be a good reference for this audience, but we recently worked with them on bringing in a Gordon Ramsay that can teach you cooking. So while you are in the kitchen, he can effectively shout at you to get better. Or maybe a better one: there's a Chris Voss, where you can, of course, learn negotiation, but you can learn by negotiating with Chris live on the phone to get better, which I thought was a phenomenal subject.

Speaker 1 14:18 - 14:22

Having negotiated against Mati a number of times around financing rounds, I understand now.

Speaker 2 14:22 - 14:25

I think it helps you to say this, but I think the opposite is true.

Speaker 1 14:28 - 14:43

I have more questions, but I want to save time for the audience as well. Maybe one: as Constantine mentioned, more than $100,000,000 of net new ARR in Q1. Obviously, business is going very well. And you're pioneering the startup founder building a foundation model and applications.

Speaker 1 14:44 - 14:50

Any counterintuitive lessons about building a company in this era that the founders in the audience might want to take home with them?

Speaker 2 14:52 - 15:14

So, just for reference, we are just over 400 people, over 400,000,000 in revenue, but we still keep the teams extremely small. So it's a rough, slightly arbitrary cap of fewer than 10 people. That applies to each of the research and product teams. Even the go-to-market, ops, and talent teams are all smaller than that size.

Speaker 2 15:14 - 15:53

Most people will have 10 direct reports or so. So it stays relatively flat and allows us to move a little bit quicker. One thing that we've done in this model, and very surprisingly, this is a very similar model to what we've actually seen with the government of Ukraine, is that each of the teams, even the teams that aren't technical, will have engineers within them. So our people team, our go-to-market team, our legal team will each have an engineer who helps to build, of course, automation, and to upskill and uplevel the rest of the people. And recently, that really helped because, as I'm sure many of you are going through, everybody will be vibe coding, and coding a lot of that help themselves, even if they are not technical.

Speaker 2 15:53 - 16:14

So now that kind of shifted, not the responsibility, but the requirement of how good the review needs to be for a lot of that work. If it has security or infrastructure implications, you will want to make sure that the output is right. And I think on the engineering side, you can set that expectation. On the non-engineering side, doing that is relatively hard.

Speaker 2 16:14 - 16:50

So that technical resource in those teams helped us a lot to figure this out. And in general, there's just so much incredible work you can do by having that, whether that's the scraping on the hiring and recruiting front and analyzing what worked in the past to improve in the future, or whether that's upskilling the legal team on how to use those tools. For example, we recently introduced a scoring system. For those on the go-to-market or sales side, you frequently will end up in this negotiation with your sales team of: can I give indemnity provisions? What's the liability cap? Can I give this set of clauses?

Speaker 2 16:50 - 17:19

And then you kind of need to draw the line on how many things you give. And I ended up being in so many of those conversations about whether we had already given a lot or not. So now we have introduced a scoring system with a number of points you can give per size of customer; you can just trade a few of those points in and out, which made it so much easier. And of course, that's fully automated now with how we work across that team. So those were some of the unintuitive ones: small teams, bringing technical talent into nontechnical teams, staying relatively flat.
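Editor's sketch: the points-per-customer-size idea described above could look something like the following. Every name and number here is a hypothetical placeholder chosen for illustration (the concession names, their point costs, and the tier budgets are not from the conversation); it only shows the shape of a budget check.

```python
from typing import Iterable

# Hypothetical point cost of each contractual concession (illustrative values only).
CONCESSION_COST = {
    "indemnity_provision": 3,
    "higher_liability_cap": 2,
    "custom_clause_set": 1,
}

# Hypothetical point budget per customer size tier (illustrative values only).
POINT_BUDGET = {"small": 1, "mid": 3, "enterprise": 5}


def points_spent(concessions: Iterable[str]) -> int:
    """Total points consumed by the requested concessions."""
    return sum(CONCESSION_COST[name] for name in concessions)


def within_budget(customer_size: str, concessions: Iterable[str]) -> bool:
    """True if the requested concessions fit the budget for this customer tier."""
    return points_spent(concessions) <= POINT_BUDGET[customer_size]


# A mid-size deal asking for an indemnity provision uses its whole budget.
mid_ok = within_budget("mid", ["indemnity_provision"])
# A small deal asking for a higher liability cap goes over budget.
small_ok = within_budget("small", ["higher_liability_cap"])
```

A fixed budget turns a case-by-case argument into a lookup, which is what makes the process easy to automate.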

Speaker 2 17:20 - 17:33

We also have no titles, which allows us to bring people in and really optimize for the impact that they are having. And then you can grow as quickly as you want. Tenure will not define us. And many more. So we'll see.

Speaker 2 17:33 - 17:36

It's a four-year-old company, so we'll see if that helps.

Speaker 1 17:37 - 17:42

Any questions? Oh no, okay, Sonya.

Speaker 3 17:43 - 18:10

Are you seeing people deploy voice agents to actually negotiate on their behalf? And then, are you starting to see agents actually negotiate with agents? Sorry, I do three-part questions. When that world happens, do you think the agents are actually talking to each other the way that humans talk to communicate and negotiate? Or do you think it's beep, boop, beep, boop? Do you think it's all done instantaneously? What's that world going to look like?

Speaker 2 18:10 - 18:21

So there are early inklings of that, but we haven't seen any truly successful ones on the negotiation front. It was more kind of order taking: What's the price? Can we capture that? And then it kind of goes back to the team, so not real negotiation.

Speaker 2 18:21 - 19:04

But there are a few startups that we see, especially around organizational tasks like: can I organize this event, calling a lot of places, getting the price, and then calling again with our budget? So that is happening, and I think this will shift. I think emotional intelligence is the big part that will start being important in a lot of that work, where it's not only the content that matters, but how you deliver it and when you pause. And then maybe the extreme version of that, which most people wouldn't do and agents are not good at: today you will see a lot of interruptibility built in, where the human can interrupt the agent. But with negotiation, you also want the opposite, where the agent will interrupt the human.

Speaker 2 19:04 - 19:35

It's kind of the extreme version of that. On the second part, the agent-to-agent part, some of you might have seen this: we did a hackathon over a year and a half ago, and that was exactly the case, where an agent was speaking with another agent. They detected that they were both agents, and they swapped over to a different language. And that was a more efficient transmitter of information than just the classic spoken word. And I think this will happen, 100%.
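Editor's sketch: the hackathon behavior described above (two agents recognize each other and switch away from spoken language) amounts to a small negotiation step at the start of a call. The code below is a hypothetical illustration of that decision only; `Party`, `Channel`, and `choose_channel` are names invented here, and how the hackathon project actually detected agents or encoded data is not stated in the conversation.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    SPEECH = "speech"  # ordinary spoken language, usable by humans
    DATA = "data"      # some denser machine-oriented encoding (unspecified here)


@dataclass(frozen=True)
class Party:
    name: str
    is_agent: bool
    supports_data_channel: bool = False


def choose_channel(a: Party, b: Party) -> Channel:
    """Switch to the data channel only if both sides are agents that support it."""
    both_agents = a.is_agent and b.is_agent
    both_capable = a.supports_data_channel and b.supports_data_channel
    return Channel.DATA if both_agents and both_capable else Channel.SPEECH


booking_agent = Party("booking-agent", is_agent=True, supports_data_channel=True)
hotel_agent = Party("hotel-agent", is_agent=True, supports_data_channel=True)
human_caller = Party("caller", is_agent=False)

agent_to_agent = choose_channel(booking_agent, hotel_agent)
agent_to_human = choose_channel(booking_agent, human_caller)
```

Falling back to speech by default keeps the call usable whenever a human, or an agent without the extra capability, is on the other end.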

Speaker 2 19:35 - 19:54

The big question will be: will it really be voice, or will it be some other transmission of information? It truly depends on what the infrastructure is built for, and I think this will define that experience. I see you at the catch box.

Speaker 4 19:54 - 20:08

Hey, I'm curious how you're thinking about the need for voice in a future where agents do more and more of the work. So basically, what are the kinds of use cases, maybe, where human conversation... I think it's more of a follow-up to the last question.

Speaker 2 20:09 - 20:50

First, all of us will have so many different devices around us, and a step from that, we will have robots around us. So, of course, voice will be such an important interface to instruct and interact with those devices. In many ways, I feel like we see a lot of developments in intelligence, but the real bottleneck of the future will be how we communicate with that intelligence. And I think voice and the visual part will be a big unlock to actually get the most of that intelligence's value in those settings, which isn't yet possible. But on the flip side, the value of human-to-human interaction will only increase.

Speaker 2 20:50 - 21:36

So whether that's events like this one or events with your favorite artist, they will increase in value with that ability of having voice all around you. But trust will be such a big part, and something we optimize for between the agent and the human. In a future where all of us will have a voice agent, for example to call and book a restaurant or give information for a health care appointment, all of that will require such a high degree of trust that this is you, the authenticated you. So there'll be a level of encoding and decoding for real, then encoding and decoding for what you, the human, might have opted in to. And then by default, everything else will be fake, which is kind of the opposite of how it is today.

Speaker 2 21:36 - 21:41

Today you have to detect for AI, but in the future you will detect for real, authenticated AI and otherwise assume it's fake.

Speaker 5 21:50 - 21:59

Andre spoke earlier about jagged intelligence. Do you see similar odd places in audio where models are good and bad in ways you might not expect? And if yes, what are they?

Speaker 2 22:02 - 22:28

There's still so much on the bad side. I think we spoke a little bit about where we see voice agents working: this combination of the models together in support settings works really well and works reliably. Early sales is starting to work, but the moment you swap to a true emotional interaction, it's not working yet. It doesn't get the emotion that well. It's slightly too slow.

Speaker 2 22:29 - 22:50

So that is still, I think, a big step change that needs to happen. The same applies in a very different domain, on the music side. I think on the music side, you can get good production music. You cannot get top-charts music, even with artist input. I think this will change over the next year or two.

Speaker 1 22:50 - 22:50

Can I

Speaker 2 22:50 - 22:51

just follow up? Of course.

Speaker 5 22:51 - 23:06

Andre's take was that the reason for that was that the labs were basically training for the stuff that had economic value. When you're training your models, is that true of you? Are you basically training for the things that make the most money? Or is it that some challenges are genuinely harder than others?

Speaker 2 23:09 - 23:51

We try to train the models and build the product and the ecosystem that will drive, of course, the biggest impact for all our customers and all users, which should correlate, of course, with revenue in the long term. So that's a long-term perspective: it's going to be minimal in the next few years, so not next year. So frequently, we will train models that might not provide that value in the short term. Or even a step before, we'll spend so much time labeling the data: not only the what of audio, but also the how of audio, like what emotions did I use, what is my voice described as, what is this music described as. So we assembled a team of now a thousand-plus people who have been voice coaches, musicians, and artists before, who can help us annotate that behind the scenes.

Speaker 2 23:51 - 24:02

And that will not provide value in the next six to twelve months, but we think it will in the next twelve to twenty-four. And then you often have to collect that data, which frequently just isn't that accessible either.

Speaker 1 24:04 - 24:06

Last one, and then we'll go to lunch.

Speaker 6 24:10 - 24:14

Hey. Can you hear me? Thanks. Yes. Big fan of yours and ElevenLabs.

Speaker 2 24:14 - 24:15

Thank you.

Speaker 6 24:15 - 24:35

From the model layer perspective, what do you think are the moats here with audio models? The labs are going there, or not going there. In this sausage-making of building a really good frontier audio model, what are the main defensible parts?

Speaker 2 24:36 - 25:01

So of course, we do a variety of models. And I recently had the pleasure of meeting Jensen, and he was commenting on a few of those models. And he said that our speech-to-text models are technology, and text-to-speech is artistry, and we are all artists. So he gained a client for life. But, of course, we do believe there is a little bit of that in really fixing text-to-speech and fixing that emotionality.

Speaker 2 25:01 - 25:21

You will need to be really focused on that space. You really need to get in front of users, collect the data, collect the preferences, and use that to fine-tune the models. And then there is the domain specificity in how you actually bring those models to production: health care is very different from financial services, which is very different from education or experiences. So that's on the model layer.

Speaker 2 25:21 - 26:03

I think there will be a continuous advantage: if you actually care about the quality, spending the time on the model work will help you keep that advantage. But to your point, a lot of use cases will use a model as just a small part of their stack. And that's where we spend a lot of time going beyond the research, on the product side: how you understand a user's problem and the workflow that they need. Voice agents means combining the audio models with knowledge and bringing that inside of a system, how you bring it outside with telephony systems so you can interact across channels, and how you evaluate, test, and monitor. And then as you create, whether that's in the agent space or in the creative space, with that same understanding, you build the ecosystem.

Speaker 2 26:03 - 26:38

And that's what we hope to build across ElevenLabs: a place where, whether that's distribution and a brand that people can trust, or the platform where you have a preexisting set of work that you can start from, whether it's a template for creating an agent, a template for creating a workflow in the creative space, or a voice. And we now have the pleasure of having over 20,000 voices that people created and contributed, which you can use across languages, styles, and voices. And I think that will be an increasingly important layer of how you are able to cater to that diversity, make it easy for people to start, and really understand that workflow.

Speaker 1 26:38 - 26:42

All right. I'm gonna hand it back to Konstantin. Mati, thank you.

Speaker 2 26:42 - 26:43

Andrew, thanks for being a partner. Amazing.

Speaker 1 26:45 - 26:45

Thank you, guys.

Source: https://www.youtube.com/playlist?list=PLOhHNjZItNnMm5tdW61JpnyxeYH5NDDx8
