BuildSpeak: daily builder digest
🎙 Podcast · Training Data · May 13, 2026 · 6,567 words · about 33 minutes

Suno's Mikey Shulman: Everyone Can Make Music Now

Speaker 1 00:00 - 00:25
Before Suno, basically everybody was a consumer of music. You know, compared to the 8 billion people on the planet, there are very few people who make music, and the rest of us consume it. The crazy thing about Suno is that on any given day, 90% of the users are going to create something. And the thing that's hard to wrap your head around is that, by and large, you're not creating it to go bring it elsewhere, to do something with it. People are creating music for the fun and enjoyment and fulfillment that comes with being creative.
Speaker 1 00:25 - 00:30
And so that, the creation, is actually the entertaining bit. And that is the big step change.
Speaker 2 00:47 - 01:06
I'm delighted to welcome Mikey Shulman. Mikey is founder and CEO of Suno, which is building a music company, or a creative entertainment platform, and has been one of the most novel consumer applications I've seen out of AI. And I'm very, very excited to ask you about your journey and what's ahead for Suno. So thank you for joining us today.
Speaker 1 01:06 - 01:08
Thank you for having me. I'm excited.
Speaker 2 01:09 - 01:25
Okay, awesome. I want to start with your background because it is very, very unexpected. You went from a physics PhD at Harvard, I think quantum computing with solid-state spins, to building the largest AI music company in the world. What insight connected those two things for you?
Speaker 1 01:27 - 02:01
You know, I don't know. On paper, I guess I have no business building a consumer entertainment company, but a lot of people went from physics into AI, just like, you know, thirty years ago a lot of people went from physics into quantitative trading. I'll be honest, though. I was only an okay physicist, and there are a lot of better physicists, including one of my cofounders. And I think what I mostly learned is that playing at the nexus of two things that don't usually play together is just a massive opportunity in all domains. It can be music and technology.
Speaker 1 02:01 - 02:07
It can be quantum mechanics and low-temperature microwave engineering, or it could be whatever else you're gonna do.
Speaker 2 02:09 - 02:29
You and I got connected in the very early days of Suno. One of our mutual friends, Harrison Chase, was one of the earliest Suno Discord users, and he was having far too much fun making songs in your Discord. Maybe tell us about the early days of Suno. How did it come together? Did you set out to build a music company?
Speaker 1 02:31 - 03:06
Originally, we thought this would actually be too hard, and you have to rewind: this is, like, pre the ChatGPT moment. We did some back-of-the-envelope math. We knew we loved audio, but the back-of-the-envelope math told us that actually producing good music, making good music, generating good music, was like a couple of orders of magnitude away in terms of compute and model size and capability. And it's because music, sound in general, is very unwieldy. It's not in discrete bits like text is.
Speaker 1 03:06 - 03:21
And so we actually started building a company that was all around using the same technologies to make sense of audio, not to produce it. And, very happily, pretty early on, we had the right breakthroughs, and we realized, oh, we actually can make music.
Speaker 2 03:21 - 03:24
You're pretty good at math. What'd you get wrong with your back-of-the-napkin math, then?
Speaker 1 03:24 - 03:48
The math was right. We just had some breakthroughs that said, actually, you don't need that amount of compute. You can make the right technological breakthroughs to, if you wanna think about it, basically just compress audio really, really efficiently. And that worked a hell of a lot better than we anticipated. So it was a very nice being-wrong moment. Not all being-wrong moments are so pleasant.
Speaker 1 03:48 - 03:54
And to be clear, at the beginning, the music was terrible, but we still stayed up...
Speaker 2 03:55 - 03:59
He thought it was good. He was one of the first 10 users, I think. He thought it was pretty good.
Speaker 1 04:00 - 04:31
Certainly before we put it on Discord, the music was very terrible. Before we put it on Discord, we could make, like, twelve-and-a-half-second clips that wouldn't always listen to the words you asked them to sing, but we had so much fun doing it, and we thought other people might have fun doing it. And so we kind of took the example of Midjourney, and we said, it's really easy to put a Discord bot out and see: will people enjoy it? And we put it out there, and a hell of a lot of people enjoyed it. And that was a really confirmatory moment for us.
Speaker 1 04:31 - 04:48
And so a lot of people told us not to build a music company. It's not the easiest business to work in. Speech is really big. There are a lot of great business use cases for building speech technologies. But when you are staying up late playing with the thing and you don't wanna go to sleep, it's a really good sign that that is what you are meant to be doing.
Speaker 1 04:48 - 04:50
And so that's what we did.
Speaker 2 04:50 - 04:52
I love that. Are you a musician?
Speaker 1 04:52 - 05:03
I am. I play almost every day. I grew up playing a lot of piano and ended up picking up a bass around age 12 and playing a lot more of that.
Speaker 2 05:04 - 05:06
Okay. So, personal passion point. That's awesome.
Speaker 1 05:07 - 05:19
You know, the revisionist history, which is true, is that we used to have jam sessions at our last company in the basement of one of my cofounders. And it's true. We had a lot of fun there. It's not why we started the company. Again, we thought it would be too hard to do this.
Speaker 1 05:19 - 05:20
It was just fun.
Speaker 2 05:20 - 05:21
Meaning at Kensho?
Speaker 1 05:21 - 05:24
At Kensho. Yes. Where I met the great Harrison Chase.
Speaker 2 05:24 - 05:33
The Kensho mafia is, like, pretty unparalleled. There's Harrison, there's also Daniel Nadler, Sam Whitmore, you. Well, there are a lot of you.
Speaker 1 05:33 - 05:50
There's a lot of us. I just credit Daniel with that, honestly. Daniel is, I think, the best object lesson in what talent density can do for a company. And it was a lot of people with nontraditional backgrounds. It skewed very young, but he was great at finding people and great at convincing them to join.
Speaker 2 05:50 - 06:01
I love that. Okay. So walk us through what happens when somebody types "upbeat nineties hip hop track about a road trip." You get the prompt in. What happens?
Speaker 2 06:01 - 06:06
What is the modern model doing to be able to pass something back to the user that seems quite special?
Speaker 1 06:07 - 06:35
In some way, it's actually pretty simple. With a prompt like that, you have to figure out what the words of this song are, and we use various LLMs to do that, to make the lyrics. Basically, the cue it's taking there is "road trip," and so, like, what should this road trip be about? And it will probably get it wrong because you didn't give us enough information, but that's actually okay, you can iterate on it. And then you said "nineties hip hop," and we try to expand that out into a set of cues that the model can really understand. What is the genre?
Speaker 1 06:35 - 06:46
What is the style of this music? And then you put those things together. You have a lot of lyrics. You have a lot of styles. And we have our models that are trained to take in all of that information and just produce sound.
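The flow described above (prompt, then lyrics, then style cues, then sound) can be sketched roughly as follows. Everything here is hypothetical: the function names, the `SongRequest` type, and the stubbed bodies are illustrative stand-ins, not Suno's actual API or models.

```python
from dataclasses import dataclass


@dataclass
class SongRequest:
    """Hypothetical container for what an audio model would be conditioned on."""
    lyrics: str
    style_cues: list[str]


def write_lyrics(prompt: str) -> str:
    # Stand-in for an LLM call that turns the topic of the prompt into lyrics.
    return f"[placeholder lyrics inspired by: {prompt}]"


def expand_style(prompt: str) -> list[str]:
    # Stand-in for expanding a short style phrase into cues a model can use.
    # This is a toy keyword lookup; a real system would be far richer.
    known = {
        "hip hop": "genre: hip hop",
        "nineties": "era: 1990s",
        "upbeat": "mood: upbeat",
    }
    return [cue for key, cue in known.items() if key in prompt.lower()]


def build_request(prompt: str) -> SongRequest:
    """Combine lyrics and style cues; an audio model would turn this into sound."""
    return SongRequest(lyrics=write_lyrics(prompt), style_cues=expand_style(prompt))


request = build_request("upbeat nineties hip hop track about a road trip")
print(request.style_cues)
print(request.lyrics)
```

The point of the sketch is only the shape of the hand-off: text models prepare the conditioning, and a separate model produces the audio.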
Speaker 1 06:46 - 07:14
The amazing thing here is that the models don't know that there are vocals and instruments. They don't know what kind of instruments there are. Very early on it was actually quite obvious to us that the more musical knowledge we give the model, the more constrained it will be, in a bad way. And so we actually just model everything as sound, and that's what made it so hard, but ultimately that's what makes these things so powerful. So just to be concrete about it, in Western music, there are 12 tones.
Speaker 1 07:14 - 07:38
If you tell the model there are 12 tones, it will only ever produce those 12 tones. You will be forever limited. And if you tell the model there are 200 instruments, those are the only sounds that you'll ever be able to make, and you won't get the next Skrillex using Suno. And so for us, it was all about: let's throw away everything we know about music, and let's try to do this from scratch. And it's like, it's just a sound wave.
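As a small illustration of the 12-tone point: in 12-tone equal temperament, each semitone multiplies frequency by 2^(1/12), so a system restricted to those pitches could never land between them. The sketch below assumes the common A4 = 440 Hz reference and uses a 24-division octave as one example of a microtonal tuning; neither detail comes from the interview.

```python
A4_HZ = 440.0  # ASSUMPTION: standard concert-pitch reference, not from the interview


def pitch_hz(steps_from_a4: int, divisions_per_octave: int = 12) -> float:
    """Frequency of the pitch `steps_from_a4` steps above A4 in an equal temperament."""
    return A4_HZ * 2 ** (steps_from_a4 / divisions_per_octave)


# The 12 pitches of one octave in 12-tone equal temperament, starting at A4.
twelve_tone = [pitch_hz(n) for n in range(12)]

# A quarter tone: one step of a 24-division octave, which falls between A4 and A#4.
quarter_tone = pitch_hz(1, divisions_per_octave=24)

print([round(f, 2) for f in twelve_tone])
print(round(quarter_tone, 2))
```

A model that treats audio as a plain waveform has no such grid built in, which is consistent with the microtonal output described later in the conversation.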
Speaker 1 07:38 - 07:55
It's just sampled at 48,000 times a second, and it is a continuous, you know, float32 number, and let's figure out how to model that. And that was a lot of the early breakthroughs that we had to make. But once we did, now you are only constrained by what you can describe in your imagination.
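For a sense of scale, here is the arithmetic behind modeling raw sound at the sampling rate and sample format mentioned above. The three-minute song length and the single (mono) channel are assumptions chosen for illustration.

```python
SAMPLE_RATE_HZ = 48_000  # samples per second, as stated in the interview
BYTES_PER_SAMPLE = 4     # a float32 value occupies 4 bytes


def raw_samples(seconds: float, channels: int = 1) -> int:
    """Number of individual sample values in a clip of the given length."""
    return int(seconds * SAMPLE_RATE_HZ) * channels


def raw_bytes(seconds: float, channels: int = 1) -> int:
    """Uncompressed size of the clip in bytes when stored as float32."""
    return raw_samples(seconds, channels) * BYTES_PER_SAMPLE


song_seconds = 180  # ASSUMPTION: a three-minute, mono song
print(raw_samples(song_seconds))  # 8640000
print(raw_bytes(song_seconds))    # 34560000
```

Millions of continuous values per song, versus a few hundred words of lyrics, is one way to see why the raw waveform is described as unwieldy compared with text.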
Speaker 2 07:56 - 08:14
That's so cool. Have you found that we've basically just rediscovered the existing genres of music and the 12 notes? Have you, I guess, independently seen just that same behavior emerge when you're trying to learn music from first principles? Or have you seen a different set of capabilities emerge?
Speaker 1 08:15 - 08:25
No. The amazing thing is now we see new things emerge that you never would have thought of. And so most of the time, this looks like blending genres that really have no business going together.
Speaker 2 08:26 - 08:26
Mhmm.
Speaker 1 08:26 - 08:52
And so you'll get, I don't know, trap with a sitar in it, or you'll get country with 808s in it, or whatever it is. And, again, this is really empowering people to do the things that are in their heads, and it's in a way that would not have been possible without a technology like this, or would have been really, really hard. We see microtonal music. It is really inspiring to go and just look at all of the crazy things that people are making. Yeah.
Speaker 1 08:52 - 08:58
A lot of them sound like genres you know, and a ton of them sound, like, totally strange and bizarre and lovely.
Speaker 2 08:59 - 09:06
That's awesome. That's really cool. Are there certain genres that you're finding your model is better at, and certain genres where you're worse?
Speaker 1 09:06 - 09:26
Definitely. I mean, I try not to say good and bad about music, other than, you know, whether it's sampled well, like at the full bit depth or full sampling rate. But to the extent that you can make such generalizations, we're, like, very good at country. We're very good at pop music.
Speaker 1 09:27 - 09:58
And I think the cartoon maybe to have in your head is that there are some genres that are somewhat more formulaic than other genres, and so perhaps we're, like, better at those. But I have some sneaking suspicion that for those, it's as much raising the floor as it is raising the ceiling. And for the things where we're less good, we've not raised the floor, and so we make a lot of bad music. But we have also raised the ceiling. And if you're willing to go for long enough, you'll find amazing stuff.
Speaker 2 09:58 - 10:06
That's so cool. Suno V5 seems like it was a real step change in quality. What goes into one of those step changes?
Speaker 1 10:07 - 10:56
You know, it's really hard to predict when the step changes happen, because it's really nonlinear in the research inputs, and actually it's not even linear in how much our testing says the model is better. And so just as an example, we can measure how much one model is preferred to another model, and you may come up with, you know, it's 10% preferred or 15% preferred. And you can take two different models where one is 10% preferred or 15% preferred, and the uptake on the other end, how much our users actually love it and use it, or how much the product grows when you release it, won't necessarily be all that correlated with what the preference signal is. And it's because music is messy, and there are lots of other things that go into it. But to take a huge step back, we have a pretty aggressive research road map.
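The interview does not say how "10% preferred" is computed. One plausible reading is a pairwise comparison in which listeners pick between outputs of two models; the sketch below summarizes such votes as a win rate and a margin. The function name and the vote data are made up for illustration.

```python
from collections import Counter


def preference_summary(votes: list[str]) -> dict[str, float]:
    """Summarize A/B votes, where each vote is "A" or "B" (ties are not modeled)."""
    counts = Counter(votes)
    total = counts["A"] + counts["B"]
    if total == 0:
        raise ValueError("no votes to summarize")
    return {
        # Share of votes that went to model A.
        "win_rate_a": counts["A"] / total,
        # Lead of A over B, in percentage points of all votes.
        "margin_points": (counts["A"] - counts["B"]) * 100 / total,
    }


votes = ["A"] * 55 + ["B"] * 45  # made-up data: 55 votes for A, 45 for B
summary = preference_summary(votes)
print(summary)
```

As the speaker notes, a number like this need not track how much users end up using a released model, so it is best read as one signal among several.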
Speaker 1 10:56 - 11:22
And in some weird way we're always working on this thing, you know, like we know what v6 and v7 are. At some point, there are lots of things that you want to have your model do, there are lots of improvements that you want to make, and it's almost an arbitrary cutoff of saying, okay, this is the break. This is what we're gonna call v5.5, and everything that comes after is gonna go into the next models. And that's almost just to keep a steady cadence of when we release things.
Speaker 1 11:22 - 11:33
Because what you would hate to have happen is we don't release stuff for, like, two years and we try to make, you know, the music model to save humanity. And that's going to come out in two years, and we're going to do nothing before then.
Speaker 2 11:33 - 11:47
Yeah, totally. How much of each of these improvements do you think is just a function of scale, scaling compute, scaling data, and then getting a lot of human preference data back? How much are you guys doing, I guess, more novel research?
Speaker 1 11:48 - 12:14
Music is really not a scale problem. The models are pretty small for a variety of reasons. And I think people will often incorrectly take what they know from LLM land, where models are giant and scale helps a ton, and apply it to music. And I think the cartoon that I have in my head is that in LLM land, there are all of these benchmarks. And you can quibble about which ones are flawed and which ones are good, but these benchmarks exist.
Speaker 1 12:15 - 12:33
And scale is actually a pretty efficient way to climb up the ladder and just keep doing better and better on the benchmarks. In music, there are no right answers. There are no benchmarks. And so scale is somewhat less helpful in solving it. It's a messier problem in many ways, aligning models to creative human tastes.
Speaker 1 12:33 - 12:36
You and I are not gonna agree on every song. You and I aren't even gonna agree on...
Speaker 2 12:36 - 12:39
I'll just defer to whatever you say. You have systems. Don't have...
Speaker 1 12:39 - 13:08
I mean, I don't think you wanna do that. But the models not being that big actually lets us get you the music quicker, which turns out to be really important for good UX. And so I think a lot of this boils down to research and preference data. We gather preference data that lets us align models to what our users like. A really underappreciated thing is how much this preference data actually lets us do research.
Speaker 1 13:08 - 13:23
Like, without the scale of preference data that we have, we wouldn't even be able to develop the techniques that we are using. And so there are really some virtuous cycles there in how the product itself keeps getting better just by virtue of having people use it.
Speaker 2 13:23 - 13:36
Interesting. And I guess you can use the human preference data in a much stronger way than the text models, because they're all worried about sycophancy, right? And for you, I guess that's much less of a challenge.
Speaker 1 13:36 - 13:46
100%. 100%. And so I think, yeah, there's just a tremendous amount of edge that comes from our ability to understand it, do research on it, and then RL that back into our models.
Speaker 2 13:46 - 14:00
That's awesome. Okay. I want to switch gears a little bit and talk about music as a consumer phenomenon. And you mentioned a consumer creative entertainment platform at the beginning. I want to dig into what that means.
Speaker 2 14:00 - 14:24
Maybe starting with: it seems like music is a cultural, social phenomenon. You know, I like this song. I send it to my friend. It's a scarce resource. We bond over liking that song and, you know, having the mixtapes, listening to it together, etcetera. And to me, music has always just been this shared cultural experience.
Speaker 2 14:24 - 14:29
That's what it is. Do you agree with that? And then if so, what does AI music mean for that?
Speaker 1 14:30 - 14:49
I agree with that very strongly. Music has a very different place in culture than other media in a variety of ways. One is that people's tastes are actually far more developed in music than they are in other media. Like, everybody has taste in music in a way that most people don't have taste in film or literature. And the other thing is that music is actually inherently a much more social medium.
Speaker 1 14:50 - 15:26
If you think about it, going to a concert is an inherently social thing even though you're only really looking at the performers, and it's because of the people around you, in a way that, let's say, going to a movie in a movie theater isn't quite as elevated compared to watching in an empty one. And so I think this is actually largely because humans communicate sonically, through our mouths and ears, and therefore music is a much earlier method of communication than writing. It's much more in our DNA, I think, compared to other things. I'm obviously biased. I obviously love music.
Speaker 1 15:28 - 15:55
I'm not sure. I think people assume that, oh, you're just gonna have, like, AI-powered Spotify, and it's gonna dehumanize it, and music is gonna get terrible. That seems to me to be obviously wrong. I don't think you're gonna make a better Spotify just by powering it with AI. And the thing that's really interesting is actually: how can we not just change but elevate the place of music in culture? And music has this other funny thing that, by virtue of being so ubiquitous, it ends up being in the background a lot.
Speaker 1 15:56 - 16:13
And the thing that's amazing is that AI can be used to actually change that and to augment how music is perceived in society and in culture, augment how it is used socially, because it's actually become less social in the last thirty years. And so that is the corner of the universe that we play in and that we are really excited about.
Speaker 2 16:14 - 16:25
How do you see yourselves today? And I guess when you look at your users, are people more creators of music, or are they more consumers of music, or both?
Speaker 1 16:25 - 16:41
This is the crazy thing about Suno. Before Suno, basically, everybody was a consumer of music. You know, compared to the 8 billion people on the planet, there are very few people who make music, and the rest of us consume it. And that's fine. It tends to cater to passivity.
Speaker 1 16:41 - 17:08
It tends to cater to making it less social and more impersonal. And the crazy thing about Suno is that on any given day, 90% of the users are going to create something. And the thing that's hard to wrap your head around is that, by and large, you're not creating it to go bring it elsewhere, to do something with it. People are creating music for the fun and enjoyment and fulfillment that comes with being creative. And so that, the creation, is actually the entertaining bit, and that is the big step change.
Speaker 1 17:08 - 17:35
It's like, everybody in the world is creative. Being creative makes you feel a certain way. This is, like, in our DNA, and we are basically using technology to allow everybody to feel those warm and fuzzy feelings. A lot of the inspiration for me personally for doing this comes from just remembering that some of the fondest memories that I've ever had are making music with my friends, not even performing in bands. Like, practice was so much fun, and you get really close to people making music.
Speaker 1 17:36 - 17:45
And it's because it feels really good to be productive, in a way that doomscrolling your favorite app for an hour does not when you're done.
Speaker 2 17:46 - 17:50
I was an orchestra kid, so I was not doing nearly as cool music as you, but I totally get it.
Speaker 1 17:50 - 17:51
What did you play?
Speaker 2 17:51 - 17:52
Violin.
Speaker 1 17:52 - 17:53
Do you still?
Speaker 2 17:53 - 18:05
Yeah, no, no way. Excellent. I have perfect pitch, and let's just say I'm definitely not playing 12 tones now, so my ears bleed when I play. But I totally agree with you.
Speaker 2 18:05 - 18:16
Okay. So it's like a self-expression slash active entertainment platform, with some parallels to gaming, and some parallels to even, like, Claude Code. Right?
Speaker 1 18:16 - 18:36
Absolutely. So I think the thing that's amazing about making music is that you feel good and fulfilled and you enjoy making it, and then you listen to it. And there are parallels, and so that's what we call creative entertainment. The entertaining part is being creative. It's not that you are being creative for the sake of bringing the piece of content somewhere else.
Speaker 1 18:37 - 19:16
I think you see that in cooking. People like to cook even though they can get a better meal at a restaurant, and it's because it is fun to cook and it is fun to consume what you make. And I think a lot of what makes Claude Code or any of the other platforms so special is that it's fun to build things, and it's fun to use what you build. And even though most of the things that I build are definitely not meant to be hosted in AWS and used by millions of people, I actually enjoy the act of building, and I enjoy the act of using the thing that I built. And so I predict that in ten or twenty years there will be way more of these creative entertainment things all over the place, and it's because that's actually finally possible.
Speaker 1 19:16 - 19:24
Like, that is the thing that AI unlocks. It unlocks lots of intelligence things too, but it actually lets everybody be creative in almost any domain.
Speaker 2 19:24 - 19:28
Yeah. I'm guessing you have an opinion on this. What do you think of the word "slop"?
Speaker 119:30 - 19:51
I do have an opinion. I mean, my answer is usually that it's thrown around without any meaning, and I don't know what people mean by that. I made two songs with my five-year-old yesterday. Is that slop? In the sense that 99.999% of the planet has no interest in hearing that?
Speaker 119:30 - 19:51
我确实有看法。其实,我通常的回答是:这个词经常被随手乱用,却没有什么明确含义,我也不知道大家说这个词时到底指的是什么。昨天我和我五岁的孩子做了两首歌。那算 slop 吗?如果从“地球上 99.999% 的人都不会想听”这个意义上说的话?
Speaker 119:51 - 20:11
Sure. But that's really meaningful to me, and so if you call that slop, I'm not sure I care. It's an interesting question, though. Right? Like, this has happened before, at least in music, where when way more people start to be able to produce something, people get afraid that it's just going to flood all of our ears and all of the platforms with more content.
Speaker 119:51 - 20:11
当然算。但那对我来说真的很有意义,所以如果你把那叫作 slop,我也不确定我是否在乎。不过,这的确是个有意思的问题。对吧?这种事以前也发生过,至少在音乐领域是这样:当越来越多的人开始能够生产某种内容时,人们就会担心这些内容会淹没我们的耳朵,也淹没所有平台。
Speaker 120:12 - 20:40
And this happened when people started to be able to make music on their laptops. You had a lot of 13-year-olds making beats in their bedrooms, and you fast forward to today, that seems like obviously a good thing. Yeah, there's way more music. It means that there's way more quote-unquote bad music, but it also means that there's way more great music, and there's new kinds of music that get made, and there's new kinds of stars that get made. And I see no reason why way more people making music again would be any different from that.
Speaker 120:12 - 20:40
当人们开始能在自己的 laptop 上制作音乐时,就发生过这种情况。你会看到很多 13 岁的孩子在卧室里做 beats,而快进到今天,这显然看起来是件好事。没错,音乐变多了。这意味着所谓“糟糕的音乐”也更多了,但也意味着优秀的音乐更多了,还会出现新的音乐类型,也会诞生新类型的明星。而且,我看不出为什么这一次有更多人制作音乐,会和那时有什么不同。
Speaker 220:40 - 20:57
I love that. So we talked about the floor, the slop floor or the non-slop floor. What about the ceiling? Like, tell us a little bit about the most incredible things people have been able to do with Suno. And I think you guys have had some chart-topping hits now.
Speaker 220:40 - 20:57
我喜欢这个说法。所以我们刚才谈了下限,也就是 slop 的下限,或者非 slop 的下限。那上限呢?能不能跟我们讲讲,人们用 Suno 做出的那些最不可思议的东西?而且我想你们现在应该已经有一些登上排行榜榜首的热门作品了。
Speaker 220:57 - 20:58
Maybe talk a little bit about that.
Speaker 220:57 - 20:58
也许可以稍微讲讲这个。
Speaker 120:58 - 21:33
We have had some chart-topping hits. We've had people sign record deals. We've had people make single songs that chart, and that's amazing. And I think about that as a new creator coming with a new perspective that resonates very strongly with people, and so that is obviously the ceiling going up. My favorite example is Xania Monet, which is the stage name of a poet who took all of the beautiful poetry that she had been writing for a decade and started to make music out of it, and found an entirely new voice and an entirely new audience to resonate with her art.
Speaker 120:58 - 21:33
我们已经有一些登上排行榜榜首的热门作品。也有人签下了唱片合约。还有人做出了单曲并打进排行榜,这非常了不起。我会把这看作是:一个带着全新视角的新 creator(创作者)出现了,而这种视角与人们产生了非常强烈的共鸣,所以这显然意味着上限正在提高。我最喜欢的例子是 Xania Monet,这其实是一位诗人的 stage name(艺名);她把自己十年来写下的那些优美诗作拿出来,开始把它们做成音乐,并由此找到了一个全新的声音,以及一个能与她艺术产生共鸣的全新受众。
Speaker 121:33 - 21:50
And, like, yeah, I think this is fantastic. This is people connecting. Right? This is, like, the most personal thing in the world, and when you go listen to the music, you'll realize it's extremely personal. Like, the best music will always require human guidance, and it's because, again, music has no right answer.
Speaker 121:33 - 21:50
而且,没错,我觉得这太棒了。这是人与人之间的连接,对吧?这是世界上最个人化的东西之一;当你去听这些音乐时,你会意识到它极其私人化。最好的音乐始终都需要人的引导,因为说到底,音乐并没有唯一正确答案。
Speaker 121:51 - 22:32
You like a piece of music because of how it sounds and because of the messenger who delivers it. And we will find new messengers with new sounds, and we already are. And to me, it's obviously the ceiling going up there. The other thing that's really cool is, I just know that there are tons of charting tracks that have little bits of Suno in them. They're not entirely Suno, and it's because for the professionals it's also just an amazing tool to use as part of your workflow; it's not your whole workflow. And so there's this weird thing where I think people incorrectly say it's either all AI or non-AI, and the vast majority of music will have some AI in it, just like the vast majority of music today is auto-tuned or is digitally produced.
Speaker 121:51 - 22:32
你喜欢一段音乐,是因为它听起来的样子,也因为传递它的那个 messenger(表达者、传递者)。而我们会找到拥有新声音的新 messenger,我们其实已经在这样做了。对我来说,这显然也意味着上限正在提高。另一件非常酷的事是:即便如此,我也知道有大量上榜歌曲里都带有一点 Suno 的成分;它们并不完全是 Suno 生成的。这是因为对专业人士来说,Suno 也只是工作流中的一个极佳工具,而不是你的全部工作流。所以这里有一种很奇怪的情况:我认为人们错误地把它说成非此即彼,好像不是全 AI(人工智能)就是完全非 AI。但事实上,绝大多数音乐里都会有一些 AI,就像今天绝大多数音乐都会经过 auto-tune 处理,或者本来就是数字化制作的一样。
Speaker 122:32 - 22:42
And again, more tools let you push music forward faster, let you find new sounds faster. To me, this is like obviously the ceiling going higher.
Speaker 122:32 - 22:42
再说一次,更多工具能让你更快地推动音乐向前发展,更快地找到新的声音。在我看来,这显然就是上限在变得更高。
Speaker 222:42 - 23:08
That's amazing. Okay. So you chose to go after music, which is like probably the one industry that a lawyer would tell you not to go after because the minute you're there, you have pitchforks coming after you. I think you just had a pretty landmark settlement partnership with Warner. Can you tell us more about that and what you think that means for the future of collaboration with the existing professional music industry?
Speaker 222:42 - 23:08
太厉害了。好,那么你选择了音乐这个方向,而这大概是 lawyer(律师)会告诉你最不该进入的行业——因为你一进去,立刻就会有人举着 pitchforks(草叉,意指群起而攻之)冲向你。我想你们最近和 Warner 达成了一项相当具有里程碑意义的 settlement partnership(和解合作)。你能多讲讲这件事吗?以及你认为这对未来与现有专业音乐行业的协作意味着什么?
Speaker 123:08 - 23:29
Absolutely. Just to back up, I think people incorrectly assume that we hate the existing music industry, and especially that we hate the record labels. People also expect me to say, like, oh, the record labels are cooked. I think that's obviously wrong. They're some of the most culturally important institutions in the world.
Speaker 123:08 - 23:29
当然可以。我想先稍微往回说一点:我觉得人们错误地以为我们讨厌现有的音乐行业,尤其讨厌 record labels(唱片公司)。人们也期待我会说什么“哦,唱片公司完了”。但我觉得这显然是错的。它们是世界上最具文化重要性的机构之一。
Speaker 123:29 - 24:01
They understand music, and they understand music culture. They cultivate and grow stars that resonate with billions of people. And the way I see it, it would be a real shame if there were two worlds of music, if there were like an AI world of music and a non-AI world of music. One is it makes no sense because most music will have some AI in it. But the other is it's just bad for the end user to think about having to separate these things in your head and to have to go to different platforms to have effectively similar usage patterns or interactions.
Speaker 123:29 - 24:01
他们懂音乐,也懂音乐文化。他们培养并塑造明星,而这些明星能够与数十亿人产生共鸣。在我看来,如果音乐世界分裂成两个世界,那会非常可惜:一个是 AI 音乐世界,一个是非 AI 音乐世界。首先,这本身就说不通,因为大多数音乐里都会有一些 AI。其次,让最终用户在脑子里把这些东西分开、还得去不同平台上完成本质相似的使用模式或互动,这本身就不是一件好事。
Speaker 124:01 - 24:27
And so what I'm most excited about doing with Warner is actually building things together that could never have existed before and building products that let fans interact with their favorite artist and really deepen the artist-fan connection in ways that are just positive-sum for everyone. It's great for the artist. They get to engage with their fans. It's great for the fans. They get to feel like they are engaging with their favorite artists through music. It's great for the rights holders.
Speaker 124:01 - 24:27
所以,我对和 Warner 一起做的事情最兴奋的一点,其实是共同打造那些以前根本不可能存在的东西,打造能让粉丝与自己最喜欢的 artist(艺人)互动的产品,并真正以对所有人都是 positive-sum(正和)的方式,加深 artist 与 fan(粉丝)之间的连接。这对 artist 很棒,他们能和粉丝互动;这对粉丝也很棒,他们会感觉自己能够通过音乐与喜爱的 artist 互动;这对 rights holders(权利持有人)也很棒。
Speaker 124:27 - 24:59
Obviously this is like a heavily monetizable thing, and it's something that literally could not have existed up until like approximately right now. And my sincere hope is that going forward we find way more of these opportunities of things to build together that couldn't have existed until today. And just to say it out loud, like the digital musical experience has basically not changed for twenty five years. We've just been streaming music for twenty five years, and I think music is, like, due for a new innovation and a new format. And so that's what we're here to do.
Speaker 124:27 - 24:59
很显然,这是一件非常容易 monetizable(变现)的事,而且它确实是在大约就是当下这个时间点之前根本不可能存在的东西。我的真诚希望是,接下来我们能找到更多这类可以一起打造、而且直到今天才有可能存在的机会。也直接把话说出来吧,数字音乐体验在过去二十五年里基本上没怎么变过。我们这二十五年基本上一直都只是在 streaming music(流媒体听音乐),而我觉得,音乐早就该迎来一次新的创新和一种新的 format(形式)了。所以,这就是我们来到这里要做的事。
Speaker 224:59 - 25:01
When are we gonna see Suno at Coachella?
Speaker 224:59 - 25:01
我们什么时候能在 Coachella 看到 Suno?
Speaker 125:02 - 25:07
You probably have already. It's, like, it's probably in a lot of the music. It's probably in a lot of the backing I'm sure some
Speaker 125:02 - 25:07
你可能其实已经看到了。它很可能已经出现在很多音乐里了,我觉得它大概也已经出现在很多 backing(伴奏)里了,肯定有一些——
Speaker 225:07 - 25:12
But I mean, like, main stage, you know, like a consumer consumer participation thing.
Speaker 225:07 - 25:12
但我的意思是,比如说,main stage(主舞台)那种,你知道的,真正面向 consumer(消费者)的、consumer participation(消费者参与)式的东西。
Speaker 125:13 - 25:51
I hope that at some point in the next year, we see a truly interactive concert where the audience is actually able to participate and make music with that artist. One of the coolest parts of my job, if I go and demo Suno to an audience of hundreds or even a thousand people, is making a song with that many people all at once. And it's a very special moment. It's almost religious, you know, like a lot of religions will do chanting and singing in large groups, and why does that have to be confined only to a religious context? Why can't that happen at Coachella where people are already so excited about being together at a festival?
Speaker 125:13 - 25:51
我希望在未来一年内的某个时候,我们能看到一场真正 interactive(互动式)的演唱会,观众实际上能够参与进来,并和那个 artist(艺术家)一起创作音乐。我这份工作里最酷的部分之一,就是如果我去给几百人、甚至上千人的 audience(观众)现场 demo(演示)Suno,就是和这么多人同时一起做一首歌。那是一个非常特别的时刻。它几乎有点像宗教体验,你知道吗?很多宗教都会让大群体一起 chant(吟唱)和 singing(歌唱),那为什么这种体验非得只局限在宗教语境里呢?为什么它不能发生在 Coachella 呢?在那里,人们本来就已经因为共同参加一个 festival(音乐节)而如此兴奋。
Speaker 125:51 - 25:54
And so my sincere hope is that happens in the next twelve months.
Speaker 125:51 - 25:54
所以,我真心希望这件事会在未来十二个月内发生。
Speaker 225:54 - 26:17
I love it. Okay, we talked a lot about the model layer and then the, I guess, the cultural experience of making music. I'd love to talk about the application layer product building, because I think that's also an area where you guys have been really, really innovative. What's your approach to how to think about building in the application layer?
Speaker 225:54 - 26:17
我喜欢这个想法。好,我们刚才谈了很多 model layer(模型层),然后还有我想可以称之为音乐创作的 cultural experience(文化体验)。我也很想聊聊 application layer(应用层)的产品构建,因为我觉得这也是你们一直非常、非常有创新性的一个领域。你们对于如何思考 application layer 的构建,采取的是什么方法?
Speaker 126:17 - 27:07
I guess a lot to say here. The first thing is actually there really isn't enough innovation for consumers right now, but the average consumer is not willing to put up with rough edges and it's because you're not using this for work, you're using this for fun, you're probably paying for it and not your boss is paying for it or your company is paying for it. And so there just needs to be a bigger emphasis on the actual experience that we deliver to people. Also just if we're being honest, like it's unclear how much moat exists in only a model, and it's like I'll just say it like Google has started to build music models, and while ours are way better today, they're Google and they'll outspend us seven days a week, and they can probably catch up on the model side. And so I think it's just really undervalued to invest in the product and the UI and the UX to make sure that you're constantly delighting people.
Speaker 126:17 - 27:07
我觉得这里有很多可说的。第一点其实是,现在面向消费者的创新真的还不够,但 average consumer(普通消费者)是不愿意容忍那些 rough edges(粗糙、不完善之处)的,因为你不是为了工作在用这个,你是为了好玩;而且很可能是你自己在付钱,不是你老板在付,也不是你公司在付。所以,我们必须把更大的重点放在我们实际交付给用户的体验上。还有,如果我们坦率一点说,单靠 model(模型)本身到底能形成多大的 moat(护城河),其实并不明确。话我就直说了:Google 已经开始做音乐模型了,虽然我们今天的模型要好得多,但他们是 Google,他们一周七天都能比我们花更多钱,而且他们很可能会在模型这一侧追上来。所以我认为,投入到 product(产品)、UI(用户界面)和 UX(用户体验)上,确保你能持续不断地让用户感到惊喜,这件事其实被严重低估了。
Speaker 127:09 - 27:36
You know, one of our company values is actually, like, we're just a music company, and in many ways I don't think of us as a technology company. And this is to make sure that we're not building technology for the sake of building technology; we're building technology for the sake of delighting people. And infusing that in the culture is actually really helpful in getting people to realize what the whole point of the company is. And so that manifests itself in lots of little ways, but from a product building strategy, that's what it is.
Speaker 127:09 - 27:36
你知道,我们公司的一个价值观其实很像是:我们就只是一家音乐公司,而且在很多方面,我并不把自己看作一家 technology 公司。这样做是为了确保我们不是为了做 technology 而做 technology,我们做 technology 是为了取悦人们。把这一点注入公司文化,实际上非常有助于让大家意识到这家公司存在的根本意义。所以这一点会体现在很多细小之处,但如果从产品构建策略来说,核心就是这个。
Speaker 227:37 - 27:46
That's awesome. What are some of the, I guess, consumer product decisions you've made that you're most proud of or that were the most contrarian?
Speaker 227:37 - 27:46
太棒了。那我想问,你们做过哪些 consumer product(消费级产品)决策,是你最自豪的,或者说最反常识的?
Speaker 127:46 - 28:08
A bunch. One that I got wrong was getting off Discord very quickly. I thought we would be on Discord for a while. We got off Discord in 2023, and we released a pretty thin, not full-featured web app, and it took five days for 90% of the traffic to move to the web. So it's just an overwhelming signal that I had gotten that wrong.
Speaker 127:46 - 28:08
有很多。一个我判断错了的决定,是很快离开 Discord。我原以为我们会在 Discord 上待一段时间。我们在 2023 年离开了 Discord,发布了一个功能还比较薄、并不完整的 web app,结果只用了五天,90% 的流量就迁移到了 web。这就像一个压倒性的信号,说明我当时确实判断错了。
Speaker 128:09 - 28:38
Maybe the biggest one and the most contrarian one is, at the time a lot of people were experimenting with music. Let me give two, actually. One was to focus on songs and not just background music, to focus on lyrical music. And it's because a song is a story and captivates you in a way that vocal-less background music just won't. It was also just way harder, and so nobody was really able to do that at the time, and so figuring that out was certainly a source of moat.
Speaker 128:09 - 28:38
也许最大、也最反常识的一点是,在当时很多人都在尝试做音乐。其实我给你两个例子。一个是专注于 songs,而不只是 background music,专注于带歌词的音乐。这是因为 song 是一个故事,它吸引你的方式是没有人声的 background music 做不到的。而且这件事本身也难得多,所以当时基本没人真正能做到;我们把这个问题解决掉之后,这当然也成了我们的 moat(护城河)来源之一。
Speaker 128:38 - 29:17
But in hindsight, it's not just that we were able to do something hard, it's that the human voice touches people in a certain way and just makes the product way more delightful than just making background music for fun. And then the other is also in the same direction. We decided to make full songs. And so again, a song is a story, you know, it's maybe on average three or three and a half minutes, and we optimized for that even though originally most technologies just let you make something like ten or twelve seconds of music at the expense of sound quality. And for the longest time, our audio was really not crisp, and every single one of our competitors had just way crisper audio, and everybody could hear one second of a Suno song and know, oh, that sounds like crap.
Speaker 128:38 - 29:17
但事后看,不只是因为我们做成了一件困难的事,更是因为 human voice 会以某种特殊方式打动人,这让产品比起只是做着玩的 background music 要令人愉悦得多。另一个决定也在同一个方向上。我们决定做完整的 songs。所以再说一次,song 是一个故事,通常平均大概三分钟到三分半钟,我们就是围绕这个目标来优化的,尽管最初大多数技术只能让你生成大约十秒或十二秒的音乐,而且还要牺牲音质。在很长一段时间里,我们的音频其实一点都不 crisp,而我们的每一个竞争对手的音频都要清晰得多,大家只要听一秒 Suno song,就会知道:哦,这听起来真糟。
Speaker 129:17 - 29:50
That's a Suno song. And to just go all in on that and say, like, okay, we're going to make full songs, and, yes, they're not gonna sound amazing, but they are still gonna tell the story instead of making perfectly sounding audio that just is like background music. And so the choice of technologies there was to use autoregression instead of diffusion, but it was really kind of product driven and to say, like, it's not just that we like autoregression because we have, like, emotional attachment to that technology. It's because we think that making a song and telling a story is more important than making crisp audio.
Speaker 129:17 - 29:50
这就是一首 Suno song。我们就是要在这件事上 all in(全力投入),并且说,好,我们要做完整的 songs,没错,它们听起来不会特别惊艳,但它们依然能够讲故事,而不是去做那种听起来很完美、却只是 background music 的音频。所以这里的技术选择是用 autoregression(自回归)而不是 diffusion(扩散),但这其实首先是 product 驱动的。也就是说,并不是因为我们对 autoregression 这种技术有什么情感上的偏爱,而是因为我们认为,做出一首 song、讲出一个故事,比做出 crisp audio 更重要。
Speaker 229:50 - 30:00
That's so cool. What's ahead for Suno? 300,000,000 in revenue run rate. I mean, you've made it extraordinarily far. What's ahead?
Speaker 229:50 - 30:00
太酷了。那 Suno 接下来会做什么?300,000,000 的 revenue run rate(收入年化运行率)。我是说,你们已经走得非常远了。接下来呢?
Speaker 130:01 - 30:08
A lot. I think it's really early. Most people don't even know about us. The product is still very crude. There's a lot of room to run.
Speaker 130:01 - 30:08
很多。我觉得现在还非常早,大多数人甚至都还不知道我们。产品也依然很粗糙。还有非常大的增长空间。
Speaker 130:08 - 30:32
I think you'll see us do a couple of things. One is try to increasingly make it a more social interaction. So, like, music is meant to be social, and so you are meant to be sharing music more with people, but also creating more with people. And that can be both synchronous and asynchronous. And so perhaps one day, I'm going to share you not even a song, but a template for a song that you are gonna explicitly riff on and send back to me, and that is you and I kind of co creating.
Speaker 130:08 - 30:32
我觉得你会看到我们做几件事。其一,是努力让它越来越成为一种更具社交性的互动。比如说,音乐本来就应该是社交的,所以人们本就应该更多地彼此分享音乐,但也应该更多地一起创作。而这种共创既可以是 synchronous(同步)的,也可以是 asynchronous(异步)的。所以也许有一天,我分享给你的甚至不是一首歌,而是一个歌曲模板,你会基于它明确地 riff(即兴发挥)再发回给我,而这某种意义上就是你我在共同创作。
Speaker 130:32 - 31:04
Maybe you're gonna do that with your favorite artist with some of their old music that never got released, whatever that may be. And I think you will see us go much more in the direction of letting people express themselves in the music. And so, like, the last big feature we've released is the ability to use your own voice. When you hear yourself in the song, you get so much more attached to it. But, actually, even more so is when I send you a song and you can hear me in it, that song will resonate much more than some nondescript voice, even if that nondescript voice is very good.
Speaker 130:32 - 31:04
也许你还会和你最喜欢的 artist(艺术家)一起这么做,拿他们一些从未发布的旧作品来玩,不管具体是什么形式。我想,你会看到我们更大幅度地朝着让人们在音乐中表达自我的方向前进。比如,我们最近发布的一个重要功能,就是让你能够使用自己的声音。当你在歌曲里听到自己时,你会对它产生强得多的情感连接。但其实更进一步的是,当我发给你一首歌,而你能在里面听到我的声音时,那首歌带来的共鸣会比某个不具名的声音强得多,哪怕那个不具名的声音本身非常好。
Speaker 131:04 - 31:17
And it's because the human ear is highly attuned to voices. We kind of evolved that way. And so both of those, being more social and letting people pour themselves into the music, will be a huge focus for us for the next twelve months.
Speaker 131:04 - 31:17
这是因为人耳对人声极其敏感。我们大概就是这样进化而来的。所以,这两件事,即变得更具社交性,以及让人们把自我更多地注入音乐,会是我们未来十二个月的重点方向。
Speaker 231:17 - 31:19
I love that. Love music videos.
Speaker 231:17 - 31:19
我太喜欢这个了。我很喜欢 music videos。
Speaker 131:20 - 31:27
I love music videos. I don't see enough music videos getting made. I grew up watching music videos, like, on MTV.
Speaker 131:20 - 31:27
我喜欢 music videos。我觉得现在被制作出来的 music videos 还远远不够多。我从小就是在 MTV 上看 music videos 长大的。
Speaker 231:27 - 31:28
Me too.
Speaker 231:27 - 31:28
我也是。
Speaker 131:28 - 31:56
And there's just a huge difference between a music video, which is heightening the song and telling the story, versus background music to put behind whatever YouTube content that I may make. And I love the former, and I'm, like, much less into the latter. And it's because what we would like to do is pull people into music more than they are now and not just have music be forever a background thing. There's actually a video product in beta in Suno right now, and people really love it.
Speaker 131:28 - 31:56
而且,music video 这种会强化歌曲、讲述故事的形式,和那种只是给我做的某个 YouTube 内容垫在后面的背景音乐之间,差别实在太大了。我非常喜欢前者,而对后者就没那么感兴趣。因为我们真正想做的是,把人们更深地拉回到音乐里,而不是让音乐永远只是背景。现在 Suno 里其实已经有一个处于 beta(测试)阶段的视频产品了,大家真的很喜欢它。
Speaker 231:57 - 32:07
Nice. That's really cool. I can't wait. Why do you think there's so few consumer founders in AI right now? Like, what's up with that?
Speaker 231:57 - 32:07
不错,真的很酷。我已经等不及了。你为什么觉得现在 AI 领域里做 consumer(面向消费者)产品的 founder(创始人)这么少?到底是怎么回事?
Speaker 232:07 - 32:18
Everyone's building for the enterprise. Like, OpenAI just shut down Sora, which to me was, you know, I mean, I understand the reasons, but why do you think there's so few people building in consumer right now?
Speaker 232:07 - 32:18
大家都在做 enterprise(企业市场)。比如,OpenAI 刚刚关掉了 Sora,这对我来说,怎么说呢,我理解其中的原因,但你觉得为什么现在做 consumer(消费端)的人这么少?
Speaker 132:19 - 32:38
I mean, I should ask you that. You're the professional investor. I mean, my theory is like, it's just harder and there are a lot of obvious business problems to solve. And I'm, you know, like I'm happy to have less competition, honestly. Why do you think it is?
Speaker 132:19 - 32:38
我的意思是,这个问题我应该问你。你才是专业投资人。我的理论是,这条路就是更难,而且有很多很明显的商业问题要解决。而且说实话,你知道的,竞争少一点我还挺乐见其成的。你觉得原因是什么?
Speaker 232:38 - 33:11
I think it's very clear to see how AI is going to automate a lot of existing business processes. I think it takes real creativity to dream about how AI can seep into the way that we actually play and create. I think it takes real creativity to see that. And most people, when they think AI music, probably think AI Spotify, which just sounds terrible. And I think it takes a lot of creativity to do what you're doing.
Speaker 232:38 - 33:11
我觉得很容易看清 AI 会如何自动化大量现有的业务流程。但要设想 AI 会如何渗透进我们真正玩乐和创作的方式里,就需要真正的创造力。我觉得看见这一点需要很强的创造力。大多数人一想到 AI music(AI 音乐),可能想到的都是 AI Spotify,而那听起来就很糟。我觉得你们现在在做的事情,需要非常多的创造力。
Speaker 133:11 - 33:34
Well, thank you. Yeah, I think, like, we are much more inspired and motivated by doing something that wasn't possible until today instead of automating or speeding up something that already exists. Again, not that there isn't a lot of business value in automating and speeding up something that exists. In some sense, it is just more fun to do something that could never have been done before.
Speaker 133:11 - 33:34
嗯,谢谢。对,我觉得,相比自动化或加速某种已经存在的东西,我们更容易从去做某种直到今天之前都不可能做到的事情中获得启发和动力。倒不是说自动化和加速已有事物没有商业价值——那里面当然有很多商业价值。从某种意义上说,去做以前根本做不到的事情,就是更有趣。
Speaker 233:34 - 33:39
Yeah. And, like, what are we gonna do with all our time after all the robots are doing all our work? We're
Speaker 233:34 - 33:39
是啊。而且,等到机器人把我们的工作都做完之后,我们到底要拿自己的时间做什么呢?我们
Speaker 133:40 - 33:44
You're not going to want to doom scroll for now. We're going to want to be productive and fulfilled.
Speaker 133:40 - 33:44
到时候你肯定不会想继续 doom scroll(无休止刷负面信息)了。我们会想要变得有生产力,也更有满足感。
Speaker 233:44 - 34:12
Yeah, exactly. Awesome. Mikey, thank you for sharing. Everything about your journey to Suno has been so cool. And to see you at the helm of a music company and an active entertainment platform and just defining what the creator layer means in the world of AI, it's been extraordinary to watch your journey since the original days of Harrison and your Discord. And so kudos on what you've done.
Speaker 233:44 - 34:12
对,完全没错。太棒了。Mikey,感谢你的分享。你走到 Suno 这一路上的一切,真的非常酷。看到你掌舵一家音乐公司、一个主动式娱乐平台,并且不断定义在 AI 世界里 creator layer(创作者层)到底意味着什么,自从 Harrison 和你的 Discord 最早那段时间起,一路看着你的历程,真的非常了不起。所以,向你所取得的一切成就致敬。
Speaker 234:12 - 34:14
Big admirer of you and Suno.
Speaker 234:12 - 34:14
我非常欣赏你和 Suno。
Speaker 134:14 - 34:16
Thank you so much. This was a lot of fun.
Speaker 134:14 - 34:16
非常感谢你。这次过程非常有趣。
原文 ↗https://www.youtube.com/watch?v=Jq3BIGz4vXQ