Andrej Karpathy

I like training large deep neural nets.

July 22, 2026 · 1 post

One pattern I find useful for working with LLMs is a nice long ramble session. Sometimes the LLM needs more bits to understand what you're trying to achieve, but you're too lazy to type them. In these cases I like to lean back, switch to /voice and just ramble for like 10 minutes, total mess, anything goes, full stream of consciousness. Sometimes I declare it up top, something like "switching to speech recognition sorry for any typos...". Sometimes I turn it into a small interview of a few turns. But I find that the LLMs are somehow very good at reconstructing long incoherent rambles and often their echo of your own tangle of thoughts comes out quite a bit cleaner than what you started with. The result is that you improve the mind meld and have to correct things less from that point on.

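The ramble workflow described above can be sketched as a small prompt wrapper. This is only an illustration under stated assumptions: the transcript is assumed to come from some dictation tool, and `RAMBLE_PREFACE` and `build_ramble_prompt` are hypothetical names, not part of any library or of the original post.

```python
# Hypothetical preface, modeled on the "declare it up top" habit in the post.
RAMBLE_PREFACE = (
    "Switching to speech recognition, sorry for any typos. "
    "What follows is an unedited stream of consciousness."
)


def build_ramble_prompt(transcript: str, interview: bool = False) -> str:
    """Wrap a raw voice transcript so the model knows it is a messy ramble."""
    # Collapse the stray whitespace and line breaks that dictation tends to leave.
    cleaned = " ".join(transcript.split())
    request = (
        "Before doing anything else, restate in your own words what I am "
        "trying to achieve, as a short organized summary."
    )
    if interview:
        # Optional "small interview of a few turns" variant.
        request += " Then ask me up to three clarifying questions."
    return f"{RAMBLE_PREFACE}\n\n{cleaned}\n\n{request}"


prompt = build_ramble_prompt(
    "so um   the thing I want\n is a dashboard ...", interview=True
)
print(prompt)
```

Asking for the restatement first is one way to get the cleaner "echo" of your own thoughts before the model starts acting on them.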

♥ 34.4K · ↻ 2.8K · 💬 1.8K · 7/21 16:53 · x.com

June 13, 2026 · 1 post

In awe of SpaceX and its story - past, present and the future. You can think about it in 10+ different ways and continue re-blowing your mind in circles. Huge congrats to the team! 🚀

♥ 17.0K · ↻ 791 · 💬 257 · 6/12 17:45 · x.com

June 10, 2026 · 1 post

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin, but I'll add that *qualitatively* too, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

♥ 20.4K · ↻ 1.9K · 💬 930 · 6/9 18:10 · x.com

May 20, 2026 · 1 post

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

♥ 131.9K · ↻ 10.0K · 💬 7.2K · 5/19 15:05 · x.com

May 12, 2026 · 1 post

This works really well btw: at the end of your query, ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a third of our brains is a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage of this:

1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral ...

There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR: The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.

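The HTML tip above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive workflow: no model API is called, the `reply` string is a placeholder standing in for a real response, and the helper names (`with_html_request`, `save_html`, `view_in_browser`) are hypothetical. Only the standard-library `pathlib`, `tempfile` and `webbrowser` modules are used.

```python
import tempfile
import webbrowser
from pathlib import Path

# The instruction quoted in the post, appended to the end of the query.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Append the HTML formatting request to the end of a query."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_html(html: str, path: Path) -> Path:
    """Write the model's HTML reply to disk and return the absolute path."""
    path.write_text(html, encoding="utf-8")
    return path.resolve()


def view_in_browser(path: Path) -> bool:
    """Open a saved file in the default browser via a file:// URI."""
    return webbrowser.open(path.resolve().as_uri())


query = with_html_request("Explain how binary search works.")
# Placeholder reply (assumption): in practice this would come from your LLM.
reply = "<!doctype html><html><body><h1>Binary search</h1></body></html>"
out_file = save_html(reply, Path(tempfile.gettempdir()) / "llm_reply.html")
print(out_file)
# view_in_browser(out_file)  # uncomment to open the file locally
```

Keeping the save and the open steps separate makes it easy to archive replies without launching a browser each time.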

♥ 13.0K · ↻ 1.3K · 💬 662 · 5/11 16:20 · x.com

May 1, 2026 · 2 posts

This is the quote I've been citing a lot recently.

♥ 31.4K · ↻ 2.8K · 💬 488 · 4/30 17:43 · x.com

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:

The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:

1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing.
2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.

I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3).

The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...

Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.

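The "install .md instead of install .sh" idea from the talk can be illustrated with a rough sketch: the installation steps live as plain prose, and a small helper pairs them with facts about the local machine before handing both to an LLM. Everything named here is an assumption for illustration: `INSTALL_MD`, `exampletool`, `describe_environment` and `build_install_prompt` are hypothetical, and no agent or model is actually invoked.

```python
import platform
import sys


def describe_environment() -> str:
    """Collect a few facts an LLM could use to adapt generic instructions."""
    return (
        f"OS: {platform.system()} {platform.release()}\n"
        f"Architecture: {platform.machine()}\n"
        f"Python: {sys.version_info.major}.{sys.version_info.minor}"
    )


def build_install_prompt(instructions_md: str) -> str:
    """Combine prose install instructions with local environment details."""
    return (
        "Follow the installation instructions below, adapting each step to "
        "my environment and explaining any command before running it.\n\n"
        f"## Environment\n{describe_environment()}\n\n"
        f"## Instructions\n{instructions_md.strip()}"
    )


# Hypothetical install.md content, inlined so the example is self-contained.
INSTALL_MD = """\
# Installing exampletool
1. Make sure a recent Python is available.
2. Create a virtual environment and install the package into it.
3. Run the tool's version command to confirm it works.
"""

prompt = build_install_prompt(INSTALL_MD)
print(prompt)
```

The point of the design is that the prose stays generic while the environment section varies per machine, which is the part a shell script would otherwise have to branch on.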

♥ 3.4K · ↻ 411 · 💬 144 · 4/30 17:28 · x.com