🐦 X · Feed · Matt Turck @mattturck · May 19, 2026 · 105 words · about 1 min

Matt Turck · @mattturck

Genuinely impressive release by Google today (remember when they were behind?)

Gemini 3.5 Flash perf:
* Building on prior strengths (83.6% on MMMU-Pro for multimodal)
* Big jump on agentic coding (76.2% on Terminal-Bench for agentic coding and 56.5% on Toolathon for real-world tasks)
* Progress on expert tasks (57.9% on Finance Agent 2... we are cooked)
* Leading scores across SWE-Bench, OSWorld, etc.

(Also, elegant to bold the top scores in the chart below even when it's not Google leading.)

Ofc, just benchmarks, and also not cheap (~$9/M output), but Google is cookin'... we are all so spoiled to have the 3 labs compete

♥ 88 · ↻ 14 · 💬 14

Breaking: Anthropic attains sainthood, officially anointed by AI Jesus

♥ 40 · ↻ 3 · 💬 1

Source: https://x.com/mattturck