“We got a tool to perform poorly” is the lowest form of science and journalism, imo, and it's only relevant when the tool is, in fact, extremely useful
Knives Can Blind You When You Stick Them in Your Eye (submitted 17 Apr 2026)
mythos obviously looks incredibly capable and i'm psyched to use it. also, if you're panicking about it: benchmarks don't measure model capability alone. they measure model capability after a human has done the work of finding a prompt that lets the model's capability show up. that work is non-trivial, and it requires skilled expert humans doing something that looks very much like a job