Almost all AI model and agent progress is downstream from evals. Open weights post training for specific domains comes down to evals. Agent improvements in the applied AI layer is all about evals. Agentic enterprise deployments that actually can augment work is all about evals. It’s all evals. This will become a core competency of any enterprise in the future. The companies that are able to best understand their own (and/or customers) workflows and how well agents participate in that work will be in the best position to actually drive real automation.
几乎所有 AI model 和 agent 的进展,都是由 evals 驱动的下游结果。针对特定领域的 open weights 后训练,归根结底也取决于 evals。应用层 AI 中 agent 的改进,核心也全是 evals。真正能够增强工作的 agentic enterprise 部署,同样全都离不开 evals。一切都是 evals。未来,这将成为任何 enterprise 的核心能力。那些最能理解自身(和/或客户)工作流,以及 agent 在这些工作中参与效果的公司,将最有机会真正推动实际的自动化。