Big week for all my Google friends. I can assure you they’ve been cooking.
Watch the full talk here:
Check out my interview with Alex:
Get the written takeaways:
5. Build evals based on real traces + feedback
Read actual customer conversations with your model to build product sense, and use Claude to synthesize feedback into top themes. Don't run "eval theater" on generic academic benchmarks. As models get smarter, evals need to get harder to keep producing signal.
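One rough way to picture point 5 in Python, as a sketch rather than anyone's actual pipeline. The `Trace` schema, the `"up"`/`"down"` feedback values, and the `theme` field are all assumptions made for illustration; the theme labels are assumed to come from an earlier synthesis pass (for example one done with Claude, or by hand), and no model API is called here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged customer conversation (hypothetical schema)."""
    user_input: str
    model_output: str
    feedback: str  # assumed thumbs-style signal: "up" or "down"
    theme: str     # assumed label from a prior feedback-synthesis pass


@dataclass
class EvalCase:
    """An eval case derived from a real, negatively rated trace."""
    prompt: str
    bad_output: str
    theme: str


def top_themes(traces, n=3):
    """Return the n most common themes among negatively rated traces."""
    counts = Counter(t.theme for t in traces if t.feedback == "down")
    return [theme for theme, _ in counts.most_common(n)]


def build_eval_set(traces, n_themes=3):
    """Turn negatively rated traces from the top themes into eval cases."""
    themes = set(top_themes(traces, n_themes))
    return [
        EvalCase(t.user_input, t.model_output, t.theme)
        for t in traces
        if t.feedback == "down" and t.theme in themes
    ]


def pass_rate(cases, grade):
    """Fraction of cases passed; `grade(case) -> bool` is supplied by the caller."""
    if not cases:
        return 0.0
    return sum(1 for c in cases if grade(c)) / len(cases)
```

If `pass_rate` on a set like this sits near 1.0 for a new model, the set has probably stopped producing signal, and it is time to pull harder cases from fresher traces.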