Today we're launching dreaming in Claude Managed Agents as a research preview. Dreaming extends memory by reviewing past sessions to find patterns and help agents self-improve. We're also making outcomes, multiagent orchestration, and webhooks available to developers building with Managed Agents. Together, these updates make agents more capable of handling complex tasks with minimal steering.

Build self-improving agents with dreaming

Dreaming is a scheduled process that reviews your agent sessions and memory stores, extracts patterns, and curates memories so your agents improve over time. You decide how much control you want: dreaming can update memory automatically, or you can review changes before they land.

Dreaming surfaces patterns that a single agent can't see on its own, including recurring mistakes, workflows that agents converge on, and preferences shared across a team. It also restructures memory so it stays high-signal as it evolves. This is especially useful for long-running work and multiagent orchestration.

Together, memory and dreaming form a robust memory system for self-improving agents. Memory lets each agent capture what it learns as it works. Dreaming refines that memory between sessions, pulling shared learnings across agents and keeping it up to date.

Dreaming is available in Managed Agents on the Claude Platform; developers can request access here.

Deliver better outcomes

With outcomes, you write a rubric describing what success looks like and the agent works toward it. A separate grader evaluates the output against your criteria in its own context window, so it isn't influenced by the agent's reasoning. When something isn't right, the grader pinpoints what needs to change and the agent takes another pass.

Agents do their best work when they know what "good" looks like: for example, a structural framework, a presentation standard, or a set of requirements that need to be met.
With outcomes, agents can check their work against that bar and self-correct until the output is good enough, without a human needing to review each attempt.

Outcomes is particularly useful for tasks that require attention to detail and exhaustive coverage. It also works for subjective quality, like whether copy matches a brand voice or a design follows visual guidelines.

In testing, outcomes improved task success by up to 10 points over a standard prompting loop, with the largest gains on the hardest problems. Outcomes also improved file generation quality, with +8.4% task success on docx and +10.1% on pptx in our internal benchmarks.

You can also now define an outcome, let the agent run, and get notified by a webhook when it's done.

Handle complex tasks with multiple agents

When there is too much work for a single agent to do well, multiagent orchestration lets a lead agent break the job into pieces and delegate each one to a specialist with its own model, prompt, and tools. For example, a lead agent can run an investigation while subagents fan out through deploy history, error logs, metrics, and support tickets.

These specialists work in parallel on a shared filesystem and contribute to the lead agent's overall context. The lead agent can check back in with other agents mid-workflow because events are persistent and every agent remembers what it's done.

You can also trace every step in the Claude Console: which agent did what, in what order, and why, giving you full visibility into how your task was delegated and executed.

What teams are building

Teams are using dreaming, outcomes, and multiagent orchestration to ship agents that verify their own work, learn across sessions, and parallelize complex jobs:

Harvey uses Managed Agents to coordinate complex legal work like long-form drafting and document creation. With dreaming, their agents remember what they learned between sessions, including filetype workarounds and tool-specific patterns.
Completion rates went up ~6x in their tests.

Netflix's platform team built an analysis agent that processes logs from hundreds of builds across different sources. With changes that affect thousands of applications, what matters is finding the issues that recur across many of them. Multiagent orchestration lets the agent analyze batches in parallel and surface only the patterns worth acting on.

Spiral by Every is using multiagent orchestration and outcomes to power the writing agent behind their new API and CLI. The lead agent runs on Haiku: it fields incoming requests, poses quick follow-up questions when needed, then delegates the drafting to subagents running on Opus. When a user asks for multiple drafts, the subagents run in parallel. Writing quality is Spiral's core value, so they use outcomes to enforce it. Each draft is scored against a rubric of Every's editorial principles and the user's voice, both pulled from memory. Only drafts that clear the bar are returned.

Wisedocs built a document quality check agent on Managed Agents, using outcomes to grade each review against their internal guidelines. Reviews now run 50% faster, while staying aligned with their team's standards.
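
The rubric-and-grader loop described under "Deliver better outcomes" can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the Managed Agents API: `run_with_outcome`, `Verdict`, and the toy agent and grader are hypothetical names, and a real grader would be a model call rather than a string check. The one property the sketch tries to preserve is that the grader receives only the rubric and the output, never the agent's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical grader result: pass/fail plus what needs to change."""
    passed: bool
    feedback: str = ""


# Hypothetical signatures, not the Managed Agents API.
# The agent sees the task and any grader feedback from the previous pass.
Agent = Callable[[str, Optional[str]], str]
# The grader sees only the rubric and the output, never the agent's reasoning.
Grader = Callable[[str, str], Verdict]


def run_with_outcome(task: str, rubric: str, agent: Agent, grader: Grader,
                     max_passes: int = 3) -> Tuple[str, List[Verdict]]:
    """Generate, grade, and revise until the rubric is met or passes run out."""
    history: List[Verdict] = []
    feedback: Optional[str] = None
    output = ""
    for _ in range(max_passes):
        output = agent(task, feedback)
        verdict = grader(rubric, output)
        history.append(verdict)
        if verdict.passed:
            break
        feedback = verdict.feedback  # fed into the agent's next pass
    return output, history


def toy_agent(task: str, feedback: Optional[str]) -> str:
    # Stand-in agent: adds the missing section only after the grader asks.
    draft = "Summary: build failed on step 3."
    if feedback:
        draft += " Root cause: expired credentials."
    return draft


def toy_grader(rubric: str, output: str) -> Verdict:
    # Stand-in grader: a string check in place of a model-based evaluation.
    if "Root cause" in output:
        return Verdict(passed=True)
    return Verdict(passed=False, feedback="Add a root cause section.")


final, history = run_with_outcome(
    "Write an incident note", "Must state a root cause.", toy_agent, toy_grader
)
```

In this toy run the first draft fails, the grader's feedback is passed back, and the second draft clears the bar. The `max_passes` cap is a design choice of the sketch so that a rubric that can never be satisfied does not loop forever.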
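
The fan-out pattern from "Handle complex tasks with multiple agents", combined with the idea of surfacing only issues that recur across many builds, might look roughly like the sketch below. It is an assumption-laden toy: threads stand in for subagents, `lead_agent` and `find_issues` are hypothetical names, the log format is invented, and nothing here calls the Managed Agents API or models the shared filesystem.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical specialist: takes a batch of log lines, returns issue labels.
Specialist = Callable[[List[str]], List[str]]


def find_issues(batch: List[str]) -> List[str]:
    """Toy specialist: treats the text after 'ERROR ' as the issue label."""
    return [line.split("ERROR ", 1)[1] for line in batch if "ERROR " in line]


def lead_agent(batches: List[List[str]], specialist: Specialist,
               min_batches: int = 2) -> Dict[str, int]:
    """Fan batches out in parallel, then keep issues that recur across batches."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() preserves input order, so results line up with batches.
        results = list(pool.map(specialist, batches))
    seen_in: Counter = Counter()
    for issues in results:
        # Count each issue once per batch so one noisy build cannot dominate.
        seen_in.update(set(issues))
    return {issue: n for issue, n in seen_in.items() if n >= min_batches}


batches = [
    ["INFO start", "ERROR timeout fetching deps", "ERROR flaky test"],
    ["ERROR timeout fetching deps", "INFO done"],
    ["ERROR disk full", "ERROR timeout fetching deps",
     "ERROR timeout fetching deps"],
]
recurring = lead_agent(batches, find_issues)
```

Here only `timeout fetching deps` is reported, because it appears in all three batches, while the two one-off errors are dropped. Counting per batch rather than per line is one possible reading of "issues that recur across many of them"; other thresholds or weightings would be equally reasonable.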