Claude Blog · Apr 10, 2026 · 3,762 words · ~19 min

Preparing your security program for AI-accelerated offense

Earlier this week, we announced Project Glasswing—our urgent attempt to put the strong cybersecurity capabilities of our newest frontier model, Claude Mythos Preview, to use for defensive purposes. In the announcement—and the accompanying technical blog post—we described how AI models are rapidly reducing the resources, time, and skill required to find and exploit vulnerabilities in software. With an eye on the lightning-fast progress of AI, we also noted that it will not be long before models of similar capability levels are widely available. Within the next 24 months, vast numbers of bugs that sat unnoticed in code, possibly for years, will be found by AI models and chained into working exploits. Indeed, it is already the case that publicly available, sub-Mythos-level models can find serious vulnerabilities that traditional reviews have missed for long periods of time.

Thankfully, this works both ways: although attackers can use AI to move faster, so can defenders who adopt AI tools to secure themselves. In this post, we offer security recommendations and practical tips based on what our security teams and researchers have observed and learned from using frontier AI models to secure real codebases and systems. We hope security teams and others will find this advice useful as we enter the age of AI-driven cybersecurity.

Many of the pieces of advice below are already part of the existing security consensus; we have prioritized them according to which controls we have seen hold and which we have seen degrade. If your organization reports against SOC 2 and ISO 27001, these will map directly onto controls you are already tracking. We'll update this guidance as we and our Project Glasswing partners continue our cybersecurity work.

What to do now

1. Close your patch gap

AI models are very effective at recognizing the signatures of known, already-patched vulnerabilities in unpatched systems. Reversing a patch into a working exploit is exactly the kind of mechanical analysis at which these models excel. This means that the window between a patch being published and an exploit becoming available is shrinking.

Patch everything on the CISA Known Exploited Vulnerabilities (KEV) catalog immediately. This catalog contains vulnerabilities that are confirmed to be under active exploitation. Anything on this list that is reachable from a network should be treated as an emergency.

Use EPSS to prioritize the rest. The Exploit Prediction Scoring System (EPSS) provides a daily-updated probability that a given Common Vulnerability and Exposure (CVE) will be exploited in the next 30 days. Patching the KEV list first and then everything above a chosen EPSS threshold will help you turn thousands of open CVEs into a manageable queue.

Reduce time-to-patch on internet-exposed systems. We recommend patching internet-facing applications within 24 hours of an exploit becoming available, and within days for other vulnerabilities.

Automate patch deployment and reboots where the risk of an automated update causing an outage is acceptable. Manual approval steps add delay, and delay is now the primary risk.

Practical tip: Most cloud and OS vendors already ship patch automation; enabling it is often a simple configuration change. For container images and dependency manifests, several open-source scanners run as a single continuous integration step and annotate CVEs with data from the KEV catalog and EPSS, so prioritization is built in.
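As a concrete illustration of the KEV-then-EPSS ordering, here is a minimal prioritization sketch, assuming you already have a list of open CVE IDs from your scanner. The CISA KEV feed and the FIRST EPSS API are both public and need no authentication; the 0.1 threshold is an arbitrary placeholder you would tune, and a real pipeline would page the EPSS query and feed the output into your ticketing system.

```python
import requests

KEV_FEED = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
EPSS_API = "https://api.first.org/data/v1/epss"

def prioritize(open_cves, epss_threshold=0.1):
    """Split a list of CVE IDs into (emergency, prioritized, backlog)."""
    # KEV: anything confirmed to be exploited in the wild is an emergency.
    kev_ids = {v["cveID"] for v in requests.get(KEV_FEED, timeout=30).json()["vulnerabilities"]}
    emergency = [c for c in open_cves if c in kev_ids]

    # EPSS: daily-updated probability of exploitation in the next 30 days.
    # For large lists, page this query; the API caps rows per request.
    remaining = [c for c in open_cves if c not in kev_ids]
    resp = requests.get(EPSS_API, params={"cve": ",".join(remaining)}, timeout=30).json()
    scores = {row["cve"]: float(row["epss"]) for row in resp.get("data", [])}

    prioritized = sorted((c for c in remaining if scores.get(c, 0.0) >= epss_threshold),
                         key=lambda c: scores.get(c, 0.0), reverse=True)
    backlog = [c for c in remaining if scores.get(c, 0.0) < epss_threshold]
    return emergency, prioritized, backlog
```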
2. Prepare to handle a much higher volume of vulnerability reports

Over approximately the next two years, the processes you use to receive, prioritize, and fix vulnerabilities (both in your own code and in the software you buy from vendors) will be under far more pressure than they are today. Your vulnerability management process should plan for many more patches, from vendors and upstream.

Plan for an order-of-magnitude increase in finding volume. Intake, triage, and remediation tracking all need to keep pace with the growing number of vulnerabilities being exposed. If your process is still built around a spreadsheet and a weekly meeting, it's unlikely that you'll keep up. It's worth considering some amount of automation—with humans in the loop, of course—to help with the sheer volume.

Check the security of your open-source dependencies. Most software supply chains are predominantly open source, and most open-source projects have no service-level agreement or commitment to maintain a high level of security. OpenSSF Scorecard automatically scores every dependency on signals like branch protection, fuzzing coverage, signed releases, and maintainer activity. It runs in CI and helps to identify unmaintained packages.

Apply the same expectations to your vendors. Your third-party risk management process should ask suppliers how they are themselves preparing for accelerated exploit timelines and whether they are scanning their own code.

Practical tip: Look into open-source software and third-party services that evaluate the reachability of vulnerable code. Build automated processes that continuously deliver new software updates to your IT and production infrastructure, and run regression testing on updates so you can be confident deploying them quickly.

Above we mentioned automation of these processes. There are a number of important ways that AI can assist:

Speeding up triage. Triage is a bottleneck because it requires expert review and classification. A frontier model can deduplicate findings against an existing backlog, use its knowledge of your assets to estimate exposure, and draft remediation tickets with the affected code paths pre-identified.

Check your dependencies for redundancy. Most large codebases accumulate multiple libraries doing the same job (several HTTP clients; several JSON parsers). This gives attackers more opportunity, all for no functional gain on your part. Pointing an LLM at a lockfile and asking which dependencies overlap (and what migration and consolidation would look like) is a one-hour exercise that often pays off.

AI upgrade automation. Frontier models are increasingly capable of generating patches to include alongside vulnerability reports. When the report is clear and thorough, perhaps even with a proof-of-concept, the model can directly test the patch to confirm that the exploit path is closed. It can also automate the process of accepting the upstream patch, validating that the upgrade doesn't break tests or internal systems.

AI vendoring. Some small dependencies will score poorly on the OpenSSF Scorecard—perhaps because they're not actively maintained. You shouldn't continue to rely on these; instead, consider having an LLM write its own code to reimplement the functionality you actually use.
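If you want a starting point for the Scorecard check described above, the sketch below queries the public Scorecard API for each dependency's repository and flags low overall scores and a failing Maintained check. The mapping from dependencies to "owner/name" repositories is assumed to come from your lockfile or an SBOM, and the thresholds are placeholders.

```python
import requests

SCORECARD_API = "https://api.securityscorecards.dev/projects/github.com/{repo}"

def flag_risky_dependencies(repos, min_score=5.0):
    """repos: iterable of 'owner/name' strings mapped from your lockfile or SBOM."""
    flagged = []
    for repo in repos:
        resp = requests.get(SCORECARD_API.format(repo=repo), timeout=30)
        if resp.status_code != 200:
            flagged.append((repo, "no scorecard data"))  # "unknown" is itself a signal
            continue
        result = resp.json()
        checks = {c["name"]: c["score"] for c in result.get("checks", [])}
        # Flag a low aggregate score or a Maintained check at zero (unmaintained/archived).
        if result.get("score", 0) < min_score or checks.get("Maintained", 10) <= 0:
            flagged.append((repo, f"score={result.get('score')} maintained={checks.get('Maintained')}"))
    return flagged

# Example: flag_risky_dependencies(["psf/requests", "expressjs/express"])
```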
3. Find bugs before you ship them

Prevention is always better than cure. You should assume that bugs that reach production will eventually be found, so your security testing needs to happen well before that point.

Add static analysis and AI-assisted code review to your continuous integration pipeline, and block merges on high-confidence findings. If false positives make this impractical, keep the check, but address the tooling. The OWASP Application Security Verification Standard defines what "passing" a test looks like at three different levels of rigor.

Add automated penetration testing to your continuous delivery pipeline. You can run the same scanning against staging that attackers will run against your production systems.

Secure the build pipeline. An attacker who can inject code between commit and deployment does not need to find a vulnerability. The SLSA security framework provides a graded path: lower levels establish which commit produced which artifact, and higher levels make the build itself verifiable.

Adopt Secure by Design practices. CISA's pledge commitments (multi-factor authentication by default; no default passwords; transparent vulnerability reporting) are a reasonable minimum bar.

Prefer memory-safe languages for new code. A large share of severe vulnerabilities are memory-safety bugs that do not occur in Rust, Go, or managed runtimes. CISA, the NSA, and the NCSC have published useful roadmaps. Existing C/C++ code does not need to be rewritten, but new C/C++ code should require a justification. AI-assisted rewrites are increasingly viable as well.

Practical tip: Static application security testing (SAST) tooling that runs as a CI action with OWASP Top 10 and language-specific rule sets is widely available, both open-source and built into code hosting platforms (CodeQL on GitHub being the most common starting point). To assess build provenance, OpenSSF publishes a reusable workflow that produces SLSA Level 3 attestations from GitHub Actions; adopting it is significantly less work than the SLSA spec suggests.

As before, there are some clear opportunities for accelerating this work with AI:

AI vulnerability scanning. The logic here is straightforward: you should scan your own code and systems with the same kind of model an attacker would use, before they do. This approach just requires an isolated agent, a verification step to filter noise, and a path into your existing triage process. You can do this with an LLM today (a minimal sketch follows at the end of this section). If you implement one thing from this section, implement this.

Patch generation. When SAST or a scanner produces a finding, a frontier model can usually propose a patch for it. This does not remove the need for review, but it changes the developer's job from "understand the bug and write a fix" to "verify a proposed fix is correct." The latter is faster. The same approach applies to memory-safe migration: LLMs can port a self-contained C module to Rust with tests; a reviewer can validate the equivalence rather than writing the whole thing from scratch.
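A minimal sketch of the scan-and-verify loop, assuming the Anthropic Python SDK; the model name, prompt, target file list, and ticket hand-off are placeholders, and in practice the agent should run in an isolated environment with read-only access to the code.

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def scan_file(path: str) -> list[dict]:
    """Ask a model for candidate vulnerabilities in one source file, as structured JSON."""
    source = open(path, encoding="utf-8").read()
    message = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; use whatever frontier model you have access to
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                "Review this file for exploitable vulnerabilities. Respond with only a JSON list of "
                'objects: {"line": int, "issue": str, "severity": str, "poc_idea": str}.\n\n'
                + source
            ),
        }],
    )
    return json.loads(message.content[0].text)

def verified(finding: dict, path: str) -> bool:
    """Verification step: a human (or a reproduction harness) confirms the finding before it is filed."""
    print(f"{path}:{finding['line']} [{finding['severity']}] {finding['issue']}")
    return input("confirmed? [y/N] ").strip().lower() == "y"

for path in ["src/auth/session.py"]:  # placeholder target list: start with input handling and auth logic
    for finding in scan_file(path):
        if verified(finding, path):
            ...  # hand off into your existing triage process (e.g. file a ticket)
```

The important properties are the isolation, the verification gate, and the hand-off into triage, not the specific prompt.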
4. Find the vulnerabilities already in your code

Patching addresses known vulnerabilities in software you depend on. But your own codebase contains unknown ones. Most long-running production code has been reviewed by humans many times, but has never been examined by a frontier model, and that kind of analysis tends to surface new, previously-overlooked issues. Proactive scanning can identify vulnerabilities that are within the reach of modern LLMs before attackers discover them.

Prioritize by exposure. Start with code that parses untrusted input, enforces an authentication or authorization decision, or is reachable from the internet. These are the paths where a finding is most likely to matter.

Include legacy code. Code that predates current review practices, or whose original authors have moved on, often has the least recent scrutiny. That's where you have the most to gain from a fresh pass.

Budget for remediation. A well-structured model scan of older code typically produces fewer findings than a SAST rollout, but a higher share of them are real. Plan engineering time to fix the bugs.

Practical tip: Pick one internet-facing service with few current owners and scan its input handling and auth logic. Run the agent in isolation and add a verification step so you're acting on confirmed findings. One service done properly is a reasonable basis for estimating what a broader program will cost.

5. Design for breach

Attackers will try to get a foothold somewhere. You need to limit what they can reach from there. Mitigations whose value comes from friction—making an attack tedious—rather than from a hard barrier (extra pivot hops, rate limits, non-standard ports, SMS-based MFA) are much less effective against an adversary that can grind through those tedious steps. Our recommendations below favor controls that hold even when the attacker has unlimited patience: hardware-bound credentials, expiring tokens, and network paths that do not exist rather than paths that are merely inconvenient.

Adopt zero trust architecture. Authenticate and authorize every request between services as if it came from the internet. CISA's Zero Trust Maturity Model and the NCSC's zero trust principles both provide staged adoption paths.

Tie access to verified hardware rather than credentials. Production systems and sensitive internal tools should only be reachable from managed employee devices with attested hardware identity, paired with phishing-resistant 2FA (FIDO2 or passkeys). Stolen credentials alone should never be sufficient to gain access. Even calls between production services should be rooted in hardware identity.

Isolate services by identity. A compromised build server should not be able to query production databases. A compromised laptop should not be able to reach build infrastructure. Enforce this at the receiving end: every workload should carry its own cryptographic identity, and each service should accept connections only from the specific callers its policy names. Network segmentation can still reduce blast radius and noise, but it is a backstop.

Replace long-lived secrets with short-lived tokens. Static API keys, embedded credentials, and shared service-account passwords are among the first things an attacker with model-assisted code analysis will find. Use short-lived, narrowly-scoped tokens issued by an identity provider (see the sketch at the end of this section).

Practical tip: Full zero-trust is a multi-year program, but an identity-aware access proxy puts device-verified, MFA-gated access in front of internal services without having to fundamentally change their architecture. Each major cloud provider offers a native option, and several open-source and commercial alternatives exist for on-premises or multi-cloud environments. For secrets, every major cloud has a managed secrets store; moving the single most widely-shared credential into one and rotating it is a useful forcing function for the rest.
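To make the last recommendation concrete, here is a minimal sketch of trading a static secret for short-lived, narrowly scoped credentials, using AWS STS as one example of an identity-provider-issued token; the role ARN, session name, and 15-minute lifetime are placeholders, and other clouds and identity providers have direct equivalents.

```python
import boto3

def short_lived_credentials():
    """Exchange the caller's existing identity for 15-minute credentials scoped to one role."""
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/deploy-readonly",  # placeholder role
        RoleSessionName="ci-deploy",
        DurationSeconds=900,  # 15 minutes: stolen credentials expire before they are useful
    )
    creds = resp["Credentials"]
    # Hand these to the workload instead of a static access key checked into config.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        "expires": creds["Expiration"],
    }
```

The design point is that the caller's existing identity (an instance profile, workload identity, or federated user) authorizes the exchange; no long-lived secret is stored anywhere.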
6. Reduce and inventory what you expose

This section is based on two important principles. First, you cannot defend systems you don't know about. Second, the smaller the exposed surface, the less there is to attack.

Maintain a current inventory of every internet-facing host, service, and API endpoint in your systems. Attackers can run automated reconnaissance; your inventory should be at least as accurate. Include these systems in your pentests and red-teaming.

Decommission unused systems. Legacy services with no clear owner are typically also unpatched.

Minimize what each service exposes. Default-deny network ingress and limit API surface area to what is actually required.

Practical tip: Internet-wide scan indexes are publicly searchable; querying one for your own IP ranges and domains shows you what an attacker's reconnaissance sees (a minimal query sketch appears at the end of this section). For cloud assets, native inventory tools (AWS Config, Azure Resource Graph, GCP Asset Inventory) already exist; the work is in querying them.

AI can help directly here, too:

Pruning stale code and systems. Identifying unused code is tedious—but as noted above, AI models are good at tedious tasks. A model with read access to a codebase and traffic logs can list endpoints that have no callers and have not received traffic; from there, it can explain what removing each one would affect.

Autonomous external red-teaming. Point an AI offensive agent at your own perimeter from the outside, with no credentials and no source access. Then let it do what an attacker would: work out what is reachable, fingerprint it, and attempt to chain what it finds into a foothold. This kind of automated red-teaming can catch things source scanning doesn't see: forgotten hosts, exposed management interfaces, default credentials, and misconfigured storage. Run it on the same cadence as your inventory refresh.
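As an example of the reconnaissance check from the practical tip above, the sketch below queries Shodan, one of several public scan indexes, for everything it has indexed in one of your address ranges; the API key, the placeholder network range, and the shodan Python package are assumptions, and Censys or similar services could be substituted.

```python
import os
import shodan

api = shodan.Shodan(os.environ["SHODAN_API_KEY"])  # assumes you hold an API key

# Placeholder range: replace with CIDR blocks and domains you actually own.
results = api.search("net:198.51.100.0/24")

print(f"{results['total']} exposed services indexed")
for match in results["matches"]:
    # Each match is a banner an attacker can already see without touching your network.
    print(match["ip_str"], match["port"], match.get("product", "unknown"), match.get("org", ""))
```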
7. Shorten your incident response time

Exploits can appear within hours of a patch. Response processes that take days are too slow. Here are some ideas for how to reduce your incident response time:

Put a model at the front of your alert queue. Every inbound alert should get an automated first-pass investigation before a human sees it. This kind of "triage agent," with read-only access to your Security Information and Event Management (SIEM) platform and a well-scoped set of query tools, can direct your attention to the alerts that need human judgement most.

Instrument dwell time and coverage before anything else. These are the two metrics that AI automation has the greatest ability to move; both matter most when exploit windows shorten.

Automate the bookkeeping around incidents. During an active incident, models should be taking notes, capturing artifacts, pursuing parallel investigation tracks, and drafting the postmortem and root-cause analysis. Humans, on the other hand, should be making the containment calls, disclosure calls, and customer-comms calls. Human decision speed during an incident should never be rate-limited by work that would be better handed to an AI, like evidence collection or write-ups.

Let models drive the detection flywheel. Ingesting threat intelligence, generating candidate detections, hunting for matches, and tuning what fires are all now within reach of frontier models, which can run the process end-to-end.

Run a tabletop for five simultaneous incidents. The standard exercise assumes one critical CVE with a working exploit hits on a Monday. Given the improved AI capabilities we're seeing, this might be unwise. To truly stress-test your responses, you should run the version where five incidents hit in the same week.

Map detection coverage against MITRE ATT&CK. ATT&CK provides a standard vocabulary of attacker techniques that most detection tools already use. Knowing which techniques you can detect (and which you can't) is more useful than a general goal to "improve detection." You should prioritize coverage for lateral movement and credential access.

Establish emergency change procedures in advance. A two-week change-approval cycle for production patches is itself a security risk. The same applies to emergency containment actions (like taking a service offline, rotating a credential, or blocking a network path). You should decide in advance who can authorize these and how fast.

Practical tip: Pick one noisy rule with a known-high false-positive rate. Wire a frontier model into its alert stream with read-only access to the underlying data, and have it produce a structured disposition for every firing. Measure agreement against a human reviewer for two weeks. If the agreement rate is tolerable, expand to the next rule (a sketch of this loop appears at the end of this section). It's not worth trying to automate the whole queue at once. Separately, Atomic Red Team is an open-source library of small, safe tests mapped to ATT&CK techniques; running a handful and checking which ones your existing logging actually detected is a one-afternoon exercise that produces a concrete coverage map.

Here are some ways AI can assist with response times:

First-pass triage at 100% coverage. A well-scoped triage agent can investigate every alert (where humans might look only at those above a given severity threshold) and produce a structured disposition a human can accept, reject, or escalate. The mechanism that makes this work is giving your model a minimal tool set (query, think, report), letting it choose its own investigation strategy, and measuring the output against operational metrics.

Incident scribe and parallel investigator. During an active incident, a model can take contemporaneous notes, timestamp artifacts as they are collected, pursue independent investigation tracks the responder has not gotten to yet, and draft the postmortem from the transcript once the incident closes. This is the least glamorous application of frontier models to security work—but it's probably the highest-impact one.

Proactive hunting against your own environment. The same kind of agent that can find vulnerabilities in source code can hunt for misconfigurations and indicators of compromise across your telemetry. You can run it on the same cadence as your external attack-surface scan.
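The disposition-and-agreement loop from the practical tip above might look roughly like the sketch below, again assuming the Anthropic Python SDK; the model name, the alert format, and fetch_context are placeholders standing in for your SIEM's read-only query interface and your own labeling process.

```python
import json
import anthropic

client = anthropic.Anthropic()

def model_disposition(alert: dict, context: str) -> dict:
    """Produce a structured disposition for one alert firing."""
    message = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "You are triaging a security alert. Using only the evidence below, respond with only JSON: "
                '{"verdict": "benign" or "suspicious" or "malicious", "confidence": 0 to 1, "reasoning": str}.\n\n'
                f"Alert: {json.dumps(alert)}\n\nRead-only context from the SIEM:\n{context}"
            ),
        }],
    )
    return json.loads(message.content[0].text)

def agreement_rate(firings, fetch_context, human_verdicts):
    """Compare the model's verdicts with a human reviewer's over a trial window."""
    agree = 0
    for alert in firings:
        verdict = model_disposition(alert, fetch_context(alert))["verdict"]
        agree += int(verdict == human_verdicts[alert["id"]])
    return agree / len(firings)
```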
Advice for submitting vulnerability reports to others

If you are scanning code—your own dependencies, open-source projects, or vendor products—and reporting findings upstream, the quality of those reports determines whether anyone acts on them. Open-source maintainers are already receiving large volumes of low-quality automated reports, and many have started ignoring anything that looks AI-generated. Adding to that volume without adding signal makes the problem worse for everyone, including you. A report should be sent only when a human has verified it and is willing to put their name on it. Concretely:

State the bug and its impact in plain language. A maintainer should be able to understand what is wrong and why it matters from the first paragraph, without running anything.

Walk through the code path. Show where the input enters, where it is mishandled, and where the consequence occurs. This is the part that distinguishes a real finding from a pattern match.

Provide a working reproduction. A proof-of-concept the maintainer can run, or a test case that fails, is more credible than any amount of explanation.

Include a proposed patch you would accept if you were the maintainer. A patch demonstrates that the reporter understands the codebase well enough to fix the problem in a way that fits the project's conventions.

Disclose AI involvement upfront. If a model found the bug or drafted the report, say so in the first line. Maintainers will find out anyway; concealing it costs more credibility than disclosing it.

Defer to the maintainer's judgment. If they decline the report, make peace with that. The goodwill from being easy to work with is worth more than winning an argument over one bug.

Practical tip: A useful self-check before sending a vulnerability report is to close the editor and explain the bug from memory. If you cannot describe what goes wrong without referring back to the model output, you do not understand it well enough to report it.

If you don't have a security team

Most of the above advice assumes that your organization has a dedicated security function. If you are a small organization, a solo developer, or an open-source maintainer, the same risks apply but the actions are simpler:

Turn on automatic updates for your operating system, browser, and every application that offers it. This is the single most effective action available and requires no ongoing effort.

Prefer managed services over self-hosting. Letting a provider with a security team run the database, authentication, and email shifts the patching burden to them. The cost of a managed service like this is almost always lower than the cost of one incident.

Use passkeys or hardware security keys on every account that supports them. SMS codes can be intercepted and passwords get reused; a hardware key cannot be phished.

Enable the free security tooling on your code host. GitHub's Dependabot, secret scanning, and CodeQL are free for public repositories and catch a meaningful share of what enterprise tools catch. Enabling them takes minutes.

If you maintain an open-source project, publish a SECURITY.md stating whom to contact and what reporters can expect when they do. AI-assisted scanning means you will receive more vulnerability reports than before. Some will be valuable; some will be automated noise. A clear intake process helps you tell them apart, and signals to good-faith reporters that their effort will not be wasted.

References

Patch prioritization: CISA KEV Catalog, FIRST EPSS, CISA BOD 22-01
Baseline controls: ACSC Essential Eight, CISA CPGs, CIS Controls v8, NCSC 10 Steps
Secure development: NIST SSDF (SP 800-218), OWASP ASVS, OWASP SAMM, CISA Secure by Design
Memory safety: CISA/NSA Memory Safe Roadmaps
Supply chain & build integrity: SLSA, OpenSSF Scorecards, CISA SBOM resources, NIST SP 800-161
Zero trust: CISA Zero Trust Maturity Model, NIST SP 800-207, NCSC Zero Trust Principles
Detection & response: MITRE ATT&CK, MITRE D3FEND
Program framework: NIST Cybersecurity Framework 2.0, NCSC Cyber Assessment Framework

Acknowledgements

This article was written by members of Anthropic's Security Engineering and Research teams, including Donny Greenberg, Jason Clinton, Michael Moore, Abel Ribbink, and Jackie Bow, with contributions from Jannet Park, Gabby Curtis, and Stuart Ritchie.
Original post: https://claude.com/blog/preparing-your-security-program-for-ai-accelerated-offense