What does "agentic AI" mean in the context of penetration testing?

Agentic AI doesn’t just match patterns. It reasons about what to do, adapts when an initial approach fails, and chains findings together the way a skilled human researcher would. That’s different from automated scanners or LLMs used for report writing, and it’s the capability gap that matters most right now. For a deeper look, check out what agentic AI means in penetration testing.

Is AI replacing human security researchers?

Not anytime soon, and not entirely. AI handles speed and scale. Humans handle the creative, context-dependent work: multi-system attack chains, business logic vulnerabilities, zero-day research. The programs that combine both consistently outperform either approach alone.

How do I know if a vendor is actually offering agentic AI or just rebadging a scanner?

Ask this: can the system adapt when its first attack approach fails? Scanners retry the same pattern. Agentic AI reasons through why it failed and changes strategy. That single question cuts through most of the AI marketing noise in the market right now.

What is Mythos and why does it matter for enterprise security?

Mythos is Anthropic’s frontier AI model, restricted from public release because of its autonomous ability to discover and exploit vulnerabilities at scale, including zero-days that survived decades of human security review. It’s the clearest signal yet that AI-enabled adversaries aren’t theoretical. The access curve for offensive AI has already collapsed.

Jun 5, 2026

What I Told Security Leaders at Gartner SRM 2026

At Gartner SRM 2026 this week I gave a talk called “Cutting Through AI Noise: Defending Against Machine-Speed Cyber Adversaries.” The room was full of security leaders who’ve been through enough hype cycles to be skeptical of seeing AI on the label. That skepticism is warranted, and I built the session around it. Here’s what […]

On average, only 32% of enterprise attack surfaces get tested in a given year.
That 68% untested gap is where adversaries are looking.
Human + AI is the only model that works.
64% of security leaders prefer agent-led testing with human oversight, because AI can't yet replicate the creative reasoning required to chain vulnerabilities across systems.
The question to ask every vendor: "Is your AI agentic, and can it adapt when the first approach fails?"

What Metasploit Can Tell Us About Today’s AI Threat

I opened with 2004 because it’s the closest analogy to where we are right now.

Before Metasploit went open source, running a professional-grade exploit required nation-state resources or years of specialist skill. After it launched, a curious teenager with a laptop could do it. The threat didn’t change; the access curve collapsed.

Mythos did exactly the same thing for AI-driven exploitation. Before Mythos, finding zero-day vulnerabilities at scale still required elite red teams and weeks of work. Then Anthropic’s model autonomously discovered thousands of zero-days across every major operating system and browser, including bugs that survived decades of human review. Within weeks, OpenAI reported GPT-5.5 reached comparable performance on the same security standards.

Here’s the practical implication: the organizations that adapted fastest after 2004—the ones that built continuous security validation into their programs rather than reacting to each new exploit—came out ahead. Mythos is the same inflection point. The only question is whether you respond before or after your adversaries do.

The Penetration Testing Coverage Gap

I put a slide up in the session that made a few people visibly uncomfortable. According to Omdia’s 2026 State of Agentic AI in Pentesting, 95% of organizations rank pentesting as a top or high priority, but on average, they’re testing 32% of their attack surface per year.

68% of Your Attack Surface is Untested

One in three organizations estimate that they test 20% or less of their infrastructure on any regular basis. And 55% of security leaders say traditional testing fails to communicate findings in a way their teams can actually act on. These aren’t fringe organizations. They’re enterprises with mature security programs.

The uncomfortable truth is that the gap can’t be solved by hiring more pentesters. It’s a structural problem. Attackers operating with AI don’t work on a human clock. Defenders running annual or quarterly testing cycles are playing catch-up against adversaries who aren’t bound by any cycle at all.

But here’s the part that doesn’t get discussed enough: closing the coverage gap creates its own challenge. Test more of your attack surface and you’ll find more vulnerabilities, and at a certain point, prioritizing which to go after becomes overwhelming. If you’re solely relying on CVSS scores, you don’t have the full picture. Security leaders will need more context to sort through the upcoming wave of discovered vulnerabilities. Signals like EPSS (Exploit Prediction Scoring System), KEV (CISA’s Known Exploited Vulnerabilities catalog), and asset criticality give teams a much clearer picture of where to prioritize exploitable risk. The goal is to make sure the findings that surface actually drive decisions—not just reports.
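Here’s a minimal sketch of what blending those signals could look like. The field names, the 1-to-5 criticality scale, and the weights are all assumptions for illustration, not a standard formula or how any particular product scores findings.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One vulnerability finding; all field names here are illustrative."""
    name: str
    cvss: float             # CVSS base score, 0.0 to 10.0
    epss: float             # EPSS probability of exploitation, 0.0 to 1.0
    in_kev: bool            # listed in CISA's KEV catalog
    asset_criticality: int  # assumed internal scale: 1 (low) to 5 (critical)


def priority_score(f: Finding) -> float:
    """Blend severity, exploit likelihood, and asset value into a 0-100 score.

    The 0.3 / 0.4 / 0.3 weights are assumptions chosen for this example.
    """
    severity = f.cvss / 10.0
    # Assumption: a KEV listing means exploitation is confirmed, so use 1.0.
    likelihood = 1.0 if f.in_kev else f.epss
    criticality = f.asset_criticality / 5.0
    return round(100 * (0.3 * severity + 0.4 * likelihood + 0.3 * criticality), 1)


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered from highest to lowest priority."""
    return sorted(findings, key=priority_score, reverse=True)


# Made-up findings, purely to show how the ordering can differ from CVSS alone.
findings = [
    Finding("critical CVSS, low EPSS, test server", 9.8, 0.02, False, 1),
    Finding("medium CVSS, in KEV, payment system", 6.5, 0.40, True, 5),
    Finding("high CVSS, moderate EPSS, internal wiki", 8.1, 0.30, False, 2),
]

for f in prioritize(findings):
    print(f"{priority_score(f):5.1f}  {f.name}")
```

In this toy data the medium-CVSS finding lands first because it is in KEV and sits on a critical asset, while the 9.8 on a low-value test server drops to the bottom. A CVSS-only sort would have reversed those two.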

Agentic AI vs. Scanners vs. Gen AI and How to Tell the Difference

This was the part of my talk I enjoyed most, and the part that seemed to land well. When a vendor says “AI-powered,” they could mean very different things:

Automation: High-speed pattern matching against known signatures. It’s fast and useful, but cannot reason about novel attack paths. Most organizations already have scanner data; the challenge is turning scanner findings into confirmed, actionable risk.
Generative AI: Better reports with faster summaries and a cleaner UX. Although this is an interface improvement, it’s not a testing quality improvement. You’re getting the same underlying capability with a nicer wrapper.
Agentic AI: This can plan, pivot, and act autonomously. Agentic AI in pentesting chains vulnerabilities and adapts when the first approach fails. This is the engine behind Mythos. It’s also the engine behind Sara AI Pentesting.

The question to ask every vendor is: Can your AI adapt when the first approach fails? If the answer is “it retries the same pattern,” you’re looking at category one or two. If it can reason through why the first approach failed and change strategy, you’re looking at agentic AI.
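To make the distinction concrete, here’s a toy sketch of the two behaviors. Everything in it is hypothetical: the target is a stub that touches no real system, the approach names are placeholders, and the lookup table stands in for the reasoning a real agentic system would do with a model rather than a fixed mapping.

```python
from typing import Callable, Optional

# A stub "target": returns None on success, or a failure reason string.
Attempt = Callable[[str], Optional[str]]


def toy_target(approach: str) -> Optional[str]:
    """Hypothetical outcomes for three placeholder approaches."""
    outcomes = {
        "approach_a": "blocked_by_rate_limit",
        "approach_b": "input_is_validated",
        "approach_c": None,  # the only approach that works in this toy model
    }
    return outcomes[approach]


def retry_same_pattern(attempt: Attempt, approach: str, max_tries: int = 3) -> bool:
    """Scanner-style behavior: repeat one approach and ignore why it failed."""
    for _ in range(max_tries):
        if attempt(approach) is None:
            return True
    return False


# Assumed mapping from a failure reason to the next approach worth trying.
NEXT_APPROACH = {
    "blocked_by_rate_limit": "approach_b",
    "input_is_validated": "approach_c",
}


def adapt_on_failure(
    attempt: Attempt, first_approach: str, max_steps: int = 5
) -> tuple[bool, list[str]]:
    """Adaptive behavior: use the failure reason to pick a different approach.

    Returns (succeeded, approaches tried in order).
    """
    path: list[str] = []
    approach: Optional[str] = first_approach
    for _ in range(max_steps):
        if approach is None:
            break  # no known follow-up for the last failure reason
        path.append(approach)
        reason = attempt(approach)
        if reason is None:
            return True, path
        approach = NEXT_APPROACH.get(reason)
    return False, path


print(retry_same_pattern(toy_target, "approach_a"))
print(adapt_on_failure(toy_target, "approach_a"))
```

The first function never gets past the initial failure no matter how many times it retries. The second reads the failure reason, switches approach twice, and succeeds on the third. That difference in behavior, not the label on the product, is what the vendor question is probing for.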

That distinction goes beyond a feature checklist because having agentic AI determines whether you’re actually closing the coverage gap or just scanning it faster.

In a Saturated Market, Trust is the Differentiator

Another factor to weigh is trust. When the majority of vendors are pushing the AI-powered angle, their track record, history of success, and credibility can speak volumes. Buyers are looking for the vendors they can rely on when the tool is running in a live enterprise environment with real assets on the line.

That’s where 13 years of running offensive security programs at scale matters. It’s where having more than 10 million hours of testing logged by a vetted researcher community means something. This represents a depth of institutional knowledge that’s been baked into how Sara was engineered—what real attack chains look like, what actually gets missed, where the edge cases live that a lab benchmark won’t surface.

The Exoskeleton Model in Human + AI Pentesting

There’s a metaphor I’ve used before that I think captures where this is all heading: AI as an exoskeleton for the human researcher. Give a master carpenter a drum sander, and they don’t spend their time on what the machine does best. They spend it on the joinery.

In practice, the division of labor on an AI and human-powered penetration testing platform looks like this. Sara handles active recon and asset enumeration, port scanning at machine speed, known-pattern vulnerability detection, and continuous coverage. The Synack Red Team handles horizontal chaining across systems, business logic and trust relationship attacks, novel creative exploit development, and the context-aware adversarial reasoning that AI doesn’t yet replicate reliably.

The research backs this up. Overall, 64% of security leaders in the Omdia report prefer agent-led testing with human oversight. And notably, 1.6x more of them view that human oversight as a permanent model rather than a transitional one.

What AI Can’t Do Yet

Part of my goal at Gartner was to say something most vendors won’t: no one has fully autonomous security solved today.

Horizontal vulnerability chaining still requires human expertise.
Purely autonomous approaches carry higher false positive rates, which matter more than most people acknowledge when you’re thinking about automated remediation workflows.
Complex business logic vulnerabilities need human context.
Even the frontier labs disagree on the right architecture right now.

The adoption barriers are real too. In the Omdia data, 46% of non-adopters cite security concerns about AI systems themselves; 40% don’t trust AI decision-making; 40% are worried about integration complexity. These aren’t irrational concerns. They’re the right questions to be asking before you put agentic AI into a live production environment.

The roadmap I laid out is honest about this: machine-assisted testing leads to autonomous triage, which leads to AI-remediation guides, which eventually gets to AI-written code fixes. That’s a progression, not a flip of a switch. The organizations that will get the most value from this transition are the ones who understand where they are in that progression right now.

My Big Takeaway from Gartner SRM

Gartner SRM draws CISOs and security architects who’ve seen enough hype cycles to be appropriately skeptical. What struck me about the conversations I had was that the skepticism had shifted. It wasn’t “is AI real in security?” anymore. It was “how do I tell what’s real from what’s marketing?”

That’s a meaningful change. And it’s why the decoder ring matters. The buyers who can distinguish between automated scanners, better UX, and genuinely agentic AI are the ones who will build programs that hold up against machine-speed adversaries. The ones who can’t are going to spend a lot of money closing a coverage gap they never actually close.

If any of this maps to what you’re dealing with in your own program, let’s chat. It might be time to check out Sara AI Pentest, which deploys in days and delivers a full pentest report with human-validated findings. Start a free trial today.