What Happens When Sara Pentest Gets Six Hours With a Live Application
In a single six-hour session, with no human intervention, Sara found and fully exploited multiple high-severity vulnerabilities in a live application, including a SQL injection (SQLi), an admin account takeover, and stored cross-site scripting. In fact, 70% of Sara’s findings on this target were rated high or critical. This post walks through three of them, end to end.
The type of findings Sara uncovered represents real, immediate organizational risk. In adversarial hands, this kind of chain could compromise an entire user base in minutes.
And Sara found all three. Alone and fully autonomously.
Account Takeover, SQLi, Stored Payloads: The Combination That Defines Real Risk
Before I walk through what Sara did, it’s worth explaining why the combination of these three vulnerabilities matters.
- Account takeover gives an attacker control of user sessions, which is especially dangerous when those sessions belong to administrators.
- SQL injection can expose every record in the system: credentials, roles, email addresses, everything needed to map and compromise the user base.
- Stored cross-site scripting (XSS) plants a persistent payload that executes in every victim’s browser, silently stealing session tokens and enabling further takeovers.
Each one is serious on its own. Combined, they represent a complete organizational compromise: you can enumerate all users, take over their accounts, and plant persistent code that survives remediation unless every infection point is found and cleaned.
That’s what Sara demonstrated. And it’s not just the findings themselves that are remarkable; it’s how Sara reasoned through each one.
Vulnerability 1: Account Takeover via Exposed Reset Token
The first finding was an account takeover. Sara identified the application’s “forgot password” flow as a test surface and began probing it. Standard behavior: enter an email, the system emails a password reset link.
First, Sara tested the original credentials.

What Sara found was that the application was making a critical mistake in its API response. Instead of simply confirming the reset request was received, it was returning the reset token directly in the response body. That token (normally sent privately to the user’s email) was visible to anyone who could observe the API.

Sara extracted the token, injected it into the password reset request, and successfully changed the admin account’s password.

Then, to confirm full exploitation rather than just flag a potential weakness, Sara attempted to log in using the original credentials. That failed, confirming the password had changed. Then Sara logged in using the new password. That succeeded.

This was complete, verified exploitation of an admin account takeover, documented step-by-step, with evidence at every stage. The implication is that any attacker who knew an admin’s email address (or could enumerate one) could silently take over that account with a few API calls.
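The flaw described above comes down to what the reset endpoint puts in its response body. Here is a minimal sketch of the difference, not the target application’s actual code: the in-memory stores, the function names, and the example email address are all hypothetical, chosen only to contrast a response that echoes the token with one that does not.

```python
import secrets

# Hypothetical in-memory stores, for illustration only.
USERS = {"admin@example.com"}
RESET_TOKENS = {}  # token -> email
OUTBOX = []        # (email, token) pairs "sent" by email


def _issue_token(email):
    """Create a reset token and queue it for delivery to the user's mailbox."""
    token = secrets.token_urlsafe(32)
    RESET_TOKENS[token] = email
    OUTBOX.append((email, token))
    return token


def request_reset_vulnerable(email):
    """Flawed pattern: the token is echoed back to whoever made the request."""
    token = _issue_token(email)
    return {"status": "ok", "reset_token": token}


def request_reset_safe(email):
    """Safer pattern: the token travels only through the user's mailbox."""
    if email in USERS:
        _issue_token(email)
    # Same generic body whether or not the account exists, to limit enumeration.
    return {"status": "ok", "message": "If the account exists, a reset link has been sent."}
```

In the safer variant, the caller learns nothing it could replay into the password change request, and the identical response for unknown addresses avoids confirming which emails are registered.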
Curious what Sara might find in your hidden endpoints? Request a demo →
Vulnerability 2: Two-Stage SQLi via a Hidden Endpoint
In this scenario, the vulnerable endpoint wasn’t visible in the standard application UI. Sara found it by analyzing client-side JavaScript and following application references, effectively doing the reconnaissance that an experienced attacker would do to map a target before testing it. The endpoint was an “offers” API, and Sara identified it, constructed a valid request against it, and then began examining that request for injection points.
What Sara found was a SQLi on a sort parameter, which is a common but often overlooked attack vector. The injection was a two-stage exploit: sending the injection payload didn’t return the result immediately. Instead, the application responded with a timestamp. To retrieve the actual injected data, Sara had to send that timestamp to a separate “status” endpoint.
Sara made that leap. It recognized that the timestamp represented a deferred job, identified the corresponding endpoint, and queried it to retrieve the injection output. That’s a multi-step reasoning chain that connects two separate parts of an application.
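The deferred pattern is easier to see in miniature. The sketch below is a harmless in-memory stand-in, not the real application: the function names, the counter used as a timestamp, and the stored string are all assumptions. It shows only the shape of the flow, where the first response carries a timestamp and the output lives behind a second call.

```python
import itertools

# Hypothetical in-memory stand-in for a deferred-job API; not the real target.
_JOBS = {}
_clock = itertools.count(1700000000)


def submit_offers_query(sort):
    """Stage 1: accept the request and return only a job timestamp."""
    job_ts = next(_clock)
    _JOBS[job_ts] = f"rows sorted by {sort}"
    return {"timestamp": job_ts}


def get_status(job_ts):
    """Stage 2: return the stored output for a previously issued timestamp."""
    if job_ts not in _JOBS:
        return {"state": "unknown"}
    return {"state": "done", "result": _JOBS[job_ts]}


def run_two_stage(sort):
    """Follow the timestamp from stage 1 to the output in stage 2."""
    first = submit_offers_query(sort)
    return get_status(first["timestamp"])["result"]
```

A tool that inspects only the first response sees a timestamp and nothing else, which is why the link between the two endpoints has to be recognized before the result can be read.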
To confirm the injection was real, Sara first extracted the database version (MySQL 8.0.42), then escalated systematically.


It enumerated the database name, the table names, and the column structure.
Then it constructed a payload that concatenated email addresses and hashed passwords into a single output, cleanly formatted for the reviewer.
Every extracted user was flagged as a max-privilege account.
Combined with the account takeover vulnerability, this creates a clear compromise path: extract all user email addresses via SQLi, then use the password reset exploit to silently take over every account. With automated tooling, that’s a minutes-long operation once the vulnerabilities are confirmed.
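Sort parameters are easy to overlook because column names cannot be bound as query parameters the way values can, so they often end up concatenated into the SQL string. One common defense is a fixed allowlist of sortable columns. The sketch below assumes a hypothetical `offers` table with `name` and `price` columns and uses Python’s built-in `sqlite3` as a stand-in for the MySQL database in the engagement; it is an illustration, not the application’s code or the specific fix recommended to the customer.

```python
import sqlite3

# Hypothetical column names; the real schema is not given in this post.
ALLOWED_SORT_COLUMNS = {"name", "price"}


def list_offers(conn, sort="name"):
    """Return offers ordered by an allowlisted column.

    Identifiers such as column names cannot be bound as query parameters,
    so the value is checked against a fixed set before it reaches the SQL.
    """
    if sort not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort!r}")
    return conn.execute(f"SELECT name, price FROM offers ORDER BY {sort}").fetchall()


def demo_connection():
    """Build a throwaway in-memory database with two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE offers (name TEXT, price REAL)")
    conn.executemany("INSERT INTO offers VALUES (?, ?)", [("b", 1.0), ("a", 2.0)])
    return conn
```

With this check in place, a sort value that is anything other than an expected column name is rejected before a query is ever built.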
Vulnerability 3: Stored XSS via Hidden Form Fields
The stored cross-site scripting finding demonstrates Sara’s ability to find an attack surface that’s invisible through normal interaction.
In the application’s user settings panel, certain fields were present in the UI but not editable through any visible form control. But Sara recognized that those fields must exist somewhere in the application’s update request structure, even if they weren’t exposed in the UI.
It identified the field labels (including fields like API guide configuration and toolbar link) and injected a JavaScript payload directly into the fields via an API request to update settings.
Now, when any user loads the affected page, that payload can execute silently in their browser and grab the session tokens stored there.
The payload is stored in the application itself and doesn’t just affect one person. It fires for every user who loads that page, turning a single hidden field into a persistent trap.
In this scenario, Sara looked beyond what the UI presented and reasoned about what the underlying API must support. That’s the kind of non-obvious attack path that automated scanners running against visible surface area will consistently miss.
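The underlying issue here is an update endpoint that accepts more fields than the UI exposes, paired with output that renders stored values as markup. A minimal sketch of two complementary defenses follows. The field names and function names are hypothetical, and escaping on output is shown for text content only; a field that holds a link would also need URL scheme validation, which this sketch does not cover.

```python
import html

# Hypothetical field names; the post names only the non-editable fields.
EDITABLE_FIELDS = {"display_name", "timezone"}


def apply_settings_update(current, payload):
    """Accept only fields the UI is meant to expose; reject anything else."""
    unexpected = set(payload) - EDITABLE_FIELDS
    if unexpected:
        raise ValueError(f"fields not editable: {sorted(unexpected)}")
    updated = dict(current)
    updated.update(payload)
    return updated


def render_setting(value):
    """Escape stored values on output so markup is displayed, not executed."""
    return f"<span>{html.escape(value)}</span>"
```

The allowlist closes the gap between what the form shows and what the API accepts, and the escaping step limits the damage if an unsafe value is stored anyway.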
Six Hours, Start to Finish
The entire engagement took approximately six hours. That covers everything from the initial pre-scan and crawl, through investigation, to full exploitation. Sara demonstrated in this engagement that it can operate at the level of a senior security researcher on findings that matter. Not faster versions of simple tasks. Actual high-severity, multi-step exploitation chains, fully confirmed.
What This Means for the Synack Red Team
Sara doesn’t replace the Synack Red Team (SRT); it changes where they spend their time. The initial coverage work that used to take days of preparation and manual testing now happens in hours, at machine speed. That means SRT researchers start engagements with a foundation already laid: surface area mapped, common attack paths tested, initial findings documented. Their expertise goes where it has the most impact: the novel attack paths, the business logic flaws, the application-specific chains that don’t follow any rulebook.
The vulnerabilities that Sara found in this engagement wouldn’t have waited two weeks. They were there on day one, and Sara found them before any human attacker could operationalize them.
That’s what an end-to-end security agent looks like. It reasons, explores, escalates, and documents, the same way a skilled researcher would, at the speed and scale that only AI can deliver.
Want to see Sara Pentest in action? Request a demo →
Frequently Asked Questions
How does Sara Pentest differ from an automated vulnerability scanner?
Automated scanners apply pattern-matching rules against known vulnerability signatures. Sara operates as an agentic AI: it builds a contextual model of the target application, reasons about how different parts of the system relate to each other, constructs valid requests before testing them for weaknesses, and follows multi-step exploitation chains to confirmed impact. The SQL injection Sara found in this engagement required identifying a hidden endpoint through JavaScript analysis, constructing a valid request against it, executing a two-stage injection across two separate API endpoints, and escalating from version detection to full credential extraction. A conventional scanner would not have reached that finding.
What does “fully exploited” mean in this context? Did Sara stop at detection?
Sara doesn’t just flag potential vulnerabilities; it demonstrates confirmed exploitation. In the account takeover finding, Sara extracted the reset token, changed the admin password, confirmed the original credentials no longer worked, and confirmed the new credentials did. In the SQL injection, Sara extracted actual user records including hashed credentials. In the stored XSS, Sara demonstrated payload execution and session data access. These are end-to-end exploitation proofs, not theoretical risk flags.
How does Sara handle findings that require multi-step reasoning across different parts of an application?
This is precisely where Sara differs from simpler tooling. Sara maintains a contextual model of the application as it explores, tracking endpoints, parameters, relationships between API calls, and application state. When it encounters a two-stage injection where the result requires querying a separate status endpoint, it reasons about the application’s structure to identify and use that second endpoint. When it encounters UI fields that aren’t editable through visible controls, it reasons about what the underlying API must support and tests accordingly.
How long does a Sara engagement typically take?
Engagement duration varies based on application scope and complexity. In the engagement described here, the full cycle (from initial pre-scan and crawl through investigation and exploitation) completed in approximately six hours. The crawl phase, which enumerates application structure and builds Sara’s working model, represents a significant portion of that time.
Can Sara’s findings be chained together to assess compounded organizational risk?
Yes, and the engagement described here is a direct example. The SQL injection and account takeover vulnerabilities are individually high-severity. Together, they represent something worse: an attacker can use the SQL injection to enumerate all user email addresses, then use the reset token exploit to silently take over every account. Sara’s reporting identifies these relationships, giving security teams visibility into the compounded risk, not just individual findings.
Does Sara recommend remediation guidance?
Sara produces structured, actionable findings that include remediation guidance consistent with the type and severity of each vulnerability, similar to what SRT researchers provide. For organizations that can provide application source code, Synack’s team can work toward more precise, implementation-specific remediation recommendations.
Is Sara replacing human security researchers?
No. Sara operates most powerfully alongside the Synack Red Team, not in place of it. Sara handles breadth: comprehensive surface area coverage, parallel testing, and structured documentation of initial findings at machine speed. SRT researchers handle depth: novel attack development, business logic exploitation, and the high-context adversarial judgment that comes from years of elite offensive security experience. The combination is what makes Synack’s model distinct.
What types of applications and targets can Sara test?
Sara is designed for web applications and APIs. It performs reconnaissance and crawling, tests authentication flows, API endpoints, input handling, and application logic, and escalates findings to full exploitation where confirmed. The hidden endpoint discovery and two-stage injection chain demonstrated in this engagement reflect Sara’s ability to go beyond visible surface area to find what’s actually exploitable.


