Test Flows.
The LLM analyses the page and writes 3–7 realistic user journeys — sign up, search, add-to-cart, checkout, contact form — as structured step lists. The runner replays each one in a fresh Playwright tab, screenshotting every step and recording pass/fail. Failures pinpoint exactly which step broke and why.
What it does (generateTestFlows() + runTestFlows())
From the entry screenshot + DOM, the LLM proposes N flows (default 5, configurable 1–10). Each flow is a JSON object:
{
  "title": "Sign up with email and reach the dashboard",
  "goal": "Verify a new visitor can complete signup end-to-end",
  "steps": [
    { "type": "goto", "url": "/signup" },
    { "type": "fill", "selector": "input[name=email]", "value": "test@example.com" },
    { "type": "fill", "selector": "input[name=password]", "value": "Hunter2-strong" },
    { "type": "click", "selector": "button[type=submit]" },
    { "type": "verify", "selector": "h1", "text": "Welcome" }
  ]
}
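The flow object above can be described with a small set of types and checked before it is handed to a runner. The following is a minimal sketch, not the product's actual schema: the `Step` and `Flow` types and the `validateFlow` helper are hypothetical names inferred from the example, and the fields of the `wait` step are an assumption, since the example does not show one.

```typescript
// Hypothetical types inferred from the example flow; the real schema may differ.
type Step =
  | { type: "goto"; url: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "click"; selector: string }
  | { type: "wait"; selector: string } // assumption: wait targets a selector
  | { type: "verify"; selector: string; text?: string };

interface Flow {
  title: string;
  goal: string;
  steps: Step[];
}

// Returns a list of problems; an empty list means the flow looks well-formed.
function validateFlow(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["flow is not an object"];
  const flow = input as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof flow.title !== "string" || flow.title.length === 0) problems.push("missing title");
  if (typeof flow.goal !== "string") problems.push("missing goal");
  if (!Array.isArray(flow.steps) || flow.steps.length === 0) {
    problems.push("steps must be a non-empty array");
    return problems;
  }
  flow.steps.forEach((raw: unknown, i: number) => {
    const step = (raw ?? {}) as Record<string, unknown>;
    const n = i + 1; // steps are reported 1-based, as in the findings
    switch (step.type) {
      case "goto":
        if (typeof step.url !== "string") problems.push(`step ${n}: goto needs a url`);
        break;
      case "fill":
        if (typeof step.selector !== "string") problems.push(`step ${n}: fill needs a selector`);
        if (typeof step.value !== "string") problems.push(`step ${n}: fill needs a value`);
        break;
      case "click":
      case "wait":
      case "verify":
        if (typeof step.selector !== "string") problems.push(`step ${n}: ${step.type} needs a selector`);
        break;
      default:
        problems.push(`step ${n}: unknown type ${String(step.type)}`);
    }
  });
  return problems;
}

// The signup flow from the example, typed against the sketch above.
const signupFlow: Flow = {
  title: "Sign up with email and reach the dashboard",
  goal: "Verify a new visitor can complete signup end-to-end",
  steps: [
    { type: "goto", url: "/signup" },
    { type: "fill", selector: "input[name=email]", value: "test@example.com" },
    { type: "fill", selector: "input[name=password]", value: "Hunter2-strong" },
    { type: "click", selector: "button[type=submit]" },
    { type: "verify", selector: "h1", text: "Welcome" },
  ],
};

console.log(validateFlow(signupFlow));
```

Validating LLM output before replaying it separates "the model produced a malformed flow" from "the site is broken", which would otherwise both surface as step failures.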
The runner opens each flow in a clean Playwright page, executes every step (click, fill, goto, wait, verify), takes a screenshot after each one, and records the verdict. A flow counts as "passed" only when every step succeeds. On the first failure, the runner halts that flow and captures the error and a screenshot at the point of failure.
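The halt-on-first-failure behaviour described above can be sketched as a short loop. This is an illustration under stated assumptions, not the actual runTestFlows() implementation: `PageLike`, `runFlow`, `executeStep`, the result shapes, the screenshot file names and the 10-second default timeout are all hypothetical. `PageLike` lists only methods that exist on Playwright's `Page`, so a real page should satisfy it structurally.

```typescript
// Subset of Playwright's Page used by this sketch (assumption: the real
// runner works with the full Page object).
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  textContent(selector: string): Promise<string | null>;
  screenshot(options?: { path?: string }): Promise<unknown>;
}

// Hypothetical step and result shapes, inferred from the examples in this section.
type Step =
  | { type: "goto"; url: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "click"; selector: string }
  | { type: "wait"; selector: string }
  | { type: "verify"; selector: string; text?: string };

interface Flow { title: string; goal: string; steps: Step[] }
interface StepResult { index: number; type: string; status: "passed" | "failed"; detail?: string }
interface FlowResult { title: string; passed: boolean; error?: string; steps: StepResult[] }

// Executes one step; any thrown error marks the step as failed.
async function executeStep(page: PageLike, step: Step, timeoutMs: number): Promise<void> {
  switch (step.type) {
    case "goto":
      await page.goto(step.url); // relative URLs need a baseURL on the browser context
      return;
    case "fill":
      await page.fill(step.selector, step.value);
      return;
    case "click":
      await page.click(step.selector);
      return;
    case "wait":
      await page.waitForSelector(step.selector, { timeout: timeoutMs });
      return;
    case "verify": {
      await page.waitForSelector(step.selector, { timeout: timeoutMs });
      if (step.text !== undefined) {
        const actual = (await page.textContent(step.selector)) ?? "";
        if (!actual.includes(step.text)) {
          throw new Error(`expected "${step.text}" in ${step.selector}, got "${actual}"`);
        }
      }
      return;
    }
  }
}

// Runs the steps in order, screenshots after each, and stops at the first failure.
async function runFlow(page: PageLike, flow: Flow, timeoutMs = 10000): Promise<FlowResult> {
  const steps: StepResult[] = [];
  for (let i = 0; i < flow.steps.length; i++) {
    const step = flow.steps[i];
    const index = i + 1;
    try {
      await executeStep(page, step, timeoutMs);
      await page.screenshot({ path: `step-${index}.png` });
      steps.push({ index, type: step.type, status: "passed" });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      // Best-effort screenshot of the failure state; ignore screenshot errors.
      await page.screenshot({ path: `step-${index}-failure.png` }).catch(() => undefined);
      steps.push({ index, type: step.type, status: "failed", detail });
      return { title: flow.title, passed: false, error: `step ${index} failed: ${detail}`, steps };
    }
  }
  return { title: flow.title, passed: true, steps };
}
```

With Playwright installed, a page obtained from `browser.newContext()` followed by `context.newPage()` could be passed as `page`; creating a new context per flow is what gives each flow the clean state mentioned above.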
What it finds
Real failures we've seen in production runs:
- Broken submission: "Step 4 — click button[type=submit] — element is hidden behind a modal that doesn't dismiss with the close button"
- Selector drift: "Step 2 — fill input[name=email] — selector not found. The signup form was redesigned and the input is now input#email"
- Missing success state: "Step 5 — verify h1 contains 'Welcome' — page navigated to /dashboard but h1 is empty. Likely a JS hydration race"
- Form rejection: "Checkout flow — step 6 'enter shipping ZIP' rejected with 'Invalid ZIP' for valid US ZIPs. Backend regex appears overly strict"
- Timeout: "Search flow — step 3 'wait for search results' — no .result elements appeared within 10s. Empty-state UI is missing"
- Flaky locator: "Same flow passed and failed on consecutive runs — button:has-text('Continue') matched two elements when a cookie banner was open"
- Auth wall: "All 5 flows failed on step 1 — page redirects to /login for unauthenticated visitors. Configure auth in the submit to test the post-login experience"
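The auth-wall finding is a pattern across flows rather than a single step failure: every flow fails on step 1 and ends up on the login page. A heuristic for spotting it might look like the sketch below. How the product actually detects this is not documented here; `FlowOutcome`, its `failedStep` and `finalUrl` fields, and `looksLikeAuthWall` are hypothetical names used only for illustration.

```typescript
// Hypothetical per-flow summary; the real report format may differ.
interface FlowOutcome {
  passed: boolean;
  failedStep?: number; // 1-based index of the step that failed
  finalUrl?: string;   // URL the page was on when the flow stopped
}

// True when every flow failed on its first step after landing on the login path.
function looksLikeAuthWall(outcomes: FlowOutcome[], loginPath = "/login"): boolean {
  if (outcomes.length === 0) return false;
  return outcomes.every(
    (o) => !o.passed && o.failedStep === 1 && (o.finalUrl ?? "").includes(loginPath),
  );
}

// Example: five flows, all redirected to /login on step 1.
const redirected: FlowOutcome[] = Array.from({ length: 5 }, () => ({
  passed: false,
  failedStep: 1,
  finalUrl: "https://example.com/login?next=%2F",
}));
console.log(looksLikeAuthWall(redirected));
```

Reporting this as one finding instead of five separate step-1 failures keeps the cause (missing auth configuration) from being buried in repeated errors.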
Coverage
- The number of flows is set with the flows.count option (default 5, configurable 1–10).
- No state continuity across flows: each flow starts in a clean browser context, so you can't run "sign up" followed by "use the account I just made" as separate flows. (The agent's steps within a flow share state.)
- No realistic user think-time and no analytics-tracked engagement metrics: flows execute as fast as the page allows.
- Flows are plausible, not guaranteed business-critical. For a fixed regression suite, define them yourself as Custom Tests.
- Doesn't test concurrent users, server-side rate limits, or load behaviour: flows run one session at a time.
- Real-money transactions are out of scope: flows stop at the Stripe / payment iframe boundary.
Sample finding
// One entry from report.testFlows.flows[]
{
  "name": "Search products and add the first result to cart",
  "passed": false,
  "error": "step 5 failed: timeout waiting for selector .cart-count",
  "steps": [
    { "type": "click", "selector": ".search-toggle", "status": "passed" },
    { "type": "fill", "selector": "input[type=search]", "status": "passed", "value": "blue widget" },
    { "type": "click", "selector": ".result:first-child", "status": "passed" },
    { "type": "click", "selector": "button.add-to-cart", "status": "passed" },
    {
      "type": "verify",
      "selector": ".cart-count",
      "status": "failed",
      "detail": "selector never appeared within 10s — cart icon DOM never updated",
      "screenshotUrl": "https://.../flow-3-step-5-failure.jpg"
    }
  ]
}
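A report entry like the one above can be reduced to a one-line summary by locating the first step whose status is "failed". The sketch below assumes the field names shown in the sample entry; the `ReportStep` and `ReportFlow` types and the `describeFailure` helper are hypothetical and not part of the product's API.

```typescript
// Hypothetical shapes matching the fields visible in the sample entry.
interface ReportStep {
  type: string;
  selector?: string;
  value?: string;
  status: "passed" | "failed";
  detail?: string;
  screenshotUrl?: string;
}

interface ReportFlow {
  name: string;
  passed: boolean;
  error?: string;
  steps: ReportStep[];
}

// Builds a short human-readable summary naming the first failing step (1-based).
function describeFailure(flow: ReportFlow): string {
  if (flow.passed) return `PASS ${flow.name}`;
  const i = flow.steps.findIndex((s) => s.status === "failed");
  if (i === -1) return `FAIL ${flow.name}: ${flow.error ?? "no failing step recorded"}`;
  const step = flow.steps[i];
  const target = step.selector ? ` ${step.selector}` : "";
  const reason = step.detail ?? flow.error ?? "unknown error";
  return `FAIL ${flow.name}: step ${i + 1} (${step.type}${target}): ${reason}`;
}
```

Applied to the sample entry, the summary would name step 5 (the verify on .cart-count) as the point of failure, which matches the step number in the screenshot file name.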