Functional · scripted replays · 0.35 credits × flow count

Test Flows.

The LLM analyses the page and writes a set of realistic user journeys (5 by default; for example sign up, search, add-to-cart, checkout, contact form) as structured step lists. The runner replays each one in a fresh Playwright tab, screenshotting every step and recording pass/fail. A failure report names the step that broke and the error that caused it.

What it does: generateTestFlows() + runTestFlows()

From the entry screenshot + DOM, the LLM proposes N flows (default 5, configurable 1–10). Each flow is a JSON object:

{
  "title": "Sign up with email and reach the dashboard",
  "goal":  "Verify a new visitor can complete signup end-to-end",
  "steps": [
    { "type": "goto",   "url": "/signup" },
    { "type": "fill",   "selector": "input[name=email]", "value": "test@example.com" },
    { "type": "fill",   "selector": "input[name=password]", "value": "Hunter2-strong" },
    { "type": "click",  "selector": "button[type=submit]" },
    { "type": "verify", "selector": "h1", "text": "Welcome" }
  ]
}
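Because the flow is written by an LLM, it can be worth checking its shape before replaying it. The following is a minimal sketch of such a check, not the product's own validator. The `Step` and `Flow` types and the `validateFlow` name are hypothetical, and the fields of a `wait` step are an assumption, since the example above does not show one.

```typescript
// Hypothetical types mirroring the JSON example; "wait" fields are assumed.
type Step =
  | { type: "goto"; url: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "click"; selector: string }
  | { type: "wait"; selector: string }
  | { type: "verify"; selector: string; text?: string };

interface Flow {
  title: string;
  goal: string;
  steps: Step[];
}

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0;

// Returns a list of problems; an empty list means the flow looks well-formed.
function validateFlow(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["flow is not an object"];
  const flow = raw as Record<string, unknown>;
  const problems: string[] = [];
  if (!isNonEmptyString(flow.title)) problems.push("missing title");
  if (!isNonEmptyString(flow.goal)) problems.push("missing goal");

  const steps = flow.steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    problems.push("steps must be a non-empty array");
    return problems;
  }

  steps.forEach((s: unknown, i: number) => {
    const step = (s ?? {}) as Record<string, unknown>;
    const n = i + 1; // 1-based, matching the step numbers in failure messages
    switch (step.type) {
      case "goto":
        if (!isNonEmptyString(step.url)) problems.push(`step ${n}: goto needs a url`);
        break;
      case "fill":
        if (!isNonEmptyString(step.selector)) problems.push(`step ${n}: fill needs a selector`);
        if (typeof step.value !== "string") problems.push(`step ${n}: fill needs a value`);
        break;
      case "click":
      case "wait":
      case "verify":
        if (!isNonEmptyString(step.selector)) problems.push(`step ${n}: ${String(step.type)} needs a selector`);
        break;
      default:
        problems.push(`step ${n}: unknown type ${String(step.type)}`);
    }
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller log everything wrong with a generated flow in a single pass.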

The runner opens each flow in a clean Playwright page, executes every step (click, fill, goto, wait, verify), screenshots after each one, and records the verdict. A flow is "passed" only when every step succeeds. On the first failure, the runner halts that flow and captures the error + the screenshot at the point of failure.
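The halt-on-first-failure behaviour described above can be sketched as a small loop. This is an illustration under stated assumptions, not the actual runTestFlows() implementation. The `StepDriver` interface is hypothetical and stands in for whatever wraps the Playwright page, and creating the clean page for each flow is left out.

```typescript
type Step = { type: string; selector?: string; url?: string; value?: string; text?: string };
interface Flow { title: string; steps: Step[] }

// Hypothetical driver: a real runner would implement this on top of a Playwright page.
interface StepDriver {
  execute(step: Step): Promise<void>;          // rejects when the step fails
  screenshot(label: string): Promise<string>;  // resolves to a path or URL
}

interface StepResult {
  index: number; // 1-based step number
  status: "passed" | "failed";
  screenshot: string;
  error?: string;
}
interface FlowResult { title: string; passed: boolean; steps: StepResult[] }

async function runFlow(flow: Flow, driver: StepDriver): Promise<FlowResult> {
  const steps: StepResult[] = [];
  for (let i = 0; i < flow.steps.length; i++) {
    const index = i + 1;
    let error: string | undefined;
    try {
      await driver.execute(flow.steps[i]);
    } catch (err) {
      error = err instanceof Error ? err.message : String(err);
    }
    // Screenshot after every step, whether it passed or failed.
    const suffix = error === undefined ? "" : "-failure";
    const screenshot = await driver.screenshot(`step-${index}${suffix}`);
    if (error !== undefined) {
      steps.push({ index, status: "failed", screenshot, error });
      return { title: flow.title, passed: false, steps }; // halt this flow
    }
    steps.push({ index, status: "passed", screenshot });
  }
  return { title: flow.title, passed: true, steps }; // every step succeeded
}
```

Injecting the driver keeps the control flow testable without a browser: a fake driver that rejects on one selector is enough to confirm that later steps are never executed.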

What it finds

Real failures we've seen in production runs:

Broken submission: "Step 4 — click button[type=submit] — element is hidden behind a modal that doesn't dismiss with the close button"
Selector drift: "Step 2 — fill input[name=email] — selector not found. The signup form was redesigned and the input is now input#email"
Missing success state: "Step 5 — verify h1 contains 'Welcome' — page navigated to /dashboard but h1 is empty. Likely a JS hydration race"
Form rejection: "Checkout flow — step 6 'enter shipping ZIP' rejected with 'Invalid ZIP' for valid US ZIPs. Backend regex appears overly strict"
Timeout: "Search flow — step 3 'wait for search results' — no .result elements appeared within 10s. Empty-state UI is missing"
Flaky locator: "Same flow passed and failed on consecutive runs — button:has-text('Continue') matched two elements when a cookie banner was open"
Auth wall: "All 5 flows failed on step 1: page redirects to /login for unauthenticated visitors. Configure auth when submitting the audit to test the post-login experience"
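Two of the patterns above, the auth wall and the flaky locator, show up in the shape of the results rather than in any single error message. The sketch below shows how a report consumer might flag them. Both heuristics and the `FlowOutcome` type are hypothetical and are not part of the product's output.

```typescript
// Hypothetical summary of one flow's result; not the report's actual schema.
interface FlowOutcome {
  name: string;
  passed: boolean;
  failedStep?: number; // 1-based index of the step that failed, if any
}

// Every flow failing on step 1 suggests something blocks the entry page,
// such as a redirect to /login for unauthenticated visitors.
function looksLikeAuthWall(flows: FlowOutcome[]): boolean {
  return flows.length > 0 && flows.every((f) => !f.passed && f.failedStep === 1);
}

// A flow whose verdict differs between two runs is a candidate flaky flow.
function flakyFlows(runA: FlowOutcome[], runB: FlowOutcome[]): string[] {
  const verdictB: Record<string, boolean> = {};
  for (const f of runB) verdictB[f.name] = f.passed;
  return runA
    .filter((f) => f.name in verdictB && verdictB[f.name] !== f.passed)
    .map((f) => f.name);
}
```

Neither check proves the cause. They only narrow down where to look first, for example a cookie banner that is open on one run and dismissed on the next.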

Coverage

Breadth

User-visible success paths: signup, login, search, checkout, contact form, plan upgrade, share, export

Depth

Per-step screenshots + a verbatim step list — failures pinpoint the exact step + selector that broke

Default count

5 flows per audit, tunable 1–10 via the flows.count option

Output

Pass/fail per flow + per step, with screenshots. Failed flows include the error and the page state.
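Since the price scales with the flow count, the cost of an audit is simple arithmetic. The sketch below resolves a requested count to the documented range and estimates the credits. The function names are hypothetical, and clamping out-of-range values is an assumption: the real flows.count option may reject them instead.

```typescript
const DEFAULT_FLOWS = 5;
const MIN_FLOWS = 1;
const MAX_FLOWS = 10;
const CREDIT_HUNDREDTHS_PER_FLOW = 35; // 0.35 credits per flow

// Assumption: out-of-range values are clamped to 1..10 rather than rejected.
function resolveFlowCount(requested?: number): number {
  if (requested === undefined || isNaN(requested)) return DEFAULT_FLOWS;
  return Math.min(MAX_FLOWS, Math.max(MIN_FLOWS, Math.round(requested)));
}

// Works in integer hundredths first to avoid floating-point drift.
function estimateCredits(flowCount: number): number {
  return (CREDIT_HUNDREDTHS_PER_FLOW * flowCount) / 100;
}
```

Under these assumptions, the default of 5 flows costs 1.75 credits and the maximum of 10 flows costs 3.5 credits.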

Not covered

No cross-flow state continuity: each flow starts in a clean browser context, so you can't run "sign up" followed by "use the account I just made" as separate flows. (Steps within a single flow do share state.)
No real user think-time or analytics-tracked engagement metrics — flows execute fast.
Flows are plausible, not guaranteed business-critical — for a fixed regression suite, define them yourself as Custom Tests.
Doesn't test concurrent users, server-side rate limits, or load behaviour — single-session at a time.
Real-money transactions are out of scope — flows stop at the Stripe / payment iframe boundary.

Sample finding

// One entry from report.testFlows.flows[]
{
  "name":  "Search products and add the first result to cart",
  "passed": false,
  "error":  "step 5 failed: timeout waiting for selector .cart-count",
  "steps": [
    { "type": "click",   "selector": ".search-toggle",           "status": "passed" },
    { "type": "fill",    "selector": "input[type=search]",        "status": "passed", "value": "blue widget" },
    { "type": "click",   "selector": ".result:first-child",       "status": "passed" },
    { "type": "click",   "selector": "button.add-to-cart",        "status": "passed" },
    { "type": "verify",  "selector": ".cart-count",               "status": "failed",
      "detail": "selector never appeared within 10s — cart icon DOM never updated",
      "screenshotUrl": "https://.../flow-3-step-5-failure.jpg" }
  ]
}
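An entry like the one above can be reduced to a one-line summary for logs or CI output. This is a minimal sketch that assumes the entry has the shape shown in the sample. The `ReportFlow` and `ReportStep` types and the `describeFailure` function are hypothetical names.

```typescript
// Hypothetical types inferred from the sample entry above.
interface ReportStep {
  type: string;
  selector: string;
  status: "passed" | "failed";
  value?: string;
  detail?: string;
  screenshotUrl?: string;
}
interface ReportFlow {
  name: string;
  passed: boolean;
  error?: string;
  steps: ReportStep[];
}

// Builds a one-line summary naming the first failed step and its selector.
function describeFailure(flow: ReportFlow): string {
  if (flow.passed) return `${flow.name}: passed`;
  for (let i = 0; i < flow.steps.length; i++) {
    const step = flow.steps[i];
    if (step.status === "failed") {
      const reason = step.detail ?? flow.error ?? "no detail recorded";
      return `${flow.name}: step ${i + 1} (${step.type} ${step.selector}) failed: ${reason}`;
    }
  }
  // The flow is marked failed but no step carries a "failed" status.
  return `${flow.name}: failed (${flow.error ?? "no failing step recorded"})`;
}
```

The step number is derived from the position in `steps` rather than parsed out of the `error` string, so the summary stays correct even if the wording of the error changes.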
