Agentic · LLM-driven exploration · 0.45 credits × step count
Exploratory Agent.
An LLM-driven browser session that picks one action at a time and watches what happens. Eight selectable test types (smoke → adversarial) crossed with eight focuses (forms, security, perf, …) give 64 type × focus combinations. Catches the interactive bugs that static analysis and scripted flows both miss.
What it does — exploratory.js → runExploratoryAgent()
Launches a fresh Playwright browser context, then loops (a minimal sketch of the loop follows this list). Each step:
- Snapshot the page — interactive elements (capped at 40), recent console messages, network activity, URL, title
- Ask the LLM for the next action: click, type, scroll, hover, navigate, or done
- Execute the action via Playwright
- Wait for navigation / network settle
- Ask the LLM to verify: "did the expected outcome happen? Did you see anything broken or unexpected?"
- If something's broken, emit a structured finding (title, severity, category, repro steps, fix suggestion)
- Loop until maxSteps (default 7) is reached or the LLM emits done: true
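A minimal sketch of that loop in Playwright; the helper names and LLM-call stubs are illustrative, not the real exploratory.js internals.

// Simplified shape of the per-step loop. Helper and option names are illustrative.
const { chromium } = require('playwright');

// Collect what the LLM gets to see each step: URL, title, interactive elements (capped at 40)
async function snapshotPage(page) {
  const elements = await page.$$eval(
    'a, button, input, select, textarea, [role="button"]',
    nodes => nodes.slice(0, 40).map(n => ({
      tag: n.tagName.toLowerCase(),
      text: (n.textContent || '').trim().slice(0, 80),
    }))
  );
  return { url: page.url(), title: await page.title(), elements };
}

// Placeholders for the real LLM calls, shown only so the loop reads end-to-end
async function askLLMForAction(snapshot) { /* call the model, return e.g. { type: 'click', selector: '...' } */ return { done: true }; }
async function askLLMToVerify(before, action, after) { /* call the model, return { finding? } */ return {}; }

async function runLoop(startUrl, maxSteps = 7) {
  const browser = await chromium.launch();
  const page = await (await browser.newContext()).newPage();   // fresh context per run
  await page.goto(startUrl);

  const findings = [];
  for (let step = 1; step <= maxSteps; step++) {
    const before = await snapshotPage(page);
    const action = await askLLMForAction(before);              // click / type / scroll / hover / navigate / done
    if (action.done) break;

    // Execute the chosen action via Playwright (scroll omitted here for brevity)
    if (action.type === 'click') await page.click(action.selector);
    else if (action.type === 'type') await page.fill(action.selector, action.text);
    else if (action.type === 'hover') await page.hover(action.selector);
    else if (action.type === 'navigate') await page.goto(action.url);

    await page.waitForLoadState('networkidle');                // wait for navigation / network settle

    // Verify the outcome; anything broken becomes a structured finding
    const verdict = await askLLMToVerify(before, action, await snapshotPage(page));
    if (verdict.finding) findings.push({ ...verdict.finding, discovered_at_step: step });
  }

  await browser.close();
  return findings;
}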
Findings are deduplicated by (title, category, severity) so the same bug surfacing on steps 3 and 5 lands as one entry with both pieces of evidence.
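A sketch of that dedup step, assuming findings carry the fields shown in the sample finding at the end of this section:

// Deduplicate by (title, category, severity); merge evidence from repeat sightings
function dedupeFindings(findings) {
  const byKey = new Map();
  for (const f of findings) {
    const key = `${f.title}|${f.category}|${f.severity}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.evidence.push(...f.evidence);   // same bug seen again → keep both pieces of evidence
    } else {
      byKey.set(key, { ...f, evidence: [...f.evidence] });
    }
  }
  return [...byKey.values()];
}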
Modes — how the agent thinks
- Critical — hypothesis-driven testing tours. Picks one of four tours (feature-tour, error-tour, data-tour, consistency-tour) and runs disciplined evidence-collection. Best for "find me real bugs, fast."
- Creative — weird inputs (RTL Arabic, emoji-only, 10,000-char strings, pasting binary data into text fields), feature mashups, out-of-order flows. Best for "find edge cases I'd never think of."
- Default — obvious paths first, then edge cases. Best for general coverage.
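Roughly, the mode swaps in a different system-prompt preamble before each step. The snippet below is illustrative only; these are not the actual prompts.

// Illustrative only: how a mode could steer the agent's system prompt
const MODE_PREAMBLES = {
  critical: 'Pick one testing tour (feature, error, data, or consistency) and collect evidence methodically.',
  creative: 'Prefer weird inputs, feature mashups, and out-of-order flows over the obvious path.',
  default:  'Cover the obvious paths first, then probe edge cases.',
};

function buildSystemPrompt(mode) {
  return MODE_PREAMBLES[mode] ?? MODE_PREAMBLES.default;
}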
Types — what the agent looks for
- Smoke — basic core flows actually work end-to-end
- Happy — primary success scenarios for a typical user
- Negative — invalid inputs, wrong order, expected rejections handled gracefully
- Edge — boundary conditions (empty, very long, unexpected types)
- Scenario — realistic multi-step goal ("compare three products, buy the cheapest")
- Monkey — random clicks to surface stability + state-corruption bugs
- Adversarial — security-relevant behaviour (XSS-shaped input, auth bypass attempts, PII leakage)
- Destructive — try to break things (submit garbage, race the UI, abort partway through)
Focuses — what part of the app
- general, core-paths, forms, navigation, search, security, a11y, perf
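A hypothetical invocation combining the three axes. The option names are assumptions for illustration, not the documented signature of runExploratoryAgent():

// Option names are assumptions; check exploratory.js for the real signature
const { runExploratoryAgent } = require('./exploratory');     // path is illustrative

(async () => {
  const report = await runExploratoryAgent({
    startUrl: 'https://staging.example.com/checkout',  // hypothetical target
    type: 'negative',     // one of the eight test types above
    focus: 'forms',       // one of the eight focuses above
    mode: 'critical',     // default | creative | critical
    maxSteps: 15,         // 1–40, default 7
  });
  console.log(report.exploratory.findings);            // deduplicated finding list
})();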
What it finds
Real findings from production runs:
- Dead-end flow: "Selected a colour variant, page navigated to /product/null, no back button or breadcrumb"
- Missing validation: "Submitted contact form with empty email — accepted, generic 'Thanks' shown, but no email was sent (verified by absence of network call)"
- State corruption: "Clicked 'Add' twice in 250ms — quantity jumped to 3 instead of 2. Suggests no debounce on the cart-update handler"
- Hidden affordance: "The 'My Account' link looked like static text — only discovered by hovering and seeing the cursor change to a pointer"
- Loading-state bug: "Clicked 'Save', button immediately re-enabled while the network request was still in flight — double-clicking saved twice"
- Auth leak (adversarial): "Navigating directly to /admin as a signed-out user returned a JSON response with {users: [...]} instead of a 401"
- Console PII (adversarial): "Login flow logs {email, password} to console.debug on every keystroke — would land in any installed Sentry / LogRocket"
- Stale state (creative): "Pasting Excel cell-separator characters into the name field broke the form's serialiser — submit hung indefinitely"
- Monkey discovery: "After 30 random clicks, navigation drawer ended up open AND closed at the same time, blocking interaction with main content"
Coverage
Step budget
Default 7 steps, configurable 1–40. Each step is one LLM call + one Playwright action (at 0.45 credits per step, a default 7-step run costs 3.15 credits; a 40-step run, 18).
Configurability
8 types × 8 focuses × 3 modes = 192 distinct exploration profiles. Add a customPrompt for further steering.
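The 192 is just the cross-product of the lists above; a quick sketch:

// 8 types × 8 focuses × 3 modes = 192 profiles
const types   = ['smoke', 'happy', 'negative', 'edge', 'scenario', 'monkey', 'adversarial', 'destructive'];
const focuses = ['general', 'core-paths', 'forms', 'navigation', 'search', 'security', 'a11y', 'perf'];
const modes   = ['default', 'creative', 'critical'];

const profiles = types.flatMap(type =>
  focuses.flatMap(focus =>
    modes.map(mode => ({ type, focus, mode }))));
console.log(profiles.length);   // 192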
Where it goes
Single starting URL, then wherever the agent decides to navigate within the same site.
Output
Step-by-step audit trail (action + reasoning + screenshot + verdict) plus a deduplicated finding list with risk level.
Limitations
- Bounded budget — 7 steps is enough to surface 1–3 distinct bugs on a typical SaaS page; complex flows need 15–40 steps (bumps credit cost).
- Doesn't replace exploratory testing by a human QA — the agent is faster + cheaper + tireless, but a human catches subtler taste-level issues ("this CTA feels pushy").
- No load testing or concurrent users — single browser session.
- No real payment processing — adversarial mode stops at the Stripe iframe boundary.
- No real authentication bypass — the adversarial mode probes for access-control mistakes (unprotected routes), it does not attempt to break crypto / steal sessions / exploit CVEs.
Sample finding
// One entry from report.exploratory.findings[]
{
  "title": "Add-to-cart double-counts on rapid clicks",
  "severity": "medium",
  "category": "reliability",
  "steps_to_reproduce": [
    "Navigate to any product detail page",
    "Click 'Add to cart' twice within 300ms",
    "Open cart drawer",
    "Observe: quantity is 3 (expected 2: 1 from each click)"
  ],
  "evidence": [
    "step 4: screenshot exploratory-step-04.jpg shows cart-count=3",
    "step 5: console log: 'POST /api/cart 200' fired 3 times"
  ],
  "fix_suggestion": "Add a 'pending' flag to the cart-add handler; disable the button until the server response returns. OR debounce the click handler to 500ms.",
  "discovered_at_step": 4
}
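A minimal sketch of the first fix_suggestion (the 'pending' flag), with hypothetical handler and element names; the /api/cart endpoint is taken from the evidence above:

// Guard against double-submits: ignore clicks while a cart request is in flight
let pending = false;

async function onAddToCartClick(button, productId) {
  if (pending) return;              // drop the second rapid click
  pending = true;
  button.disabled = true;           // visible feedback while the request is in flight
  try {
    await fetch('/api/cart', {      // endpoint as seen in the finding's evidence
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ productId, quantity: 1 }),
    });
  } finally {
    pending = false;
    button.disabled = false;        // re-enable only after the server responds
  }
}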
See also