Agentic · LLM-driven exploration · 0.45 credits × step count
Exploratory Agent.
An LLM-driven browser session that picks one action at a time and watches what happens. Eight selectable test types (smoke → adversarial) crossed with eight focuses (forms, security, perf, …) give 64 type × focus combinations. Catches the interactive bugs that static analysis and scripted flows both miss.
What it does — exploratory.js → runExploratoryAgent()
Launches a fresh Playwright browser context, then loops (a minimal sketch of the loop follows this list). Each step:
- Snapshot the page — interactive elements (capped at 40), recent console messages, network activity, URL, title
- Ask the LLM for the next action: click, type, scroll, hover, navigate, or done
- Execute the action via Playwright
- Wait for navigation / network settle
- Ask the LLM to verify: "did the expected outcome happen? Did you see anything broken or unexpected?"
- If something's broken, emit a structured finding (title, severity, category, repro steps, fix suggestion)
- Loop until maxSteps (default 7) is reached or the LLM emits done: true
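A minimal sketch of that loop in Playwright; the helper names and LLM-call stubs are illustrative, not the real exploratory.js internals.

// Simplified shape of the per-step loop. Helper and option names are illustrative.
const { chromium } = require('playwright');

// Collect what the LLM gets to see each step: URL, title, interactive elements (capped at 40)
async function snapshotPage(page) {
  const elements = await page.$$eval(
    'a, button, input, select, textarea, [role="button"]',
    nodes => nodes.slice(0, 40).map(n => ({
      tag: n.tagName.toLowerCase(),
      text: (n.textContent || '').trim().slice(0, 80),
    }))
  );
  return { url: page.url(), title: await page.title(), elements };
}

// Placeholders for the real LLM calls, shown only so the loop reads end-to-end
async function askLLMForAction(snapshot) { /* call the model, return e.g. { type: 'click', selector: '...' } */ return { done: true }; }
async function askLLMToVerify(before, action, after) { /* call the model, return { finding? } */ return {}; }

async function runLoop(startUrl, maxSteps = 7) {
  const browser = await chromium.launch();
  const page = await (await browser.newContext()).newPage();   // fresh context per run
  await page.goto(startUrl);

  const findings = [];
  for (let step = 1; step <= maxSteps; step++) {
    const before = await snapshotPage(page);
    const action = await askLLMForAction(before);              // click / type / scroll / hover / navigate / done
    if (action.done) break;

    // Execute the chosen action via Playwright (scroll omitted here for brevity)
    if (action.type === 'click') await page.click(action.selector);
    else if (action.type === 'type') await page.fill(action.selector, action.text);
    else if (action.type === 'hover') await page.hover(action.selector);
    else if (action.type === 'navigate') await page.goto(action.url);

    await page.waitForLoadState('networkidle');                // wait for navigation / network settle

    // Verify the outcome; anything broken becomes a structured finding
    const verdict = await askLLMToVerify(before, action, await snapshotPage(page));
    if (verdict.finding) findings.push({ ...verdict.finding, discovered_at_step: step });
  }

  await browser.close();
  return findings;
}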
Findings are deduplicated by (title, category, severity) so the same bug surfacing on steps 3 and 5 lands as one entry with both pieces of evidence.
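A sketch of that dedup step, assuming findings carry the fields shown in the sample finding at the end of this section:

// Deduplicate by (title, category, severity); merge evidence from repeat sightings
function dedupeFindings(findings) {
  const byKey = new Map();
  for (const f of findings) {
    const key = `${f.title}|${f.category}|${f.severity}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.evidence.push(...f.evidence);   // same bug seen again → keep both pieces of evidence
    } else {
      byKey.set(key, { ...f, evidence: [...f.evidence] });
    }
  }
  return [...byKey.values()];
}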
Modes — how the agent thinks
- Critical — hypothesis-driven testing tours. Picks one of four tours (feature-tour, error-tour, data-tour, consistency-tour) and runs disciplined evidence-collection. Best for "find me real bugs, fast."
- Creative — weird inputs (RTL Arabic, emoji-only, 10,000-char strings, pasting binary data into text fields), feature mashups, out-of-order flows. Best for "find edge cases I'd never think of."
- Default — obvious paths first, then edge cases. Best for general coverage.
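Roughly, the mode swaps in a different system-prompt preamble before each step. The snippet below is illustrative only; these are not the actual prompts.

// Illustrative only: how a mode could steer the agent's system prompt
const MODE_PREAMBLES = {
  critical: 'Pick one testing tour (feature, error, data, or consistency) and collect evidence methodically.',
  creative: 'Prefer weird inputs, feature mashups, and out-of-order flows over the obvious path.',
  default:  'Cover the obvious paths first, then probe edge cases.',
};

function buildSystemPrompt(mode) {
  return MODE_PREAMBLES[mode] ?? MODE_PREAMBLES.default;
}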
Types — what the agent looks for
- Smoke — basic core flows actually work end-to-end
- Happy — primary success scenarios for a typical user
- Negative — invalid inputs, wrong order, expected rejections handled gracefully
- Edge — boundary conditions (empty, very long, unexpected types)
- Scenario — realistic multi-step goal ("compare three products, buy the cheapest")
- Monkey — random clicks to surface stability + state-corruption bugs
- Adversarial — security-relevant behaviour (XSS-shaped input, auth bypass attempts, PII leakage)
- Destructive — try to break things (submit garbage, race the UI, abort partway through)
Focuses — what part of the app
- general, core-paths, forms, navigation, search, security, a11y, perf
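A hypothetical invocation combining the three axes. The option names are assumptions for illustration, not the documented signature of runExploratoryAgent():

// Option names are assumptions; check exploratory.js for the real signature
const { runExploratoryAgent } = require('./exploratory');     // path is illustrative

(async () => {
  const report = await runExploratoryAgent({
    startUrl: 'https://staging.example.com/checkout',  // hypothetical target
    type: 'negative',     // one of the eight test types above
    focus: 'forms',       // one of the eight focuses above
    mode: 'critical',     // default | creative | critical
    maxSteps: 15,         // 1–40, default 7
  });
  console.log(report.exploratory.findings);            // deduplicated finding list
})();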
What it finds
Real findings from production runs:
- Dead-end flow: "Selected a colour variant, page navigated to /product/null, no back button or breadcrumb"
- Missing validation: "Submitted contact form with empty email — accepted, generic 'Thanks' shown, but no email was sent (verified by absence of network call)"
- State corruption: "Clicked 'Add' twice in 250ms — quantity jumped to 3 instead of 2. Suggests no debounce on the cart-update handler"
- Hidden affordance: "The 'My Account' link looked like static text — only discovered by hovering and seeing the cursor change to a pointer"
- Loading-state bug: "Clicked 'Save', button immediately re-enabled while the network request was still in flight — double-clicking saved twice"
- Auth leak (adversarial): "Navigating directly to /admin as a signed-out user returned a JSON response with {users: [...]} instead of a 401"
- Console PII (adversarial): "Login flow logs {email, password} to console.debug on every keystroke — would land in any installed Sentry / LogRocket"
- Stale state (creative): "Pasting Excel cell-separator characters into the name field broke the form's serialiser — submit hung indefinitely"
- Monkey discovery: "After 30 random clicks, navigation drawer ended up open AND closed at the same time, blocking interaction with main content"
Coverage
Step budget
Default 7 steps, configurable 1–40. Each step is one LLM call + one Playwright action (at 0.45 credits per step, a default 7-step run costs 3.15 credits; a 40-step run, 18).
Configurability
8 types × 8 focuses × 3 modes = 192 distinct exploration profiles. Add a customPrompt for further steering.
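The 192 is just the cross-product of the lists above; a quick sketch:

// 8 types × 8 focuses × 3 modes = 192 profiles
const types   = ['smoke', 'happy', 'negative', 'edge', 'scenario', 'monkey', 'adversarial', 'destructive'];
const focuses = ['general', 'core-paths', 'forms', 'navigation', 'search', 'security', 'a11y', 'perf'];
const modes   = ['default', 'creative', 'critical'];

const profiles = types.flatMap(type =>
  focuses.flatMap(focus =>
    modes.map(mode => ({ type, focus, mode }))));
console.log(profiles.length);   // 192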
Where it goes
Single starting URL, then wherever the agent decides to navigate within the same site.
Output
Step-by-step audit trail (action + reasoning + screenshot + verdict) plus a deduplicated finding list with risk level.
Limitations
- Bounded budget — 7 steps is enough to surface 1–3 distinct bugs on a typical SaaS page; complex flows need 15–40 steps (bumps credit cost).
- Doesn't replace exploratory testing by a human QA — the agent is faster + cheaper + tireless, but a human catches subtler taste-level issues ("this CTA feels pushy").
- No load testing or concurrent users — single browser session.
- No real payment processing — adversarial mode stops at the Stripe iframe boundary.
- No real authentication bypass — the adversarial mode probes for access-control mistakes (unprotected routes), it does not attempt to break crypto / steal sessions / exploit CVEs.
Sample finding
// One entry from report.exploratory.findings[]
{
  "title": "Add-to-cart double-counts on rapid clicks",
  "severity": "medium",
  "category": "reliability",
  "steps_to_reproduce": [
    "Navigate to any product detail page",
    "Click 'Add to cart' twice within 300ms",
    "Open cart drawer",
    "Observe: quantity is 3 (expected 2: 1 from each click)"
  ],
  "evidence": [
    "step 4: screenshot exploratory-step-04.jpg shows cart-count=3",
    "step 5: console log: 'POST /api/cart 200' fired 3 times"
  ],
  "fix_suggestion": "Add a 'pending' flag to the cart-add handler; disable the button until the server response returns. OR debounce the click handler to 500ms.",
  "discovered_at_step": 4
}
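A minimal sketch of the first fix_suggestion (the 'pending' flag), with hypothetical handler and element names; the /api/cart endpoint is taken from the evidence above:

// Guard against double-submits: ignore clicks while a cart request is in flight
let pending = false;

async function onAddToCartClick(button, productId) {
  if (pending) return;              // drop the second rapid click
  pending = true;
  button.disabled = true;           // visible feedback while the request is in flight
  try {
    await fetch('/api/cart', {      // endpoint as seen in the finding's evidence
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ productId, quantity: 1 }),
    });
  } finally {
    pending = false;
    button.disabled = false;        // re-enable only after the server responds
  }
}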
See also