The Core Idea
Traditional E2E tests ask: "Does feature X work?"
Persona-driven tests ask: "Can user type Y accomplish their goal?"
Instead of testing isolated features, you validate entire user workflows from different perspectives. Each persona represents a distinct user archetype who would use your application differently.
Key Innovation
Capture screenshots throughout the journey, then use a vision-capable AI model to analyze them for UX issues, accessibility problems, and design feedback that automated tests cannot detect.
The Persona Definition
Every persona has five core attributes, plus optional Behavioral Realism attributes for human-realistic testing:
Core Attributes
1. Name and Role: A memorable identifier (e.g., "Alex - Trial Evaluator")
2. Background: Who they are, their context, and experience level
3. Goals: What they're specifically trying to accomplish on your site
4. Behaviors: How they typically interact with software (heavy searcher, skimmer, keyboard user, etc.)
Behavioral Realism Attributes
These optional attributes enable human-scale timing and emotional journey tracking:
5. Interaction Patterns: Human-scale timing, including scan time (1-8 seconds), retry delays (2-10 seconds), max retries, reading pace, and typing speed. Even frustrated users don't click 100x/second.
6. Emotional Baseline: Starting frustration level (0-100), frustration escalation style (volatile/moderate/patient), trust level, and urgency. Marcus starts at 60 frustration after failed email attempts.
7. Cognition: How they process information, including reading speed, decision style (impulsive/deliberate), focus duration, uncertainty response, and visual scan pattern (F-pattern, center-first, etc.).
8. Prior Experience: Products they compare you against (e.g., "Zendesk", "Intercom"), patterns they expect, things that delight them, and pet peeves that trigger instant frustration.
9. Session Context: Whether they're a returning user, what happened before this session, distraction level, and time of day (morning-fresh vs evening-tired).
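As a sketch, a persona combining several of these attributes might look like the following. Only `definePersona` and the attribute names above come from this guide; the exact field shapes are illustrative assumptions:

```typescript
import { definePersona } from 'personaspec';

// Illustrative sketch: exact field shapes for interactionPatterns, cognition,
// and priorExperience are assumptions based on the attribute descriptions above.
export const sarah = definePersona({
  name: 'Sarah',
  role: 'Migrating Support Lead',
  background: 'Runs a five-person support team, evaluating a switch from her current tool',
  goals: ['Export historical ticket data', 'Understand pricing tiers'],
  behaviors: ['keyboard-first navigation', 'skims headings before reading body text'],
  interactionPatterns: {
    scanTime: { min: 1000, max: 3000 },   // 1-3 sec to visually locate an element
    retryDelay: { min: 3000, max: 6000 }, // waits a few seconds before retrying
    maxRetries: 2,
  },
  cognition: {
    decisionStyle: 'deliberate', // reads the options before committing
    scanPattern: 'F-pattern',    // left edge and headings first
  },
  priorExperience: {
    referenceProducts: ['Zendesk', 'Intercom'],
    petPeeves: ['hidden pricing', 'chat widgets that cover page content'],
  },
});
```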
Behavioral Realism
Traditional E2E tests often produce false positives - issues that appear in automated tests but wouldn't occur with real human users. Behavioral Realism addresses five critical testing flaws:
The Five Flaws
| Flaw | Problem | Solution |
|---|---|---|
| Psychology without Physics | Tests define frustration thresholds but click at machine speed, finding errors no human would trigger | interactionPatterns - Human-scale timing with realistic delays |
| Finding without Noticing | Tests find elements instantly that take humans 3+ seconds to visually locate | cognition.scanPattern - Simulate how users actually scan pages |
| Frustration without History | Each test starts fresh, but Marcus arrives already at 60% frustration from failed emails | emotionalBaseline - Starting emotional state and escalation patterns |
| Context without Comparison | Tests evaluate in isolation, but users compare to Zendesk, Intercom, etc. | priorExperience - Reference products and expected patterns |
| Tasks without Sessions | Tests forget what happened earlier, but users remember being frustrated 5 minutes ago | sessionContext - Prior activity and returning user state |
Human-Scale Timing
Even the most impatient user clicks retry every 2-4 seconds, not 100 times per second. The interactionPatterns attribute enforces realistic timing:
```typescript
import { interactionPatternDefaults } from 'personaspec';

// Impatient but still human - faster but not machine-speed
interactionPatternDefaults.impatient = {
  scanTime: { min: 500, max: 2000 },    // 0.5-2 sec to scan
  retryDelay: { min: 2000, max: 4000 }, // 2-4 sec between retries!
  maxRetries: 3,
  readingPace: 2000,                    // 2 sec per 100 words
  typingSpeed: 300,                     // 300 chars/minute
};
```
Emotional Journey Tracking
Frustration compounds over time. A user who starts frustrated and encounters three small issues may abandon, while a calm user would continue. The emotionalBaseline attribute tracks this journey:
```typescript
const marcus = definePersona({
  name: 'Marcus',
  role: 'Frustrated Customer',
  // ... core attributes ...
  emotionalBaseline: {
    frustrationLevel: 60,              // Already frustrated from email
    frustrationEscalation: 'volatile', // Compounds quickly
    trustLevel: 'skeptical',           // Doubts the company cares
    urgency: 'urgent',                 // Needs resolution now
  },
  sessionContext: {
    isReturning: true,
    priorActivity: 'Failed email support attempts',
    distractionLevel: 'low',           // Focused on getting help
  },
});
```
Observations Include Emotional Context
When you record observations, the collector can track the persona's emotional state at that moment. This helps AI analysis understand why certain issues matter more for certain personas.
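As a sketch, assuming the collector exposes a `record` method (the method name and field shapes are hypothetical, since the collector is a utility you write yourself), an observation carrying emotional context might look like:

```typescript
// Hypothetical collector API: method name and fields are illustrative.
collector.record({
  type: 'frustration',
  note: 'Contact form rejected a valid phone number format',
  context: {
    frustrationLevel: 75, // Marcus's emotional state at this moment
    baseline: 60,         // where he started the session
  },
});
```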
Test Structure
File Organization
Create one test file per persona in a dedicated personas/ directory within your tests folder. Also create a shared utility for collecting observations and metrics.
Test File Anatomy
Each persona test file contains:
- Persona definition at the top as a JSDoc comment and exported object
- A serial test suite (tests run in order, simulating a continuous session)
- Shared observation collector instantiated once for all tests
- Lifecycle hooks:
  - beforeAll: Start the observation session
  - beforeEach: Attach console error listeners
  - afterAll: End session and save observations
- 4-6 task-based tests representing realistic user goals
- One free exploration test at the end simulating open-ended browsing
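Assuming Playwright as the test runner (this guide doesn't mandate one) and a self-written ObservationCollector utility (all method names hypothetical), that anatomy might translate to:

```typescript
import { test } from '@playwright/test';
import { marcus } from './marcus.persona';
import { ObservationCollector } from '../helpers/observation-collector'; // your shared utility

// Serial mode: tests run in order, simulating one continuous session.
test.describe.configure({ mode: 'serial' });

const collector = new ObservationCollector(marcus);

test.beforeAll(() => {
  collector.startSession();
});

test.beforeEach(async ({ page }) => {
  // Capture console errors automatically.
  page.on('console', (msg) => {
    if (msg.type() === 'error') collector.recordConsoleError(msg.text());
  });
});

test.afterAll(async () => {
  collector.endSession();
  await collector.save(); // writes the per-persona observations JSON
});

// 4-6 task tests plus one free exploration test follow...
```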
Task Tests
Each task test follows this pattern:
- Track start time for duration measurement
- Initialize success flag as false (prove success, don't assume it)
- Navigate to starting point and capture a screenshot
- Attempt the task as the persona would naturally try it
- Evaluate outcomes:
- Goal achieved: mark success, note what worked
- Partially achieved: mark success with friction observations
- Blocked: keep failure, add frustration observation
- Capture result screenshot with context
- Record the task with success/failure, duration, and notes
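Put together, a single task test following this pattern might look like the sketch below (same Playwright and collector assumptions as above):

```typescript
test('Find how to export my data', async ({ page }) => {
  const start = Date.now();
  let success = false; // prove success, don't assume it

  await page.goto('/account');
  await collector.screenshot(page, 'export-start', 'Account page as the persona lands on it');

  try {
    // Attempt the task the way the persona would: by visible text, not test IDs.
    await page.getByRole('link', { name: /export/i }).click({ timeout: 3000 });
    success = true;
    collector.record({ type: 'success', note: 'Export link reachable from account page' });
  } catch {
    collector.record({ type: 'frustration', note: 'No export option found within 3 seconds' });
  }

  await collector.screenshot(page, 'export-result', 'State after the export attempt');
  collector.recordTask('Find how to export my data', success, Date.now() - start);
});
```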
Good Task Examples
Tasks should represent things users actually want to do: "Find how to export my data", "Understand the pricing within 30 seconds", "Navigate the checkout using only keyboard", "Find customer support contact info".
Free Exploration Tests
Every persona ends with a free exploration test that simulates realistic browsing without a specific goal. This catches issues that task-focused tests miss.
Structure
- Start on the main page
- Define actions the persona would naturally take (scroll, click interesting things, try different sections)
- Execute each action with error handling so one failure doesn't stop exploration
- Capture screenshots at interesting moments
- Always mark as successful (observations matter more than pass/fail)
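A sketch of that structure, under the same assumptions as the earlier snippets:

```typescript
test('Free exploration', async ({ page }) => {
  await page.goto('/');

  // Actions this persona would naturally take; the list is illustrative.
  const actions: Array<[string, () => Promise<void>]> = [
    ['scroll the homepage', () => page.mouse.wheel(0, 800)],
    ['open the blog', () => page.getByRole('link', { name: /blog/i }).click({ timeout: 3000 })],
    ['check pricing', () => page.getByRole('link', { name: /pricing/i }).click({ timeout: 3000 })],
  ];

  for (const [label, action] of actions) {
    try {
      await action();
      await collector.screenshot(page, `explore: ${label}`, `After trying to ${label}`);
    } catch {
      // One failed action shouldn't stop the exploration; note it and move on.
      collector.record({ type: 'note', note: `Could not ${label}` });
    }
  }
  // No assertion: the test always passes, and the observations are the output.
});
```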
The Observation Collector
A shared utility class that captures everything during test execution:
Session Metrics
- Start and end timestamps
- Page load count
- Click and search counts
- Back navigation count (high numbers indicate confusion)
- Console errors captured automatically
Screenshots
- PNG files saved to disk
- Base64 encoding for vision model analysis
- Associated URL, page title, name, and context description
Task Results
- Task name and success boolean
- Duration in milliseconds
- Free-form notes about what happened
Output
The collector saves a JSON file per persona containing all metrics, task results, observations categorized by type, and screenshots with metadata and base64 data.
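As a sketch, that file's shape might look like this (field names are illustrative, since the collector is your own utility):

```typescript
// Illustrative shape of the per-persona observations JSON.
interface PersonaObservations {
  persona: string;
  session: {
    startedAt: string;
    endedAt: string;
    pageLoads: number;
    clicks: number;
    searches: number;
    backNavigations: number; // high numbers indicate confusion
    consoleErrors: string[];
  };
  tasks: { name: string; success: boolean; durationMs: number; notes: string }[];
  observations: { type: string; note: string; context?: Record<string, unknown> }[];
  screenshots: { name: string; url: string; title: string; context: string; base64: string }[];
}
```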
Observation Types
Use these consistently across all tests:
Core Types
| Type | When to Use |
|---|---|
| success | Goal achieved smoothly, good UX discovered |
| note | Neutral observation, suggestion for improvement |
| confusion | Unclear what to do next, feedback ambiguous |
| frustration | Goal blocked, feature missing, error encountered |
Behavioral Realism Types
Additional observation types for richer emotional and discovery tracking:
| Type | When to Use |
|---|---|
| noticed | Element entered visual attention (after realistic scan time) |
| overlooked | Element on screen but not noticed by this persona's scan pattern |
| relief | Tension released - found what they needed, problem resolved |
| delight | Exceeded expectations - pleasantly surprised |
| disappointment | Failed to meet expectations set by marketing or prior experience |
Comparative Types
Types for tracking how the experience compares to the persona's reference products:
| Type | When to Use |
|---|---|
| better-than-expected | Compares favorably to reference products (e.g., faster than Zendesk) |
| worse-than-expected | Compares unfavorably to what they're used to |
| familiar-pattern | Recognized expected UX pattern from prior experience |
| unexpected-pattern | Pattern violated expectations from reference products |
Human-Realistic Flag
Each observation can include a humanRealistic flag in its context. When set to true, it indicates the observation would occur under realistic human timing. When false, the issue was found through machine-speed testing and may be a false positive.
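For example (shape illustrative, matching the hypothetical collector above):

```typescript
collector.record({
  type: 'overlooked',
  note: 'Export link is below the fold; an F-pattern scan never reaches it',
  context: {
    humanRealistic: true, // would also occur under realistic human timing
  },
});
```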
Vision Model Analysis
The most powerful aspect of this methodology is using a vision-capable AI model (like Claude) to analyze the captured screenshots.
How It Works
After tests complete, feed the observations.json file (which contains base64 screenshots) to a vision model along with:
- The persona definition (background, goals, behaviors)
- The task being attempted when each screenshot was taken
- Context description for each screenshot
- Observations already recorded by the test
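As a concrete sketch using the Anthropic TypeScript SDK (the model name is illustrative, and any vision-capable model works; the observations file shape matches the interface sketched earlier):

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { readFileSync } from 'node:fs';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const obs = JSON.parse(readFileSync('observations/marcus.json', 'utf8'));

// Interleave each screenshot with the context recorded at capture time.
const content: Anthropic.Messages.ContentBlockParam[] = obs.screenshots.flatMap(
  (shot: { base64: string; context: string }) => [
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: shot.base64 } },
    { type: 'text', text: `Context: ${shot.context}` },
  ],
);
content.push({
  type: 'text',
  text: `The persona is ${obs.persona}. Identify UX issues, accessibility problems, and design inconsistencies this persona would encounter.`,
});

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5', // illustrative; any vision-capable model works
  max_tokens: 2048,
  messages: [{ role: 'user', content }],
});

for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}
```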
What the Vision Model Can Identify
UX Issues
Confusing layouts, missing CTAs, form fields without labels, unclear navigation paths, overwhelming information density.
Accessibility Problems
Insufficient color contrast, small text, missing focus indicators, poor heading structure, icons without labels.
Design Feedback
Inconsistent spacing, typography issues, color palette problems, mobile responsiveness issues, visual clutter.
Prompting the Vision Model
When asking a vision model to analyze screenshots, provide context:
"Here are screenshots from a user journey test.
The persona is [name], who [background].
Their goal was to [goal].
They typically [behaviors].
Analyze each screenshot and identify:
1. UX issues that would frustrate this specific persona
2. Accessibility problems visible in the interface
3. Design inconsistencies
4. Whether the user's goal appears achievable
5. Specific recommendations for improvement"
Key Implementation Details
Serial Execution
Tests must run in serial order (not parallel) to simulate a continuous user session where state carries over.
Screenshot Strategy
Capture screenshots at:
- Start of each task
- Key decision points
- Results/outcomes
- Anything confusing or problematic
Base64 Encoding
Screenshots are saved both as PNG files on disk and as base64-encoded strings in the JSON output. The base64 format allows the observations file to be self-contained and directly processable by vision models.
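A minimal sketch of that dual save, assuming Playwright (whose screenshot call both writes the file and returns a Buffer):

```typescript
// page.screenshot() writes the PNG to disk AND returns it as a Buffer.
const buffer = await page.screenshot({ path: `screenshots/${name}.png`, fullPage: true });
const base64 = buffer.toString('base64'); // embedded alongside metadata in observations.json
```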
Flexible Selectors
Use realistic selectors that a user would conceptually understand. If searching for a "Votes" section, look for headings containing "Vote" rather than test IDs that users can't see.
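For example, with role-based selectors (Playwright syntax assumed):

```typescript
// Look for what the user sees - a heading containing "Vote" - not a test ID.
await page.getByRole('heading', { name: /vote/i }).waitFor({ timeout: 3000 });
```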
Graceful Failures
Wrap actions in try/catch so one failure doesn't cascade. Record what went wrong and continue the exploration.
Realistic Timing
Use realistic timeouts. If a user would give up after 3 seconds of waiting, set that as your timeout. This catches responsiveness problems that tests with generous default timeouts would never surface.
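For example (Playwright's per-action timeout, an assumption as above):

```typescript
// If the persona would give up after 3 seconds, the test gives up too.
await page.getByRole('button', { name: /contact support/i }).click({ timeout: 3000 });
```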
Analyzing Results
Automated Analysis
Review the observations.json files for each persona. Look for these patterns:
| Pattern | Indicates |
|---|---|
| High back navigation count | Users getting lost |
| Multiple frustration observations | Critical UX problems |
| Multiple confusion observations | Information architecture issues |
| Failed tasks | Broken critical paths |
| Long task durations | Performance or complexity issues |
Pro Tip
The qualitative observations from both automated tests and vision model analysis are often more valuable than pass/fail results. A test might "pass" while recording significant friction that only becomes apparent when a vision model sees the actual interface.
When to Use This Approach
Best For
- Public-facing websites and applications
- Products with multiple distinct user types
- UX-focused teams wanting to surface usability issues
- Validating that changes don't break user workflows
- Getting design feedback without manual review
- Supplementing traditional feature-based E2E tests
Less Suited For
- API testing
- Unit/integration testing
- Very simple single-purpose applications
- Performance benchmarking (use dedicated tools)
Ready to Get Started?
Check out the step-by-step guide to run your first persona test in 10 minutes.