The Core Idea
Traditional E2E tests ask: "Does feature X work?"
Persona-driven tests ask: "Can user type Y accomplish their goal?"
Instead of testing isolated features, you validate entire user workflows from different perspectives. Each persona represents a distinct user archetype who would use your application differently.
Key Innovation
Capture screenshots throughout the journey, then use a vision-capable AI model to analyze them for UX issues, accessibility problems, and design feedback that automated tests cannot detect.
The Persona Definition
Every persona has five core attributes, plus optional Behavioral Realism attributes for human-realistic testing:
Core Attributes
1. Name and Role: A memorable identifier (e.g., "Alex - Trial Evaluator")
2. Background: Who they are, their context, and experience level
3. Goals: What they're specifically trying to accomplish on your site
4. Behaviors: How they typically interact with software (heavy searcher, skimmer, keyboard user, etc.)
Behavioral Realism Attributes
These optional attributes enable human-scale timing and emotional journey tracking:
5. Interaction Patterns: Human-scale timing, including scan time (1-8 seconds), retry delays (2-10 seconds), max retries, reading pace, and typing speed. Even frustrated users don't click 100x/second.
6. Emotional Baseline: Starting frustration level (0-100), frustration escalation style (volatile/moderate/patient), trust level, and urgency. Marcus starts at 60 frustration after failed email attempts.
7. Cognition: How they process information, including reading speed, decision style (impulsive/deliberate), focus duration, uncertainty response, and visual scan pattern (F-pattern, center-first, etc.).
8. Prior Experience: Products they compare you against (e.g., "Zendesk", "Intercom"), patterns they expect, things that delight them, and pet peeves that trigger instant frustration.
9. Session Context: Whether they're a returning user, what happened before this session, distraction level, and time of day (morning-fresh vs evening-tired).
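As a sketch, a persona combining several of these attributes might look like the following. Only `definePersona` and the attribute names above come from this guide; the exact field shapes are illustrative assumptions:

```typescript
import { definePersona } from 'personaspec';

// Illustrative sketch: exact field shapes for interactionPatterns, cognition,
// and priorExperience are assumptions based on the attribute descriptions above.
export const sarah = definePersona({
  name: 'Sarah',
  role: 'Migrating Support Lead',
  background: 'Runs a five-person support team, evaluating a switch from her current tool',
  goals: ['Export historical ticket data', 'Understand pricing tiers'],
  behaviors: ['keyboard-first navigation', 'skims headings before reading body text'],
  interactionPatterns: {
    scanTime: { min: 1000, max: 3000 },   // 1-3 sec to visually locate an element
    retryDelay: { min: 3000, max: 6000 }, // waits a few seconds before retrying
    maxRetries: 2,
  },
  cognition: {
    decisionStyle: 'deliberate', // reads the options before committing
    scanPattern: 'F-pattern',    // left edge and headings first
  },
  priorExperience: {
    referenceProducts: ['Zendesk', 'Intercom'],
    petPeeves: ['hidden pricing', 'chat widgets that cover page content'],
  },
});
```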
Behavioral Realism
Traditional E2E tests often produce false positives - issues that appear in automated tests but wouldn't occur with real human users. Behavioral Realism addresses five critical testing flaws:
The Five Flaws
| Flaw | Problem | Solution |
|---|---|---|
| Psychology without Physics | Tests define frustration thresholds but click at machine speed, finding errors no human would trigger | interactionPatterns - Human-scale timing with realistic delays |
| Finding without Noticing | Tests find elements instantly that take humans 3+ seconds to visually locate | cognition.scanPattern - Simulate how users actually scan pages |
| Frustration without History | Each test starts fresh, but Marcus arrives already at 60% frustration from failed emails | emotionalBaseline - Starting emotional state and escalation patterns |
| Context without Comparison | Tests evaluate in isolation, but users compare to Zendesk, Intercom, etc. | priorExperience - Reference products and expected patterns |
| Tasks without Sessions | Tests forget what happened earlier, but users remember being frustrated 5 minutes ago | sessionContext - Prior activity and returning user state |
Human-Scale Timing
Even the most impatient user clicks retry every 2-4 seconds, not 100 times per second. The interactionPatterns attribute enforces realistic timing:
```typescript
import { interactionPatternDefaults } from 'personaspec';

// Impatient but still human - faster but not machine-speed
interactionPatternDefaults.impatient = {
  scanTime: { min: 500, max: 2000 },    // 0.5-2 sec to scan
  retryDelay: { min: 2000, max: 4000 }, // 2-4 sec between retries!
  maxRetries: 3,
  readingPace: 2000,                    // 2 sec per 100 words
  typingSpeed: 300,                     // 300 chars/minute
};
```
Emotional Journey Tracking
Frustration compounds over time. A user who starts frustrated and encounters three small issues may abandon, while a calm user would continue. The emotionalBaseline attribute tracks this journey:
```typescript
const marcus = definePersona({
  name: 'Marcus',
  role: 'Frustrated Customer',
  // ... core attributes ...
  emotionalBaseline: {
    frustrationLevel: 60,              // Already frustrated from email
    frustrationEscalation: 'volatile', // Compounds quickly
    trustLevel: 'skeptical',           // Doubts the company cares
    urgency: 'urgent',                 // Needs resolution now
  },
  sessionContext: {
    isReturning: true,
    priorActivity: 'Failed email support attempts',
    distractionLevel: 'low',           // Focused on getting help
  },
});
```
Observations Include Emotional Context
When you record observations, the collector can track the persona's emotional state at that moment. This helps AI analysis understand why certain issues matter more for certain personas.
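As a sketch, assuming the collector exposes a `record` method (the method name and field shapes are hypothetical, since the collector is a utility you write yourself), an observation carrying emotional context might look like:

```typescript
// Hypothetical collector API: method name and fields are illustrative.
collector.record({
  type: 'frustration',
  note: 'Contact form rejected a valid phone number format',
  context: {
    frustrationLevel: 75, // Marcus's emotional state at this moment
    baseline: 60,         // where he started the session
  },
});
```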
Test Structure
File Organization
Create one test file per persona in a dedicated personas/ directory within your tests folder. Also create a shared utility for collecting observations and metrics.
Test File Anatomy
Each persona test file contains:
- Persona definition at the top as a JSDoc comment and exported object
- A serial test suite (tests run in order, simulating a continuous session)
- Shared observation collector instantiated once for all tests
- Lifecycle hooks:
  - beforeAll: Start the observation session
  - beforeEach: Attach console error listeners
  - afterAll: End session and save observations
- 4-6 task-based tests representing realistic user goals
- One free exploration test at the end simulating open-ended browsing
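Assuming Playwright as the test runner (this guide doesn't mandate one) and a self-written ObservationCollector utility (all method names hypothetical), that anatomy might translate to:

```typescript
import { test } from '@playwright/test';
import { marcus } from './marcus.persona';
import { ObservationCollector } from '../helpers/observation-collector'; // your shared utility

// Serial mode: tests run in order, simulating one continuous session.
test.describe.configure({ mode: 'serial' });

const collector = new ObservationCollector(marcus);

test.beforeAll(() => {
  collector.startSession();
});

test.beforeEach(async ({ page }) => {
  // Capture console errors automatically.
  page.on('console', (msg) => {
    if (msg.type() === 'error') collector.recordConsoleError(msg.text());
  });
});

test.afterAll(async () => {
  collector.endSession();
  await collector.save(); // writes the per-persona observations JSON
});

// 4-6 task tests plus one free exploration test follow...
```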
Task Tests
Each task test follows this pattern:
- Track start time for duration measurement
- Initialize success flag as false (prove success, don't assume it)
- Navigate to starting point and capture a screenshot
- Attempt the task as the persona would naturally try it
- Evaluate outcomes:
- Goal achieved: mark success, note what worked
- Partially achieved: mark success with friction observations
- Blocked: keep failure, add frustration observation
- Capture result screenshot with context
- Record the task with success/failure, duration, and notes
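Put together, a single task test following this pattern might look like the sketch below (same Playwright and collector assumptions as above):

```typescript
test('Find how to export my data', async ({ page }) => {
  const start = Date.now();
  let success = false; // prove success, don't assume it

  await page.goto('/account');
  await collector.screenshot(page, 'export-start', 'Account page as the persona lands on it');

  try {
    // Attempt the task the way the persona would: by visible text, not test IDs.
    await page.getByRole('link', { name: /export/i }).click({ timeout: 3000 });
    success = true;
    collector.record({ type: 'success', note: 'Export link reachable from account page' });
  } catch {
    collector.record({ type: 'frustration', note: 'No export option found within 3 seconds' });
  }

  await collector.screenshot(page, 'export-result', 'State after the export attempt');
  collector.recordTask('Find how to export my data', success, Date.now() - start);
});
```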
Good Task Examples
Tasks should represent things users actually want to do: "Find how to export my data", "Understand the pricing within 30 seconds", "Navigate the checkout using only keyboard", "Find customer support contact info".
Free Exploration Tests
Every persona ends with a free exploration test that simulates realistic browsing without a specific goal. This catches issues that task-focused tests miss.
Structure
- Start on the main page
- Define actions the persona would naturally take (scroll, click interesting things, try different sections)
- Execute each action with error handling so one failure doesn't stop exploration
- Capture screenshots at interesting moments
- Always mark as successful (observations matter more than pass/fail)
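A sketch of that structure, under the same assumptions as the earlier snippets:

```typescript
test('Free exploration', async ({ page }) => {
  await page.goto('/');

  // Actions this persona would naturally take; the list is illustrative.
  const actions: Array<[string, () => Promise<void>]> = [
    ['scroll the homepage', () => page.mouse.wheel(0, 800)],
    ['open the blog', () => page.getByRole('link', { name: /blog/i }).click({ timeout: 3000 })],
    ['check pricing', () => page.getByRole('link', { name: /pricing/i }).click({ timeout: 3000 })],
  ];

  for (const [label, action] of actions) {
    try {
      await action();
      await collector.screenshot(page, `explore: ${label}`, `After trying to ${label}`);
    } catch {
      // One failed action shouldn't stop the exploration; note it and move on.
      collector.record({ type: 'note', note: `Could not ${label}` });
    }
  }
  // No assertion: the test always passes, and the observations are the output.
});
```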
The Observation Collector
A shared utility class that captures everything during test execution:
Session Metrics
- Start and end timestamps
- Page load count
- Click and search counts
- Back navigation count (high numbers indicate confusion)
- Console errors captured automatically
Screenshots
- PNG files saved to disk
- Base64 encoding for vision model analysis
- Associated URL, page title, name, and context description
Task Results
- Task name and success boolean
- Duration in milliseconds
- Free-form notes about what happened
Output
The collector saves a JSON file per persona containing all metrics, task results, observations categorized by type, and screenshots with metadata and base64 data.
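As a sketch, that file's shape might look like this (field names are illustrative, since the collector is your own utility):

```typescript
// Illustrative shape of the per-persona observations JSON.
interface PersonaObservations {
  persona: string;
  session: {
    startedAt: string;
    endedAt: string;
    pageLoads: number;
    clicks: number;
    searches: number;
    backNavigations: number; // high numbers indicate confusion
    consoleErrors: string[];
  };
  tasks: { name: string; success: boolean; durationMs: number; notes: string }[];
  observations: { type: string; note: string; context?: Record<string, unknown> }[];
  screenshots: { name: string; url: string; title: string; context: string; base64: string }[];
}
```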
Observation Types
Use these consistently across all tests:
Core Types
| Type | When to Use |
|---|---|
| success | Goal achieved smoothly, good UX discovered |
| note | Neutral observation, suggestion for improvement |
| confusion | Unclear what to do next, feedback ambiguous |
| frustration | Goal blocked, feature missing, error encountered |
Behavioral Realism Types
Additional observation types for richer emotional and discovery tracking:
| Type | When to Use |
|---|---|
| noticed | Element entered visual attention (after realistic scan time) |
| overlooked | Element on screen but not noticed by this persona's scan pattern |
| relief | Tension released - found what they needed, problem resolved |
| delight | Exceeded expectations - pleasantly surprised |
| disappointment | Failed to meet expectations set by marketing or prior experience |
Comparative Types
Types for tracking how the experience compares to the persona's reference products:
| Type | When to Use |
|---|---|
| better-than-expected | Compares favorably to reference products (e.g., faster than Zendesk) |
| worse-than-expected | Compares unfavorably to what they're used to |
| familiar-pattern | Recognized expected UX pattern from prior experience |
| unexpected-pattern | Pattern violated expectations from reference products |
Human-Realistic Flag
Each observation can include a humanRealistic flag in its context. When set to true, it indicates the observation would occur under realistic human timing. When false, the issue was found through machine-speed testing and may be a false positive.
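For example (shape illustrative, matching the hypothetical collector above):

```typescript
collector.record({
  type: 'overlooked',
  note: 'Export link is below the fold; an F-pattern scan never reaches it',
  context: {
    humanRealistic: true, // would also occur under realistic human timing
  },
});
```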
Vision Model Analysis
The most powerful aspect of this methodology is using a vision-capable AI model (like Claude) to analyze the captured screenshots.
How It Works
After tests complete, feed the observations.json file (which contains base64 screenshots) to a vision model along with:
- The persona definition (background, goals, behaviors)
- The task being attempted when each screenshot was taken
- Context description for each screenshot
- Observations already recorded by the test
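As a concrete sketch using the Anthropic TypeScript SDK (the model name is illustrative, and any vision-capable model works; the observations file shape matches the interface sketched earlier):

```typescript
import Anthropic from '@anthropic-ai/sdk';
import { readFileSync } from 'node:fs';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const obs = JSON.parse(readFileSync('observations/marcus.json', 'utf8'));

// Interleave each screenshot with the context recorded at capture time.
const content: Anthropic.Messages.ContentBlockParam[] = obs.screenshots.flatMap(
  (shot: { base64: string; context: string }) => [
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: shot.base64 } },
    { type: 'text', text: `Context: ${shot.context}` },
  ],
);
content.push({
  type: 'text',
  text: `The persona is ${obs.persona}. Identify UX issues, accessibility problems, and design inconsistencies this persona would encounter.`,
});

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5', // illustrative; any vision-capable model works
  max_tokens: 2048,
  messages: [{ role: 'user', content }],
});

for (const block of response.content) {
  if (block.type === 'text') console.log(block.text);
}
```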
What the Vision Model Can Identify
UX Issues
Confusing layouts, missing CTAs, form fields without labels, unclear navigation paths, overwhelming information density.
Accessibility Problems
Insufficient color contrast, small text, missing focus indicators, poor heading structure, icons without labels.
Design Feedback
Inconsistent spacing, typography issues, color palette problems, mobile responsiveness issues, visual clutter.
Prompting the Vision Model
When asking a vision model to analyze screenshots, provide context:
"Here are screenshots from a user journey test.
The persona is [name], who [background].
Their goal was to [goal].
They typically [behaviors].
Analyze each screenshot and identify:
1. UX issues that would frustrate this specific persona
2. Accessibility problems visible in the interface
3. Design inconsistencies
4. Whether the user's goal appears achievable
5. Specific recommendations for improvement"
Key Implementation Details
Serial Execution
Tests must run in serial order (not parallel) to simulate a continuous user session where state carries over.
Screenshot Strategy
Capture screenshots at:
- Start of each task
- Key decision points
- Results/outcomes
- Anything confusing or problematic
Base64 Encoding
Screenshots are saved both as PNG files on disk and as base64-encoded strings in the JSON output. The base64 format allows the observations file to be self-contained and directly processable by vision models.
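A minimal sketch of that dual save, assuming Playwright (whose screenshot call both writes the file and returns a Buffer):

```typescript
// page.screenshot() writes the PNG to disk AND returns it as a Buffer.
const buffer = await page.screenshot({ path: `screenshots/${name}.png`, fullPage: true });
const base64 = buffer.toString('base64'); // embedded alongside metadata in observations.json
```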
Flexible Selectors
Use realistic selectors that a user would conceptually understand. If searching for a "Votes" section, look for headings containing "Vote" rather than test IDs that users can't see.
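For example, with role-based selectors (Playwright syntax assumed):

```typescript
// Look for what the user sees - a heading containing "Vote" - not a test ID.
await page.getByRole('heading', { name: /vote/i }).waitFor({ timeout: 3000 });
```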
Graceful Failures
Wrap actions in try/catch so one failure doesn't cascade. Record what went wrong and continue the exploration.
Realistic Timing
Use realistic timeouts. If a user would give up after 3 seconds of waiting, set that as your timeout. This catches responsiveness problems that tests with generous default timeouts would never surface.
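For example (Playwright's per-action timeout, an assumption as above):

```typescript
// If the persona would give up after 3 seconds, the test gives up too.
await page.getByRole('button', { name: /contact support/i }).click({ timeout: 3000 });
```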
Analyzing Results
Automated Analysis
Review the observations.json files for each persona. Look for these patterns:
| Pattern | Indicates |
|---|---|
| High back navigation count | Users getting lost |
| Multiple frustration observations | Critical UX problems |
| Multiple confusion observations | Information architecture issues |
| Failed tasks | Broken critical paths |
| Long task durations | Performance or complexity issues |
Pro Tip
The qualitative observations from both automated tests and vision model analysis are often more valuable than pass/fail results. A test might "pass" while recording significant friction that only becomes apparent when a vision model sees the actual interface.
When to Use This Approach
Best For
- Public-facing websites and applications
- Products with multiple distinct user types
- UX-focused teams wanting to surface usability issues
- Validating that changes don't break user workflows
- Getting design feedback without manual review
- Supplementing traditional feature-based E2E tests
Less Suited For
- API testing
- Unit/integration testing
- Very simple single-purpose applications
- Performance benchmarking (use dedicated tools)
Ready to Get Started?
Check out the step-by-step guide to run your first persona test in 10 minutes.