---
title: "AI-Generated Testing: Why Most Approaches Fail"
description: How Playwright-Utils, TEA workflows, and Playwright MCPs solve AI test quality problems
---

AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining three components: Playwright-Utils, TEA (Test Architect), and Playwright MCPs.

:::note[Source]

This article is adapted from [The Testing Meta Most Teams Have Not Caught Up To Yet](https://dev.to/muratkeremozcan/the-testing-meta-most-teams-have-not-caught-up-to-yet-5765) by Murat K Ozcan.

:::

## The Problem with AI-Generated Tests

When teams use AI to generate tests without structure, they often produce what can be called "slop factory" outputs:

| Issue | Description |
|-------|-------------|
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but don't actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |

The core problem is that prompt-driven test generation leans into nondeterminism, the exact opposite of the determinism that testing exists to protect. The sketch below shows the failure mode and its fix.

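To make the flakiness concrete, the first test below encodes a race condition with a fixed-time wait, while the second uses a web-first assertion that retries until its condition holds. The URL and selectors are hypothetical; the APIs are standard Playwright.

```ts
import { test, expect } from '@playwright/test';

// Flaky: the fixed wait races against page load, so the row count is
// sampled at an arbitrary moment and the test passes or fails by luck.
test('flaky: timing-based wait', async ({ page }) => {
  await page.goto('https://example.com/orders'); // hypothetical URL
  await page.waitForTimeout(2000);
  expect(await page.locator('.order-row').count()).toBeGreaterThan(0);
});

// Deterministic: the web-first assertion polls the DOM until the
// condition holds (or times out), so no timing assumption leaks in.
test('deterministic: web-first assertion', async ({ page }) => {
  await page.goto('https://example.com/orders');
  await expect(page.locator('.order-row').first()).toBeVisible();
});
```
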
:::caution[The Paradox]

AI excels at generating code quickly, but testing requires precision and consistency. Without guardrails, AI-generated tests amplify the chaos they're meant to prevent.

:::

## The Solution: A Three-Part Stack

The solution combines three components that work together to enforce quality:

### Playwright-Utils

Playwright-Utils bridges the gap between Cypress ergonomics and Playwright's capabilities by standardizing commonly reinvented primitives as utility functions.

| Utility | Purpose |
|---------|---------|
| api-request | API calls with schema validation |
| auth-session | Authentication handling |
| intercept-network-call | Network mocking and interception |
| recurse | Retry logic and polling |
| log | Structured logging |
| network-recorder | Record and replay network traffic |
| burn-in | Smart test selection for CI |
| network-error-monitor | HTTP error detection |
| file-utils | CSV/PDF handling |

These utilities eliminate the need to reinvent authentication, API calls, retries, and logging for every project; the sketch below shows the kind of polling code a utility like recurse standardizes.

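As an illustration of what these utilities replace, here is the hand-rolled polling a `recurse`-style helper would standardize, written in plain Playwright. The endpoint and response shape are hypothetical; `expect.poll` is a standard Playwright API.

```ts
import { test, expect } from '@playwright/test';

// Hand-rolled polling: hit an endpoint repeatedly until it reports the
// state we need. Teams re-implement this loop in every project, which
// is exactly the duplication a shared utility removes.
test('order eventually reaches "shipped" status', async ({ request }) => {
  await expect
    .poll(
      async () => {
        const res = await request.get('https://api.example.com/orders/42');
        return (await res.json()).status;
      },
      { timeout: 15_000, intervals: [1_000] } // retry every second, up to 15s
    )
    .toBe('shipped');
});
```
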
### TEA (Test Architect Agent)

TEA is a quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. It encodes test architecture expertise into repeatable processes.

| Workflow | Purpose |
|----------|---------|
| `*test-design` | Risk-based test planning per epic |
| `*framework` | Scaffold production-ready test infrastructure |
| `*ci` | CI pipeline with selective testing |
| `*atdd` | Acceptance test-driven development |
| `*automate` | Prioritized test automation |
| `*test-review` | Test quality audits (0-100 score) |
| `*nfr-assess` | Non-functional requirements assessment |
| `*trace` | Coverage traceability and gate decisions |

:::tip[Key Insight]

TEA doesn't just generate tests; it provides a complete quality operating model with workflows for planning, execution, and release gates.

:::

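As a concrete example of the acceptance-test-driven style behind `*atdd`, a spec like the following would be written before the feature exists, fail first, and pass once the implementation catches up. The feature, route, and test ids here are hypothetical.

```ts
import { test, expect } from '@playwright/test';

// ATDD: this spec encodes the acceptance criterion up front.
test('user can filter orders by status', async ({ page }) => {
  await page.goto('/orders'); // relies on baseURL in playwright.config
  await page.getByLabel('Status').selectOption('shipped');

  // Every rendered row must reflect the selected filter.
  for (const row of await page.getByTestId('order-row').all()) {
    await expect(row.getByTestId('order-status')).toHaveText('Shipped');
  }
});
```
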
### Playwright MCPs

Model Context Protocol (MCP) servers enable real-time verification during test generation. Instead of inferring selectors and behavior from documentation, MCPs let agents:

- Run flows and confirm the DOM against the accessibility tree
- Validate network responses in real time
- Discover actual functionality through interactive exploration
- Verify generated tests against live applications (see the sketch after this list)

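The checks an MCP-driven agent performs map directly onto Playwright assertions. Below is a sketch of the equivalent verification inside a test: the URL, endpoint, and page content are hypothetical, and aria snapshots require Playwright 1.49+.

```ts
import { test, expect } from '@playwright/test';

test('verify structure and network behavior, not just pixels', async ({ page }) => {
  // Start listening before navigation so the response isn't missed.
  const responsePromise = page.waitForResponse(
    (res) => res.url().includes('/api/products') && res.status() === 200
  );
  await page.goto('https://example.com/products');

  // Confirm the rendered DOM against the accessibility tree.
  await expect(page.locator('main')).toMatchAriaSnapshot(`
    - heading "Products" [level=1]
    - list
  `);

  // Validate the network response that populated the page.
  const response = await responsePromise;
  expect(await response.json()).toMatchObject({ items: expect.any(Array) });
});
```
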
## How They Work Together

The three components form a quality pipeline:

| Stage | Component | Action |
|-------|-----------|--------|
| Standards | Playwright-Utils | Provides production-ready patterns and utilities |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright MCPs | Validates generated tests against live applications |

**Before (AI-only):** 20 tests with redundant coverage, incorrect assertions, and flaky behavior.

**After (full stack):** risk-based selection, verified selectors, validated behavior, reviewable code.

## Why This Matters

Traditional AI testing approaches fail because they:

- **Lack quality standards** — no consistent patterns or utilities
- **Skip planning** — jump straight to test generation without risk assessment
- **Can't verify** — generate tests without validating against actual behavior
- **Don't review** — no systematic audit of generated test quality

The three-part stack addresses each gap:

| Gap | Solution |
|-----|----------|
| No standards | Playwright-Utils provides production-ready patterns |
| No planning | TEA `*test-design` workflow creates risk-based test plans |
| No verification | Playwright MCPs validate against live applications |
| No review | TEA `*test-review` audits quality with scoring |

This approach is sometimes called *context engineering*: loading domain-specific standards into the AI's context automatically rather than relying on prompts alone. TEA's `tea-index.csv` manifest loads the relevant knowledge fragments so the AI doesn't relearn testing patterns each session. A sketch of that mechanism follows.
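To illustrate the idea, here is a minimal manifest-driven context loader. Only the `tea-index.csv` file name comes from the source; the column layout, fragment paths, and matching logic are hypothetical.

```ts
import { readFileSync } from 'node:fs';

// Hypothetical manifest row: a topic keyword mapped to a knowledge file.
type ManifestRow = { topic: string; fragmentPath: string };

function loadManifest(path: string): ManifestRow[] {
  return readFileSync(path, 'utf-8')
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => {
      const [topic, fragmentPath] = line.split(',');
      return { topic, fragmentPath };
    });
}

// Select only the fragments relevant to the current task, so the AI
// starts each session with the team's standards already in context.
function buildContext(task: string, manifest: ManifestRow[]): string {
  return manifest
    .filter((row) => task.toLowerCase().includes(row.topic))
    .map((row) => readFileSync(row.fragmentPath, 'utf-8'))
    .join('\n\n');
}

const manifest = loadManifest('tea-index.csv');
console.log(buildContext('design api tests with network interception', manifest));
```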