The Test Is the Truth

Wei-Wei Wu
December 5, 2025
7 MIN READ

Software 1.0 easily automates what you can specify.

Software 2.0 easily automates what you can verify.

Software testing is behavior verification. We want to know that [code] produces [behavior] as the engineer intended. But that idea relies on a now-outdated assumption that engineers are the ones specifying the code.

In the old world, engineers wrote both the code and the tests that verified it. The two evolved together.

That world is gone. Today, code is generated by AI at a volume and velocity no engineer (or entire engineering team) can keep up with. The bottleneck is no longer writing code. It’s verifying it.

And when humans can no longer feasibly inspect, understand, or manually validate the code being produced, the role of the test changes. It stops being a check on correctness and becomes the definition of correctness.

In this new world, the test isn’t downstream from development: it is the source of truth.

Testing 1.0 → Testing 2.0

For the past two decades, software testing has lived in the Testing 1.0 paradigm. Tests as scripts, tests as checklists, tests as step-by-step reproductions of UI interactions. They assumed a world in which humans wrote deterministic logic, and testing tools merely exercised it.

In Testing 1.0:

  • Tests were procedural: click X, type Y, assert Z.
  • Tests encoded implementation details: DOM selectors, IDs, APIs, and internal structures.
  • Tests broke whenever the UI moved, a selector changed, or a workflow shifted.
  • Tests existed to confirm that human-written code still behaved as humans expected.

Testing 1.0 automated what engineers could explicitly script. If a human couldn’t write the test procedure, the tool couldn’t run it. This model worked only because code was slow to change and humans remained in complete control of creation.
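To make that brittleness concrete, here is a minimal sketch in plain Python. A toy dict stands in for the DOM, and the selector names are invented for illustration; no real framework is being modeled:

```python
# A Testing 1.0 style procedural test: the script is coupled to an exact
# selector, so a harmless rename breaks it even though behavior is unchanged.

def render_page(button_id="login-btn"):
    """Return a toy DOM: a mapping from element id to its visible label."""
    return {button_id: "Log in", "username": "", "password": ""}

def click(page, selector):
    """Procedural step: fail hard if the exact selector is missing."""
    if selector not in page:
        raise KeyError(f"selector '#{selector}' not found")
    return f"clicked {selector}"

# The scripted test passes against today's markup...
page = render_page()
assert click(page, "login-btn") == "clicked login-btn"

# ...but a pure rename of the id (behavior unchanged) breaks it.
renamed = render_page(button_id="signin-btn")
try:
    click(renamed, "login-btn")
    broke = False
except KeyError:
    broke = True
assert broke  # same behavior, different selector: the Testing 1.0 script fails
```

The failure here tells you nothing about the product; it only tells you the script and the markup drifted apart.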

In the AI era, code is no longer authored line by line by humans. It is generated, refactored, rearranged, and rewritten by machines at machine speed. Testing 2.0 emerges from a simple realization:

“Software 2.0 easily automates what you can verify.”

— Andrej Karpathy

Testing 2.0 shifts testing from:

  • low-level scripts → high-level behavior
  • brittle selectors → semantic understanding
  • human procedures → machine-interpreted intent
  • confirming code → defining correctness

Instead of telling the computer how to test (“Click #login-btn”), we tell it what must be true:

  • “A user can log in.”
  • “A workspace can be created.”
  • “Billing updates are reflected in the dashboard.”

The system figures out how. Thus, Testing 2.0 automates what humans can reliably verify, not what they can specify in scripts. Testing 2.0 is not just a better way of testing. It is a necessary response to a world where code is no longer the primary product engineers create: tests are.
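As a sketch of “the system figures out how”: the snippet below resolves an element by its visible label instead of its selector. The tiny string-matching resolver is a deliberate stand-in for the semantic, AI-driven resolution a real Testing 2.0 system would perform, and the markup dicts are hypothetical:

```python
# A Testing 2.0 sketch: the test states intent and the harness resolves
# *how* semantically, by visible label rather than by selector.

def resolve(page, intent_label):
    """Find the element whose visible label matches the intent, ignoring ids."""
    for element_id, label in page.items():
        if label.lower() == intent_label.lower():
            return element_id
    raise LookupError(f"no element labeled '{intent_label}'")

old_markup = {"login-btn": "Log in"}
new_markup = {"signin-btn": "Log in"}  # implementation changed, behavior didn't

# The same behavioral intent survives the refactor:
assert resolve(old_markup, "log in") == "login-btn"
assert resolve(new_markup, "log in") == "signin-btn"
```

The test’s anchor is the user-visible behavior, so an implementation rename that a Testing 1.0 script would fail on passes untouched.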

Tests as the specification

In Testing 1.0, specifications lived outside the tests. Requirements were documented, discussed, and translated into code. Tests existed to confirm that the implementation matched the spec.

That separation no longer holds.

When AI generates code, the traditional chain of requirements → spec → code → tests collapses. There is no stable intermediate artifact. The code is effectively ephemeral, changed by Claude, or Codex, or Devin to match the requirements. Tests become the durable product.

In Testing 2.0:

  • Tests define the behavior the system must exhibit.
  • Tests act as the contract between humans and AI-generated code.
  • Tests become the executable form of product requirements.
  • Tests serve as the single source of truth for correctness.

The test is no longer just a verification step; it is the specification. If a behavior isn’t encoded in a test, the system has no obligation to exhibit it; if it is encoded in a test, the system must satisfy it.

This shift reframes the developer’s role. Instead of writing logic that implements a specification, developers write tests that are the specification and rely on AI systems to implement behavior that satisfies those tests.
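One way to picture this contract, as a minimal sketch: a spec is just a set of named behaviors, and any implementation either satisfies all of them or it doesn’t. The `App` class and behavior names below are hypothetical placeholders for whatever an AI system generates:

```python
# Tests as the specification: a checker that any implementation,
# human- or machine-written, must satisfy.

SPEC = {
    "a user can log in": lambda app: app.login("ada", "pw") is True,
    "a workspace can be created": lambda app: app.create_workspace("w1") == "w1",
}

class App:
    """One possible implementation; the spec doesn't care how it works."""
    def __init__(self):
        self.workspaces = []
    def login(self, user, password):
        return bool(user and password)
    def create_workspace(self, name):
        self.workspaces.append(name)
        return name

def unsatisfied(app, spec):
    """Return the behaviors an implementation fails to exhibit."""
    return [name for name, check in spec.items() if not check(app)]

assert unsatisfied(App(), SPEC) == []  # empty list: the spec is met
```

The implementation is free to change arbitrarily; only an empty `unsatisfied` list counts as correct.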

The test is the specification, and therefore, the test is the truth.

Behavior over implementation

Testing 1.0 focused on implementation. Tests interacted with specific selectors, DOM structures, API endpoints, and internal functions. Think Selenium, Cypress, or Playwright. They verified how the system achieved a result, not whether the correct result was achieved. This made tests tightly coupled to the underlying code and therefore fragile.

In Testing 2.0, implementation details are no longer the ground truth. Behavior is.

AI-generated code changes frequently, sometimes radically, without altering the intended outcome. When the implementation is unstable and often opaque, tying tests to code paths becomes pointless. What matters is the visible, user-facing behavior the system must produce.

This shift isn’t new. It’s actually a return to testing’s original intent.

  • Behavior-driven development (BDD) has always been about specifying outcomes rather than instructions.
  • Test-driven development (TDD) has always assumed that tests define the boundary of what the system must do.

AI now makes these principles unavoidable. When code is generated rather than hand-crafted, you cannot anchor correctness to the implementation. You can only anchor it to behavior. AI is giving us the opportunity to practice software development the way it was always meant to be: defined by outcomes, not implementation details.

If the behavior matches the test, the system is correct. If it doesn’t, the system is wrong.
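A minimal illustration, using sorting rather than a UI purely to stay self-contained: two very different implementations pass the same behavioral check, so either could be generated, refactored, or swapped out without touching the test:

```python
# Behavior over implementation: correctness is anchored to the outcome,
# not the code path that produced it.

def insertion_sort(xs):
    """A hand-rolled implementation."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def builtin_sort(xs):
    """A completely different implementation of the same behavior."""
    return sorted(xs)

def behaves_correctly(sort_fn):
    """The behavioral test: output matches the expected ordering."""
    data = [3, 1, 2, 1]
    return sort_fn(data) == sorted(data)

# Either implementation satisfies the behavioral test.
assert behaves_correctly(insertion_sort)
assert behaves_correctly(builtin_sort)
```

Nothing in `behaves_correctly` mentions how the sort works; that is the whole point.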

The Collapse of QA

QA existed as a distinct function because testing required specialized knowledge of tools, selectors, frameworks, and scripting. Developers wrote the code; QA wrote the tests; both sides tried to keep the two aligned.

When tests are written in natural language, the barrier to writing a test is no longer technical tooling. It’s clarity of thought. Test authors don’t need to know selectors or internal structures. First and foremost, they need to articulate behavior precisely. Testing becomes an expression of intent, not an exercise in automation engineering.

As a result, the traditional boundary between “developer” and “QA engineer” collapses. If the primary skill in testing is the ability to specify behavior unambiguously, then anyone who understands the product can write the tests, including the people building it.

But this collapse doesn’t eliminate testing work. It redefines it as new roles emerge:

  • Test Engineers who specialize in expressing behavior precisely.
  • Test Architects who design the overall truth structure of a product.
  • Test Editors who refine and maintain the corpus of behavioral tests.
  • Software Critics who evaluate a system the way a critic evaluates a film: not by how it was made, but by what it does.

These roles are about reasoning, articulation, and judgment. They ensure that the test suite expresses the right truths about the product, not just the obvious ones.

Where AI generates the implementation, the human contribution is understanding behavior. The differentiator is the ability to think clearly about how a system should behave and express that behavior unambiguously.

The truth is the test

The shift from automated specification to automated verification will completely change how software is created and how it behaves. When implementation becomes cheap and disposable, and behavior becomes the only durable truth, the test emerges as the primary artifact of software development. 

The future isn’t code-first, or even AI-first; it’s behavior-first, with machines generating the implementation needed to satisfy clearly defined tests.

We’ve just raised our Series A to build this at Momentic. Natural-language behavior verification of your software. The test as the truth.

Ship faster. Test smarter.