
How AI Will Bring TDD Back from the Dead

How TDD can help ensure quality in AI-generated code, addressing issues of correctness, bloat, and architectural cohesion

TDD is dead. DHH declared it so back in 2014. Whatever life it once had, the constraints it placed on developers suffocated it, and most teams abandoned the practice for more pragmatic approaches to testing.

But what if it was just one of those ideas before its time? What if, with our newfound ability to get machines to crank out poor-quality code by the megabyte, TDD could be reborn as the essential quality control mechanism our AI-assisted future needs?

TDD's Roller Coaster Journey

Kent Beck introduced Test-Driven Development back in the late 90s as part of Extreme Programming. The promise was compelling: writing tests before implementation code produced cleaner designs, fewer bugs, and a comprehensive safety net for future changes. By the early 2000s, TDD had become the darling of the Agile movement. Books were written, conferences held, and training programs developed, all extolling the virtues of the red-green-refactor cycle.

But as with many revolutionary ideas, the backlash was inevitable. The practice required significant discipline, slowed initial development, and sometimes forced artificial design decisions—almost the antithesis of the "move fast and break things" ethos that dominates startup culture. DHH's inflammatory 2014 blog post "TDD is Dead. Long Live Testing" killed the idea for the hipster coders of the day.

Some teams moved to more flexible approaches like test-after development, or shifted primarily to integration and end-to-end tests rather than the unit-level focus of classic TDD. Others simply reduced their testing discipline altogether, favoring velocity over verification. Many veteran teams quietly continue practicing TDD, or at least its spirit. Still, the rigid orthodoxy of test-first development has become more of a niche practice than the industry standard.

But that was a time when code was good. What about when slop starts to take over codebases?

AI Coding's Quality Problems

There's trouble in paradise.

It's not your code that's bad, obviously. It's AI's fault. Cursor, Windsurf, Lovable—they are pumping out slop of epic proportions. Nothing to do with you.

Quality issues in AI code are a given. GitClear's research on Copilot-era codebases found a "downward pressure on code quality" once AI assistance enters the picture. Developers are pushing more mistakes, introducing more redundancies, and building increasingly complex systems—all while feeling more productive.

The problems are legion:

  • First, there's the correctness issue. AI models generate code that looks plausible but is wrong or insecure. These models hallucinate functions that don't exist, reach for deprecated APIs for fun, and make errors that look convincing unless you know exactly what you're looking at. Without careful review, teams integrate code that quietly undermines their systems' integrity.
  • Then there's the bloat issue. AI doesn't respect YAGNI (You Aren't Gonna Need It). It habitually overengineers, adding fields, parameters, and helper functions nobody asked for.
  • Perhaps most concerning is what's happening to architectural cohesion. AI suggestions tend to paste new code rather than find existing utilities in your project. Technical debt accumulates faster than ever, while the codebase grows increasingly brittle.

Automation bias—the tendency to trust computer-generated answers uncritically—leads devs to accept AI suggestions without adequate scrutiny. Tab. Tab. Tab. Tab. The end result? We're creating more code than ever—at a pace that would have seemed magical just a few years ago—but much of it is crud. We've lost the natural friction that once forced reflection and careful design.

This is precisely where TDD comes back into the picture.

TDD as the Natural Solution for AI Coding

If AI pumps out plausible-but-poor code, we need a reliable definition of "right." That's TDD in a nutshell.

TDD creates an unambiguous target for AI to hit by defining expected behavior in executable form before implementation. Write a test, watch it fail, then unleash the AI to make it pass. The test provides guardrails—the AI can't wander off into useless territory because the test specifies exactly what's required. It either passes or fails. No debate.

This approach neutralizes AI's worst tendencies:

  • For correctness issues, tests act as instant verification. Your assertions define precisely what success looks like. The AI can't hallucinate APIs or make logical errors without immediate exposure. No need to trust the AI—the tests tell you definitively whether it works (see the sketch just after this list).
  • Against bloat, TDD's minimalist philosophy ("write only what you need to pass the test") counters AI's tendency to overengineer. If your tests don't require those extra fields and functions, they're dead code waiting to be pruned. The test becomes your specification of what's truly needed, not what the AI imagines might be useful someday.
  • For architectural fragmentation, tests force you to think about how components interact. Well-designed tests naturally push toward better abstractions and cleaner interfaces. AI might still generate fresh code rather than reusing existing utilities, but at least that code will be compatible with your architecture.
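
To make "instant verification" concrete, here's a minimal sketch of a test acting as a specification, written in TypeScript with Vitest. The `applyDiscount` helper and its `pricing` module are hypothetical, invented purely for illustration; the point is that the assertions, not the AI, define what "correct" means.

```typescript
// pricing.test.ts — the specification lives here, before any implementation exists.
import { describe, it, expect } from "vitest";
import { applyDiscount } from "./pricing"; // hypothetical module, not yet written

describe("applyDiscount", () => {
  it("applies a percentage discount and rounds to cents", () => {
    // 19.99 minus 15% is 16.9915, which must round to 16.99.
    expect(applyDiscount(19.99, 0.15)).toBe(16.99);
  });

  it("rejects discount rates outside the 0–1 range", () => {
    expect(() => applyDiscount(19.99, 1.5)).toThrow(RangeError);
  });
});
```

If the AI hallucinates a rounding utility that doesn't exist, botches the math, or bolts on a currency-conversion parameter nobody asked for, the very next test run says so; no code-review archaeology required.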

This is what Kent Beck meant when he observed that as AI automates routine coding, testing skills become exponentially more valuable:

When machines can write code faster than humans, our value shifts to defining problems correctly and verifying solutions work as intended.

The irony is perfect. Once rejected for slowing developers down, TDD becomes the tool that lets us safely accelerate with AI. TDD transforms from burden to essential control mechanism. It's no longer about slowing down to write tests—it's about writing tests so AI can safely speed up.

The New AI-TDD Workflow

So, how do we do this in practice? How does TDD work when AI is generating both tests and code? Here's a possible workflow that leverages the strengths of both:

  1. Specify behavior first: Start by describing what you want your code to do. Not in vague terms, but with concrete inputs and expected outputs. Instead of "I need a user authentication system," try "When users enter valid credentials, they should receive a JWT token. When credentials are invalid, they should get a 401 error."
  2. Generate tests (the red phase): Now the twist: have the AI write your tests first. Prompt it with something like "Write unit tests for a function that validates user credentials and returns JWT tokens." Review these tests carefully—they're now your specification. Delete any that cover behaviors you don't need, and add any edge cases the AI missed. This is where your domain expertise shines (there's a concrete red/green sketch after this list).
  3. Run the tests: Execute the tests. They'll fail spectacularly—that's the point. You now have a clear definition of "done" in executable form.
  4. Generate implementation (the green phase): Only now do you prompt the AI to implement the code: "Write code to make these tests pass." The AI now has guardrails—it must create code that satisfies your specific test requirements, not whatever it thinks you might want.
  5. Rerun tests: Does everything pass? If not, prompt the AI with the specific failures and ask it to fix them. This targeted approach prevents the AI from rewriting everything and potentially introducing new problems.
  6. Refactor: Once the tests pass, look for improvements. Is the code clean? Does it follow your project's patterns? You might ask the AI: "Refactor this code to be more efficient while ensuring tests still pass."
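
Here's what steps 2 through 5 can look like in practice, as a minimal sketch in TypeScript with Vitest. The `authenticate` function, its return shape, and the in-memory user store are hypothetical stand-ins, so treat this as an illustration of the red/green rhythm rather than a reference implementation.

```typescript
// auth.test.ts — the red phase: reviewed, human-approved tests that fail
// because ./auth does not exist yet.
import { describe, it, expect } from "vitest";
import { authenticate } from "./auth"; // hypothetical module, to be generated next

describe("authenticate", () => {
  it("returns a JWT for valid credentials", async () => {
    const result = await authenticate("ada@example.com", "correct-horse-battery-staple");
    expect(result.status).toBe(200);
    // A JWT is three base64url segments joined by dots.
    expect(result.token).toMatch(/^[\w-]+\.[\w-]+\.[\w-]+$/);
  });

  it("returns 401 and no token for invalid credentials", async () => {
    const result = await authenticate("ada@example.com", "wrong-password");
    expect(result.status).toBe(401);
    expect(result.token).toBeUndefined();
  });
});
```

Only after these fail do you ask the AI for the implementation (the green phase). A plausible result, again assuming the shapes above and a `jsonwebtoken` dependency:

```typescript
// auth.ts — the green phase: just enough code to satisfy the tests above.
import jwt from "jsonwebtoken";

// Stand-in credential store; a real system would hit a database or identity provider.
const users: Record<string, string> = {
  "ada@example.com": "correct-horse-battery-staple",
};

export async function authenticate(
  email: string,
  password: string
): Promise<{ status: number; token?: string }> {
  if (users[email] !== password) {
    return { status: 401 };
  }
  const token = jwt.sign({ sub: email }, process.env.JWT_SECRET ?? "dev-secret", {
    expiresIn: "1h",
  });
  return { status: 200, token };
}
```

If a test still fails, feed the AI that specific failure output (step 5) rather than asking for a full rewrite, and keep the refactor pass (step 6) honest by rerunning the suite after every change.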

This workflow flips the traditional AI coding experience on its head. Instead of "generate some code and hope it's right," you define correctness first, then generate code to match. Humans stay firmly in control of what gets built, while AI handles the mechanical implementation details.

The beauty is in the balance: humans doing what they do best (defining problems, spotting edge cases, maintaining architectural vision) and AI doing what it excels at (generating implementation code quickly).

Where TDD and AI Go From Here

We're likely heading toward a world where developers spend more time defining problems through tests than writing implementation code. The human's role shifts toward being the quality architect who specifies, verifies, and refines rather than the code typist. Think of it as a new division of labor: humans handling the "what" and "why," while AI tackles the "how."

AI tools themselves will evolve to support this paradigm. Imagine IDEs that automatically generate test suites from requirements documents, or AI assistants that flag when your implementation exceeds your tests' specifications—the ultimate YAGNI enforcer. The concept of "test coverage" may transform into "requirement coverage," measuring how completely your tests capture the intended behavior.

For technical leaders, this shift demands investment in testing culture. Teams that build expertise in writing good tests will thrive with AI, while those chasing raw output without quality guardrails will drown in technical debt. The tools have changed dramatically since Kent Beck first proposed Test-Driven Development, but its core insight remains more relevant than ever: defining success clearly before you begin is the surest path to achieving it.
