- The Safety Net
- Units of Behavior
- Behavioral Boundaries
- A Taste of TDD
- The Future of Test-Driven Development
- Summary
The Future of Test-Driven Development
One of my favorite personal mottoes is “Never an absolutist.”3 Do I believe TDD is a practice that will never change and will never disappear?
Of course not: It’s already changing. What follows are some of those changes, along with their implications for the future of TDD. In each case, the TDD game is slightly modified, but the thought processes, techniques, and human activities remain essentially the same.
As for TDD disappearing, well, that won’t happen as long as humans are needed to explain to a computer what we want it to do for us.
TDD and Behavior-Driven Development
Behavior-Driven Development (BDD) is like TDD in many ways. It is the whole-team practice of building software by writing brief, human-readable scenarios, and getting them to work one at a time. BDD is defined as the practice of “exploring desired system behaviors with examples in conversations and formalizing those examples into automated tests to guide development.”4
BDD isn’t merely a renaming of TDD. As Matt Wynne wrote in The Cucumber Book (Pragmatic Bookshelf, 2017), “BDD builds upon TDD by formalizing the good habits of the best TDD practitioners.”
In practice, BDD differs from TDD in the following ways:
- The tests are “business-facing.” That is, they express business rules and requests. The suite of tests (“scenarios” or “examples”) represents a runnable product specification, whereas the emphasis of TDD is often on a runnable engineering specification. To keep the scenarios readable to the entire team—whether technically inclined or otherwise—BDD examples are written in a ubiquitous domain language and use a handful of keywords.
- BDD is a whole-team activity and is best implemented with nearly continuous collaboration between the product and development specialists. Although this approach might seem burdensome on the surface, BDD can reduce the need for many traditional meetings and hand-offs. The team works closely together to plan, specify, build, test, and demonstrate working software within hours or even minutes.
- TDD is included as a practice within the BDD cycle. There are often situations where finer-grained, “infrastructural” testing is required to complete a BDD scenario without cluttering the product specification with engineering details. Ergo, there’s no need to choose between BDD and TDD. They go together nicely, like chocolate and peanut butter.
- Many of the test-driven techniques described in this book apply equally to a team’s BDD scenarios: They favor developing more, smaller tests with fewer assertions; setting up just enough to test the business rule; reducing duplication (for example, using Cucumber’s Gherkin Background keyword); making the scenarios readable; strategically replacing challenging dependencies with test doubles; and taking small, quick, and safe steps toward the team’s goal.
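As a taste of what such business-facing scenarios can look like, here is a hypothetical Gherkin fragment (the domain and wording are invented for illustration). The Background keyword pulls shared setup out of each scenario, reducing duplication:

```gherkin
Feature: Account withdrawals

  Background:
    Given an account holder named "Pat"
    And Pat's account has a balance of $100

  Scenario: Withdrawal within the balance
    When Pat withdraws $40
    Then the account balance is $60

  Scenario: Withdrawal exceeding the balance
    When Pat withdraws $150
    Then the withdrawal is declined
    And the account balance is $100
```

Note that nothing here mentions classes, databases, or endpoints; a product specialist can read, question, and amend these scenarios directly.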
TDD and Functional Programming
When using Functional Programming (FP), whenever you create a type with a rigorous contract, you are effectively testing a lot of assumptions at compile-time. But there is also behavioral (runtime) code, and where there’s behavior, there’s the opportunity to get it wrong. Ergo, unit-testing FP behaviors is still beneficial, and building those behaviors by using a test-driven approach is as critical to quality FP as it is to object-oriented development.
FP unit tests exhibit all the attributes of good tests, as described in Chapter 5, “Sustaining a Test-Driven Practice.” The following example gives a taste of unit testing in F# (a .NET FP language):
[<Test>]
let ``intersection of two sets should only contain common elements`` () =
    let left =
        initialSet
        |> add "Rob"
        |> add "Awesome"
    let right =
        initialSet
        |> add "Jason"
        |> add "Awesome"
    let result =
        left
        |> intersection right
    result
    |> contains "Awesome"
    |> expectsToBeTrue
    result
    |> contains "Rob"
    |> expectsToBeFalse
    result
    |> contains "Jason"
    |> expectsToBeFalse
Here is the passing implementation of intersection as another taste of FP syntax:
let intersection right left =
    left
    |> List.filter (fun element ->
        right
        |> List.contains element)
Daydreams of TDD, Quantum Computers, and No Implementation
We developers used to fear the day when computers could write their own code. We were worried not only because such computers might build Terminator robots (or worse, mountains of paperclips5) and take over the world, but also because we’d be out of a job.
For many years, I’ve speculated that we would eventually see a computing breakthrough that would allow the computer to write the entire implementation. At the time, I envisioned a quantum computer (QC) that could take a team’s specifications and search a “solution space” for a set of machine instructions that would pass all the tests.
If the computer were fast enough, it could perhaps rewrite the entire implementation each time the team added a new test scenario. In other words, neither the team nor the development computer would ever need to read or refactor the implementation.
Teams could then focus entirely on writing, refactoring, and maintaining the test suite. Future high-level programming languages could be limited to the syntax and structure needed for good test-writing (for example, no more loops or branching statements). Developers would write engineering specifications as unit tests, or co-author product scenarios while working side-by-side with product designers. Or perhaps the team roles of product designer, developer, and tester would blend and merge into something new. A test-driven approach would be the de facto standard for building and maintaining software.
While I was daydreaming about the impacts of this hypothetical QC, a different computing breakthrough was happening. Noting the surging power and popularity of artificial intelligence6 (AI) and large language models (LLMs), I began to wonder whether these new tools would someday make my predictions a reality.
The Reality of TDD, Artificial Intelligence, and “Vibe Coding”
Using a rudimentary OpenAI script whose prompt contained all my tests (I could easily add tests with each run of the script), I explored whether OpenAI could generate code to pass them all. It did so, repeatedly and successfully.
With each new run of the script, my AI agent did not have access to any previous prompt, dialog, or existing implementation. There was nothing for me or the AI to refactor, because it always started from scratch and overwrote all the code with each run.
My experiment was only that: a simple proof-of-concept. I had the script build simple bits and pieces of various classroom exercises (for example, the Salvo game described in the Exercises Appendix at the end of this book).
There are several ways that AI tools are currently assisting real developers. Some integrated development environments (IDEs) offer AI “autocomplete” options that are quite good at predicting what the developer intended to write next, and the complete code appears much faster than the developer could have typed it. Others read all the existing code and will suggest refactorings. The developer merely needs to glance over the proffered code and press a single key to accept the changes.
AI agents can even write tests for you. However, the generated tests that I saw merely confirmed that the code did what it did, not necessarily that it did what was wanted. This is the gap in the workflow where humans are still needed: Someone needs to come up with descriptive and detailed examples of what they want the software to do.
Right now, various methods for incorporating AI into the full development workflow are vying for our attention. One of those approaches is called “vibe coding,” which was described to me as allowing the AI agent to write the implementation without any review. The person receiving the code then tests it and gives the AI feedback. This is happening in a few different ways:
- The human manually tests the application and then explains to the AI where it failed, often using an example. They do this repeatedly until the human is satisfied with the results. At least one person who used this method expressed occasional frustration with the AI agent’s tendency to seemingly reinterpret older requirements or to incorrectly anticipate unspoken needs.
- The human lets the AI write its own tests, and the human reviews and runs those tests.
- The human gives the AI examples, either one at a time or in a batch, and adds to those examples if either the human realizes there is a gap in the examples (a common and natural occurrence with a test-driven approach) or the AI delivers a solution that is in some way too generalized (also common when pairs or ensembles develop code using a test-driven approach). This style most closely resembles my early experiment with OpenAI.
Some things I noted about these reported experiences:
- In every case, clear, specific examples—either written by or approved by the person making the requests—were necessary at some point in the workflow.
- Except for the time saved by having the AI write code (which was certainly a significant savings), little time was saved during interactions with the AI. Put another way, the person interacting with the AI still spent about the same amount of overall time explaining what they wanted.
- None of the applications described to me could directly impact a user’s health, finances, or safety. I asked AI guru Scott Werner if he would use vibe coding for financial, medical, or safety applications, and he responded, “No, probably not anything life critical.”
Computers can’t read our minds, and they don’t do well with ambiguous instructions. When it comes to our safety, health, finances, and other critical domains, we will need to describe—to the computer and to each other—all desired outcomes using complete, descriptive, and detailed examples. That is exactly the practice of TDD.
