"Isolated" test means something very different to different people!

Published by Manuel Rivero on 03/06/2025

Introduction.

A concept that we find very useful, both when we do TDD and when we introduce tests into legacy code, is the FIRS properties (Fast, Isolated, Repeatable, Self-validating), which describe an ideal unit test. The concept derives from the FIRST acronym, which describes ideal unit tests in the context of TDD, where the T stands for Timely^[1].

This is what we mean by Fast, Isolated, Repeatable and Self-validating tests:

Fast: they should execute so quickly that we never feel the need to delay running them.
Isolated: they should produce the same results regardless of the order in which they are executed. This means they do not depend on one another in any way, whether directly or indirectly.
Repeatable: they should be deterministic; their results should not change if the tested behavior and environment remain unchanged.
Self-validating: they should pass or fail automatically, without requiring human intervention to determine the outcome. This property is essential for enabling test automation.

In the distinct contexts of TDD and retrofitting tests in legacy code, the same FIRS properties fulfill different roles, guiding us in designing testable units in the former and uncovering testability problems in the latter^[2].

When retrofitting tests in legacy code, violations of FIRS properties highlight dependencies that impede testing, referred to as awkward collaborations^[3]. These awkward collaborations point to the dependencies we need to break using dependency-breaking techniques^[4] to enable the introduction of unit tests.
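As a minimal sketch of this idea, consider a legacy function that reads the system clock directly. All names below are hypothetical and only serve as illustration. The clock is an awkward collaboration because it violates Repeatable: the result depends on when the test runs. One possible way to break the dependency, similar in spirit to parameterizing the method, is to let the caller pass in the current time:

```python
from datetime import datetime
from typing import Optional


def greeting_legacy() -> str:
    # Reads the system clock directly: the result changes with the time of
    # day, so a test of this function is not Repeatable.
    return "Good morning" if datetime.now().hour < 12 else "Good afternoon"


def greeting(now: Optional[datetime] = None) -> str:
    # Hypothetical refactoring: the clock reading becomes a parameter.
    # Production callers omit it; tests pass a fixed instant.
    current = now if now is not None else datetime.now()
    return "Good morning" if current.hour < 12 else "Good afternoon"


def test_greets_with_good_morning_before_noon() -> None:
    assert greeting(datetime(2025, 6, 3, 9, 0)) == "Good morning"


def test_greets_with_good_afternoon_from_noon_on() -> None:
    assert greeting(datetime(2025, 6, 3, 12, 0)) == "Good afternoon"
```

With the clock pushed behind a parameter, both tests give the same result at any time of day, which is what Repeatable asks for.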

In the case of integration tests, it is sufficient to focus on violations of the Isolated, Repeatable and Self-validating properties. Dependencies that violate the Repeatable and Self-validating properties require dependency-breaking techniques to address them, whereas violations of the Isolated property can often be resolved through other approaches, such as test-specific fixtures or configuration changes.

In the context of TDD, violations of the FIRS properties are a key heuristic to identify the collaborations that we need to push outside the unit under test. These awkward collaborations will be simulated with test doubles in our unit tests.
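To illustrate, here is a small sketch in which sending a notification is treated as an awkward collaboration (in a real system it would likely be slow and hard to verify automatically). The names `Notifier`, `NotifierSpy` and `Registration` are hypothetical, invented for this example. The collaboration is pushed outside the unit behind an interface and simulated with a hand-written spy:

```python
from typing import Protocol


class Notifier(Protocol):
    """Role played by the awkward collaborator (hypothetical interface)."""

    def notify(self, recipient: str, message: str) -> None: ...


class NotifierSpy:
    """Test double that records notifications instead of sending them."""

    def __init__(self) -> None:
        self.sent = []  # list of (recipient, message) tuples

    def notify(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class Registration:
    """Unit under test: it only knows the Notifier role, not how it works."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def register(self, email: str) -> None:
        self._notifier.notify(email, "Welcome!")


def test_registering_sends_a_welcome_notification() -> None:
    notifier = NotifierSpy()

    Registration(notifier).register("ana@example.com")

    assert notifier.sent == [("ana@example.com", "Welcome!")]
```

The test stays fast and self-validating because nothing is actually sent; the spy only records what the unit asked its collaborator to do.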

Notice that, when doing TDD, identifying awkward collaborations is more challenging because we must infer them from the requirements. In contrast, in legacy code, we can identify them more easily since they are visible in the code and manifest through the testability problems they cause.

Identifying awkward collaborations is, therefore, an important skill for designing testable code. In this sense, FIRS properties serve as a valuable guideline for defining the boundaries of a unit, helping ensure testable code.

It seems “isolated test” means something very different to different people!

Isolated. You keep using that word. I do not think it means what you think it means.

If you read the original definition of isolated from Agile in a Flash: F.I.R.S.T., you will notice that it is different from what we expressed in the previous section.

We said that, for us, isolated meant that “tests should produce the same results regardless of the order in which they are executed. This means they do not depend on one another in any way, whether directly or indirectly”.

In the definition of isolated from Agile in a Flash: F.I.R.S.T., the flash card states:

“Isolated: Failure reasons become obvious.”

Later, in the explanation, they elaborate on what this means (the emphasis in bold was added by us):

“Isolated: Tests isolate failures. A developer should never have to reverse-engineer tests or the code being tested to know what went wrong. Each test class name and test method name with the text of the assertion should state exactly what is wrong and where. If a test does not isolate failures, it is best to replace that test with smaller, more-specific tests.

A good unit test has a laser-tight focus on a single effect or decision in the system under test. And that system under test tends to be a single part of a single method on a single class (hence “unit”). Tests must not have any order-of-run dependency. They should pass or fail the same way in suite or when run individually. Each suite should be re-runnable (every minute or so) even if tests are renamed or reordered randomly. Good tests interfere with no other tests in any way. They impose their initial state without aid from other tests. They clean up after themselves.”

Tim Ottinger’s FIRST: an idea that ran away from home summarizes this as:

“Isolated - tests don’t rely upon either other in any way, including indirectly. Each test isolates one failure mode only.”^[5].

For us, isolated, in the context of identifying awkward collaborations, means that tests should be isolated from each other, which, in practice, means that they cannot share any mutable state or resource. Our definition is less restrictive than Ottinger’s. We are choosing to consider only one aspect of their definition, that “tests interfere with no other tests in any way”, and not the other one, “tests have a single reason to fail” (we’ll comment more about this other aspect below).
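A small sketch may make this concrete. The names below (`shared_cart`, `run_in_order` and the test functions) are hypothetical. The first pair of tests shares a module-level list, so their results depend on execution order; the second pair builds its own state and does not:

```python
# Shared mutable state: both coupled tests use the same module-level list.
shared_cart = []


def test_cart_starts_empty_coupled():
    assert shared_cart == []


def test_adding_an_item_coupled():
    shared_cart.append("book")
    assert shared_cart == ["book"]


# Isolated version: each test imposes its own initial state.
def new_cart():
    return []


def test_cart_starts_empty():
    assert new_cart() == []


def test_adding_an_item():
    cart = new_cart()
    cart.append("book")
    assert cart == ["book"]


def run_in_order(tests):
    """Reset the shared list, run the tests in order, report which pass."""
    shared_cart.clear()
    results = []
    for test in tests:
        try:
            test()
            results.append(True)
        except AssertionError:
            results.append(False)
    return results
```

Calling `run_in_order([test_cart_starts_empty_coupled, test_adding_an_item_coupled])` returns `[True, True]`, while the reversed order returns `[True, False]`: the coupled tests are not isolated. The second pair passes in either order.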

We think that what we mean by isolated aligns with the definition Kent Beck provides in his book Test Driven Development: By Example. In the section Isolated Test (page 125) of the chapter Test-Driven Development Patterns, he writes:

“How should the running of tests affect one another? Not at all.”

“[…] the main lesson […] tests should be able to ignore one another completely.”

“One convenient implication of isolated tests is that the tests are order independent.”

Moreover, in his more recent work Test Desiderata he defines isolated as:

“tests should return the same results regardless of the order in which they are run”^[6].

Having said that, there is another desirable property for tests in Test Desiderata which is interesting for this discussion, specificity, which Kent Beck explains as:

“Specific: if a test fails, the cause of the failure should be obvious.”

We think that the other aspect of isolated in Ottinger’s definition, having a single reason to fail, corresponds to the highest possible level of specificity. It seems that what they mean by isolated intertwines two of the desirable properties of tests from Beck’s Test Desiderata:

the property of returning the same results regardless of the order in which they are run (being isolated).
the property of test failures having an obvious cause (being specific).

Having a single reason to fail is still a highly desirable property which we also take into account while writing test cases. It can help to compose independent behaviours and to avoid overspecifying some tests^[7].
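As a rough illustration of specificity, the sketch below contrasts one test with several reasons to fail against three tests with one reason each. The function `parse_price` and its behaviour are hypothetical, made up for this example:

```python
def parse_price(text: str) -> float:
    """Hypothetical function under test: parses strings such as '€ 1,50'."""
    cleaned = text.replace("€", "").strip().replace(",", ".")
    return float(cleaned)


# Less specific: three behaviours in one test. A failure does not say which
# behaviour broke, and the first failing assertion hides the others.
def test_parse_price():
    assert parse_price("1.50") == 1.5
    assert parse_price("1,50") == 1.5
    assert parse_price("€ 1,50") == 1.5


# More specific: each test has a single reason to fail, and its name states
# which behaviour is broken.
def test_accepts_a_dot_as_decimal_separator():
    assert parse_price("1.50") == 1.5


def test_accepts_a_comma_as_decimal_separator():
    assert parse_price("1,50") == 1.5


def test_ignores_the_currency_symbol():
    assert parse_price("€ 1.50") == 1.5
```

Note that all four tests are isolated in Beck’s sense: none of them shares state with the others. They differ only in how specific they are.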

However, in the context of identifying awkward collaborations, we have found Beck’s definition of isolated more useful, both to avoid the considering-the-class-as-the-unit trap and to teach the use of test doubles as isolation tools, which is how they are mostly employed in the classic style of TDD.

Summary.

We showed how the FIRS properties can be valuable in both TDD and retrofitting tests in legacy code, guiding developers toward creating more testable and maintainable code.

We explored how the concept of isolated tests differs depending on the author. While Kent Beck’s definition emphasizes independence between tests, ensuring they produce consistent results regardless of execution order, the definition from Agile in a Flash also states that tests should have a single reason to fail. We believe this intertwines Beck’s definition of isolation with another test property: specificity, with having a single reason to fail representing the highest level of specificity.

We think that both Beck’s and Ottinger’s definitions of isolated are valuable. However, Beck’s version aligns more closely with what we mean by isolated in the context of identifying awkward collaborations. For us, Beck’s definition has proven especially useful while identifying awkward collaborations, to avoid the trap of considering the class as the unit, and to teach how to use test doubles as isolation tools.

The TDD, test doubles and object-oriented design series.

This post is part of a series about TDD, test doubles and object-oriented design:

The class is not the unit in the London school style of TDD.
“Isolated” test means something very different to different people!
Heuristics to determine unit boundaries: object peer stereotypes, detecting effects and FIRS-ness.
Breaking out to improve cohesion (peer detection techniques).
Refactoring the tests after a “Breaking Out” (peer detection techniques).

Acknowledgements.

I’d like to thank Fran Reyes, Emmanuel Valverde Ramos, Fran Iglesias Gómez, Marabesi Matheus and Antonio De La Torre for giving me feedback about several drafts of this post.

Finally, I’d also like to thank imgflip for their Inconceivable Iñigo Montoya Meme Generator.

References.

Notes.

[1] See also Jay Bazuzi’s post “pure unit test” vs. “FIRSTness” to learn more about categorizing tests according to their FIRSness (removing the T) which can be useful when working with legacy code, or when testing after.

[2] We delve into this topic in depth in our TDD training and our Changing Legacy Code training.

[3] See Fowler’s article Mocks Aren’t Stubs to see where the term awkward collaboration comes from.

[4] See our post Classifying dependency-breaking techniques.

[5] Aside from summarizing very well what “Isolated” means to them, FIRST: an idea that ran away from home dives into the history of how the FIRST properties came about and how their meaning has blurred over time. Plus, it includes a list of sources that discuss FIRST, highlighting changes made by each source, like in the telephone game, and noting that some don’t even give credit to the original authors.

It also refers to another article published by Pragmatic Programmers as the origin of the acronym, Unit Tests Are FIRST: Fast, Isolated, Repeatable, Self-Verifying, and Timely, which we believe explains FIRST better and even includes code examples.

Finally, it explains the value of writing tests first and criticizes writing them afterwards.

[6] Ian Cooper, in his talk TDD, where did it all go wrong, states that:

“For Kent Beck, [a unit test] is a test that runs in isolation from other tests.”

“[…] NOT to be confused with the classical unit test definition of targeting a module.”

“A lot of issues with TDD is people misunderstanding isolation as class isolation […]”

We talked about this frequent misunderstanding in our post The class is not the unit in the London school style of TDD.

[7] We may write about this in a future post.
