Heuristics to determine unit boundaries: object peer stereotypes, detecting effects and FIRS-ness
Published by Manuel Rivero on 03/07/2025
Introduction.
In our previous post, The class is not the unit in the London school style of TDD, we talked about the distinction that the authors of the GOOS book make between the peers (real collaborators[1]) and the internals of an object, and how this distinction was crucial for the maintainability of unit tests.
We commented how in the tests we write, we should only rely on test doubles to simulate the peers of an object because they align with the behaviours (roles, responsibilities[2]) that the behaviour under test directly depends on, not the classes it depends on. This practice is consistent with the recommendation: “mock roles, not objects”[3]. This helps us focus on testing behaviour rather than implementation details.
We also commented that test doubles should not simulate internal objects (any collaborator that is not a peer of the object under test), as they represent implementation details. Using test doubles for them can result in tests that are tightly coupled to structure rather than behaviour.
So, in order to reduce the coupling of our tests to implementation details, it’s crucial that we correctly identify the peers of an object.
Finally, we commented about the object peer stereotypes which are heuristics presented by the GOOS authors to help us think about our design and identify peers.
In this post, we’ll discuss different heuristics that we may apply to detect the peers of an object, or, in other words, to determine the boundaries of the unit under test. We’ll also examine the relationships between them[4].
Heuristics to determine the boundaries of the unit under test.
Heuristic 1: Object peer stereotypes.
According to the GOOS book, an object’s peers are cohesive objects[5] that can be loosely categorized into three types of relationship:
- Dependencies: “services that the object needs from its environment so that it can fulfill its responsibilities. The object cannot function without these services. It should not be possible to create the object without them.”
- Notifications: “objects that need to be kept up to date with the object’s activity. The object will notify interested peers whenever it changes state or performs a significant action. Notifications are “fire and forget” (the object neither knows nor cares which peers are listening).”
- Adjustments: “objects that tweak or adapt the object’s behaviour to the needs of the system. This includes policy objects that make decisions on the object’s behalf (think Strategy Pattern), and component parts of the object if it’s a composite.”
These peer stereotypes should be considered as heuristics to help us think about our design, not as hard rules.
They help us define the unit’s boundaries because dependencies, notifications and adjustments should be outside the unit.
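To make the three stereotypes concrete, here is a minimal, hypothetical sketch (all names are invented for illustration) of an object whose constructor receives one peer of each kind:

```java
// Hypothetical sketch: the three peer stereotypes in one constructor.
interface OrderRepository { void save(String order); }     // dependency: a required service
interface AuditListener { void orderSaved(String order); } // notification: fire and forget
interface PricingPolicy { double priceFor(String order); } // adjustment: a pluggable decision

class OrderProcessor {
    private final OrderRepository repository; // cannot function without it
    private final AuditListener listener;     // kept up to date with activity
    private final PricingPolicy policy;       // tweaks the object's behaviour

    OrderProcessor(OrderRepository repository, AuditListener listener, PricingPolicy policy) {
        this.repository = repository;
        this.listener = listener;
        this.policy = policy;
    }

    double process(String order) {
        double price = policy.priceFor(order); // delegate the decision to the adjustment
        repository.save(order);                // use the service the object depends on
        listener.orderSaved(order);            // notify interested peers of the action
        return price;
    }
}
```

In a test, each of these peers would be simulated with a test double, while any internals of `OrderProcessor` would remain untouched.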
Next, we will introduce two more general heuristics that produce coarser-grained units and explain how these heuristics relate to one of the object peer stereotypes: dependencies.
Heuristic 2: FIRS-ness.
The FIRS properties (Fast, Isolated, Repeatable, Self-validating)[6] provide an interesting guideline for delineating the boundaries of a unit.
According to this idea, any code that adheres to the FIRS properties (i.e., exhibits FIRS-ness) belongs within the unit, while any code that violates any of these properties is an awkward collaboration (a dependency that impairs testability) and should be pushed outside the unit.
We can push FIRS-violating code (i.e., awkward collaborations) outside the unit by applying the Dependency Inversion Principle (DIP). This allows us to control how the unit depends on the awkward dependencies, and enables the use of test doubles in our test to simulate the behaviour of the awkward dependencies, thereby avoiding the testability problems[7] (i.e., FIRS violations) they introduce.
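As a minimal sketch of this idea (with invented names), consider a unit that needs the current time. Reading the system clock directly makes its tests non-Repeatable, so we invert the dependency on it:

```java
// Hypothetical sketch: pushing an awkward (FIRS-violating) dependency outside the unit.
// Reading the system clock makes a test non-Repeatable; we apply the DIP instead.
interface TimeProvider { long nowMillis(); } // abstraction owned by the unit

class SessionChecker {
    private final TimeProvider time;

    SessionChecker(TimeProvider time) { this.time = time; } // injected, controllable in tests

    boolean isExpired(long startMillis, long ttlMillis) {
        // Pure, FIRS-compliant logic stays inside the unit.
        return time.nowMillis() - startMillis > ttlMillis;
    }
}
```

In production we could inject `System::currentTimeMillis`; in tests, a fixed value such as `() -> 1000L`, which keeps the tests fast and repeatable.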
Heuristic 3: Detecting effects.
Another guideline to determine unit boundaries is inspired by the separation of effectful[8] and non-effectful (pure) code in functional programming. From this perspective, unit boundaries emerge “wherever an effect needs to be performed”.
If state mutation isn’t considered an effect, the unit boundaries defined using the FIRS-ness concept and the isolating-non-effectful code guideline would mostly align. This isn’t surprising, as FIRS violations are usually caused by effects (except in the case of really slow computations).
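A small, hypothetical sketch of a boundary drawn this way (names invented for illustration): the calculation is pure and sits inside the unit, while the printing effect sits outside it.

```java
import java.util.List;

// Pure core: no hidden inputs or outputs, trivially FIRS-compliant.
class DiscountCalculator {
    static double totalWithDiscount(List<Double> prices, double discountRate) {
        double total = prices.stream().mapToDouble(Double::doubleValue).sum();
        return total * (1.0 - discountRate);
    }
}

// Effectful shell: printing is an effect, so it stays outside the unit boundary.
class ReceiptPrinter {
    void print(List<Double> prices, double discountRate) {
        System.out.println("Total: " + DiscountCalculator.totalWithDiscount(prices, discountRate));
    }
}
```

The pure core can be tested exhaustively without any test doubles; only the thin effectful shell needs them.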
How are these heuristics related?
So far, we’ve looked at three heuristics that lead us to unit testable designs.
We already commented that the unit boundaries defined using the FIRS-ness concept and the isolating-non-effectful code guideline would mostly align because FIRS violations are usually caused by effects.
Furthermore, we believe that the unit boundaries derived from applying the FIRS-ness concept or detecting effects closely align with those identified when using the “dependencies” peer stereotype. Remember that this stereotype is defined as “services an object needs from its environment to fulfill its responsibilities”.
Additionally, these unit boundaries align with those in the classic style of TDD, where test doubles are primarily used as isolation tools to avoid effects or FIRS violations in tests.
Notice that we have focused on the “dependencies” peer stereotype as the heuristic that defines unit boundaries similar to those derived from applying the FIRS-ness and detecting effects heuristics. Later in this post, we will explore how the two other object peer stereotypes, adjustments and notifications, contribute to achieving finer-grained units and more maintainable tests.
How are these heuristics related to the Ports & Adapters pattern?
The FIRS-ness and detecting-effects heuristics will delineate boundaries where our application is testable independently of its context, making them a useful starting point for defining the ports of our application. Where they fall short, though, is in expressing the ports’ interfaces in terms of the application: using only these two heuristics, we could end up with port interfaces that are not aligned with the application’s domain language[9].
In contrast, using the dependencies peer stereotype provides an advantage over the previous two heuristics by leading to better-designed port interfaces. Recall that this stereotype is defined as “services an object needs from its environment to fulfill its responsibilities”. As a result, the port interfaces created using this approach align more closely with the ports & adapters pattern because they will reflect the language and concepts defined by the application itself.
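A hypothetical contrast (all names invented for illustration) between a port shaped only by isolating effects and one shaped by what the object needs from its environment:

```java
class Order {
    final String id;
    Order(String id) { this.id = id; }
}

// Mimic-adapter-style port: the storage technology (SQL) leaks into the interface.
interface OrderRowGateway {
    void executeUpdate(String sqlStatement);
}

// Port shaped by the "dependencies" peer stereotype: expressed in the
// application's own language, with no technology details leaking through.
interface OrderRepository {
    void save(Order order);
}
```

Both interfaces isolate the persistence effect, but only the second one speaks the language of the application, as the ports & adapters pattern requires.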
Moreover, we can also apply the two other object peer stereotypes, adjustments and notifications, to design finer-grained, context-independent units. This approach may result in unit boundaries that align with those produced by a generalization of the ports & adapters pattern (remember that this pattern only applies at the application boundaries). Alistair Cockburn recently referred to this generalization as the Component + Strategy pattern.
What heuristics do we usually apply to determine the boundaries of the unit under test?
All of them.
In the context of retrofitting tests into legacy code, we focus primarily on detecting effects and FIRS-ness violations to determine where to introduce seams, while keeping in mind that the interfaces the resulting tests are coupled to are likely too low-level and not well-suited for the application, suffering from the Mimic Adapter antipattern[10].
Once the tests are in place, we will refactor toward more cohesive and higher-level interfaces, using the object peer stereotypes and the Component + Strategy pattern as guiding principles to improve modularity and design clarity.
When doing TDD, identifying the suitable unit boundaries and interfaces can be more challenging because we must infer them from the requirements. In this case, we use all three heuristics to determine the boundaries and interfaces, and produce a testable design.
We identify awkward collaborations by detecting effects or FIRS-ness violations in the specification. Additionally, we use object peer stereotypes, especially the dependencies peer stereotype, to define interfaces in terms of the unit we are test-driving.
What about the two other peer stereotypes: “notifications” and “adjustments or policies”?
So far we have only talked about using the dependencies peer stereotype to delimit the boundaries of the unit under test.
There are two other object peer stereotypes, notifications and adjustments. What about them?
In the following sections, we’ll see how these two other object peer stereotypes can further separate concerns and keep cohesion, leading to finer-grained units and more maintainable code and tests.
“Adjustments” peer stereotype.
When variants for some part of an object’s behaviour exist from the outset, or when a part of an object’s behavior begins to evolve at a different pace than the rest, there are various ways to adjust the code to accommodate these behavioral variations.
Some available options are the parametric option, the polymorphic option and the compositional option.
Not all options have the same benefits and liabilities. We think that the compositional option is usually the best suited for object-oriented code[11].
If we choose to add these variations through composition, we first need to encapsulate the varying behaviour in a separate abstraction to maintain the object’s cohesion. This new abstraction would be an adjustment. In this way, adjustments can be used to tweak or adapt the object’s behaviour to the needs of the system through composition.
To achieve this separation, we apply the dependency inversion principle to ensure the object depends on the new abstraction rather than on a specific implementation of it. We then use dependency injection to decide which concrete implementation of the adjustment we want to use.
The resulting object’s code will be protected from changes in the concrete adjustments[12].
Furthermore, separating adjustments from the object not only results in better design but also enables us to write more maintainable tests.
On one hand, we can write tests focused solely on the object’s core behavior by using test doubles to simulate any adjustments. This approach ensures the object is tested independently of concrete implementations of its adjustments.
Then, we can write separate tests that check the behaviour of each concrete variant of the adjustment.
This testing approach makes the object’s tests more focused and maintainable. By decoupling them from specific adjustments, we ensure they remain unaffected by changes in or additions of new adjustment implementations.
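The composition described above can be sketched as follows (a hypothetical example with invented names): the varying behaviour is encapsulated in an adjustment, injected through the constructor.

```java
// The adjustment's role: a pluggable decision (think Strategy pattern).
interface ShippingCostPolicy {
    double costFor(double orderTotal);
}

class CheckoutService {
    private final ShippingCostPolicy shippingCostPolicy;

    // Dependency injection: the concrete adjustment is chosen from outside.
    CheckoutService(ShippingCostPolicy shippingCostPolicy) {
        this.shippingCostPolicy = shippingCostPolicy;
    }

    double totalToPay(double orderTotal) {
        // The object depends on the abstraction, not on a concrete variant (DIP).
        return orderTotal + shippingCostPolicy.costFor(orderTotal);
    }
}

// Concrete variants of the adjustment, each testable in isolation.
class FlatRateShipping implements ShippingCostPolicy {
    public double costFor(double orderTotal) { return 5.0; }
}

class FreeShippingOverThreshold implements ShippingCostPolicy {
    public double costFor(double orderTotal) { return orderTotal >= 100.0 ? 0.0 : 5.0; }
}
```

In `CheckoutService`’s tests we could stub the policy (e.g., `total -> 0.0`) to check the core behaviour independently of any concrete variant, and then test each variant separately.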
“Notifications” peer stereotype.
Sometimes there are secondary behaviours associated with an object’s state changes or significant actions. Adding these secondary behaviors directly to the object violates the single responsibility principle (SRP) and introduces temporal coupling[13] between the object’s core behavior and its associated secondary behaviours.
To maintain SRP, the associated secondary behaviors can be encapsulated in separate collaborators.
```java
public class ProcessOrder {
    private EmailService emailService;
    private InventoryService inventoryService;
    private LogisticsService logisticsService;
    private AccountingService accountingService;

    // ... code omitted for brevity

    public void execute() {
        // business logic to process an order (code omitted for brevity)
        // ...

        // secondary behaviours associated to an order that has been processed
        emailService.sendOrderConfirmation(orderDto);
        inventoryService.updateStock(orderDto);
        logisticsService.scheduleDelivery(orderDto);
        accountingService.recordTransaction(orderDto);
    }
}
```
However, now the object becomes tightly coupled to all those collaborators. We haven’t removed the temporal coupling.
This tightly coupled design introduces significant difficulties in development, maintenance, and testing. Each time a new secondary behavior is added, the object and its tests must be modified, increasing the risk of bugs and making the codebase more expensive to maintain.
Furthermore, this kind of design often results in brittle tests, which break frequently as the system evolves. This brittleness is commonly blamed on test doubles, instead of recognizing the underlying design flaws[14]. We should instead “listen to our tests”[15] and improve the design flaws using notifications.
Notifications act as a decoupling mechanism, preventing temporal coupling between the object’s behavior and the secondary behaviors encapsulated in its collaborators.
Through notifications the object merely signals interested peers (if any) whenever it changes state or performs a significant action.
These notifications are “fire and forget” commands[16], i.e., the object neither knows nor cares which peers might be listening. This ensures loose coupling between components, making the design much more flexible and adaptable to change.
```java
public class ProcessOrder {
    private EventPublisher eventPublisher;

    // ... code omitted for brevity

    public void execute() {
        // business logic to process an order (code omitted for brevity)
        // ...

        // notification of the event that an order has been processed
        eventPublisher.publish(createOrderProcessedEvent());
    }

    // ... code omitted for brevity
}
```
Once notifications are in place, we should avoid checking the object’s behaviour along with all its associated secondary behaviours.
The reason is that including associated secondary behaviors in the object’s tests would require creating test doubles for both the peers of the object and the peers of the notified collaborators, resulting in complex test setups that are difficult to maintain. This difficulty increases as the number of secondary behaviors grows.
Instead, we can both simplify our tests and avoid the brittleness described earlier by taking advantage of notifications.
First, we write tests solely focused on verifying that the object’s behaviour correctly triggers the appropriate notifications. We use test doubles[17] to simulate the notifications in these tests.
Next, we write separate tests to confirm that a notification triggers the desired secondary behaviors in its listeners. These tests are decoupled from the object that produces the notifications.
This approach makes the object’s tests more focused and maintainable because it decouples them from changes or additions to associated secondary behaviors.
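A minimal sketch of the first kind of test (a simplified, hypothetical version of the code above, with the event represented as a plain string): a hand-rolled spy stands in for the notification peer, so the test only checks that the right event is published.

```java
import java.util.ArrayList;
import java.util.List;

interface EventPublisher {
    void publish(String event);
}

class ProcessOrder {
    private final EventPublisher eventPublisher;

    ProcessOrder(EventPublisher eventPublisher) { this.eventPublisher = eventPublisher; }

    void execute() {
        // business logic omitted for brevity
        eventPublisher.publish("OrderProcessed"); // fire and forget
    }
}

// Test double: records published events so the test can inspect them.
class SpyEventPublisher implements EventPublisher {
    final List<String> published = new ArrayList<>();
    public void publish(String event) { published.add(event); }
}
```

The listeners’ tests would then exercise each secondary behaviour directly from the event, fully decoupled from `ProcessOrder`.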
Summary.
This post explores three different heuristics for defining unit boundaries that can be applied both when retrofitting tests in legacy code and when doing TDD: GOOS book’s object peer stereotypes, FIRS-ness and detecting effects.
We explore how these three approaches often lead to similar unit boundaries and how they can complement one another. Boundaries identified through FIRS-ness or detecting effects are often very similar to the ones derived from the “dependencies” peer stereotype.
We also highlight that the advantage of the “dependencies” peer stereotype is its focus on what “the object needs”. This focus leads to interfaces expressed in the language of their client.
Additionally, we explain how these heuristics can aid in defining application boundaries, drawing a connection with the Ports & Adapters pattern. Again the “dependencies” peer stereotype’s emphasis on the object’s explicit needs results in better interfaces, helping prevent the mimic adapter antipattern.
Finally, we see how the adjustments and notifications peer stereotypes both reduce coupling, improve cohesion, and result in more maintainable and focused test suites.
In future posts, we’ll discuss other techniques for detecting peers that are based on detecting test smells, leveraging domain knowledge, or design patterns.
Acknowledgements.
I’d like to thank Fran Reyes, Emmanuel Valverde, Fran Iglesias, Marabesi Matheus, Manu Tordesillas and Alfredo Casado for giving me feedback about several drafts of this post.
Finally, I’d also like to thank Petra Nesti for the photo.
References.
- Growing Object Oriented Software, Guided by Tests, Steve Freeman and Nat Pryce.
- Flexible, Reliable Software Using Patterns and Agile Development, Henrik Bærbak Christensen.
- Object Collaboration Stereotypes, Steve Freeman and Nat Pryce.
- Mock roles, not objects, Steve Freeman, Nat Pryce, Tim Mackinnon and Joe Walnes.
- Mock Roles Not Object States talk, Steve Freeman and Nat Pryce.
- Component-plus-Strategy generalizes Ports-and-Adapters, Alistair Cockburn.
- The class is not the unit in the London school style of TDD, Manuel Rivero.
- “Isolated” test means something very different to different people!, Manuel Rivero.
- Native and browser SPA versions using re-frame, ClojureScript and ReactNative, Manuel Rivero and Francesc Guillen.
Notes.
[1] Any object that helps a given object to fulfil its responsibilities is called a collaborator. It seems that this etymology comes from Class-responsibility-collaboration cards originally proposed by Ward Cunningham and Kent Beck as a teaching tool in their paper A Laboratory For Teaching Object-Oriented Thinking.
In that paper, they write “the last dimension we use in characterizing object designs is the collaborators of an object. We name as collaborators objects which will send or be sent messages in the course of satisfying responsibilities.”
In our previous article, The class is not the unit in the London school style of TDD, we explained that according to the GOOS book the collaborators of an object belong to one of two categories: peers (real collaborators) and internals (implementation details).
[2] The OO style described in the GOOS book is influenced by Rebecca Wirfs-Brock’s Responsibility-driven design.
[3] From Mock roles, not objects by Steve Freeman, Nat Pryce, Tim Mackinnon and Joe Walnes.
[4] This post originated from a response to a comment on the post The class is not the unit in the London school style of TDD.
[5] Following the single responsibility principle, i.e., objects that are cohesive. See section Object Peer Stereotypes in chapter 6, Object-Oriented Style, of the GOOS book.
[6] In our post “Isolated” test means something very different to different people!, we explain what each of the FIRS properties means for us.
We also comment about the origin and history of the FIRST acronym, citing original sources.
Finally, we explain how we interpret isolated differently than the authors of the FIRST acronym; our definition is more aligned with Beck’s interpretation of isolated.
[7] Isolated violations can also be avoided by using fixtures, but this may lead to slower tests that violate Fast.
With the advent of technologies such as Testcontainers, tests that were traditionally classified as integration tests can now adhere to the FIRS properties. Using Testcontainers helps maintain isolation, ensuring the tests run independently and are repeatable regardless of the developer’s local setup.
[8] Effectful code performs effects, but what are effects? Let’s try to informally explain it.
Any code that uses any input that isn’t in its argument list (or is injected through its constructor in the case of objects), or does anything that isn’t part of its return value is considered effectful, and those hidden inputs and outputs are effects.
Most people call the hidden inputs and outputs side-effects, but some people use the term side-effect only for the hidden outputs, and the term side-causes for the hidden inputs (like Kris Jenkins in his post What Is Functional Programming?) to highlight their different nature.
According to that distinction, a side-effect is something a program does to its environment, and a side-cause is something a program requires from its environment.
Effectful code is much harder to test and understand than pure code.
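A tiny, hypothetical illustration of the distinction (invented names): a hidden input (side-cause) and a hidden output (side-effect) next to a pure equivalent.

```java
class Counter {
    static int count = 0; // state outside any argument list or return value

    static int nextEven() {
        count += 2;   // hidden output (side-effect): mutates state not in the return value
        return count; // hidden input (side-cause): the result depends on the previous count
    }

    // Pure equivalent: every input is an argument, every output is in the return value.
    static int nextEvenPure(int current) {
        return current + 2;
    }
}
```

Calling `nextEven()` twice returns different results for the same (empty) argument list, which is exactly what makes effectful code harder to test.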
[9] According to Alistair Cockburn, port interfaces should be expressed in the language of the application:
“Every interaction between the app and the outside world happens at a port interface, using the interface language the app itself defines” (in page 12 of the preview edition of his book Hexagonal Architecture Explained).
We observe that the interfaces that arise from looking for FIRS violations or detecting effects tend to run a higher risk of sitting at too low a level of abstraction, falling into the mimic adapter antipattern. The object peer stereotypes help in alleviating that risk.
Since dependency-breaking techniques carry some risk, as they are applied without tests, we try to reduce this risk by introducing very thin layers that isolate the minimum possible amount of code. Therefore, these layers will most likely be at a lower level of abstraction than what our application requires and will not align with the terms of our application. They are mimic adapters, which, in this context, is not an antipattern: it’s exactly what we need to reduce risks and introduce tests.
However, if, once we have tests in place, we don’t refactor these low level interfaces, we may face excessive coupling problems in our tests (see our post An example of wrong port design detection and refinement).
[11] In the chapter Deriving Strategy Pattern of his book Flexible, Reliable Software Using Patterns and Agile Development, Henrik Bærbak Christensen describes and analyses in depth four options, which he calls proposals, to accommodate an object-oriented design to this kind of behavioral variations: source tree copy proposal, parametric proposal, polymorphic proposal and compositional proposal.
The compositional proposal has many interesting benefits and a few liabilities. Read the whole analysis there.
This has the interesting property of changing behavior by adding new production code instead of modifying existing code. This property is characterized as Change by Addition (in Henrik Bærbak Christensen’s book), Open-Closed Principle, or Protected Variations.
I prefer the last description.
[13] “When two actions are bundled together into one module just because they happen to occur at the same time.” See the entry about Coupling in Wikipedia.
[14] Another example of blaming a tool or technique rather than our design or how we use the tool or technique.
[15] By interpreting difficulties in testing as feedback signaling that our design might need improvement.
Have a look at this interesting series of posts about listening to the tests by Steve Freeman. It’s a raw version of the content that you’ll find in chapter 20, Listening to the tests, of their book.
In fact, according to Nat Pryce mocks were designed as a feedback tool for designing OO code following the ‘Tell, Don’t Ask’ principle. You can read more about this in this conversation in the Growing Object-Oriented Software Google Group.
The feedback usually comes in the form of “pain” 😅.
[16] See Command Query Separation.
[17] Since notifications are commands, we may use either mocks, fakes or spies.