Data clumps, primitive obsession and hidden tuples

Connascence, Refactoring, Code Smells, Object-Oriented Design


While writing one of our posts about connascence, some of us discussed whether a data clump could be considered a form of Connascence of Meaning (CoM). In the end, we agreed that data clumps are indeed a form of CoM and that introducing a class for the missing abstraction reduces their connascence to Connascence of Type (CoT).

I had wondered in the past why we use a similar refactoring to eliminate both the primitive obsession and the data clump smells. Thinking about them from the point of view of connascence has helped me a lot to understand why. It also gave me an alternative path to the same conclusion, in which a data clump is basically reduced to an implicit form of primitive obsession. The reasoning is as follows:

The concept of primitive obsession can be extended to treat the collections that a given language offers as primitives. In such cases, encapsulating the collection reifies a new concept that might attract code that had nowhere to "live" and was therefore scattered all over the codebase. From the point of view of connascence, primitive obsession is a form of CoM that we transform into CoT by introducing a new type. Once that type exists, we might find Connascence of Algorithm (CoA), which we'd remove by moving the offending code inside the new type.

The elements that compose a data clump only make sense when they go together. This means that they're conceptually (but implicitly) grouped. In this sense, a data clump can be seen as a "hidden or implicit tuple".
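The primitive obsession part of this reasoning can be sketched in Python. This is only an illustration under assumed names: the `Scores` class, the functions and the passing threshold of 5.0 are all hypothetical. A bare list stands for a domain concept (CoM), the averaging rule is duplicated across callers (CoA), and introducing a type gives that rule a place to live.

```python
from typing import Iterable, List


# Before: a bare list of integers stands for "the scores of a student".
# Every caller has to know what the list means (CoM), and the averaging
# rule is repeated wherever it is needed (CoA).
def report_before(scores: List[int]) -> str:
    average = sum(scores) / len(scores) if scores else 0.0
    return f"average: {average:.1f}"


def is_passing_before(scores: List[int]) -> bool:
    average = sum(scores) / len(scores) if scores else 0.0
    return average >= 5.0


# After: a hypothetical Scores type encapsulates the collection.
# Callers now depend only on the type (CoT), and the duplicated
# algorithm has moved inside it.
class Scores:
    def __init__(self, values: Iterable[int]) -> None:
        self._values = tuple(values)

    def average(self) -> float:
        if not self._values:
            return 0.0
        return sum(self._values) / len(self._values)


def report(scores: Scores) -> str:
    return f"average: {scores.average():.1f}"


def is_passing(scores: Scores) -> bool:
    return scores.average() >= 5.0
```

The two versions behave the same. What changes is where the knowledge about the meaning of the list and about how to average it is kept.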

With this "hidden or implicit tuple" in mind, it is now easier to see how closely related the data clump and primitive obsession smells are. When we refactor a data clump, what we do is encapsulate a collection, its "hidden or implicit tuple", inside a new class. Again, from the point of view of connascence, this encapsulation reduces CoM to CoT and might make evident some CoA that will lead us to move some behavior into the new class, which becomes a value object.
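As a rough sketch of that refactoring, assume a hypothetical booking example in which a `start` date and an `end` date always travel together. The names `DateRange`, `nights` and `overlaps` are illustrative only. The pair is the hidden tuple, and a frozen dataclass is one possible way to make it an explicit value object in Python.

```python
from dataclasses import dataclass
from datetime import date


# Before: `start` and `end` only make sense together, but nothing in the
# code says so. Each function receives the hidden tuple piece by piece
# and has to agree on what the pieces mean (CoM).
def nights_before(start: date, end: date) -> int:
    return (end - start).days


def overlaps_before(start_a: date, end_a: date,
                    start_b: date, end_b: date) -> bool:
    return start_a < end_b and start_b < end_a


# After: the hidden tuple becomes an explicit, immutable value object.
# Callers depend on the type (CoT), and the behavior that was scattered
# around the clump now lives inside the new class.
@dataclass(frozen=True)
class DateRange:
    start: date
    end: date

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not precede start")

    def nights(self) -> int:
        return (self.end - self.start).days

    def overlaps(self, other: "DateRange") -> bool:
        return self.start < other.end and other.start < self.end
```

Because the dataclass is frozen and compares by value, two `DateRange` instances with the same dates are equal, which is the usual expectation for a value object.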

This "hidden or implicit tuple" concept helped me make more explicit the mental process that was leading me to do very similar refactorings to remove both code smells. I think that CoM unifies both cases much more easily than trying to relate the two smells directly. The fact that the collection (the grouping of the elements of a data clump) is implicit also makes it more difficult to recognize a data clump as CoM in the first place. That's why I think that a data clump is a more implicit example of CoM than primitive obsession and, thus, we might consider its CoM to be stronger than the one in primitive obsession.

Originally published in Manuel Rivero's blog.