mobile menu icon

On code smells catalogues and taxonomies

Published by Manuel Rivero on 05/11/2022

Learning, Refactoring, Code Smells, Object-Oriented Design


This post is an English translation of the original post: De taxonomías y catálogos de code smells.

Index.

  1. Introduction: code smells and refactoring.
  2. Martin Fowler: code smells and catalogues.
  3. Taxonomies.
    1. Wake’s taxonomy 2003.
    2. Mäntylä et al taxonomy 2003.
    3. Mäntylä et al taxonomy 2006.
    4. Jerzyk et al taxonomy 2022.
      1. Jerzyk’s extended code smells catalogue.
      2. Online catalogue.
  4. Conclusions.
  5. Acknowledgements.
  6. References.

Introduction: code smells and refactoring.

Refactoring is a practice that allows us to sustainably evolve code. To refactor problematic code, we first need to be able to recognize it.

Code smells are descriptions of signals or symptoms that warn us of possible design problems. It’s crucial to detect these problems and remove them as soon as possible.

Refactoring is cheaper and produces better results when done regularly. The longer a problematic piece of code lives, the more it takes to refactor it, and the biggest friction it will add to evolve new features directly contributing to accruing technical debt. Sustaining this situation over a long period of time may have a very negative economic impact. In the worst case scenario, the code might get too complicated to keep on maintaining it.

Cost of introducing a feature over time
Cost of introducing a feature over time (image from The Three Values of Software from J. B. Rainsberger).

Another, sometimes not so visible, consequence of bad software quality is its effects on developers. Less refactoring, can lead to a less maintainable code that can lead to longer times to develop new features, which, in turn, means more time pressure to deliver, which can lead us to test less, which will allow us to refactor less… This vicious circle might have a very demoralising effect.

Less refactoring vicious circle
Less refactoring vicious circle.

Understanding and being able to identify code smells gives us the opportunity of detecting design problems when they are still small and located in specific areas of our code, and, as such, are still easy to fix. This might have a very positive economic and emotional effect.

The problem is that code smells are not usually very well understood. This problem is understandable because code smells descriptions are sometimes abstract, diffuse and open to interpretation. Some code smells seem obvious, others not so much, and some might mask other smells.

Besides, we should remember that smells are only symptoms of possible design problems, not guarantees of the existence of problems. To further complicate matters, in addition to possible false positives, there are degrees to the problem that each smell represents, trade-offs between different smells and contraindications in their refactorings (sometimes “the cure is worse than the disease”).

Therefore, recognizing design problems through code smells is a subtle skill that requires experience and judgement. Acquiring this skill, which sometimes may feel like a kind of spider-sense, can take some time.

In our experience teaching and coaching teams, not being able to identify code smells is one of the greatest barriers to refactoring. Many developers do not detect design problems while they are still small and localised. What we often observe is that they don’t sense the design problems until the problems are quite large and/or have compounded with other problems, spreading through their codebase.

Code smells catalogues.

In 1999 Fowler and Beck published a code smells catalogue in a chapter of the book, Refactoring: Improving the Design of Existing Code. This catalogue contains the descriptions of 22 code smells.

In 2018 Fowler published a second edition of his book. In this new edition there are a number of changes with respect to the first, mainly in the code smells catalogue of smells and the refactorings catalogue (detailed in Changes for the 2nd Edition of Refactoring).

If we focus on the smells, which is the goal of this post, the changes are as follows:

There are 24 code smells in total. Even though they usually have catchy and memorable names, it’s difficult to remember such a long list.

How could we better understand code smells? How could we remember them more easily?

In this post, and some later ones, we will talk about strategies to better understand and remember code smells.

Organisational strategies.

Organising and manipulating information, seeing it from different points of view, can help us understand and remember it better. Taxonomies are a type of organisational strategy that allows us to group study material according to its meaning. This helps to create significant groupings of information or “chunks” that will facilitate learning.

As we mentioned, the 2018 Fowler catalogue is a flat list that does not provide any type of classification. Although, reading the descriptions of the code smells, and the motivation sections of the different refactorings, we can glimpse that some smells are more related to each other than to other smells, these relationships are not expressed explicitly and remain blurred and dispersed in different parts of the book.

The use of taxonomies to classify similar code smells can be beneficial to better understand and remember code smells, and recognize the relationships that exist between them.

Taxonomies.

There have been different attempts to classify code smells according to different criteria. The most popular classification is that of Mäntylä et al 2006 but it is not the first one. Below we will show some taxonomies that we consider quite interesting.

Wake 2003.

Wake in his book, Refactoring Workbook, describes 9 new code smells that did not appear in Fowler’s original catalogue: Dead Code, Null Check, Special Case, Magic Number, Combinatorial Explosion, Complicated Boolean Expression, and three related to bad names: Type Embedded in Name, Uncommunicative Names and Inconsistent Names.

Wake explicitly classifies code smells by first dividing them into two broad categories, Smells within Classes and Smells Between Classes, depending, respectively, on whether the code smell can be observed from within a class or whether a broader context needs to be considered (various classes). Each of these categories is divided into subcategories that group code smells based on where they can be detected. This classification criterion, later called “occurrence” by Jerzyk, answers the question: “where does this code smell appear?”.

Following this “occurrence” criterion Wake finds 10 subcategories.

Within the category of Smells within Classes we find the following subcategories:

The subcategories within the Smells between Classes category are:

Wake’s taxonomy 2003
Wake’s taxonomy (new code smells appear in green).

Wake presents each smell in a standard format with the following sections: Smell (the name and aliases), Symptoms (clues that may help detect it), Causes (notes on how it may have generated), What To Do (possible refactorings), Benefits (how the code will improve by removing it) and Contraindications (false positives and trade-offs). In some cases, he adds notes relating the code smell to design principles that might have helped to avoid it.

The book also contains many practical exercises and very useful tables (symptoms vs. code smells, smells vs. refactorings, reverse refactorings, etc.) and exercises that relate code smells to other concepts such as design principles or design patterns.

We highly recommend reading this book to delve into the discipline of refactoring and better understand when and why to apply the refactorings that appear in Fowler’s catalogue.

Although we also refer to to the rest of the taxonomies that we talk about in this post, the content about code smells in our course Code Smells & Refactoring is mainly based on Wake’s taxonomy seasoned with a bit of our own experience.

Mäntylä et al 2003.

In this taxonomy, code smells are grouped according to the effect they have on code (the type of problem, what they make difficult or the practices or principles that they break). This classification criteria is called “obstruction” by Jerzyk 2022.

The original classification from 2003 (A Taxonomy and an Initial Empirical Study of Bad Smells in Code) is comprised of 7 code smells categories: Bloaters, Object -Orientation Abusers, Change Preventers, Dispensables, Encapsulators, Couplers and Others.

Mäntylä et al taxonomy 2003
Mäntylä et al taxonomy 2003.

This is how they define each of the categories:

In their paper, Mäntylä et al discuss the reasons why each code smell is in a given category and not another, although they admit that some of them could be classified in more than one category.

Mäntylä et al 2006.

In 2006 Mäntylä et al published another paper (Subjective evaluation of software evolvability using code smells: An empirical study) in which they revised their original classification from 2003.

In this new version, they eliminated the Encapsulators (moving the Message Chains and Middle Man smells to the Couplers category) and Others (Comments and Incomplete Library Class disappear from the taxonomy) categories, and moved the Parallel Inheritance Hierarchies code smell from the Object-Orientation Abusers category to the Change Preventers category.

This version of their taxonomy is the one that has become most popular on the internet (it can be found on many websites, courses and posts), probably due to the greater accessibility (readability) of the summary of the paper that appears on the web: A Taxonomy for “Bad Code Smells”.

Mäntylä et al taxonomy 2006
Mäntylä et al taxonomy 2006.

What is interesting is not so much the discussion of which category each smell should fall into, but rather to start thinking that a given smell can have different types of effects on the code and the relationships between these effects. In fact, subsequent classifications from the point of view of the effect of a smell on the code (“obstruction”) consider it more useful not to lose the information that a code smell can produce several effects. This leads to no longer considering categories as exclusive, which makes it possible that the same smell can fall into several categories.

Jerzyk et al 2022.

In 2022 Marcel Jerzyk published his master’s thesis, Code Smells: A Comprehensive Online Catalog and Taxonomy and a paper with the same title. His research on code smells had three goals:

  1. Providing a public catalogue that could be useful as a unified knowledge base for both researchers and developers.

  2. Identifying all the possible concepts that are being characterised as code smells and determining possible controversies about them.

  3. Assigning appropriate properties to code smells in order to characterise them.

To achieve these objectives, they carried out a review of the existing literature on code smells up to that moment, with special emphasis on code smell taxonomies.

In the thesis 56 code smells are identified and described, of which 16 are new original proposals (remember that Wake described 31 code smells and Fowler 24 in his last review).

Descriptions and discussions of each of these 56 code smells can be found in Jerzyk’s master’s thesis.

Analysing the classification criteria of the taxonomies proposed up to that moment, Jerzyk finds three significant criteria to categorise code smells:

  1. Obstruction: this is the criterion used by Mäntylä et al to classify smells in their taxonomy, it’s also the most popular criterion. It informs us about the type of problem that a code smell causes in the code (what they make difficult or the practices or principles that they break). In the thesis they update the Mäntylä taxonomy, adding three new groups: Data Dealers, Functional Abusers and Lexical Abusers. Below we present a mental map that shows the classification of the 56 code smells according to only this criterion.
Jerzyk taxonomy using only obstruction criterion
Jerzyk taxonomy according to only the obstruction criterion.
  1. Expanse: inspired by Wake’s taxonomy, this criterion talks about whether the code smell can be observed in a reduced context (within a class) or if a broader context (between several classes) needs to be considered. Possible categories are Within Class and Between Classes.

  2. Occurrence: Also inspired by Wake’s taxonomy, this criterion is related to the location where (or the method by which) a code smell can be detected. Possible categories are Names, Conditional Logic, Message Calls, Unnecessary Complexity, Responsibility, Interfaces, Data, Duplication and Measured Smells.

The following table shows the 56 code smells in Jerzyk’s thesis classified according to the three criteria discussed above:

Code Smell Obstruction Expanse Occurrence
Long Method Bloaters Within Measured Smells
Large Class Bloaters Within Measured Smells
Long Parameter List Bloaters Within Measured Smells
Primitive Obsession Bloaters Between Data
Data Clumps Bloaters Between Data
Null Check Bloaters Between Conditional Logic
Oddball Solution Bloaters Between Duplication
Required Setup/Teardown Bloaters Between Responsibility
Combinatorial Explosion Bloaters Within Responsibility
Parallel Inheritance Hierarchies Change Preventers Between Responsibility
Divergent Change Change Preventers Between Responsibility
Shotgun Surgery Change Preventers Between Responsibility
Flag Argument Change Preventers Within Conditional Logic
Callback Hell Change Preventers Within Conditional Logic
Dubious Abstraction Change Preventers Within Responsibility
Special Case Change Preventers Within Conditional Logic
Feature Envy Couplers Between Responsibility
Type Embedded In Name Couplers Within Names
Indecent Exposure Couplers Within Data
Fate over Action Couplers Between Responsibility
Afraid to Fail Couplers Within Responsibility
Binary Operator in Name Couplers Within Names
Tramp Data Data Dealers Between Data
Hidden Dependencies Data Dealers Between Data
Global Data Data Dealers Between Data
Message Chain Data Dealers Between Message Calls
Middle Man Data Dealers Between Message Calls
Insider Trading Data Dealers Between Responsibility
Lazy Element Dispensables Between Unnecessary Complexity
Speculative Generality Dispensables Within Unnecessary Complexity
Dead Code Dispensables Within Unnecessary Complexity
Duplicate Code Dispensables Within Duplication
“What” Comments Dispensables Within Unnecessary Complexity
Mutable Data Functional Abusers Between Data
Imperative Loops Functional Abusers Within Unnecessary Complexity
Side Effects Functional Abusers Within Responsibility
Uncommunicative Name Lexical Abusers Within Names
Magic Number Lexical Abusers Within Names
Inconsistent Names Lexical Abusers Within Names
Boolean Blindness Lexical Abusers Within Names
Fallacious Comment Lexical Abusers Within Names
Fallacious Method Name Lexical Abusers Within Names
Complicated Boolean Expressions Obfuscators Within Conditional Logic
Obscured Intent Obfuscators Between Unnecessary Complexity
Vertical Separation Obfuscators Within Measured Smells
Complicated Regex Expression Obfuscators Within Names
Inconsistent Style Obfuscators Between Unnecessary Complexity
Status Variable Obfuscators Within Unnecessary Complexity
Clever Code Obfuscators Within Unnecessary Complexity
Temporary Fields O-O Abusers Within Data
Conditional Complexity O-O Abusers Within Conditional Logic
Refused Bequest O-O Abusers Between Interfaces
Alternative Classes with Different Interfaces O-O Abusers Between Duplication
Inappropriate Static O-O Abusers Between Interfaces
Base Class Depends on Subclass O-O Abusers Between Interfaces
Incomplete Library Class Other Between Interfaces


Some of the names in the table are different from those that usually appear in the literature. The changes in naming were due to the introduction of more updated names, as is the case for, for instance, Lazy Element or Insider Trading which were previously called Lazy Class and Inappropriate Intimacy, respectively.

Several smells are new:

One super useful and practical thing for developers in Jerzyk’s work is an online code smells catalogue that, when published, included the 56 code smells in the thesis. This catalogue is an accessible and searchable website. On the date of publication of this post, the catalogue contains 56 code smells.

In the catalogue code smells can be filtered by different classification criteria.

Example of filtering code smells in Jerzyk’s online catalogue
Example of filtering code smells in Jerzyk’s online catalogue.

For example, the screenshot above shows the result of filtering code smells that are OO Abusers and affect Interfaces which includes the Refused Bequest, Base Class depends on Subclass, and Inappropriate Static code smells.

Code smell in Jerzyk’s online catalogue
Code smell in Jerzyk’s online catalogue.

For each code smell, the catalogue presents the following sections: Smell (discussion of the smell), Causation (possible causes of the code smell), Problems (problems that the smell can cause or design principles violates), Example (examples of minimal code that illustrate the possible symptoms of a code smell and show a possible solution), Refactoring (possible refactorings) and Sources (articles or books that have described this code smell). It also includes a box containing information about possible aliases of the code smell, the category to which it belongs according to the criteria of obstruction, occurrence and expanse, the related smells and its relationship to them, and the historical origin of the code smell.

Conclusions.

Since the code smell concept was coined in 1999, many new smells have appeared. Catalogues as flat lists of code smells catalogue are hard to remember, and don’t help to highlight the relationships that exist between different code smells.

We have presented several code smells taxonomies that can help us see code smells from different points of view and relate them to each other according to different criteria: the problems they cause in the code, where they can be detected or the context that needs to be taken into account to detect them.

These meaningful groupings of code smells will help us understand and remember them better than flat catalogue lists.

Finally we’d like to highlight the recent work by Marcel Jerzyk which has not only proposed new smells and created a new multi-criteria taxonomy, but also has made available to us an online code smells catalogue in the form of an open-source repository and accessible and searchable website, which we believe can be very useful. useful and practical for both researchers and developers. I encourage you to take a look.

Acknowledgements.

I would like to thank my colleagues Fran Reyes, Antonio de La Torre, Miguel Viera and Alfredo Casado for reading the final drafts of this post and giving me feedback.

I’d also like to thank nikita for her photo.

References.

Books.

Papers.


Photo by nikita in Pexels.

Volver a posts