The Clean Code Fundamental Series - Episode V

Disclaimer: All credits for this series go to Robert C. Martin, who gave me permission to write the content.

Coding Standard

We are firm believers in coding standards. Each team should have one and each member of the team should follow it. However we don’t believe the coding standards should be written in a separate document. The coding standard should be clearly visible inside the code itself. The code should be the coding standard.

Comments

Often coding standards defined in a separate document state that we should comment every variable, function, class and every block of code. This is far from desirable. When there are that many comments in our code we tend to ignore them, even the ones we shouldn’t.

Comments should be rare

When programmers are forced to write comments, they tend to write them because they have to, not because they need to. When comments are common, they become like the boy who cried wolf.

Comments should be rare. They should be coloured bright red so that they stand out, and every programmer who reads them should be grateful that those comments are there.

In “The Elements of Programming Style”, by Kernighan and Plauger, 1978, the authors already state that the only comments useful for a computer program are in the code itself. When the program is in error, artistic flowcharts or lengthy comments on what the code does are of no use. Additionally, the code and the comments tend to drift apart over time.

Comments are failures

It should be the intent of every programmer to write their code so well that it doesn’t need comments. If we adopt that goal, then every comment we write should be seen as a failure.

Languages like assembly are not expressive, so comments in those languages are absolutely essential. The same could be said of Pascal, Fortran and sometimes C, as these are syntactically challenged languages. It might also happen in resource-constrained environments, like those of the ’90s, where being expressive was challenging because the resource constraints dominated the structure of the code.

However modern languages like Java, C# and Ruby are remarkably expressive. Our processors are so fast and memory so cheap that any minor inefficiencies derived from making our code more expressive can almost always be ignored. Nowadays there’s no excuse for not making your code as expressive as possible.

Does it mean we never write comments? No, we sometimes do. But we shouldn’t congratulate ourselves for doing it. We should see every comment we write as a failure in expressing our intent.

Comments are lies

Comments tend to be clutter and lies. It’s very hard for comments to remain truthful for any length of time. For example, if we change a line of code somewhere in a function to fix a bug, how do we know that the change we just made hasn’t invalidated a comment somewhere else in the code, perhaps at the class level? Comments eventually rot and become lies that do more harm than good.

Good comments

Not all comments are bad. Sometimes comments can be very useful. Sometimes they are required. Let’s look at some examples:

Legal comments. When required, they need to be there. So put them at the top of the file.
Informative comments. For example a comment that explains a horrible regular expression.
Clarification and explanations of intent. When we fail to express the intent of our code well, we should write a comment explaining what that intent is and then do appropriate penance.
Warning of consequences. For example, if there’s a test function that takes a significant amount of time to run, it’s useful to warn fellow team members of the consequences of running that function.
Public API documentation. Nothing could be quite as useful as a well documented public API.
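
As an illustration of the "informative comment" category above, here is a minimal sketch. The class name, the method and the pattern are hypothetical and only serve to show a comment that earns its place by explaining a dense regular expression.

```java
import java.util.regex.Pattern;

class TimeOfDay {
    // Informative comment: matches a 24-hour time written as HH:MM:SS,
    // with hours 00-23 and minutes and seconds 00-59, e.g. "09:05:30".
    private static final Pattern HH_MM_SS =
        Pattern.compile("([01]\\d|2[0-3]):[0-5]\\d:[0-5]\\d");

    static boolean isValid(String text) {
        return HH_MM_SS.matcher(text).matches();
    }
}
```

The comment says something the pattern itself does not make obvious at a glance, and it sits right next to the code it describes.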

Bad comments

Mumbling. Don’t talk to yourself in the comments of your code.
Redundant explanations. When you write a comment, please make sure it adds something new. Comments on getters, setters and variables that simply restate the declaration are meaningless: they add clutter and make the code more difficult to understand. Don’t say what the algorithm does if this is perfectly clear from the code.
Mandated redundancy. For example having Javadocs on a method that simply state what’s clearly obvious from the method signature. If you have tools like Checkstyle that mandate these comments, turn them off. When we see a comment that is either wrong or misleading we should either fix it or delete it.
Journal comments. These are comments that list the dates of every change to the code. Today we have source code management repositories, like Git. Let’s use them and their logs if we want the change history. Some IDEs also offer local history.
Noise comments. Like a “Default constructor” comment. I know the language, thank you. I know what a default constructor looks like.
Position markers and big banner comments. Big banner comments are a great way to make sure that the words inside them are never read. That’s because big banner comments draw attention to things that we would rather not pay attention to. For example, a big banner comment that says: “Here are the default constructors.”
Closing brace comments. There was a time in the ’80s when this was useful. However, today’s IDEs offer colour matching for braces and make sure we close our braces and parentheses appropriately.
Attributions. For example /* Added by Rick */. Source Code Management systems can offer this information.
HTML in comments. For example to make some text bold or with a certain presentation style. This makes comments unreadable.
Non local information. For example a comment which states which port a certain server will run on. Comments about parts of the system that are far away from the code will rot quickly. If we have to write a comment we must make sure it sits right next to the code it describes.
Commented out code. This is the worst of all comments. When you see commented out code, you must delete it. If the reason for having it there was to keep a record of what the code once was, there is the Source Code Management system for that.
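
To make a few of these categories concrete, here is a small hypothetical before-and-after sketch. Both classes are invented for illustration; the first one collects several of the bad comments listed above, the second one is the same code with the comments deleted.

```java
// Before: redundant, noise, attribution, closing brace and commented out code.
class NoisyAccount {
    /** The balance in cents. */
    private long balanceInCents;

    /** Default constructor. */
    NoisyAccount() {
    }

    // Added by Rick
    /** Gets the balance in cents. */
    long getBalanceInCents() {
        return balanceInCents;
    } // end getBalanceInCents

    // long getBalance() { return balance; }
}

// After: identical behaviour, and no comment left to rot.
class Account {
    private long balanceInCents;

    long getBalanceInCents() {
        return balanceInCents;
    }
}
```

Nothing was lost in the second version: the names already say what the comments said, and the history lives in the Source Code Management system.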

Explanatory structures

Rather than writing comments to try and explain what your code does, make your code read like well written prose. Choose the correct parts of speech for your names and compose readable sentences in your code.
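
A minimal sketch of this idea, assuming a hypothetical Subscription class: instead of a comment such as "check whether the subscription is still active" above a boolean expression, the expression gets a name that reads as part of a sentence.

```java
import java.time.LocalDate;

class Subscription {
    private final LocalDate expiryDate;
    private final boolean cancelled;

    Subscription(LocalDate expiryDate, boolean cancelled) {
        this.expiryDate = expiryDate;
        this.cancelled = cancelled;
    }

    // The name carries the explanation a comment would otherwise give.
    boolean isActiveOn(LocalDate date) {
        return !cancelled && !date.isAfter(expiryDate);
    }
}
```

A call site such as `if (subscription.isActiveOn(today))` then reads close to plain English, with a noun for the object and a predicate for the boolean method.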

File Size

How big should a source file be? Project size and file size aren’t related to each other. Big projects don’t imply big files. By using TDD and Clean Code it’s possible to have big projects with small file sizes. The ideal sizes are somewhat in these ranges:

The average file size should be 50-60 lines of code
Some files could be around 200 lines of code
The largest files should be around 500 lines of code and they should be outliers.

Horizontal Formatting

How long should a line of code be? There is a simple rule for this: we should never have to scroll to see the entire line. A useful set of conclusions about this:

We like lines that are about 30-40 characters wide
We really don’t like lines that are longer than 80 characters
A reasonable rule seems to suggest that lines should be less than 100-120 characters.

Indentation

The code that comes out of a team should look like the team wrote it. Whatever indentation styles have been agreed by the team, every team member should follow them. Whether it’s tabs or spaces, how many, where to write braces, etc. There are no hard and fast rules. The team needs to sit down and decide together an indentation style and members of the team need to commit to follow it.

Please avoid reformatting code to your personal style every time you check the code out and then reformatting it to the team’s style when you check it back in. Also, please avoid setting up reformatting to run automatically on each check-in.

It’s OK to use an IDE to reformat the code to the team’s agreed style. In that case it’s best if the IDE has a configuration file that every member of the team should use. However if you do this, make sure to reformat the code in small snippets. Don’t reformat the entire file because that creates chaos with the Source Code Management system and makes merges a nightmare.

Classes

What is a Class? We write a Class by declaring private variables and then manipulate them with public functions. So from the outside in, a Class appears to have no variables at all. Since an Object is an instance of a class, then it’s also true that from the outside in an object appears to have no variables, or in other words, from the outside in, an Object appears to have no observable state. One might object to this by stating that a Class might have getters and setters for its private variables. This is bad design. In the end, why would you define the state of an Object as private variables if you were to expose them through getters and setters?

Think about what we discussed in the previous episode, tell don’t ask, and how this is useful in functional programming. In the end, if you don’t expose variables in an Object, while it’s very easy to tell the object what to do, it doesn’t make a lot of sense to ask it for anything.

Think of it this way: since methods manipulate variables in the class, the more variables a method manipulates, the more cohesive that method is. The more cohesive methods a Class has, the more cohesive the Class is. Getters and Setters are not cohesive because they only manipulate one variable. Therefore, the more getters and setters a Class has, the less cohesive it is.

Does that mean that a Class should never have getters and setters? No, as dogmatic rules rarely apply in software engineering. However, we should try to minimise them because we want to maximise cohesion. In those instances when we decide to have a getter, we don’t want to expose the variable as is, but rather to abstract the information that is returned.

Imagine the following example. Let’s say we have a Class called Car which has a private variable called gallonsOfGas and that we wanted to expose that information to the outside world:
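
The original listing is not reproduced here. A minimal sketch of what such a class might look like, assuming a hypothetical constructor and a field of type double:

```java
class Car {
    private double gallonsOfGas;

    Car(double gallonsOfGas) {
        this.gallonsOfGas = gallonsOfGas;
    }

    // Exposes the private variable exactly as it is stored: every caller
    // now knows that a Car burns gas and that it is measured in gallons.
    double getGallonsOfGas() {
        return gallonsOfGas;
    }
}
```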

Now let’s assume that we had derivatives of that class, say DieselCar and ElectricCar:
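
Again as a hedged sketch rather than the original listing, the derivatives could look like this. The base class is repeated in compact form so the example compiles on its own; the constructors are assumptions.

```java
class Car {
    private final double gallonsOfGas;

    Car(double gallonsOfGas) {
        this.gallonsOfGas = gallonsOfGas;
    }

    double getGallonsOfGas() {
        return gallonsOfGas;
    }
}

class DieselCar extends Car {
    DieselCar(double gallonsOfDiesel) {
        // Diesel has to be passed off as "gas" to satisfy the base class.
        super(gallonsOfDiesel);
    }
}

class ElectricCar extends Car {
    ElectricCar() {
        // There is no tank at all, so the only possible answer is zero.
        super(0.0);
    }
}
```

Both derivatives inherit getGallonsOfGas(), even though the question makes little sense for a diesel car and none for an electric one.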

That is just plain wrong. The base class exposes an awful lot of implementation detail that gets propagated down the inheritance structure, and it doesn’t correctly express what the derivatives do. This is where abstraction comes into play. For example, if we were to rename the base method to getPercentFuel(), then it could be implemented sensibly in any derivative.
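
One possible shape for that abstraction is sketched below. Only the name getPercentFuel() comes from the text; GasCar, the capacity fields and the constructors are assumptions made for the example.

```java
abstract class Car {
    // A question every kind of car can answer: how full is it, from 0 to 100?
    abstract double getPercentFuel();
}

class GasCar extends Car {
    private final double gallonsOfGas;
    private final double tankCapacityInGallons;

    GasCar(double gallonsOfGas, double tankCapacityInGallons) {
        this.gallonsOfGas = gallonsOfGas;
        this.tankCapacityInGallons = tankCapacityInGallons;
    }

    @Override
    double getPercentFuel() {
        return 100.0 * gallonsOfGas / tankCapacityInGallons;
    }
}

class ElectricCar extends Car {
    private final double chargeInKwh;
    private final double batteryCapacityInKwh;

    ElectricCar(double chargeInKwh, double batteryCapacityInKwh) {
        this.chargeInKwh = chargeInKwh;
        this.batteryCapacityInKwh = batteryCapacityInKwh;
    }

    @Override
    double getPercentFuel() {
        return 100.0 * chargeInKwh / batteryCapacityInKwh;
    }
}
```

A client holding a Car can now ask the same question of any derivative without learning whether the answer comes from gallons or kilowatt-hours.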

The less implementation we expose, the more opportunity we have to make polymorphic classes. Polymorphism allows us to protect clients of our classes from changes in the implementation of such classes.

Data Structures

A data structure is kind of the opposite of a class. It has a bunch of data variables, but it has no methods.

It might go a bit too far to say that Data Structures have no methods. They can have methods, but these typically are getters, setters and navigation aids. The methods of a Data Structure manipulate individual variables. The methods of a data structure expose implementation: they don’t hide it and they don’t abstract it. You can’t tell a Data Structure to do anything. All you can do is ask a question. The software that manipulates Data Structures is the antithesis of tell, don’t ask.

Consider the following code as an example:

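import java.util.Calendar;

// A data structure: public fields and no behaviour.
public class Employee {
  public enum Type {HOURLY, SALARIED}

  public Type type;
  public String firstName;
  public String lastName;
  public Calendar dob;
  public String ssn;
}

class Utilities {
  // Every operation on an Employee has to switch on its type.
  void print(Employee e) {
    switch (e.type) {
      case HOURLY:
        printHourlyEmployee(e);
        break;
      case SALARIED:
        printSalariedEmployee(e);
        break;
    }
  }

  // Placeholder bodies: the type-specific printing is not shown in the original.
  void printHourlyEmployee(Employee e) {
    System.out.println("Hourly: " + e.firstName + " " + e.lastName);
  }

  void printSalariedEmployee(Employee e) {
    System.out.println("Salaried: " + e.firstName + " " + e.lastName);
  }
}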

So Data Structures and switch statements are related the same way classes and polymorphism are related. When we see a switch statement we can be pretty much sure there is a Data Structure lurking behind it. In the previous episode, we said we don’t like switch statements: while it’s true they are not object-oriented, when it comes to Data Structures they offer some protection, although of a different kind. When it comes to classes and objects, polymorphism protects clients of those classes from new types. However, if we add a method to a base class, all clients of that class and all the children of that class must be recompiled and redeployed, breaking independent deployability.

Data Structures are immune to the addition of new functions. When we add a new function, all we have to do is add a new switch. Nothing else needs to change or be recompiled.
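
A small sketch of that claim, using a trimmed copy of the Employee data structure. The Badges class and its badgeLabel function are hypothetical; the point is that adding them touches neither Employee nor any existing function.

```java
// Trimmed copy of the data structure, unchanged by the new function.
class Employee {
    enum Type { HOURLY, SALARIED }

    Type type;
    String firstName;
    String lastName;
}

// A brand-new operation: it only needs its own switch.
class Badges {
    static String badgeLabel(Employee e) {
        switch (e.type) {
            case HOURLY:
                return e.lastName + " (hourly)";
            case SALARIED:
                return e.lastName + " (salaried)";
            default:
                throw new IllegalArgumentException("Unknown type: " + e.type);
        }
    }
}
```

The price shows up when a new Type constant is added: every such switch, including this one, has to be found and extended.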

Notice how these two patterns are opposite to each other:

Classes protect us from new types but expose us to new methods
Data Structures protect us from new methods but expose us to new types
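
The class side of this trade-off can be sketched as follows. All names here are invented for illustration. Adding ContractEmployee requires no change to Employee, to the other subclasses or to the client; adding a new abstract method to Employee, on the other hand, would force every subclass to change.

```java
abstract class Employee {
    protected final String name;

    Employee(String name) {
        this.name = name;
    }

    abstract String describe();
}

class HourlyEmployee extends Employee {
    HourlyEmployee(String name) {
        super(name);
    }

    @Override
    String describe() {
        return name + " (hourly)";
    }
}

class SalariedEmployee extends Employee {
    SalariedEmployee(String name) {
        super(name);
    }

    @Override
    String describe() {
        return name + " (salaried)";
    }
}

// A new type: nothing that already exists has to be edited.
class ContractEmployee extends Employee {
    ContractEmployee(String name) {
        super(name);
    }

    @Override
    String describe() {
        return name + " (contract)";
    }
}

// A client that depends only on the base class.
class EmployeePrinter {
    static String print(Employee e) {
        return e.describe();
    }
}
```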

That raises an interesting question: is there a way to get protection from both, i.e. new methods and new types?

This is called the Expression Problem and there are good solutions to it. We will be exploring some of these solutions in a future episode on Design Patterns.

For now, the key to independent deployability is to know which form to use and when:

We use classes and objects when it’s types that are more likely to be added
We use data structures and switch statements when it’s methods that are more likely to be added

Boundaries

In the last episode we talked about how to separate main from the rest of the application.

We also talked about how all source code dependencies should cross that boundary going from main towards the application. This makes main a plugin to the rest of the application. This is a specific case of a much more general rule. The separation of main and application boundaries is just one example: other examples are the boundaries that separate views from models and the boundary that separates the database from domain objects.

For each boundary, one side is concrete, the other is abstract. Whenever we have boundaries like these we want all source code dependencies pointing away from the concrete partition, towards the abstract partition. For example, consider the boundaries between the database and the domain objects:

The database is concrete, the domain objects are abstract (an abstract representation of the data in the database). We want all source code dependencies to point from the database to the domain objects. In other words, the database depends on the domain objects; the domain objects do not depend on the database.

For years software engineering experts have told us that we should write applications that do not depend on the database. This is very good advice.

The last thing we want to see is a bunch of SQL statements scattered through our application code. We especially don’t want to see SQL code in our views!

The database interface layer will depend on the database. What’s not always clear, though, is that the application should not depend on the DB interface layer. The application is abstract and the interface layer is concrete, therefore the DB interface layer should depend on the application layer. This might seem absurd at first, but let’s remember that object-oriented programming allows us to invert the source code dependency without inverting the flow of control. This means that the application can still call the DB interface layer without knowing that the layer actually exists. If the application doesn’t depend on the interface layer, it can’t have any knowledge of the database at all. It doesn’t know about the table names, the column names or any other database-specific information.
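
A minimal sketch of that inversion, with hypothetical names throughout. The application owns the CustomerGateway interface and calls through it; the concrete layer implements it. To keep the example runnable without a database, an in-memory map stands in for what would be a SQL-backed implementation in a real system.

```java
import java.util.HashMap;
import java.util.Map;

// Application side (abstract): owns the interface, knows nothing about tables.
interface CustomerGateway {
    String findNameById(int customerId);
}

class GreetingService {
    private final CustomerGateway customers;

    GreetingService(CustomerGateway customers) {
        this.customers = customers;
    }

    // Control flows from the application into the concrete layer at runtime...
    String greet(int customerId) {
        return "Hello, " + customers.findNameById(customerId) + "!";
    }
}

// DB interface layer (concrete): ...but the source code dependency points
// the other way, from this class to the application's interface.
class InMemoryCustomerGateway implements CustomerGateway {
    private final Map<Integer, String> namesById = new HashMap<>();

    void add(int customerId, String name) {
        namesById.put(customerId, name);
    }

    @Override
    public String findNameById(int customerId) {
        return namesById.get(customerId);
    }
}
```

GreetingService compiles without any reference to InMemoryCustomerGateway, so the storage mechanism can be swapped without touching the application.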

Consider this: a database table is a data structure. It has data and no methods. The database tables are so concrete they don’t have a chance to be polymorphic.

The Impedance Mismatch

The impedance mismatch between databases and object-oriented programming is an old topic. The mismatch is that a database row is not an object. It’s the opposite of an object: it’s a data structure, and we can’t force a data structure to be an object. So what about object-relational mappers, like Hibernate? While they’re useful and serve a purpose, they are not truly object-relational mappers. What these tools really are is relational-table-to-data-structure mappers. Hibernate, for example, is a good tool for taking a data structure in the database and moving it to a data structure in memory. What we like on the application side of the boundary are domain objects. The methods of these domain objects will be business rules. It turns out that when we put a bunch of business rules in domain objects, not surprisingly, we end up with classes that don’t look like the database tables or the database schema.

On the application side of the boundary we can separate ourselves from the enterprise database schemas by designing the objects we actually want to use. These will be true objects with exposed methods and hidden data. This will make the application much more natural and easy to understand. Rather than manipulating table rows, we’ll be manipulating business objects.

On the Application side we’d like to see a series of business objects and an interface that allows the DB interface plugin to be plugged in. On the DB Interface layer, in addition to the interface implementation, we’d like to see a series of data structures representing the data retrieved from the database.

We would like those business objects to use those interfaces to access the data they need. On the other side of the boundary, we’d like to see a number of classes implementing those interfaces and implementing data access methods by interrogating data structures that have been fetched from the database. So the DB interface layer depends on the application through a bunch of inheritance relationships.
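
The arrangement described above might be sketched like this. Every name is hypothetical: AccountRow plays the data structure fetched from the database, Account is the business object with hidden data and a business rule, and RowBackedAccountGateway is the class on the DB side that implements the application's interface.

```java
import java.util.HashMap;
import java.util.Map;

// DB interface layer: a plain data structure mirroring one table row.
class AccountRow {
    int id;
    long balanceCents;
    long overdraftLimitCents;
}

// Application side: the interface the application owns.
interface AccountGateway {
    Account load(int accountId);
}

// Application side: a business object with hidden data.
class Account {
    private long balanceCents;
    private final long overdraftLimitCents;

    Account(long balanceCents, long overdraftLimitCents) {
        this.balanceCents = balanceCents;
        this.overdraftLimitCents = overdraftLimitCents;
    }

    // Business rule: the balance may not drop below the overdraft limit.
    boolean withdraw(long amountCents) {
        if (balanceCents - amountCents < -overdraftLimitCents) {
            return false;
        }
        balanceCents -= amountCents;
        return true;
    }
}

// DB interface layer: implements the application's interface by
// interrogating rows that were already fetched from the database.
class RowBackedAccountGateway implements AccountGateway {
    private final Map<Integer, AccountRow> fetchedRows = new HashMap<>();

    void addFetchedRow(AccountRow row) {
        fetchedRows.put(row.id, row);
    }

    @Override
    public Account load(int accountId) {
        AccountRow row = fetchedRows.get(accountId);
        return new Account(row.balanceCents, row.overdraftLimitCents);
    }
}
```

Account does not look like AccountRow: the row exposes its columns, while the object exposes only what it can be told to do.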

The same can be applied to Views and Application. Views are concrete and the Application is abstract. This means that the View partition source code dependency must point towards the application.

The views should know about the application. The application should not know anything about the views. This rule about boundaries is fundamental to good object-oriented design and it’s fundamental to good software design. Source code dependencies should point towards the abstractions and away from concrete implementations. This approach is called the dependency inversion principle, which we will delve into quite a bit in future episodes.

