The Clean Code Fundamental Series - Episode IV

Disclaimer All credits for this series go to Robert C Martin, who gave me permission to write the content.

Arguments

How many arguments should a function have? In general, fewer is better. Function arguments are hard to read and hard to understand. So rather than using function arguments as a convenience, we should apply a significant amount of restraint and discipline when deciding which arguments should our function accept. We should treat each function argument as a liability, not as an asset. Zero argument is best, one is OK, two is kinda of OK, three should be an exceptional case. With three or more arguments it’s difficult to start understanding the order. Besides if three or more arguments are so cohesive, why aren’t they part of an object?

No Boolean arguments Ever

Generally when one passes a boolean into a function, they’ve just declared loudly to the world that they have written a function that does two things: it does one thing for the true case and it does one thing for the false case. This violates one of the key principles about functions: functions should do one thing, should do it well and should do it only. Instead we should have written two functions: one for the true case and one for the false case.

Understanding the meaning of a boolean function, unless the method and argument names make this perfectly clear, can lead to confusion and error.

Consider this code:

void doWakeResponse(context, request, true) {}

What does true mean?

Passing in two booleans is even worse: passing in two booleans means the function does four things.

Output arguments

No output arguments please! People just don’t expect data to come out of a function argument. When readers look at a function argument, they expect data to come in the function through the argument, not to come out of that argument. if you need data to come out of a function, pass it via the return value of the function, not through a function argument. Consider this function:

// Does 'returnText' go in? Out? Both?
private String toSimpleText(Parse table, StringBuffer returnText) {
  if (table.parts == null) {
    simpleTextOfLeave(table, returnText);
    simpleTextOfMore(table, returnText);
    return returnText.toString();
  }
  simpleTextOfParts(table, returnText);
  simpleTextOfMore(table, returnText);
  return returnText.toString();
}

What’s the role of returnText? Will it be modified in the function? Is it read-only? It’s difficult to say and therefore to understand.

The Null Defense

Passing a null into a function or writing a function that expect null to be passed in is almost as bad as passing a boolean into a function.

Null might in fact be worse than using a boolean. With booleans at least is clear that there are two values and therefore two behaviours. With null, while this is the case, it’s not clear. With null, there is a behaviour for the non-null value and a behaviour for the null value. What we should do is write two functions: one for the non-null case and one without that argument at all.

Besides this leads to defensive programming, which is considered a smell. It clutters the code with ifs and non-null and error checks. It is a symptom of a team allowing incorrect values to be passed in functions and that is not conducive of a high level of trust in our colleagues.

Defensive programming leads to offensive code. It means you don’t trust your team of your team’s unit tests.

Of course in public APIs we need to program defensively because who knows what people will be passing into our functions.

However for functions in the system and written by my team, a better policy is that a good defence comes from a good offence and a good offence is a good suite of tests.

The Stepdown Rule

When an author writes a newspaper or magazine article they follow a simple rule: important stuff goes at the top, details go to the bottom. There are two benefits to this: the first is that an editor who needs to fit the article without missing any of the essence of it, can simply take the stuff at the top (public API anyone?); the second is that readers who want to read the article can start at the top and go towards the bottom until they get bored and then stop.

So a good practice is to write all public methods at the top and private methods at the bottom. The more we go down on the listing the more details should become visible. So private functions should he higher than other private functions they invoke and so on. We don’t want any backwards references. We want all functions to be pointing down the listing.

As readers go down on the list they move from higher levels of abstraction to lower levels of abstractions.

Switches and Cases

As a general rule, we don’t like switch statements. They are not Object-Oriented. One of the big benefits of Object-Oriented is the ability to manage dependencies.

Imagine there are two modules: Module A and Module B and A invokes a function on B.

There are clearly two types of dependencies:

At runtime A must have access to B
At compile time A must also have some source code dependency on B

It’s this Source Code Dependency that we are interested in. What if one wants to deploy Module A and Module B separately? If Module A has a source code dependency on Module B, then the two cannot be deployed separately. In fact the two can’t even be compiled separately. Any changes to Module B will force a recompile and redeployment of Module A.

Object-Oriented allows us to do some useful. We can invert the source code dependency while keeping the runtime dependency, by introducing a polymorphic interface.

The Source Code Dependency of Module B now opposes the flow of control thus allowing us to deploy Module A and B separately. Module A uses the Interface, while Module B derives from the Interface. Module B now can plug in into Module A. In fact there now can be many Module Bs that can plug in into Module A and Module A’s source code has no knowledge of them at all. Moreover, changes to Module B do not require Module A to be recompiled and redeployed. That’s just one of the things that’s good about Object-Oriented: independent deployability.

A switch statement is the antithesis of independent deployability.

Each case in the switch statement is likely to have a dependency on an external module. This is known as the fan-out-problem. In a switch statement the source code dependencies point in the same direction as the flow of control and, as explained earlier, this leads to tightly coupled modules that do not allow for independent deployability. The switch statement depends on all downstream modules. If any of the modules changes, there is an impact on the switch statement and anything that depends on it, like the App that customers use.

When we see a switch statement we have two options:

Invert the dependencies with polymorphism
Move the switch statement to a place where it can’t do any harm.

It’s not that difficult to replace a switch statement with polymorphism. The trick is to take the argument of the switch, which is some kind of type code, and replace it with an abstract Base class, that has a method that corresponds to the operation being performed by the switch. Then each of the cases of the switch becomes a derived class that implements the method in the abstract class.

Notice that the Flow of control / runtime dependency remains the same but the Source Code Dependency now opposes it, with the modules depending on the Abstract Class, because the sub-types depend on the Abstract class. The only thing to figure out now is how and where to create those instances. Typically we’ll do that in some kind of factory and that brings up the interesting topic of main.

In every application we write, we should be able to draw a line between the low-level details and the core functionality of the application.

In the Application partition we normally have most of the code of the application. In the main partition we have low-level stuff, like factories, configuration data and the main program. The main partition should be kept small and sub-division should be limited.

The dependencies between these two partitions should point in one direction and one direction only: They should cross the line pointing towards the application. The main partition should depend on the application. This is a technique called dependency injection. The trick to dependency injection? Clearly define and maintain the partitioning. If using a framework, e.g. Spring, go ahead and use it, but only inject a few entry points from main into the rest of the application. Then let main do the rest of the work with factories and strategies.

Switch statements that live in the main partition usually do no harm, so long as all the cases point towards the main partition.

If your application is composed of a set of independently deployable plugins, as it should be, then all these plugins should know about the central core of the application, and the central core shouldn’t know anything about the plugins. All source code dependencies should point inwards, from the plugins towards the core of the application.

Independent deployability doesn’t mean that we want to deploy each module separately. In fact in most cases we’ll ship many modules into a single deployable. However independent deployability means also independent developabilityand that’s a goal we should all strive for.

Long chains of if/else statements present the same problems as the switch statement does. We can fix these issues by applying polymorphism the same way we did with the switch statement. This also helps with keeping functions small as discussed in one of the previous episodes.

Paradigms

Over the past 50 years we have seen three main revolutions in the software industry which have stuck:

Functional Programming paradigm
Structured Programming paradigm
Object-Oriented Programming paradigm

Functional Programming

This was the first of the paradigm that was invented. It traces back to 1957 with the advent of Lisp. Ironically it was the last of the paradigms to become popular.

So what is Functional Programming?

The Functional Programming paradigm tells us to write programs without any assignment statements. Instead of setting a bunch of values into variables, you pass those values as arguments into functions. Instead of looping through a set of variables, you recurse through a set of function arguments. The upside to this is that a function in functional programming is a true mathematical function. The value of a function only depends on its input arguments and not on any other state in the system. Every time one calls that function with the same inputs they get exactly the same output. There are no side-effects.

Side effects

When a function changes the value of a variable that outlives the scope of the function (e.g. an instance variable, like the elements variable below) then that function has a side effect. This characteristic makes programs difficult to understand and it’s a persistent source of errors.

public void push(int element) {
  if (size == capacity)
    throw new Overflow();
  elements[size++] = element;
}

Normally these kind of functions come in pairs: open/close, new/delete. They need to be called in order: this is known as temporal coupling. E.g, open must be called before close and new must be called before delete.

Can temporal coupling be eliminated? Often yes. You can move from this code:

public void open(File f, FileCommand c) {
  f.open();
  c.process(f);
  f.close();
}

public void open(myFile, new FileCommand() {
    public void process(File f) {
      //process file here
    }
  }
}

This is a technique called passing a temporal block. You hide the second function into the first, pass the command into the first function. Our goal is not to eliminate side effects: writing to a database, printing to a queue, etc. are all desirable side effects. Out goal is to impose discipline upon where and how the side effects happen.

Command Query Separation (CQS)

One very successful discipline for managing side effects is to create a strong separation between commands and queries. In this discipline, a command changes the state of the system and returns nothing, while a query retrieves state of the system without changing its state. An obvious example of this are getters and setters. Getters are queries, setters are commands.

CQS formalises this: it says that functions that change state should not return values. Functions that return values should not change state.

While this might seem trivial at first, applying CQS has some distinct advantages. Consider this code:

int f(); //Query

void g(); //Command

Functions that return a value have no side-effects; functions that have a void return type (they don’t return anything) have side-effects. In contrast, consider this code:

User u = authoriser.login(username, password); //Query and Command mixed together

This function clearly changes the state of the system as now there’s a new logged in user. It also returns that user to us. What are we supposed to do with it? Manage it? Are we now smack in the middle of a big temporal coupling? Shouldn’t the authoriser keep track of the user and manage it? We should be able to do this:

User u = authoriser.getUser(username); // Query

Why did login return a User in the mixed version? Maybe the function author meant to return a null if the user could not be logged in. It’s better instead to throw an exception.

Tell, don't ask

An extreme form of CQS tells us to avoid queries altogether. Consider this code:

if (user.isLoggedIn())
  user.execute(command);
else
  annunciator.promptLogin();

How many times have we seen code like this? Wouldn’t it be better to have code like this?

try {
  user.execute(command);
}
catch (User.NotLoggedIn e) {
  annunciator.promptLogin();
}

This would be even better if we let the user deal with the problem entirely:

user.execute(command, annunciator); // Tell, don't ask

In the end, it’s the User object which knows if the user is logged in or not. That state belongs to the User class. Why would we take the state out and make decisions here? We should let the User deal with the problem.

Tell don’t ask it’s a rule that simply says that we should tell objects what to do and not to ask them what their state is. We don’t ever want to ask an object for their state and then make decisions on that object’s behalf. The object knows its own state and can make its own decisions, thank you.

We have all seen code like this:

o.getX()
    .getY()
      .getZ()
        .doSomething(); // This is knows as "train wreck"

The above code is a clear violation of Tell don’t ask because it asks and it asks and it asks before asking the object to do something. This code has too much knowledge of the navigation structure of the system: it knows that o has an X, X has a Y, Y has a Z and Z can do something. One of the advantages of Tell don’t ask is that we can significantly reduce the number of queries, which can get out of hand pretty quickly. Wouldn’t it be better if the code was something like:

o.doSomething();

Train wrecks, which are long chains of queries, violate something called the “The Law of Demeter”. This law tells us that it’s a bad idea for single functions to know the entire navigation structure of the system.

We want our functions to have a very limited amount of knowledge. What we want is to tell our neighbouring objects what to do and depend on them to propagate that message outwards to the appropriate destinations.

The Law of Demeter

You may call methods of objects that are:

Passed as arguments
Created locally
Instance variables
Globals

You may NOT call methods on objects that are:

Returned from a previous method call

Biological systems are an extreme example of tell don’t ask. Cells don’t ask each other questions: they tell each other what to do. This means that all of us are an example of tell don’t ask.

Structured Programming

Structured programming is the youngest of the programming paradigms. Functional Programming dates back in 1957 with the introduction of Lisp. Then Object Oriented Programming came next, between 1962 and 1968. Structured Programming only came to its fruition when Dijkstra published his seminal paper, “Go to Considered Harmful” in 1967.

Structured Programming tells us that all programs should be composed out of three basic operations: Sequence, Selection and Iteration.

Sequence is simply the arrangement of two blocks in time. The exit of the first block feeds the entry of the second. Of course there might any number of blocks in a Sequence.

Selection is just a boolean expression (predicate) that splits the flow of control into two paths, leading to a block for the True case and one block for the False case. Then the two paths rejoin and there is a single exit.

Iteration is simply the continuous repetition of a block until some boolean predicate is satisfied.

Dijkstra demonstrated that if one can write their programs following this structure, it’s possible to reason about them sequentially. This is because each of the structures correctness does not depend on the others. He also demonstrated that if one doesn’t violate the recursive block structure with un-constrained go-tos, one could construct proofs of correctness. Our goal is not usually to construct such proofs. However let’s think about this: if it’s possible to construct proofs that our code is correct, then it’s possible to understand our code.

A key feature of the structures above is that each structure has a single entry point and a single exit point. As we build algorithms following these structures they also have a single entry and exit point. As we build modules out of algorithms, modules too will have a single entry and exit point. As we compose systems out of modules, systems will have a single entry and exit point.

Early Returns

Does the single entry / single exit rule mean one can’t have multiple returns from a function? No, not at all. Consider this code:

private boolean nameIsInvalid(String name) {
  if (name.equals("")
    return true;
  if (!WikiWordWidget.isSingleWord(name) 
    return true;
  return false;
}

Early and guard returns simply take you to the end of the function where the exit is. However there is a small problem in early returns from loops. Consider the following code:

while (enumeration.hasMoreElements() && (i < length) {
  NameClassPair ncPair = (NameClassPair) enumeration.nextElement();
  String name = ncPair.getName();
  if (!name.endsWith("jar"))
    continue;
  if (!name.equals(jarNames[i]) {
    return true;
  }
  i++;
}

Mid-loop returns add an unexpressed and indirect exit condition. While this doesn’t violate structure, it makes the loop a lot more complicated.

continue inside a loop is not problem at all. It’s equivalent to a no-op block and then the loop resumes. break on the other hand is a bit more problematic. A break creates an indirect and unexpressed exit condition. Again they don’t violate structure but they do make the loop constraints a lot more complicated. Why does this matter? Anything we do to make our code harder to sequentially reason about matters. Our first responsibility is to our readers. It’s more important to make the code understandable than to make it work. Additionally, if we keep our functions very small as described in Episode III, it’s veery hard to have mid-loop breaks and returns.

Error Handling

Michael Feathers once wrote that error handling is important but if it obscures logic, it’s wrong.

It’s always best to write the error handling code before one writes the rest of the code.

Exceptions are for callers

What exceptions should we throw? We don’t like exceptions canned from the Java library. We like exceptions that are scoped to the class throwing them and we want them to be named with as much specific information as possible. This means that typically they will be scoped and defined within the class throwing them.

Use unchecked exceptions

Exceptions should derive from RuntimeException in Java or from an unchecked type if in other languages. The experiment with checked exceptions is over and it failed. It failed in C++ and it failed in Java. The problem with checked exceptions is that it creates a reverse dependency: a dependency up the inheritance hierarchy instead of down. If one has a method in a class deriving from a base class throwing a checked exception, the base class method needs to change signature and. so do all users of that class which now must be recompiled and redeployed. This violates independent deployability and it breaks the open close principle, which will be discussed in a future episode.

What message should we put in these exceptions? We like exceptions with no message at all. If the exception is rightly scoped and meaningfully named, there shouldn’t be any need for messages.

A simple rule about try

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut elit tellus, luctus nec ullamcorper mattis, pulvinar dapibus leo.

If we must use try within a function these are the rules we like to observe:

try should be the first line in the function
The body of the try should be a single function call
The catch and finally blocks should be the last statements in the function

In the end, if we follow the rule of having small functions, the above makes sense.

The Clean Code Fundamental Series – Episode IV – Function Structure

Table of Contents