“Clean Code” Book Club: Chapters 5–7

Posted on Tue 11 June 2024

Continuing our book club on Robert Martin’s “Clean Code”, over the past month we’ve started to rotate who leads the discussion on each chapter. When I’m not the discussion lead, my notes have been a bit less detailed. Accordingly, posts from now on may be shorter and less comprehensive; completeness continues to explicitly be a non-goal.

In this post, I’ll cover chapters 5 (“Formatting”), 6 (“Objects and Data Structures”) and 7 (“Error Handling”).

Chapter 5: Formatting

Pick a vaguely reasonable style, use it, be pragmatic about it. That’s it.

Robert Martin discusses his personal style preferences and thinking behind them in some detail in this chapter. Overall, it’s a reasonable style; but we didn’t discuss details since that would just risk devolving into bike-shedding.

Instead, we used most of the hour-long session to talk about our experiences with automatted tools. Code formatters like black, clang-format or others may be helpful; on the other hand, automated tools are notoriously bad at being pragmatic and may be more frustrating than helpful in some cases.

Chapter 6: Objects and Data Structures

Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and have no meaningful functions. (p. 95)

Once you get used to the fact that Martin’s definition of “data structures” is different from what we typically think of as data structures, this distinction makes sense.¹ Martin then explains nicely how data structures lead to procedural code; and how this makes it easy to add new functions that operate on these data structures, without having to change existing data structures. On the other hand, Martin explains, objects enable object-oriented code, which makes it easy to add new objects without having to change existing functions.

This complementarity is quite elegant; however, we discussed in the book club that it’s not always helpful in practice. Research software is often quite exploratory, so when writing some initial code, it may be uncertain how it will be extended in the future; or it may be necessary to add both objects/data structures and functions.

The Law of Demeter

This section led to a lot of discussion and we were somewhat sceptical.

For one, method chaining is not bad per se; while it may be considered sloppy style in Java, it is a very established pattern e.g. in jQuery (which is explicitly designed to enable this).

But more importantly, we were not convinced by the law itself. According to the Law of Demeter, Martin says, the following code …

Options opts = ctxt.getOptions();
File scratchDir = opts.getScratchDir();
final String outputDir = scratchDir.getAbsolutePath();
String outFile = outputDir + "/" + className.replace('.', '/') + ".class";
FileOutputStream fout = new FileOutputStream(outFile);
BufferedOutputStream bos = new BufferedOutputStream(fout);

… is bad form, because it knows too much about the objects returned by getOptions() and getScratchDir(). But passing those returned objects as an argument into a helper function, which then knows the exact same things about these objects, would supposedly be fine. I’m sure there are more complex examples with a layered architecture, where this would make sense. But in this specific example that would just add unnecessary layers of indirection to obey a constraint that feels arbitrary.

In Martin’s defense, he doesn’t propose adding helper functions. Instead, he suggests creating another function on the ctxt object. Naïvely, one might come up with a ctxt.getAbsolutePathOfScratchDirectoryOption(); function as a direct replacement for ctxt.getOptions().getScratchDir().getAbsolutePath();. However, exposing all chained functions like that would lead to a combinatorial explosion.

Instead, Martin considers how that scratch path is actually used by the calling code, and recommends a function that specifically fulfills this use case:

String classFileName = className.replace('.', '/') + ".class";
BufferedOutputStream bos = ctxt.createScratchFileStream(classFileName);

The downside here, which Martin ignores, is that as different use cases appear, this could equally lead to an explosion of methods on the ctxt object. (And what’s more, it requires the ctxt object to be aware of any higher level code calling it and to change in response to changes in higher level code. Not a great software design.)

Chapter 7: Error Handling

Error handling is important, but if it obscures logic, it’s wrong. (p. 103)

Leaving aside some Java-specific bits that weren’t applicable to other programming languages that our group commonly uses, this was an interesting chapter that sparked some good and nuanced discussion in our book club.

Use Exceptions Rather Than Return Codes

In principle, we agreed that exceptions—where available—are preferable to return codes.

The code example for this section quite nicely separates error handling from application logic. However, it assumes that all we do in response to errors is to log them. What if we want to handle the errors in other ways, e.g. by restoring a previous state or prompting the user for input? That would by necessity mix error handling and application logic.

Write Your `Try-Catch-Finally` Statement First

We agreed it’s typically a good idea to think about the overall structure first (What am I going to do? How can it fail? How do I handle failure?), before writing the implementation details. However, the example goes far beyond that, giving step-by-step instructions on how to do test-driven development (TDD). That is rather beside the point here and doesn’t make much sense without explaining the motivations for using TDD, which will only be covered in a later chapter.

Define Exception Classes in Terms of a Caller’s Needs

Often, the caller doesn’t care and doesn’t want to know about the details of how an operation failed; it just matters whether it did, in fact, fail. Adding a wrapper class to hide away those details, as recommended in this section, works well enough in those cases. On the other hand, it would usually be preferable if the library implements a class hierarchy of exceptions. That way, callers can choose whether to look out for, e.g., a generic ConnectionError or whether to handle each ConnectionError subclass (ConnectionAbortedError, ConnectionRefusedError, etc.) individually; all without having to write their own wrapper.

Define the Normal Flow

Hm … on the one hand, this concrete example looks cleaner when using a default value. But on the other hand, picking a particular default value is tying the implementation of the ExpenseReportDAO class very closely to the higher-level business logic using it, which could lead to severe issues. For example, what if the business logic requires a different default value in some other circumstances? (E.g., when claiming expenses for conference attendences, the default is a per diem; but when claiming expenses for hosting a visitor, there is no per diem.) Or what if the per diem varies by location and needs to be entered by the admin staff manually?

If the ExpenseReportDAO class raises a MissingExpensesError and lets the higher-level business logic handle it differently depending on context, that’s all straightforward to implement. But if it returns a fixed default value automatically, handling other cases becomes much more difficult. (Perhaps even impossible, if the calling code can’t distinguish between, say, a per diem of £30 and actual expenses that happen to sum to £30.)

Some details of this chapter were very Java-specific. For example, in Python it’s common to access member variables directly, rather than defining getters and setters; there is also no enforced distinction between private and public variables. In practice, though, naming conventions (e.g. using a leading underscore for “private” variables) or documentation usually fulfill that function well enough, so the main points of this chapter apply in Python, too. ↩