“Clean Code” Book Club: Chapter 2, Meaningful Names
Posted on Thu 18 April 2024
After last week’s introduction, in the second session of this book club we dive into the main content of the book. And, if you believe the old joke, we start with one of the hardest chapters:
There are two hard problems in computer science: Naming things, cache invalidation and off-by-one errors.
(As usual, this post collects most of my notes and some points from our group discussion; completeness is explicitly a non-goal.)
Chapter 2: Meaningful Names
Like about half of the chapters in this book, this one was not written by Robert Martin, but by one of his collaborators; in this case, Tim Ottinger.
In general, my impression of this chapter was fairly uneven. It discusses many important aspects of naming, formulates clear guidelines (some of which I was already aware of, others I’ll need to keep in mind) and is very conscious of how context-dependent naming is. Superficially, some guidelines may seem contradictory; to me, this is not a flaw in the writing but an acknowledgement that naming choices involve tradeoffs, and that being a good developer simply means being better at making reasonable tradeoffs. Unfortunately, several examples seemed unhelpful or even actively misleading to me. [1]
Use Intention-Revealing Names
If a name requires a comment, then the name does not reveal its intent. (p. 18)
For example, naming a variable d does not make the intent clear, whereas naming it maxAgeInDays does.
This is such an obvious, basic guideline; and yet I’ve been guilty of breaking it. (Though if we’re honest, so has every programmer, right?)
Now, if this applies to naming function arguments as well, does that mean we shouldn’t (need to) describe arguments in the function’s docstring?
We discussed this a bit in the book club; and we think that would be an overly broad interpretation of this guideline.
For example, a function may take an argument maxAgeInDays but have non-obvious behaviour for special values like 0. Such larger context belongs in a comment; the basic intent of the argument is still clear from its name alone.
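To make this division of labour concrete, here is a minimal sketch. The function removeStaleEntries, its entries argument and the meaning given to 0 are all made up for illustration; only the argument name maxAgeInDays comes from the text above. The name states the basic intent, while the docstring documents the special case that no name could reasonably carry:

```python
def removeStaleEntries(entries, maxAgeInDays):
    """Return the entries that are at most maxAgeInDays old.

    Special case (hypothetical): maxAgeInDays == 0 disables the age
    check and keeps every entry, instead of discarding all of them.
    """
    if maxAgeInDays == 0:
        return list(entries)
    return [entry for entry in entries if entry["ageInDays"] <= maxAgeInDays]
```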
Here’s a longer example: [2]
# Bad: Names don’t reveal meaning of the variables.
# It isn’t clear what the code is doing at a high level.
# (theList is assumed to be defined elsewhere.)
def getThem():
    list1 = []
    for x in theList:
        if x[0] == 4:
            list1.append(x)
    return list1

# Better: High-level intent is clear from reading code.
# (gameBoard and CellStatus are assumed to be defined elsewhere.)
def getFlaggedCells():
    flaggedCells = []
    for cell in gameBoard:
        if cell.status == CellStatus.FLAGGED:
            flaggedCells.append(cell)
    return flaggedCells
Avoid Disinformation
The first example in this section recommends not using hp as a variable name for a hypotenuse, since it could also be read (alongside aix and sco [3]) as the name of a Unix variant.
I don’t think this is a great example.
For one, I struggle to imagine a context where there is serious doubt whether a variable refers to a hypotenuse or to HP-UX.
But also, hp is a fairly bad variable name for other reasons anyway: it’s cryptic, hard to pronounce, … If you do need a name for a hypotenuse, at least make it hyp!
The following example (don’t call a variable accountsList if it’s not actually a list) makes a good point, though.
And even if it is a list, encoding that in the name is often unnecessary. I like the convention of simply using a plural noun (accounts) to name such a variable, whether it’s a list, set, dict or similar class, since it enables beautifully readable loops such as “for account in accounts:”.
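A small sketch of why the plural noun ages well: the hypothetical helper countOverdrawn below (the name and the (owner, balance) pairs are my own illustration) never needs to know which container it was given.

```python
def countOverdrawn(accounts):
    """Count the accounts whose balance is negative.

    accounts may be any iterable of (owner, balance) pairs.
    """
    overdrawn = 0
    for account in accounts:
        _, balance = account
        if balance < 0:
            overdrawn += 1
    return overdrawn


# The same call reads naturally whether accounts is a list or a set.
print(countOverdrawn([("alice", 120.0), ("bob", -5.0)]))  # prints 1
print(countOverdrawn({("alice", 120.0), ("bob", -5.0)}))  # prints 1
```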
It is very helpful if names for very similar things sort together alphabetically (p. 20)
This is very practical API design advice!
For example, in NumPy, np.as<TAB> offers the autocomplete options asarray, ascontiguousarray, asfortranarray, asmatrix, asscalar, etc., so even if I know only one or two of these functions, it’s obvious where to look for related ones.
In contrast, if they were named np.arrayfrom, np.fortranarrayfrom etc., I’d have to look through the whole documentation to find the one that’s most appropriate.
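The Python standard library shows the same effect (this is my own illustration, not an example from the book): the path predicates in os.path share the prefix is, so listing the names that start with it surfaces the whole family at once. The exact list depends on the Python version and platform.

```python
import os.path

# Every name in os.path that starts with "is"; roughly what tab
# completion on "os.path.is" would offer.
pathPredicates = sorted(name for name in dir(os.path) if name.startswith("is"))
print(pathPredicates)
```

On any recent Python, the result includes at least isabs, isdir, isfile, islink and ismount.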
Make Meaningful Distinctions
Two variables named time and time2 may be distinguishable to the compiler/interpreter, but they are not meaningfully distinct to a human reader.
Use e.g. startTime and endTime instead (or rawTime and correctedTime, or whatever makes sense in context).
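As a sketch of the difference this makes, here is a tiny timing helper (measureDuration is a made-up name). With time and time2, the subtraction at the end would force the reader to check which value is which; with startTime and endTime the order is self-evident.

```python
import time


def measureDuration(function):
    """Call function() once and return the elapsed time in seconds."""
    startTime = time.perf_counter()
    function()
    endTime = time.perf_counter()
    return endTime - startTime


print(measureDuration(lambda: sum(range(100_000))))
```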
Similarly, the chapter notes that “noise words” are not a meaningful distinction. For example, if two objects are named accountData and account, it’s unclear to a reader what the difference is.
Or, in an earlier Java code sample in this chapter, there were both an ArrayList and a List. If you’re not already very familiar with the API, can you tell what the difference might be? I had to look it up.
Use Pronounceable Names
If you can’t pronounce it, you can’t discuss it […] This matters, because programming is a social activity. (p. 22)
Well said!
Use Searchable Names
Grepping for a NAMED_CONSTANT is easy; grepping for a digit is basically hopeless.
A very good point, which I hadn’t given much thought to before.
This section also has the following rule of thumb, which is more widely applicable:
The length of a name should correspond to the size of its scope (p. 22)
That makes sense: If the scope is small, then I can see all the necessary context at a glance and the name doesn’t need to duplicate that information. Whereas if the scope is too large to be glanceable, I need the name of the variable to provide me with context.
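Both points fit into a few lines. In this sketch (all names are my own), the constant lives at module scope and may be used far from its definition, so it gets a long, greppable name; the loop variable s lives for a single line, so one letter is enough.

```python
SECONDS_PER_DAY = 86_400  # module-wide scope: long, searchable name


def convertSecondsToDays(durationsInSeconds):
    """Convert a sequence of durations from seconds to days."""
    # s exists only inside this one-line comprehension, so a short name is fine.
    return [s / SECONDS_PER_DAY for s in durationsInSeconds]


print(convertSecondsToDays([86_400, 43_200]))  # prints [1.0, 0.5]
```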
Avoid Encodings
This section discusses Hungarian Notation and the common convention of prefixing e.g. member variables with m_.
This, as the chapter says, is largely unnecessary in even halfway modern IDEs. Experienced developers learn to ignore it after a bit, while new developers joining a codebase have an extra hurdle to reading the code.
Especially in an academic context, where the software is often not the main product but just a tool to enable a research project, new developers (e.g. PhD students or postdocs) might not have the time or motivation to understand such cryptic encodings.
Maintainers therefore need to expend significant effort to enforce such conventions, or the code will become inconsistent and thus misleading over time.
Often, such encodings also make names hard to pronounce—see above!
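A hedged illustration in Python, where every attribute access already goes through self, which makes a member prefix doubly redundant. Both classes below are invented for this post; the f in m_fWidth stands for “float” in the style of Hungarian Notation.

```python
class EncodedRectangle:
    """Hungarian-style names: m_ marks a member, f marks a float."""

    def __init__(self, fWidth, fHeight):
        self.m_fWidth = fWidth
        self.m_fHeight = fHeight

    def area(self):
        return self.m_fWidth * self.m_fHeight


class Rectangle:
    """Same behaviour without encodings; self. already marks members."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height
```

The second version is also the one you can read aloud.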
Another kind of encoding—not mentioned in the book, but highly relevant to research software—is mathematical notation. Just because it’s legible when properly typeset, doesn’t mean it’s legible in code.
For example, sub- or superscript indices are extremely useful in typeset equations, but often don’t translate well into code.
Or take an example I recently encountered, where a physics paper used γ and Γ for distinct but related variables. In the typeset equations, this was a clear distinction; but in code, having variables named gamma and Gamma would be terribly confusing.
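As a sketch with an entirely hypothetical meaning for the two symbols (say γ is the decay rate of a single channel and Γ the total rate summed over all channels), spelling out the roles keeps the variables apart without relying on capitalisation:

```python
def computeTotalDecayRate(channelDecayRates):
    """Sum the per-channel decay rates (γ) into the total decay rate (Γ)."""
    return sum(channelDecayRates)


# Hypothetical values in arbitrary units.
channelDecayRates = [0.25, 0.5, 1.0]
totalDecayRate = computeTotalDecayRate(channelDecayRates)
print(totalDecayRate)  # prints 1.75
```

The symbols from the paper can still appear in a docstring or comment, so readers can map the code back to the equations.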
Avoid Mental Mapping
This section left me a bit puzzled. It’s not that I disagree with anything in here; it’s just that the examples don’t make it clear how this is any different from the section “Use Intention-Revealing Names” at the beginning of the chapter … 🤷
Class and Method Names
Classes and objects should be nouns or noun phrases; methods and functions should be verbs or verb phrases.
Good advice. I mostly do this intuitively, but it’s good to be aware of this.
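In code, the rule of thumb looks like this (a sketch; the GameBoard class and its methods are hypothetical, loosely following the flagged-cells example from earlier in the chapter):

```python
class GameBoard:  # noun phrase: names a thing
    def __init__(self):
        self.flaggedPositions = set()  # noun phrase: names data

    def flagCell(self, position):  # verb phrase: does something
        self.flaggedPositions.add(position)

    def countFlaggedCells(self):  # verb phrase: computes something
        return len(self.flaggedPositions)


gameBoard = GameBoard()
gameBoard.flagCell((0, 1))
gameBoard.flagCell((2, 2))
print(gameBoard.countFlaggedCells())  # prints 2
```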
Don’t Be Cute
If colleagues can’t understand your code without googling pop-culture references or asking you to explain some inside joke, that’s a problem.
I know it’s tempting to use that really clever joke … but don’t. If it’s otherwise safe for work, feel free to post it in the #random channel on your team’s Slack, where anyone who doesn’t get the reference can safely ignore it; if it’s risqué, post it on your AfterDark Mastodon account.
Pick One Word per Concept … and One Concept per Word (a.k.a. Don’t Pun)
These two sections are closely related to the “Make Meaningful Distinctions” guideline earlier.
For example, fetch, retrieve and get may be different words, but they’re not meaningfully distinct to a reader, so pick one and use it for all equivalent methods across different classes.
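A minimal sketch of this first half of the rule, with two hypothetical classes: both expose their lookup as get, so a reader who has learned one interface can guess the other. Naming the second method fetch or retrieve would suggest a difference that does not exist.

```python
class UserDirectory:
    def __init__(self, users):
        self._users = dict(users)

    def get(self, userId):
        return self._users[userId]


class OrderArchive:
    def __init__(self, orders):
        self._orders = dict(orders)

    def get(self, orderId):  # same concept as UserDirectory.get, same word
        return self._orders[orderId]


print(UserDirectory({1: "alice"}).get(1))
print(OrderArchive({7: "two coffees"}).get(7))
```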
But on the flip side: Don’t use the same word for conceptually different things. This can sometimes be subtle. For example, appending an object to a list is kinda like adding it, right? Well … that’s just English being imprecise. Consider these examples:
>>> [1,2] + [3,4]
[1, 2, 3, 4]
>>> [1,2] + 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
Instead of list.add(element), the Python standard library rightfully calls this function list.append(element). [4]
Use Solution/Problem Domain Names
Remember that the people who read your code will be programmers. (p. 27)
That’s not necessarily true for research software—a lot of our collaborators are experts in their specific field of research, but have no formal programming training. Thus, while the book recommends programming terms first, and domain terms as a fallback, for us the better approach is often going to be the other way around.
Add Meaningful (But Not Gratuitous) Context
The last two sections tell us that we want to be right in The Goldilocks Zone of Context—just enough to inform, not so much that it overwhelms.
The basic idea here makes a lot of sense.
But unfortunately, I found the example code listings here worse than useless. To me, the original version of the code is better than the “improved” version in almost every way:
- The “improved” version of the code has 1.5 times as many lines of code.
- Instead of 1 function, it has 1 class containing 5 functions; all for a simple three-way if-elseif-else statement.
- Instead of executing linearly from top to bottom, execution now jumps up and down between 3 of the 5 functions.
Supposedly, that example code demonstrates that having the three variables number, verb and pluralModifier defined inside a GuessStatisticsMessage class instead of a printGuessStatistics function adds more meaningful context?
I didn’t see much of a difference; and even if there is one, the example is terribly ill-suited to illustrating that difference, since it includes major unrelated code changes (completely restructuring the code flow) and even changes the functionality (instead of printing a message, the “improved” code returns a string).
Oh, and the class and function names? For a chapter that’s all about meaningful naming, I found them surprisingly obtuse. Here’s a Python version of the initial code example:
def printGuessStatistics(candidate, count):
    if count == 0:
        print(f"There are no {candidate}s.")
    elif count == 1:
        print(f"There is 1 {candidate}.")
    else:
        print(f"There are {count} {candidate}s.")
(Note: The one significant detail that got lost in this translation is that candidate was a char in the Java version.)
The name printGuessStatistics doesn’t make much sense here without additional context, which this code sample doesn’t offer.
But let’s assume that these are results from a survey asking people to guess the right answer from some options A, B, C and D.
That would explain the Guess part, but Statistics is still a bit of a misnomer; printGuessCounts would be more accurate.
Also, why are three functions in the “improved” version (which basically correspond to the print function calls in my Python version) called thereAreNoLetters, thereIsOneLetter and thereAreManyLetters?
This violates the guideline of “One Word Per Concept” (Letters refers to the same thing as candidate) and the guideline that method names should be verb phrases.
With all these complaints about that example out of the way, what would be a better example for adding meaningful context?
Off the top of my head: Maybe we could have gone back to the getFlaggedCells example at the start of this chapter and discussed how having a CellStatus enum adds useful context, compared to having a bunch of independent constants like FLAGGED, EMPTY and NONE.
But it’s certainly worth thinking about—what are other, better examples?
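For reference, the enum idea mentioned above could be sketched as follows. The Cell dataclass and the way the board is passed in as a parameter are my assumptions for the sake of a runnable example. The enum’s name supplies the context that a bare constant such as FLAGGED lacks, and its members do not silently compare equal to magic numbers like the 4 in the first listing.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CellStatus(Enum):
    FLAGGED = auto()
    EMPTY = auto()
    NONE = auto()


@dataclass
class Cell:
    status: CellStatus


def getFlaggedCells(gameBoard):
    """Return all cells on gameBoard whose status is FLAGGED."""
    return [cell for cell in gameBoard if cell.status == CellStatus.FLAGGED]


gameBoard = [Cell(CellStatus.FLAGGED), Cell(CellStatus.EMPTY), Cell(CellStatus.FLAGGED)]
print(len(getFlaggedCells(gameBoard)))  # prints 2
print(CellStatus.FLAGGED == 1)  # prints False: enum members are not plain ints
```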
[1] Of course, some of that will be due to my perspective: The code examples use Java (which I have almost no experience writing) and often draw from a business context. In contrast, my primary language is Python and I work mainly on scientific software; so some cultural differences are unavoidable.
[2] I will write examples in Python in this post. Partly so I don’t just copy the author’s code; and partly, because it’s a nice exercise to translate examples into a different language and verify whether the guidelines still apply, independent of syntax details.
[3] Yes, that SCO. “The Most Hated Company in Tech”, which became bankrupt just as this book came out.
[4] Ironically, one of the first Java code examples in this chapter uses “list1.add(x);”. I think it’s a missed opportunity that the author explicitly brings up the add/append example, but doesn’t acknowledge this.