« Back to article list

Self Documenting Code

Table of Contents

Self Documenting Code

I've been toying with the idea of Literate Programming recently.

While I feel it has some merit to it, it seems in practice the majority of the prose often restates trivial implementation details that anyone with moderate understanding of the underlying language should already know.

See this call to malloc? I'm getting some memory, because the program needs memory

This would be fine and all, if it weren't for the cumbersome workflow introduced by adding yet another layer of abstraction, or if the target audience (reader) was simply those in need of learning the underlying language (so, LP may be a strong fit for my blogging for instance).

Another thing to consider is that the majority of code nowadays pushes for "Self Documenting Code", and going lighter on code comments. I think this is a good direction, however I think what SDC is can vary person to person. I would like to offer some pointers on what makes good SDC vs bad SDC.

The superVerboseOverTheTopMethodName

This is likely a high point of contention - some people really enjoy a multi-word name for a function that basically describes the implementation in the method name (believe it or not, I've worked with some code with 40+ character names - I feel this is too lengthy).

Sample of a hard to read naming style (lets pick on PHP for a bit, as they've been pretty heavy into the Java-like naming conventions lately with Symfony etc.).

<?php

// Assumes UserRepository is defined elsewhere.
class UserService {
    protected $userRepository;

    public function __construct (UserRepository $userRepository) {
        $this->userRepository = $userRepository;
    }

    public function getAllUsers () {
        $users = $this->userRepository->getAllUsers();

        return $users;
    }

    public function getUserByUserId (int $userId) {
        $users = $this->userRepository->getUserByUserId($userId);

        return $users;
    }
}

// Now, calling into this would tend to look similar to this
// in a controller method or something (or dependency injected via your
// framework of choice).
class UserController {
    public function getAllUsers () {
        $userRepository = new UserRepository();
        $userService = new UserService($userRepository);

        return $userService->getAllUsers();
    }

    public function getUserByUserId (int $userId) {
        $userRepository = new UserRepository();
        $userService = new UserService($userRepository);

        return $userService->getUserByUserId($userId);
    }
}

This isn't bad, in fact, I intentionally chose what looks like reasonable run of the mill enterprise code.

Yet, do you notice any fluff or value-less words here (words that, if when removed, do not alter what is being shown or communicated in anyway)?

Context is important, and if we really start thinking about the UserRepository, should it ever be returning any data that is not a user? (hint_: probably not, it could go in a FooRepository if so).

Does saying:

    $userRepository->getUserByUserId($userId);

offer any more clarity to the reader than simply saying:

    $repo->getById($id);

In the scope of usage (UserService) the one and only repository is the user repository. The surrounding scope (UserService) has set the context for the code within ("user"), so any further mentions of "user" underneath this scope are fluff words (they do not add further refinement or clarity to what the reader is looking at).

What if we try again, while removing that fluff word (and shortening repository to repo)?

<?php

// Assumes UserRepository is defined elsewhere.
class UserService {
    protected $repo;

    public function __construct (UserRepository $repo) {
        $this->repo = $repo;
    }

    public function getAll () {
        $users = $this->repo->getAll();

        return $users;
    }

    public function getById (int $id) {
        $users = $this->repo->getById($id);

        return $users;
    }
}

// Now, calling into this would tend to look similar to this
// in a controller method or something (or dependency injected via your
// framework of choice).
class UserController {
    public function getAll () {
        $repo = new UserRepository();
        $service = new UserService($repo);

        return $service->getAll();
    }

    public function getById (int $id) {
        $repo = new UserRepository();
        $service = new UserService($repo);

        return $service->getById($id);
    }
}

I think this second attempt just became more readable by quite a bit. Still, we can look into some common patterns used for naming methods/functions:

Prefix Meaning Sample
get Retrieve something and give it back getAll()
set Modify something that exists setName('Frank')
make Create something (aka constructor) FooFactory::make()
do Run some side effects doInit()
persist Save something persistUser($user)
     

Anyways, you get the idea.

You can also (hopefully) see that many of our Repository calls are all describing the same default action ("get").

Calling a function and retrieving some value (I/O) tends to be the default operation of almost all function calls in all languages.

This lends itself well to omitting the "get" prefix entirely.

What would our class look like if we re-wrote it without the "get" prefix peppered all over the place?

<?php

// Assumes UserRepository is defined elsewhere.
class UserService {
    protected $repo;

    public function __construct (UserRepository $repo) {
        $this->repo = $repo;
    }

    public function all () {
        $users = $this->repo->all();

        return $users;
    }

    public function byId (int $id) {
        $users = $this->repo->byId($id);

        return $users;
    }
}

// Now, calling into this would tend to look similar to this
// in a controller method or something (or dependency injected via your
// framework of choice).
class UserController {
    public function all () {
        $repo = new UserRepository();
        $service = new UserService($repo);

        return $service->all();
    }

    public function byId (int $id) {
        $repo = new UserRepository();
        $service = new UserService($repo);

        return $service->byId($id);
    }
}

If you read through the last sample, is it still "self documenting"? It surely is not being opaque about what is going on in either method as it trickles down from a controller, to a service, to a repository.

Context must be overly applied to everything all the time

This one really hurts, especially in a typed language (but in all really).

Lets say you're writing some Typescript and need a function to split a single "name" of a restaurant owner into a first and last name.

Often I'll see code that overly sets context in data that does not actually require it to be contextual, and the added "clarity" does not self document anything, in fact, it hinders it from readability.

This is even worse when what should be primitive intermediate variables end up getting set to overly contextual things:

interface SplitRestaurantOwnerNames {
  firstName: string
  lastName: string
}

function splitRestaurantOwnerName (ownerName: string): SplitRestaurantOwnerNames {
  const splitRestaurantOwnerNames: string[] = ownerName.split(' ')

  const firstName = splitRestaurantOwnerNames[0]
  const lastName = splitRestaurantOwnerNames[1]

  return { firstName, lastName }
}

Yet, in the scope of the small function, little is gained by identifying "what" kind of name is being split. This also overlaps with the overly verbose naming problem, in that we're clumping in "what" we're doing, with various names of types and is often unnecessary.

How might this look cleaned up and simplified?

interface RestaurantOwnerNames {
  first: string
  last: string
}

function splitRestaurantOwnerName (name: string): RestaurantOwnerNames {
  const ss: string[] = name.split(' ')

  const first = ss[0]
  const last = ss[1]

  return { first, last }
}

Was any "readability" or "self documenting" lost in this refactor?

A caller would still see a descriptive function name, and seeing small placeholder names vs the overly verbose offers little disconnect in understanding (IFF (if and only if) methods / functions are also kept small and concise).

Self Documenting does not mean NO comments

Using links to more verbose documentation is a great strategy (whether you link to a company Jira, or Confluence, a Stack Overflow post or anything that elaborates).

Likewise, documenting some of the "why" in questionable implementation decisions can save you and your team a lot of repeated answering of those same questions (and allow future readers to know as well).

Lets say you have a function in Javascript that needs to call up 3 endpoints at once.

async function getRssFeeds () {
  const feed1 = await client.get('/some-feed.rss')
  const feed2 = await client.get('/some-feed2.rss')
  const feed3 = await client.get('/some-feed3.rss')

  return [...feed1, ...feed2, ...feed3]
}

In this state, you'll almost certainly have someone question or try to improve this at some point in time by making these calls asynchronous.

However, you can save yourself and future readers a lot of hassle with a small comment (and potentially a link), to avoid an attempted refactor of this into a Promise.all call:

// Must run sequentially due to remote throttle on RSS calls to avoid http 429 block
// { see: http://example.com }
async function getRssFeeds () {
  const feed1 = await client.get('/some-feed.rss')
  const feed2 = await client.get('/some-feed2.rss')
  const feed3 = await client.get('/some-feed3.rss')

  return [...feed1, ...feed2, ...feed3]
}

Now when your helpful programmers come across this, they'll immediately resist the urge to "optimize" your already handled code decisions.

Fin

fin.