Error propagation patterns

In backend operations that manage multiple orthogonal systems, composing useful error messages from inconsistent APIs is a huge benefit for technical API consumers. It’s an even bigger benefit to your team when you’re supporting users, and ideally you’d have this data bubble up to a useful layer like Sentry.

For simple CRUD apps, I doubt much beyond REST standards matters. But for a technical user to act on a complex error in a sane, stress-free, and safe manner, the ability to discriminate between error sources and failure points in the network is invaluable. In well-defined software stacks, I’d expect this to be a solved problem. In paths less travelled, we’ll have to roll it ourselves.

As an aside, your structure for error handling and propagation will vary from domain to domain. If you like, you could consider a spectrum with safety-critical domains at one end, like medicine or nuclear power, and safety-negligible domains at the other, like online streaming and video games. In the former, you want all errors to be handled safely. In the latter, you want to fail out as soon as you’re aware of an issue, and triage it when it suits you. I’m not going to feign expertise in any of the above-mentioned systems, so I won’t talk about how best to throw errors for a given use case.

What we’ll be covering is an intermediary pattern I’ve found useful over time for managing multiple systems.

Maybe you’re handling IoT devices and want human-readable messages for the handful of errors that are actually relevant to your firm’s value proposition. Maybe you’d like a way to aggregate multiple errors of the same type in high-frequency applications to avoid data overload. Maybe you’re integrating static typing into your codebase and want a uniform way of managing typed and untyped errors. Maybe you want to take the various error structures your systems have been raising for the past two months and finally integrate them into a standard API response format, like JSON:API.

In each of these situations, what we’d like is a way of composing errors such that we can split, group, chain, and annotate messages, and finally output them to the end user.

Before moving on to a concrete use case, let’s see some code. The general case for our errors looks like this (the exact language doesn’t matter):

class ComposableError extends Error {
  constructor(message, meta = {}) {
    // So we can use 'x instanceof Error'
    super(message);
    // JSON:API error spec below
    this.code = this.constructor.name;
    this.detail = message;
    this.meta = meta;
  }
  // Wrap a lower-level error. 'new this' means that calling fromParent
  // on a subclass returns an instance of that subclass.
  static fromParent(parentError, message, meta = {}) {
    return new this(
      message,
      Object.assign({ cause: parentError }, meta)
    );
  }
  getCause() {
    return this.meta.cause || undefined;
  }
  // ... setter omitted for brevity
}

As you can see, we’ve just scaffolded our chosen API spec on top of the language’s basic error implementation, with one additional detail: we’ve added room for a ‘cause’ of the error, so we can chain errors together.
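To make the chaining concrete, here is a minimal sketch of how subclasses could hang off this base class. It assumes fromParent takes (parentError, message, meta) and builds the instance with `new this(...)`, so that calling it on a subclass yields that subclass. NetworkError and TimeoutError are illustrative names, and the base class is restated so the snippet runs on its own.

```javascript
class ComposableError extends Error {
  constructor(message, meta = {}) {
    super(message);
    this.code = this.constructor.name; // e.g. "NetworkError" for a subclass
    this.detail = message;
    this.meta = meta;
  }
  // Assumption: `new this(...)` so subclasses get instances of themselves
  static fromParent(parentError, message, meta = {}) {
    return new this(message, Object.assign({ cause: parentError }, meta));
  }
  getCause() {
    return this.meta.cause || undefined;
  }
}

// Illustrative subclasses: they only need to exist to get their own code
class TimeoutError extends ComposableError {}
class NetworkError extends ComposableError {}

const timeout = new TimeoutError('Request timed out', { timeoutMs: 15000 });
const wrapped = NetworkError.fromParent(timeout, 'Could not retrieve data');

console.log(wrapped.code);                       // "NetworkError"
console.log(wrapped.getCause().code);            // "TimeoutError"
console.log(wrapped instanceof ComposableError); // true
```

One caveat with deriving the code from the constructor name: a minifier that renames classes would change the codes too, so a bundled codebase might prefer an explicit string per subclass.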

It’s worth noting that this intermediary class will carry varying levels of logic. For example, Node.js attaches a stack trace to every error in the stack property, and a Node-specific error code to the errors it generates itself in the code property. If we want these to be compliant with an API spec such as JSON:API, we’d add extra code copying them to our meta property so we don’t have to move between two error models all the time.
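As a rough sketch of that copying step, the helper below wraps a runtime error and moves its stack and code into meta. The helper name fromNativeError and the meta key nativeCode are made up for this example, and the Node-style error is simulated by hand so the snippet doesn’t depend on the filesystem.

```javascript
class ComposableError extends Error {
  constructor(message, meta = {}) {
    super(message);
    this.code = this.constructor.name;
    this.detail = message;
    this.meta = meta;
  }
}

// Hypothetical helper: wrap a runtime error, keeping its details in meta
function fromNativeError(err, message) {
  return new ComposableError(message, {
    cause: err,
    // `stack` is filled in by the runtime on Error instances
    stack: err.stack,
    // `code` only exists on some errors, so this may be undefined
    nativeCode: err.code,
  });
}

// Simulated Node-style system error (a real one might come from fs or net)
const nodeStyleError = Object.assign(new Error('no such file or directory'), {
  code: 'ENOENT',
});

const wrappedNative = fromNativeError(nodeStyleError, 'Could not read config file');

console.log(wrappedNative.code);            // "ComposableError"
console.log(wrappedNative.meta.nativeCode); // "ENOENT"
```

Whether stack traces belong in a response at all is a separate decision; you may want to strip meta.stack before anything leaves the server.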

That aside, this lets us write code like this:

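const unwrapErrorChain = (err) => {
  const chain = [err];
  let nextError = err.getCause();
  while (nextError) {
    chain.push(nextError);
    // The root cause may be a plain error without getCause(), so check first
    nextError = typeof nextError.getCause === 'function'
      ? nextError.getCause()
      : undefined;
  }
  return chain;
};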
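If your runtime supports the ES2022 `cause` option on the Error constructor (recent Node.js versions and modern browsers do), the same traversal can be sketched over the built-in `cause` property. The function name unwrapNativeChain and the sample messages are only illustrative.

```javascript
// Walk the built-in `cause` property instead of a custom getCause()
const unwrapNativeChain = (err) => {
  const chain = [];
  const seen = new Set(); // guard against accidental cycles
  let current = err;
  while (current instanceof Error && !seen.has(current)) {
    chain.push(current);
    seen.add(current);
    current = current.cause;
  }
  return chain;
};

const rootError = new Error('avahi-resolve command timed out');
const middleError = new Error('Could not resolve host', { cause: rootError });
const topError = new Error('Could not find printer', { cause: middleError });

// Messages are listed from the outermost error to the root cause
console.log(unwrapNativeChain(topError).map((e) => e.message));
```

The rest of this post sticks with the custom class, since it also carries the code, detail, and meta fields we want in the API response.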

With chaining in place, we can compose API errors like this:

function performDifficultOperation(params) {
  try {
    return performNetworkRequest(params);
  } catch (err) {
    if (err instanceof TimeoutError) {
      throw NetworkError.fromParent(
        err,
        'Could not retrieve data'
      );
    } else {
      throw ComposableError.fromParent(
        err,
        'Unknown error occurred in network request'
      );
    }
  }
}

And then we can send them:

try {
  performDifficultOperation(args);
} catch (error) {
  response.setStatus(500);
  response.setHeader('Content-Type', 'application/json');
  if (error instanceof ComposableError) {
    // Flatten the chain into plain objects, outermost error first
    const plainErrorChain = unwrapErrorChain(error)
      .map(err => ({
        code: err.code,
        detail: err.detail,
      }));
    return response.send(JSON.stringify({ errors: plainErrorChain }));
  }
  // ... edge case for other error types goes here
}

As for a use case: I’m currently working on a project to monitor local networks for available services using Apple’s Bonjour protocol. Bonjour abstracts multicast DNS and DNS service discovery so we can write applications and service layers that don’t have to worry deeply about the network layer, at the expense of security in our immediate network. A good example of this is AirPrint, Apple’s tech for letting your device print to a wirelessly connected printer on your network without fiddling with configuration. For more info, I highly recommend the zero-conf website, and this talk by its creator Stuart Cheshire.

To monitor network changes, we’ll be working with an open-source implementation of the protocol, Avahi. For the sake of brevity, just know that there are some useful command line utilities installable on Linux distros allowing you to quickly observe services broadcasting over Bonjour on your local network, and they can throw errors. We can set this aside as an asynchronous, low-level system, expecting specific outputs for a specific input. The utility to observe your local network is avahi-browse, and to resolve a known endpoint we use avahi-resolve.
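To give a feel for how such a utility could be wrapped, here is a minimal sketch that shells out to avahi-resolve with a timeout and translates failures into chained errors. It assumes avahi-resolve is on the PATH and that `-n` resolves a host name. The function names, the 15-second default, and the reliance on the `killed` flag that Node sets when its `timeout` option fires are choices made for this example, not a description of the project’s actual code.

```javascript
const { execFile } = require('node:child_process');

class ComposableError extends Error {
  constructor(message, meta = {}) {
    super(message);
    this.code = this.constructor.name;
    this.detail = message;
    this.meta = meta;
  }
  static fromParent(parentError, message, meta = {}) {
    return new this(message, Object.assign({ cause: parentError }, meta));
  }
  getCause() {
    return this.meta.cause || undefined;
  }
}
class TimeoutError extends ComposableError {}
class HostResolutionError extends ComposableError {}

// Pure translation step, kept separate so it can be tested without Avahi
function translateExecError(err, hostname, timeoutMs) {
  // Assumption: `killed` is true when the child was stopped by the timeout
  const cause = err.killed
    ? TimeoutError.fromParent(
        err,
        `avahi-resolve command timed out after ${timeoutMs / 1000} seconds`
      )
    : err;
  return HostResolutionError.fromParent(
    cause,
    `Could not resolve host '${hostname}'`
  );
}

// Hypothetical wrapper: resolves with the command's raw, trimmed output
function resolveHost(hostname, timeoutMs = 15000) {
  return new Promise((resolve, reject) => {
    execFile(
      'avahi-resolve',
      ['-n', hostname],
      { timeout: timeoutMs },
      (err, stdout) => {
        if (err) return reject(translateExecError(err, hostname, timeoutMs));
        resolve(stdout.trim());
      }
    );
  });
}
```

A caller would then do something like `resolveHost('CANON20123819.local').catch(handleError)`, where handleError is whatever layer wraps the failure in the next, more user-facing error.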

So when we’ve fleshed out a number of error types:

ComposableError
NetworkError
TimeoutError
InvalidIPv4Error
HostResolutionError
AvahiError

We end up getting responses like this:

{
  "errors": [
    {
      "code": "NetworkError",
      "detail": "Could not find bookmarked printer on the local network"
    },
    {
      "code": "AvahiError",
      "detail": "Search for 'CANON20123819' failed"
    },
    {
      "code": "HostResolutionError",
      "detail": "Could not resolve host 'CANON20123819.local'"
    },
    {
      "code": "TimeoutError",
      "detail": "avahi-resolve command timed out after 15 seconds"
    }
  ]
}

And now we have data we can use to improve the UX on the frontend, for example by asking users to check whether their printer is connected, or on the backend, by aggregating these cases and integrating them into any number of services, like email reporting or bug trackers.
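As a small sketch of the backend aggregation idea, the helper below counts responses by their chain of error codes, so repeated failures collapse into one line of a report. The function name and the sample responses are made up for illustration.

```javascript
// Hypothetical helper: count responses by their chain of error codes
function countByErrorChain(responses) {
  const counts = new Map();
  for (const { errors = [] } of responses) {
    // e.g. "NetworkError > TimeoutError", outermost error first
    const key = errors.map((e) => e.code).join(' > ');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample data in the shape of the response above
const sampleResponses = [
  { errors: [{ code: 'NetworkError' }, { code: 'TimeoutError' }] },
  { errors: [{ code: 'NetworkError' }, { code: 'TimeoutError' }] },
  { errors: [{ code: 'NetworkError' }, { code: 'InvalidIPv4Error' }] },
];

const chainCounts = countByErrorChain(sampleResponses);
console.log(chainCounts.get('NetworkError > TimeoutError'));     // 2
console.log(chainCounts.get('NetworkError > InvalidIPv4Error')); // 1
```

Keying on the whole chain rather than just the top-level code keeps a timeout and a malformed address apart, even though both surface to the user as a NetworkError.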

If you’ve got more experience with error propagation, or spot an error in this post, don’t hesitate to shoot through an email. Any and all feedback is highly appreciated.