Design Patterns: Error Handling

Aleksander Demko

January, 2015

Background: Exceptions

There are generally three approaches to signaling error conditions from functions to callers. They are:

Ignoring errors: The function, upon detecting errors does nothing, and if required, returns "dummy" (but still valid) data.
Error codes: Each function that can return errors returns a code for success and another code for failure. Often, you'll need many failure codes as there are many ways to fail.
Exceptions: A function, at any level, can throw an exception object and the system will "unwind" the calls until it finds a function that has been designed to handle it.
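
To make the three approaches concrete, here is a minimal sketch of the same operation written each way. The function and enum names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Approach 1: ignore the error and return "dummy" (but still valid) data.
int DivideIgnoringErrors(int a, int b) {
    if (b == 0) return 0;  // the caller cannot tell this from a real result
    return a / b;
}

// Approach 2: error codes. The result travels through an out-parameter.
enum DivideStatus { kDivideOk = 0, kDivideByZero = 1 };

DivideStatus DivideWithCode(int a, int b, int &result) {
    if (b == 0) return kDivideByZero;
    result = a / b;
    return kDivideOk;
}

// Approach 3: exceptions. The return value is always a real result.
int DivideOrThrow(int a, int b) {
    if (b == 0) throw std::invalid_argument("division by zero");
    return a / b;
}

// Usage: how each style looks from the caller's side.
void CompareApproaches() {
    assert(DivideIgnoringErrors(7, 0) == 0);  // the failure is invisible

    int result = 0;
    if (DivideWithCode(7, 0, result) != kDivideOk) {
        // the caller must remember to write this check
    }

    try {
        DivideOrThrow(7, 0);
        assert(false);  // not reached: the call above throws
    } catch (const std::invalid_argument &) {
        // the error arrives here, even if thrown several calls deep
    }
}
```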

Exceptions have proven, among the three, to be the best and most general mechanism for handling errors. They allow code to detect and signal errors anywhere, and they let the calling programmer collapse and merge error-handling code where it makes sense.

Most mainstream programming languages support them and use them extensively. They are the most productive and clean technique for reliably handling all error paths. Error codes, despite their proponents' claims, are rarely used properly:

Programmers rarely check every function call for errors, because doing so would clutter the code with if-error-return checks after almost every call. Instead, they skip some of the error paths, which leaves errors uncaught; these then do nothing (at best) or corrupt data (at worst).
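
The clutter is easiest to see side by side. The sketch below runs the same three steps in both styles; the step functions are hypothetical stand-ins that simply succeed or fail on request.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Error-code style: a hypothetical step that returns 0 on success.
int StepCode(bool ok) { return ok ? 0 : 1; }

// Every call needs its own if-error-return check.
int RunWithCodes(bool a, bool b, bool c) {
    int err = StepCode(a);
    if (err != 0) return err;
    err = StepCode(b);
    if (err != 0) return err;
    err = StepCode(c);
    if (err != 0) return err;
    return 0;
}

// Exception style: a hypothetical step that throws on failure.
void StepOrThrow(bool ok, const std::string &name) {
    if (!ok) throw std::runtime_error(name + " failed");
}

// The happy path reads straight through, with no checks in between.
void RunWithExceptions(bool a, bool b, bool c) {
    StepOrThrow(a, "open");
    StepOrThrow(b, "parse");
    StepOrThrow(c, "save");
}

// One catch block handles a failure from any of the steps.
std::string Report(bool a, bool b, bool c) {
    try {
        RunWithExceptions(a, b, c);
        return "ok";
    } catch (const std::runtime_error &e) {
        return e.what();
    }
}

// Usage: both styles detect the failure; only the exception says which step failed.
void CompareStyles() {
    assert(RunWithCodes(true, false, true) != 0);
    assert(Report(true, false, true) == "parse failed");
}
```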

The most common arguments against exceptions tend to concern performance or legacy code. For example:

There is some overhead cost to them, in terms of code size and sometimes performance. In specialized programs such as operating system kernels or tiny embedded programs, controlling this might be desirable. Sometimes the programmer doesn't have a choice if they are using C (aka "portable assembly language") for such projects, as it lacks support for exceptions.
However, the cost argument is rarely ever fully explored. An exception-enabled code base has to be compared to a code base that has if-error-return checks in every other line of code. This means that error checks are active at every function call, decreasing performance. An exception-enabled code base only pays this cost when an error is actually raised.
A team might have a large, legacy C++ code base that isn't exception safe. This usually happens when the initial programmers aren't strong C++ developers (or worse, are C developers) and didn't structure their code using basic RAII patterns. Refactoring the code base would be a large task, so such a company often adopts a "no exceptions" policy going forward instead. Google seems to fall into this category.
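
For readers unfamiliar with RAII, the following is a minimal sketch of why it makes code exception safe: cleanup lives in a destructor, so it runs whether the function returns normally or an exception unwinds through it. The ScopedCounter class is hypothetical; in real code the same role is usually played by types such as std::unique_ptr or std::lock_guard.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource holder: "acquires" in the constructor by
// incrementing a counter and "releases" in the destructor.
class ScopedCounter {
public:
    explicit ScopedCounter(int &open_count) : open_count_(open_count) { ++open_count_; }
    ~ScopedCounter() { --open_count_; }

    ScopedCounter(const ScopedCounter &) = delete;
    ScopedCounter &operator=(const ScopedCounter &) = delete;

private:
    int &open_count_;
};

void DoWork(int &open_count, bool fail) {
    ScopedCounter guard(open_count);  // resource acquired here
    if (fail) throw std::runtime_error("work failed");
    // No explicit cleanup on either path: the destructor releases the resource.
}

// Usage: nothing is left open, even when DoWork throws.
void CheckNoLeak() {
    int open_count = 0;
    try {
        DoWork(open_count, true);
    } catch (const std::runtime_error &) {
        // the error is handled here; cleanup already happened during unwinding
    }
    assert(open_count == 0);
}
```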

Error Handling Patterns

Despite their strengths, one must understand that exceptions should only be used for exceptional or unexpected errors in a function call. In many cases, it may be useful to add other techniques. When designing a function, you must factor in the common use cases of the caller.

For example, let's say we are making an ImageArray class, and are implementing the function ImageArray.GetAtIndex(index), which returns the image at the given index. The array might only have count images, so obviously index must be less than count. When implementing such a function, I typically start with the most restrictive (but cleanest and most performant) implementation, and then "open it up" as new requirements or usage patterns surface. These are the patterns, with the most preferred listed first:

No error checking. The caller must make sure the index is valid. In debug builds, the function can verify the index with an assert; in release builds (under C++) this check disappears, and an invalid index would likely crash the program.
Exceptions. An exception is thrown if index is invalid. Depending on the caller, this code results in:
- The caller catching this exception explicitly and reporting it. To simplify code, one catch block can group multiple exceptions.
- If the caller doesn't bother to catch the immediate exception, then their program will crash with an exception error. This "fail-fast" approach means that a nice error was reported rather than continuing and perhaps corrupting data.
Exceptions with validation functions. Sometimes an exceptional circumstance is not really exceptional. For these cases, you should add an additional validation function for the caller. For our example we could add a Has(index) method. This means:
- The caller will only call the function after validating the index. This means that exceptions will never be thrown and no catch blocks need to be written.
Return true on success or false on error. This combines the validation and extraction functions into one call. If the validation function is called often, or does some (perhaps expensive) common work that the main function does, it's often useful to combine these functions into one function that returns false on errors.
Sentinel values on errors. It might make more sense to return -1, 0, "", null or other sentinel value on errors. This is generally less clean than simply returning false, but may be useful in certain cases.
Return a "default" value on errors. This allows the caller to specify the sentinel value (from the previous case) that is returned on errors. This is useful if the caller doesn't care about the specific error, and just wants a value in those cases.
Return error codes. Finally, a function can return full error codes. This could be as simple as zero for success and a non-zero number for errors. Non-zero values would then index into an enum table. More complex examples include the HRESULT system in Win32. As this is the most tedious technique, it is the least preferable. However, it can be handy when designing a boundary API to other systems or languages, as exceptions may not be available there.
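
The patterns above can be sketched on the ImageArray example as follows. Only ImageArray, GetAtIndex and Has come from the text; the Image struct, the Add and Count helpers, and every other method and enum name are hypothetical, picked so that all the variants can sit side by side in one class. A real class would normally offer only the one or two it needs.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Image { std::string name; };  // hypothetical placeholder for real image data

enum ImageError { kImageOk = 0, kImageIndexOutOfRange = 1 };

class ImageArray {
public:
    void Add(const Image &image) { images_.push_back(image); }
    std::size_t Count() const { return images_.size(); }

    // No error checking: the assert only exists in debug builds.
    const Image &GetAtIndexUnchecked(std::size_t index) const {
        assert(index < images_.size());
        return images_[index];
    }

    // Exceptions: throw if the index is invalid.
    const Image &GetAtIndex(std::size_t index) const {
        if (index >= images_.size()) throw std::out_of_range("ImageArray index out of range");
        return images_[index];
    }

    // Validation function to pair with GetAtIndex.
    bool Has(std::size_t index) const { return index < images_.size(); }

    // Return true on success or false on error.
    bool TryGetAtIndex(std::size_t index, Image &out) const {
        if (!Has(index)) return false;
        out = images_[index];
        return true;
    }

    // Sentinel value on errors: a null pointer.
    const Image *FindAtIndex(std::size_t index) const {
        return Has(index) ? &images_[index] : nullptr;
    }

    // Caller-supplied "default" value on errors.
    Image GetAtIndexOr(std::size_t index, const Image &fallback) const {
        return Has(index) ? images_[index] : fallback;
    }

    // Full error codes.
    ImageError GetAtIndexCode(std::size_t index, Image &out) const {
        if (!Has(index)) return kImageIndexOutOfRange;
        out = images_[index];
        return kImageOk;
    }

private:
    std::vector<Image> images_;
};
```

With the validation pattern, a caller writes `if (images.Has(i)) use(images.GetAtIndex(i));` and never needs a catch block, while a caller that skips the check still fails fast with an exception.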

Nesting Errors

Non-trivial functions may call other functions during their execution. The outer function should report these sub-function errors to its own caller.

Only exceptions can properly and easily propagate such errors.
Return true and false functions can be bubbled up, if care is taken. However, since all errors are just "false", any specific error information is quickly lost. You know that a function failed, but you don't know why.
Combining error codes. You can build a whole error code system (like HRESULT) to bubble up the individual errors. This system can get unwieldy fast, as tables of error codes must be maintained.
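
A small sketch of the difference, using hypothetical function names: with exceptions the middle layer needs no error-handling code at all and the original message survives, while the true/false version has to forward the failure by hand and loses the reason.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical low-level function that detects the actual problem.
void ReadHeader(bool corrupt) {
    if (corrupt) throw std::runtime_error("header checksum mismatch");
}

// Middle layer: contains no error handling. The exception passes
// through on its own, carrying its message with it.
void LoadDocument(bool corrupt) {
    ReadHeader(corrupt);
}

// The same two layers using true/false returns.
bool ReadHeaderOk(bool corrupt) { return !corrupt; }

bool LoadDocumentOk(bool corrupt) {
    if (!ReadHeaderOk(corrupt)) return false;  // forwarded by hand; the reason is gone
    return true;
}

// Top-level caller: the specific cause is still available here.
std::string DescribeLoad(bool corrupt) {
    try {
        LoadDocument(corrupt);
        return "loaded";
    } catch (const std::exception &e) {
        return std::string("load failed: ") + e.what();
    }
}

// Usage: the exception version knows why, the bool version only knows that.
void CompareNesting() {
    assert(DescribeLoad(true) == "load failed: header checksum mismatch");
    assert(LoadDocumentOk(true) == false);
}
```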

Multiple Patterns

Don't be afraid to use two or more patterns for the same function if the needs arise. These can be done by having multiple functions or by using switches on the same function. For example:

You can have two string-to-int converting functions:
- bool StringToIntChecked(string, int &outval) // returns false if the conversion failed
- int StringToIntDefault(string, int defaultint) // returns defaultint if the conversion failed
Sometimes you want a fail-fast exception, and sometimes you want to ignore the error. Let's say you have a function that loads plug-ins:
- bool LoadPlugIns(bool throwExceptions = false) // returns true on success, or throws exceptions (if the caller asked for them) or false on errors
- The implementation of such a function is still exception based, but if throwExceptions is false, then it will catch all internal exceptions and return false rather than let them propagate.
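
Both examples can be sketched as thin wrappers over one exception-based core. This is only one possible implementation under a few assumptions: StringToInt and LoadOnePlugIn are hypothetical helpers, the conversion relies on std::stoi, and LoadPlugIns takes an extra list of names (not in the signature above) so that the sketch has something to load.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception-based core. std::stoi throws std::invalid_argument
// or std::out_of_range; trailing characters are rejected here as well.
int StringToInt(const std::string &text) {
    std::size_t used = 0;
    int value = std::stoi(text, &used);
    if (used != text.size()) throw std::invalid_argument("trailing characters in: " + text);
    return value;
}

// Returns false if the conversion failed.
bool StringToIntChecked(const std::string &text, int &outval) {
    try {
        outval = StringToInt(text);
        return true;
    } catch (const std::exception &) {
        return false;
    }
}

// Returns defaultint if the conversion failed.
int StringToIntDefault(const std::string &text, int defaultint) {
    int value = 0;
    return StringToIntChecked(text, value) ? value : defaultint;
}

// Hypothetical loader for a single plug-in; an empty name stands in for a failure.
void LoadOnePlugIn(const std::string &name) {
    if (name.empty()) throw std::runtime_error("plug-in name is empty");
}

// Exception based inside; the flag decides whether errors escape or become false.
bool LoadPlugIns(const std::vector<std::string> &names, bool throwExceptions = false) {
    try {
        for (const std::string &name : names) LoadOnePlugIn(name);
        return true;
    } catch (const std::exception &) {
        if (throwExceptions) throw;  // rethrow the original exception unchanged
        return false;
    }
}

// Usage: the caller picks the pattern that suits the call site.
void UseBothPatterns() {
    int port = 0;
    assert(StringToIntChecked("8080", port) && port == 8080);
    assert(StringToIntDefault("not a number", 80) == 80);
}
```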