Idiomatic panics in Go

Unpopular opinion: idiomatic Go code panics. Not instead of proper error handling, but there remain times when panicking is the best option for writing correct code. Unfortunately, the use of panic is frowned upon so strongly that even correct uses are likely to be criticized in review. This post explains when panics are acceptable and idiomatic in Go.

To start, an appeal to authority: the Go authors themselves support the use of panics.

  1. The builtin functions panic and recover indicate that the language authors believed that there are valid cases for using panics.

  2. The runtime panics in certain cases, such as an index out of range or a nil pointer dereference. More interestingly, the builtin function close, which could easily have returned an error, also panics (for example, when closing a nil or already-closed channel).

  3. The standard library uses panics in certain places, such as sync.WaitGroup.

So, if we acknowledge that panics are valid in certain cases, a question emerges: What are those cases?

Abandonment

With the caveat that this is a simplification, programmers often divide options for handling errors into two approaches: error returns and exceptions. However, there is a third approach, called abandonment. This is where the program aborts all activity and dies. Again, this is a simplification. Erlang, for example, uses isolated green processes as the scope for abandonment. Midori, on the other hand, did roughly the equivalent of calling os.Exit on programming bugs.

It wouldn't be idiomatic Go to use abandonment as a general error handling strategy. It probably wouldn't be appropriate in any language. But abandonment remains an option when the caller cannot respond to an error condition.

Abandonment could be implemented using os.Exit or runtime.Goexit. However, I believe that panic is the idiomatic method in Go when abandoning because of an error.
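One concrete reason to prefer panic: os.Exit terminates the program immediately without running deferred functions, whereas a panic unwinds the stack, runs deferred calls along the way, and, if nothing recovers it, prints the panic value and a stack trace before exiting with a non-zero status. The sketch below (the names doWork, run, and cleanupLog are illustrative, not from any library) recovers at the top only so the effect can be observed.

```go
package main

import "fmt"

// cleanupLog records which deferred cleanups ran (illustrative only).
var cleanupLog []string

// doWork stands in for code that detects a broken internal invariant.
func doWork() {
	defer func() {
		// Deferred calls still run while a panic unwinds the stack.
		// Had doWork called os.Exit instead, this would never execute.
		cleanupLog = append(cleanupLog, "doWork cleanup")
	}()
	panic("invariant violated")
}

// run calls doWork and recovers the panic so the demo can report it.
func run() (recovered any) {
	defer func() { recovered = recover() }()
	doWork()
	return nil
}

func main() {
	r := run() // call first so cleanupLog is read after doWork's defer ran
	fmt.Println(r, cleanupLog)
}
```

In real code you would normally not recover here at all; letting the panic reach the top of the goroutine is what makes it abandonment.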

When should code return an error, and when should it abandon?

To begin, if you are unsure, return an error. The caller can always respond to an error by abandoning in turn.

There are a few heuristics that can be used to determine whether to return an error or panic, but they tend to be different ways of looking at the same thing.

Return an error if the failure is due to a violated invariant or unmet requirement that lives outside the program, for example, a missing file or a network failure. Return an error if the caller or the program's user will be responsible for correcting the problem. Return an error when the failure might be recoverable. Return an error when you can foresee injecting errors during testing.

Panic if the failure is due to a violated invariant within the program, for example, an index out of bounds. Panic if the failure is due to a bug, and the developer will be responsible for correcting the problem. Panic when testing the failure path would require injecting bugs. Panic when the caller has no reasonable response to the failure.
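A small sketch contrasting the two heuristics. parsePort (a hypothetical function) validates user-supplied input, which the user can correct, so it returns an error. stack (also hypothetical) has an internal invariant: calling Pop on an empty stack is a bug in the caller, much like indexing past the end of a slice, so it panics.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort handles input from outside the program: the user can fix a bad
// value, so failures are reported as errors.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range 1-65535", n)
	}
	return n, nil
}

// stack is a minimal LIFO of ints.
type stack struct{ items []int }

func (s *stack) Push(v int) { s.items = append(s.items, v) }

func (s *stack) Len() int { return len(s.items) }

// Pop panics on an empty stack: that can only happen through a bug in the
// caller, who is expected to check Len first.
func (s *stack) Pop() int {
	if len(s.items) == 0 {
		panic("stack: Pop called on empty stack")
	}
	v := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v
}

func main() {
	if _, err := parsePort("http"); err != nil {
		fmt.Println("error:", err)
	}
	var s stack
	s.Push(1)
	fmt.Println(s.Pop())
}
```

Returning an error from Pop would force every caller to handle a condition that correct code never produces.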

Some might wonder why not just always return an error. However, error-handling code is still code, and it carries a cost to write and maintain. Error paths, since they are infrequently executed, also tend to harbor latent bugs. Making your callers pay that price when they have no reasonable way to react to the error is wasteful.

Example from the Standard Library

A good example comes from the standard library's sync.WaitGroup. Its Add method has three (!) potential calls to panic. Note that the error conditions depend entirely on state within the program: invalid states arise only when the caller uses sync.WaitGroup incorrectly, which indicates a bug in the caller. The excerpt below is from an older version of the Go source; newer releases restructure the method, but the approach is the same.

func (wg *WaitGroup) Add(delta int) {
	statep, semap := wg.state()
	if race.Enabled {
		_ = *statep // trigger nil deref early
		if delta < 0 {
			// Synchronize decrements with Wait.
			race.ReleaseMerge(unsafe.Pointer(wg))
		}
		race.Disable()
		defer race.Enable()
	}
	state := atomic.AddUint64(statep, uint64(delta)<<32)
	v := int32(state >> 32)
	w := uint32(state)
	if race.Enabled && delta > 0 && v == int32(delta) {
		// The first increment must be synchronized with Wait.
		// Need to model this as a read, because there can be
		// several concurrent wg.counter transitions from 0.
		race.Read(unsafe.Pointer(semap))
	}
	if v < 0 {
		panic("sync: negative WaitGroup counter")
	}
	if w != 0 && delta > 0 && v == int32(delta) {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
	if v > 0 || w == 0 {
		return
	}
	// This goroutine has set counter to 0 when waiters > 0.
	// Now there can't be concurrent mutations of state:
	// - Adds must not happen concurrently with Wait,
	// - Wait does not increment waiters if it sees counter == 0.
	// Still do a cheap sanity check to detect WaitGroup misuse.
	if *statep != state {
		panic("sync: WaitGroup misuse: Add called concurrently with Wait")
	}
	// Reset waiters count to 0.
	*statep = 0
	for ; w != 0; w-- {
		runtime_Semrelease(semap, false, 0)
	}
}

Summary

In certain cases, panicking is the correct response to a detected error, specifically when that error is due to a bug. When the caller can't reasonably react to the failure, it gains nothing from receiving the error information. This does not mean the error can be completely ignored, since bugs should still be reported. In these cases, a panic is both simpler and correct.
