The goal of this post is to look at the design of sync.WaitGroup. This type is subject to Go’s compatibility guarantee, so it won’t be changing. Nevertheless, the curious might be interested in a comparison with an alternate design, and those ideas might be useful elsewhere. The comparison works by treating sync.WaitGroup as one type of executor.
An executor implements a policy that determines where, when, and how to run a function or closure. In Go, where all of the examples examined here use goroutines under the hood, the function or closure always runs locally, which settles the where and the how. Still, there is flexibility around the when. Additionally, an executor can wrap extra behaviour around the callback, providing facilities beyond simply running it (for example, the ability to wait for completion, as sync.WaitGroup does).
sync.WaitGroup is probably familiar to many Go users. Still, a quick primer for the uninitiated follows. The code below is a slightly modified example straight from the documentation.
Using WaitGroup is straightforward. Three steps are required. First, increment the counter for every goroutine (line 7); this must be done outside of the goroutine itself. Second, each goroutine must decrement the counter as it terminates (line 11). Third, call Wait to pause the parent goroutine until the child goroutines have terminated (line 17). Although perfectly serviceable, the interface does have a few areas where it can be misused. The author will admit to having put the call to Add inside the goroutine a few times, which is definitely a bug: if Wait runs before the child goroutine has been scheduled, the counter may still be zero and Wait will return early.
With this approach, there are only two operations. First, launch goroutines using closures (line 8). Second, wait for the child goroutines to terminate (line 16). This simpler interface leaves fewer options for users, which should make it easier to use correctly. Additionally, some types of executors cannot be implemented using the original model. For example, if the goal is to limit concurrency, then the callbacks must be put into a queue, and there is no opportunity to do that when callers launch their own goroutines.
Although the second interface offers some marginal usability improvements, sync.WaitGroup has a few small performance advantages.
Instead of calling Add once for each goroutine, it is possible to increment the counter by more than one (see example). This might give a small performance boost, but the cost of a call to Add is quite small compared to the cost of starting each goroutine.
The compiler uses some internal magic to reduce memory allocations when starting a goroutine using a closure. Benchmarking the first example shows no allocations for the closure²; the second example does allocate. For functions, there was no measurable difference in the performance of the two approaches (t=0.41µs)³. For closures, the second approach was measurably slower, going from t=0.41µs to t=0.45µs on the test hardware.
The interface provided by the standard library for sync.WaitGroup is usable, but perhaps a little more difficult to use than necessary. It also does not generalize to other types of executors. However, it does offer an opportunity for optimization by batching calls to Add, and it avoids an allocation when the callback is a closure.
1. Although the name is the same, these are two separate packages: github.com/juju/utils/parallel and gitlab.com/stone.code/parallel. ↩︎
2. The compiler must obviously be allocating for each goroutine. However, those allocations don’t appear to be recorded. ↩︎