Design Comments on sync.WaitGroup

Wed May 15, 2019
~800 Words
Tags: programming, Go

The goal of this post is to look at the design of sync.WaitGroup. This type is subject to Go’s compatibility guarantee, so it won’t be changing. Nevertheless, the curious might be interested in a comparison with an alternate design, and those ideas might be useful elsewhere. The comparison frames sync.WaitGroup as one type of executor.

An executor implements a policy that determines where, when, and how to run a function or closure. In Go, all the examples examined here use goroutines under the hood, so the function or closure always runs locally, which fixes the where and the how. Still, there is flexibility around when. Additionally, an executor can wrap extra behaviour beyond simply running tasks (for example, errgroup.Group collects errors).

The type sync.WaitGroup is probably familiar to many Go users. Still, a quick primer for the uninitiated follows. The code below is a lightly modified version of the example in the documentation.

// Work to be completed
var urls = []string{ /* contents omitted */ }

wg := sync.WaitGroup{}
for _, url := range urls {
    // Increment the WaitGroup counter.
    wg.Add(1)
    // Launch a goroutine to fetch the URL.
    go func(url string) {
        // Decrement the counter when the goroutine completes.
        defer wg.Done()
        // Fetch the URL.
        http.Get(url)
    }(url)
}
// Wait for all HTTP fetches to complete.
wg.Wait()

Using WaitGroup is straightforward. There are three steps required. First, increment the counter for every goroutine (line 7); this must be done outside of the goroutine itself. Second, the goroutine must decrement the counter as it terminates (line 11). Third, call Wait to block the parent goroutine until the child goroutines terminate (line 17). Although perfectly serviceable, the interface has a few areas where it can be misused. The author might admit to having put the call to Add inside the goroutine a few times, which is definitely a bug.

An alternate interface is used by other concurrency packages, such as errgroup.Group, parallel.Run, and parallel.Runner [1]. Rewriting the previous example, we have:

// Work to be completed
var urls = []string{ /* contents omitted */ }

wg := errgroup.Group{}
for _, url := range urls {
    // Launch a goroutine to fetch the URL.
    url := url // https://golang.org/doc/faq#closures_and_goroutines
    wg.Go(func() error {
        // Error handling omitted for comparison with previous example.
        // Refer to example on godoc for a more complete example.
        http.Get(url)
        return nil
    })
}
// Wait for all HTTP fetches to complete.
wg.Wait()

With this approach, there are only two operations. First, launch goroutines by passing closures to the executor (line 8). Second, wait for the child goroutines to terminate (line 16). This simpler interface leaves fewer options for users, which should make it easier to use correctly. Additionally, some types of executors cannot be implemented with the original model. For example, if the goal is to limit concurrency, then the callbacks must be held back until a slot is free, and this cannot be done with the first approach, where the caller launches the goroutine itself.

Advantages for sync.WaitGroup

Although the second interface offers some marginal usability improvements, sync.WaitGroup has a few small performance advantages.

  1. Instead of calling Add once for each goroutine, it is possible to increment the counter by more than one (see example). This might give a small performance boost, but calls to Add are quite cheap compared to the cost of starting each goroutine.

  2. The compiler uses some internal magic to reduce memory allocations when starting a goroutine with a closure. Benchmarking the first example shows no allocations for the closure [2], while the second example does allocate. For plain functions, there was no measurable difference between the two approaches (t=0.41us) [3]. For closures, the second approach was measurably slower, going from t=0.41us to t=0.45us on the test hardware.

Summary

The interface provided by the standard library for sync.WaitGroup is usable, but perhaps a little more difficult to use than necessary. It also does not generalize to other types of executors. However, it does provide opportunities for optimization by reducing the number of calls to Add, and it avoids an extra allocation when the callback is a closure.


  1. Although the name is the same, this is two separate packages: github.com/juju/utils/parallel and gitlab.com/stone.code/parallel. [return]
  2. The compiler must obviously be allocating for each goroutine. However, those allocations don’t appear to be recorded. [return]
  3. These numbers come from benchmarking during CI for ‘gitlab.com/stone.code/parallel’. There is limited data, and benchmarking on shared hardware is noisy. [return]