Managing Goroutines

Quite some time ago a friend of mine asked me for help with a little Go project he was working on. He wanted to parallelise some aspects of it but wasn’t sure how to keep all the goroutines under control and handle errors being generated by them.

His program basically consisted of a single goroutine generating work that should then be processed by a worker pool. These workers could in turn create their own sub-workers if necessary. But how should he deal with situations where a worker produced an error?

Since my answer was a bit elaborate, I thought it would be worth cleaning up and reposting it here. So this is what has worked quite well for me in the past. Note that this is targeted at worker routines, so your mileage may vary depending on your particular problem and context. I’ll also use only tools that come out of the box with Go. Yes, there are things like go-tomb and errgroup, but I’ll leave them out for now.

Management levels

At the core of how I prefer to manage goroutines is a simple principle:

When a routine starts another routine, then it is responsible for it.

If nothing else, this means that a routine should always wait for all the others it has created. In Go this is mostly done using sync.WaitGroup. Basically, when you launch a routine, you increment a counter. When the goroutine eventually returns, the last thing it should do is to decrement that counter again. Once you’ve launched everything you just wait for that counter to reach zero again before you move on.

Here is a little example:

package main

import (
	"fmt"
	"sync"
)

func main() {
	wg := sync.WaitGroup{}

	// Set the waitgroup to the number of things you want to wait on.
	wg.Add(2)

	for i := 0; i < 2; i++ {
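		go func(i int) {
			// Deferred calls run in reverse order, so wg.Done is registered
			// first to make sure it is the very last thing the goroutine does.
			defer wg.Done()
			defer fmt.Printf("%d is done\n", i)
			fmt.Printf("%d started\n", i)
		}(i)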
	}

	// wg.Wait will block until the waitgroup has reached 0 again.
	wg.Wait()
	fmt.Println("We are done!")
}

Abort with Contexts

So, now we know how to wait until all sub-goroutines have finished before moving on. But can we somehow cancel a goroutine from the outside? Let’s say we have an infinite loop that should run all the time (e.g. for a web server waiting for incoming requests) inside a goroutine:

func worker(wg *sync.WaitGroup) {
	defer wg.Done()

	for {
		time.Sleep(time.Second * 1)
		fmt.Println("Working...")
	}
}

We have no way to stop this loop except for killing the whole process with a call to os.Exit(...) or similar.

This is where contexts come in. A context can be marked as “Done”. With this in mind, when you have a long-running goroutine, have it check every once in a while (e.g. on every iteration of some loop) whether the context you got has been marked like that:

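func worker(ctx context.Context, ...) {
	for {
		select {
		case <-ctx.Done():
			// The context was cancelled, so stop working.
			return
		default:
			// Not cancelled yet: do one unit of work, then check again.
		}
	}
}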

Cancelling a context requires access to a so-called cancel function. You get one by creating a context (or wrapping an existing one) with the context.WithCancel(ctx) function. It returns a new context together with a cancel function:

ctx, cancel := context.WithCancel(ctx)
go worker(ctx)

// Something happens and you want the worker to stop:
cancel()
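Putting the WaitGroup and the context together, here is a minimal, self-contained sketch of a launcher that starts a worker, cancels it after a short while and waits for it to exit. The names runWorkerBriefly, tickInterval and runDuration are made up for this example, and the "work" is just counting ticks:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Hypothetical timings, chosen only to keep the example short.
const (
	tickInterval = 10 * time.Millisecond
	runDuration  = 100 * time.Millisecond
)

// worker counts ticks until ctx is cancelled.
func worker(ctx context.Context, wg *sync.WaitGroup, ticks *int) {
	defer wg.Done()

	ticker := time.NewTicker(tickInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			*ticks++
		}
	}
}

// runWorkerBriefly launches one worker, lets it run for runDuration,
// then cancels it and waits until it has really exited.
func runWorkerBriefly() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // calling cancel more than once is harmless

	var wg sync.WaitGroup
	ticks := 0

	wg.Add(1)
	go worker(ctx, &wg, &ticks)

	time.Sleep(runDuration)
	cancel()
	wg.Wait() // the worker is gone, so reading ticks is safe now

	return ticks
}

func main() {
	fmt.Printf("worker stopped after %d ticks\n", runWorkerBriefly())
}
```

Note that cancel() only asks the worker to stop. It is the wg.Wait() afterwards that tells the launcher the worker has actually returned.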

To learn more about contexts outside of the official documentation, I can highly recommend Francesc Campoy’s justforfunc episode about them!

Dealing with results

With synchronous functions you can simply take the return value of a function as its result. With goroutines that’s a bit more complicated: the return value of a function that is started as a goroutine is simply discarded, so it cannot carry your result.

Instead, you have more or less two options for somehow communicating any kind of output from inside of the goroutine to the outside world:

  • A global data structure with some Mutex/Lock system around it to prevent multiple goroutines from modifying it at the same time.
  • A channel
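For the first option, a rough sketch could look like the following. The resultStore type and the squareAll function are invented for this example, and the store is a local value shared with the goroutines rather than a true global, but the locking works the same way:

```go
package main

import (
	"fmt"
	"sync"
)

// resultStore is a hypothetical collection guarded by a mutex.
type resultStore struct {
	mu     sync.Mutex
	values []int
}

// add appends a value while holding the lock, so concurrent calls are safe.
func (s *resultStore) add(v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values = append(s.values, v)
}

// squareAll starts one goroutine per input, lets each one store its
// result, and returns the sum of everything that was stored.
func squareAll(inputs []int) int {
	store := &resultStore{}

	var wg sync.WaitGroup
	wg.Add(len(inputs))
	for _, n := range inputs {
		go func(n int) {
			defer wg.Done()
			store.add(n * n)
		}(n)
	}
	wg.Wait()

	// All goroutines have exited, so no lock is needed for reading here.
	sum := 0
	for _, v := range store.values {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3})) // prints 14
}
```

The order in which results end up in the store is not deterministic, which is why the example only looks at their sum.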

Especially for scenarios where the number of inputs and outputs is not known in advance (e.g. with some kind of data processing pipeline), channels are a good place to start. Just create a channel in the goroutine that will launch your worker and pass it into the worker:

results := make(chan SomeResult, 1)
go worker(ctx, ..., results)
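To make that a bit more concrete, here is a small runnable sketch. SomeResult is given two made-up fields, the worker simply squares its inputs, and collect is a hypothetical helper that plays the role of the launching goroutine:

```go
package main

import (
	"context"
	"fmt"
)

// SomeResult is a placeholder result type for this example.
type SomeResult struct {
	Input  int
	Output int
}

// worker squares every input and sends the result to the channel.
// It stops early if ctx is cancelled and closes the channel when it
// has nothing more to send.
func worker(ctx context.Context, inputs []int, results chan<- SomeResult) {
	defer close(results)

	for _, n := range inputs {
		select {
		case <-ctx.Done():
			return
		case results <- SomeResult{Input: n, Output: n * n}:
		}
	}
}

// collect creates the channel, launches the worker and gathers
// everything the worker sends until the channel is closed.
func collect(inputs []int) []SomeResult {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	results := make(chan SomeResult, 1)
	go worker(ctx, inputs, results)

	var out []SomeResult
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range collect([]int{1, 2, 3}) {
		fmt.Printf("%d -> %d\n", r.Input, r.Output)
	}
	// prints:
	// 1 -> 1
	// 2 -> 4
	// 3 -> 9
}
```

Since there is only a single worker writing to the channel, the results arrive in the same order as the inputs.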

Errors are just another result

But what about errors? Well, errors are just another kind of output! Personally, I keep going back and forth between two designs: a single output channel (with some struct that can also indicate whether the result is erroneous or valid), or multiple channels so that errors get their own little pipe to end up in.

Just keep in mind: errors are just results. There is nothing magic about them in Go, so don’t become too inventive here 🙂
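As an illustration of the single-channel variant, here is a minimal sketch. The Result struct, parseAll and sumValid are names invented for this example; the worker just parses strings into integers and reports failures through the same channel as successes:

```go
package main

import (
	"fmt"
	"strconv"
)

// Result is a hypothetical struct that carries either a value or an error.
type Result struct {
	Input string
	Value int
	Err   error
}

// parseAll tries to parse every input as an integer and sends one
// Result per input, whether parsing worked or not.
func parseAll(inputs []string, results chan<- Result) {
	defer close(results)

	for _, s := range inputs {
		v, err := strconv.Atoi(s)
		results <- Result{Input: s, Value: v, Err: err}
	}
}

// sumValid launches the parser and returns the sum of all valid
// inputs together with the number of inputs that failed.
func sumValid(inputs []string) (sum int, failed int) {
	results := make(chan Result)
	go parseAll(inputs, results)

	for r := range results {
		if r.Err != nil {
			failed++
			continue
		}
		sum += r.Value
	}
	return sum, failed
}

func main() {
	sum, failed := sumValid([]string{"1", "two", "3"})
	fmt.Printf("sum=%d failed=%d\n", sum, failed) // prints sum=4 failed=1
}
```

Here the reader decides what an error means (in this case: count it and carry on). With a separate error channel that decision would move into whatever reads from that second channel.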

Producers close channels

One thing to keep in mind when working with channels, though, is that the goroutine that writes into a channel should also be the one that closes it. Why? Among other things, because a reader can then loop over the channel and that loop will stop once the channel is closed:

for item := range myChannel {
	// do something with the item
}

// myChannel has been closed.

You close a channel when you know that you no longer want to write anything into it. That’s something only the writer can reliably know!
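Things get slightly trickier when several goroutines write into the same channel, because then no single writer knows when everybody is finished. One common arrangement (my assumption here, not the only way to do it) is to let the goroutine that launched the writers wait for them with a WaitGroup and then close the channel on their behalf. The names produce and runProducers are made up for this sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// produce sends count items into out and signals the WaitGroup when done.
// It does not close the channel because other producers may still write.
func produce(id, count int, out chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()

	for i := 0; i < count; i++ {
		out <- fmt.Sprintf("producer %d item %d", id, i)
	}
}

// runProducers launches several producers on one shared channel and
// returns how many items were received in total.
func runProducers(producers, perProducer int) int {
	out := make(chan string)

	var wg sync.WaitGroup
	wg.Add(producers)
	for id := 0; id < producers; id++ {
		go produce(id, perProducer, out, &wg)
	}

	// The launching side is the only one that knows when all writers
	// are finished, so it closes the channel once they have all exited.
	go func() {
		wg.Wait()
		close(out)
	}()

	received := 0
	for range out {
		received++
	}
	return received
}

func main() {
	fmt.Println(runProducers(3, 4)) // prints 12
}
```

The close still happens on the writing side of the channel and only once it is certain that nobody will write again, which is the whole point of the rule.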

To summarise

OK, this has been a lot, but my preferred way of dealing with goroutines boils down to these few rules:

  • The goroutine that launches another goroutine is responsible for it and must have a way to stop it!
  • Use WaitGroups to know when a goroutine has exited.
  • Use Contexts to communicate into a goroutine that it should exit.
  • Errors are just another kind of result. Use thread-safe data structures or channels for communicating results.

I’m, obviously, not perfectly following these rules all the time, but I try to, and every time I haven’t, it has bitten me 🤷‍♂️