Garbage Collection In Go

As a self-taught developer without a formal education in CS, I have always wondered where the program I’m running stores my variables. I found the answer when I started reading Mastering Go by Mihalis Tsoukalos. In this blog, I’ll try to cover garbage collection in Golang from a beginner’s perspective.

What is Garbage Collection?

In the literal sense, it means clearing out memory that is no longer in use. But wait. What memory are we talking about? Where exactly is this memory located? TL;DR: it is the heap that we are concerned with here. If you don’t want to sit through a lecture about the memory space of a process, skip ahead to the garbage collection algorithm.

Memory space of a process

[Figure: memory space of a process]

To execute a program, the OS spins up a process, and the process gets a memory space allocated to it. This is not the actual physical memory (RAM); rather, it is virtual memory that maps to the physical memory. More on that in a separate blog. This memory space has four segments. Let’s look at them and see which segment is relevant to our topic.

» Code area: This is where your program is stored in the form of machine language instructions.
» Data area: This is where your program’s global variables are stored.
» Stack: The local variables of a function are stored here. When a function returns, everything associated with it is removed from the stack. The point to note here is that neither you nor the OS has to deal with clearing this space.
» Heap: This is the region used for dynamic memory allocation. Values that must outlive the function that created them, or that are too large for the stack, are stored in the heap; a common example is the underlying array of a slice. In Go, the compiler’s escape analysis makes this decision: if a slice created in a local function escapes it (for example, because it is returned), its underlying array is stored in the heap, and only the slice header, which holds a pointer to that heap location, is kept on the stack.
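To see the stack/heap split in practice, here is a minimal sketch. The functions `sumLocal` and `makeEscaping` and the global `sink` are hypothetical names made up for this example, and the sketch assumes the standard gc compiler, whose escape analysis decides where each `make` ends up. It counts heap allocations with `testing.AllocsPerRun` from the standard library: a slice that never leaves its function is typically kept off the heap, while one stored in a global has to live on the heap. The exact counts are a compiler implementation detail, not a language guarantee.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable, so it lives in the data area and
// anything it points to must outlive any single function call.
var sink []int

// sumLocal builds a small slice that never leaves the function, so escape
// analysis can usually keep its underlying array on the stack.
func sumLocal() int {
	s := make([]int, 8)
	total := 0
	for i := range s {
		s[i] = i
		total += s[i]
	}
	return total
}

// makeEscaping returns its slice, so the underlying array outlives the
// call and escape analysis moves it to the heap.
func makeEscaping() []int {
	return make([]int, 8)
}

// heapAllocs reports the average number of heap allocations per call of f.
func heapAllocs(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	local := heapAllocs(func() { _ = sumLocal() })
	escaping := heapAllocs(func() { sink = makeEscaping() })
	fmt.Println("heap allocations per call, local slice:   ", local)
	fmt.Println("heap allocations per call, escaping slice:", escaping)
}
```

Building with `go build -gcflags=-m` prints the compiler’s escape analysis decisions, which is a handy way to check where a given variable ends up.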

The need for Garbage Collection

Now, when the local function returns and the slice header is popped from the stack, the underlying array of the slice is still present in the heap. Imagine this happening over and over in a long-running program: memory usage would keep growing until you ran out of memory, unless the heap objects that the process no longer uses are cleared out. That is garbage collection.
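Here is a small sketch of that lifecycle using the runtime’s own statistics. The names `live`, `allocate` and `heapInUse` are hypothetical. It fills the heap with 64 slices of 1 MiB each, drops the only reference to them, and calls `runtime.GC()` to force a collection; `runtime.MemStats.HeapAlloc` then shows the memory being reclaimed. The exact numbers will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
)

// live is a global variable, i.e. a GC root: whatever it references stays reachable.
var live [][]byte

// allocate creates n byte slices of size bytes each and keeps them reachable via live.
func allocate(n, size int) {
	for i := 0; i < n; i++ {
		live = append(live, make([]byte, size))
	}
}

// heapInUse forces a full collection, then returns the bytes of heap
// objects that are still allocated (HeapAlloc).
func heapInUse() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	before := heapInUse()
	allocate(64, 1<<20) // 64 underlying arrays of 1 MiB each, all on the heap
	held := heapInUse()
	live = nil // drop the only reference: the arrays are now garbage
	after := heapInUse()
	fmt.Printf("before: %d MiB, while held: %d MiB, after GC: %d MiB\n",
		before>>20, held>>20, after>>20)
}
```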

Garbage collection algorithm

Golang’s garbage collector uses a variant of the mark-and-sweep algorithm called the tricolor algorithm. This basically means the algorithm marks all the objects that are in use and then sweeps (frees) the objects that are not in use.

How does the algorithm know which ones are in use? The tricolor algorithm sorts all the objects in the heap into three colors:

» White: all objects start as white. Any object that is still white when marking ends is unreachable.
» Grey: objects we know are being used by the process but whose children (the objects they point to) have not been explored yet. This is the intermediate state.
» Black: objects that are in use and have been completely explored: all of their children have already been found, so there is nothing left to explore from them.

The algorithm starts with the roots: references the process is guaranteed to have access to. A few of them are:

the global variables
the variables on the goroutine stacks that point to heap locations
values in CPU registers that are known to hold pointers to heap objects

All the objects referenced by the roots are marked grey. The algorithm then repeatedly picks a grey object, marks every white object it references grey, and marks the picked object black. When no grey objects are left, the mark phase is over. The objects that are still white at that point, meaning the process no longer has access to them, are garbage collected in the sweep phase, and their memory is reclaimed so the Go runtime can reuse it for new allocations (handing unused memory back to the OS is a separate, gradual process).
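The mark and sweep phases can be modelled in a few lines of Go. This is a toy sketch, not the runtime’s implementation: `object`, `mark` and `sweep` are hypothetical names, the heap is just a slice of objects, and there is a single root. It shows the grey worklist at the heart of tricolor marking, and that a cycle (`a` and `c` point to each other) is handled naturally because an object is only shaded once.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not reached (yet); garbage if still white after marking
	grey               // reached, but its references have not been scanned
	black              // reached, and all of its references have been scanned
)

// object is a toy heap object: a name, the objects it points to, and its color.
type object struct {
	name  string
	refs  []*object
	color color
}

// mark runs the tricolor mark phase, starting from the given roots.
func mark(roots []*object) {
	var greys []*object // worklist of grey objects
	shade := func(o *object) {
		if o.color == white {
			o.color = grey
			greys = append(greys, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	for len(greys) > 0 {
		o := greys[len(greys)-1] // pick any grey object
		greys = greys[:len(greys)-1]
		for _, child := range o.refs {
			shade(child)
		}
		o.color = black // all of o's children are now grey or black
	}
}

// sweep returns the names of objects that stayed white (the garbage)
// and resets the survivors to white, ready for the next cycle.
func sweep(heap []*object) []string {
	var freed []string
	for _, o := range heap {
		if o.color == white {
			freed = append(freed, o.name)
		} else {
			o.color = white
		}
	}
	return freed
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	d := &object{name: "d"}
	e := &object{name: "e"}
	a.refs = []*object{b, c}
	c.refs = []*object{a} // a cycle: a is shaded only once
	d.refs = []*object{e} // d points to e, but nothing points to d
	heap := []*object{a, b, c, d, e}

	mark([]*object{a})                 // pretend a global variable (a root) points to a
	fmt.Println("freed:", sweep(heap)) // freed: [d e]
}
```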

For example, let’s say one of the nodes of a linked list is deleted.

[Figure: linked list with node 4 deleted]

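Here node 4 is deleted: its predecessor no longer points to it, so nothing in the program references it and it needs to be garbage collected. The traversal starts from node 1, the head of the list.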

Node 1 is marked grey first. When the algorithm explores node 1, it marks node 1’s child (node 2) grey and then marks node 1 black, since all of its children have now been found. The same happens in turn to nodes 2, 3 and 5, so eventually every node still linked into the list is marked black (represented in the image as orange). Nothing leads to node 4, so it stays white and is freed from the heap.

This is the basic algorithm, but the Go team keeps updating the garbage collector with new optimizations.

For example, before Go 1.5 the garbage collector was not concurrent: the program’s execution was stopped for the whole garbage collection (a “stop the world” pause), which introduced latency. But why should the world stop for garbage collection? If the program kept running while the GC was marking, two problems could occur:

If the GC marks an object black and the process then removes all the references to it, the object stays in the heap until the next cycle, increasing the memory footprint.
If the process stores a reference to an object that the GC still considers white (for example, in an object that is already black) and then removes that object’s other references, the GC never finds it and frees it. This leads to a dangling pointer: there is a pointer, but the memory it points to is not allocated anymore.

In short, the GC and the process would step on each other.
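The second problem can be reproduced with a toy incremental marker that lets the program run between marking steps. This is a hypothetical model (the `object`, `marker` and `lostObject` names are made up), with one pointer field per object and no write barrier. A root `A` has already been scanned when the program moves the only pointer to `C` from `B` into `A`; marking then finishes with `C` still white even though `A` points to it, so sweeping would free live memory.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object is a toy heap object with a single pointer field.
type object struct {
	name  string
	color color
	field *object
}

// marker does tricolor marking one grey object at a time, so the program
// can run between steps, as it would alongside a concurrent collector.
type marker struct{ greys []*object }

func (m *marker) shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
		m.greys = append(m.greys, o)
	}
}

// step scans one grey object and reports false once none are left.
func (m *marker) step() bool {
	if len(m.greys) == 0 {
		return false
	}
	o := m.greys[0]
	m.greys = m.greys[1:]
	m.shade(o.field)
	o.color = black
	return true
}

// lostObject replays the bad interleaving without any write barrier and
// reports C's final color and whether A (a root) still points to C.
func lostObject() (color, bool) {
	c := &object{name: "C"}
	b := &object{name: "B", field: c}
	a := &object{name: "A", field: b} // A is reachable from a global variable
	m := &marker{}
	m.shade(a)
	m.step() // A is scanned: A is black, B is grey, C is still white

	// Meanwhile the program moves the only pointer to C from B into A.
	a.field = c   // black -> white pointer, and nobody tells the GC
	b.field = nil // the path through B disappears before B is scanned

	for m.step() {
	}
	return c.color, a.field == c
}

func main() {
	col, reachable := lostObject()
	// Prints: C is white: true, C is reachable: true
	fmt.Printf("C is white: %v, C is reachable: %v\n", col == white, reachable)
}
```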

Since version 1.5, though, the GC runs concurrently with the program. What changed? It is not that the program is never stopped for GC now; it still is, but only for a short moment, when:

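the GC cycle starts, to set up the mark phase (for example, turning on the write barrier described below and queuing up the roots for scanning)
the mark phase completes, to finish marking and make sure no object that is still referenced is left white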
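You can look at these pauses yourself. The sketch below forces a few collections with `runtime.GC()` and reads the stop-the-world pause recorded for each cycle from `runtime.MemStats.PauseNs`, a circular buffer whose most recent entry is at `PauseNs[(NumGC+255)%256]`. `recentPauses` is a hypothetical helper name, and the durations you see depend on your machine, heap size and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// recentPauses forces n collections and returns how many GC cycles completed
// in the meantime, plus the stop-the-world pause recorded for each of those
// cycles (newest first).
func recentPauses(n int) (uint32, []time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		runtime.GC()
	}
	runtime.ReadMemStats(&after)
	cycles := after.NumGC - before.NumGC

	var pauses []time.Duration
	// PauseNs is a circular buffer of 256 entries; the most recent pause is
	// at PauseNs[(NumGC+255)%256].
	for i := uint32(0); i < cycles && i < 256; i++ {
		idx := (after.NumGC + 255 - i) % 256
		pauses = append(pauses, time.Duration(after.PauseNs[idx]))
	}
	return cycles, pauses
}

func main() {
	cycles, pauses := recentPauses(5)
	fmt.Println("GC cycles:", cycles)
	for i, p := range pauses {
		fmt.Printf("pause %d: %v\n", i+1, p)
	}
}
```

Setting the environment variable `GODEBUG=gctrace=1` when running any Go program makes the runtime print a summary line for every GC cycle, including the time spent in each phase.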

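Because these pauses are short, latency drops by a huge margin. In addition, there is the write barrier: a small piece of code that runs on pointer writes while marking is in progress. It kicks in when the running program stores a reference to a white object into an object already marked black: the write barrier marks the referenced white object grey, so it still gets scanned and is protected from garbage collection. (Since Go 1.8, the runtime uses a hybrid write barrier that also shades the pointer being overwritten, but the idea is the same.)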
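Extending a toy marker with the write barrier rule described above shows how it saves the object. Again, this is a hypothetical model, not the runtime’s code: the program is assumed to call `writePointer` instead of assigning a pointer field directly while marking runs, and the barrier follows the simple black-to-white rule from the text rather than Go’s hybrid barrier. The interleaving mirrors the dangling-pointer problem described earlier: the program moves the only pointer to `X` into an already-black object, but this time `X` ends up black and survives.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object is a toy heap object with a single pointer field.
type object struct {
	name  string
	color color
	field *object
}

// collector is a toy incremental tricolor marker with a grey worklist.
type collector struct{ greys []*object }

func (gc *collector) shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
		gc.greys = append(gc.greys, o)
	}
}

// step scans one grey object and reports false once none are left.
func (gc *collector) step() bool {
	if len(gc.greys) == 0 {
		return false
	}
	o := gc.greys[0]
	gc.greys = gc.greys[1:]
	gc.shade(o.field)
	o.color = black
	return true
}

// writePointer is what the program calls instead of "src.field = dst"
// while marking is running. It applies the rule from the text: storing a
// pointer to a white object into a black object shades that object grey.
func (gc *collector) writePointer(src, dst *object) {
	if src.color == black {
		gc.shade(dst) // shade ignores nil and non-white objects
	}
	src.field = dst
}

// survives replays the dangerous interleaving with the barrier in place
// and returns X's color once marking has finished.
func survives() color {
	x := &object{name: "X"}
	b := &object{name: "B", field: x}
	a := &object{name: "A", field: b} // A is reachable from a global variable
	gc := &collector{}
	gc.shade(a)
	gc.step() // A is black, B is grey, X is white

	gc.writePointer(a, x)   // black -> white: the barrier shades X grey
	gc.writePointer(b, nil) // B drops its pointer before being scanned

	for gc.step() {
	}
	return x.color
}

func main() {
	fmt.Println("X is black:", survives() == black) // X is black: true
}
```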

We have only scratched the surface here; it is a vast and very interesting topic in itself. You can learn about garbage collection straight from the horse’s mouth in A Guide to the Go Garbage Collector.

I hope this piece eases you into learning more about garbage collection, especially if you are a self-taught developer like me.

Cheers. 💛