On arena allocators

Modern software finds it tempting to liberally use dynamic memory allocation. Most objects die young, so applications often see large amounts of churn. This is mostly harmless for programming languages with copying, generational garbage collectors, as there allocation is essentially free (pointer bump and well-predicted conditional branch), and the running time of the nursery GC pass is proportional to the number of live objects, not the number of allocations.

Regardless of how much you can avoid heap allocation, large amounts of it tend to be inevitable for upkeep of maintainability of the project, the sanity of developers and consistent ownership guarantees regarding provenance of pointers.

Recent years have seen a resurgence of interest in arena allocators, very conceptually similar to copying garbage collectors. Recall that a copying garbage collector works like such: we allocate two memory regions called the from-space and to-space of fixed sizes (bigger is better for throughput, but worse for latency, typically). Allocation works like so:

void * alloc(size_t size) {
  if (free_ptr + size > from_space_end) {
    collect();
  }
  void * ptr = free_ptr;
  free_ptr += size;
  return ptr;
}

When the collect() function is called, we copy all live objects from the from-space to the to-space, and then swap the roles of the two spaces. This means that all allocations are very fast (just a pointer bump), and deallocations are also fast (just a pointer reset). Downsides are the need to forward the objects to the new space and update pointers (there are clever algorithms for this), and the need to have enough memory for both spaces.

An arena allocator is a simplified version of this idea, where we only have one space (the arena), and we don't do any copying or collecting. Instead we adapt a similar idea to languages with ownership: the unique owner of the arena is responsible for allocating and deallocating it, all objects present in the arena have the same lifetime as the arena itself, the arena typically doesn't grow (if implemented similarly to the copying garbage collector version).

The main advantage of these types of is that they can be much faster than traditional heap allocators, especially for workloads with a lot of short-lived objects. They also have better cache locality, since all objects are allocated contiguously in memory. Finally, they're a whole lot simpler to implement than a general-purpose heap allocator, and can be easily tuned for specific workloads.

#include <stddef.h>
#include <stdint.h>
#include <stdalign.h>
#include <sys/mman.h>
static int align_up(size_t n, size_t alignment, size_t * out) {
  size_t mask = alignment - 1;
  if ((alignment & mask) != 0) return -1;
  if (n > SIZE_MAX - mask) return -1;
  *out = (n + mask) & ~mask;
  return 0;
}
typedef struct {
  unsigned char * arena;
  size_t size, used;
} Arena;
int arena_create(Arena * a, size_t size) {
  a->arena = NULL;
  a->size = a->used = 0;
  if (size == 0) return 0;
  void * ptr = mmap(
    NULL,
    size,
    PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_PRIVATE,
    -1,
    0
  );
  if (ptr == MAP_FAILED) return -1;
  a->arena = ptr;
  a->size = size;
  return 0;
}
void * arena_alloc(Arena * a, size_t size) {
  size_t aligned_size;
  if (size == 0) size = 1;
  if (align_up(size, alignof(max_align_t), &aligned_size) != 0)
    return NULL;
  if (aligned_size > a->size - a->used)
    return NULL;
  void * ptr = a->arena + a->used;
  a->used += aligned_size;
  return ptr;
}
void arena_reset(Arena * a) {
  a->used = 0;
}
void arena_destroy(Arena * a) {
  if (a->arena != NULL)
    munmap(a->arena, a->size);
  a->arena = NULL;
  a->size = a->used = 0;
}

So this is typically how far arenas go in terms of functionality in most cases. However, there is one very useful variation to this design that allows for growth.

Growable arenas? ⎇

The issue with a naive implementation of a growable arena is that all pointers to objects within the arena have to be updated when the arena grows, since upon reallocation we are not guaranteed that a contiguous block of virtual memory of new size at the same location was available. There is, however, an asymptotically and generally good way out of this problem.

We borrow a standard trick from the stretchy vector construction. The arena is allocated in blocks of exponential size (starting at some reasonable default), and we maintain a list of pointers to these blocks. When we need to allocate an object, we check if the current block has enough space for it; if not, we allocate a new block and add it to the list. This way, we can grow the arena without having to update any pointers, since each block is allocated independently.

An implementation of this idea might look a bit like so:

#define BLOCK_BASE ((size_t)4096)
#define BLOCK_MAX 32
typedef struct {
  unsigned char * blocks[BLOCK_MAX];
  size_t sizes[BLOCK_MAX], used[BLOCK_MAX];
  size_t block_count, next_block_size;
} GrowableArena;
void growable_arena_create(GrowableArena * a) {
  a->block_count = 0;
  a->next_block_size = BLOCK_BASE;
}
static int growable_arena_add_block(GrowableArena * a, size_t min_size) {
  if (a->block_count == BLOCK_MAX)
    return -1;
  size_t block_size = a->next_block_size;
  if (block_size < min_size)
    block_size = min_size;
  if (align_up(block_size, alignof(max_align_t), &block_size) != 0)
    return -1;
  void *ptr = mmap(
    NULL,
    block_size,
    PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_PRIVATE,
    -1,
    0
  );
  if (ptr == MAP_FAILED)
    return -1;
  size_t i = a->block_count;
  a->blocks[i] = ptr;
  a->sizes[i] = block_size;
  a->used[i] = 0;
  a->block_count++;
  if (a->next_block_size <= SIZE_MAX / 2)
    a->next_block_size *= 2;
  else
    a->next_block_size = SIZE_MAX;
  return 0;
}
void * growable_arena_alloc(GrowableArena * a, size_t size) {
  size_t aligned_size;
  if (size == 0) size = 1;
  if (align_up(size, alignof(max_align_t), &aligned_size) != 0)
    return NULL;
  if (a->block_count == 0)
    if (growable_arena_add_block(a, aligned_size) != 0)
      return NULL;
  size_t i = a->block_count - 1;
  if (aligned_size > a->sizes[i] - a->used[i]) {
    if (growable_arena_add_block(a, aligned_size) != 0)
      return NULL;
    i = a->block_count - 1;
  }
  void * ptr = a->blocks[i] + a->used[i];
  a->used[i] += aligned_size;
  return ptr;
}
void growable_arena_destroy(GrowableArena * a) {
  for (size_t i = 0; i < a->block_count; i++) {
    munmap(a->blocks[i], a->sizes[i]);
    a->blocks[i] = NULL;
    a->sizes[i] = a->used[i] = 0;
  }
  a->block_count = 0;
  a->next_block_size = BLOCK_BASE;
}

A more parsimonious implementation might try to fit smaller objects in preceding blocks. However, note that this would not only require more complicated management, but also is not super useful in practice: you typically want to allocate large objects with your general purpose allocator, and use the arena for small objects that are allocated in large numbers (e.g. AST nodes in a compiler). In this case, the exponential growth strategy is usually sufficient to keep fragmentation low.