Stacking up the heap

In the last post, I promised to see how far we can go using only ‘the stack’ instead of ‘the heap’. I lied, slightly, as instead of using ‘the stack’, we’ll be creating our own stack allocator, separate from the automatic call-stack.

Custom stack allocators may be familiar to people who’ve written embedded software for constrained systems.  This kind of allocator works by pre-acquiring a large block of memory (which you typically acquire from ‘the heap’!), and then treating it as a simple stack-of-bytes data structure. If the user requires some bytes, you increment the top of your stack by the requested amount — this is about as simple and as fast as it gets!

I’ve also seen these referred to as ‘mark and release‘ allocators, because at arbitrary points the user can record the value of the ‘top’ of the stack (i.e. make a mark of how big the stack is), allocate some more data from it (which increases the ‘top’ value), and then release just this data by resetting the stack’s ‘top’ value back to the marked value. Alternatively, to release all data in the stack, you can set the ‘top’ value back to zero.

The obvious downsides to this allocator are:

  1. Coarse lifetime management: only whole, contiguous blocks of data can be released, and only from the top of the stack.
  2. It’s great for plain-old-data, but not great for true C++ objects, which require their destructors to be called when released.
  3. If the stack fills up, you’re in trouble!
  4. Dangling pointer bugs are hard to catch (if you ‘release’ an object while some other part of the program still has a pointer to that object, you’re gonna have a bad time).

A simple stack

Here’s what a simple implementation of the above ideas might look like. Given a range of memory, this Stack class will allow you to allocate small portions from it, and manage their lifetimes using the mark-and-release method. The ScopedMalloc class is a little helper that will use malloc to grab some heap memory for our stack to use, and automatically call free when it goes out of scope.

The actual implementations in the eighth engine are slightly more complex, located at <eight/core/alloc/stack.h> and <eight/core/alloc/malloc.h>. NonCopyable is implemented at <eight/core/noncopyable.h>.

class StackAlloc : NonCopyable
  StackAlloc( u8* begin, u8* end )
    : begin(begin)
    , end(end)
    , top(begin)
  void* Alloc( uint size )
    u8* allocation = top;
    top += size;
    assert( top <= end );
    return allocation;
  void        Clear()      { top = begin; }
  const void* Mark() const { return top; }
  void Unwind(const void* mark)
    assert( mark >= begin && mark <= top );
    top = (u8*)mark;
  u8* begin;
  u8* end;
  u8* top;
class ScopedMalloc : NonCopyable
   ScopedMalloc( uint size )
	   : buffer( malloc(size) ) {}
  ~ScopedMalloc() { free( buffer ); }
  operator u8*()  { return (u8*)buffer; }
  void* buffer;

struct TestObject { int x, y, z; };
uint size = sizeof(TestObject)*100;
ScopedMalloc memory( size );
StackAlloc stack( memory, memory+size );

TestObject* obj1 = (TestObject*)
  stack.Alloc( sizeof(TestObject) );
obj1->x = 42;

const void* mark = stack.Mark();
TestObject* obj2 = (TestObject*)
  stack.Alloc( sizeof(TestObject) );

obj2->x = 1;//ERROR, this object has been released
obj1->x = 1;//OK, this object is still alive

By itself, this simple stack is only useful in cases where you really need super-fast allocations, you’re using plain-old-data (not C++ objects), you know your upper memory-usage limit, and you have a lot of faith in your discipline to avoid the above-demonstrated dangling-pointer type bugs when using the mark-and-release method.

As you can see above, I’ve simply not handled the out-of-memory condition, beyond putting an assertion there to document my assumption that it shouldn’t happen! In many software jobs, this kind of reckless behaviour might get you a stern talking to, but in games and embedded software, you often have hard upper-limits on memory usage anyway, where allocation failure is a real possibility! If you use too much memory, there’s often nothing you can do about it but crash and burn. The solution in these situations… is to simply not end up in them – to ensure that you never do get to the point where allocations fail. Games are not general purpose systems; we can usually, ahead of time, place artificial limits on every piece of content, such as the number of monsters or bullets that can be active at a time. We can calculate and stick to an upper bound. </side rant>

Dispite the simplicity (and dangers) of this method of memory management, I have seen it used as the backbone of more than one large console game in the past – often with just a few different stacks. e.g. one for the main character and menus, and then three more for storing three geographical ‘zones’ of the game world, which are streamed in/out as the character moves around (2 active ones, and one being streamed in).

Scoping it out

The inspiration for the next layer comes from a DICE presentation by Andreas Fredriksson, which I’d recommend you read for another point of view on this topic.

If we compare the above Stack class with the built-in automatic ‘call-stack’ (the place where local / ‘automatic lifetime’ variables are allocated from), there’s one big, important difference:

  • ‘The stack’ gives us scopes — blocks of code enclosed in braces.
  • Every time we encounter a closing brace (}), all the local variables that were created inside that block of code are destructed.
  • Our Stack class can kind of approximate this with careful use of mark/release, but doesn’t call destructors…

To reconcile our Stack allocator with idiomatic C++ objects (that have constructors/destructors), and to remove the requirement to manually use the mark-and-release method, what we can do is introduce a Scope class. This class will manage lifetimes (i.e. it will mark and release the stack for us), and it will execute destructors of objects that have been released.

The eighth engine implementation is in <eight/core/alloc/scope.h>, but we’ll also need <eight/core/alloc/new.h> for some macros that we’ll be using instead of the new keyword.

To jump straight into an example, let’s first look at how native / built-in scopes work. When we create a local variable, it’s lifetime is from the point of creation to the end of the scope that it’s created inside (i.e. up until the parent scope’s “}).

struct Object : NonCopyable
  Object(const char* name) : name(name)
            { printf("constructed %s\n", name); }
  ~Object() { printf("destructed %s\n", name); }
  const char* name;
  Object o1( "o1" );
    Object o2( "o2" );
  Object o3( "o3" );

This produces the output:

constructed o1
constructed o2
destructed o2
constructed o3
destructed o3
destructed o1

The equivalent example using new/auto_ptr (N.B. auto_ptr is deprecated in favor of unique_ptr) would use new to allocate the objects from the heap, but then ties them to the block lifetimes via the auto_ptr local variables:

  std::auto_ptr<Object> o1( new Object("o1") );
    std::auto_ptr<Object> o2( new Object("o2") );
  std::auto_ptr<Object> o3( new Object("o3") );

The equivalent example using our Scope class first creates a stack just like before, using a large block of memory acquired from the heap, but then it creates a Scope local variable to handle allocation of the objects. For the object in the inner scope (o2), we create a second Scope object that is a child of the first:

//create a stack allocator in a chunk of heap memory, as earlier
  uint size = eiMiB(1);
  ScopedMalloc memory( size );
  StackAlloc stack( memory, size );
//Wrap up the stack in a scope object to manage object lifetimes
  Scope a( stack, "main" ); // by convention, the default name for a scope variable is 'a'

  Object* o1 = eiNew(a, Object)( "o1" ); //alloc in 'a' scope
    Scope inner( a, "inner" ); // child scope (mark stack)
    Object* o2 = eiNew(inner, Object)( "o2" ); //alloc in 'inner' scope
  }//unwind stack to the mark, destruct objects created in 'inner'
  Object* o3 = eiNew(a, Object)( "o3" ); //alloc in 'a' scope

These last two examples produce the same results as the first, but demonstrate alternate methods of allocation: the typical C++ RAII pattern, and Eight’s Stack/Scope pattern.

In the C++ RAII (auto_ptr) example above, each allocation is kept track of by one smart-pointer object, which itself is a local variable. When this local variable goes out of scope, the allocation that it’s tracking is properly released (which includes calling the destructor).

In the Scope example above, one Scope object can keep track of multiple allocations. When a Scope object (which is a local variable) goes out of scope, all allocations that were made using it are properly released (which includes calling destructors).
i.e. both approaches are much the same, but the former uses a one-to-one relationship between allocations and their RAII containers, while the latter uses a many-to-one relationship.

A new new

The eiNew macro/keyword warrants some explanation. Inside the eighth engine codebase, usage of new/malloc are discouraged in favour of these engine-specific replacements. Also, global state and singletons are discouraged, which includes global memory allocators like new and malloc!

The allocation macro/keywords come in 5 flavours:

  • eiNew(          allocator, type )( ... );
  • eiAlloc(        allocator, type );
  • eiNewArray(     allocator, type, count );
  • eiAllocArray(   allocator, type, count );
  • eiNewInterface( allocator, type )( ... );

All of these varieties take an allocator object (such as a or inner in the above examples) as the first argument, and a typename as the second argument. The two *Array varieties take a count as the third argument, which is how many instances to allocate (equivalent to new type[count] or malloc(sizeof(type)*count)).

The *Alloc* varieties only allocate memory; they do not call constructors, and destructors will not be called. These are for use in places where you might traditionally use malloc.

The *New* varieties do call constructors and also register the allocations with the Scope object so that their destructors will automatically be called when the Scope is destructed. These are for use with C++ classes and places where you’d traditionally use neweiNew and eiNewInterface should be followed by the constructor arguments, like would normally follow after new type.... Unfortunately, eiNewArray does not support constructor arguments, so only works with types that are default-constructible.

For example, instead of a typical C++ allocation like this:
      Object* o1 = new Object( "o1" );
A typical eighth engine allocation looks like this:
      Object* o1 = eiNew(a, Object)( "o1" );

Up next

Ok, so that’s scope and stack allocation in a nutshell, but what good is it? The next post will compare it to both traditional heap and stack allocations and explore some of it’s uses. We’ll also see more examples of how it’s used practically inside the eighth engine.