Stack persistence vs. API simplicity

I posted a rather poorly-worded question on today's episode about a tradeoff between persistent data and a simple API. I hope this post makes better sense. Not sure.

I'm programming something in C++ (and yes, it uses a nerfed subset of OO design). One of my objectives (inspired by Handmade Hero) is to keep all code on the stack. I plan to release my code as a library with a boiled-down, simple API. I don't want the user to do any complex initialization, or hand empty containers or null pointers into functions. I don't want the user to do any persistence-support programming at all, other than to request one or two initializations per functional facility. The rest of the user experience should be accessing high-level functions that give access to, and control over, highly transformed data held and maintained by the library.

The usage scenario is that the tech user can write his or her own driver atop the library, meaning the user controls the main loop of program execution and thus the ownership and persistence of a few top-level objects. I want him/her to focus on their own code, not on maintaining mine.

That's not a problem until the library is called to create complex data structures, usually with nested objects. Some of these data structures need to persist for the lifetime of the user's driver without any maintenance, or even without the user's knowledge.

The problem with doing things entirely on the stack is that although the driver code may possess pointers to top-level objects in a library, the library is responsible for creating and storing fairly complicated multi-level substructures internally. If creation and storage of these items is done on some lower level of the stack, there are only three things I can do to persist them.

1. Require the user to preallocate empty spaces and containers for objects, taking creation ownership of them in the driver. This is neither practical nor desirable: it defeats the principle of a simple API, and in effect pushes responsibility for container creation onto the user, making him/her responsible for memory management. Secondly, you can't really know in advance how many of something you will need; dynamic objects may contain some number of sub-objects determined only by code running at a lower level, in a particular context.

2. Use the static keyword at object creation time in the library. This is something like heap programming: you can, for example, create a struct instance inside a function and attach it and its pointer to a container created at the same level using the same technique. It will persist after the initializations have been done on the object. The only drawback is that it lasts for the lifetime of the program.

3. Use the heap. I am trying not to go there.

Persistence of larger internal data structures is something I do not want the user to access directly except through high-level functions and indirection. I generally want to store things at the same level I'm creating and allocating them, and it seems there is no way to do this other than two of the three ways listed above.

Am I totally not grasping the concepts of top-down creation and allocation here? Am I wrong in thinking the API will suffer if I demand creation be done always at a higher level?

Sorry this was so long...
I prefer an API to delegate memory management to me, because then I can control where objects will live: on the stack, in global static memory, or on the heap (allocated with one malloc call or many, depending on the situation). If the API starts to control where it allocates its memory, it loses flexibility for optimizing memory usage.

In case you need to pass some memory to the API, you could support both variants - either the user manages memory, or you allocate it somewhere (the heap?). Simplified example:
bool LibInit(LibContext* ctx, LibMemoryBlob* blob)
{
  ctx->allocated = false;
  if (!blob) {
    blob = malloc(sizeof(*blob));
    if (!blob) {
      return false; // allocation failed
    }
    ctx->allocated = true;
  }
  ctx->blob = blob;
  ...
  return true;
}


This allows me to use stack memory allocation, or delegate memory management to library:
LibContext ctx;
LibMemoryBlob blob;
LibInit(&ctx, &blob); // make the library use a blob on my stack
...

LibInit(&ctx, NULL); // make the library manage memory itself


And please don't ever use static variables in an API library. That doesn't play well with DLL reloading (as seen in HH), or with multi-threading, or with unit testing.
In addition to what mmozeiko wrote:

I prefer to point the API at allocation/deallocation functions, so my application gets called when memory is needed (I'm not sure if you meant it this way, mmozeiko).

You also risk exhausting the client's stack space if you push too many huge structures onto the stack. And then maybe someone uses recursion.
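The "point the API at allocation/deallocation functions" idea could be sketched like this; all names here are illustrative, not any real library's API:

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical callback table: the application supplies these functions
// and the library calls them whenever it needs memory.
struct LibAllocator {
    void* (*alloc)(size_t size, void* user);  // 'user' is opaque app context
    void  (*release)(void* ptr, void* user);
    void* user;
};

// Fallback callbacks so the library still works if the app supplies nothing.
static void* DefaultAlloc(size_t size, void*) { return std::malloc(size); }
static void  DefaultRelease(void* ptr, void*) { std::free(ptr); }

LibAllocator LibDefaultAllocator()
{
    LibAllocator a = { DefaultAlloc, DefaultRelease, nullptr };
    return a;
}
```

The application can substitute callbacks that draw from its own arena, which keeps the "who decides where memory lives" question on the application's side.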
This is a very complex subject with a lot of things to pay attention to, but two things I would mention:

1) Libraries should aggregate allocations as much as possible. So, generally speaking, if the library needs 1 megabyte of persistent memory to make some structural thing that functions as a unit, it should not force the app to deal with 50 smaller chunks that add up to 1 megabyte.

2) Libraries should allow the user to pass the memory in. This can be coded fairly cleanly if you plan for it ahead of time by doing things like this:

size_t GetSizeForThingee(parameters);
thingee *CreateThingee(void *Memory, parameters);


and internally these can be coded with an aggregate memory allocator, which looks like:

prepped_alloc PrepThingeeForAlloc(parameters)
{
    prepped_alloc Prep = BeginAllocPrep();
    ... bunch of macro calls here that describe all the allocations
    ... and their internal pointers to itself
    EndAllocPrep(Prep);
    return(Prep);
}

size_t GetSizeForThingee(parameters)
{
    prepped_alloc Prep = PrepThingeeForAlloc(parameters);
    size_t Result = GetSize(Prep);
    return(Result);
}

thingee *CreateThingee(void *Memory, parameters)
{
    prepped_alloc Prep = PrepThingeeForAlloc(parameters);
    thingee *Result = (thingee *)PlaceAlloc(Prep, Memory);
    return(Result);
}


Everything can go through this path, and then you just make the Prep-based memory allocator once and it does everything.

- Casey
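A minimal sketch of what the Prep bookkeeping above might look like; everything here is assumed, since the thread doesn't show the actual allocator. The idea: record each sub-allocation's size first, then lay them all out in one contiguous block.

```cpp
#include <cstddef>
#include <cassert>

// Records the sizes of every sub-allocation so the aggregate can be
// sized and placed as a single block.
struct prepped_alloc {
    size_t Sizes[16];
    size_t Count;
    size_t Total;
};

prepped_alloc BeginAllocPrep() { prepped_alloc p = {}; return p; }

// Register one sub-allocation; returns its offset inside the final block.
size_t AddAlloc(prepped_alloc* Prep, size_t Size, size_t Align = 8)
{
    assert(Prep->Count < 16);
    Prep->Total = (Prep->Total + Align - 1) & ~(Align - 1);  // align up
    size_t Offset = Prep->Total;
    Prep->Sizes[Prep->Count++] = Size;
    Prep->Total += Size;
    return Offset;
}

size_t GetSize(const prepped_alloc& Prep) { return Prep.Total; }

// Place the aggregate into caller-provided memory; each sub-object's
// pointer is then just base + its recorded offset.
void* PlaceAlloc(const prepped_alloc&, void* Memory) { return Memory; }
```

GetSizeForThingee would call AddAlloc once per internal piece, and CreateThingee would do the same sequence and then fix up internal pointers from the offsets.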
I appreciate all your responses. Perhaps option #1 is something I should reconsider. Offering API flexibility to the user is a point I missed. And I hadn't gone deeply enough into it to even consider aggregation issues.

What a good API should be is clearly something to consider: should it be designed deliberately at some point, or should it emerge naturally from writing usage code first? We all have some idea of what a good API "should be", but there is something BDUF about planning it all out beforehand. As I'm writing my own code, it's turning out that I tend to favor returning packaged data structures to the user, and minimizing the number of function calls offered. If you hand them access to aggregate data and maybe an iterator or two, perhaps that's 80% of it right there.

Also, I wanted to add another option to the three I listed, because it's a new thing I don't fully grok yet:

4. Move semantics. Apparently, a move-equipped structure (i.e., a struct equipped with a move constructor and move assignment operator) can be created in a temporary/automatic context and returned like a persistent object to the caller. No special facilities (like new or static) are needed at the point of creation. It's transparent to both usage and implementation code, and it's efficient. But it's hard to understand.
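A minimal illustration of option 4, assuming a struct that owns a buffer (names are made up). Note that the buffer itself still lives on the heap; a move only changes who owns it, it does not make stack data outlive its frame.

```cpp
#include <utility>
#include <cstring>

// Move-enabled type: owns a heap buffer, but the caller never touches
// new/delete -- the factory returns by value and the move constructor
// steals the buffer instead of copying it.
struct Tokens {
    char*  Data  = nullptr;
    size_t Count = 0;

    Tokens() = default;
    Tokens(const Tokens&) = delete;             // no accidental deep copies
    Tokens(Tokens&& other) noexcept
        : Data(other.Data), Count(other.Count)
    { other.Data = nullptr; other.Count = 0; }  // steal, don't copy
    Tokens& operator=(Tokens&& other) noexcept
    {
        std::swap(Data, other.Data);
        std::swap(Count, other.Count);
        return *this;
    }
    ~Tokens() { delete[] Data; }
};

// Built in a temporary context, returned like a persistent object.
Tokens MakeTokens(const char* src)
{
    Tokens t;
    t.Count = std::strlen(src);
    t.Data  = new char[t.Count + 1];
    std::memcpy(t.Data, src, t.Count + 1);
    return t;  // moved (or elided) out to the caller
}
```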
If we are talking about C, then there are no move constructors. You simply pass the object by address:
void LibFunction(Object* obj)
{
  obj->x = 10;
  ...
}

Object obj;
LibFunction(&obj);


But if you are creating a C++ API, then of course it's a different story. I would suggest not creating a C++ API, because C++ doesn't have a standard ABI. Every compiler/OS does it in its own way.
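One common way to sidestep the ABI problem while still implementing in C++ is a C-linkage facade: plain functions and opaque pointers at the boundary. This is a sketch with hypothetical names, not a specific library's API:

```cpp
#include <cstdlib>

// C++ implementation type, hidden behind the boundary.
struct Thingee { int Value; };

// C-linkage facade: only plain functions and opaque pointers cross it,
// so any compiler that follows the platform C ABI can call in.
extern "C" {
    Thingee* ThingeeCreate(int value)
    {
        Thingee* t = static_cast<Thingee*>(std::malloc(sizeof(Thingee)));
        if (t) t->Value = value;
        return t;
    }
    int  ThingeeGetValue(const Thingee* t) { return t->Value; }
    void ThingeeDestroy(Thingee* t)        { std::free(t); }
}
```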
I'm pretty much talking about C++ in either OO-style or C-style. ABI shouldn't matter too much as long as the code can compile on different platforms.

The example you gave over-simplifies the problem. I don't want to require the user to supply ANY DATA the library should rightfully be responsible for creating and maintaining internally. It's simply not the API user's domain to provide empty data structures for the library to use internally.

Correct me if I'm wrong, but it seems the reason handmade_hero code can achieve doing everything on the stack is because there is a limited amount of data the program itself has to persist between calls. Data is persisted through the driver (the main loop); or in large chunks by a custom memory management scheme, like with Arenas; or managed in the background by the Win32 API. And to a minor extent, globals and statics. There just isn't any other way to do it, and I'm pretty sure Casey isn't using move semantics.

I might go along with Casey's suggestion of a custom memory allocator, if it were optional. There may be situations where the API user wants to hand over a large chunk of memory to the library, to maintain control over maximum memory usage. Normally, though, the user would simply define a maximum amount of memory for the library's facilities to use, without physically allocating that memory themselves.

What this all implies is that some kind of allocation scheme has to be present in a library, whether it's freestore, static, or a customized manager. Data handed back to the user from function calls should be persistent enough that, once it is received and assigned by the user, it can be released by the library. Mechanisms handed to the user, like cursors or iterators into large internal structures, should have some kind of dispose() method to allow the library to deallocate the memory assigned to them.
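The dispose() idea could be sketched like this; all names are hypothetical, and the "library state" here is just a stand-in for whatever internal structure the cursor walks:

```cpp
#include <cstddef>

// Hypothetical cursor handle: the user iterates through it but never sees
// the internal structure; Dispose() hands resources back to the library.
struct LibCursor {
    const int* Items;
    size_t     Count;
    size_t     At;
};

// Stand-in for library-internal state (a real library would allocate this).
static int GInternalData[3] = { 10, 20, 30 };

LibCursor LibOpenCursor()
{
    LibCursor c = { GInternalData, 3, 0 };
    return c;
}

bool LibCursorNext(LibCursor* c, int* out)
{
    if (c->At >= c->Count) return false;
    *out = c->Items[c->At++];
    return true;
}

void LibCursorDispose(LibCursor* c)
{
    // A real library would release whatever backs the cursor here;
    // this sketch just clears the handle.
    c->Items = nullptr;
    c->Count = c->At = 0;
}
```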

It really makes me wonder why the people who came up with these languages didn't seize upon the idea of cross-stack frame persistence. It would solve a lot of problems if the current frame context could access leftover data from previous frame contexts of all function calls it made directly.

For example if the current calling context made calls to A() B() and C(), it could access the leftover local data from those functions by the name of the function: like $A.somelocaldata, or $B.somestruct. So, if function C() constructed a complex object X in its local context, on return it would become available to the caller's context as $C.X - obviating the need for persistence strategies like a freestore. If you wanted to "bubble-up" data from functions you called, there would be some keyword like "preserve C" to allow the parent stack frame to continue to access the data structure created in C().

When you returned from the current frame context, the data for A, B, and C would be reclaimed, unless it was marked as preserved.

To release chains of preserved resources, the current frame context would just not use the preserve keyword, and return.

I don't know why nobody ever thought of doing this.

Edited by noxy_key. Reason: Forgot a sentence
Yes, I said this on Handmade Hero explicitly: most allocation schemes can be completely handled by n overlapping stacks with differing pop times. Why no languages understand this, I don't know. Surely I am not the first person to point it out?

- Casey
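One way to picture "overlapping stacks with differing pop times" is as separate linear stacks carved from one buffer, each popped on its own schedule: per-frame scratch popped every iteration, longer-lived data popped much later. This is a minimal sketch of that interpretation, not code from the show:

```cpp
#include <cstddef>

// A linear stack over a caller-provided buffer.
struct Stack {
    unsigned char* Base;
    size_t Used, Cap;
};

void* Push(Stack* s, size_t size)
{
    if (s->Used + size > s->Cap) return nullptr;  // out of room
    void* p = s->Base + s->Used;
    s->Used += size;
    return p;
}

// Popping is just remembering and restoring a watermark.
size_t Mark(const Stack* s)         { return s->Used; }
void   PopTo(Stack* s, size_t mark) { s->Used = mark; }
```

Two Stack instances with different pop cadences then model the two lifetime classes without any general-purpose heap.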
You're right about statics. Even local functions that attempt to initialize new static structs will silently keep returning the address of the first struct created.

#include <iostream>
#include <cassert>
#include <new>

using namespace std;

struct ATOKEN { const char * data; };

struct StaticExample
   {
   ATOKEN * tokens[10];

// 0 = run the freestore version
// 1 = run the static version
#if 0
   void create_and_store (const char * str, int idx, ATOKEN ** tok_ary)
      {
      static ATOKEN token;
      token.data = str;
      tok_ary[idx] = &token;
      }
#else
   void create_and_store (const char * str, int idx, ATOKEN ** tok_ary)
      {
      ATOKEN * token = new (nothrow) ATOKEN;
      assert(token);
      // add to a list or something of addresses in custom memory manager
      token->data = str;
      tok_ary[idx] = token;
      }
#endif
   };

int main (void)
   {
   const char * bigstr = "1234567890";

   StaticExample S;

   for (int i = 0; i < 10; ++i)
      S.create_and_store (&bigstr[i], i, S.tokens);

   for (int i = 0; i < 10; ++i)
      cerr << "Address: " << S.tokens[i] << " Value: " << S.tokens[i]->data << endl;

   // only meaningful for the freestore version; never delete a static
   for (int i = 0; i < 10; ++i)
      if (S.tokens[i]) delete S.tokens[i];
   }



And I think I understand
I must have missed that episode :)

It would be very cool to have such a feature. That would put the C++ standards committee to work. Have you ever heard of any language that does this?
cmuratori
Surely I am not the first person to point it out?

Indeed you are not.


Allan Bowhill
It would be a very cool to have such a feature.

They sort-of thought about this, in the sense that the language supports custom new/delete and the STL supports custom allocators.

Unfortunately, the custom allocator API in the STL is beyond useless. In fact, the main difference between STL and EASTL is that EASTL cleaned up the allocator interface.


Allan Bowhill
Have you ever heard of any language that does this?

Plenty!

The technical term for this is region-based memory management. It's a natural fit for nondeterministic logic languages like Prolog, because backtracking and stack popping are more or less the same thing. It's also been implemented as part of the language in Standard ML, Cyclone, and ParaSail.

But, of course, lots of programs which need real-time performance implement them in their libraries. Apache, for example, uses what it refers to as memory pools.
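A minimal region in that spirit: allocations come out of one block and are all released together when the region is destroyed. Names are assumed; real pool implementations like Apache's chain multiple blocks and support sub-pools.

```cpp
#include <cstdlib>
#include <cstddef>

struct Region {
    unsigned char* Base;
    size_t Used, Cap;
};

bool RegionCreate(Region* r, size_t cap)
{
    r->Base = static_cast<unsigned char*>(std::malloc(cap));
    r->Used = 0;
    r->Cap  = r->Base ? cap : 0;
    return r->Base != nullptr;
}

void* RegionAlloc(Region* r, size_t size)
{
    size = (size + 7) & ~size_t(7);              // keep 8-byte alignment
    if (r->Used + size > r->Cap) return nullptr;
    void* p = r->Base + r->Used;
    r->Used += size;
    return p;
}

// No per-object free: the whole region goes at once, like a stack pop.
void RegionDestroy(Region* r)
{
    std::free(r->Base);
    r->Base = nullptr;
    r->Used = r->Cap = 0;
}
```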
The STL allocator interface has been cleaned up a lot lately.

I was particularly pleased that scoped_allocator_adaptor worked as advertised for the Casey-esque asset system I experimented with.
Building a memory manager is desirable, especially if a requirement is to honor memory limits defined by the user. But stack preservation built into the language is a different thing; like malloc/new, deallocation is not automatic with a memory manager.
For example, a stack preservation scheme would trail automatic cleanup of local variables one stack frame behind where it happens right now.

Currently, in C++ A calls B and when B returns, A cannot access B's locals.

In a trailing cleanup scheme, one level of previous frames doesn't get cleaned up until the current context ends. So after a call to B() returns, A can access B's locals. Only when A returns do B's locals get destroyed.

======================================
Frame A
B();
S = $B.S; // S is accessible via special syntax
======================================
Frame B
S = (some struct created locally)
======================================

Further, if a trailing scheme had a "preserve <function name>" keyword, you could bubble up values at no expense. When A calls B, and B calls C and returns from it, all locals in C are still accessible to B's context. When B issues preserve and returns to A, A has access to all of B's and C's locals. When A returns, the preserve chain is automatically destroyed. In this case the chain is only 1 long, but it could be N frames long.


======================================
Frame A
B();
S = $B.C.S; // S is accessible via preserve
======================================
Frame B
C();
preserve C;
======================================
Frame C
S = (create struct in this frame)
======================================
Isn't the nature of stack allocation such that once a frame has been popped, that frame is essentially "lost"? So a frame preservation scheme would not work, unless I missed something.

Here's a recap of the situation:
B()
{
  C();
  // struct `Sb` allocated on stack
}

C()
{
  // struct `Sc` allocated on stack
}

A()
{
  B();
  // structs `$B.Sb` and `$C.Sc` being referenced, and work done upon them
}

And here's a visualization of the stack, the way I understand it (stack grows towards the bottom):

1) Inside C() scope, after A() -> B() -> C() chain of calls.
| ....... |  A frame
+---------+
| ..$Sb.. |  B frame
+---------+
| ..$Sc.. |  C frame
+---------+  <-- STACK TOP --

2) Inside A() scope, after returns from C() and B().
| ....... |  A frame
+---------+  <-- STACK TOP --
| ..$Sb.. |  B frame
+---------+
| ..$Sc.. |  C frame
+---------+

At this point, $B.Sb and $C.Sc can be referenced, and the data they hold could be worked upon without problems.
But now suppose that a new D() call has been inserted in function A() :
A()
{
  B();
  D();
  // structs `$B.Sb` and `$C.Sc` are being referenced
}

3) Now, the frame of D() will overwrite the frame of B(), thus rendering the contents of $B.Sb invalid (and potentially $C.Sc too).
| ....... |  A frame
+---------+  
| ....... |  D frame
+---------+  <-- STACK TOP --
| ..$Sc.. |  C frame
+---------+

Technically, $B.Sb could still be available, provided that $B.Sb is placed high enough in the stack, so that the frame of D() could not reach and overwrite it.
But there's no guarantee of that, and it's not at all clear how to make that work in the general case.