What is a not terrible way to handle strings in C?

Long time viewer, first time poster here.

Somehow it's been about 300 episodes and all this time Casey has managed to eschew heavy usage of C-style strings. In the few places where he uses them (asset builder, debug system and platform layer), he simply makes do with declaring a char Buffer[N] on the stack and going along his way.

I'm working on a large-ish project at the moment and using the coding philosophy and techniques from HH has gotten me tremendously far. However, I've finally hit a wall and I'm curious about other people's opinions on the matter. The game I am working on will have a lot of strings. It's not a text game, but it is text-heavy: plenty of offline stuff e.g. descriptions, and plenty of online stuff e.g. a log of events that occur in the game. So far, I have been recklessly slinging char Buffer[N] left and right, using filth like strcpy, strlen, and worst of all sprintf, continuously telling myself that I will soon figure something out and get this mess under control. I don't think I've had an overflow bug yet, but at this rate it's only a matter of time.

As of now, I can only see these options:
1) Keep doing what I'm doing, except upgrade all the string.h calls with their max-byte counterparts.
Pros - Least friction.
Cons - Need to keep track of size when appending to a buffer. Of course, the fact I haven't had catastrophic memory corruption bugs means I'm already doing this implicitly to some extent.
- Still uses string.h. Not the end of the world, but still lame.
2) Make a struct string that holds the length and capacity and allocates a fixed-size array from the Transient State's memory.
Pros - Size is already saved for automatic use when appending to a buffer.
Cons - Need to pass the Transient State to every function that makes a string (many).
- Need to wrap every new string and its usage with BeginTemporaryMemory and EndTemporaryMemory calls.
3) Same as 2), but make a separate arena for "scratch-space" that gets its size set to 0 at the beginning of each frame.
Pros - Don't have to call Begin and EndTemporaryMemory everywhere.
Cons - Still need to pass this scratch arena to all the functions that make strings.
4) Give up and use malloc(). Or even worse, give up and use a string library... Just kidding.
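
For reference, option 1 might look roughly like the sketch below. BuildHitMessage and its parameters are made up for illustration; the only real calls are snprintf and assert. It shows the bookkeeping mentioned in the cons: every bounded write needs the remaining capacity, and every result needs a truncation check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example: builds "<Name> hits <Target> for <Damage>" in two
   bounded writes. Returns the final string length (terminator excluded). */
static size_t BuildHitMessage(char *Buffer, size_t Capacity,
                              const char *Name, const char *Target, int Damage)
{
    size_t Used = 0;
    int Written;

    assert(Buffer != NULL);
    if (Capacity == 0) return 0;
    Buffer[0] = '\0';

    Written = snprintf(Buffer + Used, Capacity - Used, "%s hits %s", Name, Target);
    if (Written < 0) { Buffer[Used] = '\0'; return Used; }       /* format error */
    if ((size_t)Written >= Capacity - Used) return Capacity - 1; /* truncated */
    Used += (size_t)Written;

    Written = snprintf(Buffer + Used, Capacity - Used, " for %d", Damage);
    if (Written < 0) { Buffer[Used] = '\0'; return Used; }       /* format error */
    if ((size_t)Written >= Capacity - Used) return Capacity - 1; /* truncated */
    Used += (size_t)Written;

    return Used;
}
```

snprintf always terminates the output when the size is nonzero, so a truncated message is still a valid string; it is just cut short.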

I am leaning toward the third option but I'm curious about other people's thoughts.

So, how do you deal with strings when you have to use them frequently?

Edited by hotspur on Reason: dat [code] tag
"Strings" is not so much the issue most of the time, but rather "things which change size vs. things which don't." Strings happen to be the most common case of this sort of data, of course.

As far as how to handle dynamically sized data well, typically you just want to make an assessment about how the data is being used and what you know about it. If you conclude that it is relatively random, then a general-purpose allocator is probably the best you're going to be able to do.

However, many times you don't need to go that route, even for strings. For example, if your game only uses strings of known sizes (e.g., the users aren't entering the strings, they are known ahead of time), then often you don't really have a problem. You can still load assets in chunks, and just point to the strings in them, and there isn't really much extra work to do here. It's usually only user-created strings that pose a problem.
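
As a rough sketch of the "point to the strings in the chunk" idea: the layout below (an offset table plus packed, NUL-terminated data) and every name in it are assumptions for illustration, not a description of any particular asset format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a loaded asset chunk that contains strings. */
typedef struct
{
    uint32_t Count;
    const uint32_t *Offsets;  /* Count entries, each relative to Data */
    const char *Data;         /* packed, NUL-terminated strings */
} string_table;

/* Returns a pointer into the loaded chunk; nothing is copied or allocated. */
static const char *GetString(const string_table *Table, uint32_t Index)
{
    assert(Index < Table->Count);
    return Table->Data + Table->Offsets[Index];
}

/* Stand-in for bytes that would normally come from the asset file. */
static const char ExampleData[] = "Rusty Sword\0A blade past its prime.\0Potion";
static const uint32_t ExampleOffsets[] = { 0, 12, 36 };
static const string_table ExampleTable = { 3, ExampleOffsets, ExampleData };
```

The strings live exactly as long as the chunk they were loaded in, so there is no per-string lifetime to manage.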

- Casey
Thanks for the reply Casey!

I'm happy to have fixed sizes on all of the runtime-generated strings (it's what I've been doing so far). There are some user-input strings but they can all be assigned arbitrary max lengths. My biggest pain point is that my code is littered with char Buffer[N] where N is varying all the time, so N is either a magic number or it needs a variable name that needs to be passed to any safe string function.

I have another idea... but I've never seen it before:

#define FixedStringLengthMax 1024
struct fixed_string
{
    int32 Length;
    char Buffer[FixedStringLengthMax];
};


Assuming this is large enough for my runtime string needs, it appears quite convenient.

Now I don't have to pass the capacity to create or append to these since it's always the same... awesome!

Except now I'm afraid to blow up the stack. Each string is now 1KB, which seems fine if used judiciously... but I have no idea to what extent stack sizes vary given hardware, OS, etc.
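
A minimal sketch of what appending to such a struct could look like. AppendCString is a made-up helper, and int32_t stands in for the int32 typedef above. Because the capacity is a compile-time constant, the caller never passes it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FixedStringLengthMax 1024

typedef struct
{
    int32_t Length;
    char Buffer[FixedStringLengthMax];
} fixed_string;

/* Hypothetical helper: appends Source, truncating if it does not fit.
   One byte is always kept for the terminator. Returns bytes appended. */
static int32_t AppendCString(fixed_string *Dest, const char *Source)
{
    assert(Dest->Length >= 0 && Dest->Length < FixedStringLengthMax);

    int32_t Room = (FixedStringLengthMax - 1) - Dest->Length;
    size_t SourceLength = strlen(Source);
    int32_t Count = (SourceLength < (size_t)Room) ? (int32_t)SourceLength : Room;

    memcpy(Dest->Buffer + Dest->Length, Source, (size_t)Count);
    Dest->Length += Count;
    Dest->Buffer[Dest->Length] = '\0';
    return Count;
}
```

A zero-initialized fixed_string (fixed_string S = {0};) is a valid empty string for this helper.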
1. I would always suggest replacing string.h functions with their max-byte counterparts, unless you can exhaustively prove that every non-max-byte use cannot overflow.

2. and 3. As a little optimization you can assign the per-frame scratch space to a global as the first thing in your game-logic entry point.

void Entry(memory_desc* Mem, ...)
{
    GlobalScratchMemory = InitArena(Mem->transient);
    //...
}


This will make it possible to avoid having to pass the scratch arena around everywhere. You will still need to wrap the lifetimes.
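
To make that a bit more concrete, here is a small sketch of a global scratch arena that is reset at the start of each frame. The scratch_arena type, ResetScratch, PushScratch and PushStringCopy are all hypothetical names, and the allocator ignores alignment because it only hands out char storage here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint8_t *Base;
    size_t Size;
    size_t Used;
} scratch_arena;

static scratch_arena GlobalScratchMemory;

/* Call at the top of the frame: everything pushed last frame is dropped. */
static void ResetScratch(void *Memory, size_t Size)
{
    GlobalScratchMemory.Base = (uint8_t *)Memory;
    GlobalScratchMemory.Size = Size;
    GlobalScratchMemory.Used = 0;
}

/* Bump allocation; returns NULL when this frame's scratch space runs out. */
static void *PushScratch(size_t Size)
{
    assert(GlobalScratchMemory.Base != NULL);
    if (Size > GlobalScratchMemory.Size - GlobalScratchMemory.Used) return NULL;
    void *Result = GlobalScratchMemory.Base + GlobalScratchMemory.Used;
    GlobalScratchMemory.Used += Size;
    return Result;
}

/* A string helper that no longer needs an arena parameter. */
static char *PushStringCopy(const char *Source)
{
    size_t Length = strlen(Source);
    char *Result = (char *)PushScratch(Length + 1);
    if (Result) memcpy(Result, Source, Length + 1);
    return Result;
}
```

Pointers returned by PushStringCopy are only valid until the next ResetScratch, so anything that must outlive the frame has to be copied elsewhere.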
For my own personal code I ended up hashing the strings before compiling. This sort of technique is just one solution, and I chose it mostly because I thought it would be interesting to try it out and see what happens. The hashes can be checked for collisions before compiling occurs.

For displaying strings I've stored them all in a debug table so tools can look up and display them.

It's really nice to be able to run a switch statement on what is conceptually a string:

sid A = SID( "hello" );
switch ( A )
{
    case SID( "hi" ):
        break;

    case SID( "hello" ):
        break; // control goes here

    case SID( "greetings" ):
        break;
}
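
For anyone curious what the hashing and collision-check half might look like, here is a sketch in plain C. The post does not say which hash is used; 32-bit FNV-1a is chosen here purely as an example, and HashString and HasCollision are made-up names. In C the case labels still need compile-time constants, so a pre-build step would have to replace each SID("...") with the number this function produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t sid;

/* 32-bit FNV-1a; an assumption, not necessarily the hash used above. */
static sid HashString(const char *String)
{
    sid Hash = 2166136261u;            /* FNV offset basis */
    assert(String != NULL);
    while (*String)
    {
        Hash ^= (uint8_t)*String++;
        Hash *= 16777619u;             /* FNV prime */
    }
    return Hash;
}

/* Returns 1 if two different strings in the list share a hash.
   Quadratic, which is fine for a build-time check on a modest list. */
static int HasCollision(const char *const *Strings, size_t Count)
{
    for (size_t i = 0; i < Count; ++i)
        for (size_t j = i + 1; j < Count; ++j)
            if (HashString(Strings[i]) == HashString(Strings[j]) &&
                strcmp(Strings[i], Strings[j]) != 0)
                return 1;
    return 0;
}
```

Running HasCollision over every string the tool finds, and failing the build if it returns 1, is one way to get the "checked before compiling" guarantee.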

Edited by Randy Gaul on
hotspur wrote:
> Thanks for the reply Casey!
>
> I'm happy to have fixed sizes on all of the runtime-generated strings (it's what I've been doing so far). There are some user-input strings but they can all be assigned arbitrary max lengths. My biggest pain point is that my code is littered with char Buffer[N] where N is varying all the time, so N is either a magic number or it needs a variable name that needs to be passed to any safe string function.

If you actually have an array buffer (rather than a pointer):
#define array_count(a) (sizeof(a)/sizeof((a)[0]))
string_t s = MakeString(buffer, array_count(buffer));

Then you can resize your buffer as you like, and the size handling will just work. However, if you switch to a pointer at any point, this breaks (rather horribly, because it will still compile and just give you the wrong size).
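
Here is a small self-contained sketch of both the working case and the pitfall. MakeString's body and the string_t field types are assumptions (only its call is shown above), and WrongCount exists only to demonstrate the failure. Some compilers warn about the division inside WrongCount, which is exactly the mistake being illustrated.

```c
#include <assert.h>
#include <stddef.h>

#define array_count(a) (sizeof(a) / sizeof((a)[0]))

/* Minimal stand-in for the string type; field names are assumptions. */
typedef struct
{
    size_t size;    /* capacity of buffer in bytes */
    size_t length;  /* bytes in use, excluding the terminator */
    char *buffer;
} string_t;

static string_t MakeString(char *buffer, size_t size)
{
    string_t s;
    assert(buffer != NULL && size > 0);
    s.size = size;
    s.length = 0;
    s.buffer = buffer;
    buffer[0] = '\0';
    return s;
}

/* The pitfall: here 'buffer' is a pointer, so the macro computes
   sizeof(char *) / sizeof(char) instead of the array length. */
static size_t WrongCount(char *buffer)
{
    return array_count(buffer);
}
```

With a real array, MakeString(big, array_count(big)) picks up the new size automatically whenever the declaration of big changes.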

C++ templates can actually solve the problem as well:
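template<size_t N>
size_t ArrayCount(char (&buffer)[N]) {
   // Taking the array by reference keeps its length N in the type; a plain
   // 'char buffer[N]' parameter decays to a pointer and N cannot be deduced.
   return N;
}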

This has all the unfortunate effects of templates in general, but if you're using a small number of translation units like HH, the compile overhead isn't actually that bad.

hotspur wrote:
> I have another idea... but I've never seen it before:
>
> #define FixedStringLengthMax 1024
> struct fixed_string
> {
>     int32 Length;
>     char Buffer[FixedStringLengthMax];
> };
>
> Assuming this is large enough for my runtime string needs, it appears quite convenient.
>
> Now I don't have to pass the capacity to create or append to these since it's always the same... awesome!
>
> Except now I'm afraid to blow up the stack. Each string is now 1KB, which seems fine if used judiciously... but I have no idea to what extent stack sizes vary given hardware, OS, etc.

Let's assume you have just a general string type, size, length, and pointer, like this:
struct string_t {
    u64 size;
    u64 length;
    char * buffer;
};


The process of wrapping a stack buffer in this struct is straightforward, but long-winded:
char foo_buffer[512];
string_t foo;
foo.size = array_count(foo_buffer);
foo.length = 0;
foo.buffer = foo_buffer;


Well, you can just wrap that in a macro:
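// One possible way to generate the buffer name: paste a suffix onto 'name'.
#define STACK_STRING_BUFFER(name) name##_buffer
#define STACK_STRING(name, count) char STACK_STRING_BUFFER(name)[count];\
                                  string_t name;\
                                  name.size = count;\
                                  name.length = 0;\
                                  name.buffer = STACK_STRING_BUFFER(name);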

//...
{
   STACK_STRING(foo, 512);
   // 'foo' is now a string with a 512 character buffer.
}

The 'STACK_STRING_BUFFER()' macro just needs to generate a unique name for the buffer.

You want to do this, or something similar, because it makes it a lot easier to do things like strcpy and sprintf properly. (You wrap those calls inside functions that operate on string_t, and then you only ever have to deal with the details in one place.) The one catch is that your string functions must never call free() on these buffers (so, no reallocations), but this is easy enough to handle.
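
As an illustration of "wrap those calls in one place", here is a sketch of an sprintf replacement for string_t. StringAppendFormat is a hypothetical name, and uint64_t stands in for the u64 typedef; the only library calls are vsnprintf and assert. It never allocates or frees, so it is safe to use on stack-backed strings.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    uint64_t size;    /* capacity of buffer, including the terminator slot */
    uint64_t length;  /* characters in use, excluding the terminator */
    char *buffer;     /* caller-owned storage: never freed or reallocated */
} string_t;

/* Appends formatted text, truncating rather than overflowing.
   Returns 1 if everything fit, 0 on truncation or a formatting error. */
static int StringAppendFormat(string_t *s, const char *format, ...)
{
    assert(s->buffer != NULL && s->length < s->size);

    size_t room = (size_t)(s->size - s->length);
    va_list args;
    va_start(args, format);
    int written = vsnprintf(s->buffer + s->length, room, format, args);
    va_end(args);

    if (written < 0)
    {
        s->buffer[s->length] = '\0';  /* error: keep the old contents */
        return 0;
    }
    if ((size_t)written >= room)
    {
        s->length = s->size - 1;      /* truncated: buffer is now full */
        return 0;
    }
    s->length += (uint64_t)written;
    return 1;
}
```

Every call site then reads like sprintf but cannot write past the buffer, and the truncation policy lives in exactly one function.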
Thanks for all the great responses. This was definitely the right place to ask.

For stack strings I think I'm gonna go with a macro and a custom string type (from btaylor's reply).

As for offline strings, Randy's post is very interesting (I've been thinking of something along those lines). I am hesitant to make the leap into preprocessor land but the benefits might outweigh the cost.

I will update the thread with any progress or ideas. Thanks for the help everyone.
Antirez (author of Redis, a database written in pure C) also wrote about the topic. It should be a good read for you: https://github.com/antirez/sds

Edited by hugo schmitt on
Here is what I do for very dynamic strings in C.

https://github.com/gingerBill/gb/blob/master/gb_string.h

https://github.com/gingerBill/gb/blob/master/gb.h#L1367 (Uses custom allocator system)