Moving away from the Standard Template Library

I used to write all my own datastructures, and used template based linked lists a LOT. I pretty much knew that I should have been using arrays / growing arrays, but I never really made the switch.

I also used template based binary trees which stored key-values for a lot of sorting stuff (inorder traversal). I never did implement balancing, and eventually I ran into a programming problem where I really needed better performance out of a typical "map" structure.

I did some tests with std::map and it totally blew my implementation away. At that point I sort of felt that enough was enough, and went through my codebase and switched everything to either std::vector or std::map, and everything performs much better as a result. I have roughly 1 million lines of code to maintain, so I reasoned that I could spend my time on better stuff than trying to maintain my own datastructures.

Other than those two fundamental structures I pretty much do everything from scratch. I do however still use C++ new / the heap / general purpose allocation.

What I'm wondering about is what would Casey's take be on getting away from STL, seeing as I only use a tiny (but performance critical) aspect of it. I pretty much know how to build a std::vector replacement (indeed I have a couple special purpose ones), but what about a replacement for std::map that performs as well as possible?

PS I really had a hard time getting used to the STL style iterators, so that would be nice to be free of again... DS
If you are embracing C++ then I see no reason to not use STL.

And if you don't care about items being sorted then try to replace it with std::unordered_map. You might significantly better performance.

Not sure why you don't like iterators, but replacing std:: containers with your own is not a big deal. I think every CS student has implemented std::map alternative at least once :) And there exists bunch of good open-source implementations which you can adjust to your own needs. Like khash/ktree from https://github.com/attractivechaos/klib or look up some skiplist implementations (here's one in C# - http://igoro.com/archive/skip-lists-are-fascinating/ ).
Also, if you have the option for a newish compiler, the ranged base for loops can get rid of many uses of raw iterators, and look much cleaner to boot.

And I second the unordered_map comment, should be used in preference to map whenever possible.

Cheers
While vector and unordered_map may be among the more performant containers in the STL, they have been around for a while and designed for general use. It's not difficult to beat them in some cases, and with a simple array template I was able to beat std::vector by about 7x for one use case.

If you're dedicated to performance, I recommend messing about with creating your own variations of these templates, as they're really not that complicated to implement and it's not hard to beat the STL. If you have time and effort to spare, of course.
with a simple array template I was able to beat std::vector by about 7x for one use case.

Could you elaborate on this use case? A factor of 7 seems so high that I suspect your usage of std::vector might have been suboptimal, like inadvertently calling the element type's default constructor a bunch of times. Then again, it could be argued that the vector's interface is pretty bad because it's all too easy to misuse.
That's a much bigger difference than I have personally seen in practice. You have to know visual studio's magic incantations to turn off all the STL safety checks/debugging that it turns on, could that be it?

Are you comparing a custom allocator vs. the STL default allocator?

Can you share some code for us to examine/play with?
Hoho, I didn't know about unordered_map, maybe I should read the STL documentation... :)

Most often I do indeed just need a fast lookup in a sparse structure as opposed to actual sorted keys, so I will definitely look into using unordered_map. Thanks for the tip.

In regards to "embracing C++"; I feel like I have had similar experiences to Casey, in that I used to be a crazy OOP person, but now the more time passes I feel like I get more and more back to C. For this reason I am interested in non-STL alternatives to things like std::vector and std::map.

I'm also seeing cases where custom memory management would be a big win, but I still need fast lookups i.e. std::map in those cases.

Edited by Johannes Norneby on
One more tip on STL containers - preallocate memory for containers before using them (with reserve method). If you are inserting a lot of items into container that wastes time on memory reallocation and potentially makes heap more fragmented.

But yeah, if you need just a hash table then there exists bunch of alternatives to STL. Unless you want to implement hashtable on your own which should be pretty trivial (as seen on HH, just use better hash function :) Here are some I recommend looking at:
1) https://github.com/attractivechaos/klib
2) https://troydhanson.github.io/uthash/
3) https://code.google.com/p/sparsehash/ (memory efficient)
Nimbal
with a simple array template I was able to beat std::vector by about 7x for one use case.

Could you elaborate on this use case? A factor of 7 seems so high that I suspect your usage of std::vector might have been suboptimal, like inadvertently calling the element type's default constructor a bunch of times. Then again, it could be argued that the vector's interface is pretty bad because it's all too easy to misuse.

rathersleepy
That's a much bigger difference than I have personally seen in practice. You have to know visual studio's magic incantations to turn off all the STL safety checks/debugging that it turns on, could that be it?

Are you comparing a custom allocator vs. the STL default allocator?

Can you share some code for us to examine/play with?


I realize that was quite a bold claim, and almost certainly is a strange circumstance. You can read more about it in this thread, where I show the usage code for my tests. I don't think I'm quite ready to release my container template yet, but suffice to say its a thin wrapper over a growing array and some utility functions.

So 7x might be a bit of an exaggeration in most cases, but I stand by my original point. The STL is hardly the best in class, just convenient.
johno
In regards to "embracing C++"; I feel like I have had similar experiences to Casey, in that I used to be a crazy OOP person, but now the more time passes I feel like I get more and more back to C. For this reason I am interested in non-STL alternatives to things like std::vector and std::map.

I think it's worth pointing out that the STL is the least-OO part of the C++ standard library. If you grep your implementation for the word "virtual", you'll find that the only place where it's used is to implement exceptions. None of the data structures or algorithms use it anywhere.

Stepanov is on record as saying that OO is terrible, and part of the point of the STL was to come up with a system of generic data structures and algorithms which had no OO anywhere.
I feel that this thinking is exactly why c++ has such a bad rep (in certain circles)

c++ should never be conflated which oop. It has always been a multi-paradigm language.

Yes, it does provide oop tools, and they do have a use-case. But they should not be all that is considered, and in fact are rarely to correct choice.

The modern thinking for c++ use is to use objects with minimal interfaces for encapsulation, and prefer free functions for operations.
Ok, so STL isn't very OOP, all well and good. I am definitely moving away from OOP.

To clarify; I was trying to understand what I could do to move away from using STL containers purely based on Casey's apparent sentiment that STL is just plain bad. I certainly understand that Casey doesn't like templates.

Implementing classic datastructures by hand is where I started out, so it certainly isn't beyond my ability to build something custom, but I'm trying to understand if there is any real reason to not use STL. As mentioned, my tests indicate that the implementations that MSVC STL offers are at least as performant as my own (and most of the time better), and not having to maintain something myself is a big win imho.

I will freely admit that I'm much more of a game designer / programmer than a purely technical programmer type, so that is my primary concern, but I do still write lots of my own code / use very few third party things. So I'm basically just checking to see if I'm missing something that would be a big productivity and / or performance gain.

I've already learned about unordered_map from this thread, so it's all good so far... :)

It seems to me that related to all of this is custom memory management, and there I'm very much a beginner. Currently watching day 157...
johno
Ok, so STL isn't very OOP, all well and good. I am definitely moving away from OOP.

To clarify; I was trying to understand what I could do to move away from using STL containers purely based on Casey's apparent sentiment that STL is just plain bad. I certainly understand that Casey doesn't like templates.

Most STL structures give the calling code very little control over the memory allocation patterns. Random small allocations on the system heap are bad because:
a) They're bad for cache performance.
b) They fragment the heap.

EA's reimplementation of the STL fixes at least some of this (though not all of it is publicly available):
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
Implementing classic datastructures by hand is where I started out, so it certainly isn't beyond my ability to build something custom, but I'm trying to understand if there is any real reason to not use STL. As mentioned, my tests indicate that the implementations that MSVC STL offers are at least as performant as my own (and most of the time better), and not having to maintain something myself is a big win imho.
Well, yes, std::map will beat out an unbalanced binary tree, because balancing is a critical optimization for trees. (A good question to ask yourself here is: why use a tree for key-value lookup? There are lots of ways to build maps that are both simpler and have fewer performance cliffs.)
I will freely admit that I'm much more of a game designer / programmer than a purely technical programmer type, so that is my primary concern, but I do still write lots of my own code / use very few third party things. So I'm basically just checking to see if I'm missing something that would be a big productivity and / or performance gain.
This talk by Jonathan Blow is worth watching:
https://www.youtube.com/watch?v=JjDsP5n2kSM

90+% of the time, you want an array or a resizable array (std::vector). When that's *not* the case, you will usually have enough information on hand to build a better solution for the problem you *actually have* than the STL authors did.