Casey Muratori against the Standard Library

cella

#26154

March 27, 2022

I apologize in advance for my lack of knowledge. I recently started programming, and while casually browsing Casey Muratori's Twitter I saw a post in which he expressed his dislike for the standard library. The post: https://twitter.com/cmuratori/status/1402555611900911619

So in his opinion programmers shouldn't use iostream, queue, etc. Can you explain, in words that a beginner can understand, why he is so against it? Also, how would you print out text or use arrays and integers without a library?

Mārtiņš Možeiko

#26155

March 27, 2022

The thing about relying on libraries is that you are adding an extra dependency to your code. If it works and reduces the amount of work you need to do, great. But often the opposite happens: many times you don't know exactly how things are implemented in the library, and all you can find out is whatever the poor documentation tells you. Sometimes the library has a bug in some strange edge condition that you cannot fix. Sometimes it is not flexible enough and does not offer exactly the functionality you need. And another important point: sometimes the implementation changes with a compiler version upgrade, and then your assumptions and code may stop working.

Historically, the "standard" library in C/C++ has not been of the best quality. It offers a poor abstraction over native OS functionality, and the more modern OSes become, the more ancient the standard library looks, because it offers no way to interact with the OS in a performant way.

In the early C++ days the STL was very different across compilers and OSes. Relying on it meant that your code sometimes worked a bit differently, either in performance or in memory usage. It can become very hard to write reliable code this way.

Due to all these issues, many people do not use the standard library in C/C++ much. For simple code it obviously does not matter; there you can do whatever you want. But if you want to write reliable code that works on different OSes, has reasonably similar performance and memory characteristics, and doesn't randomly break because you upgraded your compiler or C runtime library, then you write your own code to do the things you want.

To print out text without the standard library (or do anything else, like allocate memory, write files, or send things over the network) you use the OS's native APIs. Every OS provides an API that user applications call to perform various things: print to the console, allocate memory, start a thread, open a network connection, and so on. Those are not "standard" libraries in the sense that they come with C/C++; they are different on each platform. Nowadays, though, many things on popular OSes are very similar, with only a few minor differences, so abstracting them in your code does not take much time. It is usually a very minor part of your platform code.
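As a rough illustration of what such a platform layer can look like, here is a minimal sketch that prints text through the OS API directly. The function name `platform_write_stdout` is made up for this example; `WriteFile`/`GetStdHandle` (Win32) and `write` (POSIX) are the real OS calls. Error handling is deliberately thin, and the only headers used besides the OS ones are there for `size_t` and a debug assertion.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_WIN32)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical platform-layer function (not from any library): writes `length`
// bytes to standard output using the OS API. Returns true if all bytes were written.
bool platform_write_stdout(const char* text, std::size_t length) {
    assert(text != nullptr || length == 0);
#if defined(_WIN32)
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == INVALID_HANDLE_VALUE || out == NULL) return false;
    DWORD written = 0;
    if (!WriteFile(out, text, static_cast<DWORD>(length), &written, NULL)) return false;
    return written == length;
#else
    std::size_t done = 0;
    while (done < length) {
        // write() may write fewer bytes than requested, so loop until finished.
        ssize_t n = write(STDOUT_FILENO, text + done, length - done);
        if (n <= 0) return false;  // this sketch treats any error as fatal
        done += static_cast<std::size_t>(n);
    }
    return true;
#endif
}
```

The rest of the program would call only `platform_write_stdout("hello\n", 6)` and never touch the `#if` branches, which is the sense in which the per-OS part stays small.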

cella

#26156

March 27, 2022

So if I want to support many Operating Systems, I have to write an implementation for each of them. Got it.

Replying to mmozeiko (#26155)

Miles

#26157

March 27, 2022

There are good reasons to avoid certain portions of the standard library. Most of the C++ standard library is simply low-quality and requires you to use features that you might not want to use, such as exceptions and RAII. Some of the C standard library, mainly file I/O, has platform-specific nuances and limitations you should be aware of (e.g. MAX_PATH and unicode limitations on Windows).

As for why Casey refuses to use any of the standard library, I'm sure he has his personal reasons. He's a pretty reactive person in general, who tends to take all-or-nothing stances on things, and primarily ships code on Windows, which has pretty much always had the worst standard library support of any major operating system. He has his reasons, but it doesn't mean you have to do as he does.

cella

#26158

March 27, 2022

I've always felt stupid when reading about the standard library. I thought the implementations were created by professionals with years of experience, so in my eyes their way of doing things was the only one. I'm happy that there's someone like Casey who plants the seed of doubt in our heads. There are people who blindly follow celebrities because they want to feel part of something bigger, but that's not my way of doing things, which is why I asked the original question in the first place. Why is Casey Muratori so against the standard library? Because it does a bad job at implementing functionality for a specific OS... Thank you guys for clarifying the subject for me.

Replying to notnullnotvoid (#26157)

Dawoodoz

#26159

March 27, 2022

std::string does not have the same character size and behavior on each platform, which will be messy if someone makes bitwise operations on characters or suddenly needs to support Chinese characters in a parsing algorithm. Working on an 8-bit string format encoding UTF-8 is almost guaranteed to introduce bugs from cutting characters in half and would add lots of bloat to all algorithms. Better to have a custom string type that always stores the non-encoded 32-bit value representations of Unicode, just like in the .NET framework. One element is one character, and you just save as UTF-8 by default when writing to files, for compression. There are 32-bit versions of std::string, but these have the same design flaws as std::vector, leading to poor performance.

std::iterator (deprecated in C++17) is very unfriendly to beginners, hard to debug, prevents SIMD vectorization, and provides no benefit over indices. Iterators were supposed to abstract away the difference between arrays and linked lists, but linked lists are only used when spending a long time in each element or only working on one element at a time. For high level code, better to use indices so that you can see how the algorithm executes while debugging. For high performance code, better to stick with arrays and bounds-checked pointer abstractions.

std::vector has a poor allocation strategy (lots of tiny initial allocations smaller than a cache line), does not align memory properly, forces the use of iterators for certain operations (leading to dangerous code for no reason), and uses unsigned indices (you cannot loop backwards while x >= 0u, because the condition is always true once the unsigned index wraps around). For high level code, you can write a wrapper around std::vector, but never use std::vector in high-performance iterations. The worst part about std::vector is how people use it as a substitute for std::array, just because std::array cannot take a variable length at construction, which leads to crashes from dangling pointers when the std::vector reallocates while old pointers to its elements are still in use.
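Two of the pitfalls above have common workarounds that can be sketched briefly. This is only an illustration with made-up function names (`sum_backwards`, `repeat_element`), not a claim about how anyone in this thread writes their code: a backwards loop that tests before decrementing, and holding an index instead of a pointer across operations that may reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the elements from last to first with an unsigned index.
// `i-- > 0` compares first and decrements afterwards, so the body never
// sees a wrapped-around index and the loop ends after handling element 0.
int sum_backwards(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = values.size(); i-- > 0;) {
        total += values[i];
    }
    return total;
}

// Appends `count` copies of the element at `index`.
// The element is looked up by index on every iteration; a pointer taken
// before the first push_back could dangle once the vector reallocates.
void repeat_element(std::vector<int>& values, std::size_t index, std::size_t count) {
    assert(index < values.size());
    for (std::size_t n = 0; n < count; ++n) {
        const int copy = values[index];
        values.push_back(copy);
    }
}
```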

The Chrono API is okay to use, because it's just a clean hardware abstraction allowing you to target operating systems that have not been created yet. Just need someone in the future to recompile the code and publish a remake. Better than having contemporary hacks that are guaranteed to fail as soon as the current OS fad dies out 50 years from now.

std math is harder to decide on. Writing std::max(a, b) does not feel like using a core language feature. Making the std namespace implicit would be even worse due to constant changes, and making inline wrappers over trigonometry in a new namespace would still expose std math in the header. The C math library does not support overloaded functions between float and double, which makes the expressions look strange. It's just a terrible mess of legacy and not knowing what to use.

It would be nice if lambdas did not force use of the std library, because it totally blurs the line between core language and optional features for anyone wanting to use modern C++ without std. Don't want to write a class just to re-implement the Lambda type. Should be an intrinsic language type just like function pointers, but that would break backwards compatibility with older projects.

std::move and std::swap are also core functions that somehow ended up in the std namespace. Implementing them yourself would be too dangerous to even try, due to all the heap hacks used in compilers to make everything work.

Edited by Dawoodoz on March 27, 2022, 5:40pm

Mārtiņš Možeiko

#26160

March 27, 2022

std::string does not have the same character size and behaviour on each platform, which will be messy if someone makes bitwise operations on characters or suddenly needs to support Chinese characters in a parsing algorithm.

Umm what? std::string stores the char type on every platform. And unless you're working on a 40-year-old computer (where std::string probably does not exist anyway), char will be exactly 1 byte (8 bits) large. Non-8-bit chars are pretty much obsolete and not present anymore. So std::string will have the same size and behavior on any platform you use nowadays.

Better to have a custom string type that always stores the non-encoded 32-bit value representations of Unicode, just like in the .NET framework.

This is not true. .NET stores strings using its char type, which holds a 16-bit UTF-16 code unit. And UTF-16 is arguably the worst of all the UTF encodings.

std::iterator (deprecated in C++17) is very unfriendly to beginners, hard to debug, prevents SIMD vectorization

Not really true. In many cases it will autovectorize just fine: https://godbolt.org/z/j6c87K6d8

Compilers are not as dumb as they were. If the iterator is random access and sequential, it will optimize the same way as a regular array. It is legal for you to specialize on it too, by casting to a raw pointer and doing whatever optimizations you want on such an iterator. With modern C++ constexpr that becomes really easy to write too (before constexpr you needed to use template specialization, which was more annoying to write).
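A minimal sketch of that kind of specialization, assuming C++17 for `if constexpr`. The name `sum_range` and the four-accumulator layout are illustrative choices only; the same branch is where hand-written intrinsics could go instead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Sums a range of ints, choosing the loop shape from the iterator category.
template <typename It>
int sum_range(It first, It last) {
    using Category = typename std::iterator_traits<It>::iterator_category;
    if constexpr (std::is_base_of<std::random_access_iterator_tag, Category>::value) {
        // Random access: index like an array and keep four independent
        // accumulators, a shape that is easy to map onto SIMD lanes.
        const auto count = last - first;
        assert(count >= 0);
        int lanes[4] = {0, 0, 0, 0};
        std::ptrdiff_t i = 0;
        for (; i + 4 <= count; i += 4) {
            for (int k = 0; k < 4; ++k) lanes[k] += first[i + k];
        }
        int total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < count; ++i) total += first[i];  // leftover elements
        return total;
    } else {
        // Anything else (for example a linked list): plain element-by-element walk.
        int total = 0;
        for (; first != last; ++first) total += *first;
        return total;
    }
}

// Usage: the same template accepts an array-backed and a node-based container.
int demo_total() {
    const std::vector<int> packed = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    const std::list<int> linked = {1, 2, 3};
    return sum_range(packed.begin(), packed.end()) + sum_range(linked.begin(), linked.end());
}
```

Only the branch that matches the iterator type is compiled for a given instantiation, so the `last - first` expression never has to be valid for list iterators.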

Edited by Mārtiņš Možeiko on March 27, 2022, 5:47pm

Replying to Dawoodoz (#26159)

Dawoodoz

#26162

March 27, 2022

Unless you are reading the manual for a specific compiler, char is a signed or unsigned integer of at least 8 bits, unlike char8_t used by std::u8string, which is explicitly 8 bits large.
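The portable guarantees under discussion can be checked at compile time. The sketch below only states what the C++ standard guarantees on every implementation (the size of `char`, the lower bound on `CHAR_BIT`, the element type of `std::string`); the names `kCharIsSigned` and `top_bit_set` are made up for the example.

```cpp
#include <climits>
#include <limits>
#include <string>
#include <type_traits>

// Guaranteed by the language on every platform.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(std::is_same<std::string::value_type, char>::value,
              "std::string always stores char");

// Implementation-defined: plain char may be signed or unsigned.
constexpr bool kCharIsSigned = std::numeric_limits<char>::is_signed;

// Bitwise work on characters: convert to unsigned char first so the result
// does not depend on whether plain char is signed on this platform.
inline bool top_bit_set(char c) {
    return (static_cast<unsigned char>(c) >> (CHAR_BIT - 1)) != 0;
}
```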

I agree that UTF-16 was a mistake, but my point was that .NET uses one element per character, which is clearly better for internal processing. Using UTF-8 internally is like trying to edit an image stored internally using JPG compression and repeating the encoding and decoding with Fourier transforms in each algorithm.

Getting a 5% performance increase from auto vectorization is not comparable to manual vectorisation with a 20 to 200 times speedup on aligned raw memory. Knowing which instruction is actually used unlocks a whole new set of bitwise algebra, using unzip to get both integer division and modulo for many elements in the same cycle, et cetera.

Replying to mmozeiko (#26160)

Mārtiņš Možeiko

#26163

March 27, 2022

I agree that UTF-16 was a mistake, but my point was that .NET uses one element per character, which is clearly better for internal processing.

This is not true. UTF-16 in .NET does NOT store one element per character. There are plenty of Unicode "chars" that take TWO .NET char elements to store. That's the point of UTF-16: to encode Unicode codepoints in one or two 16-bit values. If you're indexing a .NET string with operator [] (or any other method with explicit indices) and treating each element as a "whole" character, then your code has bugs. It won't process Unicode properly, and you risk cutting the encoding of some char in half, which means your text will be corrupted.
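The one-or-two-values rule is short enough to write out. This sketch follows the standard UTF-16 encoding scheme; the function names `encode_utf16` and `is_surrogate` are chosen for the example, and C++ `char16_t` stands in here for the 16-bit .NET char.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
// Code points below U+10000 fit in one unit; the rest need a surrogate pair.
std::u16string encode_utf16(char32_t code_point) {
    assert(code_point <= 0x10FFFF);                         // Unicode range
    assert(code_point < 0xD800 || code_point > 0xDFFF);     // not a surrogate itself
    std::u16string units;
    if (code_point < 0x10000) {
        units.push_back(static_cast<char16_t>(code_point));
    } else {
        const char32_t offset = code_point - 0x10000;       // 20 bits remain
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));    // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));  // low surrogate
    }
    return units;
}

// True if this unit is half of a surrogate pair and must not be handled alone.
bool is_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDFFF;
}
```

For example, U+4E2D (a CJK ideograph) encodes to a single unit, while U+1F600 (an emoji) encodes to the pair 0xD83D 0xDE00, so indexing element 0 of that string yields half a character.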

Getting a 5% performance increase from auto vectorization is not comparable to manual vectorisation with a 20 to 200 times speedup on aligned raw memory. Knowing which instruction is actually used unlocks a whole new set of bitwise algebra, using unzip to get both integer division and modulo for many elements in the same cycle, et cetera.

My point was not to promote autovectorization. You're looking at the wrong thing. My point was to show that you can use iterator properties to access elements in a SIMD way just fine, even with manual intrinsic code.

Edited by Mārtiņš Možeiko on March 27, 2022, 10:27pm

Replying to Dawoodoz (#26162)

Miles

#26166

March 28, 2022

Even if you're treating each 32-bit codepoint as a single character, it's still partly wrong (but will work for slightly more languages, namely those using CJK ideographs, as long as you use separate fonts for each language), because Unicode is a dumpster fire that uses multiple codepoints, in some cases several, to encode a single character, even though it has plenty of codepoint space to not do that.
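A small example of the multiple-codepoint case: "é" can be stored as the single code point U+00E9 or as "e" (U+0065) followed by the combining acute accent U+0301. The sketch below is a deliberately incomplete approximation that only knows the Combining Diacritical Marks block (U+0300 to U+036F); real character segmentation is defined by Unicode's grapheme cluster rules and covers far more. Both function names are made up for the illustration.

```cpp
#include <cstddef>
#include <string>

// Assumption for this sketch: only the Combining Diacritical Marks block
// (U+0300..U+036F) is recognised. Real text needs full grapheme segmentation.
bool is_basic_combining_mark(char32_t code_point) {
    return code_point >= 0x0300 && code_point <= 0x036F;
}

// Counts code points that start a visible character, under the assumption above.
std::size_t count_characters_approx(const std::u32string& text) {
    std::size_t count = 0;
    for (char32_t code_point : text) {
        if (!is_basic_combining_mark(code_point)) ++count;
    }
    return count;
}
```

With the decomposed form, a 32-bit string has two elements but only one visible character, so "one element is one character" does not hold even without any encoding.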

Mārtiņš Možeiko

#26173

March 28, 2022

Right, text shaping is another issue. I'm just saying that if you split apart two UTF-16 chars that belong together, you will produce an invalid text encoding. You won't be able to use such text at all, never mind lay it out or render it. It is basically the same thing as with UTF-8 if you access a random str[n] byte element individually.
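For the UTF-8 side of that comparison, here is a minimal sketch of cutting a byte string without splitting an encoded code point. It assumes the input is already valid UTF-8 and relies on the fact that continuation bytes always have the bit pattern 10xxxxxx; the function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// True for bytes of the form 10xxxxxx, which continue a multi-byte sequence.
bool is_utf8_continuation(char byte) {
    return (static_cast<unsigned char>(byte) & 0xC0) == 0x80;
}

// Returns the largest cut position not greater than `max_bytes` that does not
// fall inside a multi-byte sequence. Assumes `text` holds valid UTF-8.
std::size_t utf8_safe_cut(const std::string& text, std::size_t max_bytes) {
    if (max_bytes >= text.size()) return text.size();
    std::size_t cut = max_bytes;
    // If the byte at the cut continues a sequence, step back to its lead byte.
    while (cut > 0 && is_utf8_continuation(text[cut])) --cut;
    return cut;
}
```

Given "a" followed by "é" (bytes 0x61 0xC3 0xA9), a naive `substr(0, 2)` would keep the lead byte 0xC3 without its continuation, while `substr(0, utf8_safe_cut(text, 2))` keeps just "a".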

Replying to notnullnotvoid (#26166)