uucidl
in this case the general tool here is type checking. When is it useful? What kinds of mistakes is it able to prevent? Are there other alternatives to deal with the same issue?
A long time ago, I wrote a program that was, basically, diff. Don't ask why; the point is, I did.
If you're familiar with diff (or, indeed, with many programs that edit text files), one interesting thing about it is that there are two interesting kinds of "index": line numbers, and positions between lines. The reason is that line edits actually happen in the space between lines, not at lines. If you insert a line in a text file, the insertion takes place between lines.
The typical numbering scheme (used by RCS) is that the first line in the file is numbered 1, the second line is numbered 2, and so on. But for positions, 0 is the position before line 1, 1 is the position between line 1 and line 2, and so on.
As you can imagine, it's extremely easy to get these two mixed up, and the scheme also introduces a bunch of off-by-one adjustments which are easy to miss or misinterpret.
Now this program wasn't in C++, but I basically did Casey's trick: wrapping the value in the equivalent of a struct. Something like this:
| // I didn't write code laid out like this.
struct line { explicit line(int value) : v(value) { } int v; };
struct pos { explicit pos(int value) : v(value) { } int v; };
line before(pos p) { return line(p.v); }
line after(pos p) { return line(p.v + 1); }
pos before(line l) { return pos(l.v - 1); }
pos after(line l) { return pos(l.v); }
|
That's not very much code, but it saved me hours of debugging.
Back when I worked in visual effects, I found that the same thing was true of points, vectors, and normals. Keeping the three concepts distinct at the type level meant that the compiler caught a lot of usage bugs.
All too often, the type system of the programming language is designed for the benefit of the compiler; you declare something as an integer so the compiler knows what register to store it in. I think this has it backwards. The type system should be designed primarily for the benefit of the programmer. The closest I've seen is Hindley-Milner type systems, which really do seem to be designed with the programmer in mind. Unfortunately, H-M languages tend not to let you "feel the bits" that you're working with like a lower-level language does. I like to think that there's a sweet spot still to be discovered.
As a final comment, I'd like to rant for a moment about Hungarian notation.
Fixing this kind of type error was the original thinking behind Hungarian notation. Charles Simonyi used to work on Excel at Microsoft, and he noticed that one common class of type error was programmers doing things like mixing an integer which was logically a "row" in a spreadsheet with one that was logically a "column". His idea was to prepend the variable name with the semantic type. But the way that Microsoft (and Windows programmers in general) seem to use it is to prepend the variable name with the physical type.
The compiler already knows that "LPCTSTR lpszPathName" is a pointer to a C string, and if you try to misuse it as one (e.g. by passing it to something that expects a pointer to some other type), the compiler will give you a warning or an error. What the compiler doesn't know is that the value should be handled as a file path (e.g. that it has a maximum length, that it has a structure with an optional drive letter and path components separated by backslashes on Windows, etc.) and shouldn't be passed to a function that wants a user name.
OK, so that's an artificial example; any sober programmer is unlikely to pass a variable called "FilePath" to a function called "CheckUserName()". But similarly, "FilePath" is unlikely to be anything other than a string, so the "lpsz" prefix requires extra typing and uses valuable screen real-estate for no gain.
Maybe this made some sort of sense in the 16-bit era where there was good reason to visually distinguish near and far pointers. It's the 21st century now.
(As an aside, I also note that IDEs or text editors which support auto-completion make the situation worse, since they almost always complete the tail of an identifier from its start, never the other way around. Even in the land of auto-complete, you need to type the type of what you want before you can type the name of what you want. How crazy is that?)
So if you're determined to use Hungarian notation (which should still be used with a very light touch, if at all), doesn't it make more sense to prefix with the semantic type rather than the physical type? So if you decide, say, that "fp" means "file path", you could use variable names like "fpSave" and "fpBackup" rather than "lpszSavePath" and "lpszBackupPath".
End rant.