SPOILER: found the bug for day 186

Hi,

I believe I have found the bug in the debug collater.

RecordDebugEvent should be a define, because now it is a function and as such has only one value for the TranslationUnit for every event it records. TRANSLATION_UNIT_INDEX does not do anything. (BTW be careful of spaces following \ as I think that was the reason for the compiler errors that made casey go to the function version in the first place)

Also when fixing this you will need more records per frame.

Happy coding,
Mox


BTW, the way I found it was by looking at the debug records and noticing that for 1 thread the TranslationUnit for start and stop was different. Which is a bit lucky, because without inlining this would never have happened. But this led me to look into the right place.

Edited by Mox on Reason: debug method + macro \ warning
Can confirm your statement, just had no time to post it yet.

Bug only exists in handmade_optimized.cpp::DrawRectangleQuickly at the moment because there are the only RecordDebugEvent Calls in that Translation Unit.

Mildly interesting (or why it is happening):
First of all it wouldn't be a problem if all calls would have the same TranslationUnitIndex.
And here comes Optimization into play.
At DrawRectangleQuickly only the constructor is inlined but not the RecordDebugEvent call. (So it calls the RecordDebugEvent function at the beginning)

Furthermore when you look at the assembly you can see that the destructor is also inlined and this time the whole RecordDebugEvent is also inlined.
So the destructor does the right thing and sets the TranslationUnitIndex correctly.

When you look at the created object file for handmade_optimized you will see that the RecordDebugEvent is also correct. (the TranslationUnitIndex is set)

But when you link the Object File something weird happens and suddenly the RecordDebugEvent call in DrawRectangleQuickly jumps to the RecordDebugEvent function for the TranslationUnitIndex 0.

This creates the problem that the TranslationUnitIndex doesn't match anymore (remember the destructor RecordDebugEvent call was inlined and so holds the correct Index)


Currently working fix for me:
1
2
3
4
5
RecordDebugEvent(int RecordIndex, debug_event_type EventType, u8 TranslationUnit = TRANSLATION_UNIT_INDEX)
{
  ...
  Event->TranslationUnit = TranslationUnit;
}


Best regards,
Andreas

Edited by Andreas on
Yeah, when I looked further into the problem I indeed found the same reason. I guess inlining can be a bit dangerous here. Without a way to always inline and thus setting the translation unit correctly there could also be a potential problem with the lookup in the debugrecord for the name as well. I guess you want to let both begin and end be in the same unit, so either both should be inlined, or neither.


Another fix:
Another fix for the problem is to declare RecordDebugEvent static/internal. This makes sure that there is a copy in each translation unit. What a difference a single word can make :ohmy:

Edited by Mox on
Yes, the inline and msvc together cased the problem.
That totally make sense.

If the compiler inlined the function all the time it would not be a problem it will be like a macro.
And if it didn't inlined it at all it also won't be a problem I think.

It is funny how it all come together to cause this nasty bug.
msvc decision, inline function, which has different definition in different units
I hope that msvc has a good reason to inline the RecordDebugEvent call in the constructor and not in the destructor because otherwise, I think from now on we should do this

#define inline __forceinline
I don't know if it is actually that bad. The constructor and destructor should always be in the same translation unit themselves, right? So if you make sure that the functions they call also are (by for example my second method) you should be fine.

You, the coder, should know that you are dealing with translation units here. We actually have special code here to handle those, so making sure it is filled correctly is not that odd. The inlining will only be problematic if you don't take the extra step to make sure the functions dealing with those translation units actually use the right values.

The inlining actually brought some extra attention to our own mistake. As without it the DebugRecords with the function name could still be in the incorrect place. The DebugRecords would be in the optimized unit while the DebugEvent would be in the main game unit. This would have given us problems down the line.

Though I am curious why there are no warnings about multiple definitions of RecordDebugEvent in the translation units when they are linked together.
The functions that are marked as inline are different from the regular ones
the body of the inline function need to be defined in the header so anyone will call it can see it and the compiler can inline it.
In the case where the compiler decide not to inline the function, my guess is it will treat the function as static.
In both cases the linker won't even see it.

The inlining made the mistake shows up immediately but also made the debugging way more difficult to the degree where Casey start questioning the multi-threaded part.
On the other hand it would be way more easier if the compiler didn't do what it did the question will be: "Why are the other translation units arrays empty?? well the index is wrong!!" simple Right?
Actually the translation unit index is what let you catch the bug.

My opinion about inline keyword is that it should be the other way around.
It should be: "inline" instead of __forceinline and "inline_hint" instead of the current inline.
Johnathan Blow said once on his stream: "The language should be designed to allow good programmers to do what they want to do not to prevent bad programmers from making stupid mistakes".
The person who typed inline in front of his function he meant it to be inlined and if for some reason Mr. compiler thinks it should not be then he should throw a warning, because I don't think people throw inline in front of every function and wait for the compiler to make the right decision for them.

I don't know what you mean by:
The constructor and destructor should always be in the same translation unit themselves, right? So if you make sure that the functions they call also are (by for example my second method) you should be fine.

Edited by Ameen Sayegh on
I see now that inline does make the compiler treat the function special. Not making it inline does produce a linker error for multiple definitions. I guess my C time is so long ago that I don't know the details of inlining any more. Again a good point for languages that don't need separate declaration and definition :|

aameen951
The functions that are marked as inline are different from the regular ones
the body of the inline function need to be defined in the header so anyone will call it can see it and the compiler can inline it.
In the case where the compiler decide not to inline the function, my guess is it will treat the function as static.
In both cases the linker won't even see it.


But if it really made it static the problem should not exist. Because then the translation units would never be mixed. Here the problem occurs because not inlining the function only produced one version of the code. If the linker really didn't see the function you would get one version for every translation unit, each with the correct constant for the TranslationUnit variable. So in my opinion it is not the inlining that is the problem, but the placement of the function if it is not inlined.


My comment about the constructor and destructor is about this same thing. Where does the code end up with respect to the translation units. If inline would not change the location of the code with respect to the translation unit then this still only works if the calling code is also in the same translation unit. So the constructor and destructor need to be in the same unit. But I can't think of a reason why the compiler would split them up in different units. But then again, compilers are strange beasts...


Gcc manual might be a good indicator for how it handles inline and static: https://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/Inline.html
When an inline function is not static, then the compiler must assume that there may be calls from other source files; since a global symbol can be defined only once in any program, the function must not be defined in the other source files, so the calls therein cannot be integrated. Therefore, a non-static inline function is always compiled on its own in the usual fashion.
I think msvc does something similar. It generates one global symbol and therefore messes up our expectation of one version of RecordDebugEvent per translation unit.

Edited by Mox on Reason: gcc docs
Looks like g++ is doing the same. (Wrote a simple app to reproduce)
No warning is generated

According to the C++ Specification Draft:
If a function with external linkage is declared inline in one translation unit, it shall be declared inline in all translation units in which it appears; no diagnostic is required.
An inline function with external linkage shall have the same address in all
translation units.

Furthermore there is 3.2 ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf ):

There can be more than one definition of a class type (clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (clause 14), non-static function template (14.5.5), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.4) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements.
Given such an entity named D defined in more than one
translation unit, then
— each definition of D shall consist of the same sequence of tokens;
This is violated in the Handmade Hero Source
...

If the definitions of D do not satisfy these requirements, then the behavior is undefined.
So I guess it's kinda right, but it feels so wrong ...

Edited by Andreas on
Mox
But if it really made it static the problem should not exist. Because then the translation units would never be mixed. Here the problem occurs because not inlining the function only produced one version of the code. If the linker really didn't see the function you would get one version for every translation unit, each with the correct constant for the TranslationUnit variable. So in my opinion it is not the inlining that is the problem, but the placement of the function if it is not inlined.


You are right I missed that point. if it was static there would be no problem.
I feel like making it static is the sane thing to do since the body must be defined in every translation unit but they don't.
They just assume that the definition is the same like @AK is pointing out.



If the definitions of D do not satisfy these requirements, then the behavior is undefined.
I don't know what they mean by "undefined"?
Does it mean that the compiler doesn't need to check for this? I don't know.

Edited by Ameen Sayegh on
aameen951

I don't know what they mean by "undefined"?
Does it mean that the compiler doesn't need to check for this? I don't know.


undefined behavior

behavior for which this International Standard imposes no requirements
[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.
— end note ]
Sounds like the compiler can do whatever it wants

Edited by Andreas on