I believe I have found the bug in the debug collater.
RecordDebugEvent should be a #define. Right now it is a function, and as such it has only one value for the TranslationUnit for every event it records, so TRANSLATION_UNIT_INDEX does not do anything. (BTW, be careful of spaces following a \ at the end of a macro line, as I think that was the reason for the compiler errors that made Casey go to the function version in the first place.)
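To illustrate the difference, here is a minimal single-file sketch, not the actual code. The debug_event struct, the Events array and the two Record... helpers are made up for the example, and redefining TRANSLATION_UNIT_INDEX halfway through the file is only a stand-in for compiling a second translation unit with a different value.

```c
#include <stdint.h>

/* Hypothetical, simplified event record; the real one has more fields. */
typedef struct debug_event {
    uint32_t TranslationUnit;
    uint32_t Type;
} debug_event;

static debug_event Events[16];
static uint32_t EventCount = 0;

/* Pretend this is the value the first translation unit was built with. */
#define TRANSLATION_UNIT_INDEX 0

/* Function version: TRANSLATION_UNIT_INDEX is substituted once, here,
   where the body is compiled, not where the function is called. */
static void RecordDebugEventFunction(uint32_t Type)
{
    debug_event *Event = &Events[EventCount++];
    Event->TranslationUnit = TRANSLATION_UNIT_INDEX;
    Event->Type = Type;
}

/* Macro version: the body is pasted into every use site, so
   TRANSLATION_UNIT_INDEX is looked up there. Nothing may follow the
   trailing backslashes, not even a space. */
#define RecordDebugEventMacro(EventType) \
    do { \
        debug_event *Event = &Events[EventCount++]; \
        Event->TranslationUnit = TRANSLATION_UNIT_INDEX; \
        Event->Type = (EventType); \
    } while (0)

/* Stand-in for "we are now in a different translation unit". */
#undef TRANSLATION_UNIT_INDEX
#define TRANSLATION_UNIT_INDEX 1

/* Records one event through the function and reports which unit it got. */
static uint32_t RecordFromUnitOneWithFunction(void)
{
    RecordDebugEventFunction(7);
    return Events[EventCount - 1].TranslationUnit;
}

/* Records one event through the macro and reports which unit it got. */
static uint32_t RecordFromUnitOneWithMacro(void)
{
    RecordDebugEventMacro(7);
    return Events[EventCount - 1].TranslationUnit;
}
```

In this sketch the function version still stamps the event with unit 0 even though it is called from "unit 1", while the macro version picks up the value that is current at the call site.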
Also, when fixing this you will need more records per frame.
BTW, the way I found it was by looking at the debug records and noticing that, for one thread, the TranslationUnit for start and stop was different. That was a bit lucky, because without inlining this would never have happened, but it led me to look in the right place.