extern u32 GlobalDebugCounter;

#define GET_COUNTER__(Counter, FileNameInit, LineNumberInit, BlockNameInit, ...) \
    static u32 Init_##Counter = 0; \
    static u32 Counter = 0; \
    if (Init_##Counter != 2) { \
        if (AtomicCompareExchangeUInt32(&Init_##Counter, 1, 0) == 0) { \
            Counter = AtomicAddU32(&GlobalDebugCounter, 1); \
            \
            CompletePreviousWritesBeforeFutureWrites; \
            Init_##Counter = 2; \
            Assert(Counter < MAX_DEBUG_RECORD_COUNT); \
            debug_record *Record = GlobalDebugTable->Records[TRANSLATION_UNIT_INDEX] + Counter; \
            Record->FileName = FileNameInit; \
            Record->LineNumber = LineNumberInit; \
            Record->BlockName = BlockNameInit; \
        } else { \
            volatile u32* test = &Init_##Counter; \
            while (*test != 2); \
        } \
    }
#define GET_COUNTER_(Name, FileNameInit, LineNumberInit, BlockNameInit, ...) GET_COUNTER__(Name, FileNameInit, LineNumberInit, BlockNameInit)
#define GET_COUNTER(Name, FileNameInit, LineNumberInit, BlockNameInit, ...) GET_COUNTER_(Name, FileNameInit, LineNumberInit, BlockNameInit)

#define FRAME_MARKER() \
    { \
        GET_COUNTER(Counter, __FILE__, __LINE__, "Frame Marker"); \
        RecordDebugEvent(Counter, DebugEvent_FrameMarker); \
    }

#define TIMED_BLOCK__(BlockName, Number, ...) \
    GET_COUNTER(Counter_##Number, __FILE__, __LINE__, BlockName, ## __VA_ARGS__); \
    timed_block TimedBlock_##Number(Counter_##Number)
#define TIMED_BLOCK_(BlockName, Number, ...) TIMED_BLOCK__(BlockName, Number, ## __VA_ARGS__)
#define TIMED_BLOCK(BlockName, ...) TIMED_BLOCK_(#BlockName, __COUNTER__, ## __VA_ARGS__)
#define TIMED_FUNCTION(...) TIMED_BLOCK_(__FUNCTION__, __COUNTER__, ## __VA_ARGS__)

#define BEGIN_BLOCK_(Counter) RecordDebugEvent(Counter, DebugEvent_BeginBlock);
#define END_BLOCK_(Counter) RecordDebugEvent(Counter, DebugEvent_EndBlock);

#define BEGIN_BLOCK(Name) \
    GET_COUNTER(Counter_##Name, __FILE__, __LINE__, #Name); \
    BEGIN_BLOCK_(Counter_##Name);

#define END_BLOCK(Name) \
    END_BLOCK_(Counter_##Name);

struct timed_block
{
    int Counter;

    timed_block(int CounterInit)
    {
        // TODO(casey): Record the hit count value here?
        Counter = CounterInit;
        BEGIN_BLOCK_(Counter);
    }

    ~timed_block()
    {
        END_BLOCK_(Counter);
    }
};
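For comparison, here is a minimal sketch of the same three-state handshake (0 = untouched, 1 = being initialized, 2 = ready) written with C++11 std::atomic in place of AtomicCompareExchangeUInt32 and CompletePreviousWritesBeforeFutureWrites. All the Demo names are made up for illustration, and DemoComputeValue is only a stand-in for claiming a counter slot. The sketch assumes the value must be written before state 2 is published, which is why the final store uses release ordering and the readers use acquire.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

static int DemoComputeCalls = 0;          // counts how often the initializer ran
static u32 DemoValue = 0;                 // the lazily initialized value
static std::atomic<u32> DemoState(0);     // 0 = untouched, 1 = initializing, 2 = ready

// Hypothetical stand-in for "claim the next counter slot".
static u32 DemoComputeValue()
{
    ++DemoComputeCalls;
    return 7;
}

static u32 DemoGetValue()
{
    if (DemoState.load(std::memory_order_acquire) != 2)
    {
        u32 Expected = 0;
        if (DemoState.compare_exchange_strong(Expected, 1))
        {
            // We won the race: write the value first, then publish "ready".
            DemoValue = DemoComputeValue();
            DemoState.store(2, std::memory_order_release);
        }
        else
        {
            // Somebody else is initializing: wait until the value is published.
            while (DemoState.load(std::memory_order_acquire) != 2) { }
        }
    }
    assert(DemoState.load(std::memory_order_acquire) == 2);
    return DemoValue;
}
```

Calling DemoGetValue() repeatedly should run DemoComputeValue() only once; every later call takes the fast path through the first load.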
inline u32 AtomicAddU32(u32 volatile *Value, u32 Addend)
{
    // NOTE(casey): Returns the original value _prior_ to adding
    u32 Result = _InterlockedExchangeAdd((long volatile *)Value, Addend);
    return(Result);
}
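The "returns the original value" detail is what makes the counter usable as an index allocator: each caller gets the slot that was free before its own increment. A portable sketch of that idea, assuming std::atomic::fetch_add as a substitute for _InterlockedExchangeAdd (DemoCounter, DEMO_MAX_RECORDS and the other Demo names are hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

enum { DEMO_MAX_RECORDS = 1024 };         // hypothetical table capacity

// Hypothetical stand-in for GlobalDebugCounter.
static std::atomic<u32> DemoCounter(0);

// Portable equivalent of AtomicAddU32: returns the value held *before* the add.
inline u32 DemoAtomicAddU32(std::atomic<u32> *Value, u32 Addend)
{
    return Value->fetch_add(Addend);
}

// Each call hands out the next unused slot index: 0, 1, 2, ...
inline u32 DemoClaimNextIndex()
{
    u32 Index = DemoAtomicAddU32(&DemoCounter, 1);
    assert(Index < DEMO_MAX_RECORDS);
    return Index;
}
```

After any number of calls, DemoCounter holds the number of slots handed out, which is the same role GlobalDebugCounter plays when it is copied into RecordCount.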
u32 GlobalDebugCounter = 0;
GlobalDebugTable->RecordCount[TRANSLATION_UNIT_INDEX] = GlobalDebugCounter;
#define GET_COUNTER(... arguments, FileName, LineNumber) \
    { \
        static int CounterNumber = Hash(FileName, LineNumber); \
        ... /* use CounterNumber here */ \
    }
static u32 test = (u32)(__FUNCTION__ + 1);
000007FE967656D7  mov         eax,dword ptr [test+4h (07FED6C28AB0h)]
000007FE967656DD  test        al,1
000007FE967656DF  jne         DrawRectangleQuickly+119h (07FE967656F9h)
000007FE967656E1  lea         rcx,[string "DrawRectangleQuickly"+1h (07FE96791061h)]
000007FE967656E8  or          eax,1
000007FE967656EB  mov         dword ptr [test (07FED6C28AACh)],ecx
000007FE967656F1  mov         dword ptr [test+4h (07FED6C28AB0h)],eax
000007FE967656F7  jmp         DrawRectangleQuickly+11Fh (07FE967656FFh)
000007FE967656F9  mov         ecx,dword ptr [test (07FED6C28AACh)]
static u32 test2 = (u32)(__FUNCTION__ + 2);
000007FE967656FF  test        al,2
000007FE96765701  jne         DrawRectangleQuickly+13Bh (07FE9676571Bh)
000007FE96765703  or          eax,2
000007FE96765706  mov         dword ptr [test+4h (07FED6C28AB0h)],eax
000007FE9676570C  lea         rax,[string "DrawRectangleQuickly"+2h (07FE96791062h)]
000007FE96765713  mov         dword ptr [test2 (07FED6C28AB4h)],eax
000007FE96765719  jmp         DrawRectangleQuickly+141h (07FE96765721h)
000007FE9676571B  mov         eax,dword ptr [test2 (07FED6C28AB4h)]
static int OurIndex;
if(OurIndex == 0)
{
    // Do expensive initialization with hash table here
}
// Do the stuff here
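A filled-in version of that pattern might look like the sketch below. The lookup function, its FNV-1a style hash and the table size of 1024 are all assumptions made up for the example; the only point is that the expensive call runs once per call site and every later hit pays just for the compare. As written it is not thread safe, since two threads could both see OurIndex == 0.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

static int DemoLookupCalls = 0;   // counts how often the slow path ran

// Hypothetical expensive lookup: an FNV-1a style hash of file name and line.
static u32 DemoExpensiveLookup(const char *FileName, int LineNumber)
{
    ++DemoLookupCalls;
    u32 Hash = 2166136261u;
    for (const char *At = FileName; *At; ++At)
    {
        Hash = (Hash ^ (unsigned char)*At) * 16777619u;
    }
    Hash = (Hash ^ (u32)LineNumber) * 16777619u;
    return (Hash % 1024) + 1;     // +1 so that 0 can keep meaning "not initialized"
}

// One instrumented call site: the static caches the result of the lookup.
static u32 DemoTimedSite()
{
    static u32 OurIndex;          // statics start out zero-initialized
    if (OurIndex == 0)
    {
        OurIndex = DemoExpensiveLookup(__FILE__, __LINE__);
        assert(OurIndex != 0);
    }
    // "Do the stuff here" would use OurIndex; the sketch just returns it.
    return OurIndex;
}
```

Calling DemoTimedSite() in a loop should leave DemoLookupCalls at 1, which is the sense in which the cost of the lookup stops mattering once a static is involved.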
Mox
After that, my solution has 'perfect' indexing, while the hash approach will at best leave empty buckets to skip over at collation time, and at worst add chains to walk on every lookup, even while recording events.
It gets even worse: the assembly it produces is not even thread safe. Under some conditions (two statics back to back), a second thread reading the same static might see the flag saying it has been initialised before the actual value is stored, if my assembly reading skills are not deserting me. Bad compiler bug.

This is kind of tricky. The C++ standard before C++11 did not specify thread safety for static initialization, so that code is perfectly valid for a compiler targeting anything earlier than C++11.
int g();
int f()
{
    static int a = g();
    return a;
}
?f@@YAHXZ PROC                                  ; f, COMDAT
$LN7:
        sub     rsp, 40                         ; 00000028H
        mov     ecx, DWORD PTR _tls_index
        mov     rax, QWORD PTR gs:88
        mov     edx, OFFSET FLAT:_Init_thread_epoch
        mov     rax, QWORD PTR [rax+rcx*8]
        mov     ecx, DWORD PTR [rdx+rax]
        cmp     DWORD PTR ?$TSS0@?1??f@@YAHXZ@4HA, ecx
        jle     SHORT $LN4@f
        lea     rcx, OFFSET FLAT:?$TSS0@?1??f@@YAHXZ@4HA
        call    _Init_thread_header
        cmp     DWORD PTR ?$TSS0@?1??f@@YAHXZ@4HA, -1
        jne     SHORT $LN4@f
        call    ?g@@YAHXZ                       ; g
        lea     rcx, OFFSET FLAT:?$TSS0@?1??f@@YAHXZ@4HA
        mov     DWORD PTR ?a@?1??f@@YAHXZ@4HA, eax
        call    _Init_thread_footer
$LN4@f:
        mov     eax, DWORD PTR ?a@?1??f@@YAHXZ@4HA
        add     rsp, 40                         ; 00000028H
        ret     0
?f@@YAHXZ ENDP                                  ; f
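With a compiler in C++11 mode and thread-safe statics left enabled, the guarantee can be checked from user code. The sketch below is only an illustration: DemoF and DemoG mirror f and g from the snippet above, the other names are made up, and it assumes std::thread is available (with GCC on Linux that may mean linking with -pthread). Options such as MSVC's /Zc:threadSafeInit- or GCC's -fno-threadsafe-statics turn the guarantee off, in which case the count below is no longer reliable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

static std::atomic<int> DemoInitCalls(0);   // how many times the initializer ran

static int DemoG()
{
    ++DemoInitCalls;
    return 42;
}

static int DemoF()
{
    // C++11: concurrent callers block until exactly one of them has run DemoG().
    static int A = DemoG();
    return A;
}

// Calls DemoF from ThreadCount threads and returns how many of them saw 42.
static int DemoCallFromThreads(int ThreadCount)
{
    assert(ThreadCount > 0);
    std::atomic<int> SawExpected(0);
    std::vector<std::thread> Threads;
    for (int Index = 0; Index < ThreadCount; ++Index)
    {
        Threads.emplace_back([&SawExpected]() {
            if (DemoF() == 42)
            {
                ++SawExpected;
            }
        });
    }
    for (std::thread &Thread : Threads)
    {
        Thread.join();
    }
    return SawExpected.load();
}
```

Every thread should observe the fully initialized value, and DemoInitCalls should end at 1 no matter how many threads race into DemoF.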
#define TIMED_FUNCTION(...) TIMED_BLOCK_(__FUNCTION__, __COUNTER__, TRANSLATION_UNIT_INDEX, ## __VA_ARGS__)
cmuratori
Once you introduce a static, it doesn't matter how expensive the lookup is. That's the key thing.
mmozeiko
This is kind of tricky. The C++ standard before C++11 did not specify thread safety for static initialization, so that code is perfectly valid for a compiler targeting anything earlier than C++11.
skeeto
I believe making the inline function static would also solve the problem. It would prevent the linker from merging the different versions of this function in each translation unit into a single function with the wrong values.
mmozeiko
Another option is to use it only at the place where the macro gets expanded, the same way we use the __FILE__ and __LINE__ macros. IMHO this is a much better option.
Mox
Wouldn't this be valid for my "real" counter code too?
But I guess the compiler is not recognizing this as a real constant value.

GCC does recognize it :)
__Z1fv:
        movl    __ZZ1fvE4test, %eax
        ret

...

__ZZ1fvE12__FUNCTION__:
        .ascii "f\0"
__ZZ1fvE4test:
        .long   __ZZ1fvE12__FUNCTION__+1
static u16 GetDebugRecordIndex(char *FileName, int LineNumber, char *BlockName)
{
    // reduce the hash to a valid table index before probing
    u32 DebugRecordIndex = (u32)ReallyGoodHashFunction(FileName, LineNumber) % MAX_DEBUG_RECORD_COUNT;

    // Records now is simply debug_record Records[MAX_DEBUG_RECORD_COUNT]
    // no need for MAX_DEBUG_TRANSLATION_UNITS
    debug_record *Record = &GlobalDebugTable->Records[DebugRecordIndex];
    debug_record *InitialRecord = Record;

    // no need to compare FileName and LineNumber values, because for
    // each (FileName, LineNumber) pair the function will be called only once,
    // so simply find the first unused place in the table
    while (Record->FileName)
    {
        DebugRecordIndex = (DebugRecordIndex + 1) % MAX_DEBUG_RECORD_COUNT;
        Record = &GlobalDebugTable->Records[DebugRecordIndex];

        // if we examined every entry, the hash table has fewer entries than
        // the debug records we are putting in source code! Increase MAX_DEBUG_RECORD_COUNT!
        Assert(Record != InitialRecord);
    }

    Record->FileName = FileName;
    Record->LineNumber = LineNumber;
    Record->BlockName = BlockName;

    return (u16)DebugRecordIndex;
}

inline debug_event *RecordDebugEventCommon(u16 RecordIndex, debug_event_type EventType)
{
    u64 ArrayIndex_EventIndex = AtomicAddU64(&GlobalDebugTable->EventArrayIndex_EventIndex, 1);
    u32 EventIndex = ArrayIndex_EventIndex & 0xFFFFFFFF;
    Assert(EventIndex < MAX_DEBUG_EVENT_COUNT);
    debug_event *Event = GlobalDebugTable->Events[ArrayIndex_EventIndex >> 32] + EventIndex;
    Event->Clock = __rdtsc();
    Event->DebugRecordIndex = RecordIndex;
    Event->Type = (u8)EventType;
    return Event;
}

inline void RecordDebugEvent(u16 RecordIndex, debug_event_type EventType)
{
    debug_event *Event = RecordDebugEventCommon(RecordIndex, EventType);
    Event->TC.CoreIndex = 0;
    Event->TC.ThreadID = (u16)GetThreadID();
}

#define FRAME_MARKER(SecondsElapsedInit) \
    { \
        static u16 RecordIndex = GetDebugRecordIndex(__FILE__, __LINE__, "Frame Marker"); \
        debug_event *Event = RecordDebugEventCommon(RecordIndex, DebugEvent_FrameMarker); \
        Event->SecondsElapsed = SecondsElapsedInit; \
    }

#if HANDMADE_PROFILE

#define TIMED_BLOCK__(BlockName, Number, ...) \
    static u16 Record_##Number = GetDebugRecordIndex(__FILE__, __LINE__, BlockName); \
    timed_block TimedBlock_##Number(Record_##Number, ## __VA_ARGS__)
#define TIMED_BLOCK_(BlockName, Number, ...) TIMED_BLOCK__(BlockName, Number, ## __VA_ARGS__)
#define TIMED_BLOCK(BlockName, ...) TIMED_BLOCK_(#BlockName, __LINE__, ## __VA_ARGS__)
#define TIMED_FUNCTION(...) TIMED_BLOCK_(__FUNCTION__, __LINE__, ## __VA_ARGS__)

#define BEGIN_BLOCK_(RecordIndex) RecordDebugEvent(RecordIndex, DebugEvent_BeginBlock);
#define END_BLOCK_(RecordIndex) RecordDebugEvent(RecordIndex, DebugEvent_EndBlock);

#define BEGIN_BLOCK(Name) \
    static u16 Record_##Name = GetDebugRecordIndex(__FILE__, __LINE__, #Name); \
    BEGIN_BLOCK_(Record_##Name);

#define END_BLOCK(Name) \
    END_BLOCK_(Record_##Name);

struct timed_block
{
    u16 RecordIndex;

    timed_block(u16 RecordInit, u32 HitCountInit = 1)
    {
        // TODO(casey): Record the hit count value here?
        RecordIndex = RecordInit;
        BEGIN_BLOCK_(RecordIndex);
    }

    ~timed_block()
    {
        END_BLOCK_(RecordIndex);
    }
};

#endif // HANDMADE_PROFILE
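The probing loop in GetDebugRecordIndex is plain open addressing with linear probing, and it can be tried in isolation. The sketch below is a standalone toy, not the Handmade Hero code: the table has only 8 slots, DemoHash is a deliberately weak hash (sum of characters plus line number) so that collisions are easy to provoke, and a full table is reported by returning DEMO_RECORD_COUNT instead of asserting. It has no locking, so it assumes a single thread or an external lock around the call.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

enum { DEMO_RECORD_COUNT = 8 };   // tiny on purpose, to make collisions likely

struct demo_record
{
    const char *FileName;         // null means the slot is unused
    int LineNumber;
    const char *BlockName;
};

static demo_record DemoRecords[DEMO_RECORD_COUNT];

// Deliberately weak hypothetical hash: sum of characters plus the line number.
static u32 DemoHash(const char *FileName, int LineNumber)
{
    u32 Sum = (u32)LineNumber;
    for (const char *At = FileName; *At; ++At)
    {
        Sum += (unsigned char)*At;
    }
    return Sum;
}

// Claims the first free slot at or after the hashed position.
// Returns the slot index, or DEMO_RECORD_COUNT if the table is full.
static u32 DemoClaimRecord(const char *FileName, int LineNumber, const char *BlockName)
{
    assert(FileName != 0);
    u32 Start = DemoHash(FileName, LineNumber) % DEMO_RECORD_COUNT;
    for (u32 Probe = 0; Probe < DEMO_RECORD_COUNT; ++Probe)
    {
        u32 Index = (Start + Probe) % DEMO_RECORD_COUNT;
        demo_record *Record = &DemoRecords[Index];
        if (!Record->FileName)
        {
            Record->FileName = FileName;
            Record->LineNumber = LineNumber;
            Record->BlockName = BlockName;
            return Index;
        }
    }
    return DEMO_RECORD_COUNT;
}
```

With this hash, two call sites in the same file whose line numbers differ by a multiple of 8 land on the same starting slot, and the second one should end up in the next slot over. Because each call site claims its slot once and then caches the index in a static, no key comparison is needed during probing.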
// initialized with 0 by default, which means nobody is running this function
static volatile u32 Lock;

// try to get a "lock", repeat if it didn't succeed
while (AtomicCompareExchangeUInt32(&Lock, 1, 0) != 0)
{
}
// release a "lock"
Lock = 0;
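The same acquire/release pair can be written with std::atomic, which also makes the memory ordering explicit. This is a sketch under assumptions, not the thread's code: demo_spin_lock and the Demo functions are hypothetical names, and compare_exchange_strong stands in for AtomicCompareExchangeUInt32.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

struct demo_spin_lock
{
    std::atomic<u32> State;       // 0 = free, 1 = held
    demo_spin_lock() : State(0) {}
};

// One attempt: swap 0 -> 1. Returns true if this caller now holds the lock.
inline bool DemoTryAcquire(demo_spin_lock *Lock)
{
    u32 Expected = 0;
    return Lock->State.compare_exchange_strong(Expected, 1, std::memory_order_acquire);
}

// Spin until the swap succeeds.
inline void DemoAcquire(demo_spin_lock *Lock)
{
    while (!DemoTryAcquire(Lock))
    {
        // busy-wait; another thread holds the lock
    }
}

// Release: the release store makes writes done under the lock visible
// to whoever acquires it next.
inline void DemoRelease(demo_spin_lock *Lock)
{
    assert(Lock->State.load(std::memory_order_relaxed) == 1);
    Lock->State.store(0, std::memory_order_release);
}
```

Wrapping the body of a function like GetDebugRecordIndex between DemoAcquire and DemoRelease would serialize the probing, at the cost of spinning while another thread is inside.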