Simon Anciaux
1185 posts
Edited by Simon Anciaux on Reason: Added question mark icon
Hi,

I was watching day 27 and doing the drawRectangle function and at some point I though it would be great to add an assert to check if minX < maxX (before realising the for loop will check for that) and made a call to drawRectangle that would trigger the assert.

I recompiled with the game still running and Visual studio display the "break window" but didn't load the PDB file and I could not debug. I have to click the "Browse and find handmade_xxxx.pdb..." link for it to load (I don't have to browse, it load as soon as I click the link).

[attachment=9]pdb_problem.jpg[/attachment]

I though it might be that the assert triggers directly and the pdb file isn't ready right after the dll creation, but I set the drawRectangle call on a button press and the same thing happened. If I restart the application (no dll hotloading) the pdb is correctly loaded right away. Does that happen only for me ? Am I missing some setting ? I use Visual Studio 2012 pro.

Another unrelated small question : my drawRectangle code looks like this

 1 2 3 4 5 6 7 uint32* pixel = ( uint32* ) buffer->memory; for ( int y = roundedMinY; y < roundedMaxY; y++ ) { for ( int x = roundedMinX; x < roundedMaxX; x++ ) { *( pixel + y * buffer->width + x ) = color; } } 

Is there a downside to computing the pixel offset from x and y every loop cycle instead of incrementing a pixel pointer ?

Thanks !
Casey Muratori
801 posts / 1 project
Casey Muratori is a programmer at Molly Rocket on the game 1935 and is the host of the educational programming series Handmade Hero.
I'm not sure why Visual Studio is having trouble... I suppose it could be a difference between 2013 (which is what we use on the stream) and 2012, but it could also be something more subtle. Hard to say, since Visual Studio is such an... interesting piece of software.

As for your DrawRectangle question, the answer is that it depends on whether or not the compiler is smart enough to promote the multiplication and addition outside of the loop or not. Most compilers these days, with optimization turns on, probably are, but you'd have to check to be sure.

Also, note that the code as you wrote it will break if we change the pitch to be different from the width, which we may well do, so I would not recommend writing the code to operate with y-times-width like this. It should always be y-times-pitch in bytes, just in case the pitch and the width are different due to alignment or padding.

- Casey
Simon Anciaux
1185 posts
The problem with the pdb was what I thought, I just didn't tested it right. At the moment the dll is loaded, the pdb file is not yet ready. A simple fix is to "Sleep()" for a while before loading the new dll (I tried several durations and 155ms seems the minimum needed on my machine).
 1 2 3 4 5 6 7 FILETIME dllWriteTime = win32GetLastWriteTime( sourceGameCodeDllFullPath ); if ( CompareFileTime( &dllWriteTime, &game.dllLastWriteTime ) != 0 ) { Sleep( 155 ); win32UnloadGameCode( &game ); game = win32LoadGameCode( sourceGameCodeDllFullPath, tempGameCodeDllFullPath ); } 

Maybe I could check if the pdb file exists before loading the dll. Is there a way to know the pdb file name associated with the dll without loading the dll (and just to know: is there a way to know the pdb file name when the dll has been loaded)?
Casey Muratori
801 posts / 1 project
Casey Muratori is a programmer at Molly Rocket on the game 1935 and is the host of the educational programming series Handmade Hero.
VERY INTERESTING!

There is a fairly easy thing we can do to fix this. I'll do it after tomorrow's Q&A.

Thanks,
- Casey
Andrew Bromage
183 posts / 1 project
Research engineer, resident maths nerd (Erdős number 3).
Is there a downside to computing the pixel offset from x and y every loop cycle instead of incrementing a pixel pointer ?

Casey is right, that on a modern compiler it probably won't matter. This is an optimisation known as "strength reduction", and all compilers do it because programmers often access arrays in loops.

Do bear in mind that if it works, it's only because the compiler can work out that buffer->width won't change inside the loop. It can almost certainly work that out in this case.

However, the compiler is smart enough to realise that someone else might know about "buffer". So if you alter the loop in various simple ways (e.g. if you introduce a function call), the compiler may not be able to make that assumption any more.

You can guard against this by moving the width into a local variable:

 1 2 3 4 5 6 7 8 uint32* pixel = ( uint32* ) buffer->memory; uint32 width = buffer->width; for ( int y = roundedMinY; y < roundedMaxY; y++ ) { for ( int x = roundedMinX; x < roundedMaxX; x++ ) { *( pixel + y * width + x ) = color; } } 

The compiler knows that nobody else can see the local variable, so it's guaranteed not to be modified in the loop. In general, you can future-proof performance critical loops if you can arrange things so that they only do work on local variables.

Having said all that, I encourage you to find out for yourself what happens on your compiler. You could inspect the generated code of both versions, or you could measure the time taken by both versions using rdtsc.
Simon Anciaux
1185 posts
Thanks for the answer. Since I had some time, I checked the assembly and did some test with rdtsc.

"My code" assembly was a little shorter than Casey's but took more time to execute: 68k cycles against 62k cycles average.

Those were with the compiler optimisations off. I then tried with -O2 instead of -Od.

The rdtsc calls were optimised out. "Casey's code" assembly is a lot shorter (1/3 of my code). It's weird how the optimised assembly doesn't look at all like the original code: lots of things are duplicated, there are more loops and branches (I mean the C++ lines visual studio shows in the assembly as I didn't "translated" all the assembly).

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 internal void drawRectangle( GameOffscreenBuffer* buffer, real32 minX, real32 minY, real32 maxX, real32 maxY, real32 r, real32 g, real32 b ) { 000007FEF2061000 48 89 5C 24 08 mov qword ptr [buffer],rbx 000007FEF2061005 48 89 7C 24 10 mov qword ptr [minX],rdi int32 roundedMinX = roundReal32ToInt32( minX ); 000007FEF206100A F3 0F 10 25 46 A3 05 00 movss xmm4,dword ptr [__real@3f000000 (07FEF20BB358h)] int32 roundedMaxY = roundReal32ToInt32( maxY ); 000007FEF2061012 F3 0F 10 44 24 28 movss xmm0,dword ptr [maxY] if ( roundedMinX < 0 ) { 000007FEF2061018 33 C0 xor eax,eax 000007FEF206101A 48 8B D9 mov rbx,rcx int32 roundedMaxY = roundReal32ToInt32( maxY ); 000007FEF206101D F3 0F 58 C4 addss xmm0,xmm4 int32 roundedMinX = roundReal32ToInt32( minX ); 000007FEF2061021 F3 0F 58 CC addss xmm1,xmm4 int32 roundedMinY = roundReal32ToInt32( minY ); 000007FEF2061025 F3 0F 58 D4 addss xmm2,xmm4 int32 roundedMaxY = roundReal32ToInt32( maxY ); 000007FEF2061029 F3 0F 2C C8 cvttss2si ecx,xmm0 roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF206102D F3 0F 10 44 24 30 movss xmm0,dword ptr [r] int32 roundedMinX = roundReal32ToInt32( minX ); 000007FEF2061033 F3 44 0F 2C C9 cvttss2si r9d,xmm1 if ( roundedMinX < 0 ) { 000007FEF2061038 45 85 C9 test r9d,r9d 000007FEF206103B 44 0F 48 C8 cmovs r9d,eax roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF206103F F3 0F 10 0D 29 A3 05 00 movss xmm1,dword ptr [__real@437f0000 (07FEF20BB370h)] int32 roundedMinY = roundReal32ToInt32( minY ); 000007FEF2061047 F3 0F 2C FA cvttss2si edi,xmm2 int32 roundedMaxX = roundReal32ToInt32( maxX ); 000007FEF206104B F3 0F 58 DC addss xmm3,xmm4 roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF206104F F3 0F 59 C1 mulss xmm0,xmm1 000007FEF2061053 85 FF test edi,edi 000007FEF2061055 0F 48 F8 cmovs edi,eax 000007FEF2061058 F3 0F 58 C4 addss xmm0,xmm4 int32 roundedMaxX = roundReal32ToInt32( maxX ); 000007FEF206105C F3 44 0F 2C D3 cvttss2si r10d,xmm3 roundedMinX = 0; } if ( roundedMinY < 0 ) { roundedMinY = 0; } if ( roundedMaxX > buffer->width ) { 000007FEF2061061 44 3B 53 08 cmp r10d,dword ptr [rbx+8] roundedMinX = 0; } if ( roundedMinY < 0 ) { roundedMinY = 0; } if ( roundedMaxX > buffer->width ) { 000007FEF2061065 44 0F 4F 53 08 cmovg r10d,dword ptr [rbx+8] roundedMaxX = buffer->width; } if ( roundedMaxY > buffer->height ) { 000007FEF206106A 3B 4B 0C cmp ecx,dword ptr [rbx+0Ch] roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF206106D F3 4C 0F 2C D8 cvttss2si r11,xmm0 roundedMaxX = buffer->width; } if ( roundedMaxY > buffer->height ) { 000007FEF2061072 0F 4F 4B 0C cmovg ecx,dword ptr [rbx+0Ch] roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF2061076 41 C1 E3 08 shl r11d,8 000007FEF206107A F3 0F 10 44 24 38 movss xmm0,dword ptr [g] 000007FEF2061080 F3 0F 59 C1 mulss xmm0,xmm1 000007FEF2061084 F3 0F 58 C4 addss xmm0,xmm4 000007FEF2061088 F3 48 0F 2C C0 cvttss2si rax,xmm0 000007FEF206108D F3 0F 10 44 24 40 movss xmm0,dword ptr [b] 000007FEF2061093 F3 0F 59 C1 mulss xmm0,xmm1 000007FEF2061097 44 0B D8 or r11d,eax 000007FEF206109A F3 0F 58 C4 addss xmm0,xmm4 000007FEF206109E 41 C1 E3 08 shl r11d,8 000007FEF20610A2 F3 48 0F 2C C0 cvttss2si rax,xmm0 000007FEF20610A7 44 0B D8 or r11d,eax uint64 start = __rdtsc( ); uint8* leftSideStart = ( uint8* ) buffer->memory + roundedMinY * buffer->pitch + roundedMinX * buffer->bytesPerPixel; 000007FEF20610AA 8B 43 14 mov eax,dword ptr [rbx+14h] 000007FEF20610AD 41 0F AF C1 imul eax,r9d 000007FEF20610B1 48 63 D0 movsxd rdx,eax 000007FEF20610B4 8B 43 10 mov eax,dword ptr [rbx+10h] 000007FEF20610B7 0F AF C7 imul eax,edi 000007FEF20610BA 48 98 cdqe 000007FEF20610BC 48 03 D0 add rdx,rax 000007FEF20610BF 48 03 13 add rdx,qword ptr [rbx] for ( int y = roundedMinY; y < roundedMaxY; y++ ) { 000007FEF20610C2 3B F9 cmp edi,ecx 000007FEF20610C4 7D 29 jge drawRectangle+0EFh (07FEF20610EFh) roundedMaxY = buffer->height; } uint32 color = ( roundReal32ToUInt32( r * 255.0f ) << 16 ) | ( roundReal32ToUInt32( g * 255.0f ) << 8 ) | ( roundReal32ToUInt32( b * 255.0f ) << 0 ); 000007FEF20610C6 2B CF sub ecx,edi 000007FEF20610C8 44 8B C1 mov r8d,ecx 000007FEF20610CB 0F 1F 44 00 00 nop dword ptr [rax+rax] for ( int x = roundedMinX; x < roundedMaxX; x++ ) { 000007FEF20610D0 45 3B CA cmp r9d,r10d for ( int x = roundedMinX; x < roundedMaxX; x++ ) { 000007FEF20610D3 7D 0E jge drawRectangle+0E3h (07FEF20610E3h) uint32* pixel = ( uint32* ) leftSideStart; 000007FEF20610D5 41 8B CA mov ecx,r10d 000007FEF20610D8 41 8B C3 mov eax,r11d 000007FEF20610DB 48 8B FA mov rdi,rdx 000007FEF20610DE 41 2B C9 sub ecx,r9d 000007FEF20610E1 F3 AB rep stos dword ptr [rdi] ( *pixel ) = color; pixel++; } leftSideStart += buffer->pitch; 000007FEF20610E3 48 63 43 10 movsxd rax,dword ptr [rbx+10h] 000007FEF20610E7 48 03 D0 add rdx,rax 000007FEF20610EA 49 FF C8 dec r8 000007FEF20610ED 75 E1 jne drawRectangle+0D0h (07FEF20610D0h) } uint64 end = __rdtsc( ); uint64 diff = end - start; } 000007FEF20610EF 48 8B 5C 24 08 mov rbx,qword ptr [buffer] 000007FEF20610F4 48 8B 7C 24 10 mov rdi,qword ptr [minX] 000007FEF20610F9 C3 ret 
Mārtiņš Možeiko
2357 posts / 2 projects
Don't benchmark code without optimizations. Those numbers can be completely different than code with optimizations.

Compiler is free to optimize away any local variable that is never used (read) So it can easy remove your diff variable. And after that end and start. There is cheap trick to prevent compiler to do that - add volatile keyword for diff variable:
 1 volatile uint64_t diff = end - start; 

This will make compiler think that writing to diff variable has side-effects, and it can not remove it.
Simon Anciaux
1185 posts
I had never used the volatile keyword.
Using only "volatile uint64 diff..." I couldn't inspect the variable in visual studio ("No code generated") so I used it in snprintf and OutputDebugString. My code was 34k cycles average and Casey's code was 33k cycles average (~3% difference).

I'll stop there as there is no point in measuring performances at this point, just did it for the "fun". Thanks for the advices !
Mārtiņš Možeiko
2357 posts / 2 projects
Edited by Mārtiņš Možeiko on
When compilers optimize code often they don't track very accurately where local variables are stored on stack. To workaround that you can do one of two following things:

1) make variable global. In that case you probably want to separate initialization from definition as global vars are initialized only first time.
 1 2 static volatile uint64_t diff; diff = end - start; 

2) Use new feature introduced in VS2013 Update 4 where compiler can generate better debugging database (pdb file) for optimized code. It will larger, but it will track local variable locations much better. To do that add "/Zo" compiler option (you should replace /Zi or /Z7 with /Zo, or at least put /Zo after /Zi or /Z7). For more information about this read here: http://randomascii.wordpress.com/...ed-codenew-in-visual-studio-2012/ This feature is available also in VS2012 as undocumented "/d2Zi+" compiler option.
Benjamin Kloster
48 posts
Edited by Benjamin Kloster on
I may have found another solution for the PDB loading here. To restate our problem: The PDB is created after the DLL. If the game detects a change in the DLL and reloads it, the PDB is still not completely generated, so VS fails to load it. The obvious fix would be to not look at the modification time of the DLL, but of the PDB. Problem is, the PDB has a random name that we can't predict. This random name was necessary because when we load a new DLL, VS locks the accompanying PDB file and will not release it, even after we unloaded the DLL. This prevents us from using the same trick for the PDB that we used for the DLL, namely to copy the files generated by the compiler to a temporary location and load it from there.

You see, it would be so much simpler if VS would simply allow us to overwrite the PDB file when it's no longer needed. And we can convince it to do just that. The trick is to enable "Native Edit and Continue" in Visual Studio, found in Tools -> Options -> Debugging -> Edit and Continue. You can disable all other checkboxes in that dialog box if you like.

This causes Visual Studio to unlock the PDB file when the DLL is unloaded. This by itself obviously still doesn't allow us to overwrite the PDB while the DLL is still loaded, so we have to also copy the PDB to a temporary location, just like the DLL. But of course, the PDB's path is hardcoded into the DLL... unless we add the linker option /PDBALTPATH:filename when compiling the DLL. The PDB will still be generated in the same place, but on loading, the debugger will look for the PDBALTPATH value first.

So, to reiterate:

1. Enable native edit and continue in Visual Studio (Tools -> Options -> Debugging -> Edit and continue)
4. Watch the modification date of handmade.pdb instead of handmade.dll. This assumes that the PDB is _always_ generated after the DLL. If you want it bulletproof, maybe wait for both modification dates to change.

I still have a couple of problem with that solution. First, the PDB modification time seems to change multiple times during compilation, causing multiple attempts to reload it. Those attempts fail at copying the DLL and / or PDB files to their temporary location.

The second problem is that it relies on the "Native Edit and Continue" option being set. It's easy to forget for new developers coming aboard and I don't see a good way of detecting this to notify the user that this is why hot-reloading is not working.
Simon Anciaux
1185 posts
Thanks.
I'll stay with the "sleep" solution for now. I hope that Casey will find some time to explain his solution soon.
Casey Muratori
801 posts / 1 project
Casey Muratori is a programmer at Molly Rocket on the game 1935 and is the host of the educational programming series Handmade Hero.
I'll explain it here in the meantime :)

Just check for the existence of a lock file. In the DLL reload code, check for the existence of "pdb.lock" and if it is there, DON'T load the DLL. Then in the build.bat, before running the DLL compile, do something like "echo FOO > pdb.lock" and then in between the DLL compile and the EXE compile, do "del pdb.lock".

- Casey
Simon Anciaux
1185 posts
It seems to work. Thanks

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 char pdbLockFullPath[ WIN32_STATE_FILE_NAME_COUNT ]; win32BuildExePathFileName( &state, "pdb.lock", pdbLockFullPath ); ... FILETIME dllWriteTime = win32GetLastWriteTime( sourceGameCodeDllFullPath ); if ( CompareFileTime( &dllWriteTime, &game.dllLastWriteTime ) != 0 ) { DWORD attributes = GetFileAttributes( pdbLockFullPath ); if ( attributes == INVALID_FILE_ATTRIBUTES ) { win32UnloadGameCode( &game ); game = win32LoadGameCode( sourceGameCodeDllFullPath, tempGameCodeDllFullPath ); } } 

 1 2 3 4 5 6 ... echo lock > pdb.lock cl %commonCompilerFlags% ..\code\handmade.cpp -LD -link -INCREMENTAL:NO -PDB:handmade_%random%.pdb -EXPORT:gameGetSoundSamples -EXPORT:gameUpdateAndRender del pdb.lock cl %commonCompilerFlags% ..\code\win32_handmade.cpp /link %commonLinkerFlags% ... 
Ville Penttinen
8 posts
 1 2 3 4 5 6 7 Visual Studio DLL Project Properties -> Build Events -> Pre-Link Event Command Line: echo lock > $(OutputPath)pdb.lock Post-Build Event Command Line: del$(OutputPath)pdb.lock