Dll PDB doesn't "autoload" when hotloading the dll

Hi,

I was watching day 27 and doing the drawRectangle function and at some point I though it would be great to add an assert to check if minX < maxX (before realising the for loop will check for that) and made a call to drawRectangle that would trigger the assert.

I recompiled with the game still running and Visual studio display the "break window" but didn't load the PDB file and I could not debug. I have to click the "Browse and find handmade_xxxx.pdb..." link for it to load (I don't have to browse, it load as soon as I click the link).

[attachment=9]pdb_problem.jpg[/attachment]

I though it might be that the assert triggers directly and the pdb file isn't ready right after the dll creation, but I set the drawRectangle call on a button press and the same thing happened. If I restart the application (no dll hotloading) the pdb is correctly loaded right away. Does that happen only for me ? Am I missing some setting ? I use Visual Studio 2012 pro.

Another unrelated small question : my drawRectangle code looks like this

1
2
3
4
5
6
7
uint32* pixel = ( uint32* ) buffer->memory;

for ( int y = roundedMinY; y < roundedMaxY; y++ ) {
	for ( int x = roundedMinX; x < roundedMaxX; x++ ) {
		*( pixel + y * buffer->width + x ) = color;
	}
}

Is there a downside to computing the pixel offset from x and y every loop cycle instead of incrementing a pixel pointer ?

Thanks !

Edited by Simon Anciaux on Reason: Added question mark icon
I'm not sure why Visual Studio is having trouble... I suppose it could be a difference between 2013 (which is what we use on the stream) and 2012, but it could also be something more subtle. Hard to say, since Visual Studio is such an... interesting piece of software.

As for your DrawRectangle question, the answer is that it depends on whether or not the compiler is smart enough to promote the multiplication and addition outside of the loop or not. Most compilers these days, with optimization turns on, probably are, but you'd have to check to be sure.

Also, note that the code as you wrote it will break if we change the pitch to be different from the width, which we may well do, so I would not recommend writing the code to operate with y-times-width like this. It should always be y-times-pitch in bytes, just in case the pitch and the width are different due to alignment or padding.

- Casey
The problem with the pdb was what I thought, I just didn't tested it right. At the moment the dll is loaded, the pdb file is not yet ready. A simple fix is to "Sleep()" for a while before loading the new dll (I tried several durations and 155ms seems the minimum needed on my machine).
1
2
3
4
5
6
7
FILETIME dllWriteTime = win32GetLastWriteTime( sourceGameCodeDllFullPath );

if ( CompareFileTime( &dllWriteTime, &game.dllLastWriteTime ) != 0 ) {
	Sleep( 155 );
	win32UnloadGameCode( &game );
	game = win32LoadGameCode( sourceGameCodeDllFullPath, tempGameCodeDllFullPath );
}

Maybe I could check if the pdb file exists before loading the dll. Is there a way to know the pdb file name associated with the dll without loading the dll (and just to know: is there a way to know the pdb file name when the dll has been loaded)?
VERY INTERESTING!

There is a fairly easy thing we can do to fix this. I'll do it after tomorrow's Q&A.

Thanks,
- Casey
Is there a downside to computing the pixel offset from x and y every loop cycle instead of incrementing a pixel pointer ?

Casey is right, that on a modern compiler it probably won't matter. This is an optimisation known as "strength reduction", and all compilers do it because programmers often access arrays in loops.

Do bear in mind that if it works, it's only because the compiler can work out that buffer->width won't change inside the loop. It can almost certainly work that out in this case.

However, the compiler is smart enough to realise that someone else might know about "buffer". So if you alter the loop in various simple ways (e.g. if you introduce a function call), the compiler may not be able to make that assumption any more.

You can guard against this by moving the width into a local variable:

1
2
3
4
5
6
7
8
uint32* pixel = ( uint32* ) buffer->memory;
uint32 width = buffer->width;

for ( int y = roundedMinY; y < roundedMaxY; y++ ) {
	for ( int x = roundedMinX; x < roundedMaxX; x++ ) {
		*( pixel + y * width + x ) = color;
	}
}


The compiler knows that nobody else can see the local variable, so it's guaranteed not to be modified in the loop. In general, you can future-proof performance critical loops if you can arrange things so that they only do work on local variables.

Having said all that, I encourage you to find out for yourself what happens on your compiler. You could inspect the generated code of both versions, or you could measure the time taken by both versions using rdtsc.
Thanks for the answer. Since I had some time, I checked the assembly and did some test with rdtsc.

"My code" assembly was a little shorter than Casey's but took more time to execute: 68k cycles against 62k cycles average.

Those were with the compiler optimisations off. I then tried with -O2 instead of -Od.

The rdtsc calls were optimised out. "Casey's code" assembly is a lot shorter (1/3 of my code). It's weird how the optimised assembly doesn't look at all like the original code: lots of things are duplicated, there are more loops and branches (I mean the C++ lines visual studio shows in the assembly as I didn't "translated" all the assembly).

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
internal void drawRectangle( GameOffscreenBuffer* buffer,
								real32 minX, real32 minY, real32 maxX, real32 maxY,
								real32 r, real32 g, real32 b ) {
000007FEF2061000 48 89 5C 24 08       mov         qword ptr [buffer],rbx  
000007FEF2061005 48 89 7C 24 10       mov         qword ptr [minX],rdi  

	int32 roundedMinX = roundReal32ToInt32( minX );
000007FEF206100A F3 0F 10 25 46 A3 05 00 movss       xmm4,dword ptr [__real@3f000000 (07FEF20BB358h)]  
	int32 roundedMaxY = roundReal32ToInt32( maxY );
000007FEF2061012 F3 0F 10 44 24 28    movss       xmm0,dword ptr [maxY]  

	if ( roundedMinX < 0 ) {
000007FEF2061018 33 C0                xor         eax,eax  
000007FEF206101A 48 8B D9             mov         rbx,rcx  
	int32 roundedMaxY = roundReal32ToInt32( maxY );
000007FEF206101D F3 0F 58 C4          addss       xmm0,xmm4  

	int32 roundedMinX = roundReal32ToInt32( minX );
000007FEF2061021 F3 0F 58 CC          addss       xmm1,xmm4  
	int32 roundedMinY = roundReal32ToInt32( minY );
000007FEF2061025 F3 0F 58 D4          addss       xmm2,xmm4  
	int32 roundedMaxY = roundReal32ToInt32( maxY );
000007FEF2061029 F3 0F 2C C8          cvttss2si   ecx,xmm0  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF206102D F3 0F 10 44 24 30    movss       xmm0,dword ptr [r]  

	int32 roundedMinX = roundReal32ToInt32( minX );
000007FEF2061033 F3 44 0F 2C C9       cvttss2si   r9d,xmm1  

	if ( roundedMinX < 0 ) {
000007FEF2061038 45 85 C9             test        r9d,r9d  
000007FEF206103B 44 0F 48 C8          cmovs       r9d,eax  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF206103F F3 0F 10 0D 29 A3 05 00 movss       xmm1,dword ptr [__real@437f0000 (07FEF20BB370h)]  
	int32 roundedMinY = roundReal32ToInt32( minY );
000007FEF2061047 F3 0F 2C FA          cvttss2si   edi,xmm2  
	int32 roundedMaxX = roundReal32ToInt32( maxX );
000007FEF206104B F3 0F 58 DC          addss       xmm3,xmm4  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF206104F F3 0F 59 C1          mulss       xmm0,xmm1  
000007FEF2061053 85 FF                test        edi,edi  
000007FEF2061055 0F 48 F8             cmovs       edi,eax  
000007FEF2061058 F3 0F 58 C4          addss       xmm0,xmm4  
	int32 roundedMaxX = roundReal32ToInt32( maxX );
000007FEF206105C F3 44 0F 2C D3       cvttss2si   r10d,xmm3  
		roundedMinX = 0;
	}
	
	if ( roundedMinY < 0 ) {
		roundedMinY = 0;
	}
	
	if ( roundedMaxX > buffer->width ) {
000007FEF2061061 44 3B 53 08          cmp         r10d,dword ptr [rbx+8]  
		roundedMinX = 0;
	}
	
	if ( roundedMinY < 0 ) {
		roundedMinY = 0;
	}
	
	if ( roundedMaxX > buffer->width ) {
000007FEF2061065 44 0F 4F 53 08       cmovg       r10d,dword ptr [rbx+8]  
		roundedMaxX = buffer->width;
	}
	
	if ( roundedMaxY > buffer->height ) {
000007FEF206106A 3B 4B 0C             cmp         ecx,dword ptr [rbx+0Ch]  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF206106D F3 4C 0F 2C D8       cvttss2si   r11,xmm0  
		roundedMaxX = buffer->width;
	}
	
	if ( roundedMaxY > buffer->height ) {
000007FEF2061072 0F 4F 4B 0C          cmovg       ecx,dword ptr [rbx+0Ch]  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF2061076 41 C1 E3 08          shl         r11d,8  
000007FEF206107A F3 0F 10 44 24 38    movss       xmm0,dword ptr [g]  
000007FEF2061080 F3 0F 59 C1          mulss       xmm0,xmm1  
000007FEF2061084 F3 0F 58 C4          addss       xmm0,xmm4  
000007FEF2061088 F3 48 0F 2C C0       cvttss2si   rax,xmm0  
000007FEF206108D F3 0F 10 44 24 40    movss       xmm0,dword ptr [b]  
000007FEF2061093 F3 0F 59 C1          mulss       xmm0,xmm1  
000007FEF2061097 44 0B D8             or          r11d,eax  
000007FEF206109A F3 0F 58 C4          addss       xmm0,xmm4  
000007FEF206109E 41 C1 E3 08          shl         r11d,8  
000007FEF20610A2 F3 48 0F 2C C0       cvttss2si   rax,xmm0  
000007FEF20610A7 44 0B D8             or          r11d,eax  

	uint64 start = __rdtsc( );
		
    uint8* leftSideStart = ( uint8* ) buffer->memory + roundedMinY * buffer->pitch + roundedMinX * buffer->bytesPerPixel;
000007FEF20610AA 8B 43 14             mov         eax,dword ptr [rbx+14h]  
000007FEF20610AD 41 0F AF C1          imul        eax,r9d  
000007FEF20610B1 48 63 D0             movsxd      rdx,eax  
000007FEF20610B4 8B 43 10             mov         eax,dword ptr [rbx+10h]  
000007FEF20610B7 0F AF C7             imul        eax,edi  
000007FEF20610BA 48 98                cdqe  
000007FEF20610BC 48 03 D0             add         rdx,rax  
000007FEF20610BF 48 03 13             add         rdx,qword ptr [rbx]  
		
	for ( int y = roundedMinY; y < roundedMaxY; y++ ) {
000007FEF20610C2 3B F9                cmp         edi,ecx  
000007FEF20610C4 7D 29                jge         drawRectangle+0EFh (07FEF20610EFh)  
		roundedMaxY = buffer->height;
	}
	
	uint32 color =
		( roundReal32ToUInt32( r * 255.0f ) << 16 )
		| ( roundReal32ToUInt32( g * 255.0f ) << 8 )
		| ( roundReal32ToUInt32( b * 255.0f ) << 0 );
000007FEF20610C6 2B CF                sub         ecx,edi  
000007FEF20610C8 44 8B C1             mov         r8d,ecx  
000007FEF20610CB 0F 1F 44 00 00       nop         dword ptr [rax+rax]  

		for ( int x = roundedMinX; x < roundedMaxX; x++ ) {
000007FEF20610D0 45 3B CA             cmp         r9d,r10d  

		for ( int x = roundedMinX; x < roundedMaxX; x++ ) {
000007FEF20610D3 7D 0E                jge         drawRectangle+0E3h (07FEF20610E3h)  
	
        uint32* pixel = ( uint32* ) leftSideStart;
000007FEF20610D5 41 8B CA             mov         ecx,r10d  
000007FEF20610D8 41 8B C3             mov         eax,r11d  
000007FEF20610DB 48 8B FA             mov         rdi,rdx  
000007FEF20610DE 41 2B C9             sub         ecx,r9d  
000007FEF20610E1 F3 AB                rep stos    dword ptr [rdi]  
			
            ( *pixel ) = color;
            pixel++;
		}

        leftSideStart += buffer->pitch;
000007FEF20610E3 48 63 43 10          movsxd      rax,dword ptr [rbx+10h]  
000007FEF20610E7 48 03 D0             add         rdx,rax  
000007FEF20610EA 49 FF C8             dec         r8  
000007FEF20610ED 75 E1                jne         drawRectangle+0D0h (07FEF20610D0h)  
	}
	
	uint64 end = __rdtsc( );
	
	uint64 diff = end - start;
}
000007FEF20610EF 48 8B 5C 24 08       mov         rbx,qword ptr [buffer]  
000007FEF20610F4 48 8B 7C 24 10       mov         rdi,qword ptr [minX]  
000007FEF20610F9 C3                   ret  
Don't benchmark code without optimizations. Those numbers can be completely different than code with optimizations.

Compiler is free to optimize away any local variable that is never used (read) So it can easy remove your diff variable. And after that end and start. There is cheap trick to prevent compiler to do that - add volatile keyword for diff variable:
1
volatile uint64_t diff = end - start;

This will make compiler think that writing to diff variable has side-effects, and it can not remove it.
I had never used the volatile keyword.
Using only "volatile uint64 diff..." I couldn't inspect the variable in visual studio ("No code generated") so I used it in snprintf and OutputDebugString. My code was 34k cycles average and Casey's code was 33k cycles average (~3% difference).

I'll stop there as there is no point in measuring performances at this point, just did it for the "fun". Thanks for the advices !
When compilers optimize code often they don't track very accurately where local variables are stored on stack. To workaround that you can do one of two following things:

1) make variable global. In that case you probably want to separate initialization from definition as global vars are initialized only first time.
1
2
static volatile uint64_t diff;
diff = end - start;


2) Use new feature introduced in VS2013 Update 4 where compiler can generate better debugging database (pdb file) for optimized code. It will larger, but it will track local variable locations much better. To do that add "/Zo" compiler option (you should replace /Zi or /Z7 with /Zo, or at least put /Zo after /Zi or /Z7). For more information about this read here: http://randomascii.wordpress.com/...ed-codenew-in-visual-studio-2012/ This feature is available also in VS2012 as undocumented "/d2Zi+" compiler option.

Edited by Mārtiņš Možeiko on
I may have found another solution for the PDB loading here. To restate our problem: The PDB is created after the DLL. If the game detects a change in the DLL and reloads it, the PDB is still not completely generated, so VS fails to load it. The obvious fix would be to not look at the modification time of the DLL, but of the PDB. Problem is, the PDB has a random name that we can't predict. This random name was necessary because when we load a new DLL, VS locks the accompanying PDB file and will not release it, even after we unloaded the DLL. This prevents us from using the same trick for the PDB that we used for the DLL, namely to copy the files generated by the compiler to a temporary location and load it from there.

You see, it would be so much simpler if VS would simply allow us to overwrite the PDB file when it's no longer needed. And we can convince it to do just that. The trick is to enable "Native Edit and Continue" in Visual Studio, found in Tools -> Options -> Debugging -> Edit and Continue. You can disable all other checkboxes in that dialog box if you like.

This causes Visual Studio to unlock the PDB file when the DLL is unloaded. This by itself obviously still doesn't allow us to overwrite the PDB while the DLL is still loaded, so we have to also copy the PDB to a temporary location, just like the DLL. But of course, the PDB's path is hardcoded into the DLL... unless we add the linker option /PDBALTPATH:filename when compiling the DLL. The PDB will still be generated in the same place, but on loading, the debugger will look for the PDBALTPATH value first.

So, to reiterate:

  1. Enable native edit and continue in Visual Studio (Tools -> Options -> Debugging -> Edit and continue)
  2. Add /PDBALTPATH:handmade_temp.pdb to the linker flags of the DLL
  3. When reloading the DLL, also copy handmade.pdb to handmade_temp.pdb after unloading the old DLL, but before loading the new one
  4. Watch the modification date of handmade.pdb instead of handmade.dll. This assumes that the PDB is _always_ generated after the DLL. If you want it bulletproof, maybe wait for both modification dates to change.

I still have a couple of problem with that solution. First, the PDB modification time seems to change multiple times during compilation, causing multiple attempts to reload it. Those attempts fail at copying the DLL and / or PDB files to their temporary location.

The second problem is that it relies on the "Native Edit and Continue" option being set. It's easy to forget for new developers coming aboard and I don't see a good way of detecting this to notify the user that this is why hot-reloading is not working.

Edited by Benjamin Kloster on
Thanks.
I'll stay with the "sleep" solution for now. I hope that Casey will find some time to explain his solution soon.
I'll explain it here in the meantime :)

Just check for the existence of a lock file. In the DLL reload code, check for the existence of "pdb.lock" and if it is there, DON'T load the DLL. Then in the build.bat, before running the DLL compile, do something like "echo FOO > pdb.lock" and then in between the DLL compile and the EXE compile, do "del pdb.lock".

- Casey
It seems to work. Thanks

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
char pdbLockFullPath[ WIN32_STATE_FILE_NAME_COUNT ];
win32BuildExePathFileName( &state, "pdb.lock", pdbLockFullPath );
...

FILETIME dllWriteTime = win32GetLastWriteTime( sourceGameCodeDllFullPath );

if ( CompareFileTime( &dllWriteTime, &game.dllLastWriteTime ) != 0 ) {

	DWORD attributes = GetFileAttributes( pdbLockFullPath );
						
	if ( attributes == INVALID_FILE_ATTRIBUTES ) {
		win32UnloadGameCode( &game );
		game = win32LoadGameCode( sourceGameCodeDllFullPath, tempGameCodeDllFullPath );
	}
}


1
2
3
4
5
6
...
echo lock > pdb.lock
cl %commonCompilerFlags% ..\code\handmade.cpp -LD -link -INCREMENTAL:NO -PDB:handmade_%random%.pdb -EXPORT:gameGetSoundSamples -EXPORT:gameUpdateAndRender
del pdb.lock
cl %commonCompilerFlags% ..\code\win32_handmade.cpp /link %commonLinkerFlags%
...
This method seems to work when compiling from Visual Studio as well.

What I did was:
1
2
3
4
5
6
7
Visual Studio DLL Project Properties -> Build Events ->

Pre-Link Event
Command Line: echo lock > $(OutputPath)pdb.lock

Post-Build Event
Command Line: del $(OutputPath)pdb.lock


I'm not sure about Pre-Link vs Pre-Build Events but I would assume Pre-Link is enough because the PDB & DLL creation shouldn't happen before linking.
Thanks for the tip Casey, and for the example, MrMixer. The game would hang for a couple seconds when I rebuilt, and this fixed it.