Optimizing the Specular to Diffuse Transform
0:03 Demo the current state and performance of our lighting 🏃
1:37 Reacquaint ourselves with the lighting's blend-over-time parameter in EndLightingComputation() 📖
3:02 Demo the fast-response lighting blend 🏃
3:11 Decrease tUpdateBlend from 10/60 to 1/60
3:14 Check out the slower-response, but noiseless lighting blend 🏃
3:36 Increase tUpdateBlend from 1/60 to 5/60
3:50 Check out the usable-response, but flickery lighting blend 🏃
4:21 Decrease tUpdateBlend from 5/60 to 2/60
4:22 Check out the slower-response, but less flickery lighting blend 🏃
4:39 Increase tUpdateBlend from 2/60 to 8/60
4:45 Check out the faster-response, but noisy lighting blend 🏃
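
The trade-off explored in the tUpdateBlend entries above (a larger value responds faster but lets more per-frame noise through) can be illustrated with a minimal sketch. It assumes the blend is a plain linear interpolation between the stored and the freshly computed light value; the guide does not show the actual code in EndLightingComputation(), so BlendLight() and FramesToConverge() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: blend a freshly computed light value into the stored one.
   tUpdateBlend = 0 keeps the stored value, tUpdateBlend = 1 replaces it outright.
   Assumes a linear interpolation, which the guide does not confirm. */
static float BlendLight(float Stored, float Fresh, float tUpdateBlend)
{
    assert(tUpdateBlend >= 0.0f && tUpdateBlend <= 1.0f);
    return (1.0f - tUpdateBlend)*Stored + tUpdateBlend*Fresh;
}

/* Hypothetical helper: count the frames needed for a stored value starting at 0
   to cover at least Fraction of the distance to a constant target of 1. */
static int FramesToConverge(float tUpdateBlend, float Fraction)
{
    float Stored = 0.0f;
    int Frames = 0;
    while((Stored < Fraction) && (Frames < 10000))
    {
        Stored = BlendLight(Stored, 1.0f, tUpdateBlend);
        ++Frames;
    }
    return Frames;
}
```

Under this assumption, FramesToConverge(10.0f/60.0f, 0.9f) comes out far smaller than FramesToConverge(1.0f/60.0f, 0.9f), which matches the faster and slower responses noted in the entries above. The same small weight that slows the response also averages out noise over more frames.
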
5:22 Notice light buildup in the dungeon 🏃
5:56 Check that light buildup in the dungeon, possibly due to the voxel switch 🏃
6:58 Determine to gauge the performance of our specular–diffuse transform 🗩
7:39 Consider shrinking the lighting lookup voxel in Z 🏃
9:13 Comment out LIGHT_LOOKUP_VOXEL_DIM, and respecify ComputeLightPropagationWork() and EndLightingComputation() to operate in X-slices
14:52 Define MAX_LIGHT_LOOKUP_VOXEL_DIM for InitLighting() to use
17:16 Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in ComputeLightPropagationWork()
20:35 Replace mentions of LIGHT_LOOKUP_VOXEL_DIM in CompileZBiasProgram()
27:34 Reintroduce LIGHT_LOOKUP_VOXEL_DIM for Win32InitOpenGL() to use
27:56 Get the same thing we saw before 🏃
28:10 Split out LIGHT_LOOKUP_VOXEL_DIM to all three dimensions for Win32InitOpenGL() to use
28:28 Check out our cubic lighting lookup voxel 🏃
28:41 Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16
28:49 Check out our squatter, faster lighting lookup voxel 🏃
29:34 Increase the LIGHT_LOOKUP_VOXEL_DIM_Z from 16 to 32
29:43 125ms per frame, with a 32×32×32 voxel 🏃
29:55 Decrease the LIGHT_LOOKUP_VOXEL_DIM_Z from 32 to 16
30:04 75ms per frame, with a 32×32×16 voxel 🏃
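
The effect of splitting the single voxel dimension into three can be sketched as follows. Only LIGHT_LOOKUP_VOXEL_DIM_Z is named in the guide; the _X and _Y macros, both helper functions, and the X-fastest memory layout are assumptions made for illustration.

```c
#include <assert.h>

#define LIGHT_LOOKUP_VOXEL_DIM_X 32 /* assumed name, by analogy with _Z */
#define LIGHT_LOOKUP_VOXEL_DIM_Y 32 /* assumed name, by analogy with _Z */
#define LIGHT_LOOKUP_VOXEL_DIM_Z 16 /* named in the guide */

/* Hypothetical helper: total number of lookup voxels for the given dimensions. */
static int LightVoxelCount(int DimX, int DimY, int DimZ)
{
    assert(DimX > 0 && DimY > 0 && DimZ > 0);
    return DimX*DimY*DimZ;
}

/* Hypothetical helper: flat index of voxel (X, Y, Z), with X varying fastest.
   The real layout used by the lighting code is not shown in the guide. */
static int LightVoxelIndex(int X, int Y, int Z, int DimX, int DimY, int DimZ)
{
    assert(X >= 0 && X < DimX);
    assert(Y >= 0 && Y < DimY);
    assert(Z >= 0 && Z < DimZ);
    return (Z*DimY + Y)*DimX + X;
}
```

Halving Z takes the voxel count from 32×32×32 = 32768 to 32×32×16 = 16384. The frame times recorded above drop from 125ms to 75ms rather than halving, so not all of the frame cost scales with the voxel count.
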
31:02 24% frame time spent in ComputeLightPropagationWork() 🏃
31:17 Disable the specular–diffuse transform in ComputeLightPropagationWork()
31:45 65ms per frame with 3% frame time spent in ComputeLightPropagationWork(), without the specular–diffuse transform 🏃
32:18 Prepare to optimise the specular–diffuse transform 📖
33:36 Re-enable the specular–diffuse transform in ComputeLightPropagationWork()
33:52 25% frame time spent in ComputeLightPropagationWork() 🏃
33:59 Disable the specular–diffuse transform in ComputeLightPropagationWork()
34:03 Hit assertion in DEBUGGetArenaByLookupBlock() 🏃
34:33 3% frame time spent in ComputeLightPropagationWork(), without the specular–diffuse transform 🏃
35:14 Enable ComputeLightPropagationWork() to count up the zero weights
38:01 Step in to ComputeLightPropagationWork() to find a ZeroWCount of 196 🏃
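
Counting the zero weights is a quick way to see how much of the transform's multiply-add work contributes nothing. A minimal sketch, assuming the weights live in a flat float array (the guide does not show the real table or its size); CountZeroWeights() is a hypothetical name, and ZeroWCount is borrowed from the entry above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of a weight table that are exactly zero.
   Each zero weight marks a multiply-add that could be skipped or compacted away. */
static size_t CountZeroWeights(const float *Weights, size_t Count)
{
    size_t ZeroWCount = 0;
    size_t Index;
    assert(Weights || (Count == 0));
    for(Index = 0; Index < Count; ++Index)
    {
        if(Weights[Index] == 0.0f)
        {
            ++ZeroWCount;
        }
    }
    return ZeroWCount;
}
```

For example, a table holding {0, 0.5, 0, 1, 0} yields a count of 3.
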
39:22 Consider our potential for optimising ComputeLightPropagationWork() 📖
40:09 Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork() 🏃
41:54 Define LIGHT_ATLAS_ASSERT()
43:19 Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork() 🏃
43:33 Disable multithreading of the lighting, wondering if RemedyBG supports step-single-thread
44:43 Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork() 🏃
49:35 Optimise ComputeLightPropagationWork() to load and shuffle a row at once, introducing LoadF32_4X() and Broadcast4x()
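
The "load a row, then shuffle" idea can be sketched with plain SSE intrinsics: load four weights with one instruction, then broadcast each lane in turn so it can scale a whole 4-wide texel. This is not the episode's code. LoadRow4(), BROADCAST_LANE() and WeightedSum4() are hypothetical stand-ins (the signatures of LoadF32_4X() and Broadcast4x() are not given in the guide), and the layout of four floats per texel is an assumption.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics; x86/x64 only */

/* Hypothetical stand-in for a 4-wide load: four consecutive floats, no alignment required. */
static __m128 LoadRow4(const float *Row)
{
    assert(Row);
    return _mm_loadu_ps(Row);
}

/* Hypothetical stand-in for a broadcast: copy one lane (a literal 0..3) into all four lanes. */
#define BROADCAST_LANE(V, Lane) \
    _mm_shuffle_ps((V), (V), _MM_SHUFFLE((Lane), (Lane), (Lane), (Lane)))

/* Sum of Weights[i] * Texel[i] for i = 0..3, where Texel[i] is assumed to be
   four floats starting at Texels + 4*i. */
static __m128 WeightedSum4(const float *Weights, const float *Texels)
{
    __m128 W = LoadRow4(Weights); /* one load fetches all four weights */
    __m128 Sum = _mm_mul_ps(BROADCAST_LANE(W, 0), LoadRow4(Texels + 0));
    Sum = _mm_add_ps(Sum, _mm_mul_ps(BROADCAST_LANE(W, 1), LoadRow4(Texels + 4)));
    Sum = _mm_add_ps(Sum, _mm_mul_ps(BROADCAST_LANE(W, 2), LoadRow4(Texels + 8)));
    Sum = _mm_add_ps(Sum, _mm_mul_ps(BROADCAST_LANE(W, 3), LoadRow4(Texels + 12)));
    return Sum;
}
```

Compared with loading and splatting each weight separately, this form issues one weight load per four texels and leaves the rest to register shuffles. Note that the texel addresses advance by a fixed stride; stepping by the wrong stride mixes data from the wrong texels.
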
1:15:35 Inspect the assembly of the specular–diffuse transform in ComputeLightPropagationWork() 🏃
1:16:28 Re-enable multithreading of the lighting
1:16:46 11% frame time spent in ComputeLightPropagationWork(), but with chromatic aberration 🏃
1:17:32 Double-check the specular–diffuse transform 📖
1:20:21 Fix ComputeLightPropagationWork() to load the specular texels in strides of 4, rather than 12
1:20:52 Admire our correct and faster lighting 🏃
1:21:51 Consider our potential for optimising the specular–diffuse transform: Separable blur [1, 2, 3] 📖
1:31:00 Check out our lighting 🏃
1:31:10 Decrease the light transmission rate from 0.975 to 0.75 in BuildDiffuseLightMaps()
1:31:23 More readily see our darker light map viewer 🏃
1:32:28 Set up ComputeLightPropagationWork() to perform the specular–diffuse transform as a separable filter
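
A separable filter replaces one 2D weighted sum with two 1D passes, which gives the same result whenever the 2D weight is the product of a horizontal and a vertical weight. The sketch below shows this on a generic 3-tap blur over a small grid with clamped edges. It illustrates the general technique only, not the episode's specular–diffuse weights, and every name in it is hypothetical.

```c
#include <assert.h>

enum { DIM = 4, TAPS = 3 };

static int ClampIndex(int I)
{
    return (I < 0) ? 0 : ((I >= DIM) ? (DIM - 1) : I);
}

/* Direct 2D filter: DIM*DIM*TAPS*TAPS multiply-adds, using weights K[j]*K[i]. */
static void BlurDirect(const float *In, const float *K, float *Out)
{
    int X, Y, I, J;
    for(Y = 0; Y < DIM; ++Y)
    {
        for(X = 0; X < DIM; ++X)
        {
            float Sum = 0.0f;
            for(J = 0; J < TAPS; ++J)
            {
                for(I = 0; I < TAPS; ++I)
                {
                    Sum += K[J]*K[I]*In[ClampIndex(Y + J - 1)*DIM + ClampIndex(X + I - 1)];
                }
            }
            Out[Y*DIM + X] = Sum;
        }
    }
}

/* Separable version: a horizontal pass into Temp, then a vertical pass.
   Costs 2*DIM*DIM*TAPS multiply-adds. */
static void BlurSeparable(const float *In, const float *K, float *Out)
{
    float Temp[DIM*DIM];
    int X, Y, T;
    for(Y = 0; Y < DIM; ++Y)
    {
        for(X = 0; X < DIM; ++X)
        {
            float Sum = 0.0f;
            for(T = 0; T < TAPS; ++T)
            {
                Sum += K[T]*In[Y*DIM + ClampIndex(X + T - 1)];
            }
            Temp[Y*DIM + X] = Sum;
        }
    }
    for(Y = 0; Y < DIM; ++Y)
    {
        for(X = 0; X < DIM; ++X)
        {
            float Sum = 0.0f;
            for(T = 0; T < TAPS; ++T)
            {
                Sum += K[T]*Temp[ClampIndex(Y + T - 1)*DIM + X];
            }
            Out[Y*DIM + X] = Sum;
        }
    }
}

/* Largest absolute difference between two arrays, for comparing the two results. */
static float MaxAbsDiff(const float *A, const float *B, int Count)
{
    float Max = 0.0f;
    int Index;
    for(Index = 0; Index < Count; ++Index)
    {
        float Diff = A[Index] - B[Index];
        if(Diff < 0.0f) { Diff = -Diff; }
        if(Diff > Max) { Max = Diff; }
    }
    return Max;
}
```

With a kernel of K taps on an N×N grid, the direct form costs on the order of N²K² operations and the separable form 2N²K, so the saving grows with the kernel width. Whether a given transform's weights factor this way has to be checked case by case.
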
1:48:04 Check out our lighting 🏃
1:48:20 Q&A 🗩
1:49:15 lucid_frost Q: Are there any caching concerns? I'm not familiar with how much data is being pushed around here 🗪
1:52:16 philliptrudeau Q: This scene has a little bit of variance in the lighting between frames. Is there a way to set up this solution so that the scene looks more "static", without taking a significant performance hit? 🗪
1:53:35 somebody_took_my_name Q: The light seems to be repeating outside of the light box (before the rewrite). Is it still there and, if so, is it a modulus issue? 🗪
1:54:53 mattiamanzati Q: You mentioned something about shaders API being better at this kind of job. I lost your point on that because of me being unfamiliar with the environment. Can you please explain that a little bit more? 🗪
1:59:38 czapa10 Q: You often say that there should be some high level language feature which allows you to write SIMD code easier. Can you tell how this feature would exactly look like? Do you mean something like Jon Blow has in Jai (fast SOA, AOS switching)? Can't you do this feature yourself using metaprogramming? 🗪
2:01:05 vtlmks Intel Intrinsics Guide [4] is broken, it seems 🗪
2:01:45 i_am_seabass He's got it cached 🗪
2:01:56 czapa10 You can't specify specific architecture 🗪
2:02:25 Plug uops [5] 📖
2:03:47 Admire the lighting 🏃
2:04:22 Close it on up 🗩