Handmade Hero»Episode Guide
Optimizing Ray vs. AABB Intersections
0:02 Recap and set the stage for the day refining the lighting
🗩
0:31 Take Visual Studio's quick feedback survey
🗹
💢
8:15 Run the game to see our current lighting situation
🏃
11:59 Consider the performance of ComputeLightPropagation()
🏃
14:18 Track TotalPartitionLeavesUsed in ComputeLightPropagation()
17:57 Consult the TotalPartitionLeavesUsed performance figures
18:22 Determine to change SplitBox() from doing k-d to quad tree partitioning
20:15 k-d tree vs Quad tree
🖌
21:00 Allow SplitBox() to have 8 leaves per child, increased from 4
21:32 Run the game to see that that doesn't appreciably change our runtime
🏃
21:44 Track PartitionsPerLeaf in ComputeLightPropagation()
22:37 Run the game and compare the PartitionsPerLeaf with 8 and 4 leaves per child
🏃
23:09 Toggle off ComputeLightPropagation()
23:41 Run the game to determine that our shader is the bottleneck
🏃
26:00 Toggle off wglSwapIntervalExt in Win32InitOpenGL()
26:25 Run the game and watch the threads view with the determination to investigate the time delay
🏃
28:29 There's a knock at the door
🗹
29:08 Return and consider the threads view to be misleading
🏃
31:33 Introduce HUD_TIMED_FUNCTION() called in LightingTest() to enable more direct textual profiling of the lighting rendering
36:53 Run the game to see that it does nothing different
🏃
37:21 Enable DEBUGEnd() and DEBUGInit() to handle HUD_TIMED_FUNCTION(), renaming AddTooltip() to AddLine() and DrawTooltips() to DrawLineBuffer()
57:53 Crash in DEBUGDrawElement() and inspect what's going on
🏃
1:02:39 Enable DrawTreeLink() to handle the case when a tree has no element and no children, and change HasChildren() to CanHaveChildren()
1:04:52 Run the game to see that we do not crash, and see our new HUD element
🏃
1:06:24 Make DEBUGEnd() display only the function name of our HUD_TIMED_FUNCTION(), expandable to contain textual profiling information for its child functions
1:17:27 Step into DEBUGEnd() and inspect the HUD element
🏃
1:22:41 Enable DEBUGEnd() to gather profiling information for our HUD element
1:31:08 Run the game and consult our LightingTest() HUD to see that our average value is busted
🏃
1:32:24 Fix DEBUGEnd() to correctly compute the average cycle count
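As a rough illustration of the kind of per-function profiling described in the entries above, here is a minimal sketch of a scoped timer that accumulates a total and a hit count, with the average derived as total divided by hits. The names `TimedStat` and `ScopedTimer` are hypothetical and not taken from the Handmade Hero codebase, and `std::chrono::steady_clock` nanoseconds stand in for the cycle counts the episode refers to.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical accumulator for one timed function (not the real debug system).
struct TimedStat
{
    uint64_t totalTicks = 0; // sum of all recorded durations
    uint64_t hitCount = 0;   // number of recorded calls

    void Record(uint64_t ticks)
    {
        totalTicks += ticks;
        ++hitCount;
    }

    // Average per call; dividing by the hit count (and guarding against zero)
    // is the step that is easy to get wrong.
    double Average() const
    {
        return hitCount ? static_cast<double>(totalTicks) / static_cast<double>(hitCount) : 0.0;
    }
};

// Records the lifetime of the enclosing scope into a TimedStat.
class ScopedTimer
{
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(TimedStat &stat) : stat_(stat), start_(Clock::now()) {}
    ScopedTimer(const ScopedTimer &) = delete;
    ScopedTimer &operator=(const ScopedTimer &) = delete;

    ~ScopedTimer()
    {
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start_);
        stat_.Record(static_cast<uint64_t>(elapsed.count()));
    }

private:
    TimedStat &stat_;
    Clock::time_point start_;
};
```

A function would be timed by declaring `ScopedTimer Timer(Stat);` at the top of its body; a HUD could then print `Stat.Average()` once per frame.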
1:33:55 Run the game and consult the performance of LightingTest() in the new HUD
🏃
1:35:37 Tweak the leaves per child value in SplitBox(), comparing their performance in the HUD, and consider that our spatial partitioning is not effective enough
🏃
1:39:35 Consider making the RayCast() more performant with AABB testing
📖
1:41:49 Read 'Fast, Branchless Ray / Bounding Box Intersections' [1] and 'Fast Ray / Axis-Aligned Bounding Box Overlap Tests using Ray Slopes' [2]
📖
1:45:36 Ray AABB testing
🖌
1:47:18 Continue to read these papers on ray AABB testing [3, 4]
📖
1:56:23 Change RayCast() to perform AABB testing [5]
1:59:15 Computing X, Y and Z intersections in T
🖌
2:06:21 Doing this without caring which direction the normal is facing
🖌
2:09:31 Continue to enable RayCast() to perform fast AABB testing
2:13:05 Determining the tMin and tMax intersection points
🖌
2:14:12 Enable RayCast() to compute those tMin and tMax
2:15:50 The Maximum-Minimum and Minimum-Maximum
🖌
2:16:29 Set tMin and tMax and continue enabling RayCast() to perform fast AABB testing
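The entries above follow the usual "slab" formulation of the ray vs. AABB test: compute the t at which the ray crosses each pair of axis-aligned planes, take the maximum of the per-axis minimums as the entry distance, and the minimum of the per-axis maximums as the exit distance. The following is a minimal scalar sketch of that idea, not the episode's SIMD code. `Vec3`, `RayBoxResult` and `RayVsAABB` are hypothetical names, and passing a precomputed reciprocal direction is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

struct RayBoxResult
{
    bool  hit;  // true when the ray overlaps the box at some t >= 0
    float tMin; // entry distance along the ray
    float tMax; // exit distance along the ray
};

// invDir holds 1/direction per component (assumed precomputed by the caller).
// Note: a zero direction component gives an infinite reciprocal, which can
// produce NaN when the origin lies exactly on a slab plane; this sketch does
// not handle that case.
inline RayBoxResult RayVsAABB(Vec3 origin, Vec3 invDir, Vec3 boxMin, Vec3 boxMax)
{
    assert(boxMin.x <= boxMax.x && boxMin.y <= boxMax.y && boxMin.z <= boxMax.z);

    // t at which the ray crosses the min and max plane of each axis.
    float tx0 = (boxMin.x - origin.x) * invDir.x;
    float tx1 = (boxMax.x - origin.x) * invDir.x;
    float ty0 = (boxMin.y - origin.y) * invDir.y;
    float ty1 = (boxMax.y - origin.y) * invDir.y;
    float tz0 = (boxMin.z - origin.z) * invDir.z;
    float tz1 = (boxMax.z - origin.z) * invDir.z;

    // Taking min/max per axis makes the result independent of which way the
    // ray travels along that axis.
    float tMin = std::max(std::max(std::min(tx0, tx1), std::min(ty0, ty1)), std::min(tz0, tz1));
    float tMax = std::min(std::min(std::max(tx0, tx1), std::max(ty0, ty1)), std::max(tz0, tz1));

    RayBoxResult result;
    result.tMin = tMin;
    result.tMax = tMax;
    result.hit = (tMax >= std::max(tMin, 0.0f)); // intervals overlap in front of the origin
    return result;
}
```

Because the body is only multiplies, mins and maxes, it maps directly onto 4-wide types, which is presumably why Min() and Max() for f32_4x and v3_4x are introduced later in the episode.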
2:27:31 Consider how to determine which side of the box we hit
📖
2:30:24 Enable RayCast() to select the correct BoxSurfaceIndex
📖
2:35:27 Introduce f32_4x and v3_4x versions of Min() and Max(), and the v3_4x / operator
2:38:03 Run the game to see that it looks pretty similar, and is faster
🏃
2:39:51 Introduce v3_4x * operator and make RayCast() precompute the RayD
2:40:53 Compare the performance of this with the previous way
🏃
2:41:29 Consider breaking down our profiling of LightingTest()
📖
2:43:25 Consult the threads view to see that the performance is more reasonable
2:43:52 Try to stress the system
🏃
🖮
2:45:17 Q&A
🗩
2:46:01 0b0000000000000: SurfaceIndexLookupTable[movemask(tmin == tBoxMin)], and just choose the first set bit in the return from the movemask for which index
🗪
2:47:15 vaualbus Q: Would we get an improvement if we switch all the v3 to a v3_4x so we have not to load the value into the __m128 each frame?
🗪
2:47:40 0b0000000000000: It would only be 256 entries
🗪
2:48:05 alexkelbo Q: Could you recap how we retain the state of the debug UI between frames?
🗪
2:48:31 Consider the movemask instruction
🗩
2:49:46 0b0000000000000: You can collapse the 4 bytes into 2 bytes
🗪
2:52:19 roam00010011 Q: tMin3 will always be the closest in distance to RayOrigin, since RayD that hits BoxMax will result in tBoxMax < tBoxMin, no?
🗪
2:53:00 Determining the closest hit when rays can be cast in all directions
🖌
2:53:33 Sketch out RayCast() computing the RayPosition from the ray direction
2:56:08 Glimpse into the future optimising the spatial hierarchy stuff
🗩
2:57:00 Consult the profiler in terms of multithreading
🏃
2:57:41 Try quadrupling PointsPerWork in ComputeLightPropagation()
2:57:54 Consult the threads view to see that there is more empty space now
🏃
2:59:40 Optimising lane usage when threading
🖌
3:01:54 Note that we are not measuring dead computation time, and just enjoy our lighting
🏃
3:03:05 Close down, with one last look at 'Fast, Branchless Ray / Bounding Box Intersections' [6]
📖
3:04:08 Glimpse into the future, either continuing with optimisation or investigating the lighting flicker
🗩