SIMD Raycasting

0:02 Recap and set the stage for the day with a few words on the pace of the project
🗩
4:09 Run the game to see our current software-rendered lighting, determined to see how much performance we can get out of the CPU
🏃
6:47 Prevent RayCast() from summing the TotalCastsInitiated
7:03 Run the game to see that this does not appreciably affect our performance
🏃
8:05 Augment lighting_work with some of the lighting_solution, to avoid bouncing cache lines around
16:08 Run the game and inspect the profiler
🏃
16:52 Make lighting_work and lighting_solution cache-aligned
19:53 Cache alignment and false-sharing considerations when threading
🖌
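The following minimal C++ sketch makes the false-sharing concern concrete. It is not the stream's code. It computes whether one element of a per-thread work array shares a cache line with the next element. It assumes a 64-byte line, as on current x86 parts, and a hypothetical 52-byte payload. With unpadded 52-byte elements, neighbouring threads' data share lines, so a write by one core invalidates the line in the other core's cache. Aligning and padding each element to a full line removes that overlap.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size: 64 bytes on current x86 CPUs.
constexpr std::size_t CacheLineSize = 64;

// Hypothetical per-thread work record with a 52-byte payload.
struct unpadded_work
{
    float Values[13];
};

// Same payload, but aligned (and therefore padded) to one full cache line.
struct alignas(CacheLineSize) padded_work
{
    float Values[13];
};

static_assert(sizeof(unpadded_work) == 52, "payload is 52 bytes");
static_assert(sizeof(padded_work) == CacheLineSize, "one line per element");

// For an array whose base sits on a line boundary: does element Index
// share a cache line with element Index + 1?
constexpr bool SharesLineWithNext(std::size_t ElementSize, std::size_t Index)
{
    std::size_t LastByte = (Index + 1)*ElementSize - 1;
    std::size_t NextFirstByte = (Index + 1)*ElementSize;
    return (LastByte / CacheLineSize) == (NextFirstByte / CacheLineSize);
}
```

With the padded type, every element starts on its own line, so two threads writing adjacent elements never contend for the same line.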
24:15 Introduce InitLighting() to align our lighting data
31:21 Run the game, resolving to double-check that everything is aligned and to figure out why the tests are slow
🏃
32:22 Assert in ComputeLightPropagation() that the lighting_work is aligned
33:17 Run the game and hit that assertion
🏃
34:01 Pad the lighting_work to 64 bytes
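The assertion at 32:22 and the padding at 34:01 go together. An aligned base pointer keeps every array element aligned only if the element size is a multiple of 64. The sketch below illustrates this. Its `lighting_work_sketch` is a hypothetical stand-in, since the real lighting_work fields are not shown here. `IsAligned` and `AllocateWork` are also illustrative names. The sketch relies on C++17 aligned `new`, which honours `alignas(64)`. Before C++17, `new` was not required to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for lighting_work; alignas(64) rounds sizeof up to 64.
struct alignas(64) lighting_work_sketch
{
    void *Solution;
    std::uint32_t FirstSampleIndex;
    std::uint32_t OnePastLastSampleIndex;
};

static_assert(sizeof(lighting_work_sketch) == 64, "padded to one cache line");

// True if Pointer is a multiple of Alignment (Alignment assumed a power of two).
inline bool IsAligned(const void *Pointer, std::size_t Alignment)
{
    return (reinterpret_cast<std::uintptr_t>(Pointer) % Alignment) == 0;
}

// Allocate per-thread work records and assert each one is line-aligned.
inline lighting_work_sketch *AllocateWork(std::size_t Count)
{
    lighting_work_sketch *Work = new lighting_work_sketch[Count]();
    for (std::size_t Index = 0; Index < Count; ++Index)
    {
        assert(IsAligned(Work + Index, 64));
    }
    return Work;
}
```

If the struct were 52 bytes instead, `Work + 1` would sit at offset 52 and the per-element assertion would fire.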
34:40 Run the game to see that we are running at full speed again
🏃
35:53 Profile PartitionsPerCast and LeavesPerCast in ComputeLightPropagation()
37:54 Run the game and consult the profiler to see that our PartitionsPerCast is high, and consider how to reduce it
🏃
41:50 Add a TIMED_FUNCTION() in RayCast()
42:28 Run the game and consult the profiler to see that our RayCast() is not too bad
🏃
45:24 Remove the recursion in RayCastRecurse()
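Replacing recursion with an explicit, fixed-size stack is a standard transformation. Here is a minimal sketch over a hypothetical binary box tree. The real RayCast() hierarchy and its hit test are not shown in these notes, so the sketch just sums leaf values. It includes a depth assertion in the spirit of the later BoxStack check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tree node: a node with no children is a leaf.
struct box_node
{
    box_node *Children[2];
    int Value;
};

// Recursive form: sum Value over all leaves.
inline int SumLeavesRecursive(const box_node *Node)
{
    if (!Node->Children[0] && !Node->Children[1])
    {
        return Node->Value;
    }
    int Sum = 0;
    for (const box_node *Child : Node->Children)
    {
        if (Child) Sum += SumLeavesRecursive(Child);
    }
    return Sum;
}

// Iterative form: the call stack becomes an explicit array of pending nodes.
inline int SumLeavesIterative(const box_node *Root)
{
    const box_node *Stack[32];
    std::size_t Depth = 0;
    Stack[Depth++] = Root;
    int Sum = 0;
    while (Depth > 0)
    {
        const box_node *Node = Stack[--Depth];
        if (!Node->Children[0] && !Node->Children[1])
        {
            Sum += Node->Value;
            continue;
        }
        for (const box_node *Child : Node->Children)
        {
            if (Child)
            {
                // Overflowing the fixed stack is a bug, so assert rather than grow.
                assert(Depth < sizeof(Stack)/sizeof(Stack[0]));
                Stack[Depth++] = Child;
            }
        }
    }
    return Sum;
}
```

The iterative version avoids per-call prologue and epilogue costs and keeps traversal state in one small, cache-friendly array.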
49:25 Run the game to see that this greatly improved our performance
🏃
50:35 Just make RayCast() perform our new code from RayCastRecurse()
52:06 Run the game to see that this doesn't change our runtime
🏃
52:13 Prevent RayCast() from taking Solution and add a TIMED_FUNCTION() in it
53:09 Add a TIMED_FUNCTION() in RayCast()
53:25 Run the game and see the performance of RayCast(), disable HANDMADE_SLOW and consider what to tackle next
🏃
55:45 Consider performing RayCast() in SIMD
📖
1:00:21 Reduce RayCount from 64 to 16 in ComputeLightPropagation()
1:00:36 Run the game to see what kind of a speedup SIMD could provide
🏃
1:01:42 Enable ComputeLightPropagation() to call SampleHemisphere() in SIMD, introducing V3_4x()
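A wide vector type like v3_4x is usually a structure of arrays: all four x values sit together, then all four y values, then all four z values, so each component maps onto one 4-lane register. The sketch below is portable and assumes that layout. It uses plain `float[4]` where a real build would presumably use `__m128`. `GetLane` is a hypothetical helper for pulling one vector back out. It is not necessarily what the stream's GetComponent() at 1:10:19 does.

```cpp
#include <cassert>

struct v3
{
    float x, y, z;
};

// Structure-of-arrays: lane i of x, y and z together form vector i.
struct v3_4x
{
    float x[4];
    float y[4];
    float z[4];
};

// Transpose four array-of-structs vectors into one SoA bundle.
inline v3_4x V3_4x(v3 A, v3 B, v3 C, v3 D)
{
    v3_4x Result = {
        {A.x, B.x, C.x, D.x},
        {A.y, B.y, C.y, D.y},
        {A.z, B.z, C.z, D.z},
    };
    return Result;
}

// Hypothetical helper: extract lane Index as a scalar v3.
inline v3 GetLane(const v3_4x &V, int Index)
{
    assert(Index >= 0 && Index < 4);
    v3 Result = {V.x[Index], V.y[Index], V.z[Index]};
    return Result;
}
```

With this layout, one 4-wide operation on `x` processes the x component of four hemisphere samples at once.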
1:10:19 Update RayCast() to work with our SIMD data, introducing GetComponent()
1:22:52 Run the game, crash in RayCast() and investigate why
🏃
1:24:30 Temporarily disable threading in ComputeLightPropagation()
1:26:14 Step through RayCast() to try and see what's going wrong
🏃
1:32:12 Assert in RayCast() that the Depth is less than the size of the BoxStack
1:33:04 Step back into RayCast() and eventually hit that assertion
🏃
1:35:05 Increase BoxStack from 32 to 64 in RayCast()
1:35:30 Run the game and see nothing
🏃
1:35:56 Prevent RayCast() from pushing on the boxes four times
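In a 4-wide traversal, the four rays share one box stack. If each child box is pushed once per lane that hits it, the stack fills with duplicates: four copies when all lanes agree. Pushing each child once when any lane hits keeps one entry per box. This minimal sketch uses hypothetical `lane_mask` and `box_stack` types and also carries a fixed-capacity depth assertion. It is an illustration of the idea under those assumptions, not the stream's RayCast() code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-lane hit result for four rays.
struct lane_mask
{
    bool Lane[4];
};

inline bool AnyLane(lane_mask Mask)
{
    return Mask.Lane[0] || Mask.Lane[1] || Mask.Lane[2] || Mask.Lane[3];
}

// Fixed-capacity stack of box indices (64 entries, like BoxStack after 1:35:05).
struct box_stack
{
    int Boxes[64];
    std::size_t Depth;
};

inline void Push(box_stack &Stack, int BoxIndex)
{
    assert(Stack.Depth < sizeof(Stack.Boxes)/sizeof(Stack.Boxes[0]));
    Stack.Boxes[Stack.Depth++] = BoxIndex;
}

// Push each child at most once: only if at least one of the four rays hit it.
inline void PushHitChildren(box_stack &Stack, const int *Children,
                            const lane_mask *Hits, std::size_t ChildCount)
{
    for (std::size_t Index = 0; Index < ChildCount; ++Index)
    {
        if (AnyLane(Hits[Index]))
        {
            Push(Stack, Children[Index]);
        }
    }
}
```

Lanes that missed a pushed child then simply ignore it when it is popped, typically by masking their results.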
1:37:42 Run the game to see that our performance has decreased
🏃
1:38:18 Enable RayCast() to call IsInRectangleCenterHalfDim() in SIMD
1:45:35 Introduce Any3TrueInAtLeastOneLane(), All3TrueInAtLeastOneLane(), AbsoluteValue() and v3_4x versions of <=, −, | and &[1]
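A wide center/half-dimension containment test needs exactly the operators listed above. It subtracts the center, takes the absolute value, compares against the half-dimensions with <=, and ANDs the three axis masks per lane. This portable sketch stands plain arrays in for SSE registers. The lane masks use the SSE convention of 0xFFFFFFFF for true and 0 for false. `IsInRectangleCenterHalfDim4x` is a hypothetical name, and `|` is omitted because this test does not need it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Each float[4] stands in for one 4-lane register.
struct v3_4x
{
    float x[4];
    float y[4];
    float z[4];
};

// SSE-style lane mask: 0xFFFFFFFF for true, 0 for false.
struct u32_4x
{
    std::uint32_t E[4];
};

struct mask3_4x
{
    u32_4x x, y, z;
};

inline std::uint32_t ToMask(bool Value) { return Value ? 0xFFFFFFFFu : 0u; }

inline v3_4x operator-(const v3_4x &A, const v3_4x &B)
{
    v3_4x R;
    for (int L = 0; L < 4; ++L)
    {
        R.x[L] = A.x[L] - B.x[L];
        R.y[L] = A.y[L] - B.y[L];
        R.z[L] = A.z[L] - B.z[L];
    }
    return R;
}

// Per-lane fabs here; a register version would clear the sign bit instead.
inline v3_4x AbsoluteValue(const v3_4x &A)
{
    v3_4x R;
    for (int L = 0; L < 4; ++L)
    {
        R.x[L] = std::fabs(A.x[L]);
        R.y[L] = std::fabs(A.y[L]);
        R.z[L] = std::fabs(A.z[L]);
    }
    return R;
}

// Component-wise, per-lane <=, giving one mask per axis.
inline mask3_4x operator<=(const v3_4x &A, const v3_4x &B)
{
    mask3_4x R;
    for (int L = 0; L < 4; ++L)
    {
        R.x.E[L] = ToMask(A.x[L] <= B.x[L]);
        R.y.E[L] = ToMask(A.y[L] <= B.y[L]);
        R.z.E[L] = ToMask(A.z[L] <= B.z[L]);
    }
    return R;
}

inline u32_4x operator&(const u32_4x &A, const u32_4x &B)
{
    u32_4x R;
    for (int L = 0; L < 4; ++L) R.E[L] = A.E[L] & B.E[L];
    return R;
}

// Per lane: is P within Center +/- HalfDim on all three axes?
inline u32_4x IsInRectangleCenterHalfDim4x(const v3_4x &Center,
                                           const v3_4x &HalfDim,
                                           const v3_4x &P)
{
    mask3_4x M = AbsoluteValue(P - Center) <= HalfDim;
    return M.x & M.y & M.z;
}
```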
1:59:56 Step into RayCast() and inspect our SIMD data
🏃
2:01:59 Prevent AbsoluteValue() from converting the Mask to a float
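A SIMD absolute value typically ANDs each lane with a 0x7FFFFFFF mask to clear the sign bit. That mask must be reinterpreted as float bits (in SSE, for example, via `_mm_castsi128_ps`), not numerically converted (for example, `_mm_set1_ps((float)0x7FFFFFFF)`). The notes do not show the exact buggy line. This scalar sketch therefore just models both options, using `memcpy` bit casts, to show why conversion breaks the mask. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret bits without any numeric conversion.
inline float BitsToFloat(std::uint32_t Bits)
{
    float Result;
    std::memcpy(&Result, &Bits, sizeof(Result));
    return Result;
}

inline std::uint32_t FloatToBits(float Value)
{
    std::uint32_t Result;
    std::memcpy(&Result, &Value, sizeof(Result));
    return Result;
}

// Correct: AND with the 0x7FFFFFFF bit pattern clears only the sign bit.
inline float AbsoluteValueBits(float Value)
{
    return BitsToFloat(FloatToBits(Value) & 0x7FFFFFFFu);
}

// Buggy: converting the mask numerically yields 2147483648.0f, whose bits
// (0x4F000000) keep an exponent fragment instead of clearing the sign.
inline float AbsoluteValueConvertedMask(float Value)
{
    float Mask = static_cast<float>(0x7FFFFFFFu);
    return BitsToFloat(FloatToBits(Value) & FloatToBits(Mask));
}
```

For -2.5f (bits 0xC0200000), the buggy AND leaves 0x40000000, which is 2.0f rather than 2.5f. This is the kind of "slightly off" result that only shows up when stepping through the values.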
2:03:15 Step back into AbsoluteValue() and through our other new SIMD functions in RayCast()
🏃
2:06:13 Run the game to see that our lighting is a little bit messed up
🏃
2:07:19 Make RayCast() perform the old scalar code after our SIMD code, to verify the SIMD
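Verifying a SIMD rewrite against the old scalar code can be packaged as a small harness. It runs the trusted scalar path on each lane's input and asserts that the wide result agrees. In this sketch, `IsInBox4x` is a hypothetical stand-in for the wide path, written as a per-lane loop so that the harness itself is the point. The deterministic generator uses the well-known Numerical Recipes LCG constants so that any mismatch reproduces exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct v3
{
    float x, y, z;
};

// Trusted scalar reference (the role the old code path plays).
inline bool IsInBoxScalar(v3 Center, v3 HalfDim, v3 P)
{
    return std::fabs(P.x - Center.x) <= HalfDim.x &&
           std::fabs(P.y - Center.y) <= HalfDim.y &&
           std::fabs(P.z - Center.z) <= HalfDim.z;
}

// Hypothetical wide path: bit i of the result is set if lane i is inside.
inline unsigned IsInBox4x(v3 Center, v3 HalfDim, const v3 P[4])
{
    unsigned Mask = 0;
    for (int Lane = 0; Lane < 4; ++Lane)
    {
        bool Inside = std::fabs(P[Lane].x - Center.x) <= HalfDim.x &&
                      std::fabs(P[Lane].y - Center.y) <= HalfDim.y &&
                      std::fabs(P[Lane].z - Center.z) <= HalfDim.z;
        Mask |= (Inside ? 1u : 0u) << Lane;
    }
    return Mask;
}

// Deterministic LCG so failures reproduce; returns a value in [-2, 2).
inline float NextRandom(std::uint32_t &State)
{
    State = State*1664525u + 1013904223u;
    return (float)(State >> 8)/16777216.0f*4.0f - 2.0f;
}

// Run both paths on identical inputs; returns the number of lanes compared.
inline int VerifyWideAgainstScalar(int Iterations)
{
    std::uint32_t State = 12345u;
    v3 Center = {0.0f, 0.0f, 0.0f};
    v3 HalfDim = {1.0f, 1.0f, 1.0f};
    int LanesChecked = 0;
    for (int Iteration = 0; Iteration < Iterations; ++Iteration)
    {
        v3 P[4];
        for (v3 &Point : P)
        {
            Point = {NextRandom(State), NextRandom(State), NextRandom(State)};
        }
        unsigned Wide = IsInBox4x(Center, HalfDim, P);
        for (int Lane = 0; Lane < 4; ++Lane)
        {
            bool Expected = IsInBoxScalar(Center, HalfDim, P[Lane]);
            bool Got = ((Wide >> Lane) & 1u) != 0;
            assert(Expected == Got && "wide path disagrees with scalar path");
            ++LanesChecked;
        }
    }
    return LanesChecked;
}
```

Swapping the real SIMD routine in for `IsInBox4x` turns this into a regression check that can stay behind a debug flag.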
2:11:28 Run the game and fail to hit our verification assertion
🏃
2:13:10 Prevent RayCast() from erroneously breaking out of the child box loop
2:13:48 Run the game to see that we are correct
🏃
2:14:07 Streamline RayCast() and switch it almost entirely over to SIMD, introducing ZeroV34x(); versions of V3_4x() that load four scalars, or one wide set, into the lanes; v3_4x versions of − (for full 4-wide negation), < and *; and an f32_4x struct with versions of −, * and / that use this type[2]
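An f32_4x type wraps one 4-lane float register so that wide math reads like scalar math. This is a portable sketch of such a type. It uses `float[4]` in place of `__m128`, where a real build would presumably use intrinsics such as `_mm_sub_ps`, `_mm_mul_ps` and `_mm_div_ps`. It covers unary and binary −, *, /, and the compound -= that 3:21:33 adds. The `F32_4x` constructors are hypothetical names.

```cpp
#include <cassert>

// Portable stand-in for a 4-wide float register.
struct f32_4x
{
    float E[4];
};

// Hypothetical constructors: broadcast one scalar, or load four scalars.
inline f32_4x F32_4x(float All)
{
    f32_4x Result = {{All, All, All, All}};
    return Result;
}

inline f32_4x F32_4x(float A, float B, float C, float D)
{
    f32_4x Result = {{A, B, C, D}};
    return Result;
}

// Full 4-wide negation.
inline f32_4x operator-(f32_4x A)
{
    for (float &Lane : A.E) Lane = -Lane;
    return A;
}

inline f32_4x operator-(f32_4x A, f32_4x B)
{
    for (int L = 0; L < 4; ++L) A.E[L] -= B.E[L];
    return A;
}

inline f32_4x operator*(f32_4x A, f32_4x B)
{
    for (int L = 0; L < 4; ++L) A.E[L] *= B.E[L];
    return A;
}

inline f32_4x operator/(f32_4x A, f32_4x B)
{
    for (int L = 0; L < 4; ++L) A.E[L] /= B.E[L];
    return A;
}

// Compound subtraction, defined in terms of the binary operator.
inline f32_4x &operator-=(f32_4x &A, f32_4x B)
{
    A = A - B;
    return A;
}
```

Defining the compound operator via the binary one keeps a single place to swap in the intrinsic later.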
3:19:18 Step into RayCast() to inspect its values
🏃
3:21:33 Make RayCast() subtract only one element of the FaceRelOrigin, introducing an f32_4x version of -=
3:28:15 Step into RayCast() and inspect the FaceRelOrigin
🏃
3:29:19 Run the game to see that we are producing the correct lighting a little faster
🏃
3:29:50 Enable RayCast() to perform the bounds checking in SIMD
3:41:11 Organise handmade_simd.h and introduce f32_4x versions of every operator and function we need
4:02:30 Run the game to see that it's totally fine
🏃
4:03:01 Read through RayCast(), determined to finish doing it all in SIMD
📖
4:04:22 Start to enable RayCast() to update and load out the boxes in SIMD, before saving it for tomorrow
4:11:01 Q&A
🗩
4:12:01 Clarify the video capture card situation mentioned in the pre-stream
🗩
4:12:58 wired_life Q: I think you call AnyTrue() from within AllTrue(). Can you check that's intended?
🗪
4:14:29 "And" and "All" in a matrix
🖌
4:15:59 Call it there
🗩