Converting Math Operations to SIMD
1:23 Recap yesterday's work
2:46 build.bat: Switch to -O2
4:22 Think about doing the TestPixel TIMED_BLOCK over a wider range
5:20 handmade_render_group.cpp: Move the timer around the for loops
5:50 Debugger: See that there are two loops that are more or less the same
6:26 handmade_platform.h: Number these DebugCycleCounters
6:49 handmade_render_group.cpp: Rename TestPixel to ProcessPixel and remove TIMED_BLOCK around DrawRectangleSlowly
7:35 Debugger: Look at the DEBUG CYCLE COUNTS
8:12 handmade_render_group.cpp: Introduce END_TIMED_BLOCK_COUNTED
9:36 Debugger: See that the ProcessPixel count is now more accurate [243cy/h]
10:34 handmade_render_group.cpp: Write this in SIMD
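The "write this in SIMD" step replaces per-pixel scalar float math with 4-wide SSE operations, so one instruction processes four pixels' worth of values. A minimal sketch of the idea, using a hypothetical `ScaleBias4` helper rather than the stream's actual code:

```cpp
#include <xmmintrin.h> // SSE intrinsics (__m128)

// Hypothetical helper: scale four floats and add a bias in one pass,
// operating on all four lanes at once instead of looping per value.
static void ScaleBias4(float *Dest, const float *Src, float Scale, float Bias)
{
    __m128 Scale_4x = _mm_set1_ps(Scale); // broadcast Scale into all 4 lanes
    __m128 Bias_4x  = _mm_set1_ps(Bias);  // broadcast Bias into all 4 lanes
    __m128 Src_4x   = _mm_loadu_ps(Src);  // load 4 floats
    __m128 Result   = _mm_add_ps(_mm_mul_ps(Src_4x, Scale_4x), Bias_4x);
    _mm_storeu_ps(Dest, Result);          // store 4 results
}
```

The same shape (broadcast constants, load wide, compute wide, store wide) is what the texel pipeline conversion repeats over and over below.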
16:35 Run and see that it's still producing the correct result
16:47 build.bat: Switch to -Od
17:27 Debugger: Inspect TexelAr
21:28 handmade_render_group.cpp: Continue transforming these Texel computations into SIMD
29:21 Run and note that we're running just fine [575cy/h]
29:46 handmade_render_group.cpp: Continue making these wide
37:14 Compile and see if we made any mistakes [557cy/h]
37:31 handmade_render_group.cpp: Do the rest of this wide, except for the Clamp
40:39 Intel Intrinsics Guide: _mm_sqrt_ps
41:11 handmade_render_group.cpp: Do _mm_sqrt_ps and continue converting to SIMD
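`_mm_sqrt_ps` takes the square root of all four lanes at once, which is what lets the square-root step of the colour math stay wide instead of falling back to four scalar `sqrtf` calls. A small sketch:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Square-root four floats with a single instruction.
static void SquareRoot4(float *Dest, const float *Src)
{
    _mm_storeu_ps(Dest, _mm_sqrt_ps(_mm_loadu_ps(Src)));
}
```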
43:39 Run and note that we are blitting correctly [427cy/h]
43:54 Debugger: Look at what Clamp01 does
47:17 Intel Intrinsics Guide: _mm_min_ps and _mm_max_ps
48:45 handmade_render_group.cpp: Do the Clamps wide [179cy/h]
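Doing the clamps wide comes down to composing `_mm_max_ps` and `_mm_min_ps`, which clamp all four lanes branchlessly in two instructions. A sketch of a wide Clamp01 under that assumption:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Wide Clamp01: pin each of the four lanes into [0, 1] with no branches.
static __m128 Clamp01_4x(__m128 Value)
{
    __m128 Zero = _mm_setzero_ps();
    __m128 One  = _mm_set1_ps(1.0f);
    return _mm_min_ps(_mm_max_ps(Value, Zero), One);
}
```

Replacing four compare-and-branch scalar clamps with this is a large part of why the cycle count drops at this step.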
50:02 Run and note that the game is already running faster
50:47 Reflect on the straightforwardness of this work
51:54 Consider what's left to convert to SIMD
52:46 handmade_render_group.cpp: Do PixelP wide
54:16 Run and note how fast it's running [124cy/h]
56:18 Debugger: Investigate what the compiler is doing with those 50 cycles
1:02:54 handmade_render_group.cpp: Finish doing the SIMD here
1:07:32 Run and note that we're creeping forwards [121cy/h]
1:08:06 Recap and glimpse into the future of doing the Loads and Repack in SIMD
1:11:08 Q&A
1:11:32 kknewkles How do you cover multiple CPU technologies intrinsic-wise? Preprocessor switches on dedicated intrinsics for each? Also, whom to read on ASM? I'm thinking Mike Abrash?
1:13:09 houb_ We have come from 385 cycles to 123. Does something like the 80%-20% rule apply? Do you think we will get down to 50 cycles?
1:15:22 maexono The way we use mmSquare, does it calculate the argument twice?
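The mmSquare question is about macro expansion. Assuming the helper is a function-like macro along these lines (a sketch, not necessarily the exact definition on stream), the macro does textually duplicate its argument, so an argument with side effects would execute twice; the compiler's common-subexpression elimination usually folds pure duplicates, but an inline function avoids relying on that:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Sketch of the helper in question: the macro expands its argument twice,
// so mmSquare(f(x)) would evaluate f(x) twice unless the optimizer can
// prove the two calls identical.
#define mmSquare(a) _mm_mul_ps(a, a)

// An inline function evaluates its argument exactly once and compiles
// to the same single mulps.
static inline __m128 mmSquareFn(__m128 a) { return _mm_mul_ps(a, a); }
```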
1:15:41 Debugger: Determine if the compiler is doing common subexpression elimination for these multiplies
1:21:11 Deep, concentrated investigation
1:25:54 Look at how fast the game's running
1:26:19 cvaucher Where do OpenCL and other GPGPU frameworks fit into optimization? It seems like if something is SIMD-able, it could just be done wider on a GPU. Are there workloads that are better suited to the CPU and SIMD?
1:29:06 garlandobloom We have optimizations still on?
1:29:19 gasto5 Why are there optimizing options in the compiler if one will end up typing SIMD functions?
1:31:01 quylthulg Do you know of the _mm_setr_ps intrinsic (and _pd etc.) - note the r in setr? It loads the values in reverse order, i.e. in the order that is more intuitive
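The difference quylthulg points out: `_mm_set_ps` lists its arguments highest lane first, while `_mm_setr_ps` ("r" for reverse) lists them in memory order, so the two calls below build the same vector:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// _mm_set_ps takes lanes from highest to lowest; _mm_setr_ps takes them
// in the order they land in memory, which usually reads more naturally.
static void DemoSetOrder(float *FromSet, float *FromSetr)
{
    _mm_storeu_ps(FromSet,  _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
    _mm_storeu_ps(FromSetr, _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f));
}
```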
1:31:38 garlandobloom When do you think we will thread the renderer?
1:31:57 goodoldmalk Possibly misguided question: is there a way to overload operators to use SIMD instructions instead?
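The overloading question is not misguided: C++ lets you wrap `__m128` in a small struct and overload the arithmetic operators so wide code reads like scalar code. A sketch with a hypothetical `f32_4x` type:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Hypothetical 4-wide float wrapper: the operator overloads map directly
// onto the corresponding SSE intrinsics, with no runtime overhead once inlined.
struct f32_4x
{
    __m128 V;
};

static inline f32_4x operator+(f32_4x A, f32_4x B) { return {_mm_add_ps(A.V, B.V)}; }
static inline f32_4x operator*(f32_4x A, f32_4x B) { return {_mm_mul_ps(A.V, B.V)}; }
```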
1:32:45 digitaldomovoi Is padding and alignment still something you have to concern yourself with? I remember doing SIMD in the mid 2000s, and SIMD was essentially worthless (much of the time) if your data wasn't aligned
1:33:43 digitaldomovoi Addendum: By "concern yourself", I mean, is it something the compiler now handles more autonomously when you "engage" SIMD
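On the alignment question: it still matters for the aligned load/store forms. `_mm_load_ps` requires a 16-byte-aligned pointer (it can fault otherwise), while `_mm_loadu_ps` tolerates any address, historically at some cost; `alignas` makes the guarantee explicit. A sketch:

```cpp
#include <xmmintrin.h> // SSE intrinsics

// Aligned vs unaligned loads: _mm_load_ps assumes a 16-byte-aligned
// pointer; _mm_loadu_ps works for any address. alignas(16) guarantees
// the array is safe to use with the aligned form.
static float SumAligned4(void)
{
    alignas(16) float Data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    __m128 V = _mm_load_ps(Data); // safe: Data is 16-byte aligned

    // Horizontal sum: fold high pair onto low pair, then fold once more.
    __m128 Fold = _mm_add_ps(V, _mm_movehl_ps(V, V));       // {1+3, 2+4, ...}
    __m128 Sum  = _mm_add_ss(Fold, _mm_shuffle_ps(Fold, Fold, 1));
    return _mm_cvtss_f32(Sum);
}
```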
1:34:15 kil4h Will you generate asm for NEON (if you port to ARM, of course)? GCC seems to be pretty bad at generating correct code with intrinsics (from my experience on Android)
1:35:03 culver_fly How would you know if doing something will speed up the code? Especially when it's a fairly large change to the codebase and when time is limited, I find myself reluctant to perform such optimizations in fear of introducing bugs
1:36:46 miblo What do you think you'll next want to convert to SIMD, in case I want to practise over the weekend?
1:38:52 flaturated Can you compile it -Od and show how SIMD has helped there?
1:39:32 kknewkles Would it be a good exercise (albeit a large one) to study a simple CPU and write some soft for it? Arduino or something ancient? I wanted to learn coding for GBA for a while
1:41:04 kknewkles Let's rephrase: what CPU would you advise to study that would be simple enough yet representative enough of the general stuff you should know about when working with CPUs?
1:42:52 theitchyninja How long have you been working on this and when do you think you will finish?
1:43:29 gasto5 Are you going to optimize gameplay code as well?
1:43:45 houb_ Have you heard of the JayStation2 Project from Jaymin Kessler, working with the Raspberry Pi 2 B+?
1:44:03 Close things down with a recap of the week's optimisation work
1:48:03 Shout out to the mods