SSE Mixer Pre and Post Loops | Handmade Hero Episode Guide

00:02:11 Plan for today: SIMDizing the mixer

00:03:41 Aligning the temporary buffer

00:05:00 Making sure the temporary sound buffers are big enough to fit all samples

00:05:29 Explanation of Align16

00:06:23 Alignment macro for any power of two: AlignPow2
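The alignment macros named in the two entries above can be sketched as follows. This is a minimal illustration, not the episode's exact code: the macro bodies and the `AlignUpU32` wrapper are assumptions, and the trick only works when `Alignment` is a power of two.

```c
#include <stdint.h>

// Hypothetical spelling of the macros; the definitions used in the episode may differ.
// Rounds Value up to the next multiple of Alignment (Alignment must be a power of two):
// adding Alignment - 1 pushes Value past the next boundary unless it is already on one,
// and the mask clears the low bits to land exactly on that boundary.
#define AlignPow2(Value, Alignment) (((Value) + ((Alignment) - 1)) & ~((Alignment) - 1))
#define Align16(Value) AlignPow2(Value, 16)

// Illustrative wrapper so the macro can be exercised with a fixed type.
static uint32_t AlignUpU32(uint32_t Value, uint32_t Alignment)
{
    return AlignPow2(Value, Alignment);
}
```

For example, `AlignUpU32(17, 16)` yields 32, while a value already on a boundary, such as 16, is left unchanged.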

00:11:17 Clamping samples to the signed 16-bit integer range
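A scalar version of this clamp might look like the sketch below. The function name `ClampToInt16` is an assumption made for illustration; it shows only the idea of limiting a wider intermediate value to the range a 16-bit output sample can hold.

```c
#include <stdint.h>

// Hypothetical helper: limit a 32-bit intermediate sample to [-32768, 32767]
// before it is narrowed to a 16-bit output sample.
static int16_t ClampToInt16(int32_t Sample)
{
    if(Sample > INT16_MAX) { Sample = INT16_MAX; }
    if(Sample < INT16_MIN) { Sample = INT16_MIN; }
    return (int16_t)Sample;
}
```

Without the clamp, narrowing an out-of-range value would wrap around and produce a loud click instead of a merely distorted peak.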

00:18:09 (intermission) Two's complement
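As a small companion to the intermission, the sketch below shows two standard facts about two's complement on 8-bit values: negation is "invert all bits, then add one", and the all-ones pattern reads as -1. The helper names are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

// Negate a value by working directly on its bit pattern: invert, then add one.
static uint8_t NegateBits8(uint8_t Bits)
{
    return (uint8_t)(~Bits + 1u);
}

// Reinterpret an 8-bit pattern as a signed value. int8_t is defined to be
// two's complement, so copying the byte is well defined.
static int8_t BitsAsSigned8(uint8_t Bits)
{
    int8_t Result;
    memcpy(&Result, &Bits, sizeof(Result));
    return Result;
}
```

For instance, `NegateBits8(0x05)` produces `0xFB`, which `BitsAsSigned8` reads back as -5.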

00:34:44 Back to SIMD

00:36:48 Rounding the samples

00:37:37 Downconverting from 32-bit to 16-bit integers. No clamping necessary!
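One plausible reading of "no clamping necessary" is that the SSE2 pack instruction saturates on its own. The sketch below assumes that approach: `_mm_cvtps_epi32` rounds floats to 32-bit integers and `_mm_packs_epi32` narrows them to 16 bits with signed saturation. The function name and the choice of unaligned loads are assumptions, not the episode's code.

```c
#include <emmintrin.h> // SSE2 intrinsics
#include <stdint.h>

// Hypothetical helper: convert four float samples to four 16-bit samples.
static void ConvertFourSamples(const float *Source, int16_t *Dest)
{
    __m128 Samples = _mm_loadu_ps(Source);

    // Rounds according to the current rounding mode (round-to-nearest by default).
    __m128i Rounded = _mm_cvtps_epi32(Samples);

    // Narrows 32-bit to 16-bit with signed saturation, so values outside
    // [-32768, 32767] are pinned to the limits without a separate clamp.
    __m128i Packed = _mm_packs_epi32(Rounded, Rounded);

    // The low 64 bits hold the four 16-bit results.
    _mm_storel_epi64((__m128i *)Dest, Packed);
}
```

This relies on the float samples fitting in a 32-bit integer after rounding; values far outside that range are not handled by this sketch.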

00:39:54 Looking for intrinsics that interleave 16-bit values

00:44:18 Interleaving the samples before packing them
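Interleaving at 32-bit width first and packing afterwards can be sketched with SSE2 unpack intrinsics. The function below is an illustration under assumed names and an assumed layout (separate left and right channel buffers, output ordered L0 R0 L1 R1 ...); the episode's actual code may be arranged differently.

```c
#include <emmintrin.h> // SSE2 intrinsics
#include <stdint.h>

// Hypothetical helper: turn four left and four right float samples into
// eight interleaved 16-bit samples (L0 R0 L1 R1 L2 R2 L3 R3).
static void InterleaveAndPack(const float *Left, const float *Right, int16_t *Dest)
{
    __m128i L = _mm_cvtps_epi32(_mm_loadu_ps(Left));
    __m128i R = _mm_cvtps_epi32(_mm_loadu_ps(Right));

    // Interleave while the values are still 32 bits wide.
    __m128i LR0 = _mm_unpacklo_epi32(L, R); // L0 R0 L1 R1
    __m128i LR1 = _mm_unpackhi_epi32(L, R); // L2 R2 L3 R3

    // Pack both halves to 16-bit with signed saturation; the first operand
    // fills the low four lanes and the second fills the high four.
    __m128i Packed = _mm_packs_epi32(LR0, LR1);

    _mm_storeu_si128((__m128i *)Dest, Packed);
}
```

Doing the interleave before the pack means a single pack instruction produces the final output order, with no 16-bit shuffle needed afterwards.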

00:47:27 Making sure we don't write out of bounds

00:49:00 Debugging output using structured input

00:52:50 Padding the buffer in the platform layer to make sure we always have space for overwrites

00:54:20 Casey remembers that the horizontal mouse position was linked to music panning

00:54:52 Getting rid of unnecessary clamping operations

00:55:45 Using aligned loads and stores
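The difference between aligned and unaligned access can be illustrated with a small SSE loop. This is a sketch under assumptions: the function names are invented for the example, and the buffer is assumed to start on a 16-byte boundary, which is what `_mm_load_ps` and `_mm_store_ps` require.

```c
#include <xmmintrin.h> // SSE intrinsics
#include <stdint.h>

// Returns nonzero if Pointer sits on a 16-byte boundary.
static int IsAligned16(const void *Pointer)
{
    return ((uintptr_t)Pointer & 15) == 0;
}

// Hypothetical helper: scale ChunkCount groups of four floats in place.
// Samples must be 16-byte aligned; the aligned load/store intrinsics fault
// on misaligned addresses, whereas _mm_loadu_ps/_mm_storeu_ps would not.
static void ScaleSamplesAligned(float *Samples, uint32_t ChunkCount, float Volume)
{
    __m128 Volume4 = _mm_set1_ps(Volume);
    for(uint32_t ChunkIndex = 0; ChunkIndex < ChunkCount; ++ChunkIndex)
    {
        float *Chunk = Samples + 4*ChunkIndex;
        __m128 Value = _mm_load_ps(Chunk);
        _mm_store_ps(Chunk, _mm_mul_ps(Value, Volume4));
    }
}
```

Checking a buffer with `IsAligned16` before calling the aligned variant is a cheap way to catch a misaligned allocation during development.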

00:57:24 Plan for next episode

01:01:30 More two's complement: full example

01:11:30 Q&A

01:11:37 cubercaleb Why isn't 2's complement used for floating-point numbers if it makes signed arithmetic easy?

01:16:35 poohshoes Are you not going to profile it to see how much faster it gets?

01:16:55 dr_s80 When you implemented streaming in chunks of audio, I believe the code actually loads the entire file (with a platform layer VirtualAlloc) for each chunk. Is this just an artifact of the debug nature of that code?

01:17:33 ishytarus Does the audio make the framerate in debug mode?

01:26:09 cubercaleb If 1111 (-1) is supposed to be less than 0000 (0) then how do number comparisons work on the CPU level?
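The question above can be illustrated in C: the same bit patterns order differently depending on whether the comparison is signed or unsigned, which is why processors such as x86 provide separate signed and unsigned condition codes after a compare. The sketch below is not taken from the episode's answer; the helper names are assumptions, and the sign-bit flip is a standard identity shown here only to make the relationship concrete.

```c
#include <stdint.h>
#include <string.h>

// Unsigned comparison: 0xFF (255) is not less than 0x00.
static int LessUnsigned8(uint8_t A, uint8_t B)
{
    return A < B;
}

// Signed comparison on the same bits: 0xFF reads as -1, which is less than 0.
static int LessSigned8(uint8_t A, uint8_t B)
{
    int8_t SignedA, SignedB;
    memcpy(&SignedA, &A, sizeof(SignedA));
    memcpy(&SignedB, &B, sizeof(SignedB));
    return SignedA < SignedB;
}

// Flipping the sign bit maps signed order onto unsigned order, so a signed
// comparison can be expressed as an unsigned one on adjusted bit patterns.
static int LessSignedViaFlip8(uint8_t A, uint8_t B)
{
    return (uint8_t)(A ^ 0x80) < (uint8_t)(B ^ 0x80);
}
```

With these helpers, `LessSigned8(0xFF, 0x00)` is true while `LessUnsigned8(0xFF, 0x00)` is false, even though the inputs are bit-for-bit identical.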

01:32:39 marumoto Do you have any tips for speeding up compile time when using multiple translation units?

01:32:55 sssmcgrath It's movsx for signed
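On x86, `movsx` widens a value by copying its sign bit into the new upper bits, while `movzx` fills them with zeros. The C sketch below shows the two widenings on 16-bit patterns; the function names are assumptions, and which instruction a compiler actually emits for each is up to the compiler and target.

```c
#include <stdint.h>
#include <string.h>

// Widen a 16-bit pattern as a signed value: the upper 16 bits of the result
// are copies of bit 15 (the kind of conversion movsx performs on x86).
static int32_t SignExtend16(uint16_t Bits)
{
    int16_t Signed;
    memcpy(&Signed, &Bits, sizeof(Signed));
    return (int32_t)Signed;
}

// Widen a 16-bit pattern as an unsigned value: the upper 16 bits of the
// result are zero (the kind of conversion movzx performs on x86).
static uint32_t ZeroExtend16(uint16_t Bits)
{
    return (uint32_t)Bits;
}
```

The pattern `0xFFFF` therefore widens to -1 under sign extension and to 65535 under zero extension.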
