Preparing a Function for Optimization | Handmade Hero Episode Guide

1:31 Open things up and recap

2:48 DrawRectangleSlowly: Increase efficiency

3:33 Create DrawRectangleHopefullyQuickly

4:34 DrawRectangleHopefullyQuickly: Skip the preamble

5:42 Remove all unnecessary code

6:44 Look at what's happening

8:01 Make the edge testing code more explicit

9:49 Blackboard: See what's happening with these inner products

12:04 DrawRectangleHopefullyQuickly: Test U and V instead

13:12 Run the game

13:33 Make these U and V computations more efficient

14:40 Run the game and ensure that everything still blits fine

15:16 Continue pruning

18:02 Flatten the routine

19:55 Blow out v4 Blended into scalar form

21:18 Take a close look at the routine and precompute InvTexelA

23:35 Blow out v4 Dest and Texel into scalar form

25:30 Flatten BilinearSample and SRGBBilinearBlend

28:02 Assess our situation

28:55 Unpack and optimise the Lerps

33:57 Run the game and annotate the code

35:33 Flatten SRGB255ToLinear1

36:38 Flatten Unpack4x8

38:59 That's everything flattened

39:22 Note that the code is faster

40:58 We have a nasty problem with the unpackings

44:01 Blackboard: What is our "wide" strategy?

48:43 Set the stage for SIMD

50:45 Consider solidifying texture boundaries

51:53 Leave it for today

53:09 Q&A

53:28 braincruser: The way the code is written now you have a very long dependency chain (between instructions). Will you break down the code to remove it?

56:42 stelar7: Why did you write float instead of real32 this stream?

57:14 stelar7: Why use -O2 instead of -O3 or -Ofast (possibly with -fverbose-asm)?

58:06 garryjohanson: Do you ever use exclusive or operations to avoid pipeline stalls? If not, what do you use?

59:04 g3rain1: Aren't those square roots pretty expensive?¹

1:03:31 andsz_: Will you make multiple SIMD backends? (SSE?/AVX/FMA versions)

1:04:04 davidthomas426: You could loft some of those variables out one more loop

1:04:58 waterlimon: How expensive is the float<>int conversion compared to the rest of the workload?²

1:05:40 davidthomas426: Since xAxis and yAxis are usually perpendicular, should we special case for that? In the same vein, should we special-case for axis-aligned?

1:06:56 waterlimon: Does the compiler do any automatic SSE optimization (or have option for it?)

1:09:01 stelar7: sqrt_ss vs sqrt_ps vs sqrt_pd?³

1:11:56 waterlimon: Would SSE allow doing sRGB using exponent 2.2 instead of approximating using one of 2, without a huge performance hit?

1:12:41 pseudonym73: The main reason why you don't get automatic SIMD is precise exceptions. You probably need to tell the compiler that you don't need them

1:14:44 waterlimon: What happens if "/arch:AVX2" switch is enabled?

1:15:26 Look at this AVX-512 stuff⁴

1:16:51 braincruser: FMA is fused multiply add

1:18:48 andsz_: Yeah, looks like different caps bits

1:19:23 Wrap things up
