Introduction to Function Approximation with Andrew Bromage

0:00 Casey Muratori: Welcome to a special episode with Andrew Bromage
2:42 Andrew Bromage: How floating point numbers are represented by a computer[1]
4:36 AB: Scientific notation, and how IEEE 754 represents numbers in binary
7:14 CM: Get Andrew back 🗹
9:27 AB: Return
10:36 AB: Continuing IEEE 754's[2] representation of floating point numbers[3]
16:20 AB: Subnormal numbers, the special-case numbers infinity, quiet NaN and signaling NaN, and the quality of being "algebraically closed"
24:10 AB: Any questions?
24:35 CM: Is it just a peculiarity of binary as a number system that you can skip encoding the leading digit?
26:06 desuused Q: Is there a representation for underflowing numbers? 🗪
27:28 AB: Note the binary and decimal representations of floating point numbers in the IEEE 754 standard[4]
27:51 AB: Constant definitions in handmade_numerics.h 📖
30:22 AB: Constant definitions in C's float.h[5] as compared with those in handmade_numerics.h, with a special mention of machine epsilon 📖
33:32 AB: Describe the IEEEBinary32 union, ieee754_number_category enum and the special-case number functions 📖
36:48 AB: Describe Real32_Abs(), Real32_SetSign() and CategoryOfReal32() 📖
39:07 AB: Describe ExtractExponent() as similar to the CRT's frexp()[6], with an example of its use in a sqrt() function 📖
47:36 CM: When multiplying a subnormal number by a power of two, does the floating point unit first shift the numbers into the normal range before incrementing the exponent?
50:38 AB: Describe ScaleByExponent() 📖
53:58 AB: Note the differing range of absolute values of the mantissa in textbooks (as used in handmade_numerics.h) and the CRT's frexp()[7] 📖
57:01 AB: Quote James H. Wilkinson on the state of computer arithmetic in 1971
58:30 AB: Describe SlowDivision(), with emphasis on the sheer amount of specification compliance it contains[8] 📖
1:06:29 AB: Walk through an example of SlowDivision(), noting why it uses 11 bits of precision in its application of Horner's rule[9] (IA-32's RCPSS instruction[10]) 📖
1:14:13 AB: How SlowDivision() finishes up its computation to the highest precision it can 📖
1:16:54 AB: Note the difference between SlowDivision() and how our FPU performs division 📖
1:19:13 AB: Calculating those polynomial approximations from SlowDivision(), with range illustrations in Mathematica 📖
1:25:12 AB: Relative vs absolute error 📖
1:26:43 AB: Plot the error function in Mathematica, with a mention of Chebyshev's equioscillation theorem[11] and Chebyshev nodes[12] 📖
1:31:35 AB: Plot the eighth order Chebyshev polynomial in the range -1 to 1 in Mathematica 📖
1:33:47 AB: Using the Remez exchange algorithm[13] to find approximations to Chebyshev's set of polynomials 📖
1:38:21 CM: On the initial guesses in the Remez exchange algorithm
1:39:15 AB: Plot the ninth order Chebyshev polynomial, for comparison with the eighth order, to explain extrema 📖
1:40:20 CM: Struggle with the communication link 💢 🗩
1:41:49 CM: As the Remez exchange algorithm proceeds, what values does it use as its new guesses?
1:42:24 AB: Searching for the extremum, perhaps using golden-section search,[14] as the Remez exchange algorithm proceeds 📖
1:46:50 AB: Plot sine in Mathematica, and introduce the computation of sine in the range 0 to 2π 📖
1:56:43 gg_nate Can you set video / audio bit rate on hangouts? If he could lower the bit rate of the video the audio might work better 🗪
1:56:57 AB: Describe SinCos_TableVersion() 📖
1:58:31 AB: Deriving trigonometric identities
2:02:28 AB: Calculating cosine around "a", branch-free
2:04:00 AB: Point out the experimental SinCos_QuadrantVersion() for SIMD sines and cosines 📖
2:05:54 AB: Describe the XSinCosX table lookup 📖
2:07:56 AB: Counting has always started at zero[15] 📖
2:08:47 AB: Continued description of the XSinCosX table lookup 📖
2:10:38 AB: Run calculate_sincos_tables and explain the result for 0.5 🏃
2:12:10 AB: Describe how FindSinCosAround() searches adjacent floating point numbers 📖
2:15:55 CM: Could you explain why part of the table version looks up into XSinCosX while the polynomial part is the same no matter where you are in the table?
2:17:10 AB: Further explain the table lookups of cos(a) and sin(a) and the sin(e) and cos(e) polynomial approximations 📖
2:19:46 CM: Since table lookups are hard in SIMD, what sort of stuff would you end up doing if you couldn't use a table?
2:20:09 AB: Describe the experimental SinCos_QuadrantVersion() 📖
2:28:25 x13pixels Q: Why do C0, C2, C4, C6, etc. have more precision than can fit in a float? 🗪
2:29:32 AB: Describe ATan() and ATan2(), noting the current use of atan2 in Handmade Hero 📖
2:35:01 CM: Determine to remove atan2 from Handmade Hero, with thanks to Andrew
2:35:31 CM: Q&A
2:36:24 0b0000000000000 If the values are denormal, they will run way slower 🗪
2:36:46 filiadelski Q: What was the reason we couldn't do two's complement in the exponent? 🗪
2:37:32 0b0000000000000 sin and cos[16] 🗪
2:41:32 0b0000000000000 Unless you explicitly flush them to zero, they will run super slow 🗪
2:42:44 spacealiens Q: Is there a version of this maths source code anywhere, or will it be included in the Handmade Hero project eventually? 🗪
2:43:17 AB: Note the educational nature of this sine and cosine implementation, with a mention of Cody-Waite reduction
2:45:43 vateferfout Q: Will you go through acos as well in a later stream? 🗪
2:46:10 0b0000000000000 What is the guy speaking's username? 🗪
2:46:51 CM: SSE denormal flushing
2:48:13 0b0000000000000 I can show you some versions that are completely branchless without tables, if you are interested 🗪
2:49:05 sneakybob_wot Q: Have you done a speed comparison vs the C versions? 🗪
2:50:34 AB: Re-emphasise the slowness of SlowDivision() 📖
2:52:55 staythirsty90 Not industrial strength?! Why am I here?? 🗪
2:53:15 vateferfout Q: To be sure, it means the SIMD intrinsics are not standards-compliant? 🗪
2:56:22 AB: Quote the Intel 64 and IA-32 Architectures Optimization Reference Manual:[17] "Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases" 📖
2:57:04 CM: Thank you to Andrew for walking us through sine and cosine, with closing thoughts on numerical approximations