Introduction to Function Approximation with Andrew Bromage

?

?

W, K, P / S, J, N Jump to previous / next marker

t / T Toggle theatre / SUPERtheatre mode

V Revert filter to original state Y Select link (requires manual Ctrl-c)

X, ShiftSpace Toggle category and focus previous

v Invert topics / media as per focus

# Keyboard Navigation

## Global Keys

[, < / ], > Jump to previous / next episodeW, K, P / S, J, N Jump to previous / next marker

t / T Toggle theatre / SUPERtheatre mode

V Revert filter to original state Y Select link (requires manual Ctrl-c)

## Menu toggling

q Quotes r References f Filter y Link c Credits## In-Menu Movement

a

w

s

s

d

h
j
k
l

←

↑

↓

↓

→

## Quotes and References Menus

Enter Jump to timecode## Quotes, References and Credits Menus

o Open URL (in new tab)## Filter Menu

x, Space Toggle category and focus nextX, ShiftSpace Toggle category and focus previous

v Invert topics / media as per focus

## Filter and Link Menus

z Toggle filter / linking mode## Credits Menu

Enter Open URL (in new tab)⏫

Previous: 'Testing Better Entropy'

⏫

0:00 : Welcome to a special episode with Andrew Bromage

0:00 : Welcome to a special episode with Andrew Bromage

0:00 : Welcome to a special episode with Andrew Bromage

2:42 : How floating point numbers are represented by a computer

^{1}2:42 : How floating point numbers are represented by a computer

^{1}2:42 : How floating point numbers are represented by a computer

^{1}4:36 : Scientific notation, and how IEEE 754 represents numbers in binary

4:36 : Scientific notation, and how IEEE 754 represents numbers in binary

4:36 : Scientific notation, and how IEEE 754 represents numbers in binary

7:14 : Get Andrew back

🗹

7:14 : Get Andrew back

🗹

7:14 : Get Andrew back

🗹

9:27 : Return

9:27 : Return

9:27 : Return

10:36 : Continuing IEEE 754's

^{2}representation of floating point numbers^{3}10:36 : Continuing IEEE 754's

^{2}representation of floating point numbers^{3}10:36 : Continuing IEEE 754's

^{2}representation of floating point numbers^{3}16:20 : Subnormal numbers, the special-case numbers infinity, quiet NaN and signaling NaN, and the quality of being "algebraically closed"

16:20 : Subnormal numbers, the special-case numbers infinity, quiet NaN and signaling NaN, and the quality of being "algebraically closed"

24:10 : Any questions?

24:10 : Any questions?

24:10 : Any questions?

24:35 : Is it just a peculiarity of binary as a number system, that you can skip encoding the leading digit?

24:35 : Is it just a peculiarity of binary as a number system, that you can skip encoding the leading digit?

27:28 : Note the binary and decimal representations of floating point numbers in the IEEE 754 standard

^{4}27:28 : Note the binary and decimal representations of floating point numbers in the IEEE 754 standard

^{4}^{4}

27:51 : Constants definitions in handmade_numerics.h

📖

27:51 : Constants definitions in handmade_numerics.h

📖

27:51 : Constants definitions in handmade_numerics.h

📖

30:22 : Constants definitions in C's float.h

^{5}as compared with those in handmade_numerics.h, with a special mention of machine epsilon📖

30:22 : Constants definitions in C's float.h

^{5}as compared with those in handmade_numerics.h, with a special mention of machine epsilon📖

^{5}as compared with those in handmade_numerics.h, with a special mention of machine epsilon

📖

33:32 : Describe the IEEEBinary32 union, ieee754_number_category enum and the special-case number functions

📖

33:32 : Describe the IEEEBinary32 union, ieee754_number_category enum and the special-case number functions

📖

📖

36:48 : Describe Real32_Abs(), Real32_SetSign() and CategoryOfReal32()

📖

36:48 : Describe Real32_Abs(), Real32_SetSign() and CategoryOfReal32()

📖

36:48 : Describe Real32_Abs(), Real32_SetSign() and CategoryOfReal32()

📖

39:07 : Describe ExtractExponent() as similar to the CRT's frexp()

^{6}with an example of its use in a sqrt() function📖

39:07 : Describe ExtractExponent() as similar to the CRT's frexp()

^{6}with an example of its use in a sqrt() function📖

^{6}with an example of its use in a sqrt() function

📖

47:36 : When multiplying a subnormal number by a power of two, does the floating point unit first shift the numbers into the normal range before incrementing the exponent?

47:36 : When multiplying a subnormal number by a power of two, does the floating point unit first shift the numbers into the normal range before incrementing the exponent?

50:38 : Describe ScaleByExponent()

📖

50:38 : Describe ScaleByExponent()

📖

50:38 : Describe ScaleByExponent()

📖

53:58 : Note the differing range of absolute values of the mantissa in text books (as used in handmade_numerics.h) and the CRT's frexp()

^{7}📖

53:58 : Note the differing range of absolute values of the mantissa in text books (as used in handmade_numerics.h) and the CRT's frexp()

^{7}📖

^{7}

📖

57:01 : Quote James H. Wilkinson on the state of computer arithmetic in 1971

57:01 : Quote James H. Wilkinson on the state of computer arithmetic in 1971

57:01 : Quote James H. Wilkinson on the state of computer arithmetic in 1971

58:30 : Describe SlowDivision(), with emphasis on the sheer amount of specification compliance it contains

^{8}📖

58:30 : Describe SlowDivision(), with emphasis on the sheer amount of specification compliance it contains

^{8}📖

^{8}

📖

1:06:29 : Walk through an example of SlowDivision(), noting why it uses 11 bits of precision in its application of Horner's rule

^{9}(IA-32's RCPSS instruction^{10})📖

1:06:29 : Walk through an example of SlowDivision(), noting why it uses 11 bits of precision in its application of Horner's rule

^{9}(IA-32's RCPSS instruction^{10})📖

^{9}(IA-32's RCPSS instruction

^{10})

📖

1:14:13 : How SlowDivision() finishes up its computation to the highest precision it can

📖

1:14:13 : How SlowDivision() finishes up its computation to the highest precision it can

📖

1:14:13 : How SlowDivision() finishes up its computation to the highest precision it can

📖

1:16:54 : Note the difference between SlowDivision() and how our FPU performs division

📖

1:16:54 : Note the difference between SlowDivision() and how our FPU performs division

📖

1:16:54 : Note the difference between SlowDivision() and how our FPU performs division

📖

1:19:13 : Calculating those polynomial approximations from SlowDivision(), with range illustrations in Mathematica

📖

1:19:13 : Calculating those polynomial approximations from SlowDivision(), with range illustrations in Mathematica

📖

📖

1:25:12 : Relative vs absolute error

📖

1:25:12 : Relative vs absolute error

📖

1:25:12 : Relative vs absolute error

📖

1:26:43 : Plot the error function in Mathematica, with a mention of Chebyshev's Equioscillation theorem

^{11}and Chebyshev nodes^{12}📖

1:26:43 : Plot the error function in Mathematica, with a mention of Chebyshev's Equioscillation theorem

^{11}and Chebyshev nodes^{12}📖

^{11}and Chebyshev nodes

^{12}

📖

1:31:35 : Plot the eighth order Chebyshev polynomial in the range -1 to 1 in Mathematica

📖

1:31:35 : Plot the eighth order Chebyshev polynomial in the range -1 to 1 in Mathematica

📖

1:31:35 : Plot the eighth order Chebyshev polynomial in the range -1 to 1 in Mathematica

📖

1:33:47 : Using the Remez exchange algorithm

^{13}to find approximations to Chebyshev's set of polynomials📖

1:33:47 : Using the Remez exchange algorithm

^{13}to find approximations to Chebyshev's set of polynomials📖

^{13}to find approximations to Chebyshev's set of polynomials

📖

1:38:21 : On the initial guesses in the Remez exchange algorithm

1:38:21 : On the initial guesses in the Remez exchange algorithm

1:38:21 : On the initial guesses in the Remez exchange algorithm

1:39:15 : Plot the ninth order Chebyshev polynomial, for comparison with the eighth order, to explain extrema

📖

1:39:15 : Plot the ninth order Chebyshev polynomial, for comparison with the eighth order, to explain extrema

📖

📖

1:40:20 : Struggle with the communication link

💢

🗩

1:40:20 : Struggle with the communication link

💢

🗩

1:40:20 : Struggle with the communication link

💢

🗩

1:41:49 : As the Remez exchange algorithm proceeds, what values does it use as its new guesses?

1:41:49 : As the Remez exchange algorithm proceeds, what values does it use as its new guesses?

1:41:49 : As the Remez exchange algorithm proceeds, what values does it use as its new guesses?

1:42:24 : Searching for the extremum, perhaps using golden-section search,

^{14}as the Remez exchange algorithm proceeds📖

1:42:24 : Searching for the extremum, perhaps using golden-section search,

^{14}as the Remez exchange algorithm proceeds📖

^{14}as the Remez exchange algorithm proceeds

📖

1:46:50 : Plot sine in Mathematica, and introduce the computation of sine in the range 0 to 2π

📖

1:46:50 : Plot sine in Mathematica, and introduce the computation of sine in the range 0 to 2π

📖

1:46:50 : Plot sine in Mathematica, and introduce the computation of sine in the range 0 to 2π

📖

1:56:57 : Describe SinCos_TableVersion()

📖

1:56:57 : Describe SinCos_TableVersion()

📖

1:56:57 : Describe SinCos_TableVersion()

📖

1:58:31 : Deriving trigonometric identities

1:58:31 : Deriving trigonometric identities

1:58:31 : Deriving trigonometric identities

2:02:28 : Calculating cosine around "a", branch-free

2:02:28 : Calculating cosine around "a", branch-free

2:02:28 : Calculating cosine around "a", branch-free

2:04:00 : Point out the experimental SinCos_QuadrantVersion() for SIMD sines and cosines

📖

2:04:00 : Point out the experimental SinCos_QuadrantVersion() for SIMD sines and cosines

📖

2:04:00 : Point out the experimental SinCos_QuadrantVersion() for SIMD sines and cosines

📖

2:05:54 : Describe the XSinCosX table look up

📖

2:05:54 : Describe the XSinCosX table look up

📖

2:05:54 : Describe the XSinCosX table look up

📖

2:07:56 : Counting has always started at zero

^{15}📖

2:07:56 : Counting has always started at zero

^{15}📖

2:07:56 : Counting has always started at zero

^{15}📖

2:08:47 : Continued description of the XSinCosX table look up

📖

2:08:47 : Continued description of the XSinCosX table look up

📖

2:08:47 : Continued description of the XSinCosX table look up

📖

2:10:38 : Run calculate_sincos_tables and explain the result for .5

🏃

2:10:38 : Run calculate_sincos_tables and explain the result for .5

🏃

2:10:38 : Run calculate_sincos_tables and explain the result for .5

🏃

2:12:10 : Describe how FindSinCosAround() searches adjacent floating point numbers

📖

2:12:10 : Describe how FindSinCosAround() searches adjacent floating point numbers

📖

2:12:10 : Describe how FindSinCosAround() searches adjacent floating point numbers

📖

2:15:55 : Could you explain why part of the table version looks up into XSinCosX while the polynomial part is the same no matter where you are in the table?

2:15:55 : Could you explain why part of the table version looks up into XSinCosX while the polynomial part is the same no matter where you are in the table?

2:17:10 : Further explain the table lookups of cos(a) and sin(a) and the sin(e) and cos(e) polynomial approximations

📖

2:17:10 : Further explain the table lookups of cos(a) and sin(a) and the sin(e) and cos(e) polynomial approximations

📖

📖

2:19:46 : Since table lookups are hard in SIMD, what sort of stuff would you end up doing if you couldn't use a table?

2:19:46 : Since table lookups are hard in SIMD, what sort of stuff would you end up doing if you couldn't use a table?

2:20:09 : Describe the experimental SinCos_QuadrantVersion()

📖

2:20:09 : Describe the experimental SinCos_QuadrantVersion()

📖

2:20:09 : Describe the experimental SinCos_QuadrantVersion()

📖

2:29:32 : Describe ATan() and ATan2(), noting the current use of atan2 in Handmade Hero

📖

2:29:32 : Describe ATan() and ATan2(), noting the current use of atan2 in Handmade Hero

📖

2:29:32 : Describe ATan() and ATan2(), noting the current use of atan2 in Handmade Hero

📖

2:35:01 : Determine to remove atan2 from Handmade Hero with thanks to Andrew

2:35:01 : Determine to remove atan2 from Handmade Hero with thanks to Andrew

2:35:01 : Determine to remove atan2 from Handmade Hero with thanks to Andrew

2:35:31 : Q&A

2:35:31 : Q&A

2:35:31 : Q&A

2:43:17 : Note the educational nature of this sine and cosine implementation, with a mention of Cody-Waite reduction

2:43:17 : Note the educational nature of this sine and cosine implementation, with a mention of Cody-Waite reduction

2:46:51 : SSE denormal flushing

2:46:51 : SSE denormal flushing

2:46:51 : SSE denormal flushing

2:50:34 : Re-emphasise the slowness of SlowDivision()

📖

2:50:34 : Re-emphasise the slowness of SlowDivision()

📖

2:50:34 : Re-emphasise the slowness of SlowDivision()

📖

2:56:22 : Quote the Intel 64 and IA-32 Architectures Optimization Reference Manual:

^{17}"Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases"📖

2:56:22 : Quote the Intel 64 and IA-32 Architectures Optimization Reference Manual:

^{17}"Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases"📖

^{17}"Although x87 supports transcendental instructions, software library implementation of transcendental function can be faster in many cases"

📖

2:57:04 : Thank you to Andrew for walking us through sine and cosine, with closing thoughts on numerical approximations

2:57:04 : Thank you to Andrew for walking us through sine and cosine, with closing thoughts on numerical approximations

⏬

Next: 'Never, Ever Update Your Development Tools. Ever.'

⏬