QueryPerformanceCounter and RDTSC
?
?

Keyboard Navigation

Global Keys

[, < / ], > Jump to previous / next episode
W, K, P / S, J, N Jump to previous / next marker
t / T Toggle theatre / SUPERtheatre mode
z Toggle filter mode V Revert filter to original state

Menu toggling

q Quotes r References f Filter c Credits

In-Menu Movement

a
w
s
d
h j k l


Quotes and References Menus

Enter Jump to timecode

Quotes, References and Credits Menus

o Open URL (in new tab)

Filter Menu

x, Space Toggle category and focus next
X, ShiftSpace Toggle category and focus previous
v Invert topics / media as per focus

Credits Menu

Enter Open URL (in new tab)
1:19The Intel Architecture Reference Manual
1:19The Intel Architecture Reference Manual
1:19The Intel Architecture Reference Manual
3:49RDTSC- Read Time-Stamp Counter, measures clock cycles
3:49RDTSC- Read Time-Stamp Counter, measures clock cycles
3:49RDTSC- Read Time-Stamp Counter, measures clock cycles
7:40QueryPerformanceCounter(), measures wall clock time
7:40QueryPerformanceCounter(), measures wall clock time
7:40QueryPerformanceCounter(), measures wall clock time
15:30Timing our frames
15:30Timing our frames
15:30Timing our frames
17:02Union types (LARGE_INTEGER)
17:02Union types (LARGE_INTEGER)
17:02Union types (LARGE_INTEGER)
21:48Method of determining time elapsed between frames
21:48Method of determining time elapsed between frames
21:48Method of determining time elapsed between frames
23:19QueryPerformanceFrequency()
23:19QueryPerformanceFrequency()
23:19QueryPerformanceFrequency()
26:32Using dimensional analysis to convert between unit types
26:32Using dimensional analysis to convert between unit types
26:32Using dimensional analysis to convert between unit types
31:13Converting seconds/frame to ms/frame
31:13Converting seconds/frame to ms/frame
31:13Converting seconds/frame to ms/frame
32:36Printing out MSPerFrame
32:36Printing out MSPerFrame
32:36Printing out MSPerFrame
38:10Finding Frames Per Second using dimensional analysis (cause we can)
38:10Finding Frames Per Second using dimensional analysis (cause we can)
38:10Finding Frames Per Second using dimensional analysis (cause we can)
45:44The dangers of wsprintf()
45:44The dangers of wsprintf()
45:44The dangers of wsprintf()
50:10Using RDTSC to find cycles per frame
50:10Using RDTSC to find cycles per frame
50:10Using RDTSC to find cycles per frame
56:43Use wsprintf() to print our timings as floats
56:43Use wsprintf() to print our timings as floats
56:43Use wsprintf() to print our timings as floats
59:18A bit more about wsprintf()
59:18A bit more about wsprintf()
59:18A bit more about wsprintf()
1:02:05Final Thoughts
1:02:05Final Thoughts
1:02:05Final Thoughts
1:04:32Q&A
🗩
1:04:32Q&A
🗩
1:04:32Q&A
🗩
1:04:46RDTSC returns unsigned int
1:04:46RDTSC returns unsigned int
1:04:46RDTSC returns unsigned int
1:05:28Explanation of how C handles division based on type
1:05:28Explanation of how C handles division based on type
1:05:28Explanation of how C handles division based on type
1:07:34How compiler optimizations affect the execution time
1:07:34How compiler optimizations affect the execution time
1:07:34How compiler optimizations affect the execution time
1:09:58Do the divide in doubles.
1:09:58Do the divide in doubles.
1:09:58Do the divide in doubles.
1:13:26It would be nice to have a roadmap...Would you consider doing a 24 hour stream?
1:13:26It would be nice to have a roadmap...Would you consider doing a 24 hour stream?
1:13:26It would be nice to have a roadmap...Would you consider doing a 24 hour stream?
1:14:30Will we be able to use a Playstation 2 controller?
1:14:30Will we be able to use a Playstation 2 controller?
1:14:30Will we be able to use a Playstation 2 controller?
1:14:53Can we expect more Jeff and Casey shows in the future?
1:14:53Can we expect more Jeff and Casey shows in the future?
1:14:53Can we expect more Jeff and Casey shows in the future?
1:15:04Is it safe to use 64-bit variables and functions on a 32-bit PC?
1:15:04Is it safe to use 64-bit variables and functions on a 32-bit PC?
1:15:04Is it safe to use 64-bit variables and functions on a 32-bit PC?
1:16:17Even though we are doing low level programming, sometimes we have to pass things to Windows, which makes it difficult to follow exactly what the computer is doing. Will this be the same when we go to other platforms?
1:16:17Even though we are doing low level programming, sometimes we have to pass things to Windows, which makes it difficult to follow exactly what the computer is doing. Will this be the same when we go to other platforms?
1:16:17Even though we are doing low level programming, sometimes we have to pass things to Windows, which makes it difficult to follow exactly what the computer is doing. Will this be the same when we go to other platforms?
1:18:45Do you use a profiler or mostly hand-coded timing calls?
1:18:45Do you use a profiler or mostly hand-coded timing calls?
1:18:45Do you use a profiler or mostly hand-coded timing calls?
1:18:57Why do you use PascalCase for everything?
1:18:57Why do you use PascalCase for everything?
1:18:57Why do you use PascalCase for everything?
1:19:04Is it a violation to set one member of a union and then read from another?
1:19:04Is it a violation to set one member of a union and then read from another?
1:19:04Is it a violation to set one member of a union and then read from another?
1:19:15mvandevander Was wondering when we would get around to using RawInput to handle DualShock 4 natively
🗪
1:19:15mvandevander Was wondering when we would get around to using RawInput to handle DualShock 4 natively
🗪
1:19:15mvandevander Was wondering when we would get around to using RawInput to handle DualShock 4 natively
🗪
1:20:47tom_forsyth Modern CPUs RDTSC returns nominal clocks not real clocks.
🗪
1:20:47tom_forsyth Modern CPUs RDTSC returns nominal clocks not real clocks.
🗪
1:20:47tom_forsyth Modern CPUs RDTSC returns nominal clocks not real clocks.
🗪
1:22:14Why are you avoiding doubles in your code?
1:22:14Why are you avoiding doubles in your code?
1:22:14Why are you avoiding doubles in your code?
1:22:50Are you opposed to ever using high level languages in making games?
1:22:50Are you opposed to ever using high level languages in making games?
1:22:50Are you opposed to ever using high level languages in making games?
1:23:23If you wanted to lock the FPS at a particular number, would you just sleep or do something more complex?
1:23:23If you wanted to lock the FPS at a particular number, would you just sleep or do something more complex?
1:23:23If you wanted to lock the FPS at a particular number, would you just sleep or do something more complex?
1:24:55Isn't RDTSC affected by variable processor technologies in modern processors like SpeedStep?
1:24:55Isn't RDTSC affected by variable processor technologies in modern processors like SpeedStep?
1:24:55Isn't RDTSC affected by variable processor technologies in modern processors like SpeedStep?
1:26:54Have you done Ludum Dare?
1:26:54Have you done Ludum Dare?
1:26:54Have you done Ludum Dare?
1:27:10Can we do a bonus stream on ASM?
1:27:10Can we do a bonus stream on ASM?
1:27:10Can we do a bonus stream on ASM?
1:28:51If you read the latency tables for SSE2 float vs double, you'll see that double isn't that much slower than float...
1:28:51If you read the latency tables for SSE2 float vs double, you'll see that double isn't that much slower than float...
1:28:51If you read the latency tables for SSE2 float vs double, you'll see that double isn't that much slower than float...
1:37:18What are your thoughts on Swift?
1:37:18What are your thoughts on Swift?
1:37:18What are your thoughts on Swift?
1:37:23What do you think of Jonathan Blow's programming language?
1:37:23What do you think of Jonathan Blow's programming language?
1:37:23What do you think of Jonathan Blow's programming language?
1:37:50Do you have any discussions with Jonathan Blow on his compiler and what sort of features you would like to see in it?
1:37:50Do you have any discussions with Jonathan Blow on his compiler and what sort of features you would like to see in it?
1:37:50Do you have any discussions with Jonathan Blow on his compiler and what sort of features you would like to see in it?
1:38:59Do you ever write functions like printf() that take variable arguments?
1:38:59Do you ever write functions like printf() that take variable arguments?
1:38:59Do you ever write functions like printf() that take variable arguments?
1:39:07Would you consider using a templated type-safe version of printf(), even though you hate templates?
1:39:07Would you consider using a templated type-safe version of printf(), even though you hate templates?
1:39:07Would you consider using a templated type-safe version of printf(), even though you hate templates?
1:39:28MULPD is only half as fast if you do millions of operations...
1:39:28MULPD is only half as fast if you do millions of operations...
1:39:28MULPD is only half as fast if you do millions of operations...
1:40:03What low level language would you suggest to someone new to programming?
1:40:03What low level language would you suggest to someone new to programming?
1:40:03What low level language would you suggest to someone new to programming?
1:40:13Are we using the Win32 API but compiling for 64-bits? Do we need to compile for 32-bits for Windows XP support? It looks like we have int64 and real64 in the cpp
1:40:13Are we using the Win32 API but compiling for 64-bits? Do we need to compile for 32-bits for Windows XP support? It looks like we have int64 and real64 in the cpp
1:40:13Are we using the Win32 API but compiling for 64-bits? Do we need to compile for 32-bits for Windows XP support? It looks like we have int64 and real64 in the cpp
1:40:50In game development, do you follow enterprise design patterns, or do you have some different design patterns?
1:40:50In game development, do you follow enterprise design patterns, or do you have some different design patterns?
1:40:50In game development, do you follow enterprise design patterns, or do you have some different design patterns?
1:41:09What do you use for collections if you do not use templates?
1:41:09What do you use for collections if you do not use templates?
1:41:09What do you use for collections if you do not use templates?
1:41:18I didn't get the outome of the MULPS/MULPD compare. The latency packing was the same, how does that end up being double time?
1:41:18I didn't get the outome of the MULPS/MULPD compare. The latency packing was the same, how does that end up being double time?
1:41:18I didn't get the outome of the MULPS/MULPD compare. The latency packing was the same, how does that end up being double time?
1:44:04RE MULPS/MULPD: If you're writing SIMD code then you care, otherwise you don't.
1:44:04RE MULPS/MULPD: If you're writing SIMD code then you care, otherwise you don't.
1:44:04RE MULPS/MULPD: If you're writing SIMD code then you care, otherwise you don't.
1:45:58Comment about compiler auto-vectorization
1:45:58Comment about compiler auto-vectorization
1:45:58Comment about compiler auto-vectorization
1:47:43Tom Forsyth's rant on double precision
1:47:43Tom Forsyth's rant on double precision
1:47:43Tom Forsyth's rant on double precision
1:48:35Will we take MSVC all the way to shipping or will we use LLVM even on Windows?
1:48:35Will we take MSVC all the way to shipping or will we use LLVM even on Windows?
1:48:35Will we take MSVC all the way to shipping or will we use LLVM even on Windows?
1:50:03What's an intrinsic?
1:50:03What's an intrinsic?
1:50:03What's an intrinsic?
1:55:23Switch back to debug build from optimized
1:55:23Switch back to debug build from optimized
1:55:23Switch back to debug build from optimized
1:56:03Isn't it a pain to work in Windows, especially as a programmer?
1:56:03Isn't it a pain to work in Windows, especially as a programmer?
1:56:03Isn't it a pain to work in Windows, especially as a programmer?
1:57:05Didn't we want to start adding warnings?
1:57:05Didn't we want to start adding warnings?
1:57:05Didn't we want to start adding warnings?

QueryPerformanceCounter and RDTSC

Today we look at some techniques to get basic timing information from your running game. Timing, like everything, is more complicated than it first appears.

A Couple of ideas of time:

  • Wall clock time - time as it passes in the real world. Measured in seconds.
  • Processor time - how many cycles? this is related to wall clock time by processor frequency, but for a long time now frequency varies a lot and quickly.

Wall Clock Time

The Windows platform attempts to provide us with some tools for high precision timing, but as it is a complicated topic, there are some gotchas.

QueryPerformanceFrequency() returns a LARGE_INTEGER number of counts/sec. It's guaranteed to be stable, so you can get away with just calling it once at startup. QueryPerformanceCounter() returns a LARGE_INTEGER number of counts.

So, dividing counter/freq will give you a number of seconds since some unknown time in the past. More useful would be (counter - last_counter)/freq. This will allow us to get an elapsed time since some known point in the past. However, almost anything we want to time should be less than a second, and since this is an integer divide, anything between 1 and 0 seconds will return 0. Not super useful. So, we instead multiply the elapsed counts by 1000 to get our formula to get to elapsed milliseconds.

elapsedMs = (1000*(counter - last_counter)) / freq
        

To get instantaneous frames per second, we can just divide without changing to milliseconds:

fps = freq / (counter - last_counter)
        

Important ideas:

  • To time a frame, only query the timer once per frame, otherwise your timer will leave out time between last frame's end and this frame's start.

Processor Time

Every x86 family proccessor has a Timestamp Counter (TSC), which increments with every clock cycle since it was reset. RDTSC is a processor intruction that reads the TSC into general purpose registers.

For processors before Sandy Bridge but after dynamic clocking, RDTSC gave us actual clocks, but it was difficult to correlate to wall time because of the variable frequency. Since Sandy Bridge, they give us "nominal" clocks, which is to say the number of clocks elapsed at the chip's nominal frequency. These should correlate closely to wall clock time, but make tracking the "number of cycles" notion of processor time more difficult.

RDTSC is usually exposed in a compiler intrinsic. Check the docs for your compiler.

Resources:

Other topics

Casey had to cover a couple of new corners of C in order to work with the techniques above.

Union types

Union types are a C feature that let you superimpose a number of different layouts over the same chunk of memory. For example LARGE_INTEGER, the return type of the QueryPerf calls. I can treat it as an int64 by accessing its QuadPart, or as two int32s via HighPart and LowPart.

Compiler Intrinsics

An intrinsic is a compiler-specific extension that allows direct invocation of some processor instruction. They generally need to be extensions to the compiler so they can avoid all the expensive niceties compilers have to afford functions.