QueryPerformanceCounter and RDTSC
Today we look at some techniques to get basic timing information from your running game. Timing, like everything, is more complicated than it first appears.
A couple of notions of time:
- Wall clock time - time as it passes in the real world. Measured in seconds.
- Processor time - how many cycles did something take? This is related to wall clock time by the processor frequency, but for a long time now that frequency has varied widely and quickly.
Wall Clock Time
The Windows platform attempts to provide us with some tools for high precision timing, but as it is a complicated topic, there are some gotchas.
QueryPerformanceFrequency() gives us a LARGE_INTEGER number of counts/sec. It's guaranteed to be stable, so you can get away with just calling it once at startup.
QueryPerformanceCounter() gives us a LARGE_INTEGER number of counts.
So, dividing counter/freq will give you a number of seconds since some unknown time in the past. More useful would be (counter - last_counter)/freq, which gives us the elapsed time since some known point in the past. However, almost anything we want to time should take less than a second, and since this is an integer divide, any interval shorter than one second comes out as 0. Not super useful. So instead we multiply the elapsed counts by 1000 before dividing, which gives us elapsed milliseconds:
elapsedMs = (1000*(counter - last_counter)) / freq
To get instantaneous frames per second, we can just divide without changing to milliseconds:
fps = freq / (counter - last_counter)
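The two formulas above can be sketched in C as plain integer helpers. This is a minimal sketch, not the code from the stream: the names elapsed_ms and frames_per_second are made up for illustration, and the counter and frequency arguments are assumed to be the QuadPart values read from QueryPerformanceCounter() and QueryPerformanceFrequency().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds between two counter readings.
   Multiplies before dividing so sub-second intervals do not truncate to 0. */
int64_t elapsed_ms(int64_t last_counter, int64_t counter, int64_t freq)
{
    assert(freq > 0);
    return (1000 * (counter - last_counter)) / freq;
}

/* Hypothetical helper: instantaneous frames per second for one frame.
   Returns 0 if no counts elapsed, to avoid dividing by zero. */
int64_t frames_per_second(int64_t last_counter, int64_t counter, int64_t freq)
{
    int64_t elapsed = counter - last_counter;
    return (elapsed > 0) ? (freq / elapsed) : 0;
}
```

With a frequency of 10,000,000 counts/sec and a frame that took 166,667 counts, elapsed_ms gives 16 and frames_per_second gives 59. Both results are truncated by the integer divide.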
- To time a frame, query the timer only once per frame. Otherwise your measurement will leave out the time between last frame's end and this frame's start.
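One way to follow that advice is to keep the previous frame's reading and take a single new reading per frame, which then serves as both the end of the last frame and the start of the next. The sketch below assumes a hypothetical FrameTimer struct and frame_timer_tick helper (neither is from the stream). The now argument stands in for the one QueryPerformanceCounter() reading taken each frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame timer state: just the previous frame's reading. */
typedef struct FrameTimer {
    int64_t last_counter;
} FrameTimer;

/* Call once per frame with the single counter reading for that frame.
   Returns the counts elapsed since the previous call and stores `now`
   as the start of the next frame, so no time falls between frames. */
int64_t frame_timer_tick(FrameTimer *timer, int64_t now)
{
    assert(timer != NULL);
    int64_t elapsed = now - timer->last_counter;
    timer->last_counter = now;
    return elapsed;
}
```

Because each reading is reused as the next frame's starting point, the per-frame results add up to the total counts elapsed.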
Every x86 family processor since the Pentium has a Timestamp Counter (TSC), which counts up from the moment it was last reset. RDTSC is a processor instruction that reads the TSC into general purpose registers.
For processors before Sandy Bridge but after dynamic clocking, RDTSC gave us actual clocks, but these were difficult to correlate to wall time because of the variable frequency. Since Sandy Bridge, it gives us "nominal" clocks, which is to say the number of clocks that would have elapsed at the chip's nominal frequency. These should correlate closely to wall clock time, but they make tracking the "number of cycles" notion of processor time more difficult.
RDTSC is usually exposed as a compiler intrinsic, for example __rdtsc() in MSVC, GCC, and Clang. Check the docs for your compiler.
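As a rough illustration, the sketch below wraps the __rdtsc() intrinsic, which MSVC declares in <intrin.h> and GCC and Clang declare in <x86intrin.h>. The read_tsc and tsc_delta names are made up here, and the fallback that returns 0 on non-x86 targets is an assumption added only so the sketch compiles everywhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>      /* MSVC: declares __rdtsc() */
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* GCC/Clang: declares __rdtsc() */
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0     /* not an x86 target: no TSC to read */
#endif

/* Hypothetical wrapper: current TSC value, or 0 where RDTSC is unavailable. */
uint64_t read_tsc(void)
{
#if HAVE_RDTSC
    return (uint64_t)__rdtsc();
#else
    return 0;
#endif
}

/* Hypothetical helper: clocks elapsed between two readings taken in order. */
uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```

To time a block of code, read the counter before and after it and pass both readings to tsc_delta. On recent processors the result is in nominal clocks, as described above, not actual cycles executed.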
Casey had to cover a couple of new corners of C in order to work with the techniques above.
Union types are a C feature that lets you superimpose a number of different layouts over the same chunk of memory. LARGE_INTEGER, the type that the QueryPerf calls fill in through a pointer parameter, is an example. I can treat it as a 64-bit integer by accessing its QuadPart, or as two 32-bit halves via HighPart and LowPart.
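To make the overlay concrete, here is a small sketch of a union shaped like LARGE_INTEGER. Counter64 is a hypothetical stand-in, not the real Windows definition, and the field order assumes a little-endian machine (as x86 is), where the low 32 bits come first in memory.

```c
#include <stdint.h>

/* Hypothetical stand-in for LARGE_INTEGER; assumes little-endian layout. */
typedef union Counter64 {
    struct {
        uint32_t LowPart;   /* low 32 bits of the value  */
        int32_t  HighPart;  /* high 32 bits of the value */
    } parts;
    int64_t QuadPart;       /* the same 8 bytes viewed as one 64-bit integer */
} Counter64;

/* Write the value through the 64-bit view, read back the low half. */
uint32_t counter64_low(int64_t value)
{
    Counter64 c;
    c.QuadPart = value;
    return c.parts.LowPart;
}

/* Write the value through the 64-bit view, read back the high half. */
int32_t counter64_high(int64_t value)
{
    Counter64 c;
    c.QuadPart = value;
    return c.parts.HighPart;
}
```

Storing 0x0000000100000002 in QuadPart reads back as LowPart 2 and HighPart 1 on a little-endian machine, since both views share the same 8 bytes.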
An intrinsic is a compiler-specific extension that allows direct invocation of some processor instruction. Intrinsics generally need to be built into the compiler, not written as ordinary functions, so they can avoid all the expensive niceties compilers have to afford functions.