__rdtsc, QueryPerformanceCounter, and QuadPart

I was curious about a couple things.

MSDN strongly discourages the use of the __rdtsc() intrinsic, I believe because it behaves unpredictably on different processors (and on older versions of Windows such as XP), and therefore leads to code that is not very portable.

From http://msdn.microsoft.com/en-us/l...ws/desktop/dn553408(v=vs.85).aspx

We strongly discourage using the RDTSC or RDTSCP processor instruction to directly query the TSC because you won't get reliable results on some versions of Windows, across live migrations of virtual machines, and on hardware systems without invariant or tightly synchronized TSCs. Instead, we encourage you to use QPC to leverage the abstraction, consistency, and portability that it offers.

You can use coreinfo.exe from Sysinternals to check whether your CPU uses an invariant TSC or not.

You've also got Raymond Chen telling us that QueryPerformanceCounter acts differently on different systems based on the hardware:

http://blogs.msdn.com/b/oldnewthing/archive/2005/09/02/459952.aspx

Secondly, I have a question about the LARGE_INTEGER union. With highly portable code in mind, do we need to put logic in our code that says something like "if on a 32-bit computer, use .HighPart and .LowPart, otherwise use .QuadPart"? Or will the .QuadPart member always work on both 32-bit and 64-bit machines?
__rdtsc is very important and you definitely want to use it _for profiling_. What MSDN is talking about is _shipping_ code that uses __rdtsc to an end user, which is _not_ something you want to do, because you don't know what processor they might be using.

QueryPerformanceCounter, on the other hand, is actually pretty reliable as a timer you can use on an end-user system, so it's always what you use for actual clocking in the shipping code.

So, just to summarize: __rdtsc for profiling on a single machine (comparing __rdtsc's on subsequent runs), QueryPerformanceCounter for getting the wall clock time in shipping code.
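To make that split concrete, here is a minimal sketch, assuming MSVC or GCC/Clang on x86/x64. The names `read_cycles`, `wall_ticks`, `time_call` and `demo_workload` are hypothetical. On non-Windows systems the sketch substitutes `clock_gettime(CLOCK_MONOTONIC)` for QueryPerformanceCounter purely so it compiles, and on non-x86 targets the cycle count is simply 0.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime off Windows */
#endif

#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>              /* QueryPerformanceCounter/Frequency */
#else
#include <time.h>                 /* clock_gettime (stand-in only) */
#endif

#if defined(_MSC_VER)
#include <intrin.h>               /* __rdtsc on MSVC */
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>            /* __rdtsc on GCC/Clang */
#endif

/* Raw TSC read: for profiling on your own machine, not shipping code. */
static uint64_t read_cycles(void)
{
#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;                     /* no TSC on this architecture */
#endif
}

/* Wall-clock ticks: QPC on Windows, CLOCK_MONOTONIC nanoseconds elsewhere. */
static int64_t wall_ticks(void)
{
#if defined(_WIN32)
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
#endif
}

/* Ticks per second. The QPC frequency is fixed at boot, so real code
   would query it once and cache it. */
static int64_t wall_ticks_per_second(void)
{
#if defined(_WIN32)
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
#else
    return 1000000000;
#endif
}

typedef struct {
    uint64_t cycles;   /* TSC delta: compare between runs on one machine */
    double   seconds;  /* wall-clock time you could show to a user */
} timing_result;

static timing_result time_call(void (*fn)(void))
{
    timing_result r;
    int64_t  t0 = wall_ticks();
    uint64_t c0 = read_cycles();
    fn();
    uint64_t c1 = read_cycles();
    int64_t  t1 = wall_ticks();
    r.cycles  = c1 - c0;
    r.seconds = (double)(t1 - t0) / (double)wall_ticks_per_second();
    return r;
}

/* Hypothetical workload, only here to have something to time. */
static volatile uint64_t demo_sink;
static void demo_workload(void)
{
    for (uint64_t i = 0; i < 1000000; ++i)
        demo_sink += i;
}
```

Usage would be `timing_result r = time_call(demo_workload);`: compare `r.cycles` between runs on your dev machine, and treat `r.seconds` as the number that is safe to rely on elsewhere.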

As for QuadPart, it works fine on 32-bit. The compilers nowadays are smart enough to emit 32-bit code that handles the 64-bit subtract.

- Casey
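A small sketch of the QuadPart point: it compares the plain 64-bit subtraction with a manual HighPart/LowPart reconstruction across a 32-bit rollover. On Windows it uses the real LARGE_INTEGER. Elsewhere, `large_int` is a hypothetical stand-in that mirrors its layout and assumes a little-endian machine. `elapsed_quadpart`, `join_parts` and `elapsed_parts` are illustrative names.

```c
#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>
typedef LARGE_INTEGER large_int;
#else
/* Hypothetical stand-in mirroring LARGE_INTEGER's layout on a
   little-endian machine, so the sketch also builds off Windows. */
typedef union {
    struct { uint32_t LowPart; int32_t HighPart; } u;
    int64_t QuadPart;
} large_int;
#endif

/* Just subtract QuadPart; a 32-bit compiler emits the
   multi-word subtract for you. */
static int64_t elapsed_quadpart(large_int start, large_int end)
{
    return end.QuadPart - start.QuadPart;
}

/* The manual HighPart/LowPart route, shown only for comparison. */
static int64_t join_parts(large_int v)
{
    uint64_t hi = (uint32_t)v.u.HighPart;
    return (int64_t)((hi << 32) | (uint32_t)v.u.LowPart);
}

static int64_t elapsed_parts(large_int start, large_int end)
{
    return join_parts(end) - join_parts(start);
}
```

Both functions give the same answer, even when LowPart wraps from 0xFFFFFFF0 to 0x10. This is why the manual version gains nothing on 32-bit builds.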
@ryanries: __rdtsc works reliably on Sandy Bridge and later (and on newer AMD CPUs). See here: https://randomascii.wordpress.com.../rdtsc-in-the-age-of-sandybridge/ CPUID has a special bit for this: if it is set, the TSC ticks at a constant rate and rdtsc doesn't change frequency.

As long as your dev machine has such a CPU, using rdtsc for profiling is fine. For real shipping code, as Casey says, you don't want to use it.
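Here is a sketch of checking that bit. It assumes the bit in question is the invariant TSC flag, CPUID leaf 0x80000007, EDX bit 8, which Intel documents and AMD uses in the same position. `has_invariant_tsc` is a hypothetical helper. It uses MSVC's `__cpuid` or GCC/Clang's `__get_cpuid` from `<cpuid.h>`.

```c
#if defined(_MSC_VER)
#include <intrin.h>   /* __cpuid */
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>    /* __get_cpuid */
#endif

/* Hypothetical helper. Returns 1 if CPUID reports an invariant TSC,
   0 if it does not (or the leaf is missing), -1 if this build cannot ask. */
static int has_invariant_tsc(void)
{
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];                              /* EAX, EBX, ECX, EDX */
    __cpuid(regs, (int)0x80000000);           /* highest extended leaf */
    if ((unsigned)regs[0] < 0x80000007u)
        return 0;
    __cpuid(regs, (int)0x80000007);
    return (int)(((unsigned)regs[3] >> 8) & 1u); /* EDX bit 8 */
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the leaf is above the supported maximum. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);            /* EDX bit 8 */
#else
    return -1;
#endif
}
```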
What's the point of using a different function on the dev machine if both do the same thing? Is QueryPerformanceCounter noticeably slower because it is a Windows API call and might not use __rdtsc?

Doesn't __rdtsc include times from other threads or other cores, as described in:

Game Timing and Multicore Processors:


Discontinuous values. Using RDTSC directly assumes that the thread is always running on the same processor. Multiprocessor and dual-core systems do not guarantee synchronization of their cycle counters between cores. This is exacerbated when combined with modern power management technologies that idle and restore various cores at different times, which results in the cores typically being out of synchronization. For an application, this generally results in glitches or in potential crashes as the thread jumps between the processors and gets timing values that result in large deltas, negative deltas, or halted timing.
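One common defensive pattern against the negative or huge deltas described in that quote is to sanitize each frame delta before using it. This is a minimal sketch and not necessarily what the MSDN article itself recommends. `clamped_delta` and its `max_delta` cap are hypothetical, and the right cap is up to the game.

```c
#include <stdint.h>

/* Hypothetical helper: turn two raw counter readings (from __rdtsc or
   QueryPerformanceCounter) into a delta that is never negative and
   never larger than max_delta. */
static int64_t clamped_delta(int64_t prev, int64_t now, int64_t max_delta)
{
    int64_t d = now - prev;
    if (d < 0)
        return 0;          /* counter appeared to go backwards */
    if (d > max_delta)
        return max_delta;  /* stall, breakpoint, or counter jump */
    return d;
}
```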

Matra
cmuratori wrote:
__rdtsc is very important and you definitely want to use it _for profiling_. What MSDN is talking about is _shipping_ code that uses __rdtsc to an end user, which is _not_ something you want to do, because you don't know what processor they might be using.

- Casey


Do you know of any instances in actual games where the use of rdtsc has caused any problems? And on what machine was it? And in what usage case?

QPC uses RDTSC inside.
Thanks for the info, everyone.

Just for everyone else's amusement and information, here is another MSDN article about rdtsc and its use specifically in games:

http://msdn.microsoft.com/en-us/l...ws/desktop/ee417693(v=vs.85).aspx

Kladdehelvete wrote:
QPC uses RDTSC inside.

From what I understand, QPC will use RDTSC *if* Windows has already determined that the current platform has an invariant TSC; otherwise, it falls back to some other timer that has lower resolution but is still pretty reliable.

For instance, using coreinfo.exe from Sysinternals, I can clearly see that the processor on one of my larger machines has an invariant TSC; however, one of the virtual machines running on that same hardware does not. So running your software in a virtual machine might produce some surprises if you use a lot of timing-sensitive code and calls to rdtsc.
ryanries wrote:
Thanks for the info, everyone.
From what I understand, QPC will use RDTSC *if* Windows has already determined that the current platform has an invariant TSC; otherwise, it falls back to some other timer that has lower resolution but is still pretty reliable.

For instance, using coreinfo.exe from Sysinternals, I can clearly see that the processor on one of my larger machines has an invariant TSC; however, one of the virtual machines running on that same hardware does not. So running your software in a virtual machine might produce some surprises if you use a lot of timing-sensitive code and calls to rdtsc.


Thanks. Do you know of an API that can measure the time taken by other devices, such as a hardware call to the graphics card or other hardware? I guess RDTSC (or QPC) will not measure the time taken by external devices, only by the CPU? Does the GPU have such a timer as well?
I'm resurrecting this thread; I hope that's OK, otherwise I'll create another one.

I'm confused about the LARGE_INTEGER value you get from QueryPerformanceFrequency, and I'm not sure whether Casey answered this during the streams.


Wouldn't this value be related to the CPU speed? I get a value of 2942968 on a ~3 GHz CPU. Is there really a relation, or is this a coincidence?

There is a brief mention in this thread: QueryPerformanceFrequency

QueryPerformanceFrequency returned 3579545. 3.579545 MHz is a frequency used by NTSC, so it would not be surprising to see it as QPC's frequency. A frequency of 3579545 means that on your computer, there are 3579545 QPC "ticks" every second.


Does the WIN32 API contain another function to retrieve the CPU speed directly?


You cannot rely on the exact value meaning anything. Its only meaning is that QueryPerformanceCounter advances by that many ticks in one second (the frequency returned by QueryPerformanceFrequency).
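In other words, the frequency is only a conversion factor: elapsed seconds = (end - start) / frequency. Below is a sketch of that conversion. It includes an integer microsecond version that splits off whole seconds first, so that `ticks * 1000000` cannot overflow for long intervals. The function names are illustrative, not part of any Windows API.

```c
#include <stdint.h>

/* elapsed seconds = tick delta / QueryPerformanceFrequency value */
static double ticks_to_seconds(int64_t ticks, int64_t ticks_per_second)
{
    return (double)ticks / (double)ticks_per_second;
}

/* Integer microseconds. Splitting off whole seconds first keeps
   ticks * 1000000 from overflowing int64_t on long intervals. */
static int64_t ticks_to_microseconds(int64_t ticks, int64_t ticks_per_second)
{
    int64_t whole = ticks / ticks_per_second;
    int64_t rest  = ticks % ticks_per_second;
    return whole * 1000000 + rest * 1000000 / ticks_per_second;
}
```

With the 2942968 from the question, a delta of 1471484 ticks comes out as 500000 microseconds, i.e. half a second.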

If a Windows-only solution is OK, then you can use a WMI query, read it from the registry, or call the CallNtPowerInformation function with the ProcessorInformation argument.

Note that it can be pretty tricky to get precise CPU speed information, because a system can have multiple CPUs of different speeds, each with multiple cores running at different frequencies.
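As an illustration of the registry route, here is a sketch that reads the `~MHz` value Windows keeps under `HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\<n>`, with one subkey per logical processor. It assumes Windows Vista or later for `RegGetValueA` (link with Advapi32). `cpu_mhz_from_registry` is a hypothetical name, and on other platforms it simply returns 0. As noted above, a single stored number may not reflect the frequency a given core is actually running at.

```c
#include <stdio.h>
#if defined(_WIN32)
#include <windows.h>  /* RegGetValueA; link with Advapi32 */
#endif

/* Hypothetical helper: the ~MHz value Windows stored for logical
   processor `index`, or 0 if it is unavailable. */
static unsigned long cpu_mhz_from_registry(unsigned index)
{
#if defined(_WIN32)
    char key[128];
    DWORD mhz = 0, size = sizeof(mhz);
    snprintf(key, sizeof key,
             "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\%u", index);
    if (RegGetValueA(HKEY_LOCAL_MACHINE, key, "~MHz", RRF_RT_REG_DWORD,
                     NULL, &mhz, &size) != ERROR_SUCCESS)
        return 0;
    return (unsigned long)mhz;
#else
    (void)index;
    return 0;  /* not implemented off Windows in this sketch */
#endif
}
```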