rdtsc inconsistencies?

Hi, I was wondering if anyone could help me understand this.

I'm using __rdtsc to profile my game code, but the results seem to jump around a lot. I thought my code was doing things I wasn't expecting, but I just tried some simple test code that loops and does some simple arithmetic, and I'm getting similar results.

It's quite consistent for chunks of frames.


But then it will change drastically for chunks of time.


Is this expected?
I also saw some inconsistencies with my CPU (even negative timespans B) ). What helped me was to call SetThreadAffinityMask (MSDN) on all the threads (even the main thread) so they wouldn't jump to another core (each core has its own TSC register).

Edited by Marc Costa on
There are a few oddities in TSC that can depend on your CPU architecture and also whether you have multiple CPUs.

What CPU do you have (in particular, is it a Nehalem, Westmere, Sandy Bridge, or newer)? Do you happen to have two CPUs (this is fairly rare in desktops)?

The other, less hardware-oriented issues are simply how often other processes get scheduled and what they're doing (everything from not giving you time on the CPU to polluting the CPU caches), and whether your rdtsc is being executed out of order differently by the CPU for some reason (for example, on a hyperthreaded CPU, depending on when a particular slot is available).

Using rdtscp (or similar) to fence the tsc read is probably the first step.
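
As a rough sketch of what fencing the read can look like: this assumes an x86-64 compiler that provides the __rdtsc, __rdtscp, and _mm_lfence intrinsics, and tsc_begin, tsc_end, and time_loop_cycles are made-up names for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>      /* __rdtsc, __rdtscp, _mm_lfence on MSVC */
#else
#include <x86intrin.h>   /* the same intrinsics on GCC/Clang */
#endif

/* Start timestamp. The first lfence waits for earlier instructions to
   finish; the second keeps the timed code from starting before the read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End timestamp. rdtscp waits for earlier instructions to execute before
   reading the counter; the lfence keeps later code from moving ahead of it. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;            /* receives IA32_TSC_AUX, unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Time a trivial arithmetic loop, similar to the test described in the
   first post. Returns elapsed TSC ticks. */
static uint64_t time_loop_cycles(uint32_t iterations)
{
    assert(iterations > 0);
    volatile uint32_t sink = 0;  /* volatile so the loop is not optimized away */
    uint64_t start = tsc_begin();
    for (uint32_t i = 0; i < iterations; ++i) {
        sink = sink * 31u + i;
    }
    uint64_t end = tsc_end();
    return end - start;
}
```

Calling time_loop_cycles(1000000) once per frame gives one sample per call. The result is in TSC ticks, which on recent Intel CPUs advance at a fixed rate rather than at the current core clock, so a change in clock speed still shows up as a change in the tick count for the same work.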
My CPU is a Haswell i7-5930K.
Thanks for the suggestions, I'll try some of these things and see what different results I get.
OK, so I used SetThreadAffinityMask to limit my thread to only one core, and now the results of rdtsc are very stable. I guess all my variability was due to the OS moving my thread around. It's actually quite a bit faster when limited to one core as well, which is interesting.
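
For reference, pinning the calling thread can be sketched roughly like this. The Windows branch uses SetThreadAffinityMask as described above; the non-Windows branch is an addition that assumes Linux with glibc, and current_core and pin_current_thread_to_core are hypothetical helper names.

```c
#if !defined(_WIN32) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE      /* for sched_setaffinity, sched_getcpu, CPU_* macros */
#endif

#include <assert.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sched.h>
#endif

/* Index of the logical core the calling thread is running on right now.
   On Windows this is relative to the thread's processor group. */
static unsigned current_core(void)
{
#if defined(_WIN32)
    return (unsigned)GetCurrentProcessorNumber();
#else
    int cpu = sched_getcpu();
    assert(cpu >= 0);
    return (unsigned)cpu;
#endif
}

/* Pin the calling thread to one logical core (0-based index).
   Returns 1 on success, 0 on failure. */
static int pin_current_thread_to_core(unsigned core)
{
#if defined(_WIN32)
    assert(core < sizeof(DWORD_PTR) * 8);   /* one mask bit per core */
    DWORD_PTR mask = (DWORD_PTR)1 << core;
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    assert(core < (unsigned)CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* A pid of 0 means the calling thread. */
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#endif
}
```

Calling pin_current_thread_to_core(current_core()) at the start of a profiling run keeps the thread wherever the OS already placed it, rather than forcing it onto a core picked in advance.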

I'm guessing it would be a bad idea to do this in shipping software, because the core you choose might already be in heavy use?

Should be useful for profiling when optimizing code though.

Thanks for your help!
Yes, your concerns for shipping code are valid.

You could first find out which core the thread is running on and set the affinity mask to keep the thread on that same core. But I really don't know if that is a good idea. You should really prefer QPC (QueryPerformanceCounter) over rdtsc for timing measurements in code that ends up on users' machines. Keep rdtsc only for dev profiling or similar use.
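
A minimal QPC-based timer might look like the following. now_ns is a hypothetical helper name, and the clock_gettime fallback for non-Windows builds is an assumption added only so the sketch compiles elsewhere.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif

#include <assert.h>
#include <stdint.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

/* Monotonic time in nanoseconds since an arbitrary starting point. */
static uint64_t now_ns(void)
{
#if defined(_WIN32)
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   /* ticks per second */
    QueryPerformanceCounter(&count);    /* current tick count */
    uint64_t f = (uint64_t)freq.QuadPart;
    uint64_t c = (uint64_t)count.QuadPart;
    /* Convert whole seconds and the remainder separately so that
       multiplying by 1e9 does not overflow 64 bits. */
    return (c / f) * 1000000000ull + (c % f) * 1000000000ull / f;
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;                           /* silence unused warning with NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Taking now_ns() before and after the code being measured and subtracting gives an elapsed time in nanoseconds, with no need to pin the thread first.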
In my experience, thread affinity masking is pretty safe to ship in game code. YMMV, of course... but generally speaking you don't do this to regularize rdtsc timings, you do it when you have an idea about how you want to distribute work.

For example, if you know you have two heavy worker threads that do most of the work for the game, you might well affinity mask them so that they prefer separate cores and won't end up swapping cores (and hence losing their warm caches) accidentally when they are interrupted by the OS.

These days, though, I don't know how necessary that actually is, as I suspect the Windows scheduler has gotten better about keeping heavy threads on the same cores (famous last words?)

- Casey