The rdtsc instruction is available on any x64 processor.
Assuming you want to program something, you should use QueryPerformanceCounter and QueryPerformanceFrequency for timing in games, as there is no straightforward way of converting rdtsc values to seconds.
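For illustration, here is a minimal sketch of how the two calls fit together: QueryPerformanceFrequency gives ticks per second, and dividing a tick delta by it gives seconds. The helper names `ticks_to_seconds` and `seconds_since` are made up for this example, and the Windows-only part is guarded so the conversion itself can be compiled anywhere.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a tick delta to seconds, given the counter frequency in ticks per second. */
static double ticks_to_seconds(int64_t ticks, int64_t ticks_per_second)
{
    return (double)ticks / (double)ticks_per_second;
}

#ifdef _WIN32
/* Hypothetical helper: seconds elapsed since `start`,
   where `start` was filled in earlier by QueryPerformanceCounter. */
static double seconds_since(LARGE_INTEGER start)
{
    LARGE_INTEGER now, freq;
    QueryPerformanceFrequency(&freq); /* fixed at boot, so it could be cached */
    QueryPerformanceCounter(&now);
    return ticks_to_seconds(now.QuadPart - start.QuadPart, freq.QuadPart);
}
#endif
```

You would typically call QueryPerformanceCounter once at startup to get `start`, and query the frequency once and cache it rather than asking every time as this sketch does.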
There's no need to inject anything. You simply use the __rdtsc() intrinsic in your code. MSVC provides it in the <intrin.h> header.
But as mrmixer said - you should use QPC for getting real time measurements. RDTSC is useful only for low-level benchmarking. Otherwise you'll need some extra support code to map it to real time, which is just not worth the effort unless you're writing some kind of profiler.
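For the low-level benchmarking case, a rough sketch could look like the following. The function `cycles_for_loop` and the loop body are invented for the example. The <x86intrin.h> branch is an assumption added so the same code builds with GCC or Clang; on MSVC only <intrin.h> is needed. The result is a raw tick count, not seconds.

```c
#include <stdint.h>

#ifdef _MSC_VER
#include <intrin.h>     /* MSVC: declares __rdtsc() */
#else
#include <x86intrin.h>  /* assumption: GCC/Clang on x86 provide __rdtsc() here */
#endif

/* Hypothetical micro-benchmark: returns how many TSC ticks a simple loop took. */
static uint64_t cycles_for_loop(int iterations)
{
    volatile uint32_t sink = 0; /* volatile keeps the loop from being optimized away */
    uint64_t begin = __rdtsc();
    for (int i = 0; i < iterations; ++i) {
        sink += (uint32_t)i;
    }
    uint64_t end = __rdtsc();
    return end - begin;         /* raw ticks; no conversion to real time */
}
```

Treat the number as a relative measure only: compare two versions of the same code on the same machine, and run it several times, since a single reading is noisy.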
The article you posted describes a pathological case, where a game calls QPC far too often. No game should need to call QPC four hundred thousand times a second. RDTSC won't save you there - it has a cost too. I don't know how much, but if you call it many millions of times per second, the game will also be slow. Instead, be smarter with your code and call QPC just a few times per frame to measure time; there is no need to do it thousands of times per frame.
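One way to keep the call count down, sketched under my own assumptions: read the counter once at the start of the frame, compute the delta time, and pass that value to everything that needs it. `FrameClock`, `frame_clock_advance` and `begin_frame` are hypothetical names, not part of any API.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical per-frame clock: counter frequency plus the previous frame's reading. */
typedef struct {
    int64_t ticks_per_second;
    int64_t last_ticks;
} FrameClock;

/* Advance the clock to `now_ticks` and return this frame's delta time in seconds. */
static double frame_clock_advance(FrameClock *fc, int64_t now_ticks)
{
    double dt = (double)(now_ticks - fc->last_ticks) / (double)fc->ticks_per_second;
    fc->last_ticks = now_ticks;
    return dt;
}

#ifdef _WIN32
/* A single QPC read per frame; the rest of the frame shares the returned dt. */
static double begin_frame(FrameClock *fc)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    return frame_clock_advance(fc, now.QuadPart);
}
#endif
```

The game loop would fill `ticks_per_second` from QueryPerformanceFrequency and `last_ticks` from QueryPerformanceCounter once at startup, then call `begin_frame` once per frame and hand `dt` to update and animation code instead of letting each system query the timer itself.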