I've got a project that follows the first ~100 days of HMH fairly closely, and I'm experimenting with displaying our offscreen buffer via OpenGL, purely to get a more stable framerate.

It all works correctly, and with the swap interval set to 1 I get my stable vsync'd framerate.

However, the CPU usage is vastly different. Using the original software-only approach and displaying via StretchDIBits(), CPU usage hovers around 3%-4% (we're talking literally a handful of filled rectangles drawn into the image).
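For reference, the software blit is essentially the standard early-HMH call, something like this (simplified; Buffer and DeviceContext are just illustrative names for my backbuffer and window DC):

    // Blit the DIB backbuffer straight to the window DC,
    // stretching it to the current window size.
    StretchDIBits(DeviceContext,
                  0, 0, WindowWidth, WindowHeight,     // destination rect
                  0, 0, Buffer->Width, Buffer->Height, // source rect
                  Buffer->Memory,
                  &Buffer->Info,                       // BITMAPINFO describing the DIB
                  DIB_RGB_COLORS, SRCCOPY);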

When I generate the buffer the same way but display it via a glTexImage2D call every frame, CPU usage sits at around 35%.
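The OpenGL path does roughly the following each frame (a simplified sketch; context and pixel-format setup are elided, and the names are illustrative):

    #define GL_BGRA_EXT 0x80E1  // not defined in the stock Windows gl.h

    // Re-upload the whole backbuffer as a texture...
    glBindTexture(GL_TEXTURE_2D, TextureHandle);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8,
                 Buffer->Width, Buffer->Height, 0,
                 GL_BGRA_EXT, GL_UNSIGNED_BYTE, Buffer->Memory);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glEnable(GL_TEXTURE_2D);

    // ...draw it as two textured triangles covering the window...
    glBegin(GL_TRIANGLES);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();

    // ...and present (swap interval 1, so this waits for vblank).
    SwapBuffers(DeviceContext);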

The act of displaying the buffer takes about the same amount of time (between 0.4 and 0.7ms) regardless of which method is used, so I can't account for why the CPU is apparently so much busier. There's still a good 10ms of spare time in each frame; it's just that in the OpenGL path that time is spent inside SwapBuffers(), whereas in the software path it's spent in calls to Sleep() (again, as per the early HMH code).

I get that it's transferring the entire image to the GPU as a texture once per frame, but why would that be so much more resource-hungry than whatever StretchDIBits() was doing previously?

I've made sure that I'm not doing anything else unnecessary in the OpenGL path, and I've isolated glTexImage2D as the only call that ramps up the CPU usage.

Any ideas about the discrepancy?