There are a few things to note here.
First, StretchDIBits is not a "normal" way to render anything anymore - it's an old, legacy path, so it's entirely possible that your system just isn't going to let anyone call it at a high enough throughput to get 60fps or whatnot. The reason we are using it at the moment is that I wanted to show the simplest possible way to get raw memory displayed on the screen, so people could see the connection. In the future, even when running the software rendering path, we will probably go through OpenGL to display to the screen, since that is the tested path nowadays (i.e., we will simply submit a texture that is the game output and draw it to the screen as a giant quad).
Second, it does sound slightly odd that you would only hit 37fps if all you were doing was calling StretchDIBits as fast as you could. Are you sure you're not calling Sleep() or some other function that would indicate to the operating system that you didn't need the full amount of processor time to which you were entitled?
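To put rough numbers on that, here is a minimal sketch of the frame-time arithmetic in C. The function names (MsPerFrame, ExtraMsPerFrame) are made up for illustration and are not from the actual code. It only converts frame rates to milliseconds per frame, so you can see how much time per frame is unaccounted for.

```c
#include <assert.h>

/* Hypothetical helper: milliseconds each frame takes at a given frame rate. */
static double MsPerFrame(double FramesPerSecond)
{
    assert(FramesPerSecond > 0.0);
    return 1000.0 / FramesPerSecond;
}

/* Hypothetical helper: how many more milliseconds per frame the observed
   rate costs compared to the target rate. */
static double ExtraMsPerFrame(double ObservedFPS, double TargetFPS)
{
    return MsPerFrame(ObservedFPS) - MsPerFrame(TargetFPS);
}
```

With these, MsPerFrame(37.0) comes out at about 27ms and MsPerFrame(60.0) at about 16.7ms, so ExtraMsPerFrame(37.0, 60.0) is roughly 10ms per frame. That is a lot of time to lose to a single blit, which is why a Sleep() call or something similar is worth ruling out.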
- Casey