Consistent FPS or ms/frame in Mac OS X.

Since the beginning of Handmade Hero, I've been modifying my platform code little by little to match the features of Casey's canonical Windows version. But one important thing I can't get right is a consistent FPS.

The game update and render loop can average 40,000 cycles, yet once every few seconds it takes up to 3× that amount. I know the timing isn't perfectly deterministic, but Casey's version prints 33.33 ms/frame very consistently.

I first started with a timer that fires every 1/30 or 1/60 of a second, but it fluctuates a lot. I thought maybe the timer isn't accurate, but that wouldn't explain the varying cycle counts in the game code, which only measures the time spent from the beginning to the end of the game update and render loop.

Now that we've learned how to start a separate thread, I tried running the game on its own thread and, as in Windows, using an "infinite" while loop that sleeps when the game runs faster than the target ms/frame. The results are no better, although they are different...
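The loop I'm trying looks roughly like this (a simplified sketch, not my exact code; gameUpdateAndRender stands in for the real call and the 30 Hz target is hard-coded):

#include <stdint.h>
#include <unistd.h>
#include <mach/mach_time.h>

extern void gameUpdateAndRender(void);   // stand-in for the real call

// Convert a mach_absolute_time() delta to seconds.
static double MachToSeconds(uint64_t elapsed)
{
    static mach_timebase_info_data_t timebase;
    if (timebase.denom == 0)
    {
        mach_timebase_info(&timebase);
    }
    return ((double)elapsed * timebase.numer / timebase.denom) * 1e-9;
}

void RunGameLoop(void)
{
    double targetSecondsPerFrame = 1.0 / 30.0;
    for (;;)
    {
        uint64_t frameStart = mach_absolute_time();

        gameUpdateAndRender();

        // Sleep off most of the remaining frame time, then spin for the rest.
        double elapsed = MachToSeconds(mach_absolute_time() - frameStart);
        while (elapsed < targetSecondsPerFrame)
        {
            double remaining = targetSecondsPerFrame - elapsed;
            if (remaining > 0.002)
            {
                usleep((useconds_t)((remaining - 0.001) * 1e6));
            }
            elapsed = MachToSeconds(mach_absolute_time() - frameStart);
        }
    }
}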

What's your experience with this? Is this harder to get right on OS X? It seems like a trivial issue, so I can't see what I'd be doing wrong.

Note: I run the game at only 640-by-360, so my 2008 laptop should be able to hit a consistent FPS.

Does Windows use double buffering when calling BitBlt? I'm asking because if I try to run the game loop with anything other than a timer on OS X, the screen flickers as if it's blitting to the screen in the middle of "rendering". It's so annoying not to know exactly what the API does; it makes this hard to debug.

Is there a good reference to an implementation of a game loop without a timer on OS X?
Well, two things: first, Windows does defer drawing now, since Aero forces everything to go through a compositor AFAIK. However, even if it didn't, there would be no "flicker" for a blit to the frontbuffer these days. It'd just be tearing, since it wouldn't be synced to the vertical refresh.

Second, regarding OS X, I would assume that it also always goes through a compositor, so flickering sounds like a strange thing to be seeing. I would suspect that something else might be wrong as well?

- Casey
I think what I meant was tearing, not flickering, sorry. The results I'm seeing are really odd. When I try to set up a regular game loop with a target seconds/frame of 0.033, there's a lot of tearing and missed frames (I print a message to the console if the dt > target). Yet the number of cycles in the game update and render loop is nearly always under 100,000. If I increase the target seconds/frame to something very high, like 0.2, there's no tearing and no missed frames. It's a mystery to me why the timings are so inconsistent and nonsensical.

There's got to be something crucial that I don't get.

What's also weird is that the tearing happens even when the frame doesn't change. I know how to solve (or rather, minimize) this: draw into a separate buffer, and right before gameUpdateAndRender returns to the platform layer, copy the pixels from that secondary buffer into the buffer that's passed to gameUpdateAndRender as an argument. That buffer is then used to blit to the screen. But it doesn't solve the odd timings, and I don't think I should have to do this...
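Just to sketch the idea (the struct and field names here are made up, not what the platform layer actually uses):

#include <string.h>

// Hypothetical offscreen buffer layout, only for illustration.
typedef struct
{
    void *memory;
    int width;
    int height;
    int pitch;   // bytes per row
} OffscreenBuffer;

// Render into 'scratch' during the frame, then publish the finished frame
// with one copy so the platform layer never blits a half-drawn image.
static void PublishFrame(OffscreenBuffer *scratch, OffscreenBuffer *platform)
{
    memcpy(platform->memory, scratch->memory,
           (size_t)scratch->pitch * (size_t)scratch->height);
}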

The question is how you get the pixels to the screen. What API are you using for that?
In the gameUpdateAndRender function I fill the passed-in buffer with raw pixel data. In drawRect, I create a CGImage from that buffer and call CGContextDrawImage to draw the image into the current graphics context.
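Stripped down to plain Core Graphics calls, it looks roughly like this (the exact bitmap flags are approximate; I'm assuming a 32-bit BGRA buffer):

#include <CoreGraphics/CoreGraphics.h>

// Roughly what drawRect does: wrap the raw pixel buffer in a CGImage and
// draw it into the view's graphics context.
static void BlitBufferToContext(CGContextRef context, void *pixels,
                                int width, int height, int pitch)
{
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGDataProviderRef provider =
        CGDataProviderCreateWithData(NULL, pixels,
                                     (size_t)pitch * (size_t)height, NULL);

    CGImageRef image = CGImageCreate(width, height,
                                     8,      // bits per component
                                     32,     // bits per pixel
                                     pitch,  // bytes per row
                                     colorSpace,
                                     kCGImageAlphaNoneSkipFirst | kCGBitmapByteOrder32Little,
                                     provider, NULL, false,
                                     kCGRenderingIntentDefault);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);

    CGImageRelease(image);
    CGDataProviderRelease(provider);
    CGColorSpaceRelease(colorSpace);
}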

I'm more concerned about finding out why the timings are so inconsistent, though. The only possible explanation I can think of is some sort of energy saving... Or I'm doing something wrong, which I would prefer, because then I can hopefully find out what.
I am not a Mac OS X platform expert, but at least in Windows, there are different types of timers, and they have varying degrees of accuracy. For example, you could not reliably run a game loop off of WM_TIMER messages.

So, I can't really offer much in the way of advice because it's been years since I've done any from-scratch programming on Mac OS X, but I would say the first thing to verify is that you are actually using a system service which guarantees low-latency precision timing. Typically, windowing system timers aren't unless they're specifically labeled as such.

- Casey
Are you using CVDisplayLinkRef to handle your timer? This syncs with your screen's refresh rate.
The timer I'm using is NSTimer, but it's really a bad option: if the timer can't fire because the previous callback is still running, it won't fire immediately after the callback returns, only at the next interval.

I've looked at CVDisplayLinkRef before, but AFAIK that's only possible when using OpenGL, and I don't think it helps with the timing of the gameUpdate function. It only makes sure the framebuffer flips at the vertical blanking interval, I THINK.

Anyway, thank you everyone for reading my ramblings and suggesting options to try. If I can't solve it, I guess I'll see a possible solution at the end of Handmade Hero. ;)
CVDisplayLink can use OpenGL. I just have it set up to draw the software-rasterized image, so the amount of actual OpenGL code is extremely small. It doesn't require OpenGL, though; you can set it up a couple of different ways.

CVDisplayLink is a timer that's tied to your screen's refresh rate (typically 60 Hz), but since you use time deltas in the update code, it doesn't matter whether it's a 120 Hz screen or a variable-refresh-rate screen.
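The setup only takes a few calls. A rough sketch (RenderFrame here is a placeholder for whatever fills and presents your buffer):

#include <CoreVideo/CVDisplayLink.h>

extern void RenderFrame(void *userInfo);   // placeholder for your own code

// Called once per display refresh, on a dedicated high-priority thread.
static CVReturn DisplayLinkCallback(CVDisplayLinkRef displayLink,
                                    const CVTimeStamp *inNow,
                                    const CVTimeStamp *inOutputTime,
                                    CVOptionFlags flagsIn,
                                    CVOptionFlags *flagsOut,
                                    void *userInfo)
{
    RenderFrame(userInfo);
    return kCVReturnSuccess;
}

static void StartDisplayLink(void *userInfo)
{
    CVDisplayLinkRef displayLink;
    CVDisplayLinkCreateWithActiveCGDisplays(&displayLink);
    CVDisplayLinkSetOutputCallback(displayLink, DisplayLinkCallback, userInfo);
    CVDisplayLinkStart(displayLink);
}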
Ok. Anyway, I still want the gameUpdateAndRender function to not necessarily be tied to the screen refresh rate, similar to how Casey does it on Windows. My main problem is that the timing I'm seeing in the console is not very precise, or not very accurate, I don't know.

I'm seeing that every ~10 frames, the number of cycles spent in gameUpdateAndRender during one frame can be up to 3× higher, and that mystery is what puzzles me the most. Could this be caused by reference counting? I don't think so, because I thought it should be deterministic as opposed to garbage collection, but I'm not a professional. Furthermore, 3× seems really high, and it would make me sad if the occasional 3× cycle count were caused by reference counting.
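To be concrete about how I'm measuring (simplified; I just read the timestamp counter around the call):

#include <stdio.h>
#include <x86intrin.h>

extern void gameUpdateAndRender(void);   // stand-in for the real call

// Read the cycle counter before and after the call and print the difference.
void MeasureFrame(void)
{
    unsigned long long start = __rdtsc();
    gameUpdateAndRender();
    unsigned long long cycles = __rdtsc() - start;
    printf("cycles in gameUpdateAndRender: %llu\n", cycles);
}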

I don't know why you are seeing that, but reference counting won't be the problem (I'm not even sure which ObjC objects you'd have that would be reference counted). I'm seeing stable timings of 1/60 s every frame.
Ok, thank you. I'll investigate further. If I find the cause, I'll post it here.

Small off-topic question: now that Swift has support for SIMD, I tried to use it, but __m128i and all the corresponding functions aren't declared in my project... I thought it was something I did wrong, but then I found out that __m128i is not supported on ARM, so maybe that's why, and OS X has to wait for proper support. :(

If this is the case: is there a way to reinterpret an integer vector as a float vector "manually", without the __m128i type and its functions?
OS X has proper SSE intrinsic support.
What you probably mean is that iOS doesn't support SSE; iOS runs on ARM, which has NEON intrinsics.

What do you mean by reinterpreting an integer vector as a float vector? Just a cast?
float fvec[4];
int* ivec = (int*)fvec;
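For SSE specifically there are also dedicated intrinsics for this; the important distinction is reinterpret (same bits) versus convert (same values):

#include <emmintrin.h>

void CastVersusConvert(void)
{
    __m128i ints = _mm_set_epi32(4, 3, 2, 1);

    __m128 sameBits  = _mm_castsi128_ps(ints);  // reinterpret: bit pattern unchanged
    __m128 converted = _mm_cvtepi32_ps(ints);   // convert: {1.0f, 2.0f, 3.0f, 4.0f}

    (void)sameBits;
    (void)converted;
}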
I meant SIMD support in Swift specifically. I've already implemented the functions in C, but I wanted to switch.

The problem is that __m128i isn't defined in Swift for some reason (I think because it's not needed for ARM, and that's all Apple cares about).

Can I just cast it instead of converting it? Or is that semantically different? I guess it wouldn't be enough anyway, because there isn't an equivalent float4 function for every int4 function.