Busy-wait in LoadBitmap() -> thread starvation

I'm bringing my PS4 Handmade Hero port up to speed with the last few weeks' changes, and ran into a problem where none of the bitmaps were loading at runtime. I traced it to a consequence of the busy-wait added on Days 166-167 in LoadBitmap(). Quick recap: if two or more worker threads try to simultaneously load the same bitmap with Immediate=1, only one thread actually loads the bitmap; the others busy-wait until Asset->State is no longer AssetState_Queued.

What seems to be happening in my case is that the thread that's actually loading the asset from disk is continuously preempted by the other threads, all of which are just busy-waiting for the first thread to finish. It's kind of like priority inversion, except that all the threads in each queue have the same priority.

Anyway, this is most likely a problem with the OS's thread scheduler, which I'll be mentioning to our kernel team. For Handmade Hero purposes, though, I was able to work around the problem by inserting an intrinsic (our equivalent of __yield or _mm_pause) inside the busy-wait loop, which forces a context switch to another thread of equal or lower priority. It's still not a perfect solution -- two __yield()ing threads could potentially ping-pong back and forth while a third thread starves. But it's better than nothing.

Casey: would you consider adding something like a portable yield() equivalent to the official code? I know you never claimed the multi-threaded asset loading system was anything more than "probably theoretically working" and would need further testing down the line, but this would certainly help in the meantime. Besides, busy-waiting isn't a great idea in general, for a variety of compelling reasons.

Thanks!
- cort
Well, I'd rather think it through a bit more, because I'm not sure exactly what we want to do here. Remember, this is actually something that never happens in LoadBitmap() for the regular asset path, it only happens for the background composition path, which is loading far fewer bitmaps and presumably only hits this case very rarely once the game is actually spun up.

What would probably be the best thing would be for us to record the task that is actually working on loading the bitmap, and then have a platform yield call that says "I am blocked on this task", so that way we don't really need "yield", we can just properly slave the task to the master task that it actually needs to have completed...

That actually seems pretty easy to do, although there is a bit of fussiness in the whole "what if the task completes during the call that slaves to the task" and so on, but presumably there would be some way for us to get around that.

- Casey
Hey, thanks for the quick reply!

Remember, this is actually something that never happens in LoadBitmap() for the regular asset path, it only happens for the background composition path, which is loading far fewer bitmaps and presumably only hits this case very rarely once the game is actually spun up.
The background composition path was hitting this case 100% of the time for me; I couldn't run the game at all until I fixed it. But that's more a consequence of the way I chose to map worker threads to specific CPU cores, plus some potentially questionable behavior in the OS's thread scheduler. In general, you're right; it shouldn't be a major concern.

What would probably be the best thing would be for us to record the task that is actually working on loading the bitmap, and then have a platform yield call that says "I am blocked on this task", so that way we don't really need "yield", we can just properly slave the task to the master task that it actually needs to have completed...
Yeah, the yield intrinsic is just a quick band-aid; a real synchronization primitive would most likely be the right answer here. That's a bit of a rabbit hole, though, and not worth derailing the stream for (especially to solve a problem that doesn't even occur on the official target platform). I've avoided modifying any code outside the platform layer so far, but I'll make an exception here and just leave the yield() in my PS4 branch until we come back to bullet-proof the asset loader.

~(

Edited by Cort on