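I think the "InterlockedCompareExchange" call in the "AtomicCompareExchangeUInt32" function has its second and third arguments reversed.
MSDN link (the parameter order is Destination, Exchange, Comparand)
Because of this, the value is compared against the new state instead of the expected one, so the swap doesn't happen and the state never gets updated.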
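A minimal sketch of what the corrected function could look like. The uint32 typedef is an assumption standing in for the codebase's own. The MSVC branch follows the documented (Destination, Exchange, Comparand) order, and the fallback branch uses GCC/Clang's __sync_val_compare_and_swap, whose order is (ptr, oldval, newval), so the two branches list the arguments in opposite order.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

typedef uint32_t uint32; // assumption: matches the codebase's uint32 typedef

// If *Value == Expected, store New. Always returns the value *Value held
// before the call, so callers test (Result == Expected) for success.
static uint32 AtomicCompareExchangeUInt32(uint32 volatile *Value, uint32 New, uint32 Expected)
{
#if defined(_MSC_VER)
    // MSVC order: (Destination, Exchange, Comparand), so New comes before Expected.
    uint32 Result = (uint32)_InterlockedCompareExchange((long volatile *)Value, (long)New, (long)Expected);
#else
    // GCC/Clang order: (ptr, oldval, newval), so Expected comes before New.
    uint32 Result = __sync_val_compare_and_swap(Value, Expected, New);
#endif
    return Result;
}
```

Swapping New and Expected in the MSVC call reproduces the bug: the comparison is made against New, so an Unloaded slot is never moved to Queued.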
The missing update obscures the bug Kim mentioned: in "LoadBitmap", "BeginTaskWithMemory" can fail to acquire a task, but the state is not returned to "AssetState_Unloaded", so the load is never retried.
Adding an else branch to the "if(Task)" that returns the state to "AssetState_Unloaded" seems to fix it.
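Roughly what that else branch looks like, as a simplified, self-contained sketch. The asset_slot and task_with_memory types, the one-entry task pool, and the no-argument BeginTaskWithMemory are hypothetical stand-ins for the real game code, and the CAS uses the GCC/Clang builtin directly. The real loader queues an asynchronous load; here the success path just marks the slot loaded.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical, simplified stand-ins for the real asset and task types.
typedef enum { AssetState_Unloaded, AssetState_Queued, AssetState_Loaded } asset_state;
typedef struct { volatile uint32_t State; } asset_slot;
typedef struct { int BeingUsed; } task_with_memory;

// Hypothetical task pool: BeginTaskWithMemory returns NULL when every task is busy.
static task_with_memory Tasks[1];

static task_with_memory *BeginTaskWithMemory(void)
{
    for (size_t I = 0; I < sizeof(Tasks) / sizeof(Tasks[0]); ++I)
    {
        if (!Tasks[I].BeingUsed)
        {
            Tasks[I].BeingUsed = 1;
            return &Tasks[I];
        }
    }
    return NULL;
}

static void LoadBitmap(asset_slot *Slot)
{
    // Claim the slot: only the caller that moves Unloaded -> Queued proceeds.
    // GCC/Clang builtin, argument order (ptr, expected, new).
    if (__sync_val_compare_and_swap(&Slot->State, AssetState_Unloaded, AssetState_Queued) == AssetState_Unloaded)
    {
        task_with_memory *Task = BeginTaskWithMemory();
        if (Task)
        {
            // The real code would queue the load here; this sketch just finishes it.
            Slot->State = AssetState_Loaded;
            Task->BeingUsed = 0;
        }
        else
        {
            // The fix: no task was available, so hand the slot back for a later retry.
            Slot->State = AssetState_Unloaded;
        }
    }
}
```

Without the else branch, a failed BeginTaskWithMemory would leave the slot stuck in AssetState_Queued, and the compare-exchange would reject every later attempt.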
I have attached a diff against the day 135 code. It includes the fix plus all the cross-platform stuff, for anyone who wants it.
Also, for the compiler memory barrier, GCC/Clang have "__sync_synchronize()", which I think works, but I can't find out whether it is only a compiler barrier or a CPU barrier as well.
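For what it's worth, GCC's documentation describes __sync_synchronize as issuing a full memory barrier, so it should act as both a compiler and a CPU barrier. A compiler-only barrier in GCC/Clang is usually written as an empty asm statement with a "memory" clobber. Below is a sketch of the two side by side. The macro names and the Publish function are hypothetical, and both macros rely on GCC/Clang extensions.

```c
#include <stdint.h>

// Compiler-only barrier: the "memory" clobber stops the compiler from caching
// values in registers or moving loads/stores across this point, but no fence
// instruction is emitted for the CPU.
#define CompilerBarrier() __asm__ __volatile__("" ::: "memory")

// Full barrier: GCC documents __sync_synchronize() as a full memory barrier,
// so it orders memory accesses for the CPU as well as for the compiler.
#define FullBarrier() __sync_synchronize()

static uint32_t Payload;
static volatile uint32_t Ready;

// Writer side of a simple "fill in the data, then set a flag" handoff.
static void Publish(uint32_t Value, int UseFullBarrier)
{
    Payload = Value;
    if (UseFullBarrier)
    {
        FullBarrier();      // safe on weakly ordered CPUs such as ARM
    }
    else
    {
        CompilerBarrier();  // typically enough on x86, which keeps stores in order
    }
    Ready = 1;
}
```

On x86 the hardware does not reorder stores with other stores, so the write side usually only needs the compiler barrier; the full barrier costs a fence instruction but does not depend on that guarantee.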