Day 165 - MoveHeaderToFront Race Conditions Remain

The lock taken to move assets to the front of the LRU list protects only that particular node of the list. This is insufficient to prevent race conditions.

Given a list:
1
SENTINEL <-> A <-> B <-> C

To remove item B, I don't just modify B, I also modify A and C, to redirect their links. So a race exists between two threads that try to move consecutive items to the front at the same time.

Furthermore, a race exists between two threads that try to move *any* item, because moving to the front of the list modifies the sentinel node.

To properly protect the list from race conditions, ALL modifications to the list need to be serialized. The easiest solution would most likely be to lock the sentinel node itself.
I'm glad someone else caught that, I was really surprised that nobody in the Q&A seemed to have noticed that MoveHeaderToFront inherently is a function that needs to be mutex'd since it always operates on the sentinel and adjacent list entries as well as the header you're trying to move. So if two of those run concurrently you will end up with several broken scenarios, like two headers sitting in the same spot in the list, but the list only knowing about one of them, or knowing about one of them in the forward direction, but knowing about the other one in the backward direction. Then there's also the possibility of the list knowing about both of them in one direction but only one of them in the other.

An interesting thing about this bug though is that it doesn't really seem to permanently break things ever, since even if a header gets its links in the list overwritten by another header. The game still references to the header directly whenever it gets used and the header itself still links to a spot in the list. It might totally mess up the order of list entries whenever a deserted entry gets moved back to the front but it would never cause any crashes or obviously wrong behaviour, since it will generally still evict assets that haven't been used for a while it just won't evict some of them ever until they get used again and reinserted into the list.
Daverball
I was really surprised that nobody in the Q&A seemed to have noticed that MoveHeaderToFront inherently is a function that needs to be mutex'd since it always operates on the sentinel and adjacent list entries as well as the header you're trying to move

I think we were all just very sleepy :P

- Casey
Could we solve this problem by just using a singly linked list for the headers?

We could lock the node we are trying to remove and the next node after it. Then we would just have to memcpy the next node into the node being removed.

Adding the header back into the list would just be an interlockedcompareexchange then because you just need to swap one pointer value.

Edited by Vik on
Maybe the "buffering operations on the list" idea Casey had somewhere in the middle of the brainstorming wasn't that bad: That way operations on the ordered list would be sort of asynchronous, and only at some particular times one of the threads would lock the list and drain the buffer putting everything in order.

About the original bug, one possible simple solution could be to mark assets as used on the current frame with just a frame counter. That way you won't evict an asset you just requested for this frame. You can still evict assets that you are going to need later if they are requested after the eviction, but you'd just reload them when requested , which shouldn't be too bad if we agree that it should be a rare event.