When are data structures with bad locality worth it?

I'm working on a music typesetting program. I figure whatever strategy is appropriate for storing the text in a text editor is also appropriate for storing the objects on a staff, and conventional wisdom seems to be that text editors should use ropes. My concern about that is, wouldn't one expect a rope to have data locality approximately as bad as a linked list? I've seen the claim that on modern hardware, linked lists are almost always slower than arrays even when insertion and deletion at arbitrary positions is necessary because the cache misses entailed by linked lists are so killer.

My understanding is that with a rope, nodes that correspond to adjacent objects can start out adjacent in memory with a suitable allocation scheme, but as insertions and deletions are made, they will become increasingly scrambled. The performance of iteration will degrade as this happens, but not indefinitely, because if it reaches the point where every step in the iteration is a cache miss, there is no way for it to get worse. With an array, on the other hand, insertions and deletions will get worse forever as the array grows. So I guess the question is, how large does the array have to be before the insertions and deletions are worse than rope iteration?
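To make the locality comparison concrete, here's a minimal sketch in C (my own illustration; `Leaf`, the chunk size of 64, and the `int` payload are all made up) contrasting pointer-chasing iteration over scattered rope leaves with a linear scan over an array:

```c
#include <stddef.h>

/* Hypothetical rope leaf: a small chunk of objects plus a pointer
   to the next leaf. Once edits have scattered the leaves around
   the heap, following `next` is a potential cache miss per step. */
typedef struct Leaf {
    struct Leaf *next;
    size_t count;
    int objects[64];   /* stand-in for real staff objects */
} Leaf;

/* Pointer-chasing traversal: one dependent load per leaf. */
long sum_rope(const Leaf *leaf) {
    long total = 0;
    for (; leaf != NULL; leaf = leaf->next)
        for (size_t i = 0; i < leaf->count; i++)
            total += leaf->objects[i];
    return total;
}

/* Linear traversal: a predictable stream the prefetcher handles well. */
long sum_array(const int *objects, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += objects[i];
    return total;
}
```

Once the leaves are scrambled, each `leaf = leaf->next` is a dependent load that can miss cache, which is the "can't get worse than one miss per step" floor described above.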

How should one go about deciding? The best I've thought of so far is to start by implementing the program using ropes, create a document and do enough edits to generate a worst-case-scenario scrambled rope situation, then judge whether it's still "fast enough." If it is, ropes are a safe choice, so might as well use them. If it isn't, reimplement the program using arrays, create a document and, I guess...keep adding to the document until it gets slow, and decide whether users are likely to want to make documents that large? It's harder to decide what the right metric is for judging the array case, because unlike with ropes, there is no worst case scenario - it can always get worse.
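For the "fast enough" judgment, one straightforward option is to time the hot operation directly. Here's a minimal sketch using POSIX `clock_gettime`; `scramble_rope` and `iterate_rope` are hypothetical stand-ins for whatever the real implementation exposes:

```c
#include <stdio.h>
#include <time.h>

/* Stubs; replace with the real operations under test. */
static void scramble_rope(void) { /* perform many random insertions/deletions */ }
static void iterate_rope(void)  { /* walk the whole document once */ }

int main(void) {
    struct timespec start, end;

    scramble_rope();  /* build the worst-case fragmented rope first */

    clock_gettime(CLOCK_MONOTONIC, &start);
    iterate_rope();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ms = (end.tv_sec - start.tv_sec) * 1e3
              + (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("full iteration over scrambled rope: %.3f ms\n", ms);
    return 0;
}
```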

Modern computers are very fast. You could simply store all text in one contiguous buffer and just move data around when you need to insert/delete items. As long as we are talking about item counts in the thousands or tens of thousands, everything will be fast.
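A minimal sketch of the "just move data around" approach, assuming a plain array with spare capacity; the cost of each edit is one `memmove` of everything after the edit point:

```c
#include <stddef.h>
#include <string.h>

/* Insert one item at `pos` in a buffer holding `count` items
   (sufficient capacity assumed). Everything after the insertion
   point shifts right by one slot. */
void buffer_insert(int *buf, size_t count, size_t pos, int item) {
    memmove(buf + pos + 1, buf + pos, (count - pos) * sizeof *buf);
    buf[pos] = item;
}

/* Delete the item at `pos`, shifting the tail left by one slot. */
void buffer_delete(int *buf, size_t count, size_t pos) {
    memmove(buf + pos, buf + pos + 1, (count - pos - 1) * sizeof *buf);
}
```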

But there is one very simple modification to the contiguous buffer that can reduce the copying around: the gap buffer. It is super simple to implement, it is "data oriented" and memory-friendly, and it works well. https://routley.io/posts/gap-buffer/
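Here's a minimal sketch of the gap buffer's core operations, along the lines of the linked post (the names and the `char` element type are my own choices; growing the buffer when the gap fills up is elided):

```c
#include <stddef.h>
#include <string.h>

/* A gap buffer: one contiguous allocation with an empty "gap"
   kept at the current edit point. Edits at the gap are O(1);
   moving the edit point costs one memmove proportional to the
   distance moved, not to the document size. */
typedef struct {
    char  *data;
    size_t cap;        /* total allocation size */
    size_t gap_start;  /* first empty slot */
    size_t gap_end;    /* one past the last empty slot */
} GapBuffer;

/* Move the gap so it starts at logical index `pos`. */
void gap_move(GapBuffer *gb, size_t pos) {
    if (pos < gb->gap_start) {
        size_t n = gb->gap_start - pos;
        memmove(gb->data + gb->gap_end - n, gb->data + pos, n);
        gb->gap_start -= n;
        gb->gap_end   -= n;
    } else if (pos > gb->gap_start) {
        size_t n = pos - gb->gap_start;
        memmove(gb->data + gb->gap_start, gb->data + gb->gap_end, n);
        gb->gap_start += n;
        gb->gap_end   += n;
    }
}

/* Insert a character at the gap (realloc when the gap is empty
   is elided here). */
void gap_insert(GapBuffer *gb, char c) {
    gb->data[gb->gap_start++] = c;
}

/* Delete the character just before the gap, like backspace. */
void gap_delete(GapBuffer *gb) {
    if (gb->gap_start > 0)
        gb->gap_start--;
}
```

The key property is that consecutive edits at the same spot cost nothing extra; you only pay a `memmove` proportional to how far the edit point moves.
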
Are you pretty confident that the limit where ropes eventually surpass the performance of contiguous buffers is so far out that it isn't worth worrying about, then?
That depends; I certainly ran into that limit once when trying to de-minify a big JavaScript file in an editor (this was nearly 10 years ago, though).

Each time I inserted a newline near the start of the file, it hung for quite some time, while it didn't hang when I inserted the newline near the end.

I doubt that alloc+memcpy was the only thing going on, though. But it was very telling that insertions at the start performed worse than insertions at the end.
Since I want to allow not only arbitrarily many objects on a staff but also arbitrarily many staves, if I were to use a pure contiguous buffer for each staff, as far as I can tell there would be no clean way to avoid situations where a buffer runs out of room and has to be reallocated somewhere else, causing fragmentation. Instead, I'm considering storing each staff's objects in a linked list of (large) fixed-size buffers. Removing an element would entail shifting the elements of that buffer to fill the gap, while leaving any subsequent buffers untouched. Inserting an element would likewise entail shifting the elements of only that buffer to make room, and if there were no empty space at the end of the buffer to accommodate its last element, inserting a new buffer into the list to hold it. Non-local memory accesses from jumping from one buffer to the next wouldn't happen very often, because the buffers would be large - ideally more than large enough to hold as many objects as are likely to fit onscreen at one time - but since they would be boundedly large, the size of the memory block that needs to be copied during an insertion or deletion would also be bounded, unlike in the monolithic buffer case. Just how large the fixed-size buffers are could be tuned to a good compromise.
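A minimal sketch of that scheme as I understand it (`CHUNK_CAP`, the `Chunk` layout, and the overflow handling are my own guesses at the details):

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_CAP 1024   /* tuning knob: objects per buffer */

typedef struct Chunk {
    struct Chunk *next;
    size_t count;            /* occupied slots */
    int objects[CHUNK_CAP];  /* stand-in for real staff objects */
} Chunk;

/* Remove the object at slot `pos` of one chunk: only this chunk's
   tail shifts; every later chunk is untouched. */
void chunk_remove(Chunk *c, size_t pos) {
    memmove(c->objects + pos, c->objects + pos + 1,
            (c->count - pos - 1) * sizeof c->objects[0]);
    c->count--;
}

/* Insert `item` at slot `pos` (pos < CHUNK_CAP assumed; an append
   past the end of a full chunk would target the next chunk
   instead). If the chunk is full, its last object spills into a
   fresh chunk linked after it, so the shift stays bounded. */
void chunk_insert(Chunk *c, size_t pos, int item) {
    if (c->count == CHUNK_CAP) {
        Chunk *spill = calloc(1, sizeof *spill);  /* error check elided */
        spill->next = c->next;
        spill->objects[0] = c->objects[CHUNK_CAP - 1];
        spill->count = 1;
        c->next = spill;
        c->count--;                 /* last object moved out */
    }
    memmove(c->objects + pos + 1, c->objects + pos,
            (c->count - pos) * sizeof c->objects[0]);
    c->objects[pos] = item;
    c->count++;
}
```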

I'm not sure exactly how to handle indexing. It will of course be necessary for each buffer to store the count of currently occupied slots. Summing these element counts would suffice to calculate the location of the nth element of the buffer chain, but having to visit the first several buffers to read their counts every time the data structure is indexed would partially defeat the purpose of trying for locality. However, since consecutive object manipulations tend to be within a screen width of each other rather than being randomly distributed, maybe allowing indexing relative to an arbitrary buffer rather than always relative to the first one in the list would work.
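One way to express that relative indexing is a cursor that remembers a chunk and the logical index of its first element; a hedged sketch, with the same `Chunk` type as in the previous one:

```c
#include <stddef.h>

#define CHUNK_CAP 1024
typedef struct Chunk {          /* same Chunk as the earlier sketch */
    struct Chunk *next;
    size_t count;
    int objects[CHUNK_CAP];
} Chunk;

/* A cursor remembers a chunk and the logical index of its first
   element, so lookups near a recent edit don't re-sum counts from
   the head of the list. */
typedef struct {
    Chunk *chunk;
    size_t base;   /* logical index of chunk->objects[0] */
} Cursor;

/* Resolve logical index n to a slot, walking forward from the
   cursor; falls back to the head when n precedes the cursor. */
int *locate(Chunk *head, Cursor *cur, size_t n) {
    if (cur->chunk == NULL || n < cur->base) {
        cur->chunk = head;  /* restart from the front */
        cur->base = 0;
    }
    while (cur->chunk && n >= cur->base + cur->chunk->count) {
        cur->base += cur->chunk->count;
        cur->chunk = cur->chunk->next;
    }
    return cur->chunk ? &cur->chunk->objects[n - cur->base] : NULL;
}
```

One caveat with this: any edit before the cursor changes the counts its `base` was summed from, so the cursor would need to be adjusted or reset when the document changes ahead of it.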

This reminds me of this short talk:

[embedded talk]
Basically, Word has a single insertion buffer that gets manipulated as you make edits. When you change edit points, the data between the old and new edit points gets shifted.
My understanding is that this Word structure is about how to associate objects with each other in a persistent way. That is something I also need to be able to do, but what I've described so far is only about where the objects go in memory in the first place. I wonder what Word's approach is to memory layout.

As it is, I believe the way I associate objects is similar to the system Word uses, minus the clever part about making it relative to the most recent edit points: alongside the objects array, I have an object_indices array, whose elements stand for indices into the objects array. Each object, in turn, has a field, called its address, that holds the index of the object_indices element that points at it. When an object is removed, its address is added to a free list. When an object is added, if there are any addresses on the free list, the most recent one is popped and the object's index is written into object_indices at that address; otherwise, the object's index is added to the end of object_indices. For both insertions and deletions, it's necessary to iterate through the objects starting at the edit point, chase each one's address field into the object_indices array, and increment or decrement the index there to be consistent with the object's new position. Given this system, though an object's index in the objects array can change over its lifetime, its address never changes, so anyone who wants to track an object does so by storing the latter.
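A minimal sketch of that addressing scheme as described (the fixed `MAX_OBJECTS` capacity, the `int payload` field, and the omitted bounds checks are simplifications):

```c
#include <stddef.h>

#define MAX_OBJECTS 1024

typedef struct {
    int    payload;   /* stand-in for real object data */
    size_t address;   /* slot in object_indices that points here */
} Object;

typedef struct {
    Object objects[MAX_OBJECTS];        /* dense, ordered storage */
    size_t count;
    size_t object_indices[MAX_OBJECTS]; /* address -> index in objects */
    size_t free_addresses[MAX_OBJECTS]; /* recycled address slots */
    size_t free_count;
    size_t next_address;                /* end of object_indices so far */
} Store;

/* Insert at `index`, shifting later objects right and bumping the
   index stored at each shifted object's (stable) address. */
size_t store_insert(Store *s, size_t index, int payload) {
    for (size_t i = s->count; i > index; i--) {
        s->objects[i] = s->objects[i - 1];
        s->object_indices[s->objects[i].address] = i;
    }
    size_t address = s->free_count > 0
        ? s->free_addresses[--s->free_count]  /* pop the free list */
        : s->next_address++;                  /* or grow the end */
    s->objects[index].payload = payload;
    s->objects[index].address = address;
    s->object_indices[address] = index;
    s->count++;
    return address;   /* callers track the object by this */
}

/* Remove at `index`: its address goes on the free list, and the
   shifted objects' stored indices are decremented. */
void store_remove(Store *s, size_t index) {
    s->free_addresses[s->free_count++] = s->objects[index].address;
    for (size_t i = index; i + 1 < s->count; i++) {
        s->objects[i] = s->objects[i + 1];
        s->object_indices[s->objects[i].address] = i;
    }
    s->count--;
}

/* Follow a stable address to the object's current position. */
Object *store_lookup(Store *s, size_t address) {
    return &s->objects[s->object_indices[address]];
}
```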

If objects were stored in a list of fixed-size buffers as I described before, the number of indices it would be necessary to increment per edit would again be bounded by the buffer size, but using the Word approach would in most cases be even better. What I'm wondering now, though, is whether there is some adaptation of the Word approach that solves the data layout problem I was trying to solve before just as well as it solves the object association problem.
