I made a comment in yesterday's Q&A about doing a merge sort from the ground up and thought I would put the code here in a more readable format:

```cpp
Entry* Source = First;
// Out must point at a scratch buffer with room for Count entries (called Temp here)
Entry* Out = Temp;
for(u32 RangeSize = 1; RangeSize < Count; RangeSize *= 2)
{
    for(u32 StartIndex = 0, MidIndex = RangeSize;
        MidIndex < Count;
        StartIndex += RangeSize*2, MidIndex += RangeSize*2)
    {
        Entry* Start = Source + StartIndex;
        Entry* Mid = Source + MidIndex;
        // Clamp against the buffer we are reading from, which may be the scratch buffer
        Entry* End = Min(Mid + RangeSize, Source + Count);
        Entry* Dest = Out + StartIndex;
        //Note(ratchet): this is the exact same merge code as Casey's
        // hoisted out for laziness reasons
        Merge(Start, Mid, End, Dest);
    }

    //Note(ratchet): We may have skipped the last interval
    // A trailing interval of RangeSize entries or fewer has no right half,
    // so the inner loop never reaches it and it has to be copied across as-is
    u32 RemainingEntries = Count % (RangeSize * 2);
    if(RemainingEntries <= RangeSize)
    {
        for(u32 EntryIndex = Count - RemainingEntries; EntryIndex < Count; ++EntryIndex)
        {
            Out[EntryIndex] = Source[EntryIndex];
        }
    }

    // Ping-pong: the buffer we just wrote becomes the source of the next pass
    Entry* Tmp = Out;
    Out = Source;
    Source = Tmp;
}

//Note(ratchet): we could have gone through the outer loop an odd number of times
// so we may need to copy from the temporary buffer back to the input buffer
if(First != Source)
{
    for(u32 EntryIndex = 0; EntryIndex < Count; ++EntryIndex)
    {
        First[EntryIndex] = Source[EntryIndex];
    }
}
```
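The Merge call in the loop above is not shown in this post, so here is a minimal sketch of what such a routine could look like. This is not Casey's actual merge code. The Entry layout (a float SortKey plus an Index) is an assumption made for illustration, as is the choice to take from the left range on ties.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

// Hypothetical entry layout; the post does not show the real one.
struct Entry
{
    float SortKey;
    u32 Index;
};

// Merges the sorted ranges [Start, Mid) and [Mid, End) into Dest.
// Dest needs room for End - Start entries and must not overlap the input.
static void Merge(Entry* Start, Entry* Mid, Entry* End, Entry* Dest)
{
    assert((Start <= Mid) && (Mid <= End));

    Entry* Left = Start;
    Entry* Right = Mid;
    while((Left < Mid) && (Right < End))
    {
        // Only take from the right when it is strictly smaller,
        // so equal keys keep their original order (stable merge).
        if(Right->SortKey < Left->SortKey)
        {
            *Dest++ = *Right++;
        }
        else
        {
            *Dest++ = *Left++;
        }
    }

    // At most one of these two loops does any work.
    while(Left < Mid)
    {
        *Dest++ = *Left++;
    }
    while(Right < End)
    {
        *Dest++ = *Right++;
    }
}
```

Because it only ever reads from one buffer and writes to the other, a routine shaped like this drops straight into the ping-pong scheme without any extra copying.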

First and foremost is the removal of the (non-tail-optimizable) recursion in favor of the explicit loop. Mergesort is one of those algorithms that looks very elegant when expressed recursively but has a more efficient iterative form (kind of like Fibonacci, except that here the recursion only costs you O(log n) stack memory plus the function call overhead).

Second is that ping-ponging the buffers is much easier in this form.
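To make the ping-pong bookkeeping concrete, here is a small sketch that counts how many times the outer loop runs and therefore which buffer holds the result at the end. MergePassCount and ResultEndsInScratch are hypothetical helper names, not part of the code in this post.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

// Number of times the outer merge loop runs for Count entries
// when RangeSize starts at StartRangeSize and doubles each pass.
static u32 MergePassCount(u32 Count, u32 StartRangeSize)
{
    assert(StartRangeSize > 0);

    u32 Passes = 0;
    // 64-bit counter so the doubling cannot wrap around for very large Count
    for(uint64_t RangeSize = StartRangeSize; RangeSize < Count; RangeSize *= 2)
    {
        ++Passes;
    }
    return Passes;
}

// Every pass swaps Source and Out, so after an odd number of passes
// the sorted data sits in the scratch buffer and needs the final copy back.
static bool ResultEndsInScratch(u32 Count, u32 StartRangeSize)
{
    return (MergePassCount(Count, StartRangeSize) & 1) != 0;
}
```

For example, 8 entries starting at a range size of 1 take 3 passes (range sizes 1, 2, 4), so the result lands in the scratch buffer, while 16 entries take 4 passes and end up back in the input buffer.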

Third is that you can now easily add a pre-pass to sort chunks of size n with a fast insertion sort and start the outer loop at that size n instead of 1. Which START_RANGE_SIZE and which SimpleSort to use is something to figure out while profiling.

```cpp
for(u32 StartIndex = 0; StartIndex < Count; StartIndex += START_RANGE_SIZE)
{
    Entry* Start = Source + StartIndex;
    // The last chunk may be shorter than START_RANGE_SIZE
    Entry* End = Min(Start + START_RANGE_SIZE, Source + Count);
    //Note(ratchet): this is the fast sort of your choice optimized for ranges of
    // small sizes
    SimpleSort(Start, End);
}

for(u32 RangeSize = START_RANGE_SIZE; RangeSize < Count; RangeSize *= 2)
{
    //...
```
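As one possible candidate for SimpleSort, here is a plain insertion sort over the half-open range [Start, End). It is only a sketch: the Entry layout (float SortKey plus an Index) is assumed for illustration, and whether insertion sort is actually the best choice for your chunk size is exactly what the profiling is for.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t u32;

// Hypothetical entry layout; the post does not show the real one.
struct Entry
{
    float SortKey;
    u32 Index;
};

// Insertion sort over [Start, End), intended for small ranges only.
static void SimpleSort(Entry* Start, Entry* End)
{
    assert(Start <= End);

    u32 Count = (u32)(End - Start);
    for(u32 EntryIndex = 1; EntryIndex < Count; ++EntryIndex)
    {
        Entry Value = Start[EntryIndex];
        u32 InsertIndex = EntryIndex;
        // Shift larger entries one slot to the right. The strict comparison
        // leaves equal keys in their original order, so the sort is stable.
        while((InsertIndex > 0) && (Value.SortKey < Start[InsertIndex - 1].SortKey))
        {
            Start[InsertIndex] = Start[InsertIndex - 1];
            --InsertIndex;
        }
        Start[InsertIndex] = Value;
    }
}
```

Keeping the pre-pass stable matters if the merge step is stable too, because then the whole sort preserves the order of entries with equal keys.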