I made a comment about doing a merge sort from the ground up in Yesterday's Q&A and thought I would put the code here in a more readable format:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49 | Entry* Source = First;
Entry* Out
for(u32 RangeSize = 1;
RangeSize < Count;
RangeSize *= 2)
{
for(u32 StartIndex = 0, MidIndex = RangeSize;
MidIndex < Count;
StartIndex += RangeSize*2, MidIndex += RangeSize*2)
{
entry* Start = Source + StartIndex;
entry* Mid = Source + MidIndex;
entry* End = Min(Mid+RangeSize, First+Count);
entry* Dest = Out + StartIndex;
//Note(ratchet): this is the exact same merge code as Casey's
// hoisted out for laziness reasons
Merge(Start, Mid, End, Dest);
}
// Note(ratchet): We may have skipped the last interval
u32 RemainingEntries = Count % (RangeSize * 2);
if(RemainingEntries < RangeSize)
{
for(u32 EntryIndex = Count - RemainingEntries ;
EntryIndex < Count;
++Start)
{
Out[EntryIndex] = Source[EntryIndex];
}
}
entry* Tmp = Out;
Out = Source;
Source = Tmp;
}
//Note(ratchet): we could have gone through the outer loop an odd number of times
// so we may need to copy from the temporary buffer back to the input buffer
if(First != Source)
{
for(u32 EntryIndex = 0;
EntryIndex < Count;
++Start)
{
First[EntryIndex] = Source[EntryIndex];
}
}
|
This has a few advantages:
First and foremost is the removal of the (non-tail-optimizable) recursion in favor of the explicit loop. Mergesort is one of those algorithms that look very elegant when expressed in recursive format but has a more efficient form in the iterative format (kinda like fibonacci but with only the O(log n) memory cost + function call overhead).
Second is that ping-ponging the buffers is much easier in this form.
Third is that you can now easily add a pre-pass to sort chunks of size n with a fast insertion sort and start the outer loop at that size n instead of 1. Finding which START_RANGE_SIZE to use and which SimpleSort is for during profiling.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18 | for(u32 StartIndex = 0;
StartIndex < Count;
StartIndex += START_RANGE_SIZE)
{
Entry* Start = Source + StartIndex;
Entry* End = Min(Start + RangeSize, First+Count);
//Note(ratchet): this is the fast sort of your choice optimized for ranges of
// small sizes
SimpleSort(Start, End);
}
for(u32 RangeSize = START_RANGE_SIZE;
RangeSize < Count;
RangeSize *= 2)
{
//...
|