One algorithm that is rather neat is stable in-place merge sort using binary search and rotations.
The top level is ordinary merge sort, but the merging step is done in place by yet another recursive divide-and-conquer algorithm.
Given the two sorted halves H0 and H1, take the middle element of H0 and binary-search for it as a key in H1. Everything in H1 below the located index must therefore come before the H0 midpoint in the final order. Now rotate the block formed by the upper half of H0 and the lower part of H1, which takes O(n) time (one way to do it, though not the fastest, is to reverse each of the two sub-blocks and then reverse the whole range).
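The triple-reversal rotation mentioned above can be sketched like this (class and method names are my own; the element type is int for brevity):

```java
// Rotation via triple reversal: reverse each of the two blocks,
// then reverse the whole range.  O(n) time, O(1) extra space.
final class Rotate {

    // Reverses a[lo..hi) (half-open range).
    static void reverse(int[] a, int lo, int hi) {
        for (hi--; lo < hi; lo++, hi--) {
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
    }

    // Rotates a[lo..hi) left so that a[mid] becomes the first element.
    static void rotate(int[] a, int lo, int mid, int hi) {
        reverse(a, lo, mid);   // reverse the first block
        reverse(a, mid, hi);   // reverse the second block
        reverse(a, lo, hi);    // reverse the whole range
    }
}
```

As the text says, this is not the fastest rotation (a cycle-following or block-swap rotation does fewer element moves), but it is the easiest to get right.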
Now we have two new parts that are ordered correctly relative to each other (everything in the left part belongs before everything in the right) but are not internally sorted, so we finish by merging each part with its own recursive call, stopping at the base case of runs with at most one element.
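Here is a minimal sketch of the whole algorithm. One detail the description above glosses over: always splitting H0 can fail to make progress once H0 is down to a single element, so this version splits whichever run is longer (the same fix used by buffer-free merges in e.g. libstdc++). Names are my own and the element type is int for brevity:

```java
// Stable in-place merge sort: the merge uses binary search + rotation
// instead of an auxiliary buffer.  O(1) extra space, O(n*(log n)^2) time.
final class InPlaceMergeSort {

    static void sort(int[] a) { sort(a, 0, a.length); }

    // Sorts a[lo..hi) (half-open range).
    static void sort(int[] a, int lo, int hi) {
        if (hi - lo < 2) return;                 // 0 or 1 elements: done
        int mid = lo + (hi - lo) / 2;
        sort(a, lo, mid);
        sort(a, mid, hi);
        merge(a, lo, mid, hi);
    }

    // Merges the sorted runs a[lo..mid) and a[mid..hi) in place.
    static void merge(int[] a, int lo, int mid, int hi) {
        int len1 = mid - lo, len2 = hi - mid;
        if (len1 == 0 || len2 == 0) return;
        if (len1 + len2 == 2) {                  // two elements: compare-swap
            if (a[mid] < a[lo]) { int t = a[lo]; a[lo] = a[mid]; a[mid] = t; }
            return;
        }
        int cut1, cut2;
        if (len1 >= len2) {                      // split the longer run so both
            cut1 = lo + len1 / 2;                // subproblems strictly shrink
            cut2 = lowerBound(a, mid, hi, a[cut1]);
        } else {
            cut2 = mid + len2 / 2;
            cut1 = upperBound(a, lo, mid, a[cut2]);
        }
        rotate(a, cut1, mid, cut2);              // move small elements left
        int newMid = cut1 + (cut2 - mid);        // split point after rotation
        merge(a, lo, cut1, newMid);              // merge the left part
        merge(a, newMid, cut2, hi);              // merge the right part
    }

    // First index in [from, to) whose element is >= key.
    static int lowerBound(int[] a, int from, int to, int key) {
        while (from < to) {
            int m = (from + to) >>> 1;
            if (a[m] < key) from = m + 1; else to = m;
        }
        return from;
    }

    // First index in [from, to) whose element is > key.
    static int upperBound(int[] a, int from, int to, int key) {
        while (from < to) {
            int m = (from + to) >>> 1;
            if (a[m] <= key) from = m + 1; else to = m;
        }
        return from;
    }

    // Rotate a[lo..hi) so that a[mid] becomes the first element.
    static void rotate(int[] a, int lo, int mid, int hi) {
        reverse(a, lo, mid); reverse(a, mid, hi); reverse(a, lo, hi);
    }

    static void reverse(int[] a, int lo, int hi) {
        for (hi--; lo < hi; lo++, hi--) { int t = a[lo]; a[lo] = a[hi]; a[hi] = t; }
    }
}
```

The lower-bound/upper-bound asymmetry is what keeps the merge stable: equal elements from H0 are never moved past equal elements from H1.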
In total the algorithm runs in O(n*(log n)^2): the in-place merge alone is O(n*log n) (each level does a binary search plus an O(n) rotation and spawns two subproblems of roughly half the size), and feeding that merge cost into the usual merge-sort recurrence via the master theorem adds the second log factor.
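Spelled out, the recurrences behind that bound look roughly like this, writing M(n) for the in-place merge and T(n) for the whole sort (the merge's two subproblems are not exactly halves, but the balancing argument gives the same bound):

```latex
% In-place merge: binary search O(log n) + rotation O(n) per step,
% then two smaller merge subproblems
M(n) = 2\,M(n/2) + O(n) \quad\Longrightarrow\quad M(n) = O(n \log n)

% Whole sort: the standard merge-sort recursion with the costlier merge
T(n) = 2\,T(n/2) + O(n \log n) \quad\Longrightarrow\quad T(n) = O\!\left(n \log^2 n\right)
```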
I once tested this in Java (of all languages) and it was quite a bit slower than merging through an auxiliary buffer. That can be mitigated with a little extra memory: allocate a buffer of around 1% of the original array and switch to ordinary buffered merging once the recursion is deep enough for a run to fit in it. With that change the algorithm ran at roughly 1.5x the time of the full-buffer version while using only 1% extra space. Disclaimer: the test was very crude, and it might well be possible to reduce the overhead of this more complicated algorithm much further.
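A hypothetical version of the 1% trick (not the code from my test, just a sketch of the idea): give the sort a small buffer and fall back to an ordinary buffered merge whenever the left run fits in it. Because the buffer always holds at least a few elements, the rotation path here can safely take its pivot from H0 every time, since tiny runs never reach it:

```java
// Hybrid sketch: rotation-based merge near the top of the recursion,
// classic buffered merge once the left run fits in a small buffer
// (~1% of the input, at least 16 elements).
final class HybridMergeSort {

    static void sort(int[] a) {
        int[] buf = new int[Math.max(16, a.length / 100)];  // ~1% extra space
        sort(a, 0, a.length, buf);
    }

    static void sort(int[] a, int lo, int hi, int[] buf) {
        if (hi - lo < 2) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, lo, mid, buf);
        sort(a, mid, hi, buf);
        merge(a, lo, mid, hi, buf);
    }

    static void merge(int[] a, int lo, int mid, int hi, int[] buf) {
        if (lo >= mid || mid >= hi) return;
        if (mid - lo <= buf.length) {            // left run fits the buffer
            bufferedMerge(a, lo, mid, hi, buf);
            return;
        }
        // Rotation step, exactly as in the prose: split H0 at its midpoint,
        // binary-search that element in H1, rotate, recurse on both parts.
        int cut1 = lo + (mid - lo) / 2;
        int cut2 = lowerBound(a, mid, hi, a[cut1]);
        rotate(a, cut1, mid, cut2);
        int newMid = cut1 + (cut2 - mid);
        merge(a, lo, cut1, newMid, buf);         // left-run length halves each
        merge(a, newMid, cut2, hi, buf);         // step, so we reach the buffer
    }

    // Copy the left run into buf, then merge back into a from the front.
    static void bufferedMerge(int[] a, int lo, int mid, int hi, int[] buf) {
        int n1 = mid - lo;
        System.arraycopy(a, lo, buf, 0, n1);
        int i = 0, j = mid, k = lo;
        while (i < n1 && j < hi)
            a[k++] = (buf[i] <= a[j]) ? buf[i++] : a[j++];  // <= keeps stability
        while (i < n1) a[k++] = buf[i++];        // a[j..hi) is already in place
    }

    // First index in [from, to) whose element is >= key.
    static int lowerBound(int[] a, int from, int to, int key) {
        while (from < to) {
            int m = (from + to) >>> 1;
            if (a[m] < key) from = m + 1; else to = m;
        }
        return from;
    }

    static void rotate(int[] a, int lo, int mid, int hi) {
        reverse(a, lo, mid); reverse(a, mid, hi); reverse(a, lo, hi);
    }

    static void reverse(int[] a, int lo, int hi) {
        for (hi--; lo < hi; lo++, hi--) { int t = a[lo]; a[lo] = a[hi]; a[hi] = t; }
    }
}
```

With a fixed-fraction buffer the asymptotic space is still O(n), of course; the point is purely the constant factor, trading 1% extra memory for most of the buffered merge's speed.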