In this case inlining is not the problem, in the sense that it "replicates" the function body. The compiler's optimizer should still optimize the replicated code.
Look at this example:
| inline int inc(int x)
{
    return x + 1;
}
...
int a = ...;
a = inc(a);
a = inc(a);
a = inc(a);
a = inc(a);
a = inc(a);
a = inc(a);
|
When inlining the inc function, you don't expect the compiler to just replicate the function body and stop:
| int a = ...;
a = a + 1;
a = a + 1;
a = a + 1;
a = a + 1;
a = a + 1;
a = a + 1;
|
You expect the compiler to optimize the inlined code further, collapsing the six increments into a single addition (effectively a = a + 6;). So inlining doesn't increase code size in this example. It actually reduces it, because adding the constant 6 to an integer variable is much shorter machine code than calling a function 6 times.
And that is what all the modern compilers I am aware of (MSVC, Clang, GCC) will do in this example.
As for why it happened in HH: it's because MSVC is not very good at optimizing. Simple as that. I compiled yesterday's code with Clang, and it produced approximately the same stuff MSVC produces after Casey manually inlined the functions. So Clang with function calls (yesterday's code) generates more or less the same code (performance-wise) as MSVC after manual inlining (today's code).
Why is Clang so much better here while MSVC isn't? I don't know. I do know that a lot of people are working on Clang's optimizer so it can optimize these kinds of situations as much as possible. For more information on this topic, see the "Zero-Cost Abstractions and Future Directions for Modern Optimizing Compilers" presentation by Chandler Carruth:
* slides -
http://llvm.org/devmtg/2012-11/Carruth-OptimizingAbstractions.pdf
* video -
http://llvm.org/devmtg/2012-11/vi...arruth-OptimizingAbstractions.mp4
In the general case, of course, you don't want the compiler to inline large functions too much, because of the limited size of the CPU instruction cache. And compilers know that.