Compiler Optimization: Partial Inlining
Inlining is one of the most impactful optimizations a compiler can apply. Replacing a function call with the function body eliminates call overhead, enables constant propagation, and exposes the inlined code to further optimization in context. But inlining has a cost: if a function is large and is called from many sites, inlining it everywhere causes code size to grow substantially, increasing instruction cache pressure.
The traditional choice is binary: inline or do not inline. Partial inlining is a more nuanced approach that the compiler may take when a function has a recognizable structure: a small, common case and a larger, rare case. The compiler inlines only the small common path and calls a separately compiled version for the rare path.
This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt.
The Structure That Enables Partial Inlining
Partial inlining works most effectively when a function has a clear fast path and slow path:
int expensive_computation(unsigned int value); // defined elsewhere

int process(unsigned int value) {
    if (value <= 100) {
        return value * 2; // fast path: simple, frequent
    }
    return expensive_computation(value); // slow path: complex, infrequent
}

int compute(unsigned int a, unsigned int b) {
    return process(a) + process(b);
}
The fast path is a single comparison and a multiply. The slow path calls another function. Assume that value <= 100 holds for most calls, so the slow path is rarely taken.
What the Compiler Does
The compiler performs two steps.
Step 1: Function outlining. The compiler splits process into two parts:
- The original process function keeps the fast path (the comparison and the value * 2 computation) and calls the outlined part otherwise.
- A new generated function, commonly named something like process.part.0, contains the slow path.
Step 2: Partial inlining into the caller. At the call sites in compute, the compiler inlines the fast path directly:
compute:
    cmp edi, 100            ; is a <= 100?
    jbe .L_fast_a           ; if so, take the fast path
    call process.part.0     ; slow path: call the outlined function
    jmp .L_done_a           ; skip the inlined fast path
.L_fast_a:
    lea eax, [rdi+rdi]      ; fast path: a * 2 (inlined)
.L_done_a:
    ; ... same pattern for b (register saves around the call omitted) ...
The fast path executes entirely within compute without any function call overhead. The slow path still calls a function, but that function contains all the complexity that would otherwise inflate the inlined code at every call site.
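At the source level, the two steps correspond roughly to the hand-written C++ below. This is a sketch only: process_part_0 is a hypothetical stand-in for the compiler-generated process.part.0 (whose real name is not a valid C++ identifier), the body of expensive_computation is a placeholder, and [[gnu::noinline]] assumes GCC or Clang.

```cpp
// Placeholder body for the article's expensive_computation (assumption).
int expensive_computation(unsigned int value) {
    return static_cast<int>(value % 1000) + 1000;
}

// Step 1: hypothetical hand-written counterpart of process.part.0.
// It holds only the slow path and is kept out of line.
[[gnu::noinline]] int process_part_0(unsigned int value) {
    return expensive_computation(value);
}

// What remains of process: the cheap test, the fast path,
// and a call to the outlined part.
int process(unsigned int value) {
    if (value <= 100) {
        return static_cast<int>(value * 2);
    }
    return process_part_0(value);
}

// Step 2: compute as it might look once the fast path has been
// inlined at both call sites. Only the slow branch still makes a call.
int compute_after_partial_inlining(unsigned int a, unsigned int b) {
    int ra = (a <= 100) ? static_cast<int>(a * 2) : process_part_0(a);
    int rb = (b <= 100) ? static_cast<int>(b * 2) : process_part_0(b);
    return ra + rb;
}
```

With both arguments at or below 100, compute_after_partial_inlining makes no calls at all; only larger arguments reach process_part_0.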
The Benefits
Inlining benefits for the common case. The fast path has no call instruction, no return, and no function prologue/epilogue overhead. More importantly, the compiler can see both a and the fast-path computation together, enabling constant propagation and further simplification if the value of a is known.
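The folding described above can be made visible in source form. The sketch below marks process as constexpr purely so that a static_assert can demonstrate the effect; this is an illustrative device, not something partial inlining requires, and the optimizer performs analogous folding on ordinary functions once the fast path is inlined next to a constant argument. The body of expensive_computation and the name compute_with_known_a are assumptions made for the example.

```cpp
// Placeholder for the article's expensive_computation (assumption).
// Deliberately not constexpr: only the fast path can be folded.
int expensive_computation(unsigned int value) {
    return static_cast<int>(value % 1000) + 1000;
}

constexpr int process(unsigned int value) {
    if (value <= 100) {
        return static_cast<int>(value * 2); // fast path
    }
    return expensive_computation(value);    // slow path
}

// Both arguments are known and at most 100, so only the fast path is
// evaluated and the whole expression reduces to a constant.
static_assert(process(10) + process(20) == 60,
              "fast path folds when the argument is known");

// Hypothetical caller where a is a known constant and b is not:
// the first term can fold to 20, the second keeps the runtime test.
int compute_with_known_a(unsigned int b) {
    return process(10) + process(b);
}
```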
Code size control. The expensive slow path exists only once in the binary, in process.part.0. No matter how many times process is called throughout the program, the slow path code is not duplicated.
External linkage preserved. The original process function still exists and is callable from other translation units. The optimization is transparent to external callers.
Conditions for This Optimization to Fire
Partial inlining depends on several factors:
The function must have a recognizable hot/cold structure. A function that is uniformly expensive throughout is not a candidate, because there is no “small part” to inline.
The slow path must be large enough. If the entire function is small, full inlining is cheaper than partial inlining. The benefit of partial inlining comes from avoiding code size growth at call sites where the slow path is rarely taken.
Compiler support varies. GCC implements partial inlining (controlled by the -fpartial-inlining option) and applies it in cases like this example. Clang’s inliner may handle the same function differently: it might fully inline, not inline, or produce a different split. The behavior depends on the specific code and compiler version.
Heuristics govern the decision. The compiler estimates the size and frequency of each path using static heuristics or, if available, profile data. Profile-Guided Optimization (PGO) data can significantly improve the accuracy of these estimates and make partial inlining more effective.
Interaction with [[likely]] and [[unlikely]]
In C++20, you can annotate branches with [[likely]] and [[unlikely]] to inform the compiler about expected branch frequencies:
int process(unsigned int value) {
    if (value <= 100) [[likely]] {
        return value * 2;
    }
    return expensive_computation(value);
}
This annotation does not directly control partial inlining, but it influences code layout and can affect the compiler’s cost estimates for inlining decisions.
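For code that cannot use C++20, GCC and Clang offer the __builtin_expect builtin, which conveys a similar hint. The sketch below assumes one of those compilers; the body of expensive_computation is a placeholder. As with the attributes, this is a hint only and gives no guarantee about whether partial inlining happens.

```cpp
// Placeholder for the article's expensive_computation (assumption).
int expensive_computation(unsigned int value) {
    return static_cast<int>(value % 1000) + 1000;
}

int process(unsigned int value) {
    // __builtin_expect(expr, expected) evaluates to expr and tells the
    // compiler which value expr is expected to have (here: true).
    if (__builtin_expect(value <= 100, 1)) {
        return static_cast<int>(value * 2);
    }
    return expensive_computation(value);
}
```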
Writing Code That Helps
The most important thing is to structure functions with a clear separation between the common, simple case and the rare, complex case. This is good practice regardless of partial inlining: it makes code easier to read and reason about, and it is the structure that enables the compiler to apply this optimization.
Avoid deeply nested logic where the fast and slow paths are interleaved. A clear early-return or a top-level branch is easier for the compiler to analyze.
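As an illustration of that advice, the sketch below shows the same logic written twice: once with the fast path buried inside nested branches, and once with a single top-level test and an early return. All function names and helper bodies are hypothetical, chosen only for the example, and whether a given compiler treats the two versions differently should be checked rather than assumed.

```cpp
// Hypothetical rare-case helpers; names and bodies are placeholders.
int handle_negative(int x) { return x / 2 - 1000; }
int handle_large(int x) { return x / 2 + 1000; }

// Before: the fast path (x * 2) sits inside nested branches,
// interleaved with the two rare cases.
int scale_nested(int x) {
    int result;
    if (x >= 0) {
        if (x > 100) {
            result = handle_large(x);
        } else {
            result = x * 2;
        }
    } else {
        result = handle_negative(x);
    }
    return result;
}

// After: the rare cases are gathered into one slow-path function...
int scale_slow(int x) {
    return (x < 0) ? handle_negative(x) : handle_large(x);
}

// ...and one top-level test guards the fast path with an early return.
int scale(int x) {
    if (x >= 0 && x <= 100) {
        return x * 2;
    }
    return scale_slow(x);
}
```

Both versions return the same result for every input; only the shape of the control flow differs.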
Use Compiler Explorer to verify that partial inlining has fired for your function. Look for generated symbols with names like function.part.N; their presence indicates that the compiler has outlined the slow path. The call sites should show the fast path inlined and a call to the outlined version for the slow branch.