yuqi-zheng

Compiler Optimization: Loop-Invariant Code Motion (LICM)


Loop-Invariant Code Motion (LICM) is one of the foundational optimizations a compiler performs. The idea is straightforward: if a computation inside a loop produces the same result on every iteration —because it does not depend on the loop variable and has no side effects —the compiler can safely move it outside the loop and compute it just once.

This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt.

What Makes Code “Loop-Invariant”?

A piece of code is loop-invariant if it satisfies all three of the following conditions:

  • It does not depend on the loop induction variable or any value that changes across iterations.
  • It has no observable side effects.
  • It produces the same result every time it is evaluated.

When these conditions hold, executing the code once before the loop is semantically equivalent to executing it on every iteration —and far cheaper.

The Classic Case: vec.size()

Consider a simple indexed loop over a std::vector:

for (size_t i = 0; i < vec.size(); ++i) {
    // process vec[i]
}

The call to vec.size() is made on every iteration, but the size of the vector does not change. The compiler recognizes this, computes vec.size() once (typically as the difference between the end and begin pointers, shifted right by log2(sizeof(int))), and uses that cached result for the loop bound. The generated assembly contains a single size calculation before the loop, not one per iteration.

A More Interesting Case: Hoisting a Function Call

LICM becomes more interesting when the loop-invariant computation is a function call rather than a simple expression. Consider this function that counts characters within a given range:

std::pair<char, char> get_range(RangeType type);

int count_in_range(std::string_view s, RangeType type) {
    int num = 0;
    for (char c : s) {
        auto [min, max] = get_range(type);
        if (c >= min && c <= max) num++;
    }
    return num;
}

The call to get_range(type) is inside the loop, but type does not change across iterations and get_range has no side effects. A smart compiler should hoist this call out of the loop.

Clang at -O1

Clang performs the hoist correctly. It emits a single call to get_range before the loop begins, extracts the min and max values into registers, then runs a tight inner loop that uses only register comparisons —no function calls, no memory accesses for the range values.

The compiler is able to do this because it can see the function body (or because the function is annotated with [[gnu::pure]]), which proves that the result depends only on the arguments and produces no observable side effects.

GCC’s Limitation

As of this writing, GCC does not perform this hoist. It calls get_range on every iteration, even when the function is annotated with [[gnu::const]]. The root cause appears to be insufficient support for Common Subexpression Elimination (CSE) on functions that return a struct type such as std::pair. This prevents the LICM analysis from firing. The issue has been filed as GCC Bug #122226.

This is a concrete example of where two production compilers differ in their optimization capability for identical, idiomatic C++ code.

Branchless Counting Inside the Loop

Once Clang has hoisted the range call, the inner loop itself demonstrates another neat trick: branchless conditional counting. Instead of an if statement that would introduce a conditional branch, the compiler uses flag-setting instructions:

cmp ecx, edi     ; is c >= min?
setle r8b        ; r8b = 1 if true, 0 otherwise
cmp edx, edi     ; is c <= max?
setge dil        ; dil = 1 if true, 0 otherwise
and dil, r8b     ; both conditions must hold
movzx edi, dil
add rax, rdi     ; accumulate 0 or 1

The setle and setge instructions write a 0 or 1 based on the CPU flags without any branch. The and combines both conditions, and the result is added to the running total. This eliminates branch mispredictions entirely in the inner loop.

Helping the Compiler Hoist Function Calls

If you have a pure function that the compiler cannot see through (for example, it is defined in a separate translation unit and not available for inlining), you can hint to the compiler using GCC/Clang attributes:

  • [[gnu::pure]]: declares that the function reads global memory but has no side effects. The result may vary across calls if global state changes, but within a loop where no relevant globals are modified, LICM can apply.
  • [[gnu::const]]: a stronger declaration that the function depends only on its arguments and reads no global state. Use this only when it is genuinely true.

These attributes allow the compiler to reason about function calls across loop iterations in the same way it reasons about arithmetic expressions.

Takeaways

Do not manually hoist function calls for performance. Code like this is harder to read and maintain:

auto [min, max] = get_range(type);  // manual hoist
for (char c : s) {
    if (c >= min && c <= max) num++;
}

Write the natural, intent-revealing version. Let the compiler decide. In the Clang case, it already does the right thing. In the GCC case, the bug should eventually be fixed.

Verify with Compiler Explorer. When you care about whether a particular hoist has fired, paste your code into Compiler Explorer and check the assembly. Do not assume either way.

Make functions visible. The compiler can only hoist calls it can prove are pure. Functions defined in separate translation units without LTO are opaque —the compiler must assume they have arbitrary side effects and cannot hoist them. Keeping hot utility functions in headers (or using LTO) gives the optimizer the visibility it needs.