yuqi-zheng

Compiler Optimization: Clang's Algebraic Simplifications


Compilers are not just translators. They are reasoning systems that apply mathematical knowledge to transform code into more efficient equivalents. One of the most striking demonstrations of this is Clang’s ability to recognize a simple summation loop and replace it with the Gaussian sum formula —eliminating the loop entirely.

This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt.

The Example

Consider a function that sums all integers from 0 to value - 1:

int sum_up_to(int value) {
    int result = 0;
    for (int x = 0; x < value; ++x) {
        result += x;
    }
    return result;
}

This is O(n). The loop runs value times and performs one addition per iteration. For large values of value, this can take significant time.

GCC: Smart, But Still a Loop

GCC applies several optimizations. It adds a bounds check to handle the case where value is zero or negative. It uses lea instructions for efficient addition. Most interestingly, it pairs up iterations —processing two values of x per loop iteration using the identity x + (x+1) = 2x + 1:

.L3:
    lea edx, [rdx + 1 + rax * 2]  ; result += 2x + 1 (i.e., x + (x+1))
    add eax, 2                    ; x += 2
    cmp edi, eax
    jne .L3

This halves the number of iterations. It is a meaningful optimization —but the algorithm is still O(n). The loop remains.

Clang: No Loop at All

Clang at -O2 generates no loop whatsoever. The entire function compiles to a handful of arithmetic instructions:

lea   eax, [rdi - 1]     ; eax = value - 1
lea   ecx, [rdi - 2]     ; ecx = value - 2
imul  rcx, rax           ; rcx = (value - 1) * (value - 2)
shr   rcx                ; rcx /= 2
lea   eax, [rdi + rcx]   ; eax = value + rcx
dec   eax                ; eax -= 1
ret

This computes:

result = value + ((value - 1) * (value - 2) / 2) - 1

Let’s verify this algebraically. Expanding and simplifying:

  value + ((value - 1)(value - 2) / 2) - 1
= value + (value虏 - 3路value + 2) / 2 - 1
= (2路value + value虏 - 3路value + 2 - 2) / 2
= (value虏 - value) / 2
= value 路 (value - 1) / 2

This is exactly the Gaussian sum formula: the sum of integers from 0 to n-1 is n(n-1)/2. Clang has recognized the loop pattern and replaced it with the closed-form expression. The algorithm went from O(n) to O(1).

How Clang Does This

Clang’s optimizer includes a pass called scalar evolution (SCEV) analysis, which tracks how values change across loop iterations. In this loop:

  • x increases by 1 on each iteration (linear induction variable)
  • result accumulates x at each step (sum of a linear sequence)
  • The sum of consecutive integers 0 through n-1 is a well-known closed form

SCEV identifies the accumulation pattern and the loop bounds, then applies the algebraic identity to replace the loop with the formula. This is not a special case hardcoded for Gaussian sums —it is a general capability of the induction variable analysis and the algebraic simplification framework.

The optimization only fires when the conditions are clean: a simple loop with a non-negative bound, no side effects, a straightforward accumulation. More complex loops with conditionals or early exits will not trigger this transformation.

Why Not Just Write the Formula?

The question arises: if the formula is known, why write the loop?

Several reasons to prefer the loop in source code:

Clarity of intent. The loop is immediately obvious to any reader: sum from 0 to value - 1. The formula value * (value - 1) / 2 requires the reader to recognize what it computes. For more complex accumulations, the formula may not even be recognizable.

Generality. The loop structure makes it easy to add conditions, skip certain values, or change the accumulation operation. The formula must be re-derived for each variation.

Correctness by construction. The loop is hard to get wrong. The formula has edge cases around integer overflow and correct handling of value == 0 that require care.

Compiler trust. As this example demonstrates, the compiler can derive the formula from the loop. The developer does not need to do this manually.

The Limits of This Optimization

Not every loop with an accumulation gets converted to a closed form. Clang applies this when:

  • The loop bounds are known or can be expressed in terms of the loop variable
  • The accumulated value follows a polynomial pattern over iterations
  • There are no function calls, memory writes, or other side effects in the loop body

When these conditions are not met, Clang falls back to a conventional loop with the usual optimizations applied.

Takeaways

This example illustrates the depth of optimization that modern compilers apply. Clang’s optimizer does not merely simplify individual instructions —it performs high-level mathematical reasoning about entire loops. An O(n) algorithm becomes O(1) through algebraic analysis, without any change to the source code.

Write clear, straightforward code. Enable optimizations. Verify the result in Compiler Explorer when it matters. The compiler frequently knows things about your code that are not obvious from reading it.