yuqi-zheng

Compiler Optimization: How Pointer Aliasing Kills Performance


Two pointers alias when they refer to the same memory location. In C and C++, the compiler frequently cannot prove that two pointers do not alias, and when it cannot prove this, it must assume the worst. The result is generated code that reads and writes memory far more often than necessary, defeating optimizations that would otherwise be straightforward.

A Concrete Example

Consider a generic accumulator that sums a span of integers into a member variable:

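#include <span>

template<typename T>
struct Counter {
    T total = 0;  // running sum, stored in the object
    void count(std::span<const int> data) {
        for (int x : data)
            total += x;  // updates the member directly on every iteration
    }
};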

The behavior looks obvious: accumulate all values into total, then you are done. But whether the compiler can generate efficient code depends entirely on the type T.

When T is int

Look at the assembly for the inner loop:

mov eax, DWORD PTR [rdi]     ; load total from memory
add eax, DWORD PTR [rsi]     ; add current element
mov DWORD PTR [rdi], eax     ; store total back to memory

Every iteration loads total from memory, adds to it, and stores it back. There is no register caching of total across iterations.

The reason is that C++'s strict aliasing rule offers no help here. The span elements are int, and total is also int. Because two pointers to the same type are permitted to alias, the compiler must consider the possibility that some element in data is actually total itself, so that data[3] and total occupy the same address. If that were true, every store to total could change an element the loop has yet to read, so the compiler cannot defer the store until after the loop without risking a different result.
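To see that this possibility is real and not just theoretical, here is a minimal sketch. It uses a hypothetical free function, sum_into, which is not part of the Counter example but has the same loop shape, with the accumulator passed as an explicit pointer. The caller legally points the accumulator at the last element of the array being summed:

```cpp
#include <cstddef>

// Hypothetical helper for illustration: same loop as Counter<int>::count,
// but the accumulator is an explicit pointer.
void sum_into(int* total, const int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        *total += data[i];  // this store may change a data[i] read later
}

// The accumulator is the last element of the array being summed.
int aliased_sum() {
    int values[3] = {1, 2, 3};
    sum_into(&values[2], values, 3);
    return values[2];
}
```

Here aliased_sum() returns 12: by the time the loop reads values[2], the earlier stores have already changed it from 3 to 6. A version that kept the accumulator in a register would compute 3 + 1 + 2 + 3 = 9 instead, which is why the compiler may not make that transformation on its own.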

When T is long

Change T to long and the assembly changes dramatically:

mov rax, QWORD PTR [rdi]     ; load total once, before the loop
.loop:
    movsx rdx, DWORD PTR [rsi]
    add rax, rdx             ; accumulate in a register
    add rsi, 4
    cmp rcx, rsi
    jne .loop
mov QWORD PTR [rdi], rax     ; store total once, after the loop

total is loaded once before the loop and stored once after. All accumulation happens in a register. This is the code you would write if you were optimizing by hand.

The difference: long and int are distinct types. C++'s strict aliasing rules let the compiler assume that objects of different fundamental types do not alias (with a few specific exceptions, such as the character types and the signed and unsigned variants of the same integer type). The compiler can therefore guarantee that no element in the int span overlaps with the long total, and it is safe to keep total in a register for the entire loop.

Why Aliasing Blocks Optimizations

The memory load/store pattern is just one symptom. Aliasing uncertainty blocks a range of compiler passes:

Register promotion. The compiler cannot keep a variable in a register if a pointer write might update it behind the scenes.

Loop-invariant code motion (LICM). A value is loop-invariant only if it cannot change during the loop. If any pointer write might reach that value, it is no longer provably invariant.

Vectorization. Automatic SIMD requires that loop iterations do not interfere with each other through memory. Unresolved aliasing makes that proof impossible.

Dead store elimination and load forwarding. When the compiler cannot track which stores affect which loads, it must preserve all of them.
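As an illustration of the loop-invariant case, consider the following sketch. The function names and signatures are hypothetical, chosen only for this example. In the first version, *factor looks invariant, but a store to out[i] might overwrite it, so the compiler has to reload it on every iteration. The second version hoists the load by hand:

```cpp
#include <cstddef>

// *factor looks loop-invariant, but out[i] may alias it,
// so it must be reloaded on every iteration.
void scale_reload(int* out, const int* in, const int* factor, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * *factor;
}

// Hoisted by hand: stores to out[i] cannot change the local copy f.
void scale_hoisted(int* out, const int* in, const int* factor, std::size_t n) {
    const int f = *factor;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * f;
}
```

The two functions agree only when factor does not point into out. If factor is &out[0], the first version picks up the overwritten value after the first iteration, which is exactly the case the compiler has to preserve.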

Solutions

Use a local variable

This is the most portable and semantically clear fix:

void count(std::span<const int> data) {
    T local_total = total;
    for (int x : data)
        local_total += x;
    total = local_total;
}

The local variable is provably unaliased: no external pointer can reach a stack variable whose address is never taken and never escapes. The compiler is free to keep it in a register. This works regardless of the types involved and requires no non-standard extensions.

Use standard algorithms

total += std::accumulate(data.begin(), data.end(), T{});

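std::accumulate is defined to sum sequentially, left to right, which does not allow reordering. This matters most for floating-point sums, where reordering can change the result; integer sums can usually still be vectorized under the as-if rule. When reordering is acceptable, you would use std::reduce, which leaves the order and grouping of the additions unspecified. Either way, the temporary accumulator inside the algorithm is unaliased by construction.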

Use __restrict (non-portable)

void count(std::span<const int> data) __restrict {
    // ...
}

This GCC/Clang extension tells the compiler that the this pointer does not alias any other pointer in scope. It achieves the same effect as using a local variable, but it is a promise you make to the compiler rather than a structural guarantee. Violating it produces undefined behavior, and the annotation is not portable to MSVC.
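The same kind of promise can also be written on ordinary pointer parameters. The sketch below uses a hypothetical free function, sum_into, and assumes a compiler that accepts __restrict as a pointer qualifier; GCC, Clang, and MSVC are generally understood to, but this is worth checking against your toolchain's documentation:

```cpp
#include <cstddef>

// Hypothetical free-function variant. __restrict is a promise that, within
// this function, the int written through total is not one of the data elements.
void sum_into(int* __restrict total, const int* __restrict data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        *total += data[i];
}
```

Calling this with total pointing into data breaks the promise and is undefined behavior, so the annotation only suits call sites where the caller controls both arguments.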

The Underlying Tension

C and C++ give programmers nearly unrestricted ability to cast pointers and reinterpret memory. This flexibility is valuable for systems programming, but it comes at a cost: the compiler must treat pointer relationships conservatively unless it can prove safety through type information or explicit annotations.

Languages with stricter ownership models, Rust in particular, eliminate most aliasing ambiguity by construction. A &mut T in Rust is guaranteed to be exclusive: no other live reference, mutable or shared, may point to the same location. Fortran takes a similar approach by assuming array parameters do not alias by default. Both languages benefit from this in the quality of code their compilers can produce.

For C and C++ developers, the practical takeaway is: keep accumulation state in local variables during hot loops. It is the simplest change that gives the compiler the information it needs, and it costs nothing at runtime.