yuqi-zheng

Compiler Optimization: Loop Style Doesn't Matter


One of the recurring debates in C++ is whether higher-level loop idioms (range-based for loops, STL algorithms) carry hidden performance costs compared to hand-written indexed loops or raw pointer iteration. On a modern compiler with optimizations enabled, the answer is no. The compiler reduces nearly all of these forms to the same machine code, and the one form that comes out slightly worse is the hand-written indexed loop, not the high-level idioms.

This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt, and it also tells the story of how Compiler Explorer came to exist.

The Origin of Compiler Explorer

In 2011, Matt Godbolt and his colleagues were debating whether to adopt C++11’s range-based for loops across their codebase. The concern was performance: range-for is syntactic sugar over iterator-based iteration, and iterators have overhead in languages like Java. Would the same be true in C++?

Rather than guessing, Godbolt wrote a small script that compiled a snippet of C++ and displayed the resulting assembly in real time. That script became Compiler Explorer, now one of the most widely used tools for compiler experimentation in the world. The performance question that prompted its creation turns out to have a clear answer.

Four Ways to Sum a Vector

Consider four idiomatic ways to compute the sum of a std::vector<int>:

Indexed for Loop

int sum = 0;
for (size_t i = 0; i < vec.size(); ++i)
    sum += vec[i];

This is the most traditional form. The compiler must compute vec.size() (a pointer subtraction followed by a right shift that converts the byte difference into an element count), maintain an index variable, and address the vector data by index on each iteration. The generated code is correct and fast, but it retains the index variable and so does slightly more work than necessary.
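To see where the pointer subtraction and shift come from, here is a minimal sketch. It assumes a vector that stores a begin pointer and an end pointer rather than an element count, which is how common standard library implementations work; the struct `TinyVec` and its member names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vector's internals (not a real library layout):
// two pointers instead of a stored element count.
struct TinyVec {
    const int* first;  // start of the elements
    const int* last;   // one past the final element

    // Pointer subtraction yields an element count. At the machine level this
    // is a byte difference divided by sizeof(int), which compilers typically
    // emit as a right shift when the element size is a power of two.
    std::size_t size() const { return static_cast<std::size_t>(last - first); }

    const int& operator[](std::size_t i) const { return first[i]; }
};

// The indexed loop from above, written against the sketch.
int sum_indexed(const TinyVec& vec) {
    int sum = 0;
    for (std::size_t i = 0; i < vec.size(); ++i)
        sum += vec[i];
    return sum;
}
```

Written this way, it is easier to see that the index `i` is extra state: the two pointers already describe the whole range.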

Pointer Iteration

int sum = 0;
for (const int* p = vec.data(); p != vec.data() + vec.size(); ++p)
    sum += *p;

Here the programmer works with raw pointers. The compiler sees that the start and end pointers can be computed once and used directly, without any size calculation in the loop body. The inner loop is just a load, an add, a pointer increment, and a comparison.

Range-Based For Loop

int sum = 0;
for (int x : vec)
    sum += x;

Despite being the highest-level form, this produces assembly that is identical to the pointer iteration version. The compiler expands the range-for into an iterator-based loop, then optimizes those iterators into raw pointers, arriving at exactly the same instruction sequence.
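The expansion step can be sketched by hand. The function below is roughly what the language rules turn the range-for into; the names `first` and `last` are illustrative (the compiler uses hidden variables), and the function name `sum_range_for_expanded` is made up for this example.

```cpp
#include <vector>

// Approximate hand-written expansion of:
//     for (int x : vec) sum += x;
int sum_range_for_expanded(const std::vector<int>& vec) {
    int sum = 0;
    auto first = vec.begin();  // evaluated once, before the loop starts
    auto last = vec.end();     // evaluated once, before the loop starts
    for (; first != last; ++first) {
        int x = *first;        // the loop variable declared in the range-for
        sum += x;
    }
    return sum;
}
```

Because a vector's iterators are thin wrappers around pointers (or plain pointers, depending on the implementation), this has the same shape as the pointer iteration loop once the wrappers are optimized away.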

std::accumulate

// std::accumulate is declared in <numeric>
int sum = std::accumulate(vec.begin(), vec.end(), 0);

The STL algorithm form also produces the same assembly. The function template is inlined, the iterator arithmetic is simplified to pointer arithmetic, and the result is the same tight inner loop.
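To make the inlining step concrete, here is a simplified stand-in for the algorithm. This is a sketch, not the standard library's source: `accumulate_sketch` and `sum_with_sketch` are hypothetical names, and real implementations differ in detail (since C++20, for example, the accumulator is moved on each step).

```cpp
#include <vector>

// Simplified model of what std::accumulate does: a plain loop over
// [first, last) that folds each element into `init`.
template <typename InputIt, typename T>
T accumulate_sketch(InputIt first, InputIt last, T init) {
    for (; first != last; ++first)
        init = init + *first;
    return init;
}

// Once the template is inlined here, what remains is the same
// begin/end loop as the range-for and pointer versions.
int sum_with_sketch(const std::vector<int>& vec) {
    return accumulate_sketch(vec.begin(), vec.end(), 0);
}
```

Note that the type of the initial value (`0`, an `int`) determines the accumulator type, which is also true of the real `std::accumulate`.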

What the Assembly Shows

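| Loop style        | Optimal code generated       | Readability | Safety            |
|-------------------|------------------------------|-------------|-------------------|
| Indexed for       | No (index variable retained) | Medium      | Bounds error risk |
| Pointer iteration | Yes                          | Low         | High risk         |
| Range-based for   | Yes                          | High        | Safe              |
| std::accumulate   | Yes                          | High        | Safe              |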

The only form that falls slightly short is the indexed loop. The compiler does not eliminate the index variable, which introduces a small amount of additional bookkeeping. Every other form gets “canonicalized” (reduced to the same internal representation) and produces optimal pointer-based iteration code.

Why This Matters

The practical implication is simple: write the loop form that best expresses your intent. Range-based for loops and STL algorithms are not performance compromises. They are the clearer choice, and they produce at least as good (and sometimes better) code than the alternatives.

Raw pointer loops are not faster than range-for. They are harder to read, easier to get wrong (off-by-one errors, incorrect bounds computation), and provide no benefit to the optimizer. The abstraction cost is zero.

This is what “zero-cost abstractions” means in C++. The language provides higher-level constructs (iterators, range-for, algorithm templates) that compile down to exactly the same machine code as the low-level alternatives. You do not pay for the abstraction at runtime.

The Broader Lesson

When you are uncertain whether a particular C++ idiom has a performance cost, do not guess. Paste a representative snippet into Compiler Explorer and look at the output. The answer is usually that modern compilers are smarter than our intuitions about “overhead” suggest.
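As a starting point, the four loops from this article can be wrapped in separate functions so that each gets its own block of assembly. This is a sketch to paste and adapt: the function names are made up for this example, and you would need to enable optimizations yourself (for instance `-O2` on GCC or Clang) before comparing the output.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Indexed loop: keeps an explicit index alongside the data pointer.
int sum_indexed(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = 0; i < vec.size(); ++i)
        sum += vec[i];
    return sum;
}

// Raw pointer iteration over the vector's storage.
int sum_pointer(const std::vector<int>& vec) {
    int sum = 0;
    for (const int* p = vec.data(); p != vec.data() + vec.size(); ++p)
        sum += *p;
    return sum;
}

// Range-based for loop.
int sum_range_for(const std::vector<int>& vec) {
    int sum = 0;
    for (int x : vec)
        sum += x;
    return sum;
}

// STL algorithm.
int sum_accumulate(const std::vector<int>& vec) {
    return std::accumulate(vec.begin(), vec.end(), 0);
}
```

Taking the vector by const reference keeps each function free of copies, so the assembly you see is the loop itself rather than allocation and cleanup code.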

In this specific case, the answer has been stable for over a decade: write the loop style that makes your code clearest and most maintainable. The compiler handles the rest.