Compiler Optimization: Using lea for Integer Arithmetic
On x86, adding two integers and storing the result in a third register is more complicated than it sounds. The standard add instruction only takes two operands and overwrites one of its sources. Compilers have a clean solution: the lea instruction, originally designed for address calculation, turns out to be a surprisingly effective arithmetic tool.
The Two-Operand Problem
Most x86 arithmetic instructions follow a two-operand form:
add dst, src ; dst <- dst + src
The destination is also a source, so the result overwrites one of the inputs. This is fine for simple cases, but consider a C function that adds two integers:
int add(int a, int b) { return a + b; }
Under the System V AMD64 ABI (used on Linux and macOS), a arrives in edi and b in esi, and the return value goes in eax. A naive implementation using add requires two instructions:
mov eax, edi ; copy a into eax
add eax, esi ; eax = eax + b
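The mov exists only because add cannot write its result to a third register: without the copy, the sum would overwrite edi or esi and would still have to be moved into eax. It does no arithmetic of its own.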
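To make the clobbering concrete, here is a minimal C sketch that models the three registers as struct fields. The Regs type and the helper names are hypothetical, invented for illustration; they mimic the instruction semantics and are not something a compiler emits.

```c
#include <stdint.h>

/* Hypothetical model of the three registers involved. */
typedef struct {
    uint32_t eax, edi, esi;
} Regs;

/* add dst, src: the destination doubles as the first source. */
static void add_two_operand(uint32_t *dst, uint32_t src) {
    *dst += src;
}

/* mov eax, edi / add eax, esi: both inputs survive. */
static Regs add_with_copy(Regs r) {
    r.eax = r.edi;                   /* mov eax, edi */
    add_two_operand(&r.eax, r.esi);  /* add eax, esi */
    return r;
}

/* add edi, esi on its own: the sum is correct, but a is gone
 * and the result is not yet in eax. */
static Regs add_in_place(Regs r) {
    add_two_operand(&r.edi, r.esi);  /* add edi, esi */
    return r;
}
```

Starting from edi = 2 and esi = 3, add_with_copy leaves 5 in eax with both inputs intact, while add_in_place leaves 5 in edi and the original value of a is lost.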
What lea Actually Does
The lea instruction (Load Effective Address) computes a memory address and stores it in a register without actually accessing memory. It accepts the full x86 addressing-mode syntax:
[base + index * scale + displacement]
where scale can be 1, 2, 4, or 8 and displacement is a signed constant. Because lea never dereferences the address, the processor simply evaluates the expression as integer arithmetic and writes the result to any general-purpose register you choose.
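The address expression can be modeled in C as plain wrapping integer arithmetic. This is a sketch under stated assumptions: the function name effective_address and its parameter types are made up for illustration, and it assumes 64-bit mode, where the displacement is a signed 32-bit constant that is sign-extended before the addition.

```c
#include <stdint.h>

/* Hypothetical model of: lea dst, [base + index*scale + disp].
 * The caller is expected to pass a scale of 1, 2, 4 or 8; the
 * hardware encoding cannot express any other value.
 * Unsigned arithmetic wraps modulo 2^64, like the real instruction. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp) {
    /* Sign-extend the displacement to 64 bits, then add. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, effective_address(100, 3, 4, 10) evaluates 100 + 3*4 + 10 and yields 122. No memory is read at any point, which is exactly what separates lea from a load.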
For our addition problem, this means:
lea eax, [rdi + rsi]
This computes rdi + rsi, truncates the sum to 32 bits, and writes it to eax. The effect is a three-operand addition in one instruction, with neither source register modified.
Why Compilers Prefer lea
When GCC or Clang compiles the add(int a, int b) function above with optimization enabled (for example -O2), the output is typically:
lea eax, [rdi + rsi]
ret
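To check this on your own machine, a small test file is enough. The sketch below is illustrative: the file name add.c and the second function, add_then_scale, are hypothetical additions, and the exact instructions you see will vary with compiler version, flags, and target.

```c
/* add.c (hypothetical file name).
 * One way to inspect the generated assembly in Intel syntax:
 *   gcc -O2 -S -masm=intel add.c -o -
 * Clang accepts the same flags on x86 targets.
 */
int add(int a, int b) {
    return a + b;
}

/* Uses a again after the sum, so an addition that leaves its
 * inputs intact saves the compiler a copy. */
int add_then_scale(int a, int b) {
    int sum = a + b;
    return sum * a;
}
```

Comparing the -O0 and -O2 output for the same file is a quick way to see which instructions come from optimization and which from the naive translation.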
The benefits are concrete:
Fewer instructions. One lea replaces a mov followed by an add. Smaller code means better instruction cache utilization and less decode pressure.
Sources are preserved. Neither rdi nor rsi is touched. If the function needs a or b again later, the compiler does not have to copy them first or rearrange the computation.
An independent destination. The result lands in eax directly. No temporaries, no spills.
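Hardware support. lea is cheap on modern x86 processors: the simple base + index form used here typically has one-cycle latency and can issue in parallel with other instructions on superscalar cores. Which execution unit handles it depends on the microarchitecture (some designs use address-generation hardware, many recent cores use ordinary integer ALUs), and on several Intel generations the three-component form (base + index*scale + displacement) has higher latency than the simpler forms.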
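One subtlety: the operands use 64-bit registers (rdi, rsi) even though int is 32-bit. This is intentional. In 64-bit mode, using 32-bit address registers such as [edi + esi] would require an address-size override prefix, making the instruction one byte longer. Using rdi and rsi is safe even if their upper halves hold garbage, because the low 32 bits of a sum depend only on the low 32 bits of its inputs, and the 32-bit destination eax keeps only those bits. As with any 32-bit register write on x86-64, the upper 32 bits of rax are zeroed as a side effect.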
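The 64-bit operands are harmless because truncation discards whatever the upper halves contributed. A small C model can confirm this; the helper names lea32_from_64 and add32 are hypothetical, and unsigned types are used so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Models lea eax, [rdi + rsi]: a 64-bit add whose result is
 * truncated to the low 32 bits. */
static uint32_t lea32_from_64(uint64_t rdi, uint64_t rsi) {
    return (uint32_t)(rdi + rsi);
}

/* Reference: a plain 32-bit wrapping add on the low halves. */
static uint32_t add32(uint32_t a, uint32_t b) {
    return a + b;
}
```

With rdi = 0xDEADBEEF00000002 and rsi = 0x1234567800000003, the upper halves are arbitrary, yet lea32_from_64 returns 5, the same value as add32(2, 3). Carries only propagate upward, so nothing in bits 32 to 63 can influence bits 0 to 31.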
Beyond Simple Addition
The addressing mode syntax gives lea more expressive power than plain add:
| Expression | Assembly |
|---|---|
| a + b | lea eax, [rdi + rsi] |
| a + b * 2 | lea eax, [rdi + rsi*2] |
| a + b * 4 + 10 | lea eax, [rdi + rsi*4 + 10] |
| x * 5 (i.e., x + x*4) | lea eax, [rdi + rdi*4] |
This makes lea useful for multiplying by small constants that happen to be one more than a power of two: 3, 5, 9. Compilers use this frequently for array index calculations and strength reduction in loops.
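The C functions below are the kind of source that can map onto a single lea. This is a sketch, not a guarantee: the function names are hypothetical, the assembly in the comments shows one plausible encoding under the System V calling convention, and whether a given compiler actually picks lea depends on its version, flags, and target.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplication by 3, 5 and 9, rewritten as x + x*scale. */
static uint32_t times3(uint32_t x) { return x + x * 2; } /* lea eax, [rdi + rdi*2] */
static uint32_t times5(uint32_t x) { return x + x * 4; } /* lea eax, [rdi + rdi*4] */
static uint32_t times9(uint32_t x) { return x + x * 8; } /* lea eax, [rdi + rdi*8] */

/* Base, scaled index and displacement in one expression. */
static uint32_t scaled_sum(uint32_t a, uint32_t b) {
    return a + b * 4 + 10;                               /* lea eax, [rdi + rsi*4 + 10] */
}

/* Address of element i in an array of 4-byte integers:
 * base + i * sizeof(int32_t). */
static const int32_t *element_address(const int32_t *base, size_t i) {
    return base + i;                                     /* lea rax, [rdi + rsi*4] */
}
```

The last function is the original use case for the instruction: pointer arithmetic on an array of 4-byte elements is a base plus an index scaled by 4, which is exactly what one addressing mode encodes.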
The Broader Point
The lea trick is a good example of how compiler backends exploit architectural details that are invisible at the source level. The C code return a + b gives no hint that anything unusual is happening. The compiler’s target-specific code generator knows the instruction set well enough to recognize that lea is the right tool here, not because the program is computing an address, but because the address arithmetic lea performs happens to be exactly the arithmetic the program needs.
This is one reason why hand-optimizing assembly is harder than it looks. Matching or beating a modern compiler requires knowing not just what an instruction does semantically, but how the hardware executes it, and what alternatives exist across the full instruction set.