Compiler Optimization: Replacing Division with Multiplication
Integer division is expensive. On x86-64, the div instruction takes anywhere from 18 to nearly 100 cycles depending on the microarchitecture, compared to a single cycle for addition or about three cycles for multiplication. Worse, each processor core typically has only one division unit, so multiple in-flight divisions cannot proceed in parallel.
Compilers know this, and they go to considerable lengths to eliminate division from generated code. When dividing by a compile-time constant —even one that is not a power of two —modern compilers replace the division with a multiply-and-shift sequence that runs in a fraction of the time.
This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt.
The Motivating Example: Integer to ASCII
Converting an integer to its decimal string representation requires repeated division by 10:
do {
digit = number % 10;
*buf++ = '0' + digit;
number /= 10;
} while (number);
Ten is not a power of two, so the obvious shift trick does not apply. Yet the compiler does not emit a single div instruction. Instead it uses a completely different approach.
Fixed-Point Approximation of Division
For unsigned int x / 10, GCC and Clang generate something like this on x86-64:
mov esi, 3435973837 ; load the magic constant 0xcccccccd
imul rdx, rsi ; rdx = x * 0xcccccccd (64-bit product)
shr rdx, 35 ; rdx >>= 35
The result in rdx is x / 10, exactly, for every possible 32-bit unsigned input. There is no approximation error.
Why This Works
The constant 0xcccccccd is a fixed-point representation of 1/10. More precisely:
0xcccccccd / 2^35 鈮?0.100000000005821 鈮?1/10
Multiplying x by this value and then shifting right by 35 bits is mathematically equivalent to multiplying x by 1/10 —which is division by 10. The compiler constructs the constant and shift amount such that the approximation error never affects the result for any input in the valid range.
This technique is called “magic number division” and was systematically described by Henry S. Warren in Hacker’s Delight. The compiler does not use a lookup table of magic numbers —it computes the appropriate constant for each divisor algebraically during compilation.
Recovering the Remainder
Once the quotient q = x / 10 is known, the remainder is computed without any additional division:
r = x - q * 10
The compiler computes q * 10 efficiently using the lea instruction, which can compute a + b * k for small constants k in a single cycle:
lea r8d, [rdx + rdx*4] ; r8 = q + q*4 = q*5
add r8d, r8d ; r8 = q*10
sub ecx, r8d ; r = x - q*10
Two lea/add instructions replace what would otherwise be another multiply or divide.
Other Divisors
The same technique applies to any compile-time constant divisor, not just 10.
Small odd constants like 3 or 7 use similar magic numbers but may require an additional add or subtract step to correct for rounding behavior in certain input ranges.
Signed division is more complex. The C standard requires that integer division truncates toward zero, meaning -7 / 2 = -3, not -4. For negative dividends, this is the opposite rounding direction from a right shift. The compiler must insert correction logic —typically a conditional add using cmov —to maintain the correct semantics.
Very large or irregular divisors may result in magic number sequences that the compiler judges too complex to be worth it, at which point it falls back to the actual div instruction. This is rare in practice.
The Performance Gap
All of these sequences —even the ones with correction steps —execute in a handful of cycles. The hardware div instruction on the same operands takes 18 or more cycles. For code that divides frequently (such as the integer-to-string conversion loop), this is a measurable improvement.
The financial industry provides a concrete historical example. The FIX protocol, used for order transmission in electronic trading, represents prices and quantities as ASCII decimal strings. High-frequency trading systems needed to format and parse these strings as fast as possible. The compiler’s ability to transform / 10 into a multiply-and-shift sequence was directly relevant to throughput in that context.
What This Means for You
Write x / 10. Do not write (x * 0xcccccccdULL) >> 35. The latter is harder to read, harder to verify for correctness, and gives the compiler less freedom to choose the best implementation for the target architecture.
The compiler knows the right magic number for your divisor, it has verified that the approximation is exact for all inputs in the valid range, and it generates the appropriate correction logic for signed types. Your job is to write clear code; the compiler handles the arithmetic.