yuqi-zheng

Byte Order Reversal in C++: Bit Twiddling vs. Compiler Builtins


Converting between big-endian and little-endian representations is a common task in network programming, binary file parsing, and hardware interfacing. C++ offers two natural approaches: manual bit manipulation and compiler builtins. With optimization enabled, the generated assembly shows they are equivalent, but one is far more readable.


The Manual Approach

For a 32-bit integer, byte reversal moves each of the four bytes to its mirrored position, so the outer two bytes trade places and so do the inner two:

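void reverse32(unsigned int* p) {
    unsigned int& a = *p;
    a = ((a & 0xff000000) >> 24)   // byte 3 (most significant) -> byte 0
      | ((a & 0x00ff0000) >>  8)   // byte 2 -> byte 1
      | ((a & 0x0000ff00) <<  8)   // byte 1 -> byte 2
      | ((a & 0x000000ff) << 24);  // byte 0 (least significant) -> byte 3
}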
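One way to confirm where each byte lands is to write the same mask-and-shift expression as a constexpr function and check it at compile time. This is a minimal sketch: reverse32_value is a hypothetical name, and it takes std::uint32_t rather than unsigned int so the width is explicit.

```cpp
#include <cstdint>

// Hypothetical value-returning variant of reverse32. Because it is constexpr,
// the static_asserts below check the byte movement at compile time.
constexpr std::uint32_t reverse32_value(std::uint32_t a) {
    return ((a & 0xff000000u) >> 24)
         | ((a & 0x00ff0000u) >>  8)
         | ((a & 0x0000ff00u) <<  8)
         | ((a & 0x000000ffu) << 24);
}

// Bytes 12 34 56 78 come back as 78 56 34 12.
static_assert(reverse32_value(0x12345678u) == 0x78563412u, "bytes are mirrored");
// Reversing twice restores the original value.
static_assert(reverse32_value(reverse32_value(0xdeadbeefu)) == 0xdeadbeefu,
              "reversal is its own inverse");
```

If the file compiles, both properties hold; nothing has to run.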

The 64-bit version extends this to eight bytes:

void reverse64(unsigned long long* b) {
    unsigned long long& a = *b;
    a = ((a & 0xff00000000000000ULL) >> 56)
      | ((a & 0x00ff000000000000ULL) >> 40)
      | ((a & 0x0000ff0000000000ULL) >> 24)
      | ((a & 0x000000ff00000000ULL) >>  8)
      | ((a & 0x00000000ff000000ULL) <<  8)
      | ((a & 0x0000000000ff0000ULL) << 24)
      | ((a & 0x000000000000ff00ULL) << 40)
      | ((a & 0x00000000000000ffULL) << 56);
}

Correct, but verbose. The intent is buried in eight mask-and-shift operations.
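If the masks themselves are the problem, a loop that moves one byte per iteration expresses the same reversal for any unsigned width. This is a sketch under stated assumptions: reverse_bytes is a hypothetical name, it assumes 8-bit bytes, and whether a particular compiler reduces the loop to a single bswap is not guaranteed, so check the assembly for your toolchain.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical generic reversal: each iteration takes the lowest remaining
// byte of `value` and appends it at the low end of `result`, so the byte that
// started lowest ends up highest.
template <typename T>
constexpr T reverse_bytes(T value) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        result = static_cast<T>((result << 8) | (value & 0xffu));
        value = static_cast<T>(value >> 8);
    }
    return result;
}

static_assert(reverse_bytes<std::uint16_t>(0xabcd) == 0xcdab, "16-bit");
static_assert(reverse_bytes<std::uint32_t>(0x12345678u) == 0x78563412u, "32-bit");
static_assert(reverse_bytes<std::uint64_t>(0x0102030405060708ull) ==
                  0x0807060504030201ull, "64-bit");
```

One template covers every unsigned width; the trade-off is that the mapping to a specific instruction is less obvious than with the explicit masks.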


Compiler Builtins

GCC and Clang provide the builtins __builtin_bswap32 and __builtin_bswap64, which express the intent directly:

void reverse32_builtin(unsigned int* p) {
    *p = __builtin_bswap32(*p);
}

void reverse64_builtin(unsigned long long* p) {
    *p = __builtin_bswap64(*p);
}
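Besides reading differently, the two spellings should compute identical results for every input. A quick randomized check is sketched below, assuming GCC or Clang for the builtins. manual32, manual64, and check_against_builtins are hypothetical names; the manual versions are restated as value-returning functions so the block stands alone, and the inputs come from a simple xorshift64 generator.

```cpp
#include <cassert>
#include <cstdint>

// Value-returning restatements of reverse32 / reverse64 (hypothetical names).
constexpr std::uint32_t manual32(std::uint32_t a) {
    return ((a & 0xff000000u) >> 24) | ((a & 0x00ff0000u) >> 8) |
           ((a & 0x0000ff00u) << 8) | ((a & 0x000000ffu) << 24);
}

constexpr std::uint64_t manual64(std::uint64_t a) {
    return ((a & 0xff00000000000000ull) >> 56) | ((a & 0x00ff000000000000ull) >> 40) |
           ((a & 0x0000ff0000000000ull) >> 24) | ((a & 0x000000ff00000000ull) >> 8) |
           ((a & 0x00000000ff000000ull) << 8) | ((a & 0x0000000000ff0000ull) << 24) |
           ((a & 0x000000000000ff00ull) << 40) | ((a & 0x00000000000000ffull) << 56);
}

// Asserts that the manual and builtin versions agree on `count` pseudo-random
// inputs from an xorshift64 generator (shift triple 13, 7, 17).
// Returns the number of inputs checked.
inline int check_against_builtins(int count) {
    std::uint64_t state = 0x9e3779b97f4a7c15ull;  // arbitrary nonzero seed
    for (int i = 0; i < count; ++i) {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        const std::uint32_t low = static_cast<std::uint32_t>(state);
        assert(manual32(low) == __builtin_bswap32(low));
        assert(manual64(state) == __builtin_bswap64(state));
    }
    return count;
}
```

A random sample is not a proof, but for a pure mask-and-shift function any transcription mistake (a wrong mask or shift) shows up within the first few inputs.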

Assembly Output

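With optimization enabled (for example, GCC or Clang at -O2 targeting x86-64), the compiler recognizes the manual mask-and-shift pattern and emits the same bswap instruction for all four functions. The listings use Intel syntax: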

32-bit:

reverse32:
    mov     eax, DWORD PTR [rdi]
    bswap   eax
    mov     DWORD PTR [rdi], eax
    ret

reverse32_builtin:
    mov     eax, DWORD PTR [rdi]
    bswap   eax
    mov     DWORD PTR [rdi], eax
    ret

64-bit:

reverse64:
    mov     rax, QWORD PTR [rdi]
    bswap   rax
    mov     QWORD PTR [rdi], rax
    ret

reverse64_builtin:
    mov     rax, QWORD PTR [rdi]
    bswap   rax
    mov     QWORD PTR [rdi], rax
    ret

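Each function is a load, a bswap, and a store, followed by ret. At this optimization level there is no performance difference between the approaches; at -O0 the manual version is compiled more or less literally, as separate masks and shifts.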


Recommendations

Use __builtin_bswap32 / __builtin_bswap64 over manual bit manipulation. The compiler recognizes and optimizes both, but the builtin makes intent explicit.

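Use fixed-width types. Prefer uint32_t and uint64_t (from <cstdint>) over unsigned int and unsigned long long. The standard only guarantees that unsigned int is at least 16 bits and unsigned long long at least 64, so the fixed-width types pin down exactly the width the swap assumes.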
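Combining the two recommendations might look like the sketch below. bswap_u32, bswap_u64, reverse32_fixed, and reverse64_fixed are hypothetical names, and the block assumes GCC or Clang for the builtins, which both compilers allow in constant expressions.

```cpp
#include <climits>
#include <cstdint>

// Assumption: 8-bit bytes, so uint32_t holds exactly four of them.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical wrappers: fixed-width parameter and return types state the
// exact width, whatever sizes unsigned int and unsigned long long have.
constexpr std::uint32_t bswap_u32(std::uint32_t v) { return __builtin_bswap32(v); }
constexpr std::uint64_t bswap_u64(std::uint64_t v) { return __builtin_bswap64(v); }

// In-place variants mirroring the article's pointer-based signatures.
inline void reverse32_fixed(std::uint32_t* p) { *p = bswap_u32(*p); }
inline void reverse64_fixed(std::uint64_t* p) { *p = bswap_u64(*p); }
```

The wrappers cost nothing at -O2; they exist only so that call sites cannot silently pass a type of the wrong width.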

In C++23, use std::byteswap. It is the standard, type-safe, portable spelling:

#include <bit>
#include <cstdint>
std::uint32_t value = 0x12345678;
std::uint32_t x = std::byteswap(value);  // 0x78563412
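Code that must also build on pre-C++23 toolchains can choose the spelling at compile time. The sketch below assumes the library feature-test macro __cpp_lib_byteswap (defined by <version> when std::byteswap is available); portable_bswap32 is a hypothetical name. It prefers the standard function, then the GCC/Clang builtin, and falls back to the manual expression only as a last resort.

```cpp
#include <cstdint>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>   // library feature-test macros (C++20 and later)
#  endif
#endif
#if defined(__cpp_lib_byteswap)
#  include <bit>         // std::byteswap (C++23)
#endif

// Hypothetical helper: standard spelling first, compiler builtin second,
// portable mask-and-shift expression last.
constexpr std::uint32_t portable_bswap32(std::uint32_t v) {
#if defined(__cpp_lib_byteswap)
    return std::byteswap(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    return ((v & 0xff000000u) >> 24) | ((v & 0x00ff0000u) >> 8) |
           ((v & 0x0000ff00u) << 8) | ((v & 0x000000ffu) << 24);
#endif
}
```

The order of the branches follows the recommendations above, so the manual version only appears where neither of the clearer spellings exists.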

The manual bit-twiddling version is a useful exercise for understanding what the hardware does, but in production code it adds noise without adding correctness or performance.