Compiler Optimization: Calling Conventions and Argument Passing
Every function call involves a contract between caller and callee about where arguments are placed and where return values will be found. This contract is the calling convention, formalized in the Application Binary Interface (ABI) for each platform. Understanding how arguments are passed, particularly which values stay in registers and which spill to the stack, has direct implications for performance and informs API design.
This article is part of the Advent of Compiler Optimisations 2025 series by Matt Godbolt.
The System V AMD64 ABI
On Linux x86-64, the System V ABI governs function calls. The first six integer or pointer arguments are passed in registers:
| Argument position | Register |
|---|---|
| 1 | rdi |
| 2 | rsi |
| 3 | rdx |
| 4 | rcx |
| 5 | r8 |
| 6 | r9 |
Arguments beyond the sixth go on the stack. Floating-point arguments use a separate set of eight SSE registers (xmm0 through xmm7). Integer return values go in rax, with rdx holding the upper half of a 128-bit value.
The key insight is that passing arguments in registers is close to free: the caller usually has the values in registers already, so no memory access is needed. Stack arguments require a write at the call site and a read inside the callee, which is slower even when the values are in cache.
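As a small illustration of the register limit, the sketch below defines two functions whose names (`sum6`, `sum8`) are invented for this example. Assuming a System V target and an optimizing build, the first should read only registers, while the second should need two loads from the stack.

```cpp
#include <cassert>

// Six integer arguments: under the System V AMD64 ABI these arrive in
// rdi, rsi, rdx, rcx, r8 and r9, so the body needs no loads from memory.
long sum6(long a, long b, long c, long d, long e, long f) {
    return a + b + c + d + e + f;
}

// Eight integer arguments: g and h do not fit in the six argument
// registers, so the callee reads them from the caller's stack frame.
long sum8(long a, long b, long c, long d, long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;
}
```

Compiling both with optimization enabled and comparing the output is a quick way to see where the seventh and eighth arguments end up.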
Structs Are Not Expensive
A common assumption is that passing a struct by value is expensive because it requires copying. In many cases, this is wrong.
Consider two equivalent-looking signatures:
struct Args { long x, y; };
void foo(long a, long b);
void bar(Args args);
When calling foo(a, b), the compiler puts a in rdi and b in rsi. When calling bar(args), the compiler puts args.x in rdi and args.y in rsi. The generated assembly is identical. No memory is involved, no copy is performed. The struct is simply packed into two registers.
This holds as long as the struct fits within the ABI's rules for register classification: broadly, the total size is at most 16 bytes, the type is trivially copyable, and the fields are integer, pointer, float, or double types. An 8-byte chunk made up only of float or double fields travels in an XMM register rather than a general-purpose one, but it still avoids memory.
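The two signatures can be fleshed out into a pair of functions to compare side by side. This is a minimal sketch: the function names and bodies are made up for illustration, and the register comments assume a System V target.

```cpp
#include <cassert>

struct Args { long x, y; };

// The size condition from the text: the struct must fit in two 8-byte registers.
static_assert(sizeof(Args) <= 16, "Args should fit in two registers");

// a arrives in rdi, b in rsi.
long add_separate(long a, long b) { return a + b; }

// args.x arrives in rdi, args.y in rsi: the same registers as above.
long add_struct(Args args) { return args.x + args.y; }
```

With optimization enabled, both functions would be expected to compile to the same instructions.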
Smaller Fields
When a struct contains smaller types like int, the ABI packs multiple fields into a single register. A struct with two int members passes as a single rdi value with the first field in the low 32 bits and the second in the high 32 bits. The callee extracts them with a move and a shift:
mov rax, rdi
shr rax, 32 ; rax = y (upper 32 bits)
add eax, edi ; result = y + x (lower 32 bits still in edi)
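The same extraction can be written out in C++ to make the layout explicit. This is only a sketch of what the compiler does for you: `Pair`, `as_register`, and `sum_from_register` are hypothetical names, and the claim that x sits in the low half assumes a little-endian target such as x86-64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Pair { int x, y; };

// Copy the struct's 8 bytes into an integer, mimicking what rdi holds
// when a Pair is passed by value.
std::uint64_t as_register(Pair p) {
    static_assert(sizeof(Pair) == sizeof(std::uint64_t), "Pair must be 8 bytes");
    std::uint64_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// Mirror of the assembly: one half is read directly, the other via a shift.
int sum_from_register(std::uint64_t rdi) {
    auto low  = static_cast<std::uint32_t>(rdi);        // like reading edi
    auto high = static_cast<std::uint32_t>(rdi >> 32);  // like shr rax, 32
    return static_cast<int>(low + high);                // like add eax, edi
}
```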
Very Small Fields
The same packing applies to char and short fields. A struct containing eight char values has a total size of 8 bytes, which fits in a single register. It passes as a single rdi with all eight bytes packed together. The callee uses shifts and masks to extract individual fields.
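The shift-and-mask step might look like the following sketch. `Flags` and `field_at` are hypothetical names introduced here, and index 0 is taken to mean the lowest byte of the register.

```cpp
#include <cassert>
#include <cstdint>

struct Flags { char a, b, c, d, e, f, g, h; };
static_assert(sizeof(Flags) == 8, "eight chars occupy exactly one register");

// Pull byte `index` (0 = lowest) out of a 64-bit value, the way a callee
// isolates one char field from the packed register.
unsigned char field_at(std::uint64_t packed, unsigned index) {
    assert(index < 8);
    return static_cast<unsigned char>((packed >> (index * 8)) & 0xFF);
}
```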
Many Arguments vs. Structs
An interesting crossover occurs when the number of arguments exceeds six. Consider passing eight separate long values versus passing them as a struct.
With eight independent arguments, the first six go in registers and the last two go on the stack. The callee must load the stack arguments with explicit memory accesses:
add rax, QWORD PTR [rsp+8]
add rax, QWORD PTR [rsp+16]
With a struct containing eight long members, the total size is 64 bytes. This exceeds 16 bytes, so the System V ABI classifies the struct as memory: the caller copies the whole struct into the stack argument area, and the callee reads its fields from there. This looks inefficient, but for aggregates that are naturally structured this way, it often is not meaningfully slower in practice.
The counterintuitive case is a struct containing many small fields that together fit within 16 bytes. Eight char fields in a struct total 8 bytes and all pass in a single register, whereas eight separate char arguments would fill the six argument registers and push the last two onto the stack.
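A rough compile-time check can capture the size rule used in this comparison. The predicate below is a deliberately simplified assumption, not the ABI's full classification algorithm: it ignores cases such as long double and unaligned fields. All names are hypothetical, and arrays stand in for eight named members.

```cpp
#include <cassert>
#include <type_traits>

struct EightLongs { long v[8]; };  // 64 bytes on LP64 targets
struct EightChars { char v[8]; };  // 8 bytes

// Simplified stand-in for the System V rule: small and trivially copyable.
template <typename T>
constexpr bool may_pass_in_registers =
    sizeof(T) <= 16 && std::is_trivially_copyable_v<T>;
```

A `static_assert` on such a predicate next to a type definition is one way to notice when a later change pushes the type over the limit.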
Design Implications
std::string_view
std::string_view contains a pointer and a size: two 64-bit values totaling 16 bytes. Under the System V ABI, it passes in two registers (rdi and rsi when it is the first argument). Passing string_view by value therefore needs no memory access at all. This is part of why string_view is designed as a value type.
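A by-value string_view parameter might be used as in the sketch below. `count_char` is a made-up example function, and the size check reflects the mainstream standard library implementations rather than anything the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Holds on the common standard libraries; the standard does not require it.
static_assert(sizeof(std::string_view) == 2 * sizeof(void*),
              "expected a pointer plus a length");

// Taken by value: on System V the pointer and length occupy two registers,
// and `needle` takes the next one.
std::size_t count_char(std::string_view text, char needle) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == needle) ++count;
    }
    return count;
}
```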
std::optional<T>
std::optional<T> adds a boolean flag to T. If T is trivially copyable and fits in one register, the flag plus any alignment padding typically still leaves the optional within 16 bytes, so it can pass in registers without memory access. If the flag and padding push the size above 16 bytes, the optional is passed in memory instead.
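The size growth is easy to inspect directly. In this sketch, `Wide` and `optional_overhead` are hypothetical names, and the exact overhead depends on the standard library and on the alignment of T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

struct Wide { std::int64_t a, b; };  // already 16 bytes on its own

// Bytes added by the engaged flag plus any alignment padding.
template <typename T>
constexpr std::size_t optional_overhead = sizeof(std::optional<T>) - sizeof(T);
```

On typical implementations `std::optional<int>` stays within one 8-byte register, while `std::optional<Wide>` grows past the 16-byte limit.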
Windows ABI Differences
The Microsoft x64 ABI uses only four registers for integer arguments (rcx, rdx, r8, r9). It also passes a struct in a register only when its size is exactly 1, 2, 4, or 8 bytes; any other struct is passed as a pointer to a copy made by the caller. A 16-byte string_view therefore travels through memory on Windows rather than in two registers. Types and APIs that perform optimally on Linux may have different performance characteristics on Windows.
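The Microsoft size rule for aggregates is simple enough to write down as a predicate. The helper below is a hypothetical illustration of the rule as documented by Microsoft (sizes of 1, 2, 4, or 8 bytes are passed like integers), not part of any real API.

```cpp
#include <cassert>
#include <cstddef>

// True when a struct of `size` bytes is passed like an integer of the same
// width under the Microsoft x64 convention; otherwise the caller passes a
// pointer to a copy.
constexpr bool win64_passes_in_register(std::size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```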
Practical Guidelines
Pass small objects by value. Trivially copyable objects up to 16 bytes typically pass entirely in registers on System V platforms. The “pass by const reference” instinct from older C++ conventions is not always beneficial for small types.
Group related parameters into structs. If a function takes many small parameters of related types, grouping them in a struct can reduce stack spilling and improve locality.
Avoid long lists of scalar parameters. Under System V, more than six integer arguments guarantee stack involvement for any call that is not inlined. Grouping arguments into a compact struct may eliminate that overhead.
Be aware of ABI platform differences. Code that is register-optimal on Linux may behave differently on Windows. If cross-platform performance matters, verify on both.
Use Compiler Explorer to examine the calling sequence for any function signature you care about. The assembly makes the register allocation immediately clear.
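As a starting point for that kind of experiment, the snippet below could be pasted into Compiler Explorer with optimization enabled. `Point` and both function names are invented for illustration; the comments describe what one would expect to see on a System V target.

```cpp
#include <cassert>

struct Point { int x, y; };

// By value: both fields arrive packed in a single register.
int sum_by_value(Point p) { return p.x + p.y; }

// By const reference: the callee receives an address and must load
// the fields from memory.
int sum_by_ref(const Point& p) { return p.x + p.y; }
```

Comparing the two function bodies in the assembly view shows whether the by-value version avoids the loads on the target in question.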