yuqi-zheng

Linux rp_filter and Why It Drops Your Trading Packets


You installed a second NIC for a redundant feed, configured the routes, and started receiving data. Then some packets silently vanish. No error, no log, no retry: they are just gone. The kernel drops them before your application ever sees them, and the culprit is rp_filter, Linux’s reverse path filter.

This article explains what rp_filter does, why its default setting is hostile to multi-homed trading machines, and how to configure it correctly.


What is rp_filter?

Reverse path filtering is a security mechanism that validates incoming packets against the routing table. When a packet arrives on an interface, the kernel asks: if I were to send a reply to this source address, would I use the same interface the packet arrived on?

If the answer is no, the packet is suspicious (it might be spoofed) and the kernel drops it silently. There is no ICMP error, no log entry (unless you enable log_martians), and no notification to the application.

The check lives in the kernel’s fib_validate_source() function, which is reached from the IP input path (ip_rcv → ip_rcv_finish) while the kernel routes the incoming packet, before any transport-layer processing.

The sysctl interface

# Per-interface
net.ipv4.conf.eth0.rp_filter
net.ipv4.conf.eth1.rp_filter

# All interfaces (combined with the per-interface value, not a default)
net.ipv4.conf.all.rp_filter

# Default for new interfaces
net.ipv4.conf.default.rp_filter

The effective value for a given interface is the maximum of conf.all.rp_filter and conf.<ifname>.rp_filter. This is a common trap: setting eth0.rp_filter = 0 while all.rp_filter = 1 still results in strict filtering on eth0.

# Check the effective value
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.eth0.rp_filter
# Effective = max(all, eth0)

The three modes

0 — No source validation

Disabled. Every incoming packet is accepted regardless of its source address and arrival interface. The kernel performs no reverse-path check at all.

sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.eth0.rp_filter=0

1 — Strict mode (RFC 3704)

The kernel performs a full FIB lookup using the source address as the destination. If the resulting outgoing interface does not match the incoming interface, the packet is dropped.

Packet arrives on eth1 from 10.0.1.100
Kernel looks up route to 10.0.1.100 → outgoing interface is eth0
eth0 ≠ eth1 → DROP

This is the most restrictive mode and is reported as the default on many modern distributions (Ubuntu 20.04+, RHEL 8+, CentOS 8+). Shipped defaults vary by release, so check the running value instead of assuming. Strict mode breaks on any topology where packets can arrive on a different interface than the one the kernel would use to reply, which is exactly what happens on trading machines with multiple NICs.

2 — Loose mode (RFC 3704)

The kernel performs a FIB lookup using the source address, but only checks whether a route exists at all, not which interface it would use. Packets are dropped only if the source address is completely unreachable from this host.

Packet arrives on eth1 from 10.0.1.100
Kernel looks up route to 10.0.1.100 → route exists (via eth0)
Route exists → ACCEPT (interface mismatch is OK)

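Loose mode protects against blatantly spoofed addresses (those that are unroutable) while allowing asymmetric routing. On a host with a default route almost every source address has some route, so in practice loose mode filters very little.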
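The three modes can be summarized as a small decision function. This is a toy model for reasoning about the rules, not a probe of the kernel: the function name rp_filter_verdict and its arguments are made up for illustration, and the "route interface" is whatever you believe the lookup back to the source returns.

```shell
#!/bin/sh
# Toy model of the rp_filter verdict; it does not query the kernel.
#   $1 = effective mode (0, 1 or 2)
#   $2 = interface the packet arrived on
#   $3 = interface of the route back to the source ("" if no route exists)
rp_filter_verdict() {
    mode=$1
    in_if=$2
    route_if=$3
    case "$mode" in
        0) echo ACCEPT ;;
        1) if [ -n "$route_if" ] && [ "$route_if" = "$in_if" ]; then
               echo ACCEPT
           else
               echo DROP
           fi ;;
        2) if [ -n "$route_if" ]; then echo ACCEPT; else echo DROP; fi ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

rp_filter_verdict 1 eth1 eth0   # strict, interface mismatch
rp_filter_verdict 2 eth1 eth0   # loose, a route exists
rp_filter_verdict 2 eth1 ""     # loose, no route to the source
```

The first call mirrors the strict-mode example above, the second the loose-mode example, and the third is the only case loose mode rejects.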


Why trading machines hit this

Multi-NIC dual-feed setup

A typical low-latency trading machine has two or more NICs, each connected to a different exchange feed or broker:

┌─────────────────────────────┐
│         Trading Host        │
│                             │
│  eth0 (10.0.1.10) ── Feed A│
│  eth1 (10.0.2.10) ── Feed B│
│                             │
└─────────────────────────────┘

With the default route pointing through eth0:

default via 10.0.1.1 dev eth0
10.0.1.0/24 dev eth0
10.0.2.0/24 dev eth1

Packets arriving on eth1 from 10.0.2.x are fine: the route back to 10.0.2.0/24 also goes through eth1. But if Feed B uses a source address outside 10.0.2.0/24, the only route that matches the source is the default route via eth0, so the reverse-path lookup returns eth0 and strict mode drops the packet.
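You can approximate the strict check by hand by asking the kernel which interface it would use to reach a feed's source address. The sketch below assumes iproute2 is installed; route_dev and strict_would_accept are hypothetical helper names, and a plain ip route get is only an approximation of the kernel's reverse lookup (it ignores details such as policy rules keyed on the local address).

```shell
#!/bin/sh
# Pull the interface name that follows "dev" out of `ip route get` output.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Compare the arrival interface with the interface of the route back to
# the source. Hypothetical helper: $1 = source address, $2 = arrival interface.
strict_would_accept() {
    back_if=$(ip -4 route get "$1" 2>/dev/null | route_dev)
    [ -n "$back_if" ] && [ "$back_if" = "$2" ]
}

# Example, using the addresses from the routing table above:
#   strict_would_accept 10.0.2.50 eth1 && echo pass || echo "strict mode would drop"
```

Run it once per feed source address and arrival interface; any pair that does not pass is a candidate for a strict-mode drop.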

Kernel-bypass NICs

With Solarflare/OpenOnload or EFVI, the data plane bypasses the kernel entirely. But the control plane (ARP, ICMP, route management) still goes through the kernel stack. If rp_filter rejects control traffic there, for example an ICMP message or an ARP request whose sender address fails the reverse-path check and therefore goes unanswered, the kernel-bypass path can’t establish connectivity either.

This is particularly insidious because the kernel-bypass application never sees the dropped packet; it just never receives data. Running tcpdump can mislead here too, because it captures packets before the rp_filter check and so shows them arriving normally.

Asymmetric routing in co-location

In exchange co-locations, network topology is often out of your control. The upstream switch may route return traffic through a different path than the incoming traffic, especially with ECMP or redundant switches. From the kernel’s perspective, the reply interface doesn’t match the arrival interface, and strict mode drops the packet.

Bonding / LAG with L3+ hashing

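When NICs are bonded with LACP and the hash policy is layer3+4, individual flows are pinned to specific member interfaces. The reverse-path check is evaluated against the interface that carries the IP address, which is normally the bond device (bond0) rather than its members, so a flow moving between members does not by itself fail the check. Strict mode becomes a problem when the route back to the source points away from the bond, for example when the bond is one of several L3 interfaces on the host.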


The silent drop problem

rp_filter drops are invisible by default. When fib_validate_source() rejects the source address, the input path frees the packet and increments the IPReversePathFilter counter (in the TcpExt group of /proc/net/netstat). There is no ICMP, no log, and no socket error.

Detecting drops

Check the reverse-path counter:

nstat -az TcpExtIPReversePathFilter

A non-zero IPReversePathFilter value that keeps growing indicates rp_filter drops. The counter is host-wide rather than per-interface, so it confirms that drops are happening but not where.

The IP address-error counter is worth watching alongside it:

nstat -az IpInAddrErrors

InAddrErrors increments for other reasons (such as invalid destination addresses), so it is not a reliable indicator of rp_filter drops on its own.
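If nstat is not installed, the same data can be read straight from /proc. Kernels expose a dedicated reverse-path counter, IPReversePathFilter, on the TcpExt lines of /proc/net/netstat, which come in pairs: a line of names followed by a line of values. The helper below (netstat_counter is an illustrative name, not an existing tool) picks one value out of that layout.

```shell
#!/bin/sh
# Print one counter from /proc/net/netstat-style text on stdin.
#   $1 = group prefix (e.g. TcpExt), $2 = counter name
# The format is a header line of names followed by a line of values,
# both starting with "<prefix>:".
netstat_counter() {
    awk -v p="$1:" -v n="$2" '
        $1 == p && !seen { for (i = 2; i <= NF; i++) if ($i == n) col = i
                           seen = 1; next }
        $1 == p && seen  { if (col) print $col; exit }
    '
}

# Usage on a live host:
#   netstat_counter TcpExt IPReversePathFilter < /proc/net/netstat
```

Sample the value twice a few seconds apart; if it grows while the feed is running, the reverse-path filter is discarding packets.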

Enable kernel reverse-path logging:

# Temporarily (verbose, high overhead)
echo 1 > /proc/sys/net/ipv4/conf/eth1/log_martians
# Then check dmesg
dmesg | grep "martian"
# Example output:
# IPv4: martian source 10.0.1.100 from 10.0.2.1, on dev eth1

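The log_martians sysctl causes the kernel to log packets that fail source validation, including rp_filter failures. The kernel rate-limits these messages, so you will not necessarily see one line per dropped packet, but logging is still expensive at high packet rates and should only be used during debugging. In each message, the address after "from" is the packet’s source and the first address is its destination.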
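Once martian logging is on, it helps to see which sources and interfaces account for the messages. The sketch below assumes the log line format shown in the example output above; martian_summary is a hypothetical helper that counts lines per arrival interface and packet source (the address after "from").

```shell
#!/bin/sh
# Summarize martian log lines read from stdin as "<count> <dev> <source>",
# most frequent first. Assumes lines of the form:
#   IPv4: martian source <dst> from <src>, on dev <ifname>
martian_summary() {
    awk '/martian source/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "from") { src = $(i + 1); sub(/,$/, "", src) }
            if ($i == "dev")  { dev = $(i + 1) }
        }
        count[dev " " src]++
    }
    END { for (k in count) print count[k], k }' | sort -rn
}

# Usage on a live host:
#   dmesg | martian_summary
```

Because the kernel rate-limits these messages, treat the counts as a ranking of offenders rather than an exact drop count.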

Trace with eBPF:

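# Count reverse-path failures: on kernels where fib_validate_source()
# returns a negative errno, a failed check returns -EXDEV (-18)
bpftrace -e 'kretprobe:fib_validate_source /(int32)retval == -18/ { @rpf_fail = count(); }'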

Or use dropwatch:

dropwatch -l kas
# Shows exact kernel functions where packets are dropped

The tcpdump trap: tcpdump captures packets at the packet-socket (AF_PACKET) tap, which is before the rp_filter check. A packet visible in tcpdump but not received by the application is a strong signal that rp_filter (or netfilter/iptables) is dropping it.


Correct configuration for trading machines

The safe default: loose mode

For any multi-homed host, set loose mode globally:

# /etc/sysctl.d/99-trading.conf
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2

Then apply it to the interfaces that already exist (conf.default only affects interfaces created later):

# Writing through /proc avoids sysctl's dot handling for names like eth0.100
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
    echo 2 > "$f"
done

Loose mode still provides protection against spoofed packets from unroutable sources, but does not break asymmetric routing.

When to use strict mode

Use strict mode (1) only on single-homed hosts where you control the entire network path and want maximum spoofing protection. This is rarely the case for trading machines.

When to disable rp_filter entirely

Disable (0) when:

  • You use kernel-bypass (EFVI, DPDK) and the kernel’s FIB lookup returns incorrect results for the bypass interface.
  • You have complex policy routing that the rp_filter logic doesn’t account for.
  • You’ve verified that loose mode still drops legitimate packets.

The trade-off is that you lose all source-address validation from the kernel. If you also run iptables/nftables with rpfilter match, you can replicate the filtering in a more flexible way:

# nftables equivalent of strict mode with per-rule control
# (assumes an existing ip filter table with a prerouting chain)
nft add rule ip filter prerouting fib saddr . iif oif missing drop
# vs. accepting specific sources first, then checking everything else
nft add rule ip filter prerouting ip saddr 10.0.0.0/8 accept
nft add rule ip filter prerouting fib saddr . iif oif missing drop

Kernel-bypass specific setup

For Solarflare EFVI with the sfc driver:

# The kernel driver handles ARP/routing for the kernel path.
# EFVI traffic bypasses the kernel entirely.
# Set loose mode on the Solarflare interface so that rp_filter
# does not interfere with the kernel control path (ARP, ICMP).
# Use 0 instead of 2 to disable the check entirely.
net.ipv4.conf.sfc_bp0.rp_filter = 2

Bonding with LACP

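# A newly created bond0 takes its initial value from conf.default,
# and conf.all is then combined with it (max of the two).
# Ensure loose mode before bringing up the bond.
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.bond0.rp_filter = 2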

The all vs per-interface trap

The interaction between conf.all.rp_filter and conf.<ifname>.rp_filter is the most common misconfiguration. The kernel takes the maximum of the two values:

all.rp_filter = 1, eth0.rp_filter = 0  → effective = 1 (strict!)
all.rp_filter = 0, eth0.rp_filter = 2  → effective = 2 (loose)
all.rp_filter = 2, eth0.rp_filter = 0  → effective = 2 (loose)
all.rp_filter = 2, eth0.rp_filter = 2  → effective = 2 (loose)

This means setting all.rp_filter = 1 (the default on many distros) overrides any per-interface setting of 0. You must set all to the same or lower value than your desired per-interface mode.
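The rule can be turned into a quick sanity check before you write a sysctl file. This sketch only encodes the max rule stated above; effective_rp_filter and mode_achievable are illustrative names, and nothing here reads or changes kernel state.

```shell
#!/bin/sh
# Effective mode as described above: the larger of the two values.
#   $1 = conf.all.rp_filter, $2 = conf.<ifname>.rp_filter
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Succeeds only if an interface can actually run in the desired mode
# given the current conf.all value.
#   $1 = conf.all.rp_filter, $2 = desired per-interface mode
mode_achievable() {
    [ "$(effective_rp_filter "$1" "$2")" -eq "$2" ]
}

mode_achievable 1 0 || echo "all=1 blocks disabling rp_filter on an interface"
mode_achievable 2 1 || echo "all=2 blocks strict mode on an interface"
mode_achievable 0 2 && echo "all=0 allows loose mode on an interface"
```

The second case is easy to overlook: once all is 2, no per-interface value can bring a single interface back to strict.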

Recommended approach:

# With all = 2, every interface is effectively loose: max(2, x) = 2
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
# Per-interface values cannot lower this or switch an interface to strict.
# If one interface needs mode 0 or 1, lower all first and set
# every interface explicitly.

Checking effective values

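#!/bin/bash
# Show effective rp_filter for every interface: max(conf.all, conf.<ifname>)
all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
for dir in /proc/sys/net/ipv4/conf/*/; do
    iface=$(basename "$dir")
    # all and default are not real interfaces; match them exactly so that
    # interface names merely containing those words are not skipped
    [[ $iface == all || $iface == default ]] && continue
    per=$(cat "$dir/rp_filter")
    effective=$((all > per ? all : per))
    echo "$iface: all=$all per=$per effective=$effective"
done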

Interaction with policy routing

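When policy routing (ip rule with multiple routing tables) is in use, the reverse-path lookup is not the same lookup the kernel performs when it actually sends a reply. It runs with the packet’s addresses swapped, it ignores the firewall mark unless src_valid_mark is set on the interface, and for multicast traffic it carries no local address for a from rule to match. A rule that steers your outgoing traffic into a separate table may therefore not match the reverse lookup, which then falls through to the main table. Packets that policy routing handles correctly can still fail the rp_filter check if the main table’s route for the source address points to a different interface.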

Example

# Main table
default via 10.0.1.1 dev eth0
10.0.1.0/24 dev eth0

# Policy routing table 100
default via 10.0.2.1 dev eth1
10.0.2.0/24 dev eth1

# Rule: packets from 10.0.2.10 use table 100
ip rule add from 10.0.2.10 table 100

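Take a packet arriving on eth1 from 10.0.2.100 whose reverse lookup does not match the from 10.0.2.10 rule (multicast feed traffic, for instance, or a packet addressed to another local address). The rp_filter check falls through to the main table, where the only match for 10.0.2.100 is the default route via eth0 (not eth1), and strict mode drops the packet, even though policy routing would correctly route replies via eth1.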

Workaround

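Set rp_filter = 0 or 2 on the affected interface (keeping the max rule with conf.all in mind), or ensure the main table has routes consistent with your policy routing. If your rules match on fwmark, setting src_valid_mark = 1 on the interface makes the reverse lookup honor the mark.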
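To see how the kernel treats a specific packet under your rules, you can ask iproute2 to perform the input route lookup: ip route get <local-addr> from <source-addr> iif <arrival-if>. A reverse-path failure typically surfaces as the error "Invalid cross-device link" (EXDEV). The helper below is a sketch that classifies that output; classify_route_get is a hypothetical name, and the exact success output varies between iproute2 versions.

```shell
#!/bin/sh
# Classify the combined stdout/stderr of
#   ip route get <local-addr> from <source-addr> iif <arrival-if>
# read from stdin.
classify_route_get() {
    out=$(cat)
    case "$out" in
        *"Invalid cross-device link"*) echo "reverse-path check failed" ;;
        *"RTNETLINK answers"*)         echo "other routing error" ;;
        "")                            echo "no output" ;;
        *)                             echo "lookup succeeded" ;;
    esac
}

# Usage with the example addresses above (packet from 10.0.2.100 to the
# local address 10.0.2.10, arriving on eth1):
#   ip route get 10.0.2.10 from 10.0.2.100 iif eth1 2>&1 | classify_route_get
```

Running the probe before and after changing rp_filter or the routing tables shows whether the change actually affects the packets you care about.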


Troubleshooting checklist

  1. Packet visible in tcpdump but not in application? → Check rp_filter.
  2. IPReversePathFilter counter growing? → Check rp_filter + log_martians.
  3. Only some source IPs are affected? → Likely strict mode dropping packets from subnets that route via a different interface.
  4. Problem appeared after adding a NIC? → The new interface probably changed the FIB lookup result for existing sources.
  5. Kernel-bypass app can’t receive but kernel-stack works? → Check that the kernel-side control path (ARP/routing) isn’t being blocked by rp_filter.

Quick diagnostic

# 1. Show raw rp_filter values (effective = max of all and the interface)
for i in /proc/sys/net/ipv4/conf/*/rp_filter; do echo "$i: $(cat "$i")"; done

# 2. Check the reverse-path and address error counters
nstat -az TcpExtIPReversePathFilter IpInAddrErrors

# 3. Enable logging temporarily
sysctl -w net.ipv4.conf.all.log_martians=1
dmesg -w | grep martian

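# 4. Test the input route lookup for a packet from 10.0.2.100 to the
#    local address 10.0.2.10 arriving on eth1; a reverse-path failure
#    is reported as "Invalid cross-device link"
ip route get 10.0.2.10 from 10.0.2.100 iif eth1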

Summary

Mode      Value  Check                       Drops asymmetric?  Use case
Disabled  0      None                        No                 Kernel-bypass, complex policy routing
Loose     2      Route exists for source?    No                 Multi-homed trading machines (recommended)
Strict    1      Reply uses same interface?  Yes                Single-homed hosts only

For trading machines: set net.ipv4.conf.all.rp_filter = 2 and net.ipv4.conf.default.rp_filter = 2. Verify with the script above. Never leave all.rp_filter = 1 on a multi-NIC host.

