TCP Message Framing: Reassembling Length-Prefixed Messages from a Byte Stream

TCP is a stream protocol — it has no concept of message boundaries. A single send() call might arrive as three recv() calls, or three send() calls might coalesce into one buffer. If you’re building a trading protocol over TCP, you need message framing: a way for the receiver to know where one message ends and the next begins.

This article covers the simplest framing scheme, length-prefixing, and walks through a C++ implementation that handles split headers, split message bodies, and multiple messages arriving in a single read.

The Problem

TCP’s abstraction is a bidirectional byte stream:

Sender:  send(msg1)  send(msg2)  send(msg3)
         │           │           │
         └───────────┼───────────┘
                     ▼
            TCP byte stream
                     │
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
Receiver: recv() → 4 bytes   recv() → 11 bytes   recv() → 0 (EOF)

The receiver has no idea where the message boundaries are. Two common framing strategies:

Strategy	Description	Use Case
Delimiter-based	Messages end with `\n` or a special byte sequence	Text protocols, HTTP
Length-prefixed	Each message starts with N bytes encoding its total length	Binary protocols, FIX, exchange feeds

For binary trading protocols, length-prefixing is the usual choice. It needs no escaping and no scanning for a terminator, and once the header has been read the receiver knows exactly how many bytes to wait for.
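For contrast, a delimiter-based receiver has to inspect every byte looking for the terminator. The following is a minimal sketch, assuming newline-terminated text messages; `extractLines` is a hypothetical helper and is not used by the implementation later in this article.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: pull every complete '\n'-terminated message out of
// `buffer`, leaving any unterminated tail in place for the next read.
std::vector<std::string> extractLines(std::string& buffer) {
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = buffer.find('\n', start);
        if (end == std::string::npos) break;             // no complete message left
        lines.emplace_back(buffer, start, end - start);  // copy without the delimiter
        start = end + 1;
    }
    buffer.erase(0, start);  // drop everything that was delivered
    return lines;
}
```

A payload in this scheme can never contain a raw `\n` unless the protocol also defines an escaping rule, which is the cost a length prefix avoids.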

Message Format

Our format is the simplest possible: a 2-byte little-endian header encoding the total message length (including itself), followed by the payload.

Bytes	Description
0–1	Message length (`uint16_t`, little-endian) — total message size including these 2 bytes
2…N	Payload — `(length - 2)` bytes

Example: [0x05, 0x00, 'A', 'B', 'C'] → length = 0x0005 = 5 → total 5 bytes → payload = “ABC”.

The length includes the header bytes. This matters because it makes the minimum valid message 2 bytes (length field only, empty payload). It also means a length of 0 or 1 can never be valid and should be treated as a protocol error.
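The sending side of this format can be sketched as follows. `encodeFrame`, `kHeaderSize`, and `kMaxFrameSize` are names invented for this illustration; the article itself only specifies the wire layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

constexpr std::size_t kHeaderSize = 2;
constexpr std::size_t kMaxFrameSize = 0xFFFF;  // largest value a uint16_t length can hold

// Hypothetical helper: wrap `payload` in a 2-byte little-endian length header.
// The length counts the header itself, matching the format described above.
std::vector<std::byte> encodeFrame(std::string_view payload) {
    const std::size_t total = kHeaderSize + payload.size();
    if (total > kMaxFrameSize)
        throw std::length_error("payload too large for a 16-bit length");

    std::vector<std::byte> frame;
    frame.reserve(total);
    frame.push_back(static_cast<std::byte>(total & 0xFF));         // low byte first
    frame.push_back(static_cast<std::byte>((total >> 8) & 0xFF));  // then high byte
    for (char c : payload)
        frame.push_back(static_cast<std::byte>(c));
    return frame;
}
```

Encoding "ABC" this way produces the five bytes from the example above, and an empty payload produces the 2-byte minimum frame.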

The Interface

We’re given two abstractions:

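#include <cstddef>  // std::byte

struct IDataProvider {
    // Fills data with up to maxLength bytes; returns the count read, or -1 at EOF
    virtual int GetData(std::byte* data, int maxLength) { return 0; }
    virtual ~IDataProvider() = default;
};

struct ITcpSocket {
    // Invoked once per complete frame (header included)
    virtual void OnMessage(std::byte* bytes, int length) { }
    virtual ~ITcpSocket() = default;
};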

GetData mimics recv(): it fills data with up to maxLength bytes and returns the number actually read. Unlike POSIX recv(), which returns 0 at end of stream and -1 on error, this interface signals EOF with -1. OnMessage is the callback for each complete frame.

Our task: implement TcpSocket::Process() that loops until EOF, calling OnMessage for each complete message, handling all edge cases.
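To exercise Process() without a real socket, it helps to have a provider that replays a fixed script of reads. The sketch below is one way to do that; `ScriptedProvider` is a hypothetical test double, and the interface is repeated only so the block compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Same interface as above, repeated so the sketch is self-contained.
struct IDataProvider {
    virtual int GetData(std::byte* /*data*/, int /*maxLength*/) { return 0; }
    virtual ~IDataProvider() = default;
};

// Hypothetical test double: hands out one scripted chunk per call and
// reports EOF with -1 once the script is exhausted.
class ScriptedProvider : public IDataProvider {
public:
    explicit ScriptedProvider(const std::vector<std::vector<std::byte>>& chunks)
        : chunks_(chunks.begin(), chunks.end()) {}

    int GetData(std::byte* data, int maxLength) override {
        if (chunks_.empty()) return -1;   // script exhausted: EOF
        if (maxLength <= 0) return 0;     // nothing can be delivered
        std::vector<std::byte>& front = chunks_.front();
        const int n = std::min<int>(maxLength, static_cast<int>(front.size()));
        if (n > 0) std::memcpy(data, front.data(), static_cast<std::size_t>(n));
        front.erase(front.begin(), front.begin() + n);  // keep what did not fit
        if (front.empty()) chunks_.pop_front();
        return n;
    }

private:
    std::deque<std::vector<std::byte>> chunks_;
};
```

A chunk larger than maxLength is delivered across several calls, so a script written as one big chunk still respects the size the caller asked for.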

Implementation

The State Machine

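We need two pieces of running state: totalReceived, the number of bytes currently buffered, and totalSize, the length of the message being assembled. When totalSize is 0, we’re waiting for a header. When it’s non-zero, we know how many bytes the current message needs and are accumulating toward that total.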

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

class TcpSocket : public ITcpSocket {
public:
    explicit TcpSocket(IDataProvider* provider) : provider_{provider} {}

    void Process() {
        // Largest total length a uint16_t header can encode
        constexpr int AllocationSize = 65535;
        auto bytes = std::make_unique<std::byte[]>(AllocationSize);

        int totalReceived = 0;    // Bytes currently buffered
        uint16_t totalSize = 0;   // Length of current message; 0 = header not parsed yet

        while (true) {
            // A parsed length below 2 cannot include its own header.
            // Stop here rather than request a negative byte count below.
            if (totalReceived > 1 && totalSize < 2)
                return;

            // Determine how many bytes to ask for
            int remaining = totalReceived < 2
                ? AllocationSize - totalReceived        // Need header
                : totalSize - totalReceived;            // Need rest of message

            int received = provider_->GetData(
                bytes.get() + totalReceived,
                remaining
            );

            if (received == -1)
                break;

            totalReceived += received;

            // Parse the header once we have at least 2 bytes
            if (totalReceived > 1 && totalSize == 0) {
                totalSize = static_cast<uint16_t>(bytes[0])
                          | (static_cast<uint16_t>(bytes[1]) << 8);
            }

            // Process complete messages
            while (totalSize >= 2 && totalReceived >= totalSize) {
                OnMessage(bytes.get(), totalSize);

                // Shift remaining data to the front
                if (totalReceived > totalSize) {
                    std::memmove(
                        bytes.get(),
                        bytes.get() + totalSize,
                        totalReceived - totalSize
                    );
                }

                totalReceived -= totalSize;
                totalSize = 0;

                // Re-parse header if we have enough trailing data
                if (totalReceived > 1) {
                    totalSize = static_cast<uint16_t>(bytes[0])
                              | (static_cast<uint16_t>(bytes[1]) << 8);
                }
            }
        }
    }

private:
    IDataProvider* provider_;
};

The Key Operations

Let’s walk through each critical step.

1. Reading the Right Amount

int remaining = totalReceived < 2
    ? AllocationSize - totalReceived    // Don't know message size yet
    : totalSize - totalReceived;        // Know exactly how many bytes to wait for

Before the header is parsed, we ask for as much as the buffer can hold, so a single read may return several messages at once (the shift and re-parse steps below deal with that). Once the length is known, we ask for exactly the bytes still missing, so the read cannot run past the end of the current message.
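Pulling the expression into a small function makes its two modes easy to check in isolation. This is a sketch only: `bytesToRequest` and `kHeaderSize` are hypothetical names, and the function deliberately mirrors the expression above without adding validation.

```cpp
#include <cstdint>

constexpr int kHeaderSize = 2;

// Hypothetical helper mirroring the `remaining` expression above.
// capacity:      size of the receive buffer
// totalReceived: bytes buffered so far
// totalSize:     parsed frame length, or 0 while the header is incomplete
constexpr int bytesToRequest(int capacity, int totalReceived, std::uint16_t totalSize) {
    return totalReceived < kHeaderSize
        ? capacity - totalReceived      // header incomplete: take whatever arrives
        : totalSize - totalReceived;    // header known: only the rest of this frame
}
```

With 3 bytes buffered of a 5-byte message the function asks for 2 more. It also shows why malformed lengths need handling: with 2 bytes buffered and a parsed length of 0 the result is -2, which is not a meaningful read size.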

2. Parsing Little-Endian uint16_t

totalSize = static_cast<uint16_t>(bytes[0])
          | (static_cast<uint16_t>(bytes[1]) << 8);

Byte 0 is the LSB, byte 1 is the MSB. The bitwise OR composes them. This is host-endian-independent — it works correctly on both little-endian (x86) and big-endian machines because we’re explicitly reconstructing the value from known byte positions.

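A common alternative copies the two bytes straight into the integer:

uint16_t totalSize;
std::memcpy(&totalSize, bytes.get(), sizeof(totalSize));

This yields the right value only on a little-endian host; on a big-endian host you would have to byte-swap the result, for example with __builtin_bswap16() on GCC or Clang. The manual shift approach is more explicit and costs nothing in practice: optimizing compilers typically reduce it to a single 16-bit load on x86.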
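The shift-and-OR pattern is easy to wrap in a pair of helpers. The sketch below assumes C++17 and uses `std::to_integer`, the standard way to get a numeric value out of a `std::byte`; the names `readU16LE` and `writeU16LE` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode a little-endian uint16_t from two bytes.
// The result is the same on little-endian and big-endian hosts.
inline std::uint16_t readU16LE(const std::byte* p) {
    return static_cast<std::uint16_t>(
        std::to_integer<unsigned>(p[0]) | (std::to_integer<unsigned>(p[1]) << 8));
}

// Hypothetical helper: encode `value` as two little-endian bytes.
inline void writeU16LE(std::byte* p, std::uint16_t value) {
    p[0] = static_cast<std::byte>(value & 0xFF);  // least significant byte first
    p[1] = static_cast<std::byte>(value >> 8);    // most significant byte second
}
```

Keeping the byte order in one place means the framing loop never has to mention shifts at all.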

3. Shifting Leftover Data with `memmove`

if (totalReceived > totalSize) {
    std::memmove(
        bytes.get(),
        bytes.get() + totalSize,
        totalReceived - totalSize
    );
}

This is the critical operation that beginners often get wrong. After processing a message, if there are extra bytes left in the buffer, they belong to the next message. We must shift them to the front without discarding them.

Why memmove and not memcpy? Source and destination live in the same buffer, and they overlap whenever the leftover (totalReceived - totalSize bytes) is longer than the message just delivered. memcpy has undefined behavior on overlapping memory; memmove handles overlaps correctly.
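One variation, which is not what the code above does, is to defer the shift: advance a read cursor as messages are delivered and compact the buffer once, just before the next read. A rough sketch under that assumption follows; `ReceiveBuffer` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical buffer that defers the shift: consume() only advances a cursor,
// and compact() does a single memmove when the caller is about to read again.
class ReceiveBuffer {
public:
    explicit ReceiveBuffer(std::size_t capacity) : storage_(capacity) {}

    const std::byte* data() const { return storage_.data() + begin_; }  // first unread byte
    std::size_t size() const { return end_ - begin_; }                  // unread byte count

    std::byte* writePtr() { return storage_.data() + end_; }            // where new bytes go
    std::size_t writable() const { return storage_.size() - end_; }     // free space at the tail
    void commit(std::size_t n) { end_ += n; }                           // n bytes were written

    void consume(std::size_t n) { begin_ += n; }                        // drop a delivered frame

    void compact() {
        if (begin_ == 0) return;
        // The two regions may overlap, so memmove rather than memcpy.
        std::memmove(storage_.data(), storage_.data() + begin_, size());
        end_ -= begin_;
        begin_ = 0;
    }

private:
    std::vector<std::byte> storage_;
    std::size_t begin_ = 0;
    std::size_t end_ = 0;
};
```

When one read delivers many small messages, this replaces one memmove per message with one per read, at the cost of a little extra bookkeeping.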

4. Re-parsing After Shift

totalReceived -= totalSize;
totalSize = 0;

if (totalReceived > 1) {
    totalSize = static_cast<uint16_t>(bytes[0])
              | (static_cast<uint16_t>(bytes[1]) << 8);
}

After shifting, the remaining bytes are now at the front of the buffer. If we have at least 2, we can immediately parse the next message’s header. This handles the case where a single GetData call contains multiple complete messages.
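The same deliver, shift, re-parse cycle can be written as a standalone function over a growable buffer, which makes the multi-message case easy to test. This is a sketch, not the article's implementation: `drainFrames` and the `Bytes` alias are hypothetical, and the function simply stops at a length below 2 instead of reporting an error.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::byte>;

// Hypothetical helper: remove every complete frame from the front of `buffer`
// and return them, leaving any partial frame behind for the next read.
std::vector<Bytes> drainFrames(Bytes& buffer) {
    std::vector<Bytes> frames;
    std::size_t offset = 0;
    while (buffer.size() - offset >= 2) {
        // Little-endian length, counted from the start of the header.
        const std::size_t length =
            std::to_integer<std::size_t>(buffer[offset]) |
            (std::to_integer<std::size_t>(buffer[offset + 1]) << 8);
        if (length < 2 || buffer.size() - offset < length) break;  // invalid or incomplete
        frames.emplace_back(buffer.begin() + offset, buffer.begin() + offset + length);
        offset += length;
    }
    buffer.erase(buffer.begin(), buffer.begin() + offset);  // one shift for all frames
    return frames;
}
```

Given two complete messages followed by a single stray byte, it returns both frames and leaves that byte in the buffer as the start of the next header.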

Edge Cases, One by One

Split Header

Read 1: [0x05]              → totalReceived=1, totalSize=0 (need 2 bytes for header)
Read 2: [0x00, 'A','B','C'] → totalReceived=5, parse length=0x0005, full message!

Split Message Body

Read 1: [0x05, 0x00, 'A']   → totalReceived=3, totalSize=5, not enough
Read 2: ['B', 'C']          → totalReceived=5, full message!

Multiple Messages in One Read

GetData returns: [0x03,0x00,'X', 0x04,0x00,'Y','Z']
                 ↑ msg1 (3B)      ↑ msg2 (4B)

Processing:

Parse header → length=3
totalReceived=7 >= 3 → deliver msg1 (bytes 0-2)
Shift bytes [3,6] to front → buffer = [0x04,0x00,'Y','Z', …]
totalReceived=4, re-parse header → length=4
totalReceived=4 >= 4 → deliver msg2 (bytes 0-3)
totalReceived=0 → inner loop ends; the outer loop exits when the next GetData returns -1

No Data / Immediate EOF

GetData returns: -1 → break out of loop, no messages delivered
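All of the cases above are instances of one property: the frames delivered must not depend on how the stream happens to be chunked. One way to test that is to replay the same bytes with every possible read size. The helper below is a sketch; `replayInChunks` is hypothetical, and its `feed` callback stands in for whatever decoder is under test.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical test helper: hand `stream` to `feed` in pieces of at most
// `chunk` bytes, the way successive reads might deliver it.
template <typename Feed>
void replayInChunks(const std::vector<std::byte>& stream, std::size_t chunk, Feed feed) {
    if (chunk == 0) return;  // a zero-sized read would never make progress
    for (std::size_t pos = 0; pos < stream.size(); pos += chunk) {
        const std::size_t n = std::min(chunk, stream.size() - pos);
        feed(stream.data() + pos, n);  // one simulated read
    }
}
```

Running a decoder with every chunk size from 1 up to the stream length and comparing the delivered messages covers split headers, split bodies, and multiple messages per read in a single loop.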

Follow-Up: Preventing Memory Exhaustion

A common interview follow-up: what if a malicious client sends a huge length value?

The fix is to validate the length as soon as the header is parsed, before committing to wait for that many bytes:

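constexpr uint16_t MAX_MESSAGE_SIZE = 4096;

if (totalReceived > 1 && totalSize == 0) {
    totalSize = static_cast<uint16_t>(bytes[0])
              | (static_cast<uint16_t>(bytes[1]) << 8);

    // A length below 2 cannot include its own header; above the cap is abuse
    if (totalSize < 2 || totalSize > MAX_MESSAGE_SIZE) {
        // Protocol violation: close the connection
        return;
    }
}
// Apply the same check wherever the header is re-parsed after a shift.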

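Without this check, a totalSize of 0xFFFF (65,535) commits the receiver to buffering almost 64 KiB for a single message. On a busy trading gateway with thousands of connections, each pinning a buffer of that size while it waits, this becomes a denial-of-service vector. The lower bound matters as well: a length of 0 or 1 is smaller than the header itself, so no valid message can carry it.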
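Because the header is parsed in more than one place, it can help to factor the check into a single function. The sketch below assumes the same illustrative 4096-byte limit; `checkLength` and `LengthCheck` are hypothetical names.

```cpp
#include <cstdint>

constexpr std::uint16_t kHeaderSize = 2;
constexpr std::uint16_t kMaxMessageSize = 4096;  // same illustrative limit as above

enum class LengthCheck { Ok, TooSmall, TooLarge };

// Hypothetical helper: classify a freshly parsed length before acting on it.
constexpr LengthCheck checkLength(std::uint16_t totalSize) {
    if (totalSize < kHeaderSize) return LengthCheck::TooSmall;      // 0 or 1: shorter than the header
    if (totalSize > kMaxMessageSize) return LengthCheck::TooLarge;  // exceeds the configured cap
    return LengthCheck::Ok;
}
```

Returning a distinct reason rather than a bool lets the caller log why a connection was dropped.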

Follow-Up: 4-Byte Headers

What if the header were 4 bytes instead of 2?

if (totalReceived > 3 && totalSize == 0) {
    totalSize = static_cast<uint32_t>(bytes[0])
              | (static_cast<uint32_t>(bytes[1]) << 8)
              | (static_cast<uint32_t>(bytes[2]) << 16)
              | (static_cast<uint32_t>(bytes[3]) << 24);
}

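The logic is the same, with three adjustments: totalSize must be widened to uint32_t, every "at least 2 bytes" check becomes "at least 4 bytes", and the receive buffer can no longer be sized to the largest length the header can encode. A 4-byte header supports messages up to ~4 GB, which is overkill for most trading protocols (FIX messages are typically < 8 KB), so the maximum-length check stops being optional.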
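A 4-byte variant of the parsing and validation helpers might look like the sketch below. The 1 MiB cap and the names `readU32LE`, `isValidLength32`, and `kMaxMessageSize32` are assumptions made for this illustration, not values taken from any particular protocol.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kHeaderSize32 = 4;
constexpr std::uint32_t kMaxMessageSize32 = 1u << 20;  // hypothetical 1 MiB cap

// Hypothetical helper: decode a little-endian uint32_t from four bytes.
inline std::uint32_t readU32LE(const std::byte* p) {
    return  std::to_integer<std::uint32_t>(p[0])
         | (std::to_integer<std::uint32_t>(p[1]) << 8)
         | (std::to_integer<std::uint32_t>(p[2]) << 16)
         | (std::to_integer<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: the length must cover its own header and stay under the cap.
inline bool isValidLength32(std::uint32_t length) {
    return length >= kHeaderSize32 && length <= kMaxMessageSize32;
}
```

With a cap like this, the receive buffer only needs to hold kMaxMessageSize32 bytes rather than the full 4 GB the header could describe.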

Key Takeaways

TCP is a stream — message boundaries are your responsibility
Length-prefixing is the simplest framing strategy for binary protocols
Maintain running state — track totalReceived and totalSize between reads
memmove, not memcpy — the buffer shift is an overlapping copy
Re-parse after shifting — handle multiple messages in a single read
Validate the length — prevent memory exhaustion from malicious or corrupt headers
Host-independent endianness — manual byte reconstruction works everywhere

Message framing is one of those problems that looks trivial until you’ve been bitten by a missing memmove at 3 AM. A correct implementation handles partial reads, multiple messages, and trailing data without losing a single byte — and now you have one.