Parsing B3 Exchange Binary UMDF Market Data: From pcap to Order Book
When you connect to the B3 exchange (Brasil Bolsa Balcão) for real-time market data, the feed arrives as raw UDP multicast packets encoded in Binary UMDF — a binary protocol built on Simple Binary Encoding (SBE). There is no JSON, no FIX tags, no delimiters. Just bytes on the wire, structured according to an XML schema.
This article walks through the complete decoding pipeline: from capturing pcap files to building a live MBO (Market By Order) order book. Every protocol layer is explained, every struct mapping shown, and every pitfall called out.
The Protocol Stack
A B3 Binary UMDF packet is a layered structure. Each UDP payload contains a packet header, followed by one or more framed messages. Each message in turn contains an SBE header and a message body whose layout is determined by the template_id.
┌──────────────────────────────────────────┐
│ UDP Payload │
│ ┌────────────────────────────────────┐ │
│ │ Packet Header (16B) │ │
│ ├────────────────────────────────────┤ │
│ │ Framing Header (4B) │ │
│ ├────────────────────────────────────┤ │
│ │ SBE Message Header (8B) │ │
│ ├────────────────────────────────────┤ │
│ │ Message Body (variable) │ │
│ ├────────────────────────────────────┤ │
│ │ Repeating Groups (variable) │ │
│ ├────────────────────────────────────┤ │
│ │ Variable-Length Data │ │
│ └────────────────────────────────────┘ │
└──────────────────────────────────────────┘
A single UDP packet can contain multiple framed messages. The receiver loops through the payload, reading framing headers and dispatching based on template_id, until the payload is exhausted.
Layer 1: Packet Header
Every Binary UMDF packet begins with a 16-byte packet header (1 + 1 + 2 + 4 + 8 bytes):
struct packet_header {
uint8_t channel_id; // Channel identifier (e.g., 84, 90)
uint8_t reserved; // Always 0
uint16_t sequence_version; // Incremented on channel reset
uint32_t sequence_number; // Monotonically increasing per channel
uint64_t sending_time; // Nanoseconds since epoch (PTP-synchronized)
};
Key fields:
- channel_id: B3 assigns different channels for different instrument groups. Your feed handler must filter by channel ID to avoid processing irrelevant data.
- sequence_number: Used for gap detection. If sequence_number skips, you’ve lost packets and must initiate recovery.
- sending_time: This is the T11 timestamp, the instant just before the matching engine publishes the packet to the UDP multicast, synchronized to PTP with sub-microsecond accuracy.
The sending_time field is critical for latency measurement. As of October 2024, its resolution is nanoseconds, synchronized via PTP to B3’s stratum-3 atomic master clock, with standard deviation under one microsecond.
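To make the gap-detection rule concrete, here is a minimal sketch. It assumes a little-endian host, and the names `PacketHeader`, `read_packet_header`, and `SequenceTracker` are illustrative rather than taken from the demo code. The field layout is copied from the struct above, whose fields add up to 16 bytes.

```cpp
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
// Same field order as the packet_header struct above.
struct PacketHeader {
    uint8_t  channel_id;
    uint8_t  reserved;
    uint16_t sequence_version;
    uint32_t sequence_number;
    uint64_t sending_time;
};
#pragma pack(pop)
static_assert(sizeof(PacketHeader) == 16, "fields add up to 16 bytes");

// Hypothetical helper: copies the header out of the UDP payload.
// Returns false when the buffer is too short to hold a header.
inline bool read_packet_header(const uint8_t* buf, size_t len, PacketHeader& out) {
    if (len < sizeof(PacketHeader)) return false;
    std::memcpy(&out, buf, sizeof(PacketHeader));
    return true;
}

// Hypothetical per-channel tracker for sequence_number continuity.
struct SequenceTracker {
    bool     started  = false;
    uint16_t version  = 0;
    uint32_t last_seq = 0;

    // Returns how many packets are missing before `h` (0 means in order).
    uint32_t on_packet(const PacketHeader& h) {
        uint32_t missing = 0;
        if (started && h.sequence_version == version) {
            if (h.sequence_number <= last_seq) return 0;  // duplicate or stale packet
            missing = h.sequence_number - last_seq - 1;
        }
        // First packet, or a new sequence_version: start a fresh baseline.
        started  = true;
        version  = h.sequence_version;
        last_seq = h.sequence_number;
        return missing;
    }
};
```

A non-zero return value from `on_packet` is the signal to start recovery. Treating a changed `sequence_version` as a fresh baseline is an assumption of this sketch.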
Layer 2: Framing Header
After the packet header, each message is wrapped in a 4-byte framing header:
struct framing_header {
uint16_t message_length; // Total length including this header
uint16_t encoding_type; // Must be 0xEB50 for SBE
};
The encoding_type field is a sanity check. If it’s not 0xEB50, the message is not SBE-encoded and should be skipped. This guards against corrupted packets or protocol version mismatches.
The message_length tells you how many bytes to consume before the next framing header — this is how you iterate through multiple messages in a single packet.
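The iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the demo's code: `FramingHeader`, `Frame`, and `split_frames` are hypothetical names, the host is little-endian, and `buf` points at the first byte after the packet header.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct FramingHeader {
    uint16_t message_length;  // total length including this header
    uint16_t encoding_type;   // 0xEB50 for SBE
};
#pragma pack(pop)

// Hypothetical result type: where a framed message starts and how long it is.
struct Frame {
    size_t   offset;
    uint16_t length;
};

// Collects every SBE frame in `buf`. Frames with another encoding_type are
// skipped; the walk stops at the first frame whose length cannot be trusted.
inline std::vector<Frame> split_frames(const uint8_t* buf, size_t len) {
    std::vector<Frame> frames;
    size_t offset = 0;
    while (len - offset >= sizeof(FramingHeader)) {
        FramingHeader fh;
        std::memcpy(&fh, buf + offset, sizeof(fh));
        // A length shorter than the header, or longer than what is left,
        // would loop forever or read out of bounds.
        if (fh.message_length < sizeof(fh) || fh.message_length > len - offset) break;
        if (fh.encoding_type == 0xEB50) {
            frames.push_back(Frame{offset, fh.message_length});
        }
        offset += fh.message_length;
    }
    return frames;
}
```

Validating `message_length` before using it matters more than the encoding check: a zero length would otherwise never advance the offset.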
Layer 3: SBE Message Header
Each message body begins with an 8-byte SBE header:
struct sbe_message_header {
uint16_t block_length; // Fixed-length portion of the message body
uint16_t template_id; // Message type identifier
uint16_t schema_id; // Schema identifier (constant for a given version)
uint16_t schema_version; // Schema version number
};
The template_id is the dispatch key. It tells you which struct to overlay on the bytes that follow. Here are the most important template IDs for order book construction:
| Template ID | Name | Purpose |
|---|---|---|
| 2 | Sequence | Next expected sequence number |
| 3 | SecurityStatus | Trading status per instrument |
| 4 | SecurityDefinition (deprecated) | Instrument definition (v1.6) |
| 9 | EmptyBook | Clear entire order book for a security |
| 10 | SecurityGroupPhase | Trading phase for security group |
| 11 | ChannelReset | Reset all state for a channel |
| 12 | SecurityDefinition | Instrument definition (v1.8+) |
| 30 | SnapshotFullRefresh_Header | Snapshot header with book state |
| 50 | Order_MBO | New order or modify order in book |
| 51 | DeleteOrder_MBO | Delete single order from book |
| 52 | MassDeleteOrder_MBO | Delete orders from position (DELETE_THRU/FROM) |
| 53 | Trade | Trade execution |
| 54 | ForwardTrade | Forward trade execution |
| 55 | ExecutionSummary | Aggressor trade summary |
| 56 | ExecutionStatistics | VWAP, volume stats |
| 71 | SnapshotFullRefresh_Orders_MBO | Full order book snapshot (MBO) |
The block_length tells you the size of the fixed-length portion. After reading block_length bytes, you may encounter repeating groups or variable-length data (described by the XML schema).
Three-Phase Data Recovery
B3 market data is delivered over three separate UDP streams, and you must process them in order:
Phase 1: InstrumentDefinition → Learn all instrument metadata
Phase 2: SnapshotRecovery → Get the full order book state
Phase 3: Incremental → Apply real-time updates
Phase 1: InstrumentDefinition
This feed sends SecurityDefinition messages (template 12 in v1.8+) that describe every tradeable instrument:
struct SecurityDefinition_12 {
SecurityID securityID; // Unique instrument identifier
SecurityExchange securityExchange; // "BVMF"
Symbol symbol; // e.g., "PETR4"
PriceOptional minPriceIncrement; // Tick size (raw int64)
Fixed8 contractMultiplier; // Contract size
uint8_t tickSizeDenominator; // Exponent for price conversion
Currency currency; // "BRL"
// ... 30+ more fields
};
The code collects these into a map:
std::map<uint64_t, PH_SecurityDefinition_12*> map_ph_securityDefinition;
The totNoRelatedSym field tells you how many instruments to expect. The feed handler waits until all definitions are received before proceeding to Phase 2.
Phase 2: SnapshotRecovery
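This feed provides the complete order book state. Each instrument starts with a SnapshotFullRefresh_Header_30 message, followed by SnapshotFullRefresh_Orders_MBO_71 messages whose noMDEntries repeating group carries the individual orders: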
struct SnapshotFullRefresh_Header_30 {
uint64_t securityID;
uint32_t lastMsgSeqNumProcessed; // Critical for gap detection
uint32_t totNumReports;
uint32_t totNumBids;
uint32_t totNumOffers;
uint16_t totNumStats;
uint32_t lastRptSeq;
};
struct noMDEntries {
PriceOptional mDEntryPx;
int64_t mDEntrySize;
uint32_t mDEntryPositionNo; // Position in the order book
uint32_t enteringFirm; // Broker ID
UTCTimestampNanos mDInsertTimestamp;
uint64_t secondaryOrderID; // Order ID
char mDEntryType; // '0' = Bid, '1' = Offer
};
The lastMsgSeqNumProcessed field is crucial — it tells you the last incremental sequence number reflected in the snapshot. When you switch to the incremental feed, you must skip any messages with sequence_number <= lastMsgSeqNumProcessed.
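One way to apply the skip rule is sketched below. The buffering of incremental packets while the snapshot is still loading is an assumption of this sketch (the article only states the skip rule), and `IncrementalPacket` and `SnapshotSync` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a decoded incremental packet.
struct IncrementalPacket {
    uint32_t sequence_number;
};

class SnapshotSync {
public:
    // Feed every incremental packet here. Returns the packets that should be
    // applied to the book now (empty while the snapshot is still pending).
    std::vector<IncrementalPacket> on_incremental(const IncrementalPacket& p) {
        std::vector<IncrementalPacket> ready;
        if (!synced_) {
            pending_.push_back(p);  // hold until the snapshot is complete
        } else if (p.sequence_number > last_applied_) {
            ready.push_back(p);
            last_applied_ = p.sequence_number;
        }
        return ready;
    }

    // Call once the snapshot is fully loaded. Drops buffered packets with
    // sequence_number <= lastMsgSeqNumProcessed and releases the rest.
    std::vector<IncrementalPacket> on_snapshot_complete(uint32_t lastMsgSeqNumProcessed) {
        synced_ = true;
        last_applied_ = lastMsgSeqNumProcessed;
        std::vector<IncrementalPacket> ready;
        for (const IncrementalPacket& p : pending_) {
            if (p.sequence_number > last_applied_) {
                ready.push_back(p);
                last_applied_ = p.sequence_number;
            }
        }
        pending_.clear();
        return ready;
    }

private:
    bool synced_ = false;
    uint32_t last_applied_ = 0;
    std::vector<IncrementalPacket> pending_;
};
```

The sketch does not check for gaps inside the released packets; that check would sit on top of it.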
Phase 3: Incremental
This is the live feed. Messages arrive at high rate:
- Order_MBO (template 50): Insert or modify an order
- DeleteOrder_MBO (template 51): Remove a single order
- MassDeleteOrder_MBO (template 52): Remove a range of orders
- Trade (template 53): Trade execution
- ExecutionSummary (template 55): Aggressor information
The feed handler must verify rptSeq continuity to detect gaps:
if (p_securityStatus_3->rptSeq - map_securityID_rptSeq[p_securityStatus_3->securityID] != 1) {
printf("Packet loss! securityID[%ld] rptSeq[%d]\n", ...);
}
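The check above looks up the previous rptSeq in a map. With `operator[]`, an instrument seen for the first time is read back as 0, so its first message can be reported as a gap. A small sketch of a tracker that treats the first value as a baseline instead (the `RptSeqTracker` name and this first-message policy are assumptions, not the demo's behavior):

```cpp
#include <cstdint>
#include <unordered_map>

class RptSeqTracker {
public:
    // Returns true when rptSeq directly follows the previous value for this
    // security, or when the security is seen for the first time.
    bool check(uint64_t securityID, uint32_t rptSeq) {
        auto it = last_.find(securityID);
        if (it == last_.end()) {
            last_.emplace(securityID, rptSeq);  // first message: baseline only
            return true;
        }
        const bool in_order = (rptSeq - it->second == 1);
        it->second = rptSeq;
        return in_order;
    }

private:
    std::unordered_map<uint64_t, uint32_t> last_;
};
```

In practice the baseline would be seeded from lastRptSeq in the snapshot header rather than from the first incremental message.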
SBE Encoding: Zero-Copy Decoding
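Simple Binary Encoding (SBE) is designed for zero-copy decoding. Unlike FIX or JSON, you don’t parse: every field sits at a fixed offset, so you can overlay a packed struct onto the raw bytes. The demo code takes the slightly safer route of copying the fixed block into a struct with a single memcpy, which avoids unaligned access while still doing no per-field work:
    // One memcpy of the fixed block into a packed struct
    p_order_MBO_50 = (Order_MBO_50*)malloc(sizeof(Order_MBO_50));
    // Never copy more than the struct can hold, even if a newer schema sends a longer block
    size_t n = p_sbe_hd->block_length < sizeof(Order_MBO_50) ? p_sbe_hd->block_length : sizeof(Order_MBO_50);
    memcpy(p_order_MBO_50, (const uint8_t*)pkt_data + offset, n);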
The block_length from the SBE header tells you exactly how many bytes to copy. No string parsing, no tag-value lookup, no variable-length delimiter scanning. This is why SBE is the encoding of choice for CME, Eurex, B3, and other low-latency exchanges.
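Byte Order
Byte order differs by layer. The IPv4 and UDP headers in front of the payload use big-endian (network byte order), so on x86 (little-endian) those fields must be swapped. The Binary UMDF payload itself is little-endian: the encoding_type value 0xEB50 in the framing header is the Simple Open Framing Header code for SBE 1.0 little-endian. Big-endian readers such as the following apply to the network headers:
    static inline uint16_t read_uint16(const uint8_t* buf) {
        return (uint16_t)((buf[0] << 8) | buf[1]);
    }
    static inline uint32_t read_uint32(const uint8_t* buf) {
        // Cast before shifting so the top byte never shifts into the sign bit of int
        return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
               ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    }
Because the payload is little-endian, on x86 you can memcpy the packet header, framing header, and fixed-size SBE message body directly into structs declared with #pragma pack(1) and read the fields without any swap. On a big-endian host, each multi-byte field would have to be swapped on access. For the network headers, production code would typically use ntohs()/ntohl() or compiler intrinsics rather than hand-written readers.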
Repeating Groups
After the fixed-length body, SBE uses repeating groups for variable-count data. Each group starts with a GroupSizeEncoding:
struct GroupSizeEncoding {
uint16_t blockLength; // Size of each group entry
uint8_t numInGroup; // Number of entries
};
For example, SecurityDefinition_12 has three repeating groups: noUnderlyings_12, noLegs_12, and noInstrAttribs_12. The decoder reads the GroupSizeEncoding, then loops numInGroup times, reading blockLength bytes each iteration.
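The group walk described above can be sketched as follows. `walk_group` is a hypothetical helper, the host is assumed to be little-endian, and entries are assumed to contain no nested groups.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct GroupSizeEncoding {
    uint16_t blockLength;  // size of each group entry
    uint8_t  numInGroup;   // number of entries
};
#pragma pack(pop)
static_assert(sizeof(GroupSizeEncoding) == 3, "group header is 3 bytes on the wire");

// Reads the group header at `offset`, records where each entry starts, and
// advances `offset` past the whole group. Returns false (leaving `offset`
// unchanged) when the group would run past the end of the buffer.
inline bool walk_group(const uint8_t* buf, size_t len, size_t& offset,
                       std::vector<size_t>& entry_offsets) {
    if (offset > len || len - offset < sizeof(GroupSizeEncoding)) return false;
    GroupSizeEncoding g;
    std::memcpy(&g, buf + offset, sizeof(g));
    size_t pos = offset + sizeof(g);
    const size_t total = static_cast<size_t>(g.blockLength) * g.numInGroup;
    if (len - pos < total) return false;
    for (uint8_t i = 0; i < g.numInGroup; ++i) {
        entry_offsets.push_back(pos);
        pos += g.blockLength;  // step by the wire blockLength, not sizeof(struct)
    }
    offset = pos;
    return true;
}
```

Stepping by the wire `blockLength` rather than by the size of a local struct keeps the walk correct when a newer schema version appends fields to each entry.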
Variable-Length Data
After repeating groups, there may be variable-length fields (like securityDesc in SecurityDefinition). These are prefixed by a length byte:
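    uint8_t securityDesc_len = *(const uint8_t*)(pkt_data + offset);
    offset += 1;
    if (securityDesc_len > 0) {
        char securityDesc[256];  // a length byte can describe at most 255 bytes
        memcpy(securityDesc, (const uint8_t*)pkt_data + offset, securityDesc_len);
        securityDesc[securityDesc_len] = '\0';  // wire data is not NUL-terminated
        offset += securityDesc_len;
    }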
Price Precision: The Exponent Model
B3 encodes prices as raw int64 values. The actual price is computed by dividing by a power-of-ten scale factor that depends on the instrument. The division must be done in floating point (or kept as a scaled integer); an integer division would drop the decimals:
    #define EXPONENT_4 10000 // 4 decimal places
    #define EXPONENT_7 10000000 // 7 decimal places
    #define EXPONENT_8 100000000 // 8 decimal places
    double data_exponent(int64_t int_data, uint32_t exponent) {
        if (int_data == INT64_MIN) return 0.0; // null value (0x8000000000000000)
        return (double)int_data / exponent; // floating-point division keeps the decimals
    }
The exponent for a given instrument is determined by the tickSizeDenominator field in SecurityDefinition_12. For example:
- Equity options: EXPONENT_4 (price in BRL × 10000)
- Mini-index futures: EXPONENT_8 (price in points × 100000000)
Null Value Convention
SBE uses a sentinel value for null/optional fields:
    const int64_t null_value_int64 = INT64_MIN; // bit pattern 0x8000000000000000
Any field with this value should be treated as “not present.” The data_exponent() function returns 0 for null values.
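Returning 0 for a null price makes "no price" indistinguishable from a price of zero. A possible alternative, sketched here with a hypothetical `decode_price` helper that is not part of the demo code, reports presence separately from the value:

```cpp
#include <cstdint>

// Sentinel for null int64 fields (bit pattern 0x8000000000000000).
constexpr int64_t kNullInt64 = INT64_MIN;

// Hypothetical helper. `decimals` is the number of decimal places, for
// example 4 for EXPONENT_4 or 8 for EXPONENT_8.
// Returns false and leaves `out` untouched when the field is null.
inline bool decode_price(int64_t raw, int decimals, double& out) {
    if (raw == kNullInt64) return false;
    double scale = 1.0;
    for (int i = 0; i < decimals; ++i) scale *= 10.0;
    out = static_cast<double>(raw) / scale;
    return true;
}
```

A caller can then skip the field entirely when the function returns false, instead of propagating a zero into the book. Code that compares prices for equality may prefer to keep the raw int64 and convert only for display.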
MBO Order Book Construction
B3 uses Market By Order (MBO) mode, where each order in the book is individually identified by secondaryOrderID and mDEntryPositionNo. This is fundamentally different from Market By Price (MBP) mode used by many Asian exchanges.
B3 vs. Price-Level Feeds
A critical difference lies in what the feed publishes rather than in the matching rule. Both B3 and the Chinese exchanges match by price-time priority, but:
- On B3: every order is published individually, and orders at the same price are ordered by entry time (FIFO)
- On Chinese exchanges (e.g., SSE, SZSE): the commonly used snapshot feeds aggregate the orders at each price into a single price level
This has implications for mDEntryPositionNo: it is the order’s position within its side of the book (FIX tag 290, counted from 1 at the most competitive order), not a price-level index.
Five Order Book Operations
The sfr_order_book class implements five operations corresponding to MDUpdateAction:
1. NEW (0) — Insert Order
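    void insert_entries(char entryType, int64_t entryPx, int64_t entrySize,
                        uint32_t entryPositionNo) {
        // Select the side: '0' = Bid, otherwise Offer
        auto& orders = (entryType == '0') ? list_bid_orders : list_ask_orders;
        // Positions are 1-based; ignore anything outside [1, size + 1]
        if (entryPositionNo == 0 || entryPositionNo > orders.size() + 1) return;
        price_volume* pv = (price_volume*)malloc(sizeof(price_volume));
        pv->PositionNo = entryPositionNo;
        pv->Price = entryPx;
        pv->Volume = entrySize;
        // Insert at the specified position
        auto iter = orders.begin();
        std::advance(iter, entryPositionNo - 1);
        orders.insert(iter, pv);
        // iter still points at the order that used to sit at this position;
        // increment position numbers from there to the end
        for (; iter != orders.end(); ++iter) {
            (*iter)->PositionNo += 1;
        }
    }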
When a new order is inserted at position N, all orders from position N onward must have their PositionNo incremented by 1. This is the key difference from MBP — inserting in the middle of the list shifts everything after it.
2. CHANGE (1) — Modify Order
    void change_entries(char entryType, int64_t entryPx, int64_t entrySize,
                        uint32_t entryPositionNo) {
        // Select the side: '0' = Bid, otherwise Offer
        auto& orders = (entryType == '0') ? list_bid_orders : list_ask_orders;
        // Find the order at this position and update price/quantity
        for (auto iter = orders.begin(); iter != orders.end(); ++iter) {
            if ((*iter)->PositionNo == entryPositionNo) {
                (*iter)->Price = entryPx;
                (*iter)->Volume = entrySize;
                return;
            }
        }
    }
Modification does not change position numbers. However, if the modification changes order priority (e.g., price improvement), B3 sends a DELETE + NEW pair instead of a CHANGE.
3. DELETE (2) — Delete Single Order
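    void delete_entries(char entryType, uint32_t entryPositionNo) {
        // Select the side: '0' = Bid, otherwise Offer
        auto& orders = (entryType == '0') ? list_bid_orders : list_ask_orders;
        // Positions are 1-based; ignore anything outside [1, size]
        if (entryPositionNo == 0 || entryPositionNo > orders.size()) return;
        // Remove the order at this position and release its memory
        auto iter = orders.begin();
        std::advance(iter, entryPositionNo - 1);
        free(*iter);
        orders.erase(iter);
        // Decrement position numbers for all subsequent orders
        for (auto it = orders.begin(); it != orders.end(); ++it) {
            if ((*it)->PositionNo > entryPositionNo) {
                (*it)->PositionNo -= 1;
            }
        }
    }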
The mirror of insert: after deleting position N, all subsequent positions shift down by 1.
4. DELETE_THRU (3) — Delete All Orders on One Side
    void delete_thru_entries(char entryType) {
        // Clear the entire bid or ask side: '0' = Bid, otherwise Offer
        auto& orders = (entryType == '0') ? list_bid_orders : list_ask_orders;
        for (auto iter = orders.begin(); iter != orders.end(); ++iter) {
            free(*iter);
        }
        orders.clear();
    }
This is typically sent during channel reset or trading phase transitions.
5. DELETE_FROM (4) — Delete From Position
The MassDeleteOrder_MBO_52 message with mDUpdateAction=4 deletes all orders from a given position to the end of the book. This is used when an instrument enters a state that invalidates resting orders beyond a certain position.
Snapshot Initialization
When processing SnapshotFullRefresh_Orders_MBO_71, the order book is initialized from the noMDEntries repeating group:
void init_entries(noMDEntries* md) {
price_volume* pv = (price_volume*)malloc(sizeof(price_volume));
pv->PositionNo = md->mDEntryPositionNo;
pv->Price = md->mDEntryPx;
pv->Volume = md->mDEntrySize;
if (md->mDEntryType == '0') {
list_bid_orders.emplace_back(pv);
} else if (md->mDEntryType == '1') {
list_ask_orders.emplace_back(pv);
}
}
Note: '0' = Bid, '1' = Offer. These are FIX tag 269 values.
Timestamps: The PUMA Time Model
B3’s PUMA trading system provides multiple timestamp fields, each captured at a different point in the trade lifecycle. Understanding these is essential for latency analysis.
Timestamp Measurement Points
| Point | Field | Captured By | Description |
|---|---|---|---|
| T1 | InboundBusinessHeader.sendingTime | Client | When the client sent the order |
| T2 | receivedTime | Gateway | When the gateway received the order from socket |
| T4 | marketSegmentReceivedTime | Matching Engine | When the matching engine received from internal bus |
| T5 | transactTime / mDInsertTimestamp | Matching Engine | When the transaction happened |
| T6 | OutboundBusinessHeader.sendingTime | Gateway | When the gateway queued the response to client |
| T10 | mDEntryTimestamp | Matching Engine | During market data message assembly (same for all messages in a packet) |
| T11 | packetHeader.sendingTime | Matching Engine | Just before UDP multicast publish |
Key Observations
- T10 vs T11: mDEntryTimestamp (T10) is assigned when the matching engine assembles the market data message. packetHeader.sendingTime (T11) is captured just before the packet is sent to the network. The difference T11 - T10 measures the internal publish latency.
- T5 in both private and public feeds: The transactTime in private ExecutionReport messages and mDInsertTimestamp in public Order_MBO messages both carry T5, the exact moment the matching engine processed the transaction.
- aggressorTime: The ExecutionSummary message (template 55) includes aggressorTime, which equals the aggressor order’s T4 (when the matching engine received it). This is invaluable for measuring the round-trip from order entry to trade publication.
- PTP synchronization: All timestamps are PTP-synchronized with sub-microsecond accuracy to B3’s stratum-3 atomic clock. The standard deviation of offset is under one microsecond.
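The T11 - T10 measurement is a plain subtraction of two nanosecond timestamps, with one caveat: both values are unsigned, so a reversed pair would wrap around to a huge number. A minimal sketch, with `publish_latency_ns` as a hypothetical helper name:

```cpp
#include <cstdint>

// Computes T11 - T10 in nanoseconds.
// t10: mDEntryTimestamp from the message body.
// t11: sending_time from the packet header.
// Returns false when t11 < t10, so the caller never sees a wrapped value.
inline bool publish_latency_ns(uint64_t t10, uint64_t t11, uint64_t& out_ns) {
    if (t11 < t10) return false;
    out_ns = t11 - t10;
    return true;
}
```

The same pattern applies to any other pair from the table, such as T5 minus the aggressor's T4.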
pcap Processing Pipeline
The demo code processes offline pcap files in three stages:
void binary_manager::run() {
// Phase 1: Parse InstrumentDefinition pcap
pcap_t* fp_InstrumentDefinition = pcap_open_offline("MBO_090_InstrumentDefinition.pcap", errbuf);
pcap_loop(fp_InstrumentDefinition, 0, dispatcher_handler, (uint8_t*)this);
pcap_close(fp_InstrumentDefinition);
// Wait for all definitions to arrive
while (!is_InstrumentDefinition_ready) { sleep(1); }
// Phase 2: Parse SnapshotRecovery pcap
pcap_t* fp_SnapshotRecovery = pcap_open_offline("MBO_090_SnapshotRecovery.pcap", errbuf);
pcap_loop(fp_SnapshotRecovery, 0, dispatcher_handler, (uint8_t*)this);
pcap_close(fp_SnapshotRecovery);
// Wait for snapshot to be complete
while (!is_snapshotFullRefreshHeader_ready) { sleep(1); }
// Phase 3: Parse Incremental pcap
pcap_t* fp_Incremental = pcap_open_offline("MBO_090_Incremental.pcap", errbuf);
pcap_loop(fp_Incremental, 0, dispatcher_handler, (uint8_t*)this);
pcap_close(fp_Incremental);
}
Network Layer Parsing
The dispatcher callback peels off network headers:
auto dispatcher_handler = [](uint8_t* temp, const struct pcap_pkthdr* header,
const uint8_t* pkt_data) -> void {
binary_manager* pm = (binary_manager*)temp;
IPv4Header ipv4(pkt_data);
if (ipv4.Protocol == 0x11) { // UDP
UDPHeader udp(pkt_data, ipv4.IPv4Header_len);
pm->payload = udp.UDP_HeaderLen - 8; // Subtract UDP header
pm->pkg_len = header->len;
pm->parse_package(pkt_data, udp.UDPHeader_len);
}
};
The IPv4Header and UDPHeader classes handle Ethernet → IP → UDP parsing, extracting the payload offset where the Binary UMDF data begins.
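The article does not show the IPv4Header and UDPHeader classes, so the following is only a sketch of the kind of work they have to do. It is written as a single hypothetical function, `udp_payload`, and assumes untagged Ethernet II frames carrying IPv4 (no VLAN tags, no IPv6).

```cpp
#include <cstddef>
#include <cstdint>

// Locates the UDP payload inside a captured Ethernet II frame.
// Returns false for anything that is not a well-formed IPv4/UDP frame.
inline bool udp_payload(const uint8_t* frame, size_t len,
                        size_t& payload_off, size_t& payload_len) {
    const size_t eth_len = 14;                       // dst MAC, src MAC, EtherType
    if (len < eth_len + 20) return false;            // need a minimal IPv4 header
    if (frame[12] != 0x08 || frame[13] != 0x00) return false;  // EtherType 0x0800 = IPv4

    const uint8_t* ip = frame + eth_len;
    if ((ip[0] >> 4) != 4) return false;             // IP version
    const size_t ihl = static_cast<size_t>(ip[0] & 0x0F) * 4;  // header length in bytes
    if (ihl < 20 || len < eth_len + ihl + 8) return false;
    if (ip[9] != 0x11) return false;                 // protocol 17 = UDP

    const uint8_t* udp = ip + ihl;
    // UDP length is big-endian and includes the 8-byte UDP header.
    const size_t udp_len = (static_cast<size_t>(udp[4]) << 8) | udp[5];
    if (udp_len < 8 || eth_len + ihl + udp_len > len) return false;

    payload_off = eth_len + ihl + 8;
    payload_len = udp_len - 8;
    return true;
}
```

Taking the payload length from the UDP header rather than from the captured frame length avoids treating Ethernet padding on short frames as message bytes.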
Channel ID Filtering
The parse_package method immediately checks the channel ID:
if (p_pkt_hd->channel_id != m_channel_id) {
printf("error channel[%d], ignore\n", p_pkt_hd->channel_id);
return;
}
This is essential because B3 assigns different channels to different instrument groups. Processing all channels on a single thread would mix unrelated order books.
Output Formats
The demo produces three types of output:
1. SBE Message CSV
Every parsed SBE message is written to a template-specific CSV file. For example, ph_order_MBO_50.csv:
channel_id,sequence_version,sequence_number,securityID,matchEventIndicator,
mDUpdateAction,mDEntryType,padding,mDEntryPx,mDEntrySize,mDEntryPositionNo,
enteringFirm,mDInsertTimestamp,secondaryOrderID,rptSeq,mDEntryTimestamp
2. Tick-Level Market Data
A consolidated tick file with 10 price levels:
updateTime,updateNano,securityID,Symbol,lastPrice,tradeVolume,
bid_px1,bid_sz1,...,bid_px10,bid_sz10,
ask_px1,ask_sz1,...,ask_px10,ask_sz10
3. Order Book Snapshot
The complete MBO order book for each instrument:
updateTime,updateNano,secondaryOrderID,enteringFirm,securityID,Symbol,
lastMsgSeqNumProcessed,totNumBids,totNumOffers,lastPrice,tradeVolume,
highLimitPrice,lowLimitPrice
Common Pitfalls
1. SBE Version Mismatch
B3 upgraded from SecurityDefinition_4 (template 4) to SecurityDefinition_12 (template 12) between protocol versions 1.6 and 1.8. Your decoder must handle both. The schema_version field in the SBE header tells you which version to expect. As the readme notes: “SBE format before February 2024 is not supported.”
2. Position Number Maintenance
The mDEntryPositionNo field is not a simple array index. Inserting at position N shifts all positions ≥ N up by 1. Deleting at position N shifts all positions > N down by 1. If you treat it as a simple array index, your order book will silently corrupt.
3. Sequence Number Gaps
After a gap is detected (sequence_number skips), you cannot simply continue processing. You must:
- Record the gap
- Request a snapshot recovery
- Rebuild the order book from the snapshot
- Resume incremental processing from lastMsgSeqNumProcessed + 1
4. Byte-Order Confusion
The IPv4 and UDP headers are big-endian and need read_uint16()/read_uint32() (or ntohs()/ntohl()). The Binary UMDF packet header, framing header, and SBE message bodies are little-endian (encoding_type 0xEB50), so on x86 they can be copied into #pragma pack(1) structs with memcpy and read without a swap. Mixing these up is a common source of garbled data.
5. Null Value Handling
The sentinel 0x8000000000000000 is a valid int64 bit pattern that represents “not present.” If you forget to check for null values before converting prices, you’ll get garbage numbers.
6. DELETE_THRU vs DELETE_FROM
MassDeleteOrder_MBO_52 with mDUpdateAction=3 (DELETE_THRU) clears all orders on one side. With mDUpdateAction=4 (DELETE_FROM), it clears from a given position to the end. Confusing the two will wipe the wrong part of the book.
SBE vs. Other Financial Encoding Formats
| Feature | SBE | FIX/FAST | Protobuf | JSON |
|---|---|---|---|---|
| Decoding | Zero-copy (overlay struct) | Token-based | Requires parsing | Text parsing |
| Latency (indicative order of magnitude) | ~100 ns | ~1 μs | ~5 μs | ~50 μs |
| Schema | XML → code generation | XML dictionary | .proto | None |
| Field access | Direct offset | Tag lookup | Field number | Key lookup |
| Wire size | Compact (no tags) | Compact (delta encoding) | Varint encoding | Verbose |
| Used by | CME, Eurex, B3 | CME (legacy), Euronext | Internal systems | Web APIs |
SBE’s zero-copy property is its killer feature for trading systems. The XML schema (b3-market-data-messages-1.8.0.xml) can be used with the real-logic SBE tool to generate C++, Java, or C# codecs automatically.
Quick Reference: Complete Decode Loop
    void binary_manager::parse_package(const uint8_t* pkt_data, uint32_t offset) {
        // 1. Read packet header (only if it fits in the captured data)
        if (offset + sizeof(packet_header) > pkg_len) return;
        memcpy(p_pkt_hd, pkt_data + offset, sizeof(packet_header));
        offset += sizeof(packet_header);
        // 2. Filter by channel
        if (p_pkt_hd->channel_id != m_channel_id) return;
        // 3. Loop through all messages in the packet
        while (offset + sizeof(framing_header) + sizeof(sbe_message_header) <= pkg_len) {
            const uint32_t msg_start = offset;
            // 3a. Read framing header
            memcpy(p_frm_hd, pkt_data + offset, sizeof(framing_header));
            offset += sizeof(framing_header);
            // 3b. Validate encoding and length before trusting anything else
            if (0xEB50 != p_frm_hd->encoding_type) return;
            if (p_frm_hd->message_length < sizeof(framing_header) + sizeof(sbe_message_header) ||
                msg_start + p_frm_hd->message_length > pkg_len) return;
            // 3c. Read SBE header
            memcpy(p_sbe_hd, pkt_data + offset, sizeof(sbe_message_header));
            offset += sizeof(sbe_message_header);
            // 3d. Dispatch by template_id
            switch (p_sbe_hd->template_id) {
                case 50: // Order_MBO
                    memcpy(p_order_MBO_50, pkt_data + offset, p_sbe_hd->block_length);
                    // Process insert/change...
                    break;
                case 51: // DeleteOrder_MBO
                    memcpy(p_deleteOrder_MBO_51, pkt_data + offset, p_sbe_hd->block_length);
                    // Process delete...
                    break;
                // ... other templates, including repeating groups and variable-length data
            }
            // 3e. Jump to the next framing header. message_length covers the whole
            // framed message, so unknown templates and trailing groups are skipped too.
            offset = msg_start + p_frm_hd->message_length;
        }
    }
Summary
Parsing B3 Binary UMDF market data is a systematic process:
1. Capture UDP multicast packets (pcap or live feed)
2. Parse network headers (Ethernet → IPv4 → UDP)
3. Decode the Binary UMDF packet header (channel ID, sequence number, T11 timestamp)
4. Iterate through framed messages using framing headers
5. Dispatch by SBE template_id to the correct struct
6. Convert prices using the exponent model and null value checking
7. Build the MBO order book using five operations (NEW/CHANGE/DELETE/DELETE_THRU/DELETE_FROM)
8. Maintain mDEntryPositionNo correctly (insert shifts up, delete shifts down)
9. Detect gaps via sequence_number and rptSeq continuity
10. Recover from gaps using the three-phase protocol (InstrumentDefinition → SnapshotRecovery → Incremental)
The SBE encoding makes step 5 essentially free — you overlay a C struct on raw bytes with no parsing overhead. The hard part is getting the order book logic right, especially PositionNo maintenance and the three-phase recovery sequence.