yuqi-zheng

PEP 574: Pickle Protocol 5 and Out-of-Band Data


This is Part 1 of a two-part series on Python zero-copy serialization. Part 2 covers the practical choice between Protocol 5 and shared_memory.


Pickle was designed in 1995 for disk persistence. Today it’s the backbone of Python’s multiprocessing IPC, Dask task graphs, and Ray’s object store. The problem: serializing a large NumPy array triggers three redundant memory copies. PEP 574, shipped in Python 3.8, fixes this with a mechanism called out-of-band data.


The Copy Problem

Serializing a bytearray with the old protocol:

  1. bytearray.__reduce_ex__ creates a bytes copy of the data.
  2. pickle.dumps copies that bytes object into the pickle stream.
  3. On deserialization, a temporary bytes object is created again before the final object is reconstructed.

Three copies for one array. Third-party libraries (Dask, PyArrow) worked around this with custom serialization, but that only helped objects that supported it — plain Python objects were stuck with Pickle.


The Design: Metadata + Data Separation

PEP 574’s core idea: split the pickle stream into two channels.

  • Metadata stream: object type, structure, references — the existing pickle format, unchanged.
  • Data buffers: large binary payloads sent as out-of-band buffers, passed as raw memory views with no copy.

The producer signals which parts of an object are “large data” by wrapping them in PickleBuffer. The consumer decides whether to handle them out-of-band (zero-copy) or inline.


New APIs

APIRole
pickle.PickleBufferWraps a buffer for out-of-band handling in __reduce_ex__
buffer_callback (pickling)Called with each out-of-band buffer; return False to transmit out-of-band
buffers (unpickling)Iterable of pre-allocated buffers to inject on deserialization
Protocol 5New protocol version required for all of the above

Existing code that doesn’t use these APIs is completely unaffected — Protocol 5 is backwards compatible for all previously serializable objects.


Producer API: PickleBuffer

A class implements out-of-band support by returning PickleBuffer from __reduce_ex__:

import pickle

class LargeArray:
    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return type(self), (pickle.PickleBuffer(self),)
        # fall back to copying for older protocols
        return type(self), (bytes(self),)

PickleBuffer wraps anything that implements the PEP 3118 buffer protocol — NumPy arrays, bytearray, memoryview. It exposes:

  • memoryview(pb) — get shape, strides, format
  • pb.raw() — raw contiguous view (needed for Fortran-order arrays)
  • pb.release() — explicitly release the underlying buffer

The buffer must be contiguous (C or Fortran order). Non-contiguous buffers raise an error at pickle time.


Consumer API: Pickling

buffers = []
data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)

When buffer_callback is provided, it’s called for each PickleBuffer the object returns. The callback’s return value controls routing:

  • Return False (or None) → buffer goes out-of-band, stored in buffers, a NEXT_BUFFER opcode is written into the pickle stream
  • Return True → buffer is serialized inline (same as old behavior)

This lets the caller decide per-buffer: transmit large arrays out-of-band, but serialize small metadata inline.

Consumer API: Unpickling

obj = pickle.loads(data, buffers=buffers)

buffers is an iterable consumed in order. Each NEXT_BUFFER opcode in the stream pops the next buffer from this iterable. The consumer can pre-allocate these buffers in shared memory, GPU memory, or any target — the deserializer writes into them directly with no intermediate copy.


New Protocol 5 Opcodes

OpcodeEffect
BYTEARRAY8Construct bytearray from inline stream data
NEXT_BUFFERPop the next buffer from the buffers iterable and push it on the stack
READONLY_BUFFERConvert the top-of-stack buffer to a read-only view

READONLY_BUFFER is important for objects like bytes that are immutable — the consumer can map a read-only memory region as the deserialized value.


Zero-Copy Shared Memory Example

With buffer_callback and buffers, two objects can point at the same physical memory:

import numpy as np
import pickle

a = np.zeros(10)
buffers = []

# serialize: buffer_callback captures the array's memory view, no copy
data = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)

# deserialize: inject the same buffers back — no copy
b = pickle.loads(data, buffers=buffers)

b[0] = 42
print(a[0])  # 42.0 — a and b share the same memory

In a real multiprocessing scenario, the consumer would allocate shared memory and pass memoryview objects as buffers, giving the deserialized array a zero-copy view into shared memory.


What Was Rejected

Persistent load interface: persistent_id() / persistent_load() could theoretically carry buffers, but every object — even int — would trigger a persistent_id() call. Too slow.

Batch buffer passing: passing a list of buffers instead of a callback prevents the callback from signaling inline vs. out-of-band per buffer. Rejected because the flexibility matters.

PickleBuffer in old protocols: allowing PickleBuffer in Protocol 4 would require two extra copies to embed the buffer inline, defeating the purpose. Protocol 5 is required.


Ecosystem Status

ProjectStatus
CPython 3.8+Merged
pickle5 (PyPI)Backport for Python 3.6/3.7
NumPyProtocol 5 + out-of-band buffer support added
Apache ArrowSupport in Python bindings

Python 3.14 will make Protocol 5 the default. Until then, you need protocol=5 explicitly.


Summary

  • Old Pickle copies large arrays 3 times during serialization. Protocol 5 reduces this to zero copies on the data path.
  • PickleBuffer is the producer-side signal; buffer_callback and buffers are the consumer-side hooks.
  • The caller controls routing: each buffer can independently go out-of-band or inline.
  • The deserialization target is caller-supplied — shared memory, GPU memory, or any buffer-protocol object.
  • Requires protocol=5 explicitly until Python 3.14.

References