Ray Async Internals (4): Bridging gRPC and Asio
Part 4 of the Ray async infrastructure series.
Part 1 established that Ray uses Asio as a business logic serializer, not a network engine. This post shows how that works in practice: how an incoming gRPC request travels from a network thread to an Asio event loop, and why the handoff happens where it does.
The Layering
```
┌──────────────────────────────────────────────────────────┐
│ Business logic layer (lock-free, single-threaded)        │
│ GcsNodeManager::HandleRegisterNode, etc.                 │
└────────────────────────────┬─────────────────────────────┘
                             │ post()
┌────────────────────────────▼─────────────────────────────┐
│ Asio layer (single-threaded event loop)                  │
│ instrumented_io_context ("node_manager_io_context")      │
└────────────────────────────┬─────────────────────────────┘
                             │ callback trigger
┌────────────────────────────▼─────────────────────────────┐
│ gRPC layer (multi-threaded network I/O)                  │
│ GrpcServer, CompletionQueue thread pool                  │
└──────────────────────────────────────────────────────────┘
```
The rule: gRPC threads receive and deserialize requests; Asio threads process them. A single post() call separates the two worlds.
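The rule above can be sketched in standard C++ without Ray, gRPC, or Asio. This is a minimal sketch, not Ray code: SerialLoop is a hypothetical stand-in for an Asio io_context (a locked FIFO of closures drained by exactly one thread), and receive_on_many_process_on_one is an illustrative name. Several "network" threads post work, but only the loop thread touches the handled counter, so the counter itself needs no lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Asio io_context: a FIFO of closures that
// exactly one thread drains. Only the queue itself is protected by a lock.
class SerialLoop {
 public:
  void post(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }

  // Runs queued closures on the calling thread until the stop sentinel.
  void run() {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        fn = std::move(queue_.front());
        queue_.pop_front();
      }
      if (!fn) return;  // an empty std::function marks "stop"
      fn();
    }
  }

  void stop() { post(nullptr); }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
};

// Many "network" threads hand work to one loop thread.
int receive_on_many_process_on_one(int network_threads, int requests_each) {
  SerialLoop loop;
  int handled = 0;  // business state: read and written only on the loop thread
  std::thread loop_thread([&loop] { loop.run(); });

  std::vector<std::thread> network;
  for (int t = 0; t < network_threads; ++t) {
    network.emplace_back([&loop, &handled, requests_each] {
      for (int i = 0; i < requests_each; ++i) {
        loop.post([&handled] { ++handled; });  // the handoff: no lock on handled
      }
    });
  }
  for (auto& th : network) th.join();

  loop.stop();         // queued behind every request, so all of them run first
  loop_thread.join();  // join() makes the loop thread's writes visible here
  assert(handled == network_threads * requests_each);
  return handled;
}
```

Calling receive_on_many_process_on_one(4, 1000) returns 4000. The only mutex guards the queue; the state behind it is owned by a single thread, which is the property the rest of this post relies on.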
GrpcServer Internals
CompletionQueue Thread Pool
```cpp
cqs_.resize(num_threads_);  // one CompletionQueue per thread
```
GrpcServer starts N threads, each driving a CompletionQueue. When a gRPC request arrives, one of these threads handles the network read and Protobuf deserialization.
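To make the one-queue-per-thread layout concrete, here is a minimal sketch built on a hypothetical FakeCompletionQueue. It is loosely modeled on, but not identical to, gRPC's completion queue: Push, Next, and Shutdown here are this sketch's own methods. Dealing events round-robin across the queues is a simplification of how incoming calls reach a particular queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical completion queue, loosely modeled on gRPC's (not its API).
class FakeCompletionQueue {
 public:
  void Push(int tag) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      events_.push_back(tag);
    }
    cv_.notify_one();
  }
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }
  // Blocks for the next event; returns false once shut down and drained.
  bool Next(int* tag) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return shutdown_ || !events_.empty(); });
    if (events_.empty()) return false;
    *tag = events_.front();
    events_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<int> events_;
  bool shutdown_ = false;
};

// Returns how many events each polling thread handled.
std::vector<int> poll_completion_queues(std::size_t num_threads, int num_events) {
  std::vector<std::unique_ptr<FakeCompletionQueue>> cqs;
  cqs.resize(num_threads);  // one queue per thread, mirroring cqs_.resize(num_threads_)
  for (auto& cq : cqs) cq = std::make_unique<FakeCompletionQueue>();

  std::vector<int> handled(num_threads, 0);  // slot i is written only by thread i
  std::vector<std::thread> pollers;
  for (std::size_t i = 0; i < num_threads; ++i) {
    pollers.emplace_back([&cqs, &handled, i] {
      int tag = 0;
      while (cqs[i]->Next(&tag)) ++handled[i];  // this thread drains only queue i
    });
  }

  // Simplification: deal incoming events round-robin across the queues.
  for (int e = 0; e < num_events; ++e) {
    cqs[static_cast<std::size_t>(e) % num_threads]->Push(e);
  }
  for (auto& cq : cqs) cq->Shutdown();
  for (auto& t : pollers) t.join();
  return handled;
}
```

poll_completion_queues(4, 100) returns four counts of 25. Each poller only ever reads from its own queue, so the queues never contend with one another.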
The Handoff: ServerCall
Each RPC method is backed by a ServerCallFactory (generated by the RPC_SERVICE_HANDLER macro). When a request is received, a ServerCall object is created — but it does not call the business handler directly. Instead:
```cpp
main_service_.post([this, request, reply, callback] {
  service_handler_->HandleXXX(request, reply, callback);
}, "HandleXXX");
```
One post() call transfers execution from the gRPC thread to the Asio event loop, and the gRPC thread does not wait for the handler to finish. From this point on, HandleXXX runs on the dedicated io_context thread and never overlaps with any other handler posted to that same io_context.
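The handoff can be modeled end to end without gRPC or Asio. In this sketch, Loop stands in for instrumented_io_context, and ServerCall, Request, Reply, SendReply, and Handler are simplified hypothetical types, not Ray's real classes. HandleRequest stores the request, posts the business handler, and returns at once; the handler records which thread it ran on.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Minimal single-threaded loop standing in for instrumented_io_context.
// The name argument only mimics the "HandleXXX" label; it is unused here.
class Loop {
 public:
  void post(std::function<void()> fn, const std::string& /*name*/) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }
  void run() {  // drains closures until an empty one (the stop sentinel)
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        fn = std::move(q_.front());
        q_.pop_front();
      }
      if (!fn) return;
      fn();
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
};

// Hypothetical, simplified message and callback types.
struct Request { std::string node_id; };
struct Reply { bool ok = false; std::thread::id handled_on; };
using SendReply = std::function<void()>;
using Handler = std::function<void(const Request&, Reply*, SendReply)>;

// Hypothetical ServerCall: stores the request, posts the handler, returns.
class ServerCall {
 public:
  ServerCall(Loop& loop, Handler handler) : loop_(loop), handler_(std::move(handler)) {}
  void HandleRequest(Request request, SendReply send_reply) {
    request_ = std::move(request);
    loop_.post([this, send_reply] { handler_(request_, &reply_, send_reply); },
               "HandleRegisterNode");
    // No waiting here: the calling (gRPC) thread is free to poll again.
  }
  const Reply& reply() const { return reply_; }

 private:
  Loop& loop_;
  Handler handler_;
  Request request_;
  Reply reply_;
};

bool handler_runs_on_loop_thread() {
  Loop loop;
  std::thread loop_thread([&loop] { loop.run(); });
  const std::thread::id loop_id = loop_thread.get_id();

  ServerCall call(loop, [](const Request& req, Reply* reply, SendReply send) {
    reply->ok = !req.node_id.empty();  // business logic
    reply->handled_on = std::this_thread::get_id();
    send();  // "send" the reply
  });

  std::thread::id grpc_id;
  std::thread grpc_thread([&call, &grpc_id] {  // plays a CompletionQueue thread
    grpc_id = std::this_thread::get_id();
    call.HandleRequest(Request{"node-1"}, [] {});
  });
  grpc_thread.join();

  loop.post(nullptr, "stop");  // queued after the handler, so it runs last
  loop_thread.join();
  const Reply& r = call.reply();
  return r.ok && r.handled_on == loop_id && r.handled_on != grpc_id;
}
```

handler_runs_on_loop_thread() returns true: the handler executed on the loop thread, not on the thread that played the gRPC role.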
RPC_SERVICE_HANDLER Macro
```cpp
#define RPC_SERVICE_HANDLER(SERVICE, HANDLER, MAX_ACTIVE_RPCS) ...
```
This macro generates the ServerCallFactory for a given RPC method. It binds:
- The Protobuf request and response types
- The maximum number of concurrent in-flight RPCs (backpressure)
- The business handler function
The macro eliminates boilerplate and ensures that every RPC method registered through it uses the same handoff pattern, so none of those handlers can accidentally run business logic directly on a gRPC thread.
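The three bindings listed above can be imitated with a class template plus token pasting. This is a hypothetical sketch, not the real RPC_SERVICE_HANDLER: MAKE_CALL_FACTORY, CallFactory, NodeService, and the message types are invented for illustration, and the handler runs inline instead of being posted. How Ray enforces MAX_ACTIVE_RPCS is not covered in this post; here Start() simply refuses a call over the cap, and treating a negative cap as "unlimited" is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical message types and service; not Ray's real classes.
struct RegisterNodeRequest { std::string node_id; };
struct RegisterNodeReply { bool ok = false; };

struct NodeService {
  int registered = 0;
  void HandleRegisterNode(const RegisterNodeRequest& req, RegisterNodeReply* reply) {
    reply->ok = !req.node_id.empty();
    ++registered;
  }
};

// One factory per RPC method: binds message types, the handler, and a cap
// on in-flight calls. A negative cap means "unlimited" (sketch assumption).
template <typename Service, typename Request, typename Reply>
class CallFactory {
 public:
  using Method = void (Service::*)(const Request&, Reply*);
  CallFactory(Service* service, Method method, int max_active)
      : service_(service), method_(method), max_active_(max_active) {}

  // Starts a call if under the cap; returns false so the caller retries later.
  bool Start(const Request& request, Reply* reply) {
    if (max_active_ >= 0 && active_ >= max_active_) return false;
    ++active_;
    (service_->*method_)(request, reply);  // Ray would post() this step instead
    return true;
  }
  void Finish() { --active_; }  // call once the reply has been sent

 private:
  Service* service_;
  Method method_;
  int max_active_;
  int active_ = 0;
};

// Hypothetical macro in the spirit of RPC_SERVICE_HANDLER: token pasting
// derives the message types and handler name from the method name.
#define MAKE_CALL_FACTORY(SERVICE, INSTANCE, HANDLER, MAX_ACTIVE) \
  CallFactory<SERVICE, HANDLER##Request, HANDLER##Reply>(         \
      &(INSTANCE), &SERVICE::Handle##HANDLER, (MAX_ACTIVE))

bool backpressure_demo() {
  NodeService service;
  auto factory = MAKE_CALL_FACTORY(NodeService, service, RegisterNode, 2);
  RegisterNodeReply r1, r2, r3;
  const bool first = factory.Start({"n1"}, &r1);
  const bool second = factory.Start({"n2"}, &r2);
  const bool third = factory.Start({"n3"}, &r3);  // refused: two in flight
  factory.Finish();                                // one reply sent
  const bool retry = factory.Start({"n3"}, &r3);   // accepted now
  return first && second && !third && retry && service.registered == 3 && r3.ok;
}
```

In backpressure_demo(), the third Start() is refused while two calls are in flight and succeeds once Finish() frees a slot, so the function returns true. The macro's job is only to keep the wiring identical for every method; the handoff and the cap live in the shared template.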
Full Request Flow: RegisterNode
```
Raylet ──(gRPC)──► GCS GrpcServer
                        │
                        ▼
        CompletionQueue thread receives request,
        deserializes Protobuf
                        │
                        ▼
            ServerCall object created
                        │
                        │ post() to node_manager_io_context
                        ▼
        ┌───────────────────────────────┐
        │ instrumented_io_context queue │
        └───────────────┬───────────────┘
                        │
                        ▼
           Dedicated thread executes:
       GcsNodeManager::HandleRegisterNode
                        │
                        ▼
           Response returned to Raylet
```
HandleRegisterNode always runs on the node_manager_io_context thread. It can read and write GcsNodeManager state without locks because, as long as every piece of code that touches that state runs on the same single-threaded io_context, no two accesses can ever overlap.
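Lock-free access is only safe while the single-thread invariant actually holds. One way to make that invariant checkable is a thread-affinity guard, sketched here under stated assumptions: ThreadChecker and NodeManagerSketch are hypothetical (not Ray's GcsNodeManager), and the first thread to call a handler is treated as the owner, playing the role of the io_context thread.

```cpp
#include <cassert>
#include <initializer_list>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical thread-affinity guard: the first caller becomes the owner.
class ThreadChecker {
 public:
  bool IsOnOwnerThread() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!bound_) {
      owner_ = std::this_thread::get_id();
      bound_ = true;
    }
    return owner_ == std::this_thread::get_id();
  }

 private:
  std::mutex mu_;
  std::thread::id owner_;
  bool bound_ = false;
};

// Hypothetical sketch of a node manager; not Ray's GcsNodeManager.
class NodeManagerSketch {
 public:
  bool HandleRegisterNode(const std::string& node_id) {
    if (!checker_.IsOnOwnerThread()) return false;  // a real guard would likely abort
    ++alive_nodes_[node_id];  // unguarded on purpose: single-owner state
    return true;
  }

 private:
  ThreadChecker checker_;
  std::unordered_map<std::string, int> alive_nodes_;
};

// Returns {registrations accepted on the owner thread, whether a stray call was accepted}.
std::pair<int, bool> affinity_demo() {
  NodeManagerSketch manager;
  int accepted = 0;
  std::thread io_thread([&manager, &accepted] {  // plays the io_context thread
    for (const std::string id : {"a", "b", "c"}) {
      if (manager.HandleRegisterNode(id)) ++accepted;
    }
  });
  io_thread.join();
  // The calling thread plays a gRPC thread that skipped the post() handoff.
  const bool stray_accepted = manager.HandleRegisterNode("d");
  return {accepted, stray_accepted};
}
```

affinity_demo() reports three registrations accepted on the owning thread and rejects the stray call made from the caller's thread. The check costs a lock per call, so a guard like this is more at home in debug builds than on a hot path.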
Why This Architecture
| Property | Mechanism |
|---|---|
| High network throughput | gRPC thread pool spreads network I/O and deserialization across cores |
| Lock-free business logic | Single Asio thread serializes all handlers |
| Fault isolation | Different modules can bind to different io_context threads, so a stalled handler in one module does not block another |
| Observability | Each io_context has independent lag metrics (Part 2) |
The lock-free property is the key payoff. GcsNodeManager holds substantial state — node registrations, subscriptions, resource maps. Without the Asio serialization, every access would need a lock. With it, the data structures are effectively single-threaded: no lock contention, simpler code, easier to reason about.
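The isolation row in the table can be demonstrated with two single-threaded loops. In this sketch (Loop and fast_module_not_blocked are hypothetical names), a handler on the slow loop blocks until it is explicitly released, while a handler on the fast loop completes on its own. Had both been posted to the same loop, the fast handler would sit in the queue behind the blocked one.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Compact single-threaded loop (hypothetical stand-in for one io_context).
class Loop {
 public:
  Loop() : worker_([this] { Run(); }) {}
  ~Loop() {
    Post(nullptr);  // stop sentinel, queued after any pending work
    worker_.join();
  }
  void Post(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        fn = std::move(q_.front());
        q_.pop_front();
      }
      if (!fn) return;
      fn();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  std::thread worker_;  // declared last so the members above exist first
};

// A handler stuck on the "slow" loop does not stop the "fast" loop.
bool fast_module_not_blocked() {
  std::promise<void> release_slow;
  std::shared_future<void> slow_gate = release_slow.get_future().share();
  std::promise<void> fast_done;
  std::future<void> fast_done_future = fast_done.get_future();
  {
    Loop slow, fast;  // e.g. two modules, each bound to its own io_context thread
    slow.Post([slow_gate] { slow_gate.wait(); });  // stalls this loop only
    fast.Post([&fast_done] { fast_done.set_value(); });
    const bool ok = fast_done_future.wait_for(std::chrono::seconds(5)) ==
                    std::future_status::ready;
    release_slow.set_value();  // unblock before the loops are destroyed
    return ok;
  }
}
```

fast_module_not_blocked() returns true because the fast handler completes while the slow handler is still blocked. The trade-off is that state shared between the two modules would no longer be protected by a single thread.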
Summary
- Ray’s gRPC handlers never execute business logic directly. They post a lambda to an Asio io_context and return.
- ServerCall is the handoff object; RPC_SERVICE_HANDLER is the macro that generates a ServerCallFactory for each RPC method.
- “Receive on many threads, process on one” is the central concurrency architecture of Ray’s C++ services.