yuqi-zheng

Ray Async Internals (4): Bridging gRPC and Asio


Part 4 of the Ray async infrastructure series. ← Part 3: Thread pool and periodic timer · Part 5: GCS thread isolation →


Part 1 established that Ray uses Asio as a business logic serializer, not a network engine. This post shows how that works in practice: how an incoming gRPC request travels from a network thread to an Asio event loop, and why the handoff happens where it does.


The Layering

┌──────────────────────────────────────────────────────────┐
│  Business logic layer (lock-free, single-threaded)       │
│  GcsNodeManager::HandleRegisterNode, etc.                │
└────────────────────────────┬─────────────────────────────┘
                             │ post()
┌────────────────────────────▼─────────────────────────────┐
│  Asio layer (single-threaded event loop)                 │
│  instrumented_io_context ("node_manager_io_context")     │
└────────────────────────────┬─────────────────────────────┘
                             │ callback trigger
┌────────────────────────────▼─────────────────────────────┐
│  gRPC layer (multi-threaded network I/O)                 │
│  GrpcServer, CompletionQueue thread pool                 │
└──────────────────────────────────────────────────────────┘

The rule: gRPC threads receive and deserialize requests; Asio threads process them. A single post() call separates the two worlds.
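To make the rule concrete, here is a minimal sketch in standard C++ with no Asio or gRPC involved. SerialLoop is a hypothetical stand-in for a single-threaded io_context (it is not Ray's instrumented_io_context): any thread may post() a task, but only the thread inside run() executes it. Unlike Asio's io_context::stop(), this stop() lets run() drain the queue before returning.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single-threaded io_context (not Ray's
// instrumented_io_context): any thread may post(); only the run() thread executes.
class SerialLoop {
 public:
  void post(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push_back(std::move(task)); }
    cv_.notify_one();
  }
  // Runs tasks in FIFO order until stop() was called and the queue is empty.
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopped and drained
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // executed outside the lock, on the run() thread only
    }
  }
  void stop() {
    { std::lock_guard<std::mutex> lock(mu_); stopped_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopped_ = false;
};

// Poller threads stand in for gRPC CompletionQueue threads; `handled` stands in
// for business state and has no mutex of its own.
int receive_on_many_process_on_one(int pollers, int requests_each) {
  SerialLoop loop;
  int handled = 0;  // read/written only on the loop thread until it is joined
  std::thread loop_thread([&loop] { loop.run(); });
  const std::thread::id loop_id = loop_thread.get_id();

  std::vector<std::thread> poller_threads;
  for (int p = 0; p < pollers; ++p) {
    poller_threads.emplace_back([&loop, &handled, loop_id, requests_each] {
      for (int i = 0; i < requests_each; ++i) {
        loop.post([&handled, loop_id] {
          assert(std::this_thread::get_id() == loop_id);  // never a poller thread
          ++handled;
        });
      }
    });
  }
  for (auto& t : poller_threads) t.join();
  loop.stop();  // all posts are queued; run() drains them, then returns
  loop_thread.join();
  return handled;
}
```

Several threads post concurrently, yet `handled` needs no lock: the only mutex guards the task queue itself, and the business state is touched by exactly one thread.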


GrpcServer Internals

CompletionQueue Thread Pool

cqs_.resize(num_threads_);  // one CompletionQueue per thread

GrpcServer starts num_threads_ threads, each driving its own CompletionQueue. When a gRPC request arrives, one of these threads handles the network read and Protobuf deserialization.
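The one-queue-per-thread layout can be sketched with a hypothetical FakeCompletionQueue whose Next()/Shutdown() pair mimics the shape of the usual grpc::CompletionQueue polling loop (`while (cq->Next(&tag, &ok)) { ... }`). It is a stand-in under simplifying assumptions, not gRPC's API: events are raw strings, and "deserialization" is just std::stoi.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CompletionQueue: Next() blocks until an event is
// available and returns false once Shutdown() was called and the queue is empty.
class FakeCompletionQueue {
 public:
  void Push(std::string raw_request) {
    { std::lock_guard<std::mutex> lock(mu_); events_.push_back(std::move(raw_request)); }
    cv_.notify_one();
  }
  bool Next(std::string* out) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return shutdown_ || !events_.empty(); });
    if (events_.empty()) return false;  // shut down and drained
    *out = std::move(events_.front());
    events_.pop_front();
    return true;
  }
  void Shutdown() {
    { std::lock_guard<std::mutex> lock(mu_); shutdown_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> events_;
  bool shutdown_ = false;
};

// One queue per polling thread (cf. cqs_.resize(num_threads_)). Each thread only
// drains and "deserializes" its own queue, writing into its own result slot.
long poll_all(std::size_t num_threads, const std::vector<std::string>& incoming) {
  std::vector<FakeCompletionQueue> cqs(num_threads);
  std::vector<long> per_thread_sum(num_threads, 0);
  std::vector<std::thread> pollers;
  for (std::size_t i = 0; i < num_threads; ++i) {
    pollers.emplace_back([&cqs, &per_thread_sum, i] {
      std::string raw;
      while (cqs[i].Next(&raw)) per_thread_sum[i] += std::stoi(raw);
    });
  }
  // Spread incoming requests across the queues round-robin.
  for (std::size_t n = 0; n < incoming.size(); ++n) cqs[n % num_threads].Push(incoming[n]);
  for (auto& cq : cqs) cq.Shutdown();
  for (auto& t : pollers) t.join();
  long total = 0;
  for (long s : per_thread_sum) total += s;
  return total;
}
```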

The Handoff: ServerCall

Each RPC method is backed by a ServerCallFactory (generated by the RPC_SERVICE_HANDLER macro). When a request is received, a ServerCall object is created, but it does not call the business handler directly. Instead, it posts the handler to the Asio event loop:

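// Runs on a gRPC CompletionQueue thread: enqueue the handler, don't run it.
main_service_.post([this, request, reply, callback] {
    // Runs later, on the io_context's dedicated thread.
    service_handler_->HandleXXX(request, reply, callback);
}, "HandleXXX");  // name tag used by the instrumented io_context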

One post() call transfers execution from the gRPC thread to the Asio event loop. From this point on, HandleXXX runs on the io_context's dedicated thread, so it never runs concurrently with any other handler posted to that io_context.
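A sketch of the handoff, assuming a hypothetical ServerCallSketch class and plain structs in place of Protobuf messages. HandleRequest() is called on a simulated gRPC thread and only posts; the handler then runs on the loop thread and signals completion through a callback, roughly the role the callback argument plays in HandleXXX(request, reply, callback). SerialLoop is a hypothetical stand-in for a single-threaded io_context.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical single-threaded executor: only the run() thread executes tasks.
class SerialLoop {
 public:
  void post(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push_back(std::move(task)); }
    cv_.notify_one();
  }
  void run() {  // FIFO until stop() was called and the queue is empty
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  void stop() {
    { std::lock_guard<std::mutex> lock(mu_); stopped_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopped_ = false;
};

struct RegisterRequest { std::string node_id; };  // hypothetical message types
struct RegisterReply { bool ok; };

// Hypothetical ServerCall: HandleRequest() runs on a polling thread but only posts.
template <typename Req, typename Rep>
class ServerCallSketch {
 public:
  using Done = std::function<void()>;
  using Handler = std::function<void(const Req&, Rep*, Done)>;
  ServerCallSketch(SerialLoop& loop, Handler handler, Req request)
      : loop_(loop), handler_(std::move(handler)), request_(std::move(request)) {}

  void HandleRequest(std::function<void(const Rep&)> send_reply) {
    loop_.post([this, send_reply] {
      // Now on the loop thread: run the handler with a "reply ready" callback.
      handler_(request_, &reply_, [this, send_reply] { send_reply(reply_); });
    });
  }

 private:
  SerialLoop& loop_;
  Handler handler_;
  Req request_;
  Rep reply_{};
};

bool handoff_demo() {
  SerialLoop loop;
  std::thread loop_thread([&loop] { loop.run(); });
  const std::thread::id loop_id = loop_thread.get_id();
  bool handler_on_loop_thread = false;
  bool reply_ok = false;
  ServerCallSketch<RegisterRequest, RegisterReply> call(
      loop,
      [&](const RegisterRequest& req, RegisterReply* reply, std::function<void()> done) {
        handler_on_loop_thread = std::this_thread::get_id() == loop_id;
        reply->ok = !req.node_id.empty();
        done();  // hand the reply back
      },
      RegisterRequest{"node-1"});
  // Simulated gRPC polling thread: it only hands the call off.
  std::thread grpc_thread([&call, &reply_ok] {
    call.HandleRequest([&reply_ok](const RegisterReply& r) { reply_ok = r.ok; });
  });
  grpc_thread.join();
  loop.stop();
  loop_thread.join();
  return handler_on_loop_thread && reply_ok;
}
```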

RPC_SERVICE_HANDLER Macro

#define RPC_SERVICE_HANDLER(SERVICE, HANDLER, MAX_ACTIVE_RPCS) ...

This macro generates the ServerCallFactory for a given RPC method. It binds:

  • The Protobuf request and response types
  • The maximum number of concurrent in-flight RPCs (backpressure)
  • The business handler function

The macro eliminates boilerplate and ensures every RPC method uses the same handoff pattern, so no handler can accidentally run business logic directly on a gRPC thread.
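What such a macro buys can be sketched with a hypothetical DEMO_SERVICE_HANDLER and CallFactory; neither is Ray's code, and the real RPC_SERVICE_HANDLER body is elided above. One macro line binds the request and reply types, the handler, and an in-flight cap. The cap is modeled as a simple counter, which is an assumption about how the backpressure behaves, and the io_context handoff is left out to keep the sketch short.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical per-method factory binding types, handler, and an in-flight cap.
template <typename Req, typename Rep>
class CallFactory {
 public:
  using Handler = std::function<void(const Req&, Rep*)>;
  CallFactory(std::string name, Handler handler, int max_active)
      : name_(std::move(name)), handler_(std::move(handler)), max_active_(max_active) {}

  // Admits a new call unless the cap is reached (negative cap: unlimited).
  bool Accept() {
    if (max_active_ >= 0 && active_ >= max_active_) return false;
    ++active_;
    return true;
  }
  // Runs the bound handler for an admitted call and releases its slot.
  void Handle(const Req& req, Rep* rep) {
    handler_(req, rep);
    --active_;
  }
  const std::string& name() const { return name_; }

 private:
  std::string name_;
  Handler handler_;
  int max_active_;
  int active_ = 0;
};

// Hypothetical macro in the spirit of RPC_SERVICE_HANDLER: one line per method
// yields a factory named METHOD##_factory wired to SERVICE_OBJ.Handle##METHOD.
#define DEMO_SERVICE_HANDLER(SERVICE_OBJ, METHOD, REQ, REP, MAX_ACTIVE) \
  CallFactory<REQ, REP> METHOD##_factory(                                \
      #METHOD,                                                           \
      [&SERVICE_OBJ](const REQ& req, REP* rep) {                         \
        SERVICE_OBJ.Handle##METHOD(req, rep);                            \
      },                                                                 \
      MAX_ACTIVE)

struct PingRequest { int seq; };
struct PingReply { int seq; };

struct DemoService {
  int handled = 0;
  void HandlePing(const PingRequest& req, PingReply* rep) {
    rep->seq = req.seq;
    ++handled;
  }
};
```

Usage: `DEMO_SERVICE_HANDLER(service, Ping, PingRequest, PingReply, 2);` declares `Ping_factory`; a third Accept() is refused until one admitted call has been handled.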


Full Request Flow: RegisterNode

Raylet ──(gRPC)──► GCS GrpcServer
                        │
                        ▼
            CompletionQueue thread receives request,
            deserializes Protobuf
                        │
                        ▼
            ServerCall object created
                        │
                        │ post() to node_manager_io_context
                        ▼
         ┌───────────────────────────────┐
         │ instrumented_io_context queue │
         └──────────────┬────────────────┘
                        │
                        ▼
      Dedicated thread executes:
      GcsNodeManager::HandleRegisterNode
                        │
                        ▼
             Response returned to Raylet

HandleRegisterNode always runs on the node_manager_io_context thread. It can freely read and write GcsNodeManager state without locks: as long as every other access to that state is also posted to the same io_context, no two accesses can ever overlap.
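The same idea applied to node registration, assuming a hypothetical NodeManagerSketch that owns an unordered_map with no mutex (SerialLoop is again a hypothetical stand-in for a single-threaded io_context). Writes and reads both go through post(); the read hands its result back through a std::future, so a caller on another thread never touches the map directly.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical single-threaded executor: only the run() thread executes tasks.
class SerialLoop {
 public:
  void post(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push_back(std::move(task)); }
    cv_.notify_one();
  }
  void run() {  // FIFO until stop() was called and the queue is empty
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  void stop() {
    { std::lock_guard<std::mutex> lock(mu_); stopped_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopped_ = false;
};

// Hypothetical node manager: its map is touched only on the loop thread.
class NodeManagerSketch {
 public:
  explicit NodeManagerSketch(SerialLoop& loop) : loop_(loop) {}

  // Callable from any "gRPC" thread; the mutation itself runs on the loop.
  void RegisterNode(std::string node_id, std::function<void(bool)> reply) {
    loop_.post([this, node_id = std::move(node_id), reply] {
      bool inserted = nodes_.emplace(node_id, ++generation_).second;
      reply(inserted);  // false for a duplicate registration
    });
  }
  // Reads are posted too; the result comes back through a future.
  std::future<std::size_t> NodeCount() {
    auto promise = std::make_shared<std::promise<std::size_t>>();
    std::future<std::size_t> result = promise->get_future();
    loop_.post([this, promise] { promise->set_value(nodes_.size()); });
    return result;
  }

 private:
  SerialLoop& loop_;
  std::unordered_map<std::string, int> nodes_;  // no mutex: loop-thread only
  int generation_ = 0;
};

std::size_t register_demo() {
  SerialLoop loop;
  std::thread loop_thread([&loop] { loop.run(); });
  NodeManagerSketch manager(loop);
  std::vector<std::thread> grpc_threads;
  for (int t = 0; t < 4; ++t) {
    grpc_threads.emplace_back([&manager, t] {
      // Threads 0/1 and 2/3 register the same 50 IDs each, so duplicates race.
      for (int i = 0; i < 50; ++i)
        manager.RegisterNode("node-" + std::to_string((t / 2) * 50 + i), [](bool) {});
    });
  }
  for (auto& th : grpc_threads) th.join();
  std::size_t count = manager.NodeCount().get();  // waits for the loop to answer
  loop.stop();
  loop_thread.join();
  return count;
}
```

Four threads race to register 200 names containing duplicates, and the count still comes back as 100, because every insert and the final read are serialized on one thread.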


Why This Architecture

Property                    Mechanism
High network throughput     gRPC thread pool spreads network I/O and deserialization across cores
Lock-free business logic    One Asio thread per io_context serializes its handlers
Fault isolation             Different modules can bind to different io_context threads
Observability               Each io_context has independent lag metrics (Part 2)

The lock-free property is the key payoff. GcsNodeManager holds substantial state — node registrations, subscriptions, resource maps. Without the Asio serialization, every access would need a lock. With it, the data structures are effectively single-threaded: no lock contention, simpler code, easier to reason about.
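The fault-isolation row of the table can be sketched the same way: give each module its own loop thread, and a handler that stalls one loop does not delay work posted to the other. SerialLoop is a hypothetical stand-in for a single-threaded io_context, and the two-module split is illustrative, not Ray's actual assignment of modules to threads.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-threaded executor: only the run() thread executes tasks.
class SerialLoop {
 public:
  void post(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push_back(std::move(task)); }
    cv_.notify_one();
  }
  void run() {  // FIFO until stop() was called and the queue is empty
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  void stop() {
    { std::lock_guard<std::mutex> lock(mu_); stopped_ = true; }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopped_ = false;
};

// Two modules, two loops: a stalled handler on A does not block B.
bool isolation_demo() {
  SerialLoop loop_a, loop_b;
  std::thread thread_a([&loop_a] { loop_a.run(); });
  std::thread thread_b([&loop_b] { loop_b.run(); });

  std::promise<void> release_a;
  std::shared_future<void> gate = release_a.get_future().share();
  loop_a.post([gate] { gate.wait(); });  // simulates a slow handler on module A

  std::promise<int> b_result;
  std::future<int> b_done = b_result.get_future();
  loop_b.post([&b_result] { b_result.set_value(42); });

  // Module B answers while module A is still blocked.
  bool b_served_while_a_blocked =
      b_done.wait_for(std::chrono::seconds(5)) == std::future_status::ready;

  release_a.set_value();  // unblock A so both loops can shut down
  loop_a.stop();
  loop_b.stop();
  thread_a.join();
  thread_b.join();
  return b_served_while_a_blocked;
}
```

The flip side is that state owned by module A must never be read from module B's thread directly; cross-module access has to be posted to the owning loop.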


Summary

  • Ray’s gRPC threads never execute business logic directly. The ServerCall posts a lambda to an Asio io_context and returns.
  • ServerCall is the handoff object; RPC_SERVICE_HANDLER is the macro that generates one per RPC method.
  • “Receive on many threads, process on one” is the central concurrency architecture of Ray’s C++ services.

Next: GCS compile-time thread isolation →