Ray Async Internals (3): Thread Pool and Periodic Timer
Part 3 of the Ray async infrastructure series. ← Part 2: Observability and chaos testing · Part 4: gRPC and Asio bridging →
Two components underpin almost all background work in Ray: IOServicePool, which manages a fleet of io_context threads, and PeriodicalRunner, which schedules repeating callbacks on top of them. This post covers both.
IOServicePool: Event Loop Thread Pool
Structure
┌─────────────────────────────────────────────┐
│ IOServicePool │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │io_ctx[0] │ │io_ctx[1] │ │io_ctx[N] │ │
│ │ Thread 0 │ │ Thread 1 │ │ Thread N │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────┘
Each io_context runs on its own dedicated thread. Handlers posted to a given io_context execute serially on that thread — so code running inside a handler needs no locks, as long as it only touches state owned by that io_context.
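To make the "no locks inside handlers" claim concrete, here is a minimal sketch using only the standard library. SerialLoop is a hypothetical toy stand-in for one io_context plus its dedicated thread; it is not Boost.Asio and not Ray's code. Several threads post handlers, but the counter is only ever touched by the loop's single worker thread, so the handlers need no synchronization of their own.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for one io_context and its dedicated thread (hypothetical).
class SerialLoop {
 public:
  SerialLoop() : worker_([this] { Run(); }) {}

  ~SerialLoop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains queued handlers before exiting
  }

  // Safe to call from any thread; the mutex protects only the queue.
  void Post(std::function<void()> handler) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push(std::move(handler));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> handler;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and nothing left to run
        handler = std::move(queue_.front());
        queue_.pop();
      }
      handler();  // handlers run one at a time, always on this thread
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist first
};

// Posts increments from several threads; the handlers themselves take no lock.
int CountWithoutLocks(int posters, int per_poster) {
  assert(posters >= 0 && per_poster >= 0);
  int counter = 0;  // state "owned" by the loop: only handlers touch it
  {
    SerialLoop loop;
    std::vector<std::thread> threads;
    for (int p = 0; p < posters; ++p) {
      threads.emplace_back([&loop, &counter, per_poster] {
        for (int i = 0; i < per_poster; ++i) {
          loop.Post([&counter] { ++counter; });
        }
      });
    }
    for (auto& t : threads) t.join();
  }  // ~SerialLoop drains the queue and joins the worker
  return counter;
}
```

The only mutex in the sketch guards the queue itself. The state the handlers mutate is never shared between threads, which is the property the paragraph above relies on.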
Setup
IOServicePool pool(4); // create 4 io_contexts, each on its own thread
pool.Run(); // start all threads (work_guard keeps each alive)
Two Dispatch Strategies
| Method | Strategy | Use case |
|---|---|---|
| Get() | Round-robin | Stateless short tasks; spreads load evenly |
| Get(hash) | Hash-pinned | Objects requiring thread affinity |
The hash-pinned strategy is the important one. When all callbacks for a given connection are routed to the same io_context, they execute in submission order on the same thread. No mutex needed — the io_context itself provides the serialization guarantee.
Example: if a gRPC connection has a 64-bit ID, pool.Get(connection_id) always returns the same io_context for that connection. All incoming requests from that connection process serially, which is exactly what you want for stateful protocol handling.
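The two strategies can be sketched as plain index selection. DispatchSketch and its method names are hypothetical, and the modulo mapping is an assumption about how a hash could be pinned to a slot; the real IOServicePool returns io_context pointers and may choose slots differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical index selector illustrating the two dispatch strategies.
class DispatchSketch {
 public:
  explicit DispatchSketch(std::size_t pool_size) : pool_size_(pool_size) {
    assert(pool_size > 0);
  }

  // Round-robin: successive calls walk through the slots in order.
  std::size_t NextIndex() {
    return next_.fetch_add(1, std::memory_order_relaxed) % pool_size_;
  }

  // Hash-pinned: the same key always maps to the same slot.
  std::size_t IndexFor(std::uint64_t hash) const { return hash % pool_size_; }

 private:
  std::size_t pool_size_;
  std::atomic<std::size_t> next_{0};
};
```

With a fixed pool size, IndexFor is a pure function of the key, so every callback for one connection lands on the same slot. The pinning only holds while the pool size stays constant.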
PeriodicalRunner: Periodic Task Scheduler
Basic Usage
auto runner = PeriodicalRunner::Create(io_context);
runner->RunFnPeriodically([] { SendHeartbeat(); }, /*interval_ms=*/1000, "Heartbeat");
Schedules SendHeartbeat to run every 1000ms on the given io_context.
Lifetime Safety
PeriodicalRunner inherits std::enable_shared_from_this<PeriodicalRunner> and captures weak_from_this() in its timer callback:
timer->async_wait(
    [weak_self = weak_from_this(), fn, period, timer](const auto& error) {
      if (auto self = weak_self.lock()) {
        if (error == boost::asio::error::operation_aborted) return;
        fn();
        self->DoRunFnPeriodically(fn, period, timer);
      }
      // lock() returned null: runner was destroyed, loop stops silently
    });
When the caller drops its shared_ptr<PeriodicalRunner>, nothing happens immediately. The next time the timer fires, lock() returns null, the callback returns without rescheduling, and the periodic loop ends. There is no crash, no leak, and no need for an explicit cancellation API.
Note that timer is also captured by value (as a shared_ptr). This keeps the timer alive until the final callback sees the runner is gone and exits. Without this, the timer could be destroyed before its last callback fires.
This is the weak_from_this pattern from the C++ lifetime safety series applied to a real production scheduler.
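The same lifetime behavior can be reproduced without Boost. In this sketch, ManualTimer and TickRunner are hypothetical stand-ins: FireAll() plays the role of timer expiry, and the runner reschedules itself only while lock() succeeds. It assumes C++17 for weak_from_this() and is a simplified illustration, not Ray's implementation.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical timer: stores pending callbacks; FireAll() simulates expiry.
struct ManualTimer {
  std::vector<std::function<void()>> pending;

  void AsyncWait(std::function<void()> cb) { pending.push_back(std::move(cb)); }

  void FireAll() {
    auto ready = std::move(pending);
    pending.clear();  // callbacks may re-arm into `pending` while we iterate
    for (auto& cb : ready) cb();
  }
};

class TickRunner : public std::enable_shared_from_this<TickRunner> {
 public:
  static std::shared_ptr<TickRunner> Create(std::shared_ptr<ManualTimer> timer) {
    return std::shared_ptr<TickRunner>(new TickRunner(std::move(timer)));
  }

  void RunFnPeriodically(std::function<void()> fn) { Schedule(std::move(fn)); }

 private:
  explicit TickRunner(std::shared_ptr<ManualTimer> timer)
      : timer_(std::move(timer)) {}

  void Schedule(std::function<void()> fn) {
    // Capture only a weak reference so the callback never extends our lifetime.
    timer_->AsyncWait([weak_self = weak_from_this(), fn = std::move(fn)]() {
      if (auto self = weak_self.lock()) {
        fn();
        self->Schedule(fn);  // re-arm for the next tick
      }
      // lock() failed: the runner is gone, so the loop ends here
    });
  }

  std::shared_ptr<ManualTimer> timer_;
};

// Fires twice with a live runner, then once after the owner drops it.
int TicksBeforeAndAfterDrop() {
  auto timer = std::make_shared<ManualTimer>();
  int ticks = 0;
  auto runner = TickRunner::Create(timer);
  runner->RunFnPeriodically([&ticks] { ++ticks; });
  timer->FireAll();  // first tick
  timer->FireAll();  // second tick
  runner.reset();    // owner drops the runner
  timer->FireAll();  // callback sees a dead runner and does not re-arm
  assert(timer->pending.empty());
  return ticks;
}
```

After the reset, one callback is still pending. It runs, finds the runner gone, and schedules nothing further, which mirrors the "stops silently" branch in the snippet above.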
Instrumented Execution
When the global event_stats flag is enabled, PeriodicalRunner records the queue wait time between timer expiry and actual callback execution — adding another observability data point on top of the lag metric from Part 2.
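One way to picture the queue-wait measurement is a wrapper that remembers when a handler was due and records the delay when it finally runs. WaitStats and WithQueueWait are hypothetical names for illustration; how Ray's event_stats actually stores and reports these samples is not shown in this post.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical accumulator for queue-wait samples.
struct WaitStats {
  std::chrono::nanoseconds total_wait{0};
  int samples = 0;
};

// Wraps fn so that running it records how long it waited past its due time.
std::function<void()> WithQueueWait(std::function<void()> fn,
                                    Clock::time_point due,
                                    WaitStats& stats) {
  assert(fn && "handler must be callable");
  return [fn = std::move(fn), due, &stats] {
    const auto now = Clock::now();
    if (now > due) {
      stats.total_wait +=
          std::chrono::duration_cast<std::chrono::nanoseconds>(now - due);
    }
    ++stats.samples;
    fn();
  };
}
```

A large gap between the due time and the execution time suggests the event loop was busy with other handlers, which is the same signal the lag metric is meant to surface.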
How They Compose
IOServicePool::Get(hash) → instrumented_io_context*
↓
PeriodicalRunner::Create(*io_context)
↓
runner->RunFnPeriodically(...)
IOServicePool provides the threads; PeriodicalRunner is the clock that runs on one of them. Nearly all background periodic work in Ray — heartbeating, metric reporting, garbage collection — runs through this combination.
Summary
- IOServicePool achieves lock-free concurrency: multiple threads, each running an independent io_context, with hash-pinning for thread affinity.
- PeriodicalRunner uses weak_from_this() for safe lifetime management: when the owner is destroyed, running callbacks drain safely and the loop stops.
- Together they are the scheduling substrate for Ray’s background infrastructure.