yuqi-zheng

Ray Internals: The Task, Actor, and Object Model


This is the entry point to a series on reading Ray’s source code. It covers the overall architecture and design philosophy before we dive into individual subsystems.


Ray is a distributed computing framework built around three abstractions: Tasks, Actors, and Objects. You express logic with plain Python; Ray handles scheduling, distributed memory, and parallelism automatically.

Ray also acts as a “distributed glue layer” — you can compose PyTorch, Dask, Modin, and other frameworks within a single Ray application.


Core Concepts

Concept | Description
Task | A remote function call, either stateless (a @ray.remote function) or stateful (an Actor method). Executes asynchronously and returns an ObjectRef.
Object | An immutable value produced by a task return or ray.put. Referenced by ObjectRef.
Actor | A stateful worker process (an instance of a @ray.remote class). Tasks are submitted via a handle; internal state persists across calls.
Driver | The program entry point: the process that calls ray.init().
Job | All tasks, objects, and Actors recursively spawned from a single Driver.
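
To make the task/object relationship concrete, here is a rough single-process analogy built only on the Python standard library. It is not Ray's API or implementation: the helper names `remote`, `put`, and `get` are hypothetical stand-ins that merely mirror the shape of `@ray.remote`, `ray.put`, and `ray.get`, and a `Future` plays the role of an ObjectRef.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for Ray's worker processes


def remote(fn):
    """Toy decorator: adds a .remote() method that returns a Future at once."""
    def run(*args):
        # Futures passed as arguments are resolved before the function runs,
        # the way ObjectRef arguments are resolved before a task executes.
        resolved = [a.result() if isinstance(a, Future) else a for a in args]
        return fn(*resolved)

    fn.remote = lambda *args: _pool.submit(run, *args)
    return fn


def put(value):
    """Toy counterpart of ray.put: wrap an existing value in a reference."""
    ref = Future()
    ref.set_result(value)
    return ref


def get(ref):
    """Toy counterpart of ray.get: block until the value is available."""
    return ref.result()


@remote
def square(x):
    return x * x


@remote
def add(a, b):
    return a + b


# Build a small task graph: add(square(3), 1).
ref = add.remote(square.remote(3), put(1))
result = get(ref)
_pool.shutdown(wait=True)
```

The point of the analogy is only that submission returns immediately with a reference, and that references can be passed straight into further tasks without fetching their values first.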

Cluster Architecture

┌─────────────────────────────────────────────────┐
│  Head node                                       │
│  ┌──────────────────┐  ┌──────────┐             │
│  │ GCS (global ctrl) │  │ Driver   │             │
│  └──────────────────┘  └──────────┘             │
├─────────────────────────────────────────────────┤
│  Every node                                      │
│  ┌────────────────────────────────────────────┐ │
│  │ Raylet                                     │ │
│  │  ┌──────────────┐  ┌─────────────────────┐ │ │
│  │  │ Scheduler    │  │ Plasma shared memory │ │ │
│  │  └──────────────┘  └─────────────────────┘ │ │
│  └────────────────────────────────────────────┘ │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ Worker 0 │  │ Worker 1 │  │ Worker N │       │
│  └──────────┘  └──────────┘  └──────────┘       │
└─────────────────────────────────────────────────┘
  • Worker processes: Execute tasks and Actor methods. Each holds an ownership table and an in-process store for small objects.
  • Raylet (one per node, shared across Jobs): Task scheduler + Plasma shared-memory object store.
  • GCS (head node): Cluster-level metadata — Actor locations, node membership. Ray 2.0 supports GCS fault tolerance.

The Ownership Model

Ray uses a decentralized metadata model. Every ObjectRef is managed by the Worker that created it, called its Owner.

Owner responsibilities:

  • Guarantee the task that created the object eventually runs
  • Resolve the ObjectRef to its actual value
  • Manage reference counting and garbage collection

Benefits: task submission latency of about one round trip (~1 RTT), tens of thousands of tasks/s per client, and throughput that scales linearly with cluster size.

Constraints: An object’s fate is tied to its Owner — if the Owner dies, the object becomes inaccessible. Ownership is non-transferable.

This is an explicit design trade-off: decentralized ownership eliminates a central metadata bottleneck at the cost of fate-sharing.
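
The owner's responsibilities and the fate-sharing constraint can be sketched with a toy model. This is an illustration under simplifying assumptions, not Ray's ownership table: `ToyOwner`, `OwnerDied`, and `resolve` are hypothetical names, and a reference is just an (owner, object id) pair.

```python
from dataclasses import dataclass, field


class OwnerDied(Exception):
    """Toy error raised when a reference outlives its owner."""


@dataclass
class ToyOwner:
    """One worker's private table of the objects it created."""
    name: str
    alive: bool = True
    values: dict = field(default_factory=dict)      # object id -> value
    ref_counts: dict = field(default_factory=dict)  # object id -> live references

    def create(self, object_id, value):
        self.values[object_id] = value
        self.ref_counts[object_id] = 1
        return (self, object_id)  # toy "ObjectRef": the owner plus an id

    def add_ref(self, object_id):
        self.ref_counts[object_id] += 1

    def release(self, object_id):
        self.ref_counts[object_id] -= 1
        if self.ref_counts[object_id] == 0:  # last reference gone: collect
            del self.ref_counts[object_id]
            del self.values[object_id]


def resolve(ref):
    """Ask the owner for the value; fails if the owner is gone (fate-sharing)."""
    owner, object_id = ref
    if not owner.alive:
        raise OwnerDied(f"owner {owner.name} died; {object_id} is inaccessible")
    return owner.values[object_id]


worker = ToyOwner("worker-1")
ref = worker.create("obj-1", 42)
value = resolve(ref)
```

Because every lookup goes to the owner rather than to a central service, no shared component sits on the hot path; the cost is that `resolve` has nowhere else to turn once `alive` is false.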


Memory Tiers

Tier | Contents | Access pattern
Worker heap | Regular Python/C++ heap during task execution | Local memory copy
In-process store | Objects < 100 KB, stored in Owner’s heap | Copy on read
Plasma shared memory | Objects returned by tasks or ray.put (≥ 100 KB) | Zero-copy on same node, RPC across nodes
Ray metadata | Task specs, reference counts; a few KB per live ObjectRef | Internal

When Plasma runs out of memory, large objects can spill to disk. Objects that are lost can also be reconstructed by re-executing the task that produced them (Ray 2.0+).
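
The size-based split between the in-process store and Plasma amounts to a single threshold check. The sketch below assumes `pickle` as a stand-in for Ray's own serialization; `choose_tier` and `INLINE_THRESHOLD_BYTES` are hypothetical names, and only the 100 KB figure comes from the text.

```python
import pickle

INLINE_THRESHOLD_BYTES = 100 * 1024  # the 100 KB cut-off described above


def choose_tier(value):
    """Pick a storage tier from the serialized size of a value."""
    size = len(pickle.dumps(value))  # pickle stands in for Ray's serializer
    if size < INLINE_THRESHOLD_BYTES:
        return "in-process store"
    return "plasma"


small_tier = choose_tier({"step": 1, "loss": 0.25})
large_tier = choose_tier(bytes(200 * 1024))  # 200 KB of zero bytes
```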


Task Lifecycle

  1. Submit: Owner packages arguments into a task spec. Small values are inlined; large values are stored via implicit ray.put and passed as ObjectRef.
  2. Dependency resolution: Owner waits for all ObjectRef arguments to become available (objects may be anywhere in the cluster).
  3. Scheduling: Owner requests resources from the distributed scheduler.
  4. Execution: Scheduler assigns a Worker, Owner sends the task spec via gRPC, Worker executes.
  5. Result: Small result → Owner’s in-process store. Large result → local Plasma, Owner notified of location.
  6. Errors: Application-level errors (Python exceptions) are stored as the return value, not retried by default. System-level errors (Worker crash) are retried a configurable number of times.
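
The error handling in step 6 can be sketched as a retry loop that treats the two failure classes differently. This is a toy model, not Ray's scheduler code: `WorkerCrashed` and `run_task` are hypothetical, the parameter name `max_retries` mirrors Ray's task option of the same name, and the value used here is arbitrary.

```python
class WorkerCrashed(Exception):
    """Toy stand-in for a system-level failure (the worker process died)."""


def run_task(fn, args, max_retries=3):
    """Run fn, retrying only on WorkerCrashed.

    Returns ((status, payload), attempts), where status is "ok" or "error".
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return ("ok", fn(*args)), attempts
        except WorkerCrashed as crash:
            # System-level error: retry until the budget is used up.
            if attempts > max_retries:
                return ("error", crash), attempts
        except Exception as app_error:
            # Application-level error: stored as the result, never retried here.
            return ("error", app_error), attempts


calls = {"n": 0}


def flaky(x):
    """Crashes on the first two attempts, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise WorkerCrashed("worker died")
    return x * 2


def buggy(x):
    raise ValueError("bad input")


flaky_result, flaky_attempts = run_task(flaky, (21,))
buggy_result, buggy_attempts = run_task(buggy, (1,))
```

The crash is invisible to the caller after a successful retry, whereas the `ValueError` surfaces on the first attempt as the task's stored result.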

Object Resolution (Large Objects)

When a task argument or ray.get call needs a large object that’s not local:

  1. Worker requests from local Raylet
  2. Raylet queries the object directory for the object’s location
  3. Raylet fetches the object from the remote Raylet into local Plasma
  4. Worker gets a shared-memory pointer via IPC — zero-copy
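
The four steps above can be sketched with plain dictionaries. This is a toy model under stated assumptions: `ToyRaylet` is hypothetical, the object directory is a shared dict mapping an object id to one node, and the "remote fetch" is a dictionary read rather than a network transfer.

```python
class ToyRaylet:
    """One node's object store plus the lookup logic for non-local objects."""

    def __init__(self, node_id, directory, cluster):
        self.node_id = node_id
        self.plasma = {}            # object id -> bytes held on this node
        self.directory = directory  # object id -> node id holding a copy
        self.cluster = cluster      # node id -> ToyRaylet
        cluster[node_id] = self

    def put(self, object_id, data):
        self.plasma[object_id] = data
        self.directory[object_id] = self.node_id

    def get(self, object_id):
        # Step 1: the worker asks its local raylet.
        if object_id not in self.plasma:
            # Step 2: look up which node holds the object.
            holder = self.directory[object_id]
            # Step 3: pull it into the local store (a real transfer copies bytes).
            self.plasma[object_id] = self.cluster[holder].plasma[object_id]
        # Step 4: hand back a view of the stored bytes without copying them.
        return memoryview(self.plasma[object_id])


directory, cluster = {}, {}
node_a = ToyRaylet("node-a", directory, cluster)
node_b = ToyRaylet("node-b", directory, cluster)

node_a.put("obj-1", b"payload")
view = node_b.get("obj-1")  # not local to node-b, so it is fetched first
```

After the first `get`, the object sits in node-b's local store, so later reads on that node skip steps 2 and 3.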

Actor Lifecycle

  • Creation: Managed by GCS. Creator registers the Actor, GCS schedules and starts the Actor process.
  • Execution: Tasks go directly to the Actor via gRPC — no scheduler involvement per-call (resources granted at creation time). Tasks from the same caller execute serially in submission order.
  • Concurrency options: Default is serial; opt into asyncio-based concurrent Actor or thread-pool Actor.
  • Cleanup: Regular Actors are cleaned up when no references to them remain or when their creator exits. Detached Actors persist until explicitly killed with ray.kill.
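
The default serial execution order can be illustrated with a mailbox and a single worker thread. This is a standard-library sketch, not how Ray implements Actors: `ToyActor`, `call`, and `kill` are hypothetical names, and a `Future` stands in for the ObjectRef returned by each call.

```python
import queue
import threading
from concurrent.futures import Future


class ToyActor:
    """Runs method calls one at a time on a dedicated thread, in arrival order."""

    def __init__(self, instance):
        self._instance = instance
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._mailbox.get()
            if item is None:  # shutdown sentinel
                break
            method, args, future = item
            try:
                future.set_result(getattr(self._instance, method)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, method, *args):
        """Enqueue a method call and return a future for its result."""
        future = Future()
        self._mailbox.put((method, args, future))
        return future

    def kill(self):
        """Stop the actor thread after the queued calls have run."""
        self._mailbox.put(None)
        self._thread.join()


class Counter:
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


actor = ToyActor(Counter())
refs = [actor.call("add", 1) for _ in range(5)]
totals = [r.result() for r in refs]
actor.kill()
```

Because one thread drains the mailbox, the counter needs no lock and each call observes the state left by the previous one.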

Failure Model

System level: Any node except the head node can be lost without bringing down the cluster. GCS tracks node membership via heartbeats; a node whose heartbeats time out is marked dead, and its Raylet exits.
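
Heartbeat-based membership reduces to recording when each node was last heard from and sweeping for stale entries. The sketch below is a toy model, not the GCS implementation: `NodeTable` and its methods are hypothetical, the timeout value is arbitrary, and timestamps are passed in explicitly to keep the example deterministic.

```python
class NodeTable:
    """Tracks the last heartbeat per node and marks silent nodes dead."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node id -> time of the most recent heartbeat
        self.dead = set()

    def heartbeat(self, node_id, now):
        if node_id not in self.dead:  # in this toy, a dead node stays dead
            self.last_seen[node_id] = now

    def sweep(self, now):
        """Mark every node that has been silent longer than the timeout."""
        for node_id, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                self.dead.add(node_id)
                del self.last_seen[node_id]
        return set(self.dead)


table = NodeTable(timeout_s=30)
table.heartbeat("node-a", now=0)
table.heartbeat("node-b", now=0)
table.heartbeat("node-a", now=25)  # node-b goes silent after t=0

dead_nodes = table.sweep(now=40)
```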

Application level: Tasks and objects share the fate of their Owner. Nested calls naturally create failure isolation boundaries, because subtasks with different Owners fail independently. Detached Actors break fate-sharing for long-lived services.

Recovery: Automatic task retry, Actor restart, object spilling, and object reconstruction (Ray 2.0) are all configurable.


Language Runtime

Ray’s core is implemented in C++ and exposed through the CoreWorker library. Python, Java, and C++ (experimental) frontends call into CoreWorker via bindings. CoreWorker handles the ownership table, in-process store, and all gRPC communication.


The next posts in this series go deep into Ray’s async infrastructure — starting with how Ray uses Boost.Asio not as a network engine, but as a lock-free business logic scheduler.