Ray Internals: The Task, Actor, and Object Model

This is the entry point to a series on reading Ray’s source code. It covers the overall architecture and design philosophy before we dive into individual subsystems.

Ray is a distributed computing framework built around three abstractions: Tasks, Actors, and Objects. You express logic with plain Python; Ray handles scheduling, distributed memory, and parallelism automatically.

Ray also acts as a “distributed glue layer” — you can compose PyTorch, Dask, Modin, and other frameworks within a single Ray application.

Core Concepts

Concept	Description
Task	A remote function call. Stateless (`@ray.remote` function) or stateful (Actor method). Executes asynchronously, returns an `ObjectRef`.
Object	An immutable value produced by a task return or `ray.put`. Referenced by `ObjectRef`.
Actor	A stateful worker process (instance of a `@ray.remote` class). Tasks are submitted via a handle; internal state persists across calls.
Driver	The program entry point — the process that calls `ray.init()`.
Job	All tasks, objects, and Actors recursively spawned from a single Driver.

Cluster Architecture

┌─────────────────────────────────────────────────┐
│  Head node                                       │
│  ┌──────────────────┐  ┌──────────┐             │
│  │ GCS (global ctrl) │  │ Driver   │             │
│  └──────────────────┘  └──────────┘             │
├─────────────────────────────────────────────────┤
│  Every node                                      │
│  ┌────────────────────────────────────────────┐ │
│  │ Raylet                                     │ │
│  │  ┌──────────────┐  ┌─────────────────────┐ │ │
│  │  │ Scheduler    │  │ Plasma shared memory │ │ │
│  │  └──────────────┘  └─────────────────────┘ │ │
│  └────────────────────────────────────────────┘ │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐       │
│  │ Worker 0 │  │ Worker 1 │  │ Worker N │       │
│  └──────────┘  └──────────┘  └──────────┘       │
└─────────────────────────────────────────────────┘

Worker processes: Execute tasks and Actor methods. Each holds an ownership table and an in-process store for small objects.
Raylet (one per node, shared across Jobs): Task scheduler + Plasma shared-memory object store.
GCS (head node): Cluster-level metadata — Actor locations, node membership. Ray 2.0 supports GCS fault tolerance.

The Ownership Model

Ray uses a decentralized metadata model. Every ObjectRef is managed by the Worker that created it, called its Owner.

Owner responsibilities:

Guarantee the task that created the object eventually runs
Resolve the ObjectRef to its actual value
Manage reference counting and garbage collection

Benefits: ~1 RTT latency, tens of thousands of tasks/s per client, linear cluster scaling.

Constraints: An object’s fate is tied to its Owner — if the Owner dies, the object becomes inaccessible. Ownership is non-transferable.

This is an explicit design trade-off: decentralized ownership eliminates a central metadata bottleneck at the cost of fate-sharing.

Memory Tiers

Tier	Contents	Access pattern
Worker heap	Regular Python/C++ heap during task execution	Local memory copy
In-process store	Objects < 100 KB, stored in Owner’s heap	Copy on read
Plasma shared memory	Objects returned by tasks or `ray.put` (≥ 100 KB)	Zero-copy on same node, RPC across nodes
Ray metadata	Task specs, reference counts — a few KB per live `ObjectRef`	Internal

Large objects stored in Plasma can spill to disk when memory is exhausted, and can be reconstructed (Ray 2.0+).

Task Lifecycle

Submit: Owner packages arguments into a task spec. Small values are inlined; large values are stored via implicit ray.put and passed as ObjectRef.
Dependency resolution: Owner waits for all ObjectRef arguments to become available (objects may be anywhere in the cluster).
Scheduling: Owner requests resources from the distributed scheduler.
Execution: Scheduler assigns a Worker, Owner sends the task spec via gRPC, Worker executes.
Result: Small result → Owner’s in-process store. Large result → local Plasma, Owner notified of location.
Errors: Application-level errors (Python exceptions) are stored as the return value, not retried by default. System-level errors (Worker crash) are retried a configurable number of times.

Object Resolution (Large Objects)

When a task argument or ray.get call needs a large object that’s not local:

Worker requests from local Raylet
Raylet queries the object directory for the object’s location
Raylet fetches the object from the remote Raylet into local Plasma
Worker gets a shared-memory pointer via IPC — zero-copy

Actor Lifecycle

Creation: Managed by GCS. Creator registers the Actor, GCS schedules and starts the Actor process.
Execution: Tasks go directly to the Actor via gRPC — no scheduler involvement per-call (resources granted at creation time). Tasks from the same caller execute serially in submission order.
Concurrency options: Default is serial; opt into asyncio-based concurrent Actor or thread-pool Actor.
Cleanup: Regular Actors are cleaned up when the creator exits and no references remain. Detached Actors persist until explicitly killed with ray.kill.

Failure Model

System level: Any node (except the head node) can be lost without affecting the cluster. GCS tracks node membership via heartbeats; timed-out nodes are marked dead, their Raylets exit.

Application level: Tasks and objects share the fate of their Owner. Nested calls naturally create failure isolation boundaries — subtasks in different Owners fail independently. Detached Actors break fate-sharing for long-lived services.

Recovery: Automatic task retry, Actor restart, object spilling, and object reconstruction (Ray 2.0) are all configurable.

Language Runtime

Ray’s core is implemented in C++ and exposed through the CoreWorker library. Python, Java, and C++ (experimental) frontends call into CoreWorker via bindings. CoreWorker handles the ownership table, in-process store, and all gRPC communication.

The next posts in this series go deep into Ray’s async infrastructure — starting with how Ray uses Boost.Asio not as a network engine, but as a lock-free business logic scheduler.