Ray Internals: The Task, Actor, and Object Model
This is the entry point to a series on reading Ray’s source code. It covers the overall architecture and design philosophy before we dive into individual subsystems.
Ray is a distributed computing framework built around three abstractions: Tasks, Actors, and Objects. You express logic with plain Python; Ray handles scheduling, distributed memory, and parallelism automatically.
Ray also acts as a “distributed glue layer” — you can compose PyTorch, Dask, Modin, and other frameworks within a single Ray application.
Core Concepts
| Concept | Description |
|---|---|
| Task | A remote function call. Stateless (@ray.remote function) or stateful (Actor method). Executes asynchronously, returns an ObjectRef. |
| Object | An immutable value produced by a task return or ray.put. Referenced by ObjectRef. |
| Actor | A stateful worker process (instance of a @ray.remote class). Tasks are submitted via a handle; internal state persists across calls. |
| Driver | The program entry point — the process that calls ray.init(). |
| Job | All tasks, objects, and Actors recursively spawned from a single Driver. |
Cluster Architecture
┌─────────────────────────────────────────────────┐
│ Head node │
│ ┌──────────────────┐ ┌──────────┐ │
│ │ GCS (global ctrl) │ │ Driver │ │
│ └──────────────────┘ └──────────┘ │
├─────────────────────────────────────────────────┤
│ Every node │
│ ┌────────────────────────────────────────────┐ │
│ │ Raylet │ │
│ │ ┌──────────────┐ ┌─────────────────────┐ │ │
│ │ │ Scheduler │ │ Plasma shared memory │ │ │
│ │ └──────────────┘ └─────────────────────┘ │ │
│ └────────────────────────────────────────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Worker 0 │ │ Worker 1 │ │ Worker N │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
- Worker processes: Execute tasks and Actor methods. Each holds an ownership table and an in-process store for small objects.
- Raylet (one per node, shared across Jobs): Task scheduler + Plasma shared-memory object store.
- GCS (head node): Cluster-level metadata — Actor locations, node membership. Ray 2.0 supports GCS fault tolerance.
The Ownership Model
Ray uses a decentralized metadata model. Every ObjectRef is managed by the Worker that created it, called its Owner.
Owner responsibilities:
- Guarantee the task that created the object eventually runs
- Resolve the
ObjectRefto its actual value - Manage reference counting and garbage collection
Benefits: ~1 RTT latency, tens of thousands of tasks/s per client, linear cluster scaling.
Constraints: An object’s fate is tied to its Owner — if the Owner dies, the object becomes inaccessible. Ownership is non-transferable.
This is an explicit design trade-off: decentralized ownership eliminates a central metadata bottleneck at the cost of fate-sharing.
Memory Tiers
| Tier | Contents | Access pattern |
|---|---|---|
| Worker heap | Regular Python/C++ heap during task execution | Local memory copy |
| In-process store | Objects < 100 KB, stored in Owner’s heap | Copy on read |
| Plasma shared memory | Objects returned by tasks or ray.put (≥ 100 KB) | Zero-copy on same node, RPC across nodes |
| Ray metadata | Task specs, reference counts — a few KB per live ObjectRef | Internal |
Large objects stored in Plasma can spill to disk when memory is exhausted, and can be reconstructed (Ray 2.0+).
Task Lifecycle
- Submit: Owner packages arguments into a task spec. Small values are inlined; large values are stored via implicit
ray.putand passed asObjectRef. - Dependency resolution: Owner waits for all
ObjectRefarguments to become available (objects may be anywhere in the cluster). - Scheduling: Owner requests resources from the distributed scheduler.
- Execution: Scheduler assigns a Worker, Owner sends the task spec via gRPC, Worker executes.
- Result: Small result → Owner’s in-process store. Large result → local Plasma, Owner notified of location.
- Errors: Application-level errors (Python exceptions) are stored as the return value, not retried by default. System-level errors (Worker crash) are retried a configurable number of times.
Object Resolution (Large Objects)
When a task argument or ray.get call needs a large object that’s not local:
- Worker requests from local Raylet
- Raylet queries the object directory for the object’s location
- Raylet fetches the object from the remote Raylet into local Plasma
- Worker gets a shared-memory pointer via IPC — zero-copy
Actor Lifecycle
- Creation: Managed by GCS. Creator registers the Actor, GCS schedules and starts the Actor process.
- Execution: Tasks go directly to the Actor via gRPC — no scheduler involvement per-call (resources granted at creation time). Tasks from the same caller execute serially in submission order.
- Concurrency options: Default is serial; opt into asyncio-based concurrent Actor or thread-pool Actor.
- Cleanup: Regular Actors are cleaned up when the creator exits and no references remain. Detached Actors persist until explicitly killed with
ray.kill.
Failure Model
System level: Any node (except the head node) can be lost without affecting the cluster. GCS tracks node membership via heartbeats; timed-out nodes are marked dead, their Raylets exit.
Application level: Tasks and objects share the fate of their Owner. Nested calls naturally create failure isolation boundaries — subtasks in different Owners fail independently. Detached Actors break fate-sharing for long-lived services.
Recovery: Automatic task retry, Actor restart, object spilling, and object reconstruction (Ray 2.0) are all configurable.
Language Runtime
Ray’s core is implemented in C++ and exposed through the CoreWorker library. Python, Java, and C++ (experimental) frontends call into CoreWorker via bindings. CoreWorker handles the ownership table, in-process store, and all gRPC communication.
The next posts in this series go deep into Ray’s async infrastructure — starting with how Ray uses Boost.Asio not as a network engine, but as a lock-free business logic scheduler.