yuqi-zheng

C++ Async Callbacks: Lambda Capture and the Destruction Order Fiasco


A common crash pattern in async C++: an object registers a callback with a long-lived component, the object gets destroyed, and the callback fires later — accessing memory that no longer exists. This is called the Destruction Order Fiasco, and it’s subtle enough to slip through code review.

This post breaks down the problem, shows a real example from the Ray distributed computing framework, and explains why capturing shared_ptr by value in a lambda is the right fix.


The Problem

Consider this scenario:

  1. CoreWorker registers a callback with GcsClient (which lives longer)
  2. The callback closes over this — a raw pointer to CoreWorker
  3. CoreWorker is destroyed; its members are freed
  4. GcsClient fires the callback
  5. The callback dereferences a dangling pointer → crash

This is a use-after-free, and in production it often manifests as an intermittent, hard-to-reproduce segfault.


The Dangerous Pattern: Capturing [this]

class CoreWorker {
public:
    CoreWorker(std::shared_ptr<GcsClient> gcs) : gcs_client_(gcs) {
        // Danger: lambda captures raw `this`
        gcs_client_->Subscribe([this](const NodeID& node_id) {
            ref_counter_->ResetObjectsOnRemovedNode(node_id);
        });
    }

    ~CoreWorker() { /* once this body returns, ref_counter_ and the other members are destroyed */ }

private:
    std::shared_ptr<GcsClient> gcs_client_;
    std::shared_ptr<ReferenceCounter> ref_counter_;
};

[this] captures a raw pointer (8 bytes on a 64-bit platform). It participates in no reference counting, so nothing stops CoreWorker from being destroyed while GcsClient still holds the callback.

Timeline of the crash:

1. CoreWorker constructed → callback registered, closure holds raw `this`
2. CoreWorker destroyed   → ref_counter_ freed
3. GcsClient fires event  → callback invoked
4. this->ref_counter_->... → use-after-free → segfault
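The timeline above can be reproduced in a few lines without actually triggering the crash. The sketch below is not Ray code: EventBus, Counter, and Worker are hypothetical stand-ins for GcsClient, ReferenceCounter, and CoreWorker. It uses a weak_ptr purely as an observer to show that the object the callback would touch is already gone while the callback is still registered.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a long-lived component such as GcsClient.
class EventBus {
public:
    void Subscribe(std::function<void(int)> cb) { callbacks_.push_back(std::move(cb)); }
    std::size_t CallbackCount() const { return callbacks_.size(); }

private:
    std::vector<std::function<void(int)>> callbacks_;
};

// Hypothetical stand-in for ReferenceCounter.
struct Counter {
    int resets = 0;
};

// Hypothetical stand-in for CoreWorker, using the dangerous [this] capture.
class Worker {
public:
    explicit Worker(EventBus& bus) : counter_(std::make_shared<Counter>()) {
        bus.Subscribe([this](int) { ++counter_->resets; });  // raw `this`
    }
    // Non-owning handle so the demo can watch the Counter's lifetime.
    std::weak_ptr<Counter> Observe() const { return counter_; }

private:
    std::shared_ptr<Counter> counter_;
};

struct DanglingReport {
    bool counter_freed;            // the object the callback would touch is gone
    std::size_t stored_callbacks;  // yet the bus still holds the callback
};

DanglingReport DemonstrateDanglingCapture() {
    EventBus bus;
    std::weak_ptr<Counter> observer;
    {
        Worker worker(bus);
        observer = worker.Observe();
    }  // worker destroyed here; its counter_ is released
    // The callback is deliberately NOT fired: invoking it now would
    // dereference a dangling `this`, which is undefined behavior.
    return {observer.expired(), bus.CallbackCount()};
}
```

Calling DemonstrateDanglingCapture() reports that the Counter has been freed while one callback is still stored. That is exactly the state in which a later event would hit freed memory.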

The Fix: Capture shared_ptr by Value

From core_worker.cc in the Ray source:

void CoreWorker::SubscribeToNodeChanges() {
  std::call_once(subscribe_to_node_changes_flag_, [this]() {
    // Capture shared ownership to avoid destruction order fiasco between
    // gcs_client, reference_counter_, raylet_client_pool_, and
    // core_worker_client_pool_.
    auto on_node_change = [reference_counter = reference_counter_,
                           rate_limiter = lease_request_rate_limiter_,
                           raylet_client_pool = raylet_client_pool_,
                           core_worker_client_pool = core_worker_client_pool_](
                              const NodeID &node_id,
                              const rpc::GcsNodeAddressAndLiveness &data) {
      if (data.state() == rpc::GcsNodeInfo::DEAD) {
        reference_counter->ResetObjectsOnRemovedNode(node_id);
        raylet_client_pool->Disconnect(node_id);
        core_worker_client_pool->Disconnect(node_id);
      }
    };

    gcs_client_->Nodes().AsyncSubscribeToNodeAddressAndLivenessChange(
        std::move(on_node_change), /*callback*/...);
  });
}

The comment is the giveaway: “capture shared ownership to avoid destruction order fiasco”. Each member (reference_counter_, raylet_client_pool_, etc.) is a shared_ptr. Capturing them by value copies the smart pointer — not the underlying object — and increments the reference count.

Safe timeline:

1. CoreWorker constructed → lambda captures shared_ptrs (ref count +1 each)
2. CoreWorker destroyed   → members' ref counts drop by 1
                          → underlying objects NOT freed (lambda still holds them)
3. GcsClient fires event  → callback runs safely, objects still alive
4. GcsClient destroyed    → lambda destroyed, ref counts hit zero → objects freed

The lambda becomes self-contained: it owns everything it needs to execute safely, independent of CoreWorker’s lifetime.
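The safe timeline can be checked with a small self-contained sketch. As before, this is an illustration rather than Ray's implementation: EventBus, Counter, and Worker are hypothetical names, and the only technique taken from the Ray snippet is the by-value init-capture of a shared_ptr member.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a long-lived component such as GcsClient.
class EventBus {
public:
    void Subscribe(std::function<void(int)> cb) { callbacks_.push_back(std::move(cb)); }
    void Fire(int node_id) {
        for (auto& cb : callbacks_) cb(node_id);
    }

private:
    std::vector<std::function<void(int)>> callbacks_;
};

// Hypothetical stand-in for ReferenceCounter.
struct Counter {
    int resets = 0;
};

// Hypothetical stand-in for CoreWorker, capturing the member by value.
class Worker {
public:
    explicit Worker(EventBus& bus) : counter_(std::make_shared<Counter>()) {
        // The closure owns its own shared_ptr copy; `this` is not captured.
        bus.Subscribe([counter = counter_](int) { ++counter->resets; });
    }
    std::weak_ptr<Counter> Observe() const { return counter_; }

private:
    std::shared_ptr<Counter> counter_;
};

struct SafeReport {
    bool alive_after_worker_gone;
    int resets_after_fire;
    bool freed_after_bus_gone;
};

SafeReport DemonstrateSharedCapture() {
    std::weak_ptr<Counter> observer;
    SafeReport report{};
    {
        EventBus bus;
        {
            Worker worker(bus);
            observer = worker.Observe();
        }  // step 2: worker destroyed, closure still owns the Counter
        report.alive_after_worker_gone = !observer.expired();
        bus.Fire(42);  // step 3: callback runs against a live object
        report.resets_after_fire = observer.lock()->resets;
    }  // step 4: bus destroyed, closure destroyed, Counter freed
    report.freed_after_bus_gone = observer.expired();
    return report;
}
```

The three fields of the report line up with steps 2 to 4 of the timeline: the Counter survives the Worker, the callback runs once, and the Counter is released only when the bus lets go of the closure.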


Performance Cost

Capturing a shared_ptr by value is a shallow copy. On typical 64-bit implementations it copies two pointers (16 bytes) and does one atomic increment on the reference count, with a matching atomic decrement when the closure is destroyed. The managed object itself is not copied.

Capture        Cost
[this]         8-byte pointer copy, no atomic op
[shared_ptr]   16-byte pointer copy + one atomic increment

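In practice this is negligible. One caveat: std::function typically stores small closures in an internal buffer and heap-allocates only when the closure does not fit, so a [this] closure usually avoids an allocation while a closure holding several shared_ptrs may incur one. That allocation happens once, when the callback is registered, not on every invocation.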
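To make the shallow-copy claim concrete, here is a small sketch that records sizes and reference counts around a capture. Payload and MeasureCaptureCost are hypothetical names invented for this example, and the exact sizes are implementation-defined, so the code only reports them rather than assuming 8 and 16.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical large object, to make a deep copy obvious if one happened.
struct Payload {
    char bytes[4096];
};

struct CaptureCost {
    std::size_t raw_pointer_size;  // what [this] would copy
    std::size_t shared_ptr_size;   // what a by-value shared_ptr capture copies
    std::size_t closure_size;      // size of the closure object itself
    long use_count_before;
    long use_count_after;
    bool same_object;              // closure points at the original Payload
};

CaptureCost MeasureCaptureCost() {
    auto payload = std::make_shared<Payload>();
    CaptureCost cost{};
    cost.raw_pointer_size = sizeof(Payload*);
    cost.shared_ptr_size = sizeof(std::shared_ptr<Payload>);
    cost.use_count_before = payload.use_count();

    // By-value capture: copies the smart pointer, bumps the reference count.
    auto closure = [p = payload] { return p.get(); };

    cost.closure_size = sizeof(closure);
    cost.use_count_after = payload.use_count();
    cost.same_object = (closure() == payload.get());
    return cost;
}
```

On common 64-bit standard libraries the reported sizes come out as 8, 16, and 16 bytes, but the portable takeaways are that the count goes from 1 to 2 and that the closure is far smaller than the 4096-byte Payload it refers to.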


Choosing the Right Capture

Scenario                                     Capture          Reason
Synchronous, local scope (e.g. std::sort)    [&] or [this]    Lifetime is obvious, zero overhead
Callback runs immediately, caller waits      [&] or [this]    Call stack keeps object alive
Callback stored in long-lived component      [shared_ptr]     Extend lifetime, prevent dangling
Callback is best-effort (e.g. UI refresh)    [weak_ptr]       Don’t force the object to stay alive

The shared_ptr vs weak_ptr choice comes down to whether the callback must run or may be discarded:

// weak_ptr: callback is optional — skip if object is gone
auto callback = [weak = weak_from_this()] {
    if (auto self = weak.lock()) {
        self->UpdateUI();
    }
};
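For completeness, here is a runnable version of the weak_ptr fragment. It is a sketch under a few assumptions: Widget, MakeRefreshCallback, and CountRefreshes are hypothetical names, the object must be owned by a shared_ptr and derive from std::enable_shared_from_this, and weak_from_this() requires C++17.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical UI object; counts refreshes through an external counter
// so the count can still be read after the Widget is destroyed.
class Widget : public std::enable_shared_from_this<Widget> {
public:
    explicit Widget(int* refresh_count) : refresh_count_(refresh_count) {}

    void UpdateUI() { ++*refresh_count_; }

    std::function<void()> MakeRefreshCallback() {
        // The weak_ptr observes the Widget without extending its lifetime.
        return [weak = weak_from_this()] {
            if (auto self = weak.lock()) {
                self->UpdateUI();  // only runs while the Widget is alive
            }
        };
    }

private:
    int* refresh_count_;
};

int CountRefreshes() {
    int refreshes = 0;
    auto widget = std::make_shared<Widget>(&refreshes);
    std::function<void()> callback = widget->MakeRefreshCallback();

    callback();      // Widget alive: UpdateUI runs
    widget.reset();  // last owner gone; the callback did not keep it alive
    callback();      // lock() fails, the callback is silently skipped
    return refreshes;
}
```

CountRefreshes() returns 1: the first invocation refreshes the UI, and the second finds the Widget gone and does nothing, which is the "may be discarded" behavior described above.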

Ray uses shared_ptr because the cleanup logic (ResetObjectsOnRemovedNode, Disconnect) is not optional — it must run even if CoreWorker is gone.


Summary

  • [this] capture is a raw pointer — it carries no ownership, no safety guarantee
  • Capturing shared_ptr members by value gives the lambda partial ownership of the resources it needs
  • The lambda becomes self-contained and safe to invoke regardless of the registering object’s lifetime
  • Cost is minimal: one atomic op per captured pointer
  • Use weak_ptr when the callback is optional; shared_ptr when it must execute

The next time you write a callback that gets stored somewhere, ask: could the object I’m closing over be destroyed before this fires? If yes, capture shared_ptr — not this.


This is Part 1 of a series. Part 2 covers the weak_from_this pattern for callbacks that should silently abort when the object is gone.

References

  • Ray source: src/ray/core_worker/core_worker.cc
  • Scott Meyers, Effective Modern C++ — Item 31: Avoid default capture modes
  • C++ Core Guidelines F.53: Avoid capturing by reference in lambdas used non-locally