yuqi-zheng

_GLIBCXX_USE_CXX11_ABI=0: Why the Old ABI Is Slow


Defining the macro _GLIBCXX_USE_CXX11_ABI=0 (usually passed on the command line as -D_GLIBCXX_USE_CXX11_ABI=0) makes GCC's libstdc++ use its old, pre-C++11 ABI for std::string and std::list. It exists for binary compatibility with code built by GCC releases before 5.1, and the only good reason to set it is that you must link against a closed-source .so or .a that was built with the old ABI and cannot be recompiled.

In every other case, keeping the default new ABI is the correct choice. The performance differences are not subtle.


The Core Differences

Two standard library components change fundamentally between the two ABIs: std::string and std::list.

std::string: COW vs SSO

| | Old ABI (=0) | New ABI (=1, default) |
|---|---|---|
| Strategy | Copy-On-Write (COW) | Small String Optimization (SSO) |
| Memory | Heap-allocates every non-empty string, even a 1-character one | Stores short strings (up to 15 characters) inline, no malloc |
| Concurrency | Atomic refcount updates on every copy, even of a const string | Deep copy (or SSO), no shared state |

The SSO advantage is decisive for programs that create many short strings: IDs, keys, path components, short log messages. Every string that fits in the inline buffer (15 characters or fewer in libstdc++) is created without a heap allocation.
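One way to see the difference directly is to test whether a string's character data lives inside the string object itself. The sketch below is illustrative only: is_stored_inline is a hypothetical helper name, and it relies on comparing addresses as integers, which works on mainstream platforms but is not something the standard promises.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true when the character data lives inside the
// std::string object itself (an inline SSO buffer) rather than on the heap.
bool is_stored_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end = obj_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}
```

With the new ABI, is_stored_inline should be true for a string such as "short" and false for a 64-character string. With the old ABI it should be false for both, because the object holds only a pointer to a heap buffer.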

The COW concurrency cost is subtler but equally real: copying a const string across threads requires an atomic increment on the shared reference count. With enough threads doing this, the atomic contention becomes a bottleneck.
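The shared state behind that refcount can be observed by checking whether a copy points at the same buffer as the original. This is a minimal sketch, and copy_shares_buffer is a hypothetical name rather than a library function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies a string of the given length and reports whether
// the copy shares the original's character buffer.
bool copy_shares_buffer(std::size_t length) {
    const std::string original(length, 'x');
    // Old ABI: this bumps a reference count and shares the buffer.
    // New ABI: this copies the characters into separate storage.
    const std::string copy = original;
    return original.data() == copy.data();
}
```

Under the old ABI, copy_shares_buffer(50) should return true, and it is exactly that shared buffer whose reference count every thread has to update atomically. Under the new ABI it should return false.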

std::list::size(): O(n) vs O(1)

This is the most dangerous difference. Under the old ABI, std::list::size() is O(n): it walks the entire linked list and counts nodes. C++11 requires size() to be O(1), and the new ABI meets that requirement by storing the size as a member variable.

Putting list.size() in a loop condition with the old ABI turns an O(n) algorithm into O(n²).
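The pattern looks innocent in source code. The sketch below contrasts it with an iteration style that never calls size() and so behaves the same under either ABI. Both function names are made up for illustration.

```cpp
#include <cstddef>
#include <list>

// Anti-pattern: size() is evaluated on every iteration. That is harmless when
// size() is O(1), but makes the whole loop O(n^2) when size() walks the list.
long sum_size_in_condition(const std::list<int>& items) {
    long sum = 0;
    auto it = items.begin();
    for (std::size_t i = 0; i < items.size(); ++i, ++it) {
        sum += *it;
    }
    return sum;
}

// ABI-independent alternative: iterate directly and never call size().
long sum_range_for(const std::list<int>& items) {
    long sum = 0;
    for (int value : items) {
        sum += value;
    }
    return sum;
}
```

Both functions return the same result; only their cost under the old ABI differs. When a count is really needed, calling size() once and storing it in a local variable avoids the repeated traversal.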


Benchmarks

String Creation (SSO)

Creating 10 million short strings:

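std::size_t len_sum = 0;  // accumulated so the loop has an observable result
for (int i = 0; i < 10000000; ++i) {
    std::string s = "short";  // 5 characters: fits the SSO buffer under the new ABI
    len_sum += s.length();
}

| ABI | Time | Speedup |
|---|---|---|
| Old (=0) | 504 ms | baseline |
| New (=1) | 44 ms | 11.6x |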

The old ABI calls malloc 10 million times. The new ABI never touches the heap.
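To reproduce a measurement like this, the loop can be wrapped in a small timing function. The harness below is an assumption about how such a benchmark could be structured, not the code that produced the numbers above. LoopResult and time_short_string_loop are illustrative names.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Illustrative result type: elapsed wall-clock time plus the accumulated
// length, which is returned so the work has an observable effect.
struct LoopResult {
    double milliseconds;
    std::size_t len_sum;
};

LoopResult time_short_string_loop(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t len_sum = 0;
    for (int i = 0; i < iterations; ++i) {
        std::string s = "short";
        len_sum += s.length();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), len_sum};
}
```

Build the same file twice, once with -D_GLIBCXX_USE_CXX11_ABI=0 and once without, and compare the reported times. At higher optimization levels the compiler may simplify the new-ABI loop considerably, so absolute numbers depend on compiler version and flags.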

std::list::size()

Calling size() once on a list of 10 million elements:

| ABI | Time | Speedup |
|---|---|---|
| Old (=0) | 39 ms | baseline |
| New (=1) | 0.0001 ms | 340,000x |

The old ABI traverses 10 million nodes. The new ABI reads a single integer.
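A timing function for this case might look like the following sketch. SizeTiming and time_list_size are hypothetical names, and this is not the original benchmark code. A single O(1) call is close to the clock's resolution, so a careful measurement of the new ABI would repeat the call many times and average.

```cpp
#include <chrono>
#include <cstddef>
#include <list>

// Illustrative result type: elapsed time for one size() call and its value.
struct SizeTiming {
    double milliseconds;
    std::size_t size;
};

SizeTiming time_list_size(std::size_t element_count) {
    const std::list<int> items(element_count, 0);  // built outside the timed region
    const auto start = std::chrono::steady_clock::now();
    const std::size_t n = items.size();  // old ABI: walks every node; new ABI: reads a member
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), n};
}
```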

Multi-threaded String Copy

4 threads, each copying a 50-character string 1 million times:

| ABI | Time | Speedup |
|---|---|---|
| Old (=0) | 943 ms | baseline |
| New (=1) | 570 ms | 1.65x |

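The atomic refcount operations in COW cause cache coherence traffic that slows all threads. The gap is smaller than in the other benchmarks because a 50-character string does not fit in the SSO buffer, so the new ABI still pays for a heap allocation and a character copy on every iteration.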


When You Actually Need =0

There is exactly one valid reason: binary compatibility. If you must link a library that was compiled with GCC < 5 (or with the old ABI forced) and you do not have its source code, you need _GLIBCXX_USE_CXX11_ABI=0 so that your std::string and std::list layouts match the library's. This is the scenario that has led TensorFlow and some CUDA SDK builds to require it.

If you have the source, recompile. If you are writing new code, use the default.
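The two ABIs cannot be mixed because the new-ABI std::string is a different type with a different mangled name: libstdc++ places it in an inline namespace called __cxx11. The sketch below checks for that namespace in the mangled type name. The names kNewLibstdcxxAbi and string_type_is_in_cxx11_namespace are illustrative, and the check assumes a compiler whose typeid(...).name() returns the mangled name, as GCC and Clang do.

```cpp
#include <cstring>
#include <string>
#include <typeinfo>

// <string> must be included before testing the macro, because libstdc++'s own
// headers define it. On other standard libraries the macro is undefined.
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
constexpr bool kNewLibstdcxxAbi = true;
#else
constexpr bool kNewLibstdcxxAbi = false;
#endif

// True when std::string's mangled type name contains the "__cxx11" inline
// namespace that libstdc++ uses for its new-ABI string.
bool string_type_is_in_cxx11_namespace() {
    return std::strstr(typeid(std::string).name(), "__cxx11") != nullptr;
}
```

A function taking std::string therefore mangles to a different symbol under each ABI, which is why a mismatch typically shows up as undefined-reference errors at link time rather than as silent corruption.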


How to Check

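#include <string>  // any libstdc++ header; the library, not the compiler, defines the macro
#if _GLIBCXX_USE_CXX11_ABI
    // new ABI
#else
    // old ABI (or the macro is undefined because this is not libstdc++)
#endif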

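Or inspect the preprocessor output, adding the same flags your build uses:

echo '#include <string>' | g++ -x c++ -dM -E - | grep _GLIBCXX_USE_CXX11_ABI

The #include is required: the macro is defined by libstdc++'s headers, so preprocessing an empty file prints nothing under either ABI. If the output is #define _GLIBCXX_USE_CXX11_ABI 1, you are on the new ABI, which is the default for most GCC 5.1+ installations. If it shows #define _GLIBCXX_USE_CXX11_ABI 0, either someone or something in your build system is forcing the old ABI, or your toolchain was configured to default to it.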


Summary

The old ABI is a compatibility shim, not a performance option. It makes std::list::size() linear, forces std::string to heap-allocate for short strings, and introduces atomic contention on concurrent string copies. Unless a legacy binary dependency requires it, there is no reason to use it.