String Split Performance: string_view Is Not Optional
String splitting is one of the most common operations in text processing. C++ offers multiple ways to implement it, and the choice of return type — std::string vs std::string_view — matters more than the specific algorithm used. This benchmark compares 11 implementations using Google Benchmark.
The Implementations
Three design axes were tested:
- Return type: `std::vector<std::string>` (allocating) vs `std::vector<std::string_view>` (zero-copy)
- Iteration style: index-based (the `std::string::find_first_of` member), iterator-based (the `std::find_first_of` algorithm on iterators), and pointer-based (`std::find_first_of` on raw `const char*` pointers)
- Library: hand-written vs Abseil `StrSplit`
Hand-written implementations
Index-based (string return):
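```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `str` on any character in `delims`, copying each non-empty token
// into its own std::string.
std::vector<std::string> split(const std::string& str,
                               const std::string& delims = " ") {
    std::vector<std::string> output;
    std::size_t first = 0;
    while (first < str.size()) {
        const auto second = str.find_first_of(delims, first);
        if (first != second)  // skip empty tokens between adjacent delimiters
            // substr constructs a new std::string holding a copy of the token
            output.emplace_back(str.substr(first, second - first));
        if (second == std::string::npos) break;
        first = second + 1;
    }
    return output;
}
```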
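For contrast, here is a sketch of the same index-based loop returning `std::string_view`. The benchmark's `splitSV` is not listed, so this is an assumption about what such a variant looks like, and the name `splitSVIndex` is hypothetical. Only the parameter and element types change: `substr` on a view returns another view, so no token is copied.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical name: the index-based loop from split(), but every token is a
// view into `str` rather than a newly constructed std::string.
std::vector<std::string_view> splitSVIndex(std::string_view str,
                                           std::string_view delims = " ") {
    std::vector<std::string_view> output;
    std::size_t first = 0;
    while (first < str.size()) {
        const auto second = str.find_first_of(delims, first);
        if (first != second)
            // string_view::substr returns a view; no characters are copied.
            output.emplace_back(str.substr(first, second - first));
        if (second == std::string_view::npos) break;
        first = second + 1;
    }
    return output;
}
```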
Pointer-based (string_view return):
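```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// Splits `str` on any character in `delims`; each token is a view into `str`.
std::vector<std::string_view> splitSVPtr(std::string_view str,
                                         std::string_view delims = " ") {
    std::vector<std::string_view> output;
    for (auto first = str.data(), last = first + str.size(); first != last;) {
        const auto second = std::find_first_of(first, last,
                                               std::cbegin(delims), std::cend(delims));
        if (first != second)  // skip empty tokens
            output.emplace_back(first, static_cast<std::size_t>(second - first));
        if (second == last) break;  // stop before forming the invalid pointer last + 1
        first = second + 1;
    }
    return output;
}
```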
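The iterator-based style (`splitStd`/`splitSVStd` in the results) is not listed. A minimal sketch, assuming it applies `std::find_first_of` to `string_view` iterators and converts iterator differences back into offsets. `splitSVIter` is a hypothetical name, not the benchmarked code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string_view>
#include <vector>

// Hypothetical sketch of the iterator-based style: std::find_first_of on
// string_view iterators, then offsets/lengths to build each view.
std::vector<std::string_view> splitSVIter(std::string_view str,
                                          std::string_view delims = " ") {
    std::vector<std::string_view> output;
    auto first = str.cbegin();
    const auto last = str.cend();
    while (first != last) {
        const auto second = std::find_first_of(first, last,
                                               delims.cbegin(), delims.cend());
        if (first != second) {
            // Iterator differences give the token's offset and length.
            output.push_back(str.substr(static_cast<std::size_t>(first - str.cbegin()),
                                        static_cast<std::size_t>(second - first)));
        }
        if (second == last) break;
        first = std::next(second);
    }
    return output;
}
```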
Single-char delimiter (string_view)
The fastest implementation restricts to a single delimiter character:
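```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits on a single delimiter character and keeps empty fields.
std::vector<std::string_view> SplitByChar(std::string_view s, char delim = ' ') {
    std::vector<std::string_view> out;
    out.reserve(s.size() / 4 + 2);  // heuristic guess at the field count
    std::size_t start = 0;
    // i == s.size() acts as a virtual delimiter that flushes the last field.
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (i == s.size() || s[i] == delim) {
            out.emplace_back(s.data() + start, i - start);
            start = i + 1;
        }
    }
    return out;
}
```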
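This version always emits empty fields: splitting "a,,b" on ',' yields "a", "", "b", unlike strtok, which skips empty fields. That makes it suitable for simple delimited formats where an empty field is meaningful (CSV without quoted fields, for example). The multi-char implementations above skip empty tokens instead, so the two families return different results whenever the input contains adjacent delimiters.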
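To see the behavioral difference side by side, the sketch below adds a `keep_empty` flag (a hypothetical parameter, not part of the benchmarked code) to the same single-char loop. `true` reproduces `SplitByChar`; `false` matches the skip-empty behavior of the multi-char splitters.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant of SplitByChar with an explicit empty-field policy.
// keep_empty == true matches SplitByChar; false drops zero-length fields.
std::vector<std::string_view> SplitByCharPolicy(std::string_view s, char delim,
                                                bool keep_empty) {
    std::vector<std::string_view> out;
    out.reserve(s.size() / 4 + 2);
    std::size_t start = 0;
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (i == s.size() || s[i] == delim) {
            if (keep_empty || i > start)  // i == start means an empty field
                out.emplace_back(s.data() + start, i - start);
            start = i + 1;
        }
    }
    return out;
}
```

For "a,,b," the keep-empty policy returns four fields ("a", "", "b", ""), while the skip policy returns two.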
Abseil
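```cpp
#include "absl/strings/str_split.h"

// StrSplit returns a lazy splitter; the target container picks the element type.
// By default empty fields are kept; pass absl::SkipEmpty() to drop them.
// Copies each token into a std::string
std::vector<std::string> v1 = absl::StrSplit(str, absl::ByAnyChar(delims));
// Views into the original buffer
std::vector<absl::string_view> v2 = absl::StrSplit(strv, absl::ByAnyChar(delims));
// Single char (fastest Abseil variant)
std::vector<absl::string_view> v3 = absl::StrSplit(strv, delim);
```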
Results
Input: a ~400-character Lorem Ipsum paragraph. Lower is better.
| Implementation | Return Type | Time (ns) | vs Baseline |
|---|---|---|---|
| `split` | `vector<string>` | 3,042 | baseline |
| `splitCYB` | `vector<string>` | 3,486 | 1.15x slower |
| `splitAbseil` | `vector<string>` | 2,890 | 1.05x faster |
| `splitStd` | `vector<string>` | 1,473 | 2.1x faster |
| `splitPtr` | `vector<string>` | 1,406 | 2.2x faster |
| `splitAbseilSV` | `vector<string_view>` | 2,487 | 1.2x faster |
| `splitAbseilChar` | `vector<string_view>` | 807 | 3.8x faster |
| `splitSV` | `vector<string_view>` | 585 | 5.2x faster |
| `splitSVStd` | `vector<string_view>` | 540 | 5.6x faster |
| `splitSVPtr` | `vector<string_view>` | 538 | 5.7x faster |
| `SplitByChar` | `vector<string_view>` | 390 | 7.8x faster |
Analysis
The return type dominates
Every hand-written string_view implementation outperforms every string-returning one, regardless of iteration strategy: the fastest string-returning variant (splitPtr, 1406 ns) is still 2.4x slower than the slowest hand-written string_view variant (splitSV, 585 ns). The exception is Abseil's multi-char string_view mode (splitAbseilSV, 2487 ns), which is slower than both splitStd and splitPtr. It is discussed separately below.
The real comparison is within each group:
- String-returning: pointer-based (`splitPtr`, 1406 ns) beats index-based (`split`, 3042 ns) by 2.2x.
- string_view-returning: single-char (`SplitByChar`, 390 ns) beats multi-char pointer-based (`splitSVPtr`, 538 ns) by 1.4x.
Why string_view wins
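In the string-returning versions, every `output.emplace_back(str.substr(...))` constructs a new `std::string` and copies the token's characters into it. Tokens longer than the small-string buffer (15 characters in libstdc++ and MSVC, 22 in libc++) also need one heap allocation each. Most Lorem Ipsum words fit inline, so on this input the cost is likely dominated by constructing and copying one `std::string` per token rather than by `malloc` alone. The string_view versions copy nothing: each token is a pointer and a length into the original string, and the only allocations are for the vector's own buffer (one if it is reserved up front, a few more through geometric growth otherwise).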
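One way to check the allocation claim on a given standard library is to count calls to the global `operator new`. The sketch below replaces `operator new`/`operator delete`, which the standard permits. It then splits the same text into a reserved `vector<std::string>` and a reserved `vector<std::string_view>`. `CountAllocations` and `g_allocations` are hypothetical names, and the exact counts depend on the library's small-string buffer size, so treat them as illustrative.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <string_view>
#include <vector>

// Incremented by every call to the replaced global operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t n) {
    ++g_allocations;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Splits `text` on spaces into a reserved std::vector<Token> and returns how
// many global allocations that took (vector buffer plus any token storage).
template <class Token>
std::size_t CountAllocations(std::string_view text) {
    const std::size_t before = g_allocations;
    {
        std::vector<Token> out;
        out.reserve(text.size() / 2 + 1);  // one allocation for the buffer
        std::size_t start = 0;
        for (std::size_t i = 0; i <= text.size(); ++i) {
            if (i == text.size() || text[i] == ' ') {
                // For std::string this copies the token; for string_view it
                // only stores a pointer and a length.
                out.emplace_back(text.substr(start, i - start));
                start = i + 1;
            }
        }
    }
    return g_allocations - before;
}
```

With short words the `std::string` count typically equals the `string_view` count (just the vector buffer), because every token fits in the small-string buffer. Words longer than that buffer add one allocation each.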
The single-char advantage
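`SplitByChar` is simpler, and it is also faster: comparing each character against one value (`s[i] == delim`) is cheaper than searching for any of a set of delimiters (`find_first_of`). Its `reserve` call likely contributes too, since the other implementations shown grow their vectors on demand, so the 1.4x gap over `splitSVPtr` is not purely due to the comparison. When the delimiter is always a single character, take advantage of it.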
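A possible alternative for the single-char case is to let `std::string_view::find(char)` locate each delimiter. Standard libraries commonly implement it with a `memchr`-style scan. `SplitByCharFind` is a hypothetical sketch, not one of the benchmarked implementations, and its speed relative to `SplitByChar` was not measured here. It keeps empty fields, so its output matches `SplitByChar`.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant: find(char) jumps straight to the next delimiter
// instead of testing every character in a hand-written loop.
// Keeps empty fields, like SplitByChar.
std::vector<std::string_view> SplitByCharFind(std::string_view s, char delim = ' ') {
    std::vector<std::string_view> out;
    out.reserve(s.size() / 4 + 2);
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string_view::npos) {
            out.push_back(s.substr(start));  // last field, possibly empty
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return out;
}
```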
Abseil multi-char is surprisingly slow
splitAbseilSV (2487 ns) is 4.6x slower than splitSVPtr (538 ns), despite both returning string_view. The likely cause is extra per-token overhead in Abseil's ByAnyChar path (for example, delimiter-set handling and iterator indirection inside the lazy splitter) compared with a straightforward hand-written loop.
The single-char Abseil variant (splitAbseilChar, 807 ns) is much better but still 2x slower than the hand-written SplitByChar. The abstraction cost of Abseil’s split machinery is non-trivial.
Recommendations
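- Return `string_view` unless you need to modify or own the tokens. Comparing each string-returning variant with its string_view counterpart (`split`/`splitSV`, `splitStd`/`splitSVStd`, `splitPtr`/`splitSVPtr`), switching the return type gave a 2.6x to 5.2x speedup.
- Use a single-char implementation when the delimiter is a single character. It is the fastest option and the simplest code.
- If you need multi-char delimiters, the pointer-based `std::find_first_of` approach with `string_view` is the fastest, essentially tied with the iterator-based `splitSVStd` (538 ns vs 540 ns).
- Abseil's `StrSplit` is fine for general-purpose code where readability matters more than the last 2x. Use the single-char overload when possible.
- Reserve the output vector. Pre-allocating `s.size() / 4 + 2` slots (as `SplitByChar` does) avoids reallocation whenever fields average at least four characters including the delimiter. Denser input still grows the vector.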
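Returning views shifts one responsibility to the caller: the source string must outlive the tokens. When tokens must be kept longer than the source, or modified, a small conversion step restores ownership. `ToOwned` is a hypothetical helper, shown as a sketch.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Pitfall (do not do this): the temporary std::string is destroyed at the end
// of the full expression, so the returned views would dangle.
//   auto fields = SplitByChar(std::string("a,b"), ',');

// Hypothetical helper: copy views into owned strings when the tokens must
// outlive the buffer they point into, or must be modified.
std::vector<std::string> ToOwned(const std::vector<std::string_view>& views) {
    std::vector<std::string> owned;
    owned.reserve(views.size());
    for (std::string_view v : views)
        owned.emplace_back(v);  // explicit std::string(string_view) constructor
    return owned;
}
```

This pays the per-token copy only when ownership is actually needed, instead of on every split.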