Auto-Generating C++ Operators with LibClang
Standard C++ through C++23 has no static reflection. When you want to print an enum value by name or dump a struct’s fields to a log stream, you write operator<< by hand for each type. On a codebase with hundreds of types, this is mechanical work that accumulates maintenance debt whenever fields are renamed or reordered.
This article shows how to automate the task with a small source-to-source tool built on Clang’s tooling libraries: LibTooling and the AST Matchers DSL, the C++ counterparts to the LibClang C API.
The problem in concrete terms
Given:
enum class foo { a, b };
struct bar {
  foo x;
  int y;
};
The desired output is:
std::ostream &operator<<(std::ostream &os, foo v) {
  switch (v) {
  case foo::a: os << "a"; break;
  case foo::b: os << "b"; break;
  }
  return os;
}
std::ostream &operator<<(std::ostream &os, const bar &v) {
  os << "bar(";
  os << "x=" << v.x;
  os << ", y=" << v.y;
  os << ")";
  return os;
}
The tool should parse the header, walk the AST, and emit this text. Adding a new field to bar just requires re-running the generator.
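To make the target concrete, here is a minimal, self-contained sketch that puts the two types and the two operators shown above into one translation unit and streams a value into a std::ostringstream. The helper name render is hypothetical and exists only for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

enum class foo { a, b };

struct bar {
  foo x;
  int y;
};

// The two operators in the form the generator is meant to emit.
std::ostream &operator<<(std::ostream &os, foo v) {
  switch (v) {
  case foo::a: os << "a"; break;
  case foo::b: os << "b"; break;
  }
  return os;
}

std::ostream &operator<<(std::ostream &os, const bar &v) {
  os << "bar(";
  os << "x=" << v.x;  // nested types reuse their own operator<<
  os << ", y=" << v.y;
  os << ")";
  return os;
}

// Hypothetical helper: stream any printable value into a string.
template <typename T>
std::string render(const T &value) {
  std::ostringstream out;
  out << value;
  return out.str();
}
```

With these definitions, render(bar{foo::b, 7}) yields the text bar(x=b, y=7). Note that an enum value outside the listed enumerators falls through the switch and prints nothing.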
Tool structure
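The tool uses LibTooling and the ASTMatchers library, both part of Clang's C++ API (LibClang is a separate C interface that wraps the same compiler libraries). Three components are needed: matchers to identify the interesting AST nodes, a callback to emit code when a match fires, and a main that wires everything together. The snippets below assume that clang/ASTMatchers/ASTMatchers.h, clang/ASTMatchers/ASTMatchFinder.h, clang/Tooling/CommonOptionsParser.h, and clang/Tooling/Tooling.h are included, and that using namespace clang; and using namespace clang::ast_matchers; are in effect.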
Step 1: define AST matchers
auto EnumMatcher =
    enumDecl(isExpansionInMainFile()).bind("enum");
auto RecordMatcher =
    recordDecl(isExpansionInMainFile(), unless(isImplicit())).bind("record");
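isExpansionInMainFile() restricts matching to the file explicitly passed on the command line, so included library headers are ignored. unless(isImplicit()) filters out compiler-synthesized records: in C++ every class contains an implicit declaration of its own name (the injected class name), which would otherwise match a second time, and lambda closure types are implicit as well.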
Step 2: implement the match callback
class Printer : public MatchFinder::MatchCallback {
public:
  void run(const MatchFinder::MatchResult &Result) override {
    // Enums: one case label per enumerator.
    if (const auto *Enum = Result.Nodes.getNodeAs<EnumDecl>("enum")) {
      llvm::outs() << "std::ostream &operator<<(std::ostream &os, "
                   << Enum->getName() << " v) {\n"
                   << "  switch (v) {\n";
      for (const auto *EC : Enum->enumerators()) {
        llvm::outs() << "  case " << EC->getQualifiedNameAsString()
                     << ": os << \"" << EC->getName() << "\"; break;\n";
      }
      llvm::outs() << "  }\n  return os;\n}\n\n";
    }
    // Records: one "name=value" statement per field.
    if (const auto *Record = Result.Nodes.getNodeAs<RecordDecl>("record")) {
      llvm::outs() << "std::ostream &operator<<(std::ostream &os, const "
                   << Record->getName() << " &v) {\n"
                   << "  os << \"" << Record->getName() << "(\";\n";
      const char *Sep = ""; // becomes ", " after the first field
      for (const auto *Field : Record->fields()) {
        llvm::outs() << "  os << \"" << Sep << Field->getName()
                     << "=\" << v." << Field->getName() << ";\n";
        Sep = ", ";
      }
      llvm::outs() << "  os << \")\";\n  return os;\n}\n\n";
    }
  }
};
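The callback receives a MatchResult containing the matched node under the bound name. run() is invoked once per match and each matcher binds a single name, so exactly one of the two branches fires on each call. EnumDecl::enumerators() iterates constants in declaration order; RecordDecl::fields() iterates non-static data members.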
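One way to keep the text generation testable without a Clang installation is to separate it from the AST traversal. The sketch below is an assumption about how that split could look, not part of the tool above: emitEnumPrinter and emitRecordPrinter are hypothetical helpers that take plain strings and return the generated source.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build the operator<< text for an enum.
// `name` is the enum's spelling as seen from the generated file and
// `enumerators` are the unqualified enumerator names in declaration order.
std::string emitEnumPrinter(const std::string &name,
                            const std::vector<std::string> &enumerators) {
  std::string out = "std::ostream &operator<<(std::ostream &os, " + name +
                    " v) {\n  switch (v) {\n";
  for (const std::string &e : enumerators) {
    out += "  case " + name + "::" + e + ": os << \"" + e + "\"; break;\n";
  }
  out += "  }\n  return os;\n}\n";
  return out;
}

// Hypothetical helper: build the operator<< text for a record.
// `fields` are the non-static data member names in declaration order.
std::string emitRecordPrinter(const std::string &name,
                              const std::vector<std::string> &fields) {
  std::string out = "std::ostream &operator<<(std::ostream &os, const " +
                    name + " &v) {\n  os << \"" + name + "(\";\n";
  const char *separator = ""; // becomes ", " after the first field
  for (const std::string &f : fields) {
    out += std::string("  os << \"") + separator + f + "=\" << v." + f + ";\n";
    separator = ", ";
  }
  out += "  os << \")\";\n  return os;\n}\n";
  return out;
}
```

In the match callback, the vectors would be filled from the AST (for example by collecting EC->getName().str() for each enumerator) and the returned string written to llvm::outs(). The string-building functions can then be exercised by ordinary unit tests.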
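This implementation is simplified. Production code should handle namespaces (use getQualifiedNameAsString() on the record), private fields (the generated free function cannot read them unless the class declares it a friend), template specializations, and forward declarations (adding isDefinition() to the matchers skips declarations that have no body).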
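As an illustration of two of these cases, the sketch below shows what the emitted code would need to look like for a namespaced class with private fields. The namespace geo, the class point, and the helper describe are hypothetical. The operator is placed in the type's own namespace so that argument-dependent lookup finds it, and the class grants it friendship. The generator as written cannot add that friend declaration, so here it is assumed to be written by hand in the header.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace geo {

class point {
public:
  point(int x, int y) : x_(x), y_(y) {}

private:
  // Hand-written: lets the generated operator read the private fields.
  friend std::ostream &operator<<(std::ostream &os, const point &v);
  int x_;
  int y_;
};

// Assumed shape of the output from a namespace-aware generator.
std::ostream &operator<<(std::ostream &os, const point &v) {
  os << "geo::point(";
  os << "x_=" << v.x_;
  os << ", y_=" << v.y_;
  os << ")";
  return os;
}

} // namespace geo

// Hypothetical helper used only to show the result.
std::string describe(const geo::point &p) {
  std::ostringstream out;
  out << p; // resolved through argument-dependent lookup in namespace geo
  return out.str();
}
```

An alternative design is to have the generator skip non-public fields entirely, which avoids touching the header at the cost of less complete output.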
Step 3: main function
static llvm::cl::OptionCategory GenOstreamCategory("genostream options");
int main(int argc, const char **argv) {
  auto OptionsParser =
      clang::tooling::CommonOptionsParser::create(argc, argv, GenOstreamCategory);
  if (!OptionsParser) {
    llvm::errs() << toString(OptionsParser.takeError()) << "\n";
    return 1;
  }
  clang::tooling::ClangTool Tool(
      OptionsParser->getCompilations(),
      OptionsParser->getSourcePathList());
  Printer Callback;
  clang::ast_matchers::MatchFinder Finder;
  Finder.addMatcher(EnumMatcher, &Callback);
  Finder.addMatcher(RecordMatcher, &Callback);
  return Tool.run(clang::tooling::newFrontendActionFactory(&Finder).get());
}
CommonOptionsParser parses the standard LibTooling flags, including -p, which names the build directory that contains compile_commands.json.
Building the tool
On Fedora with clang-devel and llvm-devel installed:
g++ -std=c++17 -Wall genostream.cpp -o genostream -lclang-cpp -lLLVM
For a project that needs to be portable across LLVM versions, use CMake and link against clangTooling and clangASTMatchers explicitly.
Running
Generate a compile_commands.json from your CMake build:
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build
Then run the tool:
./genostream -p build/ src/types.h
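The generated code is written to stdout. Redirect it to a .cpp file and add it to your build as a generated source. The tool as shown emits only the operator definitions, so the file also needs #include <ostream> and an include of the header that declares the types, either printed by the tool or prepended by the build script: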
./genostream -p build/ src/types.h > src/types_operators.cpp
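If the tool cannot find stddef.h or similar system headers, point it at the Clang resource directory (clang -print-resource-dir prints the path for an installed compiler). Each --extra-arg value is passed to the compiler as a single argument, so the flag and the path are joined with =:
./genostream -p build/ src/types.h \
  --extra-arg=-resource-dir=/usr/lib64/clang/10.0.1/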
Beyond operator<<
The same pattern applies to any operator or function that has a regular structure determined by the type’s fields or enumerators:
- JSON serialization (to_json/from_json for nlohmann/json)
- Binary serialization (Cereal, Boost.Serialization)
- Equality and comparison operators (operator==, operator<=>)
- Hash specializations (std::hash<T>)
- Protocol buffer or FlatBuffers adapters
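As a sketch of what such output could look like for the bar type from the opening example, the block below shows field-by-field equality and a std::hash specialization. This is an assumed target shape, not output of the tool above, and the hash mixing step (multiply by 31, then add) is an arbitrary choice made for the illustration.

```cpp
#include <cstddef>
#include <functional>

enum class foo { a, b };

struct bar {
  foo x;
  int y;
};

// Field-by-field equality, in declaration order.
bool operator==(const bar &lhs, const bar &rhs) {
  return lhs.x == rhs.x && lhs.y == rhs.y;
}

bool operator!=(const bar &lhs, const bar &rhs) { return !(lhs == rhs); }

namespace std {
// Hash specialization that folds the per-field hashes together.
template <>
struct hash<bar> {
  size_t operator()(const bar &v) const noexcept {
    size_t h = 0;
    h = h * 31 + hash<int>{}(static_cast<int>(v.x)); // enum hashed via its integer value
    h = h * 31 + hash<int>{}(v.y);
    return h;
  }
};
} // namespace std
```

The generator loop is the same as for operator<<: iterate the fields once and emit one term per field, so equal values always produce equal hashes.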
A Python equivalent using the libclang Python bindings is simpler to write, but the bindings expose libclang's cursor API rather than the AST matchers, so any filtering has to be written as manual tree traversal. For any non-trivial matching logic, the C++ API is preferable.
References
- LLVM LibTooling documentation: https://clang.llvm.org/docs/LibTooling.html
- AST Matchers reference: https://clang.llvm.org/docs/LibASTMatchersReference.html
- Eli Bendersky, “Parsing C++ in Python with Clang”: https://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang