yuqi-zheng

Auto-Generating C++ Operators with LibClang


C++ lacks static reflection. When you want to print an enum value by name or dump a struct’s fields to a log stream, you write operator<< by hand for each type. On a codebase with hundreds of types, this is mechanical work that accumulates maintenance debt whenever fields are renamed or reordered.

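This article shows how to automate the task with a small source-to-source tool built on Clang's C++ tooling libraries: LibTooling and the AST Matcher DSL. LibClang proper, the stable C API to the Clang compiler, exposes the same AST through cursors but does not include the matchers.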


The problem in concrete terms

Given:

enum class foo { a, b };
struct bar {
  foo x;
  int y;
};

The desired output is:

std::ostream &operator<<(std::ostream &os, foo v) {
  switch (v) {
    case foo::a: os << "a"; break;
    case foo::b: os << "b"; break;
  }
  return os;
}

std::ostream &operator<<(std::ostream &os, const bar &v) {
  os << "bar(";
  os << "x=" << v.x;
  os << ", y=" << v.y;
  os << ")";
  return os;
}

The tool should parse the header, walk the AST, and emit this text. Adding a new field to bar just requires re-running the generator.


Tool structure

The tool uses LibTooling (Clang's C++ interface for standalone tools, separate from the C-only LibClang API) and the ASTMatchers library. Three components are needed: matchers to identify the interesting AST nodes, a callback to emit code when a match fires, and a main that wires everything together.

Step 1: define AST matchers

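#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang;
using namespace clang::ast_matchers;

auto EnumMatcher =
    enumDecl(isExpansionInMainFile()).bind("enum");

auto RecordMatcher =
    recordDecl(isExpansionInMainFile(), unless(isImplicit())).bind("record");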

isExpansionInMainFile() restricts matching to the file explicitly passed on the command line, so included library headers are ignored. unless(isImplicit()) filters out compiler-synthesized records, such as the injected-class-name declaration that Clang adds inside every C++ class and lambda closure types.

Step 2: implement the match callback

class Printer : public MatchFinder::MatchCallback {
public:
  // Called once per match; only the name bound by that matcher is set.
  void run(const MatchFinder::MatchResult &Result) override {

    if (const auto *Enum = Result.Nodes.getNodeAs<EnumDecl>("enum")) {
      llvm::outs() << "std::ostream &operator<<(std::ostream &os, "
                   << Enum->getName() << " v) {\n"
                   << "  switch (v) {\n";
      // One case label per enumerator, in declaration order.
      for (const auto *EC : Enum->enumerators()) {
        llvm::outs() << "    case " << EC->getQualifiedNameAsString()
                     << ": os << \"" << EC->getName() << "\"; break;\n";
      }
      llvm::outs() << "  }\n  return os;\n}\n\n";
    }

    if (const auto *Record = Result.Nodes.getNodeAs<RecordDecl>("record")) {
      llvm::outs() << "std::ostream &operator<<(std::ostream &os, const "
                   << Record->getName() << " &v) {\n"
                   << "  os << \"" << Record->getName() << "(\";\n";
      bool first = true;
      for (const auto *Field : Record->fields()) {
        // Fold the ", " separator into the label of every field but the
        // first, so the emitted text matches the desired output above.
        llvm::outs() << "  os << \"" << (first ? "" : ", ")
                     << Field->getName() << "=\" << v." << Field->getName()
                     << ";\n";
        first = false;
      }
      llvm::outs() << "  os << \")\";\n  return os;\n}\n\n";
    }
  }
};

The callback receives a MatchResult containing the matched node under the bound name. EnumDecl::enumerators() iterates constants in declaration order; RecordDecl::fields() iterates non-static data members.

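This implementation is simplified. Production code should handle namespaces (use getQualifiedNameAsString() on the record and the enum, not only on the enumerators), private fields (a free operator<< cannot read them without a friend declaration), template specializations, and forward declarations (adding isDefinition() to both matchers skips them, so each type is emitted once).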

Step 3: main function

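#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"

// Groups the tool's own flags under one heading in the --help output.
static llvm::cl::OptionCategory GenOstreamCategory("genostream options");

int main(int argc, const char **argv) {
  // create() returns llvm::Expected; a parse failure carries an llvm::Error.
  auto OptionsParser =
      clang::tooling::CommonOptionsParser::create(argc, argv, GenOstreamCategory);
  if (!OptionsParser) {
    llvm::errs() << llvm::toString(OptionsParser.takeError()) << "\n";
    return 1;
  }

  clang::tooling::ClangTool Tool(
      OptionsParser->getCompilations(),
      OptionsParser->getSourcePathList());

  // Register both matchers against the same callback.
  Printer Callback;
  clang::ast_matchers::MatchFinder Finder;
  Finder.addMatcher(EnumMatcher, &Callback);
  Finder.addMatcher(RecordMatcher, &Callback);

  // Tool.run() returns 0 on success, so it doubles as the exit code.
  return Tool.run(clang::tooling::newFrontendActionFactory(&Finder).get());
}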

CommonOptionsParser parses the standard LibTooling flags, including -p to specify a compilation database.


Building the tool

On Fedora with clang-devel and llvm-devel installed:

g++ -std=c++17 -Wall genostream.cpp -o genostream -lclang-cpp -lLLVM

For a project that needs to be portable across LLVM versions, use CMake and link against clangTooling and clangASTMatchers explicitly.


Running

Generate a compile_commands.json from your CMake build:

cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -B build

Then run the tool:

./genostream -p build/ src/types.h

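The generated code is written to stdout. As written, the generator emits only the operator definitions, so the output still needs #include <ostream> and #include "types.h" ahead of them; printing those two lines from main before calling Tool.run() is one simple way to add them. Redirect the output to a .cpp file and add it to your build as a generated source: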

./genostream -p build/ src/types.h > src/types_operators.cpp

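If the tool cannot find stddef.h or similar system headers, point it at the Clang resource directory (clang -print-resource-dir prints the path for an installed Clang). Pass the option and its value as a single argument joined with =, because each --extra-arg is forwarded to the compiler as one argument:

./genostream -p build/ src/types.h \
  --extra-arg=-resource-dir=/usr/lib64/clang/10.0.1/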

Beyond operator<<

The same pattern applies to any operator or function that has a regular structure determined by the type’s fields or enumerators:

  • JSON serialization (to_json / from_json for nlohmann/json)
  • Binary serialization (Cereal, Boost.Serialization)
  • Equality and comparison operators (operator==, operator<=>)
  • Hash specializations (std::hash<T>)
  • Protocol buffer or FlatBuffers adapters

A Python equivalent using the libclang Python bindings is simpler to write, but those bindings wrap the cursor-based C API and do not expose the AST matchers, so all filtering has to be written by hand as cursor traversal. For any non-trivial matching logic, the C++ API is preferable.


References