ALPACA (Automated Library and Program API Change Analyzer) is a tool built on the Clang compiler frontend that detects and analyzes changes between versions of C/C++ source code. It combines AST-based semantic information from Clang with syntactic similarity analysis to identify and classify code changes and refactorings.
- Detection of changes to various entities (functions, variables, records) between two versions of a codebase
- Support for complex C/C++ codebases (>1M LOC)
- High accuracy (92% change detection rate, 91% correct classification) (TODO: decide whether these figures can be stated before the paper is published)
- Can handle template specializations and overloaded functions
- Built on Clang LibTooling for robust C/C++ parsing
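To make the feature list more concrete, here is a small, purely illustrative pair of API versions containing the kinds of entities named above: a record, overloaded functions, and a template specialization. The namespaces `api_v1` and `api_v2` and everything inside them are hypothetical and not taken from ALPACA or its evaluation. Which of these differences ALPACA reports, and under which category, is not specified here.

```cpp
#include <string>

// Hypothetical "old" version of a small library API.
namespace api_v1 {
struct Config { int verbosity = 0; };                  // record

inline int scale(int x) { return 2 * x; }              // overload for int
inline double scale(double x) { return 2.0 * x; }      // overload for double

template <typename T>
std::string type_name() { return "unknown"; }          // primary template
template <>
inline std::string type_name<int>() { return "int"; }  // explicit specialization
}  // namespace api_v1

// Hypothetical "new" version of the same API.
namespace api_v2 {
struct Config { int verbosity = 0; bool color = false; };  // field added

// One overload gained a defaulted parameter; the other is unchanged.
inline int scale(int x, int factor = 2) { return factor * x; }
inline double scale(double x) { return 2.0 * x; }

template <typename T>
std::string type_name() { return "unknown"; }
template <>
inline std::string type_name<int>() { return "int32"; }    // specialization changed
}  // namespace api_v2
```

In a real comparison the two versions would live in separate directories passed via `--oldDir` and `--newDir`; the namespaces here only keep the sketch in a single file.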
- CMake 3.0 or higher
- LLVM/Clang 12
- C++17 compatible compiler
# Clone the repository
git clone https://github.com/tudasc/Alpaca
cd Alpaca
# Configure with CMake
cmake -B build
# Build
cmake --build build
# Verify the build by printing the available options
./build/APIAnalysis --help
We provide a Dockerfile that installs all required build dependencies and sets up an example from the evaluation, an analysis of the OpenMPI codebase:
# Build the Docker image
docker build -t alpaca .
# Run ALPACA in Docker
docker run -v "$(pwd)":/work -w /work alpaca APIAnalysis
ALPACA requires two versions of the codebase to perform the analysis:
./APIAnalysis --oldDir=/path/to/old/version \
--newDir=/path/to/new/version \
[options]
--oldDir, --newDir
: Paths to the old and new versions of the project (required)

--oldCD, --newCD
: Paths to compilation databases

--extra-args
: Additional Clang arguments for both versions

--extra-args-old, --extra-args-new
: Version-specific Clang arguments

--exclude
: Directories/files to ignore (comma-separated)

--doc
: Enable the second analysis step using Levenshtein matching

--json
: Output results in JSON format

--ipf
: Include private functions in analysis
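The `--doc` option mentions Levenshtein matching. As a rough illustration of the general idea, the sketch below computes the classic Levenshtein edit distance and uses it to pick the most similar candidate name. This is not ALPACA's implementation: the helpers `similarity` and `best_match`, the normalization by the longer string, and the threshold parameter are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming edit distance: insertions, deletions and
// substitutions each cost 1. Uses two rows instead of a full matrix.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: similarity in [0, 1], where 1 means identical strings.
double similarity(const std::string& a, const std::string& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(levenshtein(a, b)) /
                     static_cast<double>(longest);
}

// Hypothetical helper: index of the most similar candidate whose similarity
// is at least `threshold`, or -1 if no candidate qualifies.
int best_match(const std::string& name,
               const std::vector<std::string>& candidates,
               double threshold) {
    int best = -1;
    double best_score = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double s = similarity(name, candidates[i]);
        if (s >= threshold && s > best_score) {
            best_score = s;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

For example, `best_match("parse_config", {"write_log", "parse_configuration", "init"}, 0.5)` selects the second candidate, since the two names differ only by the appended suffix.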
For best results, provide compilation databases for both versions:
- Generate compilation databases (for example using Bear):
cd old_version
bear -- make
cd ../new_version
bear -- make
- Run analysis with compilation databases:
./APIAnalysis --oldDir=old_version \
--newDir=new_version \
--oldCD=old_version/compile_commands.json \
--newCD=new_version/compile_commands.json
- Parallel Processing: Implementation of parallel extraction and analysis using OpenMPI to significantly reduce runtime for large codebases
- Robust Handling of Overloaded Functions: Reworking of C++ overloaded function handling for improved accuracy and coverage
- Extended Change Detection: Support for additional types of code changes and refactorings
- Reference Graph Matcher: Introduction of a new matching algorithm using Reference Graphs to improve entity matching between project versions
ALPACA is licensed under the BSD 3-Clause License. See LICENSE for details.