<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->
# Architecture updater - Auto-Sync
`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.
Please refer to [intro.md](intro.md) for an introduction to this tool.
## Install
#### Set up Python environment and Tree-sitter
```
cd <root-dir-Capstone>
# Python version must be at least 3.11
sudo apt install python3-venv
# Setup virtual environment in Capstone root dir
python3 -m venv ./.venv
source ./.venv/bin/activate
```
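The version requirement from the commands above can be checked before the virtual environment is created. The following is a minimal sketch; the name `python_is_supported` is hypothetical and not part of Auto-Sync.

```python
import sys

# Minimum version taken from the comment in the setup commands above.
MINIMUM_PYTHON = (3, 11)


def python_is_supported(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) is new enough.

    Hypothetical helper for illustration; not part of Auto-Sync.
    """
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)
```

Calling `python_is_supported()` without arguments checks the interpreter that runs the script, so run it with the same `python3` you intend to use for the virtual environment.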
#### Install Auto-Sync framework
```
cd suite/auto-sync/
pip install -e .
```
#### Clone Capstone's LLVM fork and build `llvm-tblgen`
```bash
git clone https://github.com/capstone-engine/llvm-capstone vendor/llvm_root/
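# The repository was cloned into vendor/llvm_root/, so change into that directory
cd vendor/llvm_root/
git checkout auto-sync
mkdir build
cd build
# You can also build the "Release" version
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug
# Return to suite/auto-sync/
cd ../../../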
```
#### Install `llvm-mc` and `FileCheck`
Additionally, we need `llvm-mc` and `FileCheck` to generate our regression tests.
You can build them yourself, but the build takes a lot of space on your hard drive.
You can also get the binaries [here](https://releases.llvm.org/download.html) or
install them with your package manager (usually a package named something like `llvm-18-dev`).
Just ensure they are in your `PATH` as `llvm-mc` and `FileCheck` (not as `llvm-mc-18` or similar!).
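A quick way to confirm that both tools are reachable under exactly these names is to look them up on `PATH`. This is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of Auto-Sync.

```python
import shutil

# Tool names exactly as they must appear on PATH (no version suffix).
REQUIRED_TOOLS = ("llvm-mc", "FileCheck")


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the entries of `tools` that cannot be found on `path`.

    `path` defaults to the PATH environment variable.
    Hypothetical helper for illustration; not part of Auto-Sync.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

An empty list from `missing_tools()` means both binaries were found. If only a versioned binary such as `llvm-mc-18` is installed, `llvm-mc` is still reported as missing, which is the situation the note above warns about.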
## Architecture
Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.
This step is essential! Please don't skip it.
## Update an architecture
Updating an architecture module to the newest LLVM release is only possible if it uses Auto-Sync.
Not all arch-modules support Auto-Sync yet.
Check whether your architecture is supported:
```
./src/autosync/ASUpdater.py -h
```
Run the updater:
```
./src/autosync/ASUpdater.py -a <ARCH>
```
## Update procedure
1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
and update them if necessary.
3. Try to build Capstone and fix the build errors.
## Post-processing steps
This update translates some LLVM C++ files to C.
Because the translation is not perfect (maybe it will some day)
you will get build errors if you try to compile Capstone.
The last step to finish the update is to fix those build errors by hand.
## Additional details
### Overview of updated files
This is a rough overview of which files of an architecture are updated and where they come from.
**Files originating from LLVM** (Automatically updated)
These files are LLVM source files which were translated from C++ to C.
Not every architecture uses all of the files listed below,
but these are the most common.
- `<ARCH>Disassembler.*`: Bytes to `MCInst` decoder.
- `<ARCH>InstPrinter.*` or `<ARCH>AsmPrinter.*`: `MCInst` to asm string decoder.
- `<ARCH>BaseInfo.*`: Commonly used functions and definitions.
`*.inc` files are exclusively generated by LLVM TableGen backends.
`*.inc` files for the LLVM component are named like this:
- `<ARCH>Gen*.inc` (note: no `CS` in the name)
Additionally, we generate more details for Capstone with `llvm-tblgen`,
such as enums, operand details and other things.
They are also saved to `*.inc` files, but have `CS` in the name to distinguish them from the LLVM generated files.
- `<ARCH>GenCS*.inc`
**Capstone module files** (Not automatically updated)
Those files are written by us:
- `<ARCH>DisassemblerExtension.*`: All kinds of functions which are needed by the LLVM component, but could not be generated or translated.
- `<ARCH>Mapping.*`: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
- `<ARCH>Module.*`: Interface to the Capstone core.
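The naming scheme above is regular enough to sort files by name alone. The sketch below only illustrates that scheme; `classify_arch_file` and its category labels are hypothetical and not part of Auto-Sync.

```python
# Stems listed in the overview above.
LLVM_TRANSLATED = ("Disassembler", "InstPrinter", "AsmPrinter", "BaseInfo")
CAPSTONE_MODULE = ("DisassemblerExtension", "Mapping", "Module")


def classify_arch_file(filename, arch):
    """Classify `filename` of architecture `arch` by the naming scheme above.

    Hypothetical helper for illustration; the labels are made up.
    """
    if not filename.startswith(arch):
        return "unknown"
    rest = filename[len(arch):]
    if rest.endswith(".inc"):
        # Check GenCS before Gen: every GenCS name also starts with Gen.
        if rest.startswith("GenCS"):
            return "capstone-inc"  # generated for Capstone
        if rest.startswith("Gen"):
            return "llvm-inc"  # generated for the LLVM component
        return "unknown"
    stem = rest.rsplit(".", 1)[0]  # drop the file extension, if any
    if stem in CAPSTONE_MODULE:
        return "capstone-module"  # written by hand, not auto-updated
    if stem in LLVM_TRANSLATED:
        return "llvm-translated"  # translated from C++ to C
    return "unknown"
```

For example, with `arch="ARM"` a name like `ARMGenCSInsnEnum.inc` (an illustrative file name) is classified as `capstone-inc`, while `ARMMapping.c` is classified as `capstone-module`.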
### Relevant documentation and troubleshooting
**LLVM file translation**
For details about the C++ to C translation of the LLVM files refer to `CppTranslator/README.md`.
**Generated .inc files**
Documentation about the `.inc` file generation is in the [llvm-capstone](https://github.com/capstone-engine/llvm-capstone) repository.
**Troubleshooting**
- If some features aren't generated and are missing in the `.inc` files, make sure they are defined as `AssemblerPredicate` in the `.td` files.
Correct:
```
def In32BitMode : Predicate<"!Subtarget->isPPC64()">,
AssemblerPredicate<(all_of (not Feature64Bit)), "64bit">;
```
Incorrect:
```
def In32BitMode : Predicate<"!Subtarget->isPPC64()">;
```
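To find such definitions in a large `.td` file, a rough text search can help. The following is a heuristic sketch, not a TableGen parser: it assumes each `def` ends at the first `;` and that no `;` appears inside strings or comments. The function name is hypothetical and not part of Auto-Sync.

```python
import re

# Matches "def <Name> : <body>;" where the body may span several lines.
DEF_RE = re.compile(r"\bdef\s+(\w+)\s*:(.*?);", re.DOTALL)


def defs_missing_assembler_predicate(td_text):
    """Return names of defs that use Predicate<...> without AssemblerPredicate<...>.

    Heuristic only: assumes a def ends at the first ';' after its name.
    """
    missing = []
    for name, body in DEF_RE.findall(td_text):
        # "AssemblerPredicate<" also contains "Predicate<", so test both.
        if "Predicate<" in body and "AssemblerPredicate<" not in body:
            missing.append(name)
    return missing
```

Running it on the text of the incorrect definition above returns `["In32BitMode"]`; for the correct definition it returns an empty list.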
**Formatting**
- If you make changes to the `CppTranslator`, please format the files with `black` and `usort`:
```
pip3 install black usort
python3 -m usort format src/autosync
python3 -m black src/autosync
```
## Refactor an architecture for Auto-Sync framework
Not all architecture modules support Auto-Sync yet.
Here is an overview of the steps to add support for it.
<hr>
To refactor one of them to use `auto-sync`, you need to add it to the configuration.
1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)
Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:
```
./src/autosync/ASUpdater.py -a <ARCH> -s IncGen Translate
```
The remaining tasks are to:
- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below).
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.
**Notes:**
- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):
```
// generate content <FILENAME.inc> begin
// generate content <FILENAME.inc> end
```
The update script will insert the content of the `.inc` file at this place.
- If you find yourself fixing the same syntax error multiple times,
please consider adding a `Patch` to the `CppTranslator` for this case.
- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.
- Running the `Differ` after everything is done preserves your version of the syntax corrections, so the next user can auto-apply them.
- Sometimes the LLVM code uses a single function from a larger source file.
It is not worth it to translate the whole file just for this function.
Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
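To make the `generate content` markers from the notes above more concrete, here is a minimal sketch of how content could be placed between them. It is an illustration under stated assumptions, not the update script's actual code: `insert_generated_content` is a hypothetical name, and it assumes each marker sits on its own line.

```python
def insert_generated_content(header_text, filename, generated):
    """Replace everything between the begin/end markers for `filename`.

    Hypothetical illustration; not the real Auto-Sync implementation.
    """
    begin = f"// generate content <{filename}> begin"
    end = f"// generate content <{filename}> end"
    lines = header_text.splitlines()
    stripped = [line.strip() for line in lines]
    if begin not in stripped or end not in stripped:
        raise ValueError(f"markers for {filename} not found")
    begin_idx = stripped.index(begin)
    end_idx = stripped.index(end)
    if end_idx < begin_idx:
        raise ValueError("end marker appears before begin marker")
    # Keep both marker lines and swap out whatever was between them.
    new_lines = lines[: begin_idx + 1] + generated.splitlines() + lines[end_idx:]
    return "\n".join(new_lines) + "\n"
```

Because the markers themselves are kept, the operation can be repeated on every update: old generated content between the markers is replaced by the new content. This is also why the `<>` brackets must stay in the comment, since they are part of the text that is searched for.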
## Adding a new architecture
Adding a new architecture follows the same steps as above, with the exception that you need
to implement all the Capstone files from scratch.
Check out an architecture that already supports `auto-sync` for guidance and open an issue if you need help.