Slang
Slang is a source-to-source compiler (also known as a transpiler) that translates an idiomatic subset of Pharo code into C code. Not all Pharo code is valid Slang code, and some parts of it have special C semantics. This makes Slang feel more like a DSL (domain-specific language) than a general-purpose language. This document gives an overview of Slang's architecture.
A simple showcase of Slang features is available at https://github.com/guillep/Slang-example
Slang is a multipass transpiler. It takes as input a Slang program (made of many classes), creates intermediate data structures to manipulate it, runs several transformation passes over them, and finally writes the output C code to one or more files (headers and .c files).
The input Slang program is made of classes and methods. Classes must inherit from a common Slang superclass (SlangClass, SlangStruct...) to be translatable. Methods have no restrictions, but before they can be manipulated they are first transformed into their AST form.
An AST (short for Abstract Syntax Tree) is a tree representation of the structure of some code. It is called abstract because it does not store concrete syntactic details (such as parentheses, brackets, or whitespace). The next example shows the source code of a method and its AST representation.
methodName
    | var |
    var := 1.
    ^ [ var < 10 ] value

method
  - name: methodName
  - temps: [ var ]
  - statements: [
      assignment
        - variable: variable
            - name: var
        - expression: constant
            - value: 1
      return
        - expression: message
            - selector: value
            - receiver: block
                - statements: [
                    message
                      - selector: <
                      - receiver: variable
                          - name: var
                      - arguments: [ constant - value: 10 ]
                  ]
    ]
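As a rough illustration of the tree shape above, the `var < 10` message node could be modelled in C as follows. The node kinds, field names, and helper functions are hypothetical; they do not mirror the Pharo AST or TAST classes, and only the `<` selector is handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds, chosen only for this sketch. */
typedef enum { NODE_CONSTANT, NODE_VARIABLE, NODE_MESSAGE } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                   /* NODE_CONSTANT: the literal value */
    const char *name;             /* NODE_VARIABLE: name; NODE_MESSAGE: selector */
    const struct Node *receiver;  /* NODE_MESSAGE only, otherwise NULL */
    const struct Node *argument;  /* NODE_MESSAGE only (single argument), otherwise NULL */
} Node;

/* The tree for `var < 10`: no parentheses or whitespace are stored. */
static const Node var_node = { NODE_VARIABLE, 0, "var", NULL, NULL };
static const Node ten_node = { NODE_CONSTANT, 10, NULL, NULL, NULL };
static const Node less_than_node = { NODE_MESSAGE, 0, "<", &var_node, &ten_node };

/* Counts the nodes reachable from n. */
size_t node_count(const Node *n) {
    if (n == NULL) return 0;
    return 1 + node_count(n->receiver) + node_count(n->argument);
}

/* Evaluates the tree with every variable bound to var_value. */
long evaluate(const Node *n, long var_value) {
    switch (n->kind) {
    case NODE_CONSTANT:
        return n->value;
    case NODE_VARIABLE:
        return var_value;
    case NODE_MESSAGE:
        if (strcmp(n->name, "<") == 0)
            return evaluate(n->receiver, var_value) < evaluate(n->argument, var_value);
        return 0; /* unknown selectors are not handled in this sketch */
    }
    return 0;
}
```

With `var` bound to 1, as in the example method, `evaluate(&less_than_node, 1)` yields a true value.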
The initial AST is a Pharo AST, containing all possible Pharo syntactic structures (constants, variables, blocks, assignments...). Slang then transforms the Pharo AST into a Slang AST (also named TAST, since all its classes are prefixed with T). The Slang AST is a hybrid representation that can express both Pharo code and C constructs (e.g., switches, statement blocks). Then the Slang translator (named CCodeGenerator) performs several optimization passes on the Slang AST:
- inlining (see #doInlining:)
  - with special cases for dispatch tables
- sorting of methods (see #sortMethods:)
- localization of global variables (see #localizeGlobalVariables)
- dead code elimination
Finally, the translator emits C code, sometimes applying extra transformations in the process that are necessary to turn Slang code into valid C code. For example, the following code
var := [ a . b . c ] value.
is translated to
a;
b;
var = c;
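The flattening above keeps the evaluation order of the block and assigns the value of its last statement. The following sketch checks that property in plain C. Here `a`, `b`, and `c` are made-up functions that record when they are called, and the comma-operator variant is shown only for comparison; it is not claimed to be what Slang emits.

```c
#include <stddef.h>

/* Log of the order in which a, b and c were called. */
static char call_log[8];
static size_t call_count = 0;

static void reset_log(void) {
    call_count = 0;
    call_log[0] = '\0';
}

static int record(char tag, int result) {
    if (call_count + 1 < sizeof call_log) {
        call_log[call_count++] = tag;
        call_log[call_count] = '\0';
    }
    return result;
}

/* Hypothetical stand-ins for the three statements of the block. */
static int a(void) { return record('a', 1); }
static int b(void) { return record('b', 2); }
static int c(void) { return record('c', 3); }

/* Shape of the flattened code for: var := [ a . b . c ] value. */
int translated_block_value(void) {
    int var;
    reset_log();
    a();
    b();
    var = c();
    return var;
}

/* Same order and result, written as a single C expression. */
int comma_operator_value(void) {
    int var;
    reset_log();
    var = (a(), b(), c());
    return var;
}

/* Returns the call order recorded by the most recent run. */
const char *call_order(void) {
    return call_log;
}
```

Both functions call `a`, `b`, `c` in that order and return the value produced by `c`.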
Slang was mainly designed as a DSL for writing virtual machines, and as such it contains many VM-specific optimizations and transformations. The most notable one is the dispatch table transformation, which transforms:
self dispatchOn: aValue in: aTable
into a switch/case statement that uses "aValue" as the condition and each element of the table as a case of the switch. This transformation also allows for "expansion" using the <expandCases> pragma, which duplicates the body for each case and constant-propagates the case value (e.g., 47), enabling further optimizations such as constant folding and dead code elimination.
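To make the transformation more concrete, here is a hand-written C sketch of the three shapes involved: a table lookup, an equivalent switch, and an expanded switch. The handlers, the three-entry table, and the grouping of cases that share a handler are assumptions made for illustration, not output produced by Slang.

```c
/* Hypothetical handlers; a real VM would dispatch to bytecode routines. */
static int handler_push(int value) { return value + 100; }
static int handler_pop(int value)  { return value + 200; }

typedef int (*Handler)(int);

/* Conceptual meaning of: self dispatchOn: aValue in: aTable */
static const Handler table[] = { handler_push, handler_push, handler_pop };
#define TABLE_SIZE ((int)(sizeof table / sizeof table[0]))

int dispatch_via_table(int value) {
    if (value < 0 || value >= TABLE_SIZE) return -1;
    return table[value](value);
}

/* Switch form: the value is the condition, table entries become cases. */
int dispatch_via_switch(int value) {
    switch (value) {
    case 0:
    case 1:
        return handler_push(value);
    case 2:
        return handler_pop(value);
    default:
        return -1;
    }
}

/* Expanded form: one body per case, with the case value propagated
   as a constant so that value + 100 folds to a literal. */
int dispatch_expanded(int value) {
    switch (value) {
    case 0: return 100; /* handler_push(0) inlined, 0 + 100 folded */
    case 1: return 101; /* handler_push(1) inlined, 1 + 100 folded */
    case 2: return 202; /* handler_pop(2) inlined, 2 + 200 folded */
    default: return -1;
    }
}
```

The expanded form trades code size for bodies in which the dispatched value is a compile-time constant, which is what opens the door to folding and dead code elimination.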
Improve the existing architecture
- introduce an explicit C code representation
- better Virtual Machine integration and modularization
- this will allow us to do C-level analyses before writing the C code, and thus detect bugs before going through the C compiler
Incremental transpilation
- detect bugs when editing a method or class
- get feedback fast!
- move extra transformations to a pre-pass
- make transformations go through a C AST (this will allow us to do validations)
  => for each transformation
  - analyse all cases
  - write tests to validate the migration
- make the RBAST -> TAST transformation use a visitor
- VM specific optimizations and transformations should be split and be made optional
- move hardcoded transformation to a list of optimization passes (as in modern compilers)
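One possible reading of the last two items is a pass list in which each pass is a named entry that can be switched off. The sketch below is an assumption about what such a design could look like, not a description of CCodeGenerator; the `Program` and `Pass` types and the pass names are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 128

/* Hypothetical stand-in for the program being translated:
   it only remembers which passes touched it. */
typedef struct {
    char trace[MAX_TRACE];
} Program;

/* A pass is a named, optionally enabled transformation. */
typedef struct {
    const char *name;
    int enabled;
    void (*run)(Program *program);
} Pass;

/* Appends text to the trace, comma separated, if it still fits. */
static void note(Program *program, const char *text) {
    size_t used = strlen(program->trace);
    size_t extra = strlen(text);
    if (used + extra + 2 < MAX_TRACE) {
        if (used > 0) strcat(program->trace, ",");
        strcat(program->trace, text);
    }
}

/* Placeholder passes: real ones would rewrite an AST. */
void inline_pass(Program *program)         { note(program, "inline"); }
void dispatch_table_pass(Program *program) { note(program, "dispatch"); }
void dead_code_pass(Program *program)      { note(program, "dce"); }

/* Runs the enabled passes in list order; returns how many ran. */
size_t run_passes(Program *program, const Pass *passes, size_t count) {
    size_t ran = 0;
    for (size_t i = 0; i < count; i++) {
        if (passes[i].enabled) {
            passes[i].run(program);
            ran++;
        }
    }
    return ran;
}
```

With such a list, a VM-specific pass like the dispatch table transformation becomes an entry that a non-VM project can disable, and the order of passes is data rather than hardcoded control flow.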
Preliminary set of improvements, to be defined more concretely (and with specific cases)
- Improve management of name conflicts (e.g., temps conflicting with method names, name mangling generates conflicts between #foo and #foo:)
- Improve dynamic allocation patterns: right now, doing a dynamic allocation requires conditional code (ifSimulation:ifTranslation:)
- Inlining of structure methods into non-structures yields wrong code transformations
- Introduce simulation configurations
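The #foo versus #foo: conflict mentioned in the list above can be reproduced with a small C sketch. It assumes, as a simplification, that mangling a selector into a C identifier just drops the colons. The underscore variant is a hypothetical alternative shown for contrast, not a scheme the project has adopted.

```c
#include <stddef.h>

/* Copies selector into out, turning it into a C-friendly name.
   Each ':' is replaced by colon_replacement, or dropped when
   colon_replacement is '\0'. Returns 0 on success, -1 if out is too small. */
int mangle_selector(const char *selector, char colon_replacement,
                    char *out, size_t out_size) {
    size_t written = 0;
    if (out_size == 0) return -1;
    for (const char *p = selector; *p != '\0'; p++) {
        char ch = *p;
        if (ch == ':') {
            if (colon_replacement == '\0') continue; /* drop the colon */
            ch = colon_replacement;
        }
        if (written + 1 >= out_size) return -1; /* keep room for the terminator */
        out[written++] = ch;
    }
    out[written] = '\0';
    return 0;
}
```

Dropping colons maps both `foo` and `foo:` to `foo`, which is the conflict described above. Replacing each colon with `_` keeps them apart (`foo` and `foo_`), although it could in turn clash with a selector that is literally named `foo_`.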
The cast branch in https://github.com/pharo-project/opensmalltalk-vm/tree/cast contains an initial effort in this direction:
- it includes a C AST and a C parser
- it includes initial TAST -> CAST transformations + tests