Merge pull request #326 from MilesCranmer/parametric-expressions
BREAKING: Change expression types to `DynamicExpressions.Expression` (from `DynamicExpressions.Node`)
MilesCranmer authored Oct 6, 2024
2 parents 8f67533 + e2b369e commit 749cc34
Showing 58 changed files with 2,627 additions and 1,028 deletions.
28 changes: 20 additions & 8 deletions .github/workflows/CI.yml
@@ -26,8 +26,9 @@ jobs:
fail-fast: false
matrix:
test:
- "unit"
- "integration"
- "part1"
- "part2"
- "part3"
julia-version:
- "1.6"
- "1.8"
@@ -37,22 +38,31 @@ jobs:
include:
- os: windows-latest
julia-version: "1"
test: "unit"
test: "part1"
- os: windows-latest
julia-version: "1"
test: "integration"
test: "part2"
- os: windows-latest
julia-version: "1"
test: "part3"
- os: macOS-latest
julia-version: "1"
test: "part1"
- os: macOS-latest
julia-version: "1"
test: "unit"
test: "part2"
- os: macOS-latest
julia-version: "1"
test: "integration"
test: "part3"
- os: ubuntu-latest
julia-version: "~1.11.0-0"
test: "unit"
test: "part1"
- os: ubuntu-latest
julia-version: "~1.11.0-0"
test: "integration"
test: "part2"
- os: ubuntu-latest
julia-version: "~1.11.0-0"
test: "part3"

steps:
- uses: actions/checkout@v4
@@ -62,6 +72,8 @@ jobs:
version: ${{ matrix.julia-version }}
- name: "Cache dependencies"
uses: julia-actions/cache@v2
with:
cache-name: julia-cache;workflow=${{ github.workflow }};job=${{ github.job }};os=${{ matrix.os }};julia=${{ matrix.julia-version }};project=${{ hashFiles('**/Project.toml') }}
- name: "Build package"
uses: julia-actions/julia-buildpkg@v1
- name: "Run tests"
1 change: 1 addition & 0 deletions .gitignore
@@ -20,3 +20,4 @@ docs/src/index.md
*.code-workspace
.vscode
**/*.json
LocalPreferences.toml
30 changes: 21 additions & 9 deletions Project.toml
@@ -1,11 +1,14 @@
name = "SymbolicRegression"
uuid = "8254be44-1295-4e6a-a16d-46603ac705cb"
authors = ["MilesCranmer <miles.cranmer@gmail.com>"]
version = "0.24.5"
version = "0.25.0"

[deps]
ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DifferentiationInterface = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
DispatchDoctor = "8d63f2c5-f18a-4cf2-ba9d-b3f60fc568c8"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
DynamicExpressions = "a40a106e-89c9-4ca8-8020-a735e8728b6b"
@@ -27,39 +30,48 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"

[weakdeps]
Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"

[extensions]
SymbolicRegressionEnzymeExt = "Enzyme"
SymbolicRegressionJSON3Ext = "JSON3"
SymbolicRegressionSymbolicUtilsExt = "SymbolicUtils"

[compat]
ADTypes = "^1.4.0"
Compat = "^4.2"
ConstructionBase = "<1.5.7"
Dates = "1"
DifferentiationInterface = "0.5"
DispatchDoctor = "0.4"
Distributed = "1"
DynamicExpressions = "0.16"
DynamicQuantities = "0.10, 0.11, 0.12, 0.13, 0.14"
# Note that the <0.0.1 bound is required for old julia compat (which does
# not have stdlib packages available in [compat])
Distributed = "<0.0.1, 1"
DynamicExpressions = "1"
DynamicQuantities = "1"
Enzyme = "0.12"
JSON3 = "1"
LineSearches = "7"
LossFunctions = "0.10, 0.11"
MLJModelInterface = "~1.5, ~1.6, ~1.7, ~1.8, ~1.9, ~1.10, ~1.11"
MacroTools = "0.4, 0.5"
Optim = "~1.8, ~1.9"
PackageExtensionCompat = "1"
Pkg = "1"
Pkg = "<0.0.1, 1"
PrecompileTools = "1"
Printf = "1"
Printf = "<0.0.1, 1"
ProgressBars = "~1.4, ~1.5"
Random = "1"
Random = "<0.0.1, 1"
Reexport = "1"
SpecialFunctions = "0.10.1, 1, 2"
StatsBase = "0.33, 0.34"
SymbolicUtils = "0.19, ^1.0.5"
TOML = "1"
SymbolicUtils = "0.19, ^1.0.5, 2, 3"
TOML = "<0.0.1, 1"
julia = "1.6"

[extras]
Enzyme = "7da242da-08ed-463a-9acd-ee780be4f1d9"
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
7 changes: 7 additions & 0 deletions benchmark/Project.toml
@@ -3,5 +3,12 @@ BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Bumper = "8ce10254-0962-460f-a3d8-1f77fea1446e"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DynamicExpressions = "a40a106e-89c9-4ca8-8020-a735e8728b6b"
LoopVectorization = "bdcacae8-1622-11e9-2a5c-532679323890"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[preferences.DynamicExpressions]
instability_check = "disable"

[preferences.SymbolicRegression]
instability_check = "disable"
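
The `[preferences.*]` tables above set `instability_check` for the benchmark environment only. As a hedged sketch, assuming Preferences.jl is installed in the active environment and that `instability_check` is the preference read by DispatchDoctor (which is listed in `[deps]`), the same setting could be applied to a local checkout like this. It writes to `LocalPreferences.toml`, which this commit adds to `.gitignore`:

```julia
using Preferences: set_preferences!
using SymbolicRegression

# Writes the key into LocalPreferences.toml next to the active Project.toml.
# `force=true` overwrites any value already stored for this key.
set_preferences!(SymbolicRegression, "instability_check" => "disable"; force=true)
```

Preferences are read at compile time, so a change like this should only take effect after restarting Julia.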
28 changes: 9 additions & 19 deletions docs/src/api.md
@@ -10,35 +10,27 @@ MultitargetSRRegressor
## equation_search

```@docs
equation_search(X::AbstractMatrix{T}, y::AbstractMatrix{T};
niterations::Int=10,
weights::Union{AbstractVector{T}, Nothing}=nothing,
variable_names::Union{Array{String, 1}, Nothing}=nothing,
options::Options=Options(),
numprocs::Union{Int, Nothing}=nothing,
procs::Union{Array{Int, 1}, Nothing}=nothing,
runtests::Bool=true,
loss_type::Type=Nothing,
) where {T<:DATA_TYPE}
equation_search
```

## Options

```@docs
Options
MutationWeights(;)
MutationWeights
```

## Printing

```@docs
string_tree(tree::Node, options::Options; kws...)
string_tree
```

## Evaluation

```@docs
eval_tree_array(tree::Node, X::AbstractMatrix, options::Options; kws...)
eval_tree_array
EvalOptions
```

## Derivatives
Expand All @@ -51,22 +43,20 @@ all variables (or, all constants). Both use forward-mode automatic, but use
`Zygote.jl` to compute derivatives of each operator, so this is very efficient.

```@docs
eval_diff_tree_array(tree::Node, X::AbstractMatrix, options::Options, direction::Int)
eval_grad_tree_array(tree::Node, X::AbstractMatrix, options::Options; kws...)
eval_diff_tree_array
eval_grad_tree_array
```

## SymbolicUtils.jl interface

```@docs
node_to_symbolic(tree::Node, options::Options;
variable_names::Union{Array{String, 1}, Nothing}=nothing,
index_functions::Bool=false)
node_to_symbolic
```

Note that use of this function requires `SymbolicUtils.jl` to be installed and loaded.

## Pareto frontier

```@docs
calculate_pareto_frontier(hallOfFame::HallOfFame{T,L}) where {T<:DATA_TYPE,L<:LOSS_TYPE}
calculate_pareto_frontier
```
90 changes: 88 additions & 2 deletions docs/src/examples.md
@@ -131,7 +131,7 @@ we can see that the output types are `Float32`:
r = report(mach)
best = r.equations[r.best_idx]
println(typeof(best))
# Node{Float32}
# Expression{Float32}
```

We can also use `Complex` numbers (ignore the warning
@@ -228,7 +228,93 @@ a constant `"2.6353e-22[m s⁻²]"`.
Note that you can also search for dimensionless units by setting
`dimensionless_constants_only` to `true`.

## 7. Additional features
## 7. Working with Expressions

Expressions in `SymbolicRegression.jl` are represented using the `Expression` type, which combines the raw `Node` type with an `OperatorEnum`. This allows for more flexible and powerful expression manipulation and evaluation.

Here's an example:

```julia
using SymbolicRegression

# Define options with operators
options = Options(; binary_operators=[+, -, *], unary_operators=[cos])

# Create expression nodes
operators = options.operators
variable_names = ["x1", "x2"]
x1 = Expression(Node{Float64}(feature=1); operators, variable_names)
x2 = Expression(Node{Float64}(feature=2); operators, variable_names)

# Construct an expression using the operators from options
expr = x1 * cos(x2 - 3.2)

# Evaluate the expression directly
X = rand(Float64, 2, 100)
output = expr(X)
```

This `Expression` type contains both the structure
and the operators used in the expression. It is what
the search returns. The raw `Node` type (which
used to be output directly) is accessible with

```julia
get_contents(expr)
```
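
As a rough sketch of how the two representations relate, the raw tree can be evaluated on its own and compared with calling the expression directly. This assumes `eval_tree_array` (listed on the API page) accepts the raw `Node` together with `options` and returns the output along with a completion flag:

```julia
using SymbolicRegression

options = Options(; binary_operators=[+, -, *], unary_operators=[cos])
operators = options.operators
variable_names = ["x1", "x2"]
x1 = Expression(Node{Float64}(feature=1); operators, variable_names)
x2 = Expression(Node{Float64}(feature=2); operators, variable_names)
expr = x1 * cos(x2 - 3.2)

# Unwrap the expression to get the raw tree (no operators attached)
tree = get_contents(expr)

X = rand(Float64, 2, 100)

# The raw tree needs `options` to know which operators its indices refer to
out_tree, ok = eval_tree_array(tree, X, options)

# The expression carries its own operators, so no `options` are needed
out_expr = expr(X)
```

Under these assumptions the two outputs should agree, since both evaluate the same tree with the same operators.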

## 8. Parametric Expressions

Parametric expressions allow the algorithm to optimize parameters within the expressions during the search process. This is useful for finding expressions that not only fit the data but also have tunable parameters.

To use this, the data needs to include the class that
each row belongs to. This class information is used to
select the parameters when evaluating each expression.

For example:

```julia
using SymbolicRegression
using MLJ

# Define the dataset
X = NamedTuple{(:x1, :x2)}(ntuple(_ -> randn(Float32, 30), Val(2)))
X = (; X..., classes=rand(1:2, 30))
p1 = [0.0f0, 3.2f0]
p2 = [1.5f0, 0.5f0]

y = [
2 * cos(X.x1[i] + p1[X.classes[i]]) + X.x2[i]^2 - p2[X.classes[i]] for
i in eachindex(X.classes)
]

# Define the model with parametric expressions
model = SRRegressor(
niterations=100,
binary_operators=[+, *, /, -],
unary_operators=[cos],
expression_type=ParametricExpression,
expression_options=(; max_parameters=2),
parallelism=:multithreading
)

# Train the model
mach = machine(model, X, y)
fit!(mach)

# View the best expression
report(mach)
```

The final equations will contain parameters that were optimized during training:

```julia
typeof(report(mach).equations[end])
```

This example demonstrates how to set up a symbolic regression model that searches for expressions with parameters, optimizing both the structure and the parameters of the expressions based on the provided class information.
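
To make the role of the class column concrete, here is a plain-Julia sketch of class-indexed parameter selection for the target function used above. The helper `eval_with_classes` and the `params[k, c]` layout are hypothetical, chosen for illustration only, and are not part of SymbolicRegression.jl:

```julia
# Hypothetical helper: evaluate a fixed functional form whose two
# parameters are looked up per row from that row's class.
function eval_with_classes(x1, x2, classes, params)
    # params[k, c] holds the k-th parameter for class c
    return [
        2 * cos(x1[i] + params[1, classes[i]]) + x2[i]^2 - params[2, classes[i]]
        for i in eachindex(classes)
    ]
end

x1 = randn(Float32, 30)
x2 = randn(Float32, 30)
classes = rand(1:2, 30)

# Row 1 matches `p1` and row 2 matches `p2` from the example above
params = Float32[0.0 3.2; 1.5 0.5]

y = eval_with_classes(x1, x2, classes, params)
```

The functional form is shared across all rows, and only the parameter values change with the class.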

## 9. Additional features

For the many other features available in SymbolicRegression.jl,
check out the API page for `Options`. You might also find it useful