Comparing with Matplotlib's triangulation #34
Thanks for the report @henry2004y. I'll have a bit more to say about this, but is the line […]?

---

Some thoughts
With this in mind, here's a script showing what I've looked into for this and how I've rewritten your benchmarks.

```julia
using NaturalNeighbours
using PyPlot
using BenchmarkTools
function meshgrid(x, y)
    X = [x for _ in y, x in x]
    Y = [y for y in y, _ in x]
    return X, Y
end
f = (x, y) -> sin(x * y) - cos(x - y) * exp(-(x - y)^2)
const x = vec([(i - 1) / 9 for i in (1, 3, 4, 5, 8, 9, 10), j in (1, 2, 3, 5, 6, 7, 9, 10)])
const y = vec([(j - 1) / 9 for i in (1, 3, 4, 5, 8, 9, 10), j in (1, 2, 3, 5, 6, 7, 9, 10)])
const z = f.(x, y)
const xg = LinRange(0, 1, 100)
const yg = LinRange(0, 1, 100)
function nn_benchmark()
    itp = interpolate(x, y, z; derivatives=false)
    X, Y = meshgrid(xg, yg)
    _x, _y = vec(X), vec(Y)
    return itp(_x, _y; method=Triangle(), parallel=false)
end
function mpl_benchmark()
    triang = matplotlib.tri.Triangulation(x, y)
    interpolator = matplotlib.tri.LinearTriInterpolator(triang, z)
    X, Y = meshgrid(xg, yg)
    return interpolator(X, Y)
end
@benchmark nn_benchmark()
#=
julia> @benchmark nn_benchmark()
BenchmarkTools.Trial: 297 samples with 1 evaluation.
Range (min … max): 15.082 ms … 95.031 ms ┊ GC (min … max): 0.00% … 46.52%
Time (median): 15.459 ms ┊ GC (median): 0.00%
Time (mean ± σ): 16.848 ms ± 7.385 ms ┊ GC (mean ± σ): 4.69% ± 7.69%
█▄
██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▅▄▄▄▅ ▅
15.1 ms Histogram: log(frequency) by time 48.8 ms <
Memory estimate: 1.85 MiB, allocs estimate: 48720.
=#
@benchmark mpl_benchmark()
#=
julia> @benchmark mpl_benchmark()
BenchmarkTools.Trial: 2832 samples with 1 evaluation.
Range (min … max): 1.030 ms … 14.261 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.822 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.759 ms ± 417.625 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄ ▄▅▆█▆▅▇▄▅▄▂▁▂▁▁▁
▂▄▅▆███▇▅▆▄▃▃▃▄▃▂▄▃▄▅▅▄▅▄▅▆▆█████████████████▆▆▅▅▄▄▃▃▂▂▂▂▂▂ ▄
1.03 ms Histogram: frequency by time 2.46 ms <
Memory estimate: 237.89 KiB, allocs estimate: 74.
=#
# Your observed 16x difference is real. What if we use threads?
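# (For `parallel=true` below to help, Julia must be started with multiple
# threads, e.g. `julia --threads 10`; check with Threads.nthreads().)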
function nn_benchmark_threads() # I'm using 10 threads
    itp = interpolate(x, y, z; derivatives=false)
    X, Y = meshgrid(xg, yg)
    _x, _y = vec(X), vec(Y)
    return itp(_x, _y; method=Triangle(), parallel=true)
end
@benchmark nn_benchmark_threads()
#=
julia> @benchmark nn_benchmark_threads()
BenchmarkTools.Trial: 1322 samples with 1 evaluation.
Range (min … max): 2.352 ms … 94.722 ms ┊ GC (min … max): 0.00% … 80.89%
Time (median): 2.645 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.770 ms ± 7.382 ms ┊ GC (mean ± σ): 19.06% ± 9.26%
█▁
██▅▁▄▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆▇ ▇
2.35 ms Histogram: log(frequency) by time 43.7 ms <
Memory estimate: 1.84 MiB, allocs estimate: 47709.
=#
```

So, without multithreading, your observed difference in timings is real. With threading, the results are a bit closer, but still not what I'd like. I'm quite impressed with whatever matplotlib must be doing to get this so fast. The number of allocations I'm getting with NaturalNeighbours.jl suggests there is a type instability somewhere in the code. I'll do some digging.
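
For anyone wanting to dig along: a minimal sketch of how one might hunt for that kind of instability with `@code_warntype`, assuming the interpolant accepts scalar query points as well as vectors:

```julia
using NaturalNeighbours, InteractiveUtils

itp = interpolate(x, y, z; derivatives=false)
# Non-concrete (`::Any`, shown in red) annotations in the printout localise
# unstable calls; `Test.@inferred` gives a pass/fail variant of the same check.
@code_warntype itp(0.5, 0.5; method=Triangle())
```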

---

I did find one issue, #35, and another that can't be fixed without a new major release, but neither makes much difference. I'm honestly not sure where the main difference is coming from, and I can't tell exactly what Matplotlib is doing.

---

I know little about how Matplotlib achieves this. This is probably the interface of the method: https://github.com/matplotlib/matplotlib/blob/be56634d682bed257cb941369d8d3600635ddadf/lib/matplotlib/tri/_triinterpolate.py#L232

---

Depending on Matplotlib's license, I probably can't look at that code while thinking about changes here; I'd have to double-check. I was thinking, though: I've had to do something similar previously in FiniteVolumeMethod.jl, which essentially uses this interpolation, and there was a bit of a difference between (1) storing the barycentric coordinates (these are the coordinates that express a query point as a convex combination of a triangle's vertices) and (2) recomputing them at each evaluation. Maybe matplotlib stores them as well? I wonder if I should add some sort of cache, something like

```julia
struct InterpolantCache{T1<:TriangleCache,T2<:SibsonCache}
    triangle::T1
    sibson::T2 # not sure if sibson needs a cache actually
    # ... other methods ...
end

struct TriangleCache{T<:Triangulation}
    coefficients::Dict{NTuple{3,Int},NTuple{3,Float64}}
end
```

...that the interpolant could then leverage if the user asks to store this (since they might not always want to for larger problems, where the data would be a bit much). The caches would be empty until a user calls […].
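
For readers following along, here's what those coordinates are concretely. The helper below is hypothetical, not part of NaturalNeighbours.jl's API: it solves for the weights `(λ1, λ2, λ3)` with `p = λ1*v1 + λ2*v2 + λ3*v3` and `λ1 + λ2 + λ3 = 1`, after which the linear interpolant is just `λ1*z1 + λ2*z2 + λ3*z3`.

```julia
# Hypothetical helper, not NaturalNeighbours.jl API: the barycentric
# coordinates of p in the triangle (v1, v2, v3), via the standard 2x2 solve.
function barycentric_coordinates(p, v1, v2, v3)
    d = (v2[2] - v3[2]) * (v1[1] - v3[1]) + (v3[1] - v2[1]) * (v1[2] - v3[2])
    λ1 = ((v2[2] - v3[2]) * (p[1] - v3[1]) + (v3[1] - v2[1]) * (p[2] - v3[2])) / d
    λ2 = ((v3[2] - v1[2]) * (p[1] - v3[1]) + (v1[1] - v3[1]) * (p[2] - v3[2])) / d
    return (λ1, λ2, 1 - λ1 - λ2)
end

barycentric_coordinates((0.25, 0.25), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)) # (0.5, 0.25, 0.25)
```

Caching these (or the equivalent per-triangle plane coefficients) trades memory for skipping that small solve on every repeated evaluation.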

---

Well, I got around to looking at caching; see https://github.com/DanielVandH/NaturalNeighbours.jl/tree/triangle. It doesn't really make things much better, unfortunately.

```julia
julia> @benchmark nn_benchmark()
BenchmarkTools.Trial: 443 samples with 1 evaluation.
Range (min … max): 9.076 ms … 426.486 ms ┊ GC (min … max): 0.00% … 59.54%
Time (median): 9.986 ms ┊ GC (median): 0.00%
Time (mean ± σ): 11.203 ms ± 20.125 ms ┊ GC (mean ± σ): 6.15% ± 4.00%
▃▇▅▂▃▄█▂
▃▃▃▃▇████████▇█▅▃▃▂▃▁▃▂▁▃▁▁▁▁▁▁▃▂▂▃▁▁▁▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
9.08 ms Histogram: frequency by time 14.8 ms <
Memory estimate: 1.84 MiB, allocs estimate: 47159.
julia> @benchmark mpl_benchmark()
BenchmarkTools.Trial: 2906 samples with 1 evaluation.
Range (min … max): 1.079 ms … 263.383 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.594 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.714 ms ± 4.880 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▄██▇█▆▅▃▃▆▂▅▄▄▃▅▅▆▆█▄▅▅▄▂▁
▁▁▂███████████████████████████████▆▆▄▅▃▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁ ▅
1.08 ms Histogram: frequency by time 2.6 ms <
Memory estimate: 237.89 KiB, allocs estimate: 74.
julia> function nn_benchmark_threads() # I'm using 10 threads
           itp = interpolate(x, y, z; derivatives=false)
           X, Y = meshgrid(xg, yg)
           _x, _y = vec(X), vec(Y)
           return itp(_x, _y; method=Triangle(; allow_cache = true), parallel=true)
       end
nn_benchmark_threads (generic function with 1 method)
julia> @benchmark nn_benchmark_threads()
BenchmarkTools.Trial: 1296 samples with 1 evaluation.
Range (min … max): 2.343 ms … 186.348 ms ┊ GC (min … max): 0.00% … 59.95%
Time (median): 2.646 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.845 ms ± 11.402 ms ┊ GC (mean ± σ): 18.90% ± 6.32%
▁█▇▂
▃████▆▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▂ ▃
2.34 ms Histogram: frequency by time 7.45 ms <
Memory estimate: 1.87 MiB, allocs estimate: 47579.
```

I'm not going to commit the cached […]. The only other difference could come from DelaunayTriangulation.jl's […]. One idea to improve […].

---

All that said, if you are happy with the comparison between the performance with […]: for evaluating at many points, NaturalNeighbours.jl does seem to win with multithreading at least. You can run the above tests with

```julia
const xg = LinRange(0, 1, 250)
const yg = LinRange(0, 1, 250)
```

and the multithreaded benchmark is 2x faster than Matplotlib, although 10x worse without. Edit: actually, it's 2x slower without caching. Okay, I'm going to push that caching. You should use […].
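
Presumably this means the keyword from the benchmark above; a minimal usage sketch, assuming the `triangle` branch's interface:

```julia
# Assuming the `allow_cache` keyword shown in the benchmark above:
itp = interpolate(x, y, z; derivatives=false)
X, Y = meshgrid(xg, yg)
vals = itp(vec(X), vec(Y); method=Triangle(; allow_cache = true), parallel=true)
```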

---

Thanks for your effort! I would suggest leaving this open as a reminder in case someone intends to refactor the code and speed it up~

---

The caching work should be registered soon (~15 minutes) as v1.3.4. By default, it will now cache the necessary coordinates for […].
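
Once the registry syncs, the usual update steps pick it up:

```julia
using Pkg
Pkg.update("NaturalNeighbours")   # pull in v1.3.4 once it is registered
Pkg.status("NaturalNeighbours")   # confirm the installed version
```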

---

@henry2004y With DelaunayTriangulation 1.1.0 you can more easily change how predicates are computed by using the […]:

```julia
julia> function nn_benchmark(kernel)
           itp = interpolate(x, y, z; derivatives=false)
           X, Y = meshgrid(xg, yg)
           _x, _y = vec(X), vec(Y)
           return itp(_x, _y; method=Triangle(), parallel=true, predicates = kernel)
       end
nn_benchmark (generic function with 2 methods)
julia> @benchmark nn_benchmark($FastKernel())
BenchmarkTools.Trial: 270 samples with 1 evaluation.
Range (min … max): 12.527 ms … 27.177 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.385 ms ┊ GC (median): 0.00%
Time (mean ± σ): 18.495 ms ± 1.480 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▄▂▆▇█▄▇▂
▃▁▁▁▃▁▁▁▁▁▁▃▃▁▁▃▃▁▁▃▁▃▁▄▆▆▇██████████▇▇▅▆▄▄▅▃▃▄▁▃▃▁▃▁▃▁▁▁▃▃ ▃
12.5 ms Histogram: frequency by time 23.2 ms <
Memory estimate: 485.34 KiB, allocs estimate: 889.
julia> @benchmark nn_benchmark($AdaptiveKernel()) # the new default
BenchmarkTools.Trial: 170 samples with 1 evaluation.
Range (min … max): 24.946 ms … 40.503 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 29.362 ms ┊ GC (median): 0.00%
Time (mean ± σ): 29.488 ms ± 2.992 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▂ ▃▅█ ▃▂▃▅ ▂ ▅ ▆█▅▅▃▂ ▃
█▄█████▁▄███████▇▄████████▅▄▅▄█▅▄▅▅▇▅▅▄▄▁▄▇▁▁▄▁▄▁▄▁▁▁▁▁▁▁▁▄ ▄
24.9 ms Histogram: frequency by time 38.9 ms <
Memory estimate: 484.62 KiB, allocs estimate: 889.
julia> @benchmark nn_benchmark($ExactKernel()) # the old default
BenchmarkTools.Trial: 163 samples with 1 evaluation.
Range (min … max): 24.614 ms … 300.534 ms ┊ GC (min … max): 0.00% … 47.87%
Time (median): 28.723 ms ┊ GC (median): 0.00%
Time (mean ± σ): 31.088 ms ± 21.944 ms ┊ GC (mean ± σ): 3.57% ± 4.94%
▃ ▇▁█▇▃▃▃▇▇▆▁ ▁▂
▃▃████████████████▇▄▄▃▇▆▃▆▃▄▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
24.6 ms Histogram: frequency by time 46.6 ms <
Memory estimate: 1.38 MiB, allocs estimate: 35231.
```
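
For completeness, the same kernels should also be usable when constructing a triangulation directly; a sketch, assuming DelaunayTriangulation.jl 1.1's `triangulate` accepts the same `predicates` keyword (worth double-checking against its docs):

```julia
using DelaunayTriangulation
pts = tuple.(x, y)                                # the scattered data points
tri = triangulate(pts; predicates = FastKernel()) # or AdaptiveKernel(), ExactKernel()
```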

---

Thank you! I'll give it a try~

---

Hi! Thanks for the great package!
As a user, I want to report a performance comparison I found against Matplotlib: […]

This is running with 1 thread, but NaturalNeighbours.jl is ~16 times slower. I don't know whether Matplotlib automatically uses multithreading, but this performance difference is noticeable. I don't trust the allocation statistics reported above for Matplotlib, but the measured time should be ok.