Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely stop pirating Base methods #306

Open
rafaqz opened this issue Jul 11, 2022 · 22 comments
Open

Completely stop pirating Base methods #306

rafaqz opened this issue Jul 11, 2022 · 22 comments

Comments

@rafaqz
Copy link

rafaqz commented Jul 11, 2022

In a session where OffsetArrays is a dependency of some package that I'm using, this is very strange to see:

julia> zeros(10:20)
11-element OffsetArray(::Vector{Float64}, 10:20) with eltype Float64 with indices 10:20:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

This really should be an error. An OffsetArray is not at all what I wanted. I simply made a typo.

The main problem with this is it breaks simple mental models for understanding what has gone wrong in arbitrary julia code. Base type + Base method returning a package type should be completely outside of what we expect as a possible behavior.

@rafaqz rafaqz changed the title Completely stop pyrating Base methods Completely stop pirating Base methods Jul 11, 2022
@jishnub
Copy link
Member

jishnub commented Jul 11, 2022

I'm in favour of removing these convenience methods in OffsetArrays 2.0. With the new Origin type, these aren't inconvenient to create anymore.

@rafaqz
Copy link
Author

rafaqz commented Jul 11, 2022

Great. A wrapper type is how to do this. DimensionalData.jl does similar things without any piracy.

@jishnub
Copy link
Member

jishnub commented Jul 12, 2022

Perhaps we need to retain the convenience of zeros(indices) in some way. Currently, the ways to construct such an array is

  • OffsetArray(zeros(length), indices)
  • Origin(origin)(zeros(length))

In the first case, providing the length is redundant given the axes. The second case is useful if we know the origin, but not as much if we know, say, the endpoint and the length. The missing feature is the ability to construct such an array given just the axes, which is precisely the convenience that zeros(indices) used to provide. Perhaps the solution is simply to introduce another wrapper type named Axes and ask users to call zeros(Axes(indices)) instead of zeros(indices). The upside of this is that this is very explicit, and unlikely to come as a surprise to a reader. The downside is the proliferation of types that exist merely as convenience features.

@rafaqz
Copy link
Author

rafaqz commented Jul 12, 2022

Yes thats exactly how DimensionalData.jl does it. X/Y/Z wrappers can be used in base methods like zero, fill, rand to get arrays with named axes and lookup keys. rand(X('a':'z'), Y(5)) gives a 26×5 array with X and Y dims, X having the alphabet as a lookup index.

OffsetArrays could be similar.

Axes seems a little generic to be exported without clashes (maybe Makie.jl already exports it?), what about Offset? Like zeros(Offset(10:20), 5)

That screams "This returns an OffsetArray".

Edit: no Makie.jl exports Axis. Maybe its fine, IDK.

@jishnub
Copy link
Member

jishnub commented Jul 13, 2022

offset has a different meaning in this package (the difference between firstindex(axis) and 1). Perhaps a name like OffsetAxes might be clearer and unlikely to clash with other symbols. Then again, we may end up not exporting this, like we haven't exported Origin.

@rafaqz
Copy link
Author

rafaqz commented Jul 13, 2022

Sure, if it's not exported you can have whatever. I was just wondering how to match the convenience of a range, and exporting something might help with that.

@wnoise
Copy link

wnoise commented Jul 15, 2022

Perhaps we need to retain the convenience of zeros(indices) in some way. Currently, the ways to construct such an array is
...
In the first case, providing the length is redundant given the axes.
...
The missing feature is the ability to construct such an array given just the axes, which is precisely the convenience that zeros(indices) used to provide. Perhaps the solution is simply to introduce another wrapper type named Axes and ask users to call zeros(Axes(indices)) instead of zeros(indices). The upside of this is that this is very explicit, and unlikely to come as a surprise to a reader. The downside is the proliferation of types that exist merely as convenience features.

Isn't the existing IdOffsetRange exactly the same thing as this new Axes type? A tuple of them is what gets returned for axes(::OffsetArray).

@jishnub
Copy link
Member

jishnub commented Jul 17, 2022

Yes, it seems essentially equivalent, except it doesn't subtype AbstractUnitRange. Thus, we won't see different behavior for 3:5 and OffsetArrays.IdOffsetRange(3:5).
Tbh I'm uncertain what's the best approach here. If we agree that the current behavior is convenient (I think it is), it might be possible to implement this in Base to avoid the type-piracy here (to some extent). This also obviates the need for additional types in this package. On the other hand, if the current behavior seems surprising (eg. why are zero(3:4) and zeros(3:4) different, and why is rand(3:4) something entirely different), then we may need to rethink how these methods should work.

The bigger type-piracy in this package is perhaps in similar. This might be more difficult to resolve.

@wnoise
Copy link

wnoise commented Jul 18, 2022

well, zero(3:4) should error instead of returning an array; an array is not the neutral element of a UnitRange. (There is no neutral element of UnitRange . StepRangeLen(0,0,len) is about as close as you can get, still a different type, but keeping it as a Range though.)

@aplavin
Copy link
Member

aplavin commented Jul 18, 2022

(3:4) + zero(3:4) == (3:4), so it can definitely be considered a neutral element.

If you want to require typeof(zero(x) + x) == typeof(x) for some reason, then zero methods for other types would also be removed. Among others, this identity doesn't hold for x = true, x = π and x = LinearAlgebra.I.

But it seems more useful for zero(3:4) to continue returning an abstractarray: either a vector as it does now, or a StepRangeLen for efficiency.

@wnoise
Copy link

wnoise commented Jul 19, 2022

If you want to require typeof(zero(x) + x) == typeof(x)

Indeed, I almost always do, because that's the natural reading of the docs. My fundamental point is that a range is not actually an array, not the collection of numbers it can be expanded into, but a promise about constant spacing (and density for UnitRanges). Converting to an array loses that information. There is currently a dispatch that fails (rather than merely losing efficiency) after a UnitRange is converted to an array, because it loses this information:

julia> r = 4:8
4:8

julia> mod(2, r)
7

julia> mod(2, r + zero(r))
ERROR: MethodError: no method matching mod(::Int64, ::Vector{Int64})
Closest candidates are:
  mod(::Union{Int128, Int16, Int32, Int64, Int8}, ::Unsigned) at int.jl:282
  mod(::Integer, ::Base.OneTo) at range.jl:1478
  mod(::Integer, ::AbstractUnitRange{<:Integer}) at range.jl:1479
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[27]:1

There are of course many others that merely lose efficiency -- e.g. minimum, etc.

x = true

I have always been suspicious of arithmetic on Bools. If you have to define it, it should be saturating and return Bools.

x = π

Heck, both x == x + zero(x) and the type one fails for this.

x = LinearAlgebra.I

This actually works! (except for the typeparameter changing from Bool to Int64 when unspecified, but see previous comment about arithmetic on Bools.)

julia> typeof(I)
UniformScaling{Bool}

julia> typeof(zero(I))
UniformScaling{Bool}

julia> typeof(I + zero(I))
UniformScaling{Int64}

julia> I + zero(I) == I
true

@rafaqz
Copy link
Author

rafaqz commented Jul 19, 2022

This may be an issue to raise in Base?

Here we have no ability to change the definition of zero, just zeros.

@johnnychen94
Copy link
Member

@jishnub @timholy do you think this package is stable enough to become a part of Base?

IMO similar is actually the consequence of OffsetArrays being conservative on pirating. If OffsetArray is a part of Base, it's then feasible to fix many nonconventional indexing semantics issues.

@jishnub
Copy link
Member

jishnub commented Jul 26, 2022

Certain parts from this package may be ported to Base, such as the similar and reshape methods, and IdOffsetRange. This should provide most functionalities. I'm not certain if we should add OffsetArray to Base, as ideally offset indexing is something that should be handled through traits.

@rafaqz
Copy link
Author

rafaqz commented Feb 20, 2023

Another example of this:
rafaqz/DimensionalData.jl#464 (comment)

Loading OffsetArrays.jl "fixes" a bug in Base.stack with DimensionalData.jl.

And critically, we test against OffsetArrays.jl, as AbstractArray extending packages should do.

But this will hide the bug

Base functions can pass our test suit and fail in general use because OffsetArrays.jl is loaded in our tests.

@jishnub
Copy link
Member

jishnub commented Feb 20, 2023

That's a bit unfortunate, although I think this is better fixed in Base, by defining an AbstractOneTo type that custom 1-based axes can subtype. This is the fundamental similar piracy which is at the core of this package, and might be harder to change than zeros and other convenience functions.

@rafaqz
Copy link
Author

rafaqz commented Feb 20, 2023

I'm not sure AbstractOneTo would solve this problem. The similar method in question can't be extended practically without type piracy, that only OffsetArrays.jl can do.

See: JuliaLang/julia#48736 (comment)

@jishnub
Copy link
Member

jishnub commented Feb 21, 2023

AbstractOneTo will not solve type piracy, but it'll ensure that similar for OneTo and equivalent types are always handled in Base, and only offset axes leak to this package.

@mauro3
Copy link

mauro3 commented May 24, 2024

This is another odd one:

julia> (1:10)[Base.IdentityUnitRange(5:7)]
5:7

julia> using OffsetArrays

julia> (1:10)[Base.IdentityUnitRange(5:7)]
OffsetArrays.IdOffsetRange(values=5:7, indices=5:7)

@jishnub
Copy link
Member

jishnub commented May 24, 2024

This is technically more consistent, but we should probably stop defining these methods for IdentityUnitRange.

@rafaqz
Copy link
Author

rafaqz commented May 24, 2024

But it's also worse than my example because it's changing the return type of a Base method that did not error.

To me this one is really bad.

"technically more consistent" can't apply to type piracy, it's by definition technically inconsistent.

@jishnub
Copy link
Member

jishnub commented Aug 19, 2024

How about a compromise for now: we allow projects to disable the type-piracy in OffsetArrays using Preferences.jl? This needs to be a one-time setup. This might confuse users of the package, but this might be clarified in the documentation of OffsetArrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants