Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more detailed data structure information #161

Open
treeowl opened this issue Jun 26, 2017 · 9 comments
Open

Add more detailed data structure information #161

treeowl opened this issue Jun 26, 2017 · 9 comments

Comments

@treeowl
Copy link
Collaborator

treeowl commented Jun 26, 2017

Off the top of my head:

  • Are Empty subtrees trees allowed everywhere? Nowhere? Some places but not others?
  • Exactly how is the bitmap represented? That seems relatively tricky, and particularly important for improving the test suite.
  • What is the endianness of hash consumption?
@tibbe
Copy link
Collaborator

tibbe commented Jun 27, 2017

Are Empty subtrees trees allowed everywhere? Nowhere? Some places but not others?

They are allowed everywhere but as an optimization they should only appear in an empty map. The point of BitmapIndexed is to avoid having to store pointers to Empty in the tree. It would be possible to refactor the data structure to

data HashMap k v = Empty | Tree k v
data Tree k v = ...  -- the rest

I don't know if that's a win or not. One related thing I've considered is to have

data HashMap k v = Empty | Root k v
data Root k v = Root !Size !(Tree k v)
data Tree k v = ...

if we ever wanted to stored the size (for O(1) lookup). Function like insert would have to return the diff in size and update the root.

Exactly how is the bitmap represented? That seems relatively tricky, and particularly important for improving the test suite.

Just like the HAMT data structure presented in the original paper: the N bits of the bitmap represent the N-subtrees that are present. The array only contains the present subtrees with no empty trees in-between. We use popcount to go between the logical and compact indices.

What is the endianness of hash consumption?

Are you asking which end of the hash we use first when indexing or are you asking how hashable hashes various machine word-sized types?

@treeowl
Copy link
Collaborator Author

treeowl commented Jun 27, 2017 via email

@tibbe
Copy link
Collaborator

tibbe commented Jun 28, 2017

Well, if they're not supposed to be in non-empty maps, that can be part of
a validity check.

Sure. Perhaps add a macro-like check like we have for other things. You might have to add it to every function that mutates the map however. Or perhaps use smart constructors that validate the invariants.

I don't see how that will work for unions or intersections.

I haven't thought it through. Perhaps we'd just recompute the size from scratch while doing the union. Each recursive call to union could return the subtree size.

I meant which end of the hash is used first, but I guess the Hashable
question must interact with that for performance.

It uses the low order bits first. See D.HM.Base.index. Either could work, it depends on which side you expect to be better behaved. For example, if you were storing pointer and the Hashable instance for those was identity then you'd expect the lowest three bits to be zero on 64-bit platforms due to alignment.

@treeowl
Copy link
Collaborator Author

treeowl commented Jun 28, 2017 via email

@tibbe
Copy link
Collaborator

tibbe commented Jun 28, 2017

Can't unions skip all the processing of subtrees that are only found in one
of the maps? Descending them all just to count could be pricy.

Possibly. As I said I haven't thought this through fully.

I'd expect the pointer hash to rotate three bits to the right.

Yes that's a common approach (or xor high and low bits).

@sjakobi
Copy link
Member

sjakobi commented Jul 13, 2020

Are Empty subtrees trees allowed everywhere? Nowhere? Some places but not others?

They are allowed everywhere but as an optimization they should only appear in an empty map. The point of BitmapIndexed is to avoid having to store pointers to Empty in the tree.

Based on #154 (comment), it seems that the Eq instance relies on Empty only existing at the top-level, not as a subtree.

@treeowl
Copy link
Collaborator Author

treeowl commented Jul 13, 2020

Correct. Empty only for the whole tree.

@sjakobi
Copy link
Member

sjakobi commented Jul 30, 2020

For reference, there was some discussion of invariants in #282 (comment).

IMHO it's very important that these invariants are documented (ideally in the code), and also tested.

@sjakobi
Copy link
Member

sjakobi commented May 3, 2022

#444 has made some progress on documenting and checking the invariants.

Better documentation in proper prose and with examples is still needed though.

EDIT: #425 and #450 are related.

@tibbe tibbe removed their assignment May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants