-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more detailed data structure information #161
Comments
They are allowed everywhere but as an optimization they should only appear in an empty map. The point of data HashMap k v = Empty | Tree k v
data Tree k v = ... -- the rest I don't know if that's a win or not. One related thing I've considered is to have data HashMap k v = Empty | Root k v
data Root k v = Root !Size !(Tree k v)
data Tree k v = ... if we ever wanted to stored the size (for O(1) lookup). Function like
Just like the HAMT data structure presented in the original paper: the N bits of the bitmap represent the N-subtrees that are present. The array only contains the present subtrees with no empty trees in-between. We use popcount to go between the logical and compact indices.
Are you asking which end of the hash we use first when indexing or are you asking how hashable hashes various machine word-sized types? |
On Jun 26, 2017 11:56 PM, "Johan Tibell" <notifications@github.com> wrote:
Are Empty subtrees trees allowed everywhere? Nowhere? Some places but not
others?
They are allowed everywhere but as an optimization they should only appear
in an empty map. The point of BitmapIndexed is to avoid having to store
pointers to Empty in the tree.
Well, if they're not supposed to be in non-empty maps, that can be part of
a validity check.
It would be possible to refactor the data structure to
data HashMap k v = Empty | Tree k vdata Tree k v = ... -- the rest
I don't know if that's a win or not.
Well, it adds an indirection, which would matter if you have a container of
maps. I really wish the newtype optimization were available for GADTs under
special circumstances; such circumstances apply here and would make this
free.
One related thing I've considered is to have
data HashMap k v = Empty | Root k vdata Root k v = Root !Size !(Tree k
v)data Tree k v = ...
if we ever wanted to stored the size (for O(1) lookup). Function like insert
would have to return the diff in size and update the root.
I don't see how that will work for unions or intersections.
*Exactly how is the bitmap represented?* That seems relatively tricky, and
particularly important for improving the test suite.
Just like the HAMT data structure presented in the original paper: the N
bits of the bitmap represent the N-subtrees that are present. The array
only contains the present subtrees with no empty trees in-between. We use
popcount to go between the logical and compact indices.
So the popcount of the bitmap must equal the size of the array? That sounds
like another good fact for generating test cases and checking validity.
Given a logical index, we mask off all but that many bits of the bitmap and
popcount to get the compact index? Given a compact index, I don't see a
particularly nice way to get the logical index; does that use a binary
search, or is it just never necessary?
What is the endianness of hash consumption?
Are you asking which end of the hash we use first when indexing or are you
asking how hashable hashes various machine word-sized types?
I meant which end of the hash is used first, but I guess the Hashable
question must interact with that for performance.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#161 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABzi_ZBJH_HQHyFzZpq44P1Y6D-lRj7eks5sIH14gaJpZM4OFmPM>
.
|
Sure. Perhaps add a macro-like check like we have for other things. You might have to add it to every function that mutates the map however. Or perhaps use smart constructors that validate the invariants.
I haven't thought it through. Perhaps we'd just recompute the size from scratch while doing the union. Each recursive call to union could return the subtree size.
It uses the low order bits first. See |
On Tue, Jun 27, 2017 at 8:49 PM, Johan Tibell ***@***.***> wrote:
Well, if they're not supposed to be in non-empty maps, that can be part of
a validity check.
Sure. Perhaps add a macro-like check like we have for other things. You
might have to add it to every function that mutates the map however. Or
perhaps use smart constructors that validate the invariants.
I just meant that the test suite can say `valid result .&&. ....` in each
test.
I don't see how that will work for unions or intersections.
I haven't thought it through. Perhaps we'd just recompute the size from
scratch while doing the union. Each recursive call to union could return
the subtree size.
Can't unions skip all the processing of subtrees that are only found in one
of the maps? Descending them all just to count could be pricy.
I meant which end of the hash is used first, but I guess the Hashable
question must interact with that for performance.
It uses the low order bits first. See D.HM.Base.index. Either could work,
it depends on which side you expect to be better behaved. For example, if
you were storing pointer and the Hashable instance for those was identity
then you'd expect the lowest three bits to be zero on 64-bit platforms due
to alignment.
I'd expect the pointer hash to rotate three bits to the right.
|
Possibly. As I said I haven't thought this through fully.
Yes that's a common approach (or xor high and low bits). |
Based on #154 (comment), it seems that the |
Correct. |
For reference, there was some discussion of invariants in #282 (comment). IMHO it's very important that these invariants are documented (ideally in the code), and also tested. |
Off the top of my head:
Empty
subtrees trees allowed everywhere? Nowhere? Some places but not others?The text was updated successfully, but these errors were encountered: