Higher log2m value not always producing more accurate estimates #15
Comments
Here's a test case demonstrating the issue (requires Guava to use murmur hash)...
My understanding of the HLL algorithm (which may be flawed, in which case please correct me and close this issue) is that, for any fixed set of input values, the accuracy of an estimate from an HLL built from those values should increase as the "m" value used in the HLL increases.
Ie:
In my testing, however, I'm frequently encountering situations where "smaller" HLL instances produce more accurate cardinality estimates, which I can't explain.
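To make the expectation above concrete, here is a minimal sketch of the relative standard error usually quoted for HyperLogLog, roughly 1.04 / sqrt(m) with m = 2^log2m. The class and method names are hypothetical and not part of any library. As far as I understand it, this figure describes the typical error over many inputs rather than a bound on any single estimate, so it only shows what "more accurate" would mean on average.

```java
public class HllExpectedError {

    // Relative standard error commonly quoted for HyperLogLog:
    // about 1.04 / sqrt(m), where m = 2^log2m is the number of registers.
    // Assumption: this is the textbook approximation, not a value taken
    // from any particular implementation.
    static double relativeStandardError(int log2m) {
        double m = Math.pow(2, log2m);
        return 1.04 / Math.sqrt(m);
    }

    public static void main(String[] args) {
        // Print the expected relative error for a few register counts.
        for (int log2m = 4; log2m <= 16; log2m += 4) {
            System.out.printf("log2m=%d m=%d expected relative error ~ %.4f%n",
                    log2m, 1 << log2m, relativeStandardError(log2m));
        }
    }
}
```

The printed values shrink as log2m grows, which matches the expectation stated above; whether a single fixed input set must follow that trend at every step is exactly the question here.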
I've created a reproducible test case that demonstrates the problem, which I will post as a separate comment.