feat: support count min sketch data structure and most commands #2524
base: unstable
I haven't gone through the details yet; just some style nits first.
Was there a change made to GetOptions recently? There seems to be a problem after the merge.
Yes, the operations on the db require an options argument here.
I believe this PR should be ready for further review.
Nice, I'll take a round tonight.
I'll take a look at the Redis sketch later.
@mapleFU Do the changes seem right? Sorry for the bother.
@PragmaTwice Is the workflow able to run again?
Sorry for the delay, I'll take a pass.
uint32_t width;
uint32_t depth;
uint64_t counter = 0;
std::vector<uint32_t> array;
I'm thinking about how to handle the array. Treating the cmsketch as a string is OK to me, but parsing it into a std::vector is a bit weird. Can we take a Buffer and do zero-copy access here?
Like:
1. Load the metadata from a string. When storing, force little-endian byte order in the array.
2. When reading from the metadata, report an error if the data is too short, and hold a sliced buffer over the underlying data.
3. When reading or writing, use memcpy to avoid unaligned access.
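The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: `CounterView` and its methods are hypothetical names, not part of kvrocks, and a plain `std::string` stands in for whatever buffer type the storage layer would actually provide.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical helper (not the kvrocks API): a non-owning view over counters
// stored as little-endian uint32_t values inside a byte buffer.
class CounterView {
 public:
  // Returns std::nullopt if the buffer is too short for width * depth counters.
  static std::optional<CounterView> Create(std::string *buf, uint32_t width, uint32_t depth) {
    uint64_t cells = static_cast<uint64_t>(width) * depth;
    if (cells > buf->size() / sizeof(uint32_t)) return std::nullopt;
    return CounterView(buf->data(), static_cast<size_t>(cells));
  }

  // Reads counter idx. memcpy into a local array avoids unaligned loads, and
  // the explicit shifts make the result independent of host byte order.
  uint32_t Get(size_t idx) const {
    unsigned char b[4];
    std::memcpy(b, data_ + idx * 4, 4);
    return static_cast<uint32_t>(b[0]) | (static_cast<uint32_t>(b[1]) << 8) |
           (static_cast<uint32_t>(b[2]) << 16) | (static_cast<uint32_t>(b[3]) << 24);
  }

  // Writes counter idx in little-endian order directly into the buffer.
  void Set(size_t idx, uint32_t v) {
    unsigned char b[4] = {static_cast<unsigned char>(v), static_cast<unsigned char>(v >> 8),
                          static_cast<unsigned char>(v >> 16), static_cast<unsigned char>(v >> 24)};
    std::memcpy(data_ + idx * 4, b, 4);
  }

  size_t size() const { return n_; }

 private:
  CounterView(char *data, size_t n) : data_(data), n_(n) {}
  char *data_;  // points into the caller's buffer; no copy is made
  size_t n_;
};
```

The view never copies the counters, so the caller must keep the underlying buffer alive and unmodified in size for as long as the view is used.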
Yeah it can be like how we handle the JSON data structure.
But here I'm more interested in whether it's better to be a single key or putting the array in subkeys.
> But here I'm more interested in whether it's better to be a single key or putting the array in subkeys.
Actually, I think a single key is OK here, since we rarely "only query metadata" unless calling "info".
What's the limit on the size of such an array? Do we need to consider splitting it?
I don't know if it's good to have one rocksdb key with a 50 MB value.
The default engine doesn't handle this well. There are some blob engines that could handle it, and we could enable blob storage in kvrocks here.
Perhaps we can limit the size to 1 MB at first? 🤔
Looks good to me.
Also yeah we should not put the array inside the metadata.
I may try a round tonight
I'll adjust it to a 1 MB limit.
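A size check along these lines could enforce the limit discussed here. It is a sketch under assumptions: the constant `kMaxSketchBytes` and the function `SketchFitsLimit` are hypothetical names, and the limit is assumed to apply to the raw counter array of `width * depth` 32-bit cells.

```cpp
#include <cstdint>

// Hypothetical constant: the 1 MB cap proposed in the review.
constexpr uint64_t kMaxSketchBytes = 1024 * 1024;

// Returns true if a width x depth array of uint32_t counters fits in the cap.
// Comparing cell counts (rather than multiplying by 4) keeps the arithmetic
// free of overflow for any pair of 32-bit dimensions.
inline bool SketchFitsLimit(uint32_t width, uint32_t depth) {
  if (width == 0 || depth == 0) return false;
  uint64_t cells = static_cast<uint64_t>(width) * depth;
  return cells <= kMaxSketchBytes / sizeof(uint32_t);
}
```

Under this check a 1024 x 256 sketch is accepted because it is exactly 1 MB, while 1024 x 257 is rejected.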
So do we want to hold the array in a buffer on storage, and do operations on that?
@jonathanc-n Would you mind fixing the lint?
I'm away these days, so my replies may be late.
@mapleFU Alright, thanks for the reviews. I've put the changes in.
The rest LGTM. I'll update some comments before approving in the next two days.
@mapleFU Sorry about that, fixed the lint.
No problem, we're almost there.
I'm a bit tired today and will revisit this tomorrow.
Co-authored-by: Twice <twice@apache.org>
1. Add just IncrBy syntax (wip) 2. Extract a hash function rather than explicit xxh 3. Fix a bug in merge
I just got home and have some minor updates; I'll continue tomorrow. Sorry for the delay.
// Initialize the destination CMS with the source CMSes after initializations
// since vector might resize and reallocate memory.
Previously this might cause memory issues.
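The hazard named in the code comment is that pointers or references into a `std::vector` become dangling if the vector reallocates while it is still being filled. A minimal sketch of the safe ordering follows; `Sketch`, `LoadAll`, and `CollectPointers` are hypothetical names for illustration and do not reproduce the PR's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a loaded sketch.
struct Sketch {
  uint32_t width;
  uint32_t depth;
  std::vector<uint32_t> array;
};

// Step 1: finish populating the vector. Any push_back may reallocate, so no
// pointer to an element should be taken before this function returns.
std::vector<Sketch> LoadAll(size_t n, uint32_t width, uint32_t depth) {
  std::vector<Sketch> out;
  out.reserve(n);
  for (size_t i = 0; i < n; ++i) {
    out.push_back(Sketch{width, depth, std::vector<uint32_t>(static_cast<size_t>(width) * depth, 0)});
  }
  return out;
}

// Step 2: only after the vector has reached its final size, take element
// addresses. They stay valid as long as the vector is not resized again.
std::vector<const Sketch *> CollectPointers(const std::vector<Sketch> &sketches) {
  std::vector<const Sketch *> ptrs;
  ptrs.reserve(sketches.size());
  for (const auto &s : sketches) ptrs.push_back(&s);
  return ptrs;
}
```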
src/types/redis_cms.cc
@@ -165,33 +151,34 @@ rocksdb::Status CMS::MergeUserKeys(engine::Context &ctx, const Slice &user_key,
  }

  std::string dest_ns_key = AppendNamespacePrefix(user_key);
  LockGuard guard(storage_->GetLockManager(), dest_ns_key);
This might cause a deadlock.
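The comment does not spell out the cause. One common source of deadlock when a command touches several keys (a destination plus its sources) is acquiring per-key locks in an inconsistent order. The sketch below shows the usual remedy of locking all keys in sorted order; `KeyLocks` and `OrderedMultiLock` are hypothetical classes built on `std::mutex`, not the kvrocks lock manager.

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-key mutex table.
class KeyLocks {
 public:
  std::mutex *Get(const std::string &key) {
    std::lock_guard<std::mutex> g(table_mu_);
    auto &slot = table_[key];
    if (!slot) slot = std::make_unique<std::mutex>();
    return slot.get();
  }

 private:
  std::mutex table_mu_;
  std::map<std::string, std::unique_ptr<std::mutex>> table_;
};

// Locks every key in sorted, de-duplicated order, so two callers that need
// overlapping key sets always acquire the shared locks in the same sequence.
class OrderedMultiLock {
 public:
  OrderedMultiLock(KeyLocks *locks, std::vector<std::string> keys) {
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    for (const auto &k : keys) {
      held_.push_back(locks->Get(k));
      held_.back()->lock();
    }
    keys_ = std::move(keys);
  }

  // Release in reverse acquisition order.
  ~OrderedMultiLock() {
    for (auto it = held_.rbegin(); it != held_.rend(); ++it) (*it)->unlock();
  }

  OrderedMultiLock(const OrderedMultiLock &) = delete;
  OrderedMultiLock &operator=(const OrderedMultiLock &) = delete;

  const std::vector<std::string> &keys() const { return keys_; }

 private:
  std::vector<std::mutex *> held_;
  std::vector<std::string> keys_;
};
```

De-duplicating the keys matters as well: if the destination also appears among the sources, locking it twice with a non-recursive mutex would block the caller on itself.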
@mapleFU Were you gonna make your own changes to the code?
Yes, I'm back from holiday and working on it; I'll continue.
References issue: #2425
The commands IncrBy, Info, InitByDim, InitByProb, and Query have been implemented.
The Merge command still needs work, and Go integration tests will be added later. I will also probably add a cache, similar to the one used for HyperLogLog, for in-memory operations.
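For readers unfamiliar with the data structure, here is a minimal in-memory count-min sketch covering the increment and query operations. It is an illustration under assumptions and not the PR's implementation: the class and method names are hypothetical, a salted `std::hash` stands in for the real hash function, and `FromProb` uses the textbook sizing (width = ceil(e / error), depth = ceil(ln(1 / delta))), which may differ from what the InitByProb command does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical in-memory count-min sketch. Assumes width > 0 and depth > 0.
class CountMinSketch {
 public:
  CountMinSketch(uint32_t width, uint32_t depth)
      : width_(width), depth_(depth), array_(static_cast<size_t>(width) * depth, 0) {}

  // Textbook sizing from an error bound and a failure probability (assumption).
  static CountMinSketch FromProb(double error, double delta) {
    auto w = static_cast<uint32_t>(std::ceil(std::exp(1.0) / error));
    auto d = static_cast<uint32_t>(std::ceil(std::log(1.0 / delta)));
    return CountMinSketch(w, std::max<uint32_t>(d, 1));
  }

  // Adds delta to one cell per row (saturating) and returns the new estimate.
  uint32_t IncrBy(const std::string &item, uint32_t delta) {
    constexpr uint32_t kMax = std::numeric_limits<uint32_t>::max();
    uint32_t estimate = kMax;
    for (uint32_t row = 0; row < depth_; ++row) {
      uint32_t &cell = array_[Index(item, row)];
      cell = (kMax - cell < delta) ? kMax : cell + delta;
      estimate = std::min(estimate, cell);
    }
    counter_ += delta;
    return estimate;
  }

  // The estimate is the minimum over rows; it never undercounts an item.
  uint32_t Query(const std::string &item) const {
    uint32_t estimate = std::numeric_limits<uint32_t>::max();
    for (uint32_t row = 0; row < depth_; ++row) {
      estimate = std::min(estimate, array_[Index(item, row)]);
    }
    return estimate;
  }

  uint32_t width() const { return width_; }
  uint32_t depth() const { return depth_; }
  uint64_t counter() const { return counter_; }

 private:
  // Stand-in hash: std::hash mixed with a per-row salt. A real implementation
  // would use a proper seeded hash function.
  size_t Index(const std::string &item, uint32_t row) const {
    uint64_t h = std::hash<std::string>{}(item);
    h ^= (static_cast<uint64_t>(row) + 1) * 0x9E3779B97F4A7C15ULL;
    h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 31;
    return static_cast<size_t>(row) * width_ + static_cast<size_t>(h % width_);
  }

  uint32_t width_;
  uint32_t depth_;
  uint64_t counter_ = 0;
  std::vector<uint32_t> array_;
};
```

Collisions can only inflate a cell, which is why taking the minimum across rows gives an upper-biased estimate rather than an exact count.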