Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support count min sketch data structure and most commands #2524

Open
wants to merge 39 commits into
base: unstable
Choose a base branch
from

Conversation

jonathanc-n
Copy link
Contributor

references issue: #2425

The commands, IncrBy, Info, InitByDim, InitByProb, and Query have been made.

The merge command will still need working on, and go integration tests will be added for the future. I will also probably add a cache similar to the hyperloglog for in-memory operations.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go through the detail, just some style nits firstly

src/commands/commander.h Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
@jonathanc-n
Copy link
Contributor Author

Was there a change made to GetOptions recently? There seems to be a problem after the merge.

@mapleFU
Copy link
Member

mapleFU commented Sep 7, 2024

Was there a change made to GetOptions recently? There seems to be a problem after the merge.

Yes, the operations to db requires an options here.

@jonathanc-n
Copy link
Contributor Author

jonathanc-n commented Sep 8, 2024

I believe this pr should be good for further review

@mapleFU
Copy link
Member

mapleFU commented Sep 9, 2024

Nice, I'll take a round tonight

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would take a look at redis sketch later

src/commands/cmd_cms.cc Outdated Show resolved Hide resolved
src/commands/cmd_cms.cc Outdated Show resolved Hide resolved
src/commands/cmd_cms.cc Outdated Show resolved Hide resolved
src/commands/cmd_cms.cc Outdated Show resolved Hide resolved
src/commands/cmd_cms.cc Outdated Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
src/types/cms.cc Show resolved Hide resolved
src/types/redis_cms.cc Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
@jonathanc-n
Copy link
Contributor Author

@mapleFU do the changes seem right? sorry for the bother.

@jonathanc-n
Copy link
Contributor Author

@PragmaTwice Is the workflow able to run again?

@mapleFU
Copy link
Member

mapleFU commented Sep 12, 2024

do the changes seem right? sorry for the bother.

Sorry for delaying, I'll take a pass

uint32_t width;
uint32_t depth;
uint64_t counter = 0;
std::vector<uint32_t> array;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@git-hulk @PragmaTwice

I'm thinking about handling the array. Since regarding the cmsketch as a string is ok to me, but parsing it to an std::vector is a bit weird to me. Can we take a Buffer and do zero copying when doing this?

Like:

LoadMetadata from string. When storing, forcing LittleEndian in the array
When reading from metadata, report error if not length enough, and hold a sliced buffer on underlying data
When read/write, using memcpy to avoid unaligned access.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it can be like how we handle the JSON data structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here I'm more interested in whether it's better to be a single key or putting the array in subkeys.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here I'm more interested in whether it's better to be a single key or putting the array in subkeys.

Actually I think this in single key is ok since we merely "only query metadata" if not calling "info"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the limit of the size of such an array? Do we need to consider to split it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's good to have one rocksdb key with a 50M-bytes value.

The default engine doesn't handle this well. There're some blog engine which could do this, and we can enable the blob in kvrocks here.

Perhaps we can limit the size to 1MB firstly? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Also yeah we should not put the array inside the metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may try a round tonight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll adjust it to 1MB limit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we want to hold the array in a buffer on storage, and do operations on that?

@mapleFU
Copy link
Member

mapleFU commented Sep 13, 2024

@jonathanc-n would you mind fix the lint?

src/types/cms.cc Outdated Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
src/types/cms.cc Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
@mapleFU
Copy link
Member

mapleFU commented Sep 13, 2024

I'm outing these days, so maybe late reply

@jonathanc-n
Copy link
Contributor Author

@mapleFU Alright, thanks for the reviews, I put the changes in for it.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. I'll update some comment before approve these two days

src/types/cms.cc Outdated Show resolved Hide resolved
src/types/redis_cms.cc Outdated Show resolved Hide resolved
@jonathanc-n
Copy link
Contributor Author

@mapleFU Srorry about that, fixed the lint.

@mapleFU
Copy link
Member

mapleFU commented Sep 29, 2024

No problem, we're almost their

src/types/redis_cms.cc Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
src/types/cms.h Outdated Show resolved Hide resolved
@mapleFU
Copy link
Member

mapleFU commented Sep 29, 2024

I'm a bit tired today and will revisit this tomorrow

src/types/cms.h Outdated Show resolved Hide resolved
src/types/redis_cms.cc Outdated Show resolved Hide resolved
src/types/redis_cms.cc Outdated Show resolved Hide resolved
jonathanc-n and others added 3 commits September 30, 2024 10:18
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
1. Add just IncrBy syntax (wip)
2. Extract a hash function rather than explicit xxh
3. Fix a bug in merge
@mapleFU
Copy link
Member

mapleFU commented Sep 30, 2024

I just reach home and have some minor updates, would continue tomorrow. Sorry for delaying

Comment on lines +197 to +198
// Initialize the destination CMS with the source CMSes after initializations
// since vector might resize and reallocate memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this might cause memory issue

@@ -165,33 +151,34 @@ rocksdb::Status CMS::MergeUserKeys(engine::Context &ctx, const Slice &user_key,
}

std::string dest_ns_key = AppendNamespacePrefix(user_key);
LockGuard guard(storage_->GetLockManager(), dest_ns_key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause deadlock

@jonathanc-n
Copy link
Contributor Author

@mapleFU Were you gonna make your own changes to the code?

@mapleFU
Copy link
Member

mapleFU commented Oct 11, 2024

Yes I'm back from holiday and working, would continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants