
CNDB-12460: Fix PQVector reencoding when refining in CompactionGraph #1506

Open

wants to merge 1 commit into main
Conversation

@jkni commented on Jan 15, 2025

What is the issue

When refining PQVectors in CompactionGraph, indexing of vectors may fail because vectors in the graph aren't yet encoded, or vectors may incorrectly be encoded as all zeroes.

What does this PR fix and why was it fixed

When re-encoding vectors after refinement, use the previous count of compressed vectors rather than the max ordinal in the graph. Because adding to the graph is asynchronous, there may be encoded vectors that aren't in the graph. The old approach would not re-encode any vectors with ordinals above the max ordinal. Depending on timing, this could cause vectors to fail to index (if they try to compare to an ordinal that isn't in the encoded vectors) or be encoded as all zeroes (due to the code that fills holes in compressed vectors).
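A minimal sketch of the timing issue described above, under a simplified model (the names `encodedVectors`, `graphIdUpperBound`, and `refineAndReencode` are illustrative placeholders, not the actual CompactionGraph or jvector API): because graph insertion lags behind encoding, bounding the re-encode loop by the graph's max ordinal can skip vectors that are already encoded, while bounding it by the previous count of compressed vectors covers them all.

```java
// Hypothetical, simplified sketch of the bound change; not the real CompactionGraph code.
import java.util.ArrayList;
import java.util.List;

public class RefineBoundSketch {
    // Vectors that have already been PQ-encoded, in ordinal order.
    static List<float[]> originalVectors = new ArrayList<>();
    static List<byte[]> encodedVectors = new ArrayList<>();
    // The graph lags behind: nodes are added asynchronously after encoding.
    static int graphIdUpperBound = 0;

    // Stand-in for PQ encoding against the current (refined) codebook.
    static byte[] encode(float[] v) {
        return new byte[] { 1 };
    }

    // Re-encode with the refined codebook. Using encodedVectors.size() (the count of
    // already-encoded vectors) covers every vector; graphIdUpperBound may be smaller
    // because graph insertion has not caught up, which would leave the tail
    // un-reencoded and later zero-filled by the hole-filling code.
    static void refineAndReencode() {
        int bound = encodedVectors.size();   // fixed behavior
        // int bound = graphIdUpperBound;    // old, buggy behavior
        for (int ordinal = 0; ordinal < bound; ordinal++) {
            encodedVectors.set(ordinal, encode(originalVectors.get(ordinal)));
        }
    }

    public static void main(String[] args) {
        // Three vectors encoded, but only two added to the graph so far.
        for (int i = 0; i < 3; i++) {
            originalVectors.add(new float[] { i });
            encodedVectors.add(encode(originalVectors.get(i)));
        }
        graphIdUpperBound = 2;
        refineAndReencode(); // re-encodes all 3 vectors, not just the first 2
        System.out.println("re-encoded " + encodedVectors.size() + " vectors");
    }
}
```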

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

…already encoded vectors rather than the max ordinal in the graph, as addition to the graph is asynchronous and some encoded vectors may not yet be indexed.
@jbellis commented on Jan 15, 2025

Restating the problem:

Although compressedVectors.count() == builder.getGraph().getIdUpperBound() when all insertions have completed, these are not equivalent during construction since addGraphNode (which modifies the graph upper bound) is called separately and asynchronously, so refine() needs to use the former instead of the latter.

Fix LGTM.
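To restate that invariant as code (a hedged sketch with placeholder parameter names, not the real CompactionGraph or jvector fields): the compressed-vector count can only run ahead of the graph's id upper bound during construction, and the two converge once all asynchronous insertions complete, which is why refine() must bound its loop by the count.

```java
// Illustrative invariant check only; parameter names are placeholders.
public class RefineInvariantSketch {
    static void checkInvariant(int compressedVectorCount, int graphIdUpperBound, boolean insertionsComplete) {
        // Encoding happens before the asynchronous addGraphNode call, so the
        // count of compressed vectors is always >= the graph's id upper bound.
        assert compressedVectorCount >= graphIdUpperBound;
        // Only after every insertion has completed are the two equal.
        if (insertionsComplete) {
            assert compressedVectorCount == graphIdUpperBound;
        }
    }

    public static void main(String[] args) {
        checkInvariant(3, 2, false); // mid-construction: count runs ahead
        checkInvariant(3, 3, true);  // after all insertions: equal
    }
}
```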

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1506 rejected by Butler


1 new test failure in 1 build
See build details here


Found 1 new test failure

| Test | Explanation | Branch history | Upstream history |
| --- | --- | --- | --- |
| ...ToolEnableDisableBinaryTest.testMaybeChangeDocs | regression | 🔴 | 🔵🔵🔵🔵🔵🔵🔵 |

Found 3 known test failures
