Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LegacySSTableTest method testVerifyOldTupleSSTables for frozen tuples #1440

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

michaelsembwever
Copy link
Member

@michaelsembwever michaelsembwever commented Nov 26, 2024

What is the issue

DSP-24600 Unreadable SSTables upgrading from DSE 5.1.x to DSE 6.8.36 and up

All the SSTables for the table that has the tuple (with something freezable inside it) is unreadable with newer versions. While it may report these SSTables as corrupt, they are not – as they have not been altered in any way. A rollback to the previous version will make them readable again. In an online upgrade we expect this problem to be identified when the first node has been upgraded, and definitely before the cluster is finished upgraded, and rolling back an in-progress upgrade is normal (though not ideal) operation.

What does this PR fix and why was it fixed

Adds unit test to reproduce the issue.
Fix is @roxananeo's commit, formats me and earlier are marked as having implicitly frozen tuples.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

roxananeo and others added 5 commits November 28, 2024 10:29
When upgrading from 5.1 to HCD we noticed that UDTs are not correctly serialized/deserialized. This is fixable by allowing the fix() function
from the SerializationHeader class to execut which in turn will happen if the `hasImplicitFrozenTuples` returns true.
As this function was returning always false the fix() was never attempted.
@michaelsembwever michaelsembwever marked this pull request as ready for review November 30, 2024 14:11
@michaelsembwever
Copy link
Member Author

@jacek-lewandowski jacek-lewandowski self-requested a review December 5, 2024 07:05
@@ -348,7 +348,7 @@ static class TrieIndexVersion extends Version
hasOriginatingHostId = version.matches("(a[d-z])|(b[b-z])") || version.compareTo("ca") >= 0;
hasMaxColumnValueLengths = version.matches("b[a-z]"); // DSE only field
correspondingMessagingVersion = version.compareTo("ca") >= 0 ? MessagingService.VERSION_SG_10 : MessagingService.VERSION_3014;
hasExplicitlyFrozenTuples = version.compareTo("cc") < 0 || version.compareTo("da") >= 0; // we don't know if what DA is going to be eventually, but it is almost certain it will not include explicitly frozen tuples
hasImplicitlyFrozenTuples = version.compareTo("cc") < 0 || version.compareTo("da") >= 0; // we don't know if what DA is going to be eventually, but it is almost certain it will not include implicitly frozen tuples
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the comment is probably off, hmm...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is - "DA ... will include implicitly frozen tuples" (because the patch was not applied on OSS... yet)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will adjust

@@ -84,7 +84,7 @@ public static void fixNonFrozenUDTIfUpgradeFrom30()
CassandraVersion previousVersion = new CassandraVersion(previousVersionString);
if (previousVersion.major != 3 || previousVersion.minor > 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo, this also applies to 3.1 to 3.5
rather than support that, we should probably just add an assertion here that we're not upgrading from those versions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest thing we could do is to reject ma sstables as we would never know whether they are fixed or not. For the safety, the user needs to convert such tables manually with its knowledge of the history

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update:

based on agreement to drop support for big sstables before me (which makes sense as HCD-1.1 only needs to support upgrades from 3.0.25+, 3.11.11+ and DSE 5.1)

  • BigFormat.earliest_supported_version raised to me
    • need to investigate if scrub honours this too, and that means SSTableHeaderFix can be deleted altogether,
  • SSTableHeaderFix is no longer called from CassandraDaemon.
  • add startup checks that we're not upgrading from <3.0.25 or <3.11.11

@djatnieks
Copy link

Will porting this to main-5.0 require special consideration?

@michaelsembwever
Copy link
Member Author

Will porting this to main-5.0 require special consideration?

No, main-5.0 only further raises online compatibility to big format na
And for HCD, we're not initially planning to support upgrades from C* 3.x or DSE 5.1

Furthermore SSTableHeaderFix has been removed altogether there, so we're converging.
I need to investigate if stuff in the tests warrant upstreaming… (I would say we should be using LegacySSTableTest a lot more! given how efficient it is at catching upgrade sstable issues)

@michaelsembwever
Copy link
Member Author

There's a few test failures here, so still wip…

…nly from versions >=3.0.25, >=3.11.11, and dse-5.1

this removes the need for, and all bugs related to, SSTableHeaderFix
startup with fail hard if detecting a previous version <3.0.25 or <3.11.11
remove all test data for now unsupported sstable formats
…rted_version applies to both online and offline
Copy link

sonarqubecloud bot commented Dec 7, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
41.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@michaelsembwever
Copy link
Member Author

There's a few test failures here, so still wip…

most of the failures (maybe all) are fixed…

Copy link
Collaborator

@roxananeo roxananeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

sonarqubecloud bot commented Jan 9, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
41.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1440 rejected by Butler


6 new test failure(s) in 11 builds
See build details here


Found 6 new test failures

Test Explanation Branch history Upstream history
...estMaxSSTablesToCompact[useDiskBoundaries true] regression 🔴🔴🔴🔴🔴🔴🔴 🔵🔵🔵🔵🔵🔵🔵
...tHybridSearchHoleInClusteringColumnOrdering[dc] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...t,wide=false,scenario=MEMTABLE_QUERY] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...ii,ascii>>,wide=false,scenario=COMPACTED_QUERY] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...p<int,int>>>,wide=true,scenario=MEMTABLE_QUERY] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...t,varint>,wide=false,scenario=POST_BUILD_QUERY] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 57 known test failures

@@ -197,13 +197,16 @@ public PartitionIndexIterator indexIterator(Descriptor descriptor, TableMetadata
static class BigVersion extends Version
{
public static final String current_version = "nb";
public static final String earliest_supported_version = "ma";
public static final String earliest_supported_version = "me";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I don't see a reason to limit to me. This would be incompatible with OSS as even Cassandra 5.0 still supports ma.

Copy link
Member Author

@michaelsembwever michaelsembwever Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I don't see a reason to limit to me.

The server can no longer run SSTableHeaderFix as it breaks good sstables.
All sstables < me can be assumed to be bad sstables (if the schema contains the issue).

Astra no longer needs to upgrade from, or process, me sstables anymore. So in the CC codebase this becomes a concern for HCD and for sideloader (and anyone else that uses the CC codebase?)

When an upgraded HCD node starts and sees a sstable < me we benefit from fail-fast. This informs the user that they must upgrade (header fix + scrub) these sstables before performing the upgrade.

It might be possible to do this in other ways (e.g. StartupChecks), and if we want to restore the SSTableHeaderFix for the sake of tooling and offline upgradability, that's a valid discussion. But the PR presents the simpler approach first.

This would be incompatible with OSS as even Cassandra 5.0 still supports ma.

No. CC can provide a compatibility subset of what C* 5.0 provides.
(More importantly, 5.0 server-side doesn't support < me sstables when the issue exists, regardless of what minimum_version pretends.)

Also, C* 5.0 (and CC main-5) has already removed SSTableHeaderFix.

Copy link
Collaborator

@jacek-lewandowski jacek-lewandowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on removing SSTableHeaderFix. I don't understand the advantages of this step. Instead, we could limit what it does to just fix the UT columns and do not touch the collections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants