Convert all tables to use UTF8MB4 charset #9767

Open
wants to merge 1 commit into
base: main

ryanb-gds
Contributor

We have recently seen an increasing number of errors caused by publishers attempting to use 4-byte characters in Whitehall. This commit adds a data migration that converts all of the tables to the UTF8MB4 charset. It is a data migration rather than a schema migration because data migrations are not subject to the 15-minute limit applied to schema migrations in the Kubernetes configuration.
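
To illustrate the kind of input that triggers these errors, here is a minimal sketch that finds characters needing four bytes in UTF-8. It assumes the tables currently use MySQL's legacy 3-byte `utf8` (utf8mb3) charset, which the description implies but does not state. The helper names `four_byte_chars` and `needs_utf8mb4?` are hypothetical and not part of Whitehall.

```ruby
# Hypothetical helpers, for illustration only.
# Assumption: the existing charset stores at most 3 bytes per character
# (MySQL's legacy "utf8" / utf8mb3), so 4-byte characters are rejected.

# Returns every character in `text` whose UTF-8 encoding is longer than 3 bytes.
def four_byte_chars(text)
  text.each_char.select { |char| char.bytesize > 3 }
end

# True when `text` contains at least one character that only utf8mb4 can store.
def needs_utf8mb4?(text)
  !four_byte_chars(text).empty?
end
```

Emoji and other characters outside the Basic Multilingual Plane fall into this category, while accented Latin letters and the euro sign do not.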

The migration will need to be run out of hours as the tables will need to be locked during conversion.
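
The conversion described above might look roughly like the following sketch. It is not the code from this pull request: `convert_tables_to_utf8mb4` is a hypothetical name, `connection` is assumed to behave like `ActiveRecord::Base.connection` on MySQL (responding to `tables`, `execute` and `quote_table_name`), and re-enabling the foreign key checks afterwards is an addition made here for safety. No collation is specified, so MySQL's default for utf8mb4 would apply.

```ruby
# Hypothetical sketch of the conversion; not the migration from this PR.
# Assumption: `connection` quacks like ActiveRecord::Base.connection on MySQL.
def convert_tables_to_utf8mb4(connection)
  # String foreign keys would otherwise block conversion with a charset mismatch.
  connection.execute("SET foreign_key_checks = 0;")

  connection.tables.each do |table|
    quoted = connection.quote_table_name(table)
    # CONVERT TO rewrites the table default and every character column.
    connection.execute("ALTER TABLE #{quoted} CONVERT TO CHARACTER SET utf8mb4;")
  end
ensure
  # Restore the checks even if one of the ALTER statements fails.
  connection.execute("SET foreign_key_checks = 1;")
end
```

Each `ALTER TABLE ... CONVERT TO` statement rebuilds its table, which is consistent with the note that the migration needs to run out of hours.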

Trello: https://trello.com/c/aTlLraLX

ryanb-gds marked this pull request as ready for review on January 3, 2025, 16:22
Contributor

ChrisBAshton left a comment

LGTM. Please drop commits 2 and 3 though, not much value in committing those.

# Disable foreign key constraints because foreign keys using strings will prevent conversion due to a charset mismatch.
connection.execute "SET foreign_key_checks = 0;"
connection.tables.each do |table|
  next if table == "schema_migrations"
Contributor

Probably worth a quick inline comment on why this is necessary

Contributor Author

That's a good point. Now that this is a data migration, the line isn't necessary any longer. I think I'm right in saying that Active Record migrations lock the migrations table while a migration is running (or used to; I don't know if they still do) to stop multiple migrations running simultaneously. That lock would prevent the conversion from running for that table. Data migrations don't appear to do that, so we no longer need that line.
