Convert all tables to use UTF8MB4 charset
We have recently seen an increasing number of errors caused by publishers attempting to use 4-byte characters in Whitehall.
This commit adds a data migration which converts all of the tables to the UTF8MB4 charset.
It is a data migration rather than a schema migration because data migrations are not subject to the 15-minute limit that the Kubernetes configuration applies to schema migrations.

The migration must be run out of hours because the tables are locked during conversion.
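
For context on the 4-byte characters mentioned above: in MySQL versions to date, the `utf8` charset is an alias for `utf8mb3`, which stores at most three bytes per character, so code points above U+FFFF (most emoji, for example) cannot be stored. The following is a minimal Ruby sketch for spotting such input. The helper name `four_byte_chars` is hypothetical and is not part of this commit.

```ruby
# Hypothetical helper (not part of this commit): returns the characters in a
# string that need four bytes in UTF-8, i.e. code points above U+FFFF.
def four_byte_chars(text)
  text.each_char.select { |char| char.ord > 0xFFFF }
end

# U+2615 (hot beverage) fits in three bytes; U+1F600 (grinning face) does not.
puts four_byte_chars("Caf\u00E9 \u2615 \u{1F600}").inspect
```

A check like this could be used to confirm which submitted strings would have been rejected before the conversion.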
ryanb-gds committed Dec 20, 2024
1 parent 6d6c23c commit 1d3bfa7
Showing 3 changed files with 188 additions and 175 deletions.
2 changes: 1 addition & 1 deletion config/database.yml
@@ -1,5 +1,5 @@
 default: &default
-  encoding: utf8
+  encoding: utf8mb4
   adapter: mysql2
   prepared_statements: true
   variables:
@@ -0,0 +1,13 @@
# Convert all tables to utf8mb4 in order to improve support for non-English languages.

connection = ActiveRecord::Base.connection
# Disable foreign key constraints because foreign keys using strings will prevent conversion due to a charset mismatch.
connection.execute "SET foreign_key_checks = 0;"
connection.tables.each do |table|
  next if table == "schema_migrations"

  puts "START: Converting table #{table} to utf8mb4"
  connection.execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
  puts "END: Converted table #{table} to utf8mb4"
end
connection.execute "SET foreign_key_checks = 1;"
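
The script above re-enables foreign key checks only if every conversion succeeds. One possible variant, sketched here as an assumption rather than as the committed code, wraps the loop in a method with an `ensure` clause so the checks are restored even when an `ALTER TABLE` raises. The method name `convert_tables_to_utf8mb4` and its `skip:` parameter are hypothetical.

```ruby
# Sketch only: `convert_tables_to_utf8mb4` is a hypothetical wrapper, not code
# from this commit. `connection` is any object responding to `tables` and
# `execute`, such as ActiveRecord::Base.connection.
def convert_tables_to_utf8mb4(connection, skip: ["schema_migrations"])
  # Foreign keys on string columns block conversion due to a charset mismatch.
  connection.execute "SET foreign_key_checks = 0;"

  (connection.tables - skip).each do |table|
    connection.execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
  end
ensure
  # Runs whether or not a conversion raised, so the checks are always restored.
  connection.execute "SET foreign_key_checks = 1;"
end
```

Without the `ensure`, a failure partway through would leave the checks disabled for the remainder of that database session.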