Convert all tables to use UTF8MB4 charset
We have recently seen an increasing number of errors caused by publishers attempting to use 4-byte characters in Whitehall.
This commit adds a data migration which will convert all of the tables to use the UTF8MB4 charset.
A data migration is used because it is not subject to the 15-minute limit applied to schema migrations in the Kubernetes configuration.

The migration will need to be run out of hours as the tables will need to be locked during conversion.
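
As a rough illustration of what "4-byte characters" means here: MySQL's legacy `utf8` charset stores at most 3 bytes per character, so any character whose UTF-8 encoding is 4 bytes long (emoji, for example) cannot be stored in it. The sketch below is not part of this commit; `four_byte_chars` is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical helper (not part of this commit): list the characters in a
# string whose UTF-8 encoding is 4 bytes long. These are the characters
# that MySQL's legacy 3-byte utf8 charset cannot store.
def four_byte_chars(text)
  text.each_char.select { |char| char.bytesize == 4 }
end

# U+1F600 (an emoji) needs 4 bytes in UTF-8, so it is reported.
four_byte_chars("Hello \u{1F600}")

# U+00E9 ("e" with an acute accent) needs only 2 bytes, so nothing is reported.
four_byte_chars("caf\u00E9")
```

A check like this could help find affected content before the conversion, but it is only a sketch of the idea and not something Whitehall is known to do.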
ryanb-gds committed Jan 3, 2025
1 parent 664e8f9 commit 93b7648
Showing 3 changed files with 185 additions and 174 deletions.
2 changes: 1 addition & 1 deletion config/database.yml
@@ -1,5 +1,5 @@
 default: &default
-  encoding: utf8
+  encoding: utf8mb4
   adapter: mysql2
   prepared_statements: true
   variables:
@@ -0,0 +1,11 @@
+# Convert all tables to utf8mb4 in order to improve support for non-English languages.
+
+connection = ActiveRecord::Base.connection
+# Disable foreign key constraints because foreign keys using strings will prevent conversion due to a charset mismatch.
+connection.execute "SET foreign_key_checks = 0;"
+connection.tables.each do |table|
+  puts "START: Converting table #{table} to utf8mb4"
+  connection.execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
+  puts "END: Converted table #{table} to utf8mb4"
+end
+connection.execute "SET foreign_key_checks = 1;"
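
One possible hardening, sketched here as a suggestion and not as what the commit does: if an `ALTER TABLE` raises partway through, the script above never reaches the final `SET foreign_key_checks = 1;`. Wrapping the loop in a method with an `ensure` clause would re-enable the checks either way. `FakeConnection` and `convert_tables_to_utf8mb4` are hypothetical names; the fake stands in for `ActiveRecord::Base.connection` so the sketch runs without a database.

```ruby
# Hypothetical stand-in for ActiveRecord::Base.connection. It records every
# statement it is asked to execute and can simulate a failure on one table.
class FakeConnection
  attr_reader :tables, :statements

  def initialize(tables, failing_table: nil)
    @tables = tables
    @failing_table = failing_table
    @statements = []
  end

  def execute(sql)
    @statements << sql
    raise "simulated failure" if @failing_table && sql.include?("`#{@failing_table}`")
  end
end

# Same statements as the data migration, but foreign key checks are restored
# in an ensure clause so they are re-enabled even if a conversion raises.
def convert_tables_to_utf8mb4(connection)
  connection.execute "SET foreign_key_checks = 0;"
  connection.tables.each do |table|
    connection.execute "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
  end
ensure
  connection.execute "SET foreign_key_checks = 1;"
end

# Usage: the second table fails, yet the checks are still switched back on.
connection = FakeConnection.new(%w[editions documents], failing_table: "documents")
begin
  convert_tables_to_utf8mb4(connection)
rescue RuntimeError => e
  puts "Conversion stopped: #{e.message}"
end
puts connection.statements.last
```

Whether this matters in practice depends on how the data migration's connection is handled after a failure, which this page does not show.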