
6.0.0

@github-actions github-actions released this 19 Mar 09:28
· 32 commits to master since this release

[Redshift-only] New migration mechanism & recovery tables

Previously, Redshift loaders would migrate the shredded table to the latest available schema version. This could lead to a race condition between transformer & loader.

As of 6.0.0, the loader migrates the shredded table to the latest schema version discovered in the shredding_complete payload (rather than the latest existing version). In addition, thanks to the new file hierarchy described below, the loader can issue one COPY statement per schema version, which lets it decide on the exact set of columns for each statement.
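The selection logic described above can be sketched roughly as follows. This is an illustration only: it assumes the shredding_complete payload yields the batch's schema keys as Iglu URIs of the form iglu:vendor/name/format/model-revision-addition, and the function names (parse_key, migration_targets, copy_groups) are hypothetical, not the loader's actual API.

```python
def parse_key(schema_key):
    # Split 'iglu:vendor/name/format/M-R-A' into (vendor, name, (M, R, A)).
    body = schema_key[len("iglu:"):] if schema_key.startswith("iglu:") else schema_key
    vendor, name, _format, version = body.split("/")
    model, revision, addition = (int(part) for part in version.split("-"))
    return vendor, name, (model, revision, addition)


def migration_targets(batch_keys):
    # For each shredded table (vendor, name, model), pick the highest
    # version seen in this batch, not the highest version that exists.
    targets = {}
    for key in batch_keys:
        vendor, name, version = parse_key(key)
        table = (vendor, name, version[0])
        if table not in targets or version > targets[table]:
            targets[table] = version
    return targets


def copy_groups(batch_keys):
    # One entry per exact schema version: each would get its own COPY.
    return sorted({parse_key(key) for key in batch_keys})
```

Because the target is derived from the batch itself, a schema version published after the transformer ran cannot affect how that batch is loaded.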

We are also introducing a new mechanism, recovery tables, to prevent the loader from failing when a schema has not been evolved correctly. You can find more information about it here.

[Redshift-only] Monitoring recovery tables

Previous versions printed the table name to stdout. As of 6.0.0, if an event is loaded to a recovery table, the name of that recovery table is printed instead.

If a webhook is configured, recent previous versions used the load_succeeded/3-0-0 schema to report information about a successful load.

As of 6.0.0, the loader uses the load_succeeded/3-0-1 schema, which adds the $.recoveryTableNames key to report the names of the recovery tables loaded in the batch (breaking schema keys from the shredding_complete payload).
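A webhook consumer could read the new key along these lines. This is a sketch under assumptions: it supposes the webhook body is a self-describing JSON object with a top-level "data" field and that recoveryTableNames sits directly under it. The function name recovery_tables and the sample payloads are hypothetical.

```python
import json


def recovery_tables(message):
    # Return the recovery table names reported for a batch.
    # Payloads that lack the key (for example 3-0-0) yield an empty list.
    payload = json.loads(message)
    data = payload.get("data", {})
    return list(data.get("recoveryTableNames", []))
```

Defaulting to an empty list keeps the consumer compatible with both schema versions while a deployment is being upgraded.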

[Redshift-only] $.featureFlags.disableMigration configuration

RDB Loader 6.0.0 introduces a new configuration option, $.featureFlags.disableMigration, which takes a list of schema criteria to disable migration for.

For the provided schema criteria only, RDB Loader will neither migrate the corresponding shredded table nor create recovery tables for breaking schema versions. The loader will attempt to load into the corresponding shredded table without migrating it.

This is useful if you have older schemas with breaking changes and don’t want the loader to apply the new logic to them.
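To illustrate how a schema criterion selects schema versions, here is a small matcher. It assumes criteria are written in the usual Iglu style, such as iglu:com.acme/button_click/jsonschema/1-*-*, where * stands for any value of that version part. The function name matches is hypothetical and this is not the loader's own implementation.

```python
def matches(criterion, schema_key):
    # True if schema_key falls under the criterion.
    # Vendor, name and format must be equal; each version part of the
    # criterion must either be '*' or equal the key's version part.
    c_vendor, c_name, c_format, c_version = criterion[len("iglu:"):].split("/")
    k_vendor, k_name, k_format, k_version = schema_key[len("iglu:"):].split("/")
    if (c_vendor, c_name, c_format) != (k_vendor, k_name, k_format):
        return False
    return all(
        c == "*" or c == k
        for c, k in zip(c_version.split("-"), k_version.split("-"))
    )


def migration_disabled(criteria, schema_key):
    # True if any configured criterion covers the schema key.
    return any(matches(criterion, schema_key) for criterion in criteria)
```

Under this reading, a criterion ending in 1-*-* would opt every 1-x-y version of that schema out of the new migration logic, while 2-x-y versions would still be migrated.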

New file hierarchy for shredded events

Until now, both batch and stream transformers wrote shredded events using the following scheme:

vendor/name/model

As of 6.0.0, all transformers use the following scheme:

vendor/name/model/revision/addition

This increases the granularity of the output: events for each schema version are written to their own path, which is what allows the loader to issue one COPY statement per schema version.
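The difference between the two schemes can be illustrated with a short helper. This is a sketch only: it assumes a schema key of the form iglu:vendor/name/format/model-revision-addition and joins the parts with plain slashes, so the real output paths may carry extra prefixes or key names. The function name shredded_path is hypothetical.

```python
def shredded_path(schema_key, legacy=False):
    # Build the relative output path for a schema key.
    # legacy=True  -> vendor/name/model (before 6.0.0)
    # legacy=False -> vendor/name/model/revision/addition (6.0.0 onwards)
    vendor, name, _format, version = schema_key[len("iglu:"):].split("/")
    model, revision, addition = version.split("-")
    parts = [vendor, name, model]
    if not legacy:
        parts += [revision, addition]
    return "/".join(parts)
```

With the old scheme, versions 1-0-0 and 1-0-2 of com.acme/button_click share the path com.acme/button_click/1. With the new scheme they land in com.acme/button_click/1/0/0 and com.acme/button_click/1/0/2 respectively.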

Removal of padding \N char

Transformers write events to S3 to be loaded by Redshift. For the loading command to work, all events at a given path (e.g. com.acme/button_click/1) must follow the same format. A batch, however, may contain events with different versions of a given schema. In particular, events with a newer schema might have new fields not present in the events with an older one.

Previously, transformers solved this problem by formatting all events according to the latest version of the schema and using the \N character in case of missing fields.

As of 6.0.0, this padding is no longer needed because, as explained above, events using different versions of a schema are written to different paths.
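The contrast between the two approaches can be sketched as follows. The example assumes tab-separated output and represents each event as a dict with a "version" and a "fields" mapping. These shapes, and the names pad_to_latest and rows_by_version, are hypothetical and chosen only for illustration.

```python
NULL = "\\N"  # the two-character marker: backslash followed by N


def pad_to_latest(events, latest_columns):
    # Old behaviour: every row uses the latest version's columns and
    # fields missing from older events are padded with the null marker.
    return [
        "\t".join(str(event["fields"].get(col, NULL)) for col in latest_columns)
        for event in events
    ]


def rows_by_version(events, columns_by_version):
    # New behaviour: rows are grouped by schema version and each group
    # only carries the columns that version defines, so no padding.
    grouped = {}
    for event in events:
        columns = columns_by_version[event["version"]]
        row = "\t".join(str(event["fields"][col]) for col in columns)
        grouped.setdefault(event["version"], []).append(row)
    return grouped
```

For brevity the second function assumes every field of a version is present in the event. It only shows that padding for fields added by newer versions is no longer required.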

New license

Following our recent licensing announcement, RDB Loader is now released under the Snowplow Limited Use License Agreement.

Changelog

  • Bump AWS SDK to 1.12.677 (#1344)
  • Bump commons-compress to 1.26.0 (#1344)
  • Bump nimbus-jose-jwt to 9.37.2 (#1344)
  • Add mandatory SLULA license acceptance flag (#1344)
  • Bump schema-ddl to 0.22.1 (#1342)
  • Bump AWS SDK to 2.23.17 (#1339)
  • pubsub transformer: increase subscriber's awaitTerminatePeriod (#1328)
  • pubsub transformer: Increase default value of minDurationPerAckExtension (#1326)
  • Loader: Fix column names for shredded tables (#1332)
  • Redshift loader: send statsd metrics for recovery tables (#1331)
  • Quote column names in Redshift load statements (#1330)
  • Loader: Report recovery table names in load_succeeded payload (#1318)
  • Loader: Fix table name in COPY logs (#1316)
  • Upgrade schema-ddl to 0.20.0 (#1265)
  • Move to Snowplow Limited Use License (#1345)