- BUG FIX: `multipleOf` validation
  - FIX LINK
  - Due to floating point errors in Python and JSONSchema, `multipleOf` validation has been failing.
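To illustrate the kind of floating point error involved, here is a minimal sketch. The function names are hypothetical and this is not necessarily how the fix itself is implemented; it only shows one common workaround, which is to compare via `decimal.Decimal` built from the string form of each number.

```python
from decimal import Decimal


def is_multiple_of_float(value, multiple):
    # Naive check using binary floating point arithmetic.
    return value % multiple == 0


def is_multiple_of_decimal(value, multiple):
    # Convert through str() so that 0.1 becomes Decimal("0.1") rather than
    # the long binary approximation of 0.1.
    return Decimal(str(value)) % Decimal(str(multiple)) == 0


# 0.3 is "obviously" a multiple of 0.1, yet the float check rejects it.
float_result = is_multiple_of_float(0.3, 0.1)
decimal_result = is_multiple_of_decimal(0.3, 0.1)
```

With these definitions, the float-based check rejects `0.3` as a multiple of `0.1`, while the decimal-based check accepts it.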
- FEATURES:
  - `JSONSchema`: `anyOf` Support
    - Streamed `JSONSchema`s which include `anyOf` combinations should now be fully supported
    - This allows for full support of Stitch/Singer's `DateTime` string fallbacks.
  - `JSONSchema`: `allOf` Support
    - Streamed `JSONSchema`s which include `allOf` combinations should now be fully supported
    - Columns are persisted as normal.
    - This is perceived to be most useful for merging objects, and putting in place things like `maxLength` etc.
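As a rough illustration of a `DateTime` string fallback expressed with `anyOf`, the sketch below pairs a `date-time` branch with a plain nullable string branch. The schema shape and the helper names are assumptions for illustration, and the matcher is deliberately simplified (it only understands `string`/`null` types and the `date-time` format); a real target would rely on a full JSON Schema validator.

```python
from datetime import datetime

# Hypothetical schema: try date-time first, fall back to any string or null.
DATE_TIME_WITH_FALLBACK = {
    "anyOf": [
        {"type": "string", "format": "date-time"},
        {"type": ["string", "null"]},
    ]
}


def is_date_time(value):
    # Simplification: accept whatever datetime.fromisoformat can parse.
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def branch_matches(value, branch):
    types = branch["type"]
    if not isinstance(types, list):
        types = [types]
    if value is None:
        return "null" in types
    if isinstance(value, str) and "string" in types:
        return branch.get("format") != "date-time" or is_date_time(value)
    return False


def first_matching_branch(value, schema):
    # Return the index of the first anyOf branch the value satisfies.
    for index, branch in enumerate(schema["anyOf"]):
        if branch_matches(value, branch):
            return index
    return None
```

A well-formed timestamp matches the first branch, while an unparseable string or a null falls through to the second branch instead of failing outright.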
- BUG FIX: Buffer Flushing at frequent intervals/with small batches
  - FIX LINK
  - Buffer size calculations relied upon some "sophisticated" logic for determining the "size" in memory of a Python object
  - The method used by Singer libraries is to simply use the size of the streamed `JSON` blob
  - Performance Improvement seen due to batches now being far larger and interactions with the remote being far fewer.
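The idea of sizing the buffer by the streamed JSON blob can be sketched as follows. `SizedBuffer` and `max_bytes` are hypothetical names used only for illustration; the actual buffer in `target-postgres` may be structured differently.

```python
import json


class SizedBuffer:
    """Buffer that tracks its size as the byte length of the raw JSON lines."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.records = []
        self.size_bytes = 0

    def add(self, line):
        # `line` is the raw JSON text as streamed by the tap.
        self.records.append(json.loads(line))
        self.size_bytes += len(line.encode("utf-8"))
        # Tell the caller whether it is time to flush.
        return self.size_bytes >= self.max_bytes

    def flush(self):
        records = self.records
        self.records = []
        self.size_bytes = 0
        return records
```

Measuring the raw line avoids walking the parsed Python object, so the size check costs almost nothing per record.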
- BUG FIX: `NULLABLE` not being implied when field is missing from streamed `JSONSchema`
  - FIX LINK
  - If a field was persisted in remote, but then left out of a subsequent streamed `JSONSchema`, we would fail
  - In this instance, the field is implied to be `NULL`, but additionally, if values are present for it in the streamed data, we should persist it.
- FEATURES:
  - Performance improvement for upserting data
    - Saw long running queries for some `SELECT COUNT(1)...` queries
      - Resulting in full table scans
    - These queries are only being used for `is_table_empty`, therefore we can use a more efficient `SELECT EXISTS(...)` query which only needs a single row to be fetched
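A minimal sketch of the `SELECT EXISTS(...)` approach is shown below. It uses the standard library's `sqlite3` as a stand-in so it can run without a PostgreSQL server; the function body is an assumption and not the actual `is_table_empty` implementation.

```python
import sqlite3


def is_table_empty(connection, table_name):
    # NOTE: table_name is interpolated directly, so this sketch assumes a
    # trusted identifier. EXISTS can stop at the first row it finds,
    # whereas COUNT has to visit every row.
    cursor = connection.execute(
        'SELECT EXISTS(SELECT 1 FROM "{}")'.format(table_name)
    )
    return cursor.fetchone()[0] == 0


connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "events" (id INTEGER)')
empty_before_insert = is_table_empty(connection, "events")

connection.execute('INSERT INTO "events" VALUES (1)')
empty_after_insert = is_table_empty(connection, "events")
```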
- FEATURES:
  - Performance improvement for upserting data
    - For large or even reasonably sized tables, trying to upsert the data was prohibitively slow
    - To mitigate this, we now add indexes
    - This change can be opted out of via the `add_upsert_indexes` config option
    - NOTE: This only affects installations post `0.2.1`, and will not upgrade/migrate existing installations
  - Support for latest PostgreSQL 12.0
    - PostgreSQL recently released 12.0, and we now have testing around it and can confirm that `target-postgres` should function correctly for it!
- BUG FIX: `STATE` messages being sent at the wrong time
  - FIX LINK
  - `STATE` messages were being output incorrectly for feeds which had many streams outputting at varying rates
- NOTE: The `minor` version bump is not expected to have much effect on folks. This was done to signal the output change from the below bug fix. It is our impression not many are using this feature yet anyways. Since this was not a `patch` change, we decided to make this a `minor` instead of `major` change to raise less concern. Thank you for your patience!
- FEATURES:
  - Performance improvement for creating `tmp` tables necessary for uploading data
    - PostgreSQL dialects allow for creating a table identical to a parent table in a single command
      - `CREATE TABLE <name> (LIKE <parent-name>);`
    - Previously we leveraged using our `upsert` helpers to create new tables. This resulted in many calls to remote, of varying complexity.
- BUG FIX: No `STATE` Message Wrapper necessary
  - FIX LINK
  - `STATE` messages are formatted as `{"value": ...}`
  - `target-postgres` emitted the full message
  - The official `singer-target-template` doesn't write out that `value` "wrapper", and just writes the JSON blob contained in it
  - This fix makes `target-postgres` do the same
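The difference between emitting the full message and emitting only the wrapped blob can be sketched like this. `emit_state` here is a hypothetical helper written for illustration, not the function from `singer-target-template` or `target-postgres`.

```python
import io
import json


def emit_state(message_line, out):
    """Write only the contents of a STATE message's "value" key to `out`."""
    message = json.loads(message_line)
    if message.get("type") != "STATE":
        raise ValueError("expected a STATE message")
    # Unwrap: the {"type": ..., "value": ...} envelope is not written out.
    out.write(json.dumps(message["value"]) + "\n")


output = io.StringIO()
emit_state('{"type": "STATE", "value": {"bookmark": 42}}', output)
emitted = output.getvalue()
```

Here `emitted` holds just the JSON for `{"bookmark": 42}` followed by a newline, with no `value` wrapper around it.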
- BUG FIX: `canonicalize_identifier` not called on all identifiers persisted to remote
  - FIX LINK
  - Presently, on column splits/name collisions, we add a suffix to an identifier
  - Previously, we did not canonicalize these suffixes
  - While this was not an issue for any `targets` currently in production, it was an issue for some up and coming `targets`.
  - This fix simply makes sure to call `canonicalize_identifier` before persisting an identifier to remote
- FEATURES:
  - Root Table Name Canonicalization
    - The `stream` name is used for the value of the root table name in Postgres
    - `stream` names are controlled exclusively by the tap and do not have to meet many standards
    - Previously, only `stream` names which were lowercase, alphanumeric, etc. were supported
    - Now, the `target` can canonicalize the root table name, allowing for the input `stream` name to be whatever the `tap` provides.
- Singer-Python: bumped to latest 5.6.1
- Psycopg2: bumped to latest 2.8.2
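To give a feel for what canonicalizing a root table name or a suffixed identifier might involve, here is a hedged sketch. The rules below (lowercasing, replacing unsupported characters with `_`, prefixing names that do not start with a letter or underscore, truncating to PostgreSQL's 63 byte identifier limit) are assumptions chosen for illustration and are not necessarily the rules `canonicalize_identifier` applies.

```python
import re

# PostgreSQL truncates identifiers longer than 63 bytes by default.
MAX_IDENTIFIER_LENGTH = 63


def canonicalize_identifier_sketch(name):
    # Lowercase, then replace anything outside [a-z0-9_] with an underscore.
    canonical = re.sub(r"[^a-z0-9_]", "_", name.lower())
    # Identifiers should start with a letter or underscore.
    if not re.match(r"[a-z_]", canonical):
        canonical = "_" + canonical
    return canonical[:MAX_IDENTIFIER_LENGTH]


# A stream name straight from a tap, and a column name with a suffix added.
table_name = canonicalize_identifier_sketch("My Stream-Name")
suffixed_column = canonicalize_identifier_sketch("Field" + "__S")
```

Canonicalizing after the suffix is appended, as in the last line, is what keeps the suffix itself from introducing characters the remote would reject.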
- FEATURES:
  - `STATE` Message support
    - Emits message only when all records buffered before the `STATE` message have been persisted to remote.
  - SSL Support for Postgres
    - Added config options for enabling SSL.
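One way to think about emitting a `STATE` message only after every record buffered before it has been persisted is a per-stream watermark. The sketch below is an illustration under that assumption; `StateTracker` and its methods are hypothetical names and do not mirror the actual implementation.

```python
from collections import deque


class StateTracker:
    def __init__(self):
        self.received = {}      # stream -> number of records received
        self.persisted = {}     # stream -> number of records persisted
        self.pending = deque()  # (state, watermark) in arrival order

    def record_received(self, stream):
        self.received[stream] = self.received.get(stream, 0) + 1

    def state_received(self, state):
        # Snapshot how many records each stream had received so far.
        self.pending.append((state, dict(self.received)))
        return self._emittable()

    def records_persisted(self, stream, count):
        self.persisted[stream] = self.persisted.get(stream, 0) + count
        return self._emittable()

    def _emittable(self):
        # Release states in order, stopping at the first one that still
        # waits on records which have not been persisted.
        ready = []
        while self.pending:
            state, watermark = self.pending[0]
            if any(self.persisted.get(s, 0) < n for s, n in watermark.items()):
                break
            ready.append(state)
            self.pending.popleft()
        return ready
```

Because streams flush at different rates, a state is held back until the slowest stream it depends on has caught up, and states are never released out of order.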
- BUG FIX: `ACTIVATE_VERSION` Messages did not flush buffer
  - FIX LINK
  - When we issue an activate version record, we presently do not flush the buffer after writing the batch. This results in more records being written to remote than need to be.
  - This results in no functionality change, and should not alleviate any known bugs.
  - This should be purely performance related.
- Singer-Python: bumped to latest
- Minor housekeeping:
  - Updated container versions to latest
  - Updated README to reflect new versions of PostgreSQL Server
- BUG FIX: A bug was identified for de-nesting.
  - ISSUE LINK
  - FAILING TESTS LINK
  - FIX LINK
  - Subtables with subtables did not serialize column names correctly
    - The column names ended up having the table names (paths) prepended on them
    - Due to the denested table schema and denested records being different, no information showed up in remote.
  - This bug was ultimately tracked down to the core denesting logic.
  - This will fix failing uploads which had nullable columns in subtables but no data was seen populating those columns.
    - The broken schema columns will still remain
  - Failing schemas which had non-null columns in subtables will still be broken
    - To fix will require dropping the associated tables, potentially resetting the entire `db`/`schema`
- BUG FIX: A bug was identified for path to column serialization.
  - LINK
  - A nullable property which had multiple JSONSchema types
    - ie, something like `[null, string, integer ...]`
    - Failed to find an appropriate column in remote to persist `None` values to.
  - Found by usage of the Hubspot Tap
- FEATURES:
  - Added the `persist_empty_tables` config option which allows the Target to create empty tables in Remote.
- BUG FIX: A bug was identified in 0.1.3 with stream `key_properties` and canonicalization.
  - LINK
  - Discovered and fixed by @mirelagrigoras
  - If the `key_properties` for a stream changed due to canonicalization, the stream would fail to persist due to:
    - the `persist_csv_rows` `key_properties` values would remain un-canonicalized and therefore cause issues once serialized into a SQL statement
    - the pre-checks for tables would break because no values could be pulled from the schema with un-canonicalized fields pulled out of the `key_properties`
  - NOTE: the `key_properties` metadata is saved with raw field names.
- SCHEMA_VERSION: 1
  - LINK
  - Initialized a new field in remote table schemas `schema_version`
  - A migration in `PostgresTarget` handles updating this
- BUG FIX: A bug was identified in 0.1.2 with column type splitting.
  - LINK
  - A schema with a field of type `string` is persisted to remote
    - Later, the same field is of type `date-time`
      - The values for this field will not be placed under a new column, but rather under the original `string` column
  - A schema with a field of type `date-time` is persisted to remote
    - Later, the same field is of type `string`
      - The original `date-time` column will be made `nullable`
      - The values for this field will fail to persist
- FEATURES:
  - Added the `logging_level` config option which uses standard Python Logger Levels to configure more details about what Target-Postgres is doing
    - Query level logging and timing
    - Table schema changes logging and timing
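As a rough sketch of how a `logging_level` value could be applied using standard Python Logger Levels, consider the following. The helper name, the logger name, and the `INFO` default are assumptions for illustration and may not match what Target-Postgres does internally.

```python
import logging


def configure_logger(config, name="target_postgres_sketch"):
    # Assumed default of INFO when the option is absent.
    level_name = str(config.get("logging_level", "INFO")).upper()
    logger = logging.getLogger(name)
    # Logger.setLevel accepts a standard level name such as "DEBUG" and
    # raises ValueError for names it does not recognise.
    logger.setLevel(level_name)
    return logger


debug_logger = configure_logger({"logging_level": "DEBUG"}, name="sketch.debug")
default_logger = configure_logger({}, name="sketch.default")
```

Setting the option to `DEBUG` would be the natural way to surface the query and schema change timing described above.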