Feature add docs on oldkeys #188

34 changes: 34 additions & 0 deletions README.md
@@ -105,6 +105,7 @@ Parameters
* `include-type-oids`: add type oids. Default is _false_.
* `include-domain-data-type`: replace domain name with the underlying data type. Default is _false_.
* `include-column-positions`: add column position (_pg_attribute.attnum_). Default is _false_.
* `include-origin`: add origin of a piece of data. Default is _false_.
Contributor Author

I noticed this was missing.

Owner

Ooops. Good catch!

* `include-not-null`: add _not null_ information as _columnoptionals_. Default is _false_.
* `include-default`: add default expression. Default is _false_.
* `include-pk`: add _primary key_ information as _pk_. Column name and data type is included. Default is _false_.
@@ -491,6 +492,39 @@ DROP TABLE
DROP TABLE
```

Explanation of format 1 data
=============================
Each entry has the following fields:

| key | value | optional (with option to display) |
|-----|-------|-----------------------------------|
| xid | transaction id of the changeset | optional (absent `include-xids`) |
| nextlsn | the next lsn of the changeset | optional (absent `include-lsn`) |
| timestamp | the timestamp each changeset was committed | optional (absent `include-timestamp`) |
| origin | the origin that a tuple came from [ref](https://www.highgo.ca/2020/04/18/the-origin-in-postgresql-logical-decoding/) | optional (absent `include-origin`) |
| change | array of changes where each change is a json struct containing data about a single change in the transaction | required |
| change.kind | the kind of change, can be one of insert/update/delete/truncate | required |
| change.schema | the schema of the table that the change is associated with | optional (present `include-schemas`) |
| change.table | the table that a change is associated with | required |
| change.pk | json structure containing information about the primary keys | optional (absent `include-pk`) |
| change.pk.pknames | array containing the names of the primary key columns | optional (absent `include-pk`) |
| change.pk.pktypes | array containing the types of the primary key columns | optional (absent `include-pk`) |
| change.columnnames | array of the names of all columns in the table | required (only present for insert/update) |
| change.columntypes | array of the types of all columns in the table | required (only present for insert/update) |
| change.columntypeoids | array of the oids of all columns in the table | optional (absent `include-type-oids` when present only present for insert/update) |
| change.columnpositions | array of the positions (_pg_attribute.attnum_) of all columns in the table | optional (absent `include-column-positions` when present only present for insert/update) |
| change.columnoptionals | array of the booleans for whether each column in the table is optional | optional (absent `include-not-null` when present only present for insert/update) |
| change.columndefaults | array of the defaults for each column in the table | optional (absent `include-default` when present only present for insert/update) |
| change.columnvalues | array of the values for each column in the table | required (only present for insert/update) |
| change.oldkeys | json structure containing the [replica identity](https://www.postgresql.org/docs/10/logical-replication-publication.html) columns of the previous version of the row (usually the primary key; the columns of the chosen index if the replica identity is an index; the entire previous row if the replica identity is full) | required (only present for update/delete) |
| change.oldkeys.keynames | array containing the names of the replica identity columns | required (only present for update/delete) |
| change.oldkeys.keytypes | array containing the types of the replica identity columns | required (only present for update/delete) |
| change.oldkeys.keytypeoids | array containing the oids of the replica identity columns | optional (absent `include-type-oids` only present for update/delete) |
| change.oldkeys.keyvalues | array containing the values of the replica identity columns | required (only present for update/delete) |
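
As a rough illustration of how the fields in the table fit together, the sketch below pairs `columnnames` with `columnvalues` and `keynames` with `keyvalues` for each change. The changeset itself is hypothetical (the `users` table, its columns, and its values are invented for this example), and `summarize` is an illustrative helper, not part of wal2json.

```python
import json

# Hypothetical format 1 changeset; table, columns, and values are invented.
SAMPLE = """
{
  "change": [
    {
      "kind": "update",
      "schema": "public",
      "table": "users",
      "columnnames": ["id", "name"],
      "columntypes": ["integer", "text"],
      "columnvalues": [1, "alice"],
      "oldkeys": {
        "keynames": ["id"],
        "keytypes": ["integer"],
        "keyvalues": [1]
      }
    },
    {
      "kind": "delete",
      "schema": "public",
      "table": "users",
      "oldkeys": {
        "keynames": ["id"],
        "keytypes": ["integer"],
        "keyvalues": [2]
      }
    }
  ]
}
"""


def summarize(changeset):
    """Return one (kind, table, new_row, old_key) tuple per change."""
    rows = []
    for change in changeset["change"]:
        # columnnames/columnvalues are only present for insert/update.
        new_row = dict(zip(change.get("columnnames", []),
                           change.get("columnvalues", [])))
        # oldkeys is only present for update/delete.
        oldkeys = change.get("oldkeys", {})
        old_key = dict(zip(oldkeys.get("keynames", []),
                           oldkeys.get("keyvalues", [])))
        rows.append((change["kind"], change["table"], new_row, old_key))
    return rows


summary = summarize(json.loads(SAMPLE))
```

Using `.get` with empty defaults keeps the helper tolerant of the fields that are only present for some kinds of change.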

Note: Unchanged TOAST datum columns are not output for any of the rows.
Contributor Author

So, just dug into this one here. Is there a way to see if an unchanged TOAST column has been omitted? How are users supposed to deal with TOASTed columns without the deprecated include-unchanged-toast option? It seems that without DDL changes in the logical replication log (which makes sense), you can't tell the difference between a column that's been removed and an unchanged TOAST column that's been omitted for that reason?

Contributor Author

oh, and I guess my suggestion would be a new field change.omittedunchangedtoastcolumns that is an array of the names of unchanged toast columns (i.e. if the row had a datum in column foo that was unchanged toast, the change would have omittedunchangedtoastcolumns: ["foo"] in it).

Let me know if you'd like that to be included, I can probably do a follow up PR since it seems reasonably straightforward to implement.

Also let me know if there's a different way we're supposed to be knowing whether a toasted column has been omitted.
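
To make the ambiguity concrete, here is a minimal consumer-side sketch. It assumes the consumer keeps its own list of expected column names (the `expected` list, the `users` table, and the `missing_columns` helper are all hypothetical). The helper can report that `bio` is absent from the update, but nothing in the change says whether the column was dropped or was an unchanged TOAST datum.

```python
def missing_columns(change, expected_columns):
    """Return expected column names that are absent from an insert/update change."""
    present = set(change.get("columnnames", []))
    return [name for name in expected_columns if name not in present]


# Hypothetical: the columns the consumer believes the table has.
expected = ["id", "name", "bio"]

# Hypothetical update in which "bio" does not appear.
update = {
    "kind": "update",
    "table": "users",
    "columnnames": ["id", "name"],
    "columnvalues": [1, "alice"],
}

# "bio" is reported as absent, but the change alone cannot tell us why.
absent = missing_columns(update, expected)
```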

Owner

@mijoharas your table looks good. I have a few comments/suggestions.

  • section should be added before section Examples
  • title: 'JSON description' or 'JSON in details'
  • you are describing the format 1. However, we should describe all format versions. I suggest two tables (one for each format). Mixing fields could possibly cause some confusion.
  • table headers: use bold. The third column could be described as requirement and you should use mandatory and optional. If it is an optional use (enabled by include-foo).
  • acronyms in uppercase letters such as LSN and JSON.
  • origin value: the replication origin. You can link replication origin to https://www.postgresql.org/docs/current/replication-origins.html
  • change.kind value: separate each kind with comma such as insert, update, delete, and truncate.
  • change.schema value: table schema.
  • change.table value: table name.
  • change.pk value: s/primary keys/primary key/.
  • change.columnnames value: array containing all table column names.
  • change.columntypes value: array containing all table column type names.
  • change.columntypeoids value: array containing all table column type oids.
  • change.columnpositions value: array containing column position(pg_attribute.attnum) for each column.
  • change.oldkeys value: use /current/ instead of /10/. Details about REPLICA IDENTITY are at https://www.postgresql.org/docs/current/sql-altertable.html
  • change.oldkeys.keynames value: array containing table replica identity column names.
  • change.oldkeys.keytypes value: array containing table replica identity column type names.
  • change.oldkeys.keytypeoids value: array containing table replica identity column type oids.

Regarding unchanged TOAST columns, you could say:

UPDATE will not provide columns in the new tuple whose values have not been modified and are stored in a TOAST table. If your table has REPLICA IDENTITY FULL, the unchanged value will be provided in the oldkeys (format 1) or identity (format 2).
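
A consumer-side sketch of that idea for format 1, assuming the table has REPLICA IDENTITY FULL so that `oldkeys` carries the whole previous row. The `backfill_from_oldkeys` helper and the sample change are hypothetical; this is something a consumer could do on its side, not something the plugin does.

```python
def backfill_from_oldkeys(change):
    """Return the new row of an update, taking columns that are missing
    from the new tuple (e.g. unchanged TOAST) from oldkeys."""
    new_row = dict(zip(change.get("columnnames", []),
                       change.get("columnvalues", [])))
    oldkeys = change.get("oldkeys", {})
    old_row = dict(zip(oldkeys.get("keynames", []),
                       oldkeys.get("keyvalues", [])))
    for name, value in old_row.items():
        # Values already present in the new tuple always win.
        new_row.setdefault(name, value)
    return new_row


# Hypothetical update on a table with REPLICA IDENTITY FULL:
# "bio" is an unchanged TOAST column, so it only appears in oldkeys.
update = {
    "kind": "update",
    "table": "users",
    "columnnames": ["id", "name"],
    "columnvalues": [1, "bob"],
    "oldkeys": {
        "keynames": ["id", "name", "bio"],
        "keyvalues": [1, "alice", "a long biography"],
    },
}

merged = backfill_from_oldkeys(update)
```

Without REPLICA IDENTITY FULL, `oldkeys` only holds the identity columns, so this approach cannot recover the omitted value.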

Owner

I don't like change.omittedunchangedtoastcolumns. It fits the case: Postgres should provide support in order to use it. While writing the previous reply, I realized that we might copy unchanged TOAST columns from the old tuple to the new tuple iff the table has REPLICA IDENTITY FULL.

Contributor Author

@mijoharas mijoharas Nov 27, 2020

Thanks! I agree with all the suggestions and will apply. I hadn't dug into format 2 much, hence the omission but I'll dive in and add that as an extra table as suggested.

Question on the TOAST stuff:

> Postgres should provide support in order to use it.

Is this referring to Postgres letting plugins include unchanged TOAST columns, or to Postgres providing information on DDL changes to logical replication plugins? If so, I agree with you, I guess. But given that it currently doesn't, what can a user do to understand whether a TOASTed column was omitted? I agree putting in the extra field is a bit of a hack, but it's the only way for a user of the plugin to understand whether the column changes are due to a DDL remove column or to unchanged TOAST.

I could very much be misunderstanding what you're meaning, so do let me know if that's the case.

Owner

Postgres omits unchanged TOAST columns for performance reasons because it was initially designed with replication as the main use case. However, it is not so good for CDC solutions. When I said Postgres should provide support, I meant that Postgres should provide a mechanism that says: don't omit unchanged TOAST columns.



License
=======
