Are you exploring CDC open-source solutions or just interested on checking how it works❓
This repo is a portable sample project for running CDC with pglogical plugin. In just a few commands you can test the capabilities of this built-in Postgres features and extend it as you please.
You can find pre-made scripts to run inserts, deletes and updates just by passing NUM_RECORDS
env var to the make
commands. Whenever you use that command, it will randomize the amount of actions between 1 and your number.
📝 Note: This repo showcases pglogical plugin
with some adjustments to extend basic functionality and cover some weaknesses of the software. If you prefer to see the basic behaviour with no enhancements, you can check the CDC Logical Replication tutorial.
- Rename
.env.example
file to.env
and set your credentials for the databases. - Docker Desktop
- Be default, pglogical will copy an initial snapshot when enabled.
- You need PRIMARY KEYs on the TABLES you want to replicate. Not VIEWS.
INSERT
,UPDATE
,DELETE
,TRUNCATE
commands work properly.- This feature doesn’t replicate sequences, DDLs or schemas.
- Schema changes are ignored. Adding or dropping columns will do nothing.
- If you have set
pglogical.replication_set_add_all_tables('default', ARRAY['public']);
and runCREATE TABLE
, you need to also add it on the target database, otherwise it will break the replication. - The previous point works similarly to the
DROP TABLE
, you need to remove it from the replication set and then drop it on both sides.
- Scripts will add/remove columns to tables on source and target databases.
- Scripts will add new tables to the replication and to both databses.
- Scripts will remove tables from the replication set and drop tables on both databases.
To run the project, you can do:
make build
NUM_RECORDS=10000 make run #Default to 5000 if not specified
Docker will start by default with all the parameters configured for pglogical plugin. Open another terminal and run:
make cdc-pglogical
This will apply the following things on each side:
- Source/Publisher
- Check if a node exists.
- Create pglogical extension.
- Create a publisher node.
- Add tables to the replication set.
- Target/Subscriber
- Check if a node exists.
- Create pglogical extension.
- Create a subscription to the publisher.
You can run these commands to validate everything was created properly in both databases:
show wal_level; -- logical
show max_worker_processes; -- 10
show max_replication_slots; -- 10
show max_wal_senders; -- 10
show shared_preload_libraries; -- pglogical
These scripts will run INSERT
, UPDATE
, DELETE
randomly to one of the 3 generated tables on generate_data.py
.
NUM_RECORDS=5 make insert-data
NUM_RECORDS=5 make delete-data
NUM_RECORDS=5 make update-data
make truncate
Or you can run single commands directly on the source database if you prefer.
To query databases like to use DBeaver, but you can use VSCode or psql if you prefer.
Validate the CDC process is OK on the Source/Publisher side. You should see the listed tables you are replicating and the active slot named with a pattern like this pgl_target_db_source_node_*
:
select * from pg_replication_slots;
select * from pglogical.replication_set_table;
Validate the CDC is OK on the Target/Subscriber side. You should see the listed subscription.
You can always come back to this command to check if the replication is broken:
select * from pglogical.show_subscription_status();
Count rows after running INSERT
or DELETE
in real-time:
select count(*) from products;
select count(*) from user_profiles;
select count(*) from transactions;
select * from pglogical.show_subscription_status();
Validate after UPDATE
:
select max(updated_at) from products;
select max(updated_at) from user_profiles;
select max(updated_at) from transactions;
select * from pglogical.show_subscription_status();
📝 Note: With basic functionality in place, these statements would break the replicationg.
The scripts are limited to run against transactions
, products
and user_profiles
tables.
CREATE TABLE
will add a new table randomly. If you want to run INSERT
, UPDATE
, DELETE
, TRUNCATE
you should do it manually:
make create-table
make drop-table
You should this next to each other since the 'subtype' column will be removed:
make add-column
make drop-column
Remember to always check on Target Database if the changes are resulting or not.
If you ended up dropping all the tables, you can do Ctrl+C and make restart
to spin-up the project again.
- Get tips, learnings and tricks for your Data career!
- Join the Substack newsletter to get similar content to this one and more to improve your Data career!