pg_mapreduce

What

Distribute heavy loads across replicas in PostgreSQL, asynchronously, using DB links (dblink).

Why

Why not? :-D
P.O.C. on dblinks ;-)
Well, and more importantly, distributed and asynchronous tasks in PG

How

Partition the big bad table so that every replica (and the master, if you want, as in this case) has its own partition. Let every replica do the operation on its own part of the data, as indicated in the pgmapreduce.mapping table.

Of course, this does not make sense in Docker, especially since everything runs on the same (local) machine...
But in a distributed environment, with millions of rows and some replicas, it gets more interesting.

And of course, calculating the average is just a "whatever task". More useful things could be done.
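
To make the idea concrete without a database, here is a minimal Python sketch of the same pattern. Everything in it is hypothetical: worker threads stand in for the replicas, the `MAPPING` dict stands in for the pgmapreduce.mapping table, and `map_partition` plays the role of the per-node function. It is not the code in this repo, just an illustration of "send all the work out first, then collect".

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the big table: 12 rows of one numeric column.
ROWS = list(range(1, 13))

# Hypothetical stand-in for pgmapreduce.mapping: node name -> rows it owns.
MAPPING = {
    "pg_red": ROWS[0:4],
    "pg_green": ROWS[4:8],
    "pg_blue": ROWS[8:12],
}


def map_partition(node):
    """Map step: a node averages only the rows in its own partition."""
    part = MAPPING[node]
    return node, sum(part) / len(part)


def run():
    """Submit every node's task first, then wait for the results."""
    with ThreadPoolExecutor(max_workers=len(MAPPING)) as pool:
        futures = [pool.submit(map_partition, node) for node in MAPPING]
        return dict(f.result() for f in futures)


partials = run()
print(partials)
```

Submitting all tasks before collecting any result is what lets the nodes work at the same time; collecting right after each submit would serialize them.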

IMPORTANT: use of this code in business context is strictly forbidden unless with explicit consent.

Show me

Init

$ ./stack_start.sh

$ psql -Uwiwwo -p5445  -hlocalhost postgres
Password for user wiwwo: wiwwo123

=# \i INIT.sql
(snip)

Test run

Calling the Function

$ psql -Upgmapreduce -p5445  -hlocalhost postgres
Password for user pgmapreduce: pgmapreduce123

=> select pgmapreduce.calculate_avg();

More detailed version

$ psql -Upgmapreduce -p5445  -hlocalhost postgres
Password for user pgmapreduce: pgmapreduce123

=> select * from dblink.dblink_connect    ('dbl_pg_red',   'pg_red');
=> select * from dblink.dblink_connect    ('dbl_pg_green', 'pg_green');
=> select * from dblink.dblink_connect    ('dbl_pg_blue',  'pg_blue');

=> select * from dblink.dblink_send_query ('dbl_pg_red',   'select pgmapreduce.gimme_avg(''pg_red'')');
=> select * from dblink.dblink_send_query ('dbl_pg_green', 'select pgmapreduce.gimme_avg(''pg_green'')');
=> select * from dblink.dblink_send_query ('dbl_pg_blue',  'select pgmapreduce.gimme_avg(''pg_blue'')');

=> select * from dblink.dblink_get_result ('dbl_pg_red')   as avg_pg_red   (avg float);
=> select * from dblink.dblink_get_result ('dbl_pg_green') as avg_pg_green (avg float);
=> select * from dblink.dblink_get_result ('dbl_pg_blue')  as avg_pg_blue  (avg float);

=> exit
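
The three queries above each return the average of one partition. Turning those into one overall average is the "reduce" step, and a plain mean of the three numbers is only right when all partitions hold the same number of rows. The Python sketch below shows a row-count-weighted combination. Note the assumption: the per-partition row counts are hypothetical inputs, since the calls above only show an average coming back, and `combine_averages` is an illustrative name, not a function from this repo.

```python
def combine_averages(partials):
    """Reduce step: combine (average, row_count) pairs into one average.

    Each pair is assumed to come from one node; row_count is a
    hypothetical extra value a node would have to report.
    """
    total_rows = sum(n for _, n in partials)
    if total_rows == 0:
        raise ValueError("no rows to average")
    return sum(avg * n for avg, n in partials) / total_rows


# Two uneven partitions: one row with value 10, three rows averaging 20.
partials = [(10.0, 1), (20.0, 3)]

overall = combine_averages(partials)                   # weighted by row count
naive = sum(a for a, _ in partials) / len(partials)    # ignores partition sizes

print(overall, naive)
```

With evenly sized partitions the two numbers agree, so the naive mean is fine there.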

Cleanup

$ ./cleanup.sh

INSPIRED BY

Parallel jobs in Postgres

Script - PostgreSQL multiple async parallel execution in PL/pgSQL

