website logo
⌘K
Getting Started 🚀
What is DataLakeHouse.io?
Our Business-Value Focus
Learn the Basic Concepts
Connectors
Operations Applications
Toast
Asana
Aloha POS
BILL
Bloom Growth
Bullhorn
Ceridian Dayforce
ConnectWise
Facebook Ads
Food Delivery Service Connector
Google Analytics 4
Harvest
Hubspot
Jira
MailChimp
McLeod Transportation
NetSuite (Oracle NetSuite)
Optimum HRIS
Oracle EBS
Oracle PeopleSoft
QuickBooks Online
Salesforce
Shopify
Square
Square Marketplace
Stripe
TriNet
Verizon Wireless Business
Workday HCM
Xero
Zoom
Databases
Files & Object Storage
SSH Tunnel Setup for Hosted Database Systems
Databases FAQ
SQL Transformations
Terraform: Reverse Terraforming
DBT Cloud Transformations
Sync Bridge (Data Pipelines)
Create a Sync Bridge
Manually Run a Sync Bridge
Deleting a Sync Bridge
Historical Re-sync
Analytics
Access Analytics
Snowflake Usage Analytics
Data Catalog
Create the Catalog
Populate the Catalog
Access the Catalog
Data Warehouse Clouds
Snowflake
Open Source DW Models
Alerts & Notifications
Slack Notifications
Untitled doc
Logs & Monitoring
Security
Callback Links
Service Level Agreement (SLA)
Release Notes
May 2023
April 2023
Q3 2022
Q4 2022
Community Overview
Contributor Agreements
Code Contribution Guide
About
License
Viewpoint
Docs powered by archbee 

Logical Replication Setup

29min

Logical Replication is the main means for allowing PostgreSQL to have Change Data Capture (CDC) logic to track changes and retrieve only the delta/changes to your database tables synchronized. This increases speed of the synchronization so that any system synchronizing data need not retrieve all records from the database tables all the time on each synchronization frequency event. To set up logical replication on your Postgres database please follow the instructions on this page.

Setup Logical Replication :: test_decoding

Test decoding is the longest running change tracking solution for PostgreSQL, and it can capture deletes which enables the soft delete function in DataLakeHouse.io, as of 12/2021 other change tracking versions will be supported in the future.

Check to see if any logical replication slots already exist by running the command: select * from pg_catalog.pg_replication_slots;

There is almost always one default one called pghoard_local with a physical slot_type. If this is the only one that exist then you have NO logical replication slots yet.

Execute and Record the Following Settings

Review Current Settings
|
select setting from pg_settings where name ='wal_level';
select setting from pg_settings where name ='max_replication_slots';
select setting from pg_settings where name ='wal_sender_timeout';
select setting from pg_settings where name ='max_wal_senders';


Ensure that:

  • WAL_LEVEL = 'logical'
  • WAL_SENDER_TIMEOUT = 0 or 60000 (0 = infinity)
  • MAX_REPLICATION_SLOTS >= 8
  • MAX_WAL_SENDERS >= 16

Create a User for DataLakeHouse.io

Replace the placeholders, schema name, etc. below with your actual values and record them for later references as you'll need them to setup the user that will have access to the database you will connect with in the DataLakeHouse.io connection info, for example we usually recommend the '' as 'datalakehouse_sync_svc':

Create a User
|
CREATE USER <username> PASSWORD 'some-password';
GRANT USAGE ON SCHEMA "public" TO <username>;
GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO <username>;
------- #### or for specifically cherry-picked selected tables, #### -------
-- ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" REVOKE SELECT ON TABLES FROM <username>;
-- revoke the tables using this concept
-- REVOKE SELECT ON ALL TABLES IN SCHEMA "some_schema" FROM <username>;
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO <username>;
-- ………then……for continued individual table access
-- GRANT SELECT ON "some_schema"."some_table" TO <username>;
-- ALTER DEFAULT PRIVILEGES IN SCHEMA "some_schema" GRANT SELECT ON TABLES TO <username>;


as an example, to create a user with access to all tables:

Pgsql
|
CREATE USER datalakehouse_sync_svc PASSWORD 'P@ssword1';
GRANT USAGE ON SCHEMA "public" TO datalakehouse_sync_svc;
GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO datalakehouse_sync_svc;




Create the Logical Replication

Creating the logical replication slots require some basic configurations on the PostgreSQL server.

Adjust a few configurations: -- Set timeout to 30 mins SET statement_timeout = '1800';

Now create the actual logical replication slot, using specifically the name 'datalakehouseio_replication_slot': SELECT pg_create_logical_replication_slot('datalakehouseio_replication_slot', 'test_decoding');

Create a ROLE and Grant permissions to the user created previously,

  • CREATE ROLE datalakehouse_sync_role;
  • ALTER ROLE datalakehouse_sync_role WITH REPLICATION;
  • GRANT datalakehouse_sync_role TO datalakehouse_sync_svc;

Other Options Acceptible

Creating a logical replication slot has the ability to add options such as "include-xids" and "skip-empty-xacts". As of 12/1/2021, DataLakeHouse requires the Xids (the default), but does not concern over skip-empty-xacts, but we suggest that leaving the default settings as-is unless otherwise communicated by our support team. There are other not well-known options for test_decoding and wal2json which unless communicated by our support team should not be used.

Analyzing Logical Replication Slots

Peeking at the logical changes are the easiest way to check if the logical replication is working and if to see if there are any tracked changes flowing through without actually changing the pointerlike when you use get_changes command:

  • SELECT count(*) FROM pg_logical_slot_peek_changes('datalakehouseio_replication_slot', null, null);

Also, retrieve the change inacted as well, as this is the same logic that DataLakeHouse.io uses, SELECT * FROM pg_logical_slot_get_changes('datalakehouseio_replication_slot', null, null); which will move the pointer of the tracking forward and clear out the change records from the WAL thus removing the disk space previously consumed so the disk does not grow out of control.

[NB]The pg_current_wal_lsn and the lsn_distance continue to change as they are based on the Log Sequence Number (LSN), a 64-bit integer used to determine the position of the WRite Ahead Log (WAL) stream which is there to preserve data integrity acting as a pointer to the XLOG. The print out is two hexadecimal numbers upt to 8-digits each, separated by a slash (/), for examplte 63/B30000220. If you compare to LSNs by using basic operator like =, >, < , −, the result is the number of bytes between the two WAL positions.



Test Your Logical Replication

In order to confirm that your logical replication is working, you can artificially create, or wait for, a DML activity of INSERT, UPDATE, DELETE.

Then run the above peek_changes command,

SQL
|
SELECT * FROM pg_logical_slot_peek_changes('datalakehouseio_replication_slot', null, null);


Get the last received WAL Position

This will show the last received and last replayed WAL positions. If there is a delta (in Bytes) beteween pg_last_wal_receive_lsn and pg_last_wal_replay_lsn, then there is a lag then data is available typically.

SQL
|
select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();


Confirm the Logical Replication Lag/Pointer

To quickly shown any flush confirmations from using the get_changes:

Pgsql
|
SELECT 
    slot_name,
    confirmed_flush_lsn, 
    pg_current_wal_lsn(), 
    (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
FROM pg_replication_slots
-- WHERE slot_name IN ('datalakehouseio_replication_slot')
;


Which will look similar to the following output

slot_name

confirmed_flush_lsn

pg_current_wal_lsn

lsn_distance

pghoard_local





63/B00008F0

datalakehouseio_replication_slot

63/390006B0

63/B00008F0

1996489280

(2 rows)







Overall Big Picture Replication View

Pgsql
|
select * from pg_stat_replication;


Get the Time Lag In Human Readable Format

Pgsql
|
SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;


General Logical Replication Slot Details

This table provides the basic information about the logic replication you have created. When DataLakeHouse.io is retrieving from the slot, the active colum will typically be set to "t" instead of "f" for true and false repsectively.

Pgsql
|
SELECT * FROM pg_replication_slots;


Peeking at Logical Slot Changes

There are several parameters for working with the pg_logical_slot_peek_changes function as further described here, https://pgpedia.info/p/pg_logical_slot_peek_changes.html, for testing the flow of WAL changes as identifying the LSNs.

Pgsql
|
SELECT * FROM pg_logical_slot_peek_changes('datalakehouseio_replication_slot', NULL, NULL, 'include-xids', '0');


Changes to the pg_last_wal_replay_lsn

Is not leveraged by DataLakeHouse.io as this relates to a streaming replication standby replication instance, where DataLakeHouse.io is on-demand frequency sync scheduled query-based.

Disk Space Increase is Normal

Aiven.io has a good article on standard operation increase in disk space due to WAL, https://developer.aiven.io/docs/products/postgresql/concepts/pg-disk-usage.html

Run any of the main postgreSQL commands to check disk space on the instance:

  • \l or \l+
  • SELECT pg_size_pretty(pg_database_size('yourdbname'));

Logical Replication Failues

In a standard process where the DDL of your postgreSQL tables do not change much there is very little cause for failure other than not have a sync frequency that is not aligned with the volume of data your database tables produce commensurate with the disk space of the database/server. Since Logical Replication does not track DDL changes an error could occur if a DDL change is made but downstream impacts are not considered. In the case of DataLakeHouse.io a manual change may be required on your target system in order to reflect a DDL change on your source system.

When errors, if any occur, please report them immediately by opening up a support ticket. Methods used on your source database side to clear issues may include things such as using:

  • pg_replication_origin_advance, to skip the transaction that is failing

Other References

  • (https://www.postgresql.org/docs/9.4/catalog-pg-replication-slots.html)PostgreSQL Replication Slots Fields/Columns Available
  • https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/change-data-capture-in-postgres-how-to-use-logical-decoding-and/ba-p/1396421



Updated 03 Mar 2023
Did this page help you?
Yes
No
PREVIOUS
XMIN
NEXT
Change Log & Notes
Docs powered by archbee 
TABLE OF CONTENTS
Setup Logical Replication :: test_decoding
Execute and Record the Following Settings
Create a User for DataLakeHouse.io
Create the Logical Replication
Other Options Acceptible
Analyzing Logical Replication Slots
Test Your Logical Replication
Get the last received WAL Position
Confirm the Logical Replication Lag/Pointer
Overall Big Picture Replication View
Get the Time Lag In Human Readable Format
General Logical Replication Slot Details
Peeking at Logical Slot Changes
Changes to the pg_last_wal_replay_lsn
Disk Space Increase is Normal
Logical Replication Failues
Other References