Connectors
...
Databases
PostgreSQL

Logical Replication Setup

10min

Logical Replication is the main means for allowing PostgreSQL to have Change Data Capture (CDC) logic to track changes and retrieve only the delta/changes to your database tables synchronized. This increases speed of the synchronization so that any system synchronizing data need not retrieve all records from the database tables all the time on each synchronization frequency event. To set up logical replication on your Postgres database please follow the instructions on this page.

DLH.io supports both pgoutput and test_decoding logical replication plugin extensions types. pgoutput proves to provide the most granular selectivity of CDC replication capability. pgoutput allows control of individual tables of individual schemas to only be chosen for replication, as well as an option for all tables across all schemas in the database. The test_decoding plugin configuration only allows CDC replication control at the database level (thus, all tables of all schemas) which can be overkill for some data pipeline requirements. Both pg_output and test_decoding replication allow for insert, updates, and deletes to be tracked where XMIN does not track deletes.

Use the links below for instruction on how to set up your respective logical replication mode:

Other Logical Replication Details

Because logical replication is based on the core of the underlying postgresql system and WAL, there are some common analysis approaches that can be taken across all types of logical replication as described below.

Overall Big Picture Replication View

Pgsql


Get the Time Lag In Human Readable Format

Pgsql


General Logical Replication Slot Details

This table provides the basic information about the logic replication you have created. When DataLakeHouse.io is retrieving from the slot, the active colum will typically be set to "t" instead of "f" for true and false repsectively.

Pgsql


Peeking at Logical Slot Changes

There are several parameters for working with the pg_logical_slot_peek_changes function as further described here, https://pgpedia.info/p/pg_logical_slot_peek_changes.html, for testing the flow of WAL changes as identifying the LSNs.

Pgsql


Changes to the pg_last_wal_replay_lsn

Is not leveraged by DataLakeHouse.io as this relates to a streaming replication standby replication instance, where DataLakeHouse.io is on-demand frequency sync scheduled query-based.

Disk Space Increase is Normal

Aiven.io has a good article on standard operation increase in disk space due to WAL, https://developer.aiven.io/docs/products/postgresql/concepts/pg-disk-usage.html

Run any of the main postgreSQL commands to check disk space on the instance:

  • \l or \l+
  • SELECT pg_size_pretty(pg_database_size('yourdbname'));

Logical Replication Failues

In a standard process where the DDL of your postgreSQL tables do not change much there is very little cause for failure other than not have a sync frequency that is not aligned with the volume of data your database tables produce commensurate with the disk space of the database/server. Since Logical Replication does not track DDL changes an error could occur if a DDL change is made but downstream impacts are not considered. In the case of DataLakeHouse.io a manual change may be required on your target system in order to reflect a DDL change on your source system.

When errors, if any occur, please report them immediately by opening up a support ticket. Methods used on your source database side to clear issues may include things such as using:

  • pg_replication_origin_advance, to skip the transaction that is failing

Other References