Q4 2022
Our July release brings more options, more features, and faster synchronization for our popular connectors:
Previously, DLH.io allowed only one replication slot per database, and it had to be named datalakehouseio_replication_slot. We still recommend this name when creating the slot in your database, but you can now use any custom slot name. Simply enter the name of the slot when you create your source connection to PostgreSQL. You will not be able to change it later for that source connection, because the slot holds the replication position for that connection. This added flexibility is a great addition for teams whose DBAs have strong opinions on naming conventions.
If you need multiple replication slots, for example to capture data changes only on specific schemas rather than the entire database, we still recommend starting from our naming convention, datalakehouseio_replication_slot, and adding a short suffix to differentiate the slots, for example datalakehouseio_replication_slot_schema1 or datalakehouseio_replication_slot_mytbl.
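The naming convention above can be sketched as a small helper. This is only an illustration, not part of DLH.io: the function name build_slot_name and the constants are hypothetical. The validation reflects PostgreSQL's rule that slot names may contain only lowercase letters, digits, and underscores, and assumes the default identifier limit of 63 characters.

```python
import re

# Recommended base name from the DLH.io release notes.
BASE_SLOT_NAME = "datalakehouseio_replication_slot"

# Assumption: default PostgreSQL build, where NAMEDATALEN is 64 (63 usable characters).
MAX_SLOT_NAME_LENGTH = 63

# PostgreSQL accepts only lowercase letters, digits, and underscores in slot names.
_VALID_SLOT_NAME = re.compile(r"[a-z0-9_]+")


def build_slot_name(suffix: str = "") -> str:
    """Return the recommended slot name, optionally with a short suffix (hypothetical helper)."""
    name = f"{BASE_SLOT_NAME}_{suffix}" if suffix else BASE_SLOT_NAME
    if not _VALID_SLOT_NAME.fullmatch(name):
        raise ValueError(f"invalid characters in slot name: {name!r}")
    if len(name) > MAX_SLOT_NAME_LENGTH:
        raise ValueError(f"slot name longer than {MAX_SLOT_NAME_LENGTH} characters: {name!r}")
    return name


if __name__ == "__main__":
    print(build_slot_name())
    print(build_slot_name("schema1"))
    print(build_slot_name("mytbl"))
```

Keeping the suffix short leaves headroom under the length limit, since the base name alone already uses 32 characters.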
Ultimately, this prevents one synchronization from reading or clearing out a slot that another synchronization depends on, which matters when a database has multiple schemas and each schema should be synchronized at a different frequency with different parameters. Create multiple source connections to the same database, each with its own replication slot, and in each connection deselect the schemas you do not wish to synchronize. Each source will then use only its own replication slot when it synchronizes. We've seen the most success for this use case with Aiven.io PostgreSQL customers. Hooray!
PostgreSQL is notorious for dropping the replication slot upon a database update or refresh, and we've seen this happen with a number of customers who do not have large replicated instances or other means of maintaining the state of their slots. DLH.io will now automatically recreate the replication slot in the database used by your DLH.io Postgres Source Connection if it detects that the slot was removed or is no longer available. The user credentials you use in the source connection must be granted the privileges required to create the slot.
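For teams that want to see what such a check looks like, here is a minimal sketch of a "recreate if missing" routine. It is not DLH.io's implementation. It assumes a logical replication slot, the built-in pgoutput output plugin, and a DB-API cursor that uses %s placeholders (as psycopg does); the function name ensure_replication_slot is hypothetical. The pg_replication_slots view and pg_create_logical_replication_slot function are standard PostgreSQL.

```python
def ensure_replication_slot(cursor, slot_name, plugin="pgoutput"):
    """Create the logical replication slot if it is missing.

    Returns True if the slot was created, False if it already existed.
    `cursor` is any DB-API cursor using %s placeholders (assumption: psycopg-style).
    The plugin default is an assumption; use whichever plugin your setup requires.
    """
    # Look the slot up in the system view that lists all replication slots.
    cursor.execute(
        "SELECT 1 FROM pg_replication_slots WHERE slot_name = %s",
        (slot_name,),
    )
    if cursor.fetchone() is not None:
        return False

    # The connected role needs the REPLICATION attribute (or superuser) for this call.
    cursor.execute(
        "SELECT pg_create_logical_replication_slot(%s, %s)",
        (slot_name, plugin),
    )
    return True
```

With psycopg, for example, this would be called with a cursor from an open connection, such as ensure_replication_slot(conn.cursor(), "datalakehouseio_replication_slot"). Note that a recreated slot starts from the current position in the write-ahead log, so changes made while the slot was missing are not replayed from it.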
To learn more about other ways PostgreSQL can retain your slot state, refer to the PostgreSQL documentation: