Tips & Tricks for Replicating Databases Externally
Built-in replication technology makes it easy to keep a database in sync with other databases of the same kind. Sometimes, however, people want to replicate their database to an external system, such as a columnar data warehouse, where they can combine it with data from other sources for analytic purposes.
In the past, a common approach was to take full CSV dumps, but re-exporting entire tables is inefficient when updates are frequent. To speed up updates, last-modified timestamps and triggers have been used to replicate only the rows that changed. Some databases, such as Postgres and MySQL, offer ways to tap into their native change stream (logical decoding in Postgres, the binary log in MySQL), which makes incremental updates even easier.
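As an illustration of the last-modified approach, here is a minimal sketch using Python's standard-library `sqlite3` module. It assumes a hypothetical `customers` table whose integer `updated_at` column is kept current by the application or by a trigger. The table, its columns, and the `extract_changes` helper are illustrative assumptions, not part of any particular product.

```python
import sqlite3

def extract_changes(conn, since):
    """Return rows modified after `since`, plus the new high-water mark (hypothetical helper)."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (since,),
    ).fetchall()
    # Advance the mark to the newest timestamp seen; keep the old one if nothing changed.
    high_water = rows[-1][2] if rows else since
    return rows, high_water

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", 100), (2, "Grace", 105)])

# Initial sync: everything is newer than 0, so both rows are extracted.
first_batch, high_water = extract_changes(conn, 0)

# An update bumps updated_at, so the next incremental run picks up only that row.
conn.execute("UPDATE customers SET name = 'Ada L.', updated_at = 110 WHERE id = 1")
second_batch, high_water = extract_changes(conn, high_water)

# A deleted row leaves nothing behind to query, so the delete is never extracted.
conn.execute("DELETE FROM customers WHERE id = 2")
third_batch, high_water = extract_changes(conn, high_water)
```

Each run reads only rows newer than the saved high-water mark, which is far cheaper than a full dump. The last step shows the main weakness: a deleted row simply vanishes from the source, so a timestamp query never reports it. Rows that share a timestamp with the mark but commit after an extraction can also be missed by the strict `>` comparison.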
In this presentation, we take a detailed look at the mechanisms available in different types of databases and how well they support replicating inserts, updates, and deletes, as well as history tracking and schema changes.
We will conclude with some practical challenges encountered while implementing a robust data pipeline service.
Meel Velliste is VP of Engineering at Fivetran. Having joined the company shortly after its founding in 2013, he helped build Fivetran's data pipeline technology from the ground up.
Before Fivetran, Meel worked in academia. He holds a degree in Computer Science and a Ph.D. in Bioengineering, and has performed research in areas as diverse as computer vision, real-time software systems, and brain-computer interfaces.