Removing the tight coupling caused by change data capture (CDC)

Andrew Jones
2 min read · 3 days ago

Many data platforms start with a change data capture (CDC) service that extracts data from an organisation's transactional databases, the source of truth for its most valuable data.

The idea is that once you bring all that data into your data warehouse, you can build whatever you need on top of it.

However, what you have built is now tightly coupled to the upstream transactional database, and that will lead to problems in the future.

CDC tightly couples your warehouse to the upstream database

This is a quote from Shopify’s article on their CDC setup:

One of the unfortunate aspects of CDC is the tight coupling of the external event consumers to the internal data model of the source. Breaking changes and data migrations can be a normal part of application evolution. By allowing external consumers to couple on these fields, the impact of breaking changes spreads beyond the single codebase into multiple consumer areas.

So, we’re aware of these limitations and have been for years.

And yet we’ve been working around those problems, building tools to solve the symptoms by deploying services such as data cataloguing, lineage, and automated testing tools, without addressing the root cause.

While these tools still have a role to play, they shouldn’t be there solely to catch and alert on upstream data issues as if that is something that could never be addressed.

Instead, change your upstream processes to publish data as consumable events (or data products, if you like) that downstream users can confidently build on.
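As a rough illustration of what a consumable event could look like, here is a minimal Python sketch. Everything in it is hypothetical: the `OrderPlaced` event, its fields, and the internal column names (`id`, `cust_fk`, `amt`, `created`) are invented for the example and do not come from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderPlaced:
    """Public event that downstream consumers build on (hypothetical shape)."""
    order_id: str
    customer_id: str
    total_pence: int
    placed_at: str  # ISO 8601 timestamp in UTC


def to_event(row: dict) -> OrderPlaced:
    """Map an internal `orders` row to the public event.

    The internal column names are made up for illustration. They can be
    renamed or migrated upstream without affecting consumers, as long as
    this mapping is updated in the same change.
    """
    return OrderPlaced(
        order_id=str(row["id"]),
        customer_id=str(row["cust_fk"]),
        total_pence=int(row["amt"]),
        placed_at=row["created"].astimezone(timezone.utc).isoformat(),
    )
```

The point of the mapping function is that it lives with the team that owns the database, so a schema change and the event mapping change together rather than surfacing as a surprise in the warehouse.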

One way to make this change is to adopt data contracts.

Data contracts provide an interface between the upstream database and the downstream warehouse — just like an API.

And just like an API, we get the following benefits:

  • An explicit interface around which we can set expectations (for example SLIs)
  • The ability to change things upstream without impacting all users downstream
  • Clear assignment of ownership and responsibilities
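To make the three benefits above a little more concrete, here is a minimal sketch of a data contract expressed in Python. It is an assumption about how one might model a contract, not a standard format: the `DataContract` class, its attribute names, and the example values (`checkout-team`, a 15-minute freshness target) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: an explicit, versioned, owned interface."""
    name: str
    owner: str                  # who is responsible for this data
    version: str                # changes are explicit, like an API version
    fields: dict                # field name -> required Python type
    freshness_sli_minutes: int  # an example expectation set on the interface

    def violations(self, record: dict) -> list:
        """Return human-readable problems; an empty list means the record conforms."""
        problems = []
        for field, expected in self.fields.items():
            if field not in record:
                problems.append(f"missing field: {field}")
            elif not isinstance(record[field], expected):
                actual = type(record[field]).__name__
                problems.append(f"{field}: expected {expected.__name__}, got {actual}")
        return problems


# Example contract with illustrative values only.
orders_contract = DataContract(
    name="orders.order_placed",
    owner="checkout-team",
    version="1.0.0",
    fields={"order_id": str, "total_pence": int},
    freshness_sli_minutes=15,
)
```

A producer could call `orders_contract.violations(record)` before publishing and refuse to emit records that break the contract, so breaking changes are caught at the source rather than downstream.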

For more on how data contracts can help reduce coupling and drive improvements in your data quality, check out my book or sign up to my daily newsletter.

Andrew Jones

Principal Engineer 🔧. Created Data Contracts, then wrote the book on it 📙. Google Developer Expert. Passionate about data and driving value from it.