Change Data Capture (CDC) and its Magic

Day 1/30: I was always curious about how CDC works.

Jun 24, 2025

Today I took a deep dive into CDC: how it works, how it captures changes in a database, what its use cases are, and how DiceDB's reactivity differs from it.

What is CDC?

Change Data Capture (CDC) is a technique for detecting changes made to a database and delivering them to other systems as events. Organizations that need real-time insights have two approaches to handling data changes: CDC and reactive databases like DiceDB. While both address the challenge of keeping systems synchronized with data changes, they employ fundamentally different architectures. This post explores both approaches, their mechanisms, and their respective use cases.

How does CDC work and its types?

CDC keeps track of changes to the data in a user-specified source and emits an event whenever that source is modified. There are three main types:

Log Based CDC

This is considered the most efficient implementation method: the transaction log of the source database is continuously monitored for new entries. This approach reads database transaction logs (such as Postgres' WAL or MySQL's binary log) to capture changes with minimal impact on the source system's performance. More on WAL here.

Advantages:

Minimal impact on source database performance
Provides complete change history

Disadvantages:

Complex to implement due to proprietary log formats
Requires parsing of internal database structures
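To make the idea concrete, here is a minimal sketch of the log-tailing loop. It does not parse a real WAL or binlog: `TransactionLog`, `LogEntry`, and `LogReader` are hypothetical names I made up, and the log is just an in-memory list. The point is only to show the reader remembering its position (an LSN) and picking up whatever was appended since the last read.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class LogEntry:
    lsn: int     # log sequence number: the entry's position in the log
    op: str      # "INSERT", "UPDATE" or "DELETE"
    table: str
    row: dict


class TransactionLog:
    """Toy append-only log standing in for a WAL or binlog (hypothetical)."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, op: str, table: str, row: dict) -> int:
        lsn = len(self._entries) + 1
        self._entries.append(LogEntry(lsn, op, table, row))
        return lsn

    def read_after(self, lsn: int) -> Iterator[LogEntry]:
        # LSNs are 1-based and dense, so entries after `lsn` start at index `lsn`.
        yield from self._entries[lsn:]


class LogReader:
    """Tails the log and remembers the last LSN it has processed."""

    def __init__(self, log: TransactionLog) -> None:
        self._log = log
        self.last_lsn = 0

    def poll(self) -> List[LogEntry]:
        events = list(self._log.read_after(self.last_lsn))
        if events:
            self.last_lsn = events[-1].lsn
        return events


log = TransactionLog()
reader = LogReader(log)

# The "database" writes to its log as part of normal operation.
log.append("INSERT", "users", {"id": 1, "name": "Ada"})
log.append("UPDATE", "users", {"id": 1, "name": "Ada L."})
first_batch = reader.poll()    # both entries, in log order

log.append("DELETE", "users", {"id": 1})
second_batch = reader.poll()   # only the new DELETE entry
```

Notice that the writer never does anything extra for the reader's sake. That is where the low overhead comes from, and the hard part in a real system is decoding the actual log format.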

Trigger Based CDC

The name says most of it. This type of CDC uses database triggers to track row changes as they occur, writing change records to shadow tables. Triggers are set to fire before or after INSERT, UPDATE, or DELETE operations.

Advantages:

Real-time change capture

Disadvantages:

Significant performance impact on source database
Requires trigger maintenance as applications evolve
Can strain system resources
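Here is a small sketch of the trigger approach using SQLite through Python's built-in `sqlite3` module. The `users` table, the `users_changes` shadow table, and the trigger names are all made up for illustration. Other databases have their own trigger syntax, so treat this as one possible shape rather than a recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Shadow table that accumulates one row per captured change.
CREATE TABLE users_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,
    user_id   INTEGER NOT NULL,
    name      TEXT
);

CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (op, user_id, name)
    VALUES ('INSERT', NEW.id, NEW.name);
END;

CREATE TRIGGER users_au AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (op, user_id, name)
    VALUES ('UPDATE', NEW.id, NEW.name);
END;

CREATE TRIGGER users_ad AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (op, user_id, name)
    VALUES ('DELETE', OLD.id, OLD.name);
END;
""")

# Ordinary application writes; the triggers record each one.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE users SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

changes = conn.execute(
    "SELECT op, user_id, name FROM users_changes ORDER BY change_id"
).fetchall()
```

Every write to `users` now also writes to `users_changes` inside the same transaction, which is exactly the extra load on the source database listed in the disadvantages.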

Timestamp Based CDC

This approach polls the database and checks each record's updated timestamp. It stores the timestamp of the last change it has seen, and on each poll, any row whose timestamp is greater than the stored one has its new data pushed as an event.

Advantages:

Simpler to implement

Disadvantages:

Not real-time: the interval between polls delays the event
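The polling loop can be sketched in a few lines with `sqlite3`. The `orders` table, its `updated_at` column, and the `poll_changes` helper are my own assumptions for the example, and plain integers stand in for real clock timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)


def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Writers are responsible for setting updated_at on every change.
conn.execute("INSERT INTO orders VALUES (1, 'new', 100)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 105)")
batch1, mark = poll_changes(conn, 0)

conn.execute("UPDATE orders SET status = 'paid', updated_at = 110 WHERE id = 1")
batch2, mark = poll_changes(conn, mark)
```

Two limitations show up quickly with this design. A deleted row simply disappears, so the poller never sees it, and several updates to the same row between two polls collapse into the latest value.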

Use cases

After reading about what CDC is and its types, you might wonder: why not emit events directly to Kafka when data changes, skipping the CDC pipeline altogether? In many cases that works, especially when your application logic governs the data flow. However, CDC shines in scenarios where data modifications occur outside your application layer or when you need to replicate existing systems. Here are some concrete use cases:

Database Synchronization and Replication: CDC enables continuous replication between databases, ensuring target systems stay synchronized with source systems without full data refreshes.
Data Warehouse Synchronization: CDC keeps analytical systems synchronized with operational databases.
Real-Time Analytics: Organizations use CDC to feed data warehouses and data lakes with up-to-date information for real-time business intelligence.
Cloud Migration: CDC facilitates zero-downtime migrations by keeping source and target systems synchronized during transition periods. Airbyte supports this, letting you migrate from MySQL to Postgres without much effort.
Event-Driven Architectures: CDC provides the foundation for event-driven systems by capturing and propagating data changes as events.
Cache Invalidation: Uber uses something called Flux CDC, which captures changes made to the MySQL database and replicates them to Redis. When a change occurs, the corresponding cache entry is invalidated to ensure data consistency.

Reactivity:

Here comes DiceDB by Arpit Bhayani. By the way, I am a contributor at DiceDB (PR Link). DiceDB brings in the awesome concept of reactivity: it is a push-based database that pushes updates to any connected client that has subscribed using the WATCH command. Unlike conventional databases, where clients must query for data, DiceDB proactively pushes updated query results to clients as soon as the underlying data changes.
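To get a feel for the push model, here is a toy in-memory store with a watch mechanism. To be clear, this is not DiceDB's API or how DiceDB is implemented: `ReactiveStore`, `watch`, and `set` are hypothetical names, and a real reactive database does this over network connections rather than in-process callbacks.

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, str], None]


class ReactiveStore:
    """Toy key-value store that pushes new values to watchers (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callback]] = {}

    def watch(self, key: str, callback: Callback) -> None:
        # Register interest once; no polling is needed afterwards.
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        # Push the fresh value to every client watching this key.
        for callback in self._watchers.get(key, []):
            callback(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


store = ReactiveStore()
received: List[Tuple[str, str]] = []

store.watch("score:alice", lambda key, value: received.append((key, value)))

store.set("score:alice", "10")
store.set("score:bob", "7")     # nobody watches this key, so nothing is pushed
store.set("score:alice", "12")
```

The client never calls `get` in this flow. It states its interest once and the store does the rest, which is the inversion that reactivity is about.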

CDCs vs Reactive DBs:

While both CDC and DiceDB address data change management, they employ fundamentally different approaches:

Architecture Philosophy

CDC: Operates as a middleware layer between source and target systems, focusing on data replication and synchronization. CDC captures changes and delivers them to downstream systems, maintaining separation between data producers and consumers.

DiceDB: Functions as a reactive database that eliminates the distinction between data storage and change notification. It integrates reactivity directly into the database engine, providing immediate result set updates to subscribers.

Change Notification Approach

CDC: Provides change events or deltas, notifying subscribers about what changed but requiring them to process these changes to understand current state.

DiceDB: Delivers complete, updated result sets rather than just change notifications. When subscribed data changes, clients receive the new query results immediately, not just information about what changed.
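The difference is easier to see from the consumer's side. In this sketch the event shape (`op`, `key`, `value`) and the `apply_change` helper are assumptions I picked for illustration, not a format any particular CDC tool emits. A CDC consumer has to fold each delta into its own copy of the state, while a reactive client is simply handed the finished result.

```python
def apply_change(state, event):
    """Fold one change event into a copy of the consumer's state."""
    new_state = dict(state)
    if event["op"] == "DELETE":
        new_state.pop(event["key"], None)
    else:  # INSERT and UPDATE both carry the new value
        new_state[event["key"]] = event["value"]
    return new_state


# CDC style: the consumer receives deltas and rebuilds the state itself.
events = [
    {"op": "INSERT", "key": "alice", "value": 10},
    {"op": "INSERT", "key": "bob", "value": 7},
    {"op": "UPDATE", "key": "alice", "value": 12},
    {"op": "DELETE", "key": "bob"},
]

state = {}
for event in events:
    state = apply_change(state, event)

# Reactive style: the client is pushed the complete, current result set.
pushed_result = {"alice": 12}
```

Both paths end at the same state. The CDC consumer got there by processing four events in order, so a lost or reordered event would leave it with the wrong answer.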

Implementation Complexity

CDC: Requires additional infrastructure for change capture, processing, and delivery. Organizations must implement and maintain CDC pipelines, often involving multiple tools and components.

DiceDB: Provides built-in reactivity through simple command subscriptions. The complexity is handled internally by the database engine, simplifying application development.

Data Processing

CDC: Follows a "capture-and-forward" model where changes are detected, captured, and then sent to interested parties.

DiceDB: Implements a "subscribe-and-receive" model where clients express interest in specific queries and automatically receive updated results.

Future Plans:

I will be going through an open-source CDC codebase and will try to implement a scrappy CDC of my own, most likely log-based, that works on top of SQL databases.
