Introduction

Pipes capture and track Change Data Capture (CDC) events, such as inserts, updates, and deletes, in a connected database in near real time. Once detected, these events are automatically delivered to the specified destination, ensuring seamless data synchronization.

When a database is connected to Boltic via a pipe, the system continuously monitors it for CDC events. Whenever a row is inserted, updated, or deleted, the change is promptly delivered to the destination, keeping it synchronized with the source and significantly reducing the risk of data inconsistencies.

Each pipe connects exactly one source to one destination. However, multiple pipes can load CDC data into the same destination, letting you consolidate data from different sources efficiently.
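The one-source/one-destination rule, combined with fan-in at the destination, can be sketched as a simple data model. This is an illustrative sketch only; the `Pipe` type and the source/destination names are hypothetical, not Boltic's actual API:

```python
from dataclasses import dataclass

# Hypothetical model of the topology: each pipe has exactly one source
# and one destination, but several pipes may target the same destination.
@dataclass(frozen=True)
class Pipe:
    source: str       # exactly one source per pipe
    destination: str  # exactly one destination per pipe

pipes = [
    Pipe("mongodb_orders", "bigquery_warehouse"),
    Pipe("postgres_users", "bigquery_warehouse"),  # same destination, different source
    Pipe("mysql_billing", "snowflake_finance"),
]

# Group pipes by destination to make the fan-in visible.
fan_in: dict[str, list[str]] = {}
for p in pipes:
    fan_in.setdefault(p.destination, []).append(p.source)

print(fan_in["bigquery_warehouse"])  # ['mongodb_orders', 'postgres_users']
```

To load data from a second source into an existing destination, you create a second pipe rather than adding another source to the first one.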

By leveraging pipes, you automate the flow of CDC data between various systems and databases, eliminating the need for manual data handling and ensuring near real-time data synchronization. The key benefits include:

  • Improved decision-making: Access the latest and most accurate CDC data.
  • Enhanced productivity: Eliminate manual data management tasks.
  • Data-driven strategies: Consistent and reliable CDC data to support your business decisions.
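To make the synchronization mechanism concrete, the sketch below shows how a stream of insert/update/delete events keeps a destination in step with its source. The event shape (the `op`, `key`, and `row` fields) is an assumption for illustration, not Boltic's actual CDC event schema:

```python
# Illustrative sketch: applying CDC events to an in-memory "destination" table.
# The event fields ("op", "key", "row") are hypothetical, not Boltic's schema.

def apply_cdc_event(destination: dict, event: dict) -> None:
    """Apply a single CDC event to the destination table."""
    op = event["op"]
    if op in ("insert", "update"):
        destination[event["key"]] = event["row"]  # upsert the latest row image
    elif op == "delete":
        destination.pop(event["key"], None)       # remove the row if present

# Replaying events in source order keeps the destination synchronized.
table: dict = {}
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Ada Lovelace"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Grace"}},
    {"op": "delete", "key": 2},
]
for e in events:
    apply_cdc_event(table, e)

print(table)  # {1: {'id': 1, 'name': 'Ada Lovelace'}}
```

Because every change is carried as an event, the destination never needs a full re-copy of the source; it only applies the deltas as they arrive.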

Sources

  • MongoDB: A flexible, NoSQL document-oriented database designed to handle large volumes of unstructured data, ideal for CDC implementations.
  • PostgreSQL: A powerful, open-source relational database known for its robustness and support for complex queries and advanced CDC features.
  • MySQL: A widely used relational database, valued for its speed, simplicity, and support for structured data, with binlog-based CDC capabilities.

Destinations

  • BigQuery: A highly scalable, cloud-based data warehouse, ideal for real-time analytics and large-scale CDC data management.
  • ClickHouse: A fast, open-source columnar database designed for high-performance CDC analytics and reporting.
  • Kafka: A distributed streaming platform used to build real-time data pipelines and stream-processing applications for CDC data.
  • Redshift: A fully managed cloud data warehouse optimized for large-scale data analytics and CDC, providing fast query execution and seamless integration with other AWS services.
  • Snowflake: A cloud-based data warehouse that offers powerful data sharing, scaling, and security features for efficient CDC processing.
  • Google Cloud Storage: A scalable and secure object storage solution for archiving, backups, and large dataset storage, including CDC data.