
ClickHouse

ClickHouse is an open-source, high-performance columnar database system designed for real-time analytical processing. It is optimized for handling large-scale data with lightning-fast query execution, making it an ideal choice for OLAP (Online Analytical Processing) applications.


Key Features​

  1. Scalability: ClickHouse supports both horizontal scaling and distributed clusters, allowing independent scaling of compute and storage resources. It efficiently handles high-volume data and fluctuating workloads.

  2. High Performance: With its columnar storage, advanced compression techniques, and specialized indexing, ClickHouse ensures fast query execution, particularly for complex analytical queries.

  3. Support for Semi-Structured Data: ClickHouse can efficiently process semi-structured data, most notably JSON, giving you the flexibility to handle a wide range of data types.

  4. Data Sharing: ClickHouse enables secure data sharing across multiple clusters and users without creating data duplicates, ensuring a collaborative data environment.

  5. Integration: Seamlessly integrates with popular ETL tools, data visualization platforms, and cloud services like Amazon S3, Kafka, and more. This simplifies data ingestion, transformation, and reporting workflows.

  6. Enhanced Security: ClickHouse offers robust security through role-based access control (RBAC), encryption, and integration with external authentication systems such as LDAP and OAuth.
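The JSON support mentioned above can be exercised directly with ClickHouse's built-in JSON functions. A minimal example (the JSON document and key names are illustrative):

```sql
SELECT JSONExtractString('{"user": "alice", "plan": "pro"}', 'plan') AS plan;
-- returns 'pro'
```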


Getting Started​

πŸ› οΈ Prerequisites​

Before proceeding, ensure the following:

  • Access to a ClickHouse server.
  • Credentials for an admin user (e.g., default) with sufficient privileges to create users and grant permissions.

πŸ‘€ Step 1: Login with an Admin User​

Use the following command to log in to the ClickHouse CLI with an admin user:

clickhouse-client --user=default --password=<admin_password>

⚠️ Replace <admin_password> with the password for the admin user.

Note: If no password is set for the default user, omit the --password flag.


βœ… Step 2: Create a New User​

Run the following query to create a new user with a secure password:

CREATE USER my_user IDENTIFIED BY 'my_secure_password';

πŸ”’ Replace my_user with your desired username and my_secure_password with a strong password for the new user.
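To confirm the account was created, you can list users or inspect the new user's (currently empty) set of grants while still connected as the admin. `my_user` is the placeholder from above:

```sql
-- List all users visible to the current session
SHOW USERS;

-- Inspect the grants assigned to the new user
SHOW GRANTS FOR my_user;
```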


πŸ“œ Step 3: Assign Permissions​

Assign the necessary permissions to enable the new user to perform operations like creating databases, manipulating tables, and loading data from GCS.

πŸ“‚ 3.1 Database Management Permissions​

Allow the user to create and drop databases:

GRANT CREATE DATABASE, DROP DATABASE ON *.* TO my_user;

πŸ—οΈ 3.2 Table Management Permissions​

Allow the user to create, drop, truncate, and alter tables:

GRANT CREATE TABLE, DROP TABLE, TRUNCATE, ALTER ON *.* TO my_user;
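With these grants in place, the user can manage tables. A minimal sketch using a hypothetical `analytics.events` table and the common MergeTree engine (all names are illustrative):

```sql
-- Requires the CREATE DATABASE grant from step 3.1
CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE analytics.events
(
    event_time DateTime,
    user_id    UInt64,
    event_type String
)
ENGINE = MergeTree
ORDER BY (event_time, user_id);
```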

πŸ“ 3.3 Data Manipulation Permissions​

Grant permissions for inserting and selecting data; the ALTER TABLE privilege also covers updates, which ClickHouse performs via ALTER TABLE ... UPDATE:

GRANT INSERT, SELECT, ALTER TABLE ON *.* TO my_user;

⏳ 3.4 Temporary Table Permissions​

Enable the user to create and use temporary tables:

GRANT CREATE TEMPORARY TABLE ON *.* TO my_user;
GRANT INSERT, SELECT ON *.* TO my_user;
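With these grants, the user can stage intermediate results in a temporary table, which exists only for the duration of the session. A short sketch (table and column names are illustrative):

```sql
-- Temporary tables use the Memory engine by default
CREATE TEMPORARY TABLE tmp_user_ids (user_id UInt64);

INSERT INTO tmp_user_ids VALUES (1), (2), (3);

SELECT count() FROM tmp_user_ids;
```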

☁️ 3.5 Permissions for Loading Data from GCS​

Grant the required FILE privilege for loading data from external sources like GCS or S3:

GRANT S3 ON *.* TO my_user;
GRANT FILE ON *.* TO my_user;

NOTE: The S3 privilege covers GCS as well, since ClickHouse accesses GCS through its S3-compatible API.
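Once granted, the user can read external objects through ClickHouse's s3 table function, which also works against GCS's S3-compatible endpoint. A sketch with a hypothetical bucket, target table, and HMAC credentials (all placeholders):

```sql
-- Bucket, object path, table name, and credentials below are placeholders
INSERT INTO analytics.events
SELECT *
FROM s3(
    'https://storage.googleapis.com/my-bucket/events.csv',
    'HMAC_KEY',
    'HMAC_SECRET',
    'CSVWithNames'
);
```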


⚠️ Important Note​

  • The default user or another administrative user must have ACCESS MANAGEMENT privileges to create new users.

If you encounter permission issues, check the admin user's privileges:

SHOW GRANTS FOR default;
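If the admin account lacks user-management rights, another sufficiently privileged user can grant them (a sketch; adjust the target user as needed). Note that for the built-in default user, this is often enabled instead via the access_management setting in users.xml:

```sql
GRANT ACCESS MANAGEMENT ON *.* TO default;
```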

Best Practices​

  • Adopt Clear Naming Conventions: Maintain consistency in naming roles, schemas, and tables for easier management and readability.
  • Regularly Audit User Permissions: Ensure users have the necessary privileges and follow the principle of least privilege to minimize access risks.
  • Optimize Resource Usage: Regularly assess resource consumption and adjust configurations like replica count or hardware resources to meet changing demands.
  • Use Appropriate Partitioning and Indexing: Optimize query performance by selecting the right partitioning strategy and indexing methods, such as primary keys or specialized indexes.
  • Monitor Query Performance: Regularly monitor query execution times and adjust query logic or database settings to ensure efficient performance.
  • Efficient Data Loading with Bulk Inserts: Use bulk inserts for loading large datasets, leveraging ClickHouse's high-performance batch processing capabilities.
  • Take Advantage of Distributed Architecture: Leverage ClickHouse’s distributed nature for horizontal scaling and redundancy, ensuring high availability and fault tolerance.
  • Enable Backups and Data Snapshots: Regularly back up your data or set up automated snapshots to ensure disaster recovery and minimize data loss risks.
  • Run Optimization Tasks Regularly: Schedule optimizations such as OPTIMIZE TABLE to keep your tables well-organized and performant.
  • Implement Security Measures: Ensure encryption at rest and in transit, apply role-based access control (RBAC), and secure network traffic to protect your data.
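Two of the practices above can be sketched directly in SQL: batching rows into a single INSERT, and scheduling an occasional merge. Table and column names are illustrative:

```sql
-- Bulk insert: one INSERT with many rows is far cheaper than many single-row INSERTs
INSERT INTO analytics.events (event_time, user_id, event_type) VALUES
    ('2024-01-01 00:00:00', 1, 'login'),
    ('2024-01-01 00:00:05', 2, 'click'),
    ('2024-01-01 00:00:09', 1, 'logout');

-- Force an off-peak merge of data parts (use sparingly; merges also run in the background)
OPTIMIZE TABLE analytics.events FINAL;
```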

Setup Guide to Integrate ClickHouse With Boltic​

This guide outlines the steps required to integrate ClickHouse with Boltic.

  1. Integration Name: Enter a unique name for this integration.

  2. Host Details:

     | Parameter | Description |
     | --- | --- |
     | Host name | IP address or hostname of the ClickHouse server. |
     | Port Number | ClickHouse port number (commonly 8123 for HTTP or 9000 for the native protocol). |
     | Database Name | Name of the database to connect to (required). |


Authentication​

To validate the identity of database users and ensure high-level security, authentication can be done using the following methods:

  • Username and Password (Required)

  • SSH Authentication (if needed):

    • SSH Host
    • SSH Port
    • SSH Username
    • SSH Authentication Method (supports private key and password). For more details, refer to SSH Authentication.


Test & Save​

Click on Test & Save to validate your data destination configuration and confirm that the connection is successfully established.


Congratulations! You have successfully integrated ClickHouse with Boltic. For further assistance, refer to ClickHouse’s official documentation.


πŸ€“ Have Questions?​

We're here to help! Feel free to reach out via email at πŸ“§ [email protected].