Create a ClickHouse sink connector for Aiven for Apache Kafka®
The ClickHouse sink connector delivers data from Apache Kafka® topics to a ClickHouse database for efficient querying and analysis.
Prerequisites
Before you begin, ensure that you have the following:
- An Aiven for Apache Kafka® service with Apache Kafka Connect enabled or a dedicated Aiven for Apache Kafka Connect® service.
 - Access to a ClickHouse service (either Aiven for ClickHouse or an external instance),
including:
- Hostname, port, and credentials for the ClickHouse service.
 - A pre-created target database and table.
 
 
Limitations
The ClickHouse sink connector has the following limitations related to data consistency and exactly-once delivery:
- 
No exactly-once delivery after restore: The connector does not guarantee exactly-once delivery after the ClickHouse service is restored from a backup, powered off, or forked. However, at least once delivery is guaranteed, which may result in duplicate records in ClickHouse.
 - 
Manual removal of duplicate records: If duplicates occur, manually remove them to maintain data consistency in ClickHouse. For detailed instructions, see Remove duplicate records.
 
Remove duplicate records
Ensure all potential duplicates are processed before removing them:
- 
Verify the committed offset in Aiven for Apache Kafka:
- Access the Aiven Console and select your Aiven for Apache Kafka service.
 - Click Topics and select the topic used by the connector.
 - Go to the Consumer Group tab and check the Offset column for the committed offset.
 
 - 
Verify the committed offset in ClickHouse:
- 
In the ClickHouse service, access the query editor.
 - 
If you are using Aiven for ClickHouse®, go to the service's Overview page, and click Query editor.
 - 
Run the following query to get offset details:
SELECT key, minOffset, maxOffset, state FROM connect_state; 
 - 
 - 
Confirm the following conditions from the query result:
- The 
statecolumn is set toAFTER_PROCESSING. - The 
minOffsetandmaxOffsetcolumns have the same value. - The committed offset from Apache Kafka is equal to or greater than the
minOffsetvalue in ClickHouse. 
 - The 
 - 
Remove duplicate records:
After confirming these conditions, remove any duplicate records by running the following SQL command in ClickHouse:
OPTIMIZE TABLE table_name DEDUPLICATE; 
Create a ClickHouse sink connector configuration file
Create a file named clickhouse_sink_connector.json with the following configuration:
{
    "name": "clickhouse_sink_connector",
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "hostname": "my-clickhouse-hostname",
    "port": "12345",
    "database": "default",
    "username": "avnadmin",
    "password": "mypassword",
    "ssl": "true",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter"
}
Parameters
name: Name of the connector.topics: Apache Kafka topics from which to pull data.hostname: Hostname of the ClickHouse serviceport: Port of the ClickHouse service.database: Target database in ClickHouse.username: Username for authentication in the ClickHouse service.password: Password for authentication in the ClickHouse service.ssl: Set totrueto enable SSL encryption.
For more configuration options, see the ClickHouse sink connector GitHub repository.
Create the connector
- Aiven Console
 - Aiven CLI
 
- 
Access the Aiven Console.
 - 
Select your Aiven for Apache Kafka® or Aiven for Apache Kafka Connect® service.
 - 
Click Connectors.
 - 
Click Create connector if Apache Kafka Connect is already enabled on the service. If not, click Enable connector on this service.
Alternatively, to enable connectors:
- Click Service settings in the sidebar.
 - In the Service management section, click Actions > Enable Kafka connect.
 
 - 
In the sink connectors list, select ClickHouse, and click Get started.
 - 
On the ClickHouse connector page, go to the Common tab.
 - 
Locate the Connector configuration text box and click Edit.
 - 
Paste the configuration from your
clickhouse_sink_connector.jsonfile into the text box. - 
Click Create connector.
 - 
Verify the connector status on the Connectors page.
 
To create the ClickHouse sink connector using the Aiven CLI, run:
avn service connector create SERVICE_NAME @clickhouse_sink_connector.json
Parameters:
SERVICE_NAME: Name of your Aiven for Apache Kafka® service.@clickhouse_sink_connector.json: Path to the JSON configuration file.
Example: Define and create a ClickHouse sink connector
This example shows how to create a ClickHouse sink connector with the following properties:
- Connector name: 
clickhouse_sink_connector - Apache Kafka topic: 
test-topic - ClickHouse hostname: 
clickhouse-31d766f9-systest-project.avns.net - ClickHouse port: 
14420 - Target database: 
default - Username: 
avnadmin - Password: 
mypassword - SSL: 
true 
{
    "name": "clickhouse_sink_connector",
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "topics": "test-topic",
    "hostname": "clickhouse-31d766f9-systest-project.avns.net",
    "port": "14420",
    "database": "default",
    "username": "avnadmin",
    "password": "mypassword",
    "ssl": "true"
}
Once this configuration is saved in the clickhouse_sink_connector.json file, you can
create the connector using the Aiven Console or CLI, and verify that data from the
Apache Kafka topic test-topic is successfully delivered to your ClickHouse instance.