How Atlas Manages Schema Changes in ClickHouse Cluster Mode
Working with ClickHouse clusters requires understanding several key concepts to effectively manage schema changes with Atlas. While Atlas provides excellent support for both single-node and cluster environments, there are specific considerations and configuration tips you need to know when operating in cluster mode.
This guide covers the essential topics and best practices for managing schema changes across your ClickHouse cluster with Atlas, ensuring consistency and reliability in distributed environments.
Using a ClickHouse Cluster as a Dev Database
Atlas uses a separate ClickHouse instance as a dev database to normalize schema changes before applying them to your production cluster.
Atlas can natively spin up ephemeral single-node ClickHouse containers using the docker:// protocol.
If you need a multi-node dev environment, you must set it up manually. ClickHouse provides example Docker Compose recipes for quickly setting up a cluster. Depending on your needs, you can configure your dev database with one or more nodes.
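As a minimal sketch, a project configuration pointing Atlas at an ephemeral dev container could look like the following. The connection URL and the ClickHouse image tag here are illustrative assumptions; adjust them to match your environment:

```hcl
env "local" {
  # The cluster node Atlas connects to (hypothetical address).
  url = "clickhouse://localhost:9000/default"

  # Ephemeral single-node dev database; Atlas manages the container lifecycle.
  dev = "docker://clickhouse/23.11/dev"
}
```

Atlas uses the dev database only for normalization and validation, so a single-node container is sufficient for most workflows even when the target is a cluster.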
Distributed Schema Changes with the ON CLUSTER Clause
Sometimes, you need to apply schema changes to all nodes in a ClickHouse cluster. In these cases, use the ON CLUSTER clause in your migrations.
To enable this feature, configure your Atlas project file (atlas.hcl) to include the cluster configuration.
env "local" {
  diff "clickhouse" {
    cluster {
      name = "my_cluster" # The name of your ClickHouse cluster
    }
  }
}
If you want to dynamically set the cluster name based on the connected node, you can use ClickHouse macros:
env "local" {
  diff "clickhouse" {
    cluster {
      name = "{cluster}" # Use the predefined macro for the cluster name
    }
  }
}
When everything is set up, the generated SQL migration will include the ON CLUSTER clause:
CREATE TABLE IF NOT EXISTS `my_table` ON CLUSTER `my_cluster` (...) ENGINE = MergeTree()
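With the cluster configured, generating and applying migrations goes through the standard Atlas CLI, for example (assuming an env named "local" as above):

```shell
# Diff the desired schema against the dev database and write a new migration file.
atlas migrate diff --env local

# Apply pending migrations to the node defined in the env's url.
atlas migrate apply --env local
```

Because the ON CLUSTER clause is baked into the generated DDL, applying the migration against a single node propagates the change to every node in the cluster.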
Advanced HCL Schema with SQL Expressions
While Atlas supports defining engines as simple enums in HCL (such as MergeTree or SharedMergeTree), many ClickHouse features require more complex configurations. Atlas provides the sql() function to handle these advanced scenarios.
Database & Table Engines with Parameters
For engines that require parameters, you can use SQL expressions:
schema "default" {
  engine = sql("Replicated('/clickhouse/schemas/{cluster}', '{replica}')")
}

table "distributed_users" {
  schema = schema.default
  engine = sql("Distributed('{cluster}', 'my_database', 'users')")
  column "id" {
    type = UInt64
  }
  column "name" {
    type = String
  }
}

table "replicated_users" {
  schema = schema.default
  engine = sql("ReplicatedMergeTree('/clickhouse/tables/{cluster}/users', '{replica}')")
  column "id" {
    type = UInt64
  }
}
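As a rough sketch, the replicated_users definition above corresponds to DDL along these lines (the exact output depends on your Atlas version and cluster configuration; the sort key shown is illustrative, since MergeTree-family engines require one):

```sql
CREATE TABLE IF NOT EXISTS `replicated_users` ON CLUSTER `my_cluster` (
  `id` UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/users', '{replica}')
ORDER BY `id`;
```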
Other Attributes Supporting SQL Expressions
Beyond engines, several other attributes support sql() expressions:
Column Types and Defaults
table "advanced_example" {
  schema = schema.default
  engine = MergeTree
  column "data" {
    type = sql("Nested(name String, value UInt32)") # Complex nested type
  }
  column "created_at" {
    type    = DateTime
    default = sql("now()") # Function-based default
  }
  column "computed_field" {
    type         = String
    default      = sql("concat('prefix_', toString(id))")
    default_kind = MATERIALIZED
  }
}
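For reference, DEFAULT and MATERIALIZED column expressions map to ClickHouse DDL roughly as follows. This is a hand-written sketch, not Atlas's exact output, and it assumes an id column exists elsewhere in the table for the MATERIALIZED expression to reference:

```sql
CREATE TABLE `advanced_example` (
  `data` Nested(name String, value UInt32),
  `created_at` DateTime DEFAULT now(),
  `computed_field` String MATERIALIZED concat('prefix_', toString(id))
) ENGINE = MergeTree
ORDER BY tuple();
```

A MATERIALIZED column is computed on insert and stored on disk, whereas a DEFAULT expression is only evaluated when the column is omitted from the insert.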
TTL Expressions
table "time_series_data" {
  schema = schema.default
  engine = MergeTree
  ttl    = sql("created_at + INTERVAL 30 DAY") # Table-level TTL
  column "metadata" {
    type = String
    ttl  = sql("timestamp + INTERVAL 7 DAY") # Column-level TTL
  }
}
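In the resulting DDL, the column-level TTL attaches to the column definition while the table-level TTL appears after the engine settings, roughly like this (a sketch; the created_at and timestamp columns referenced by the TTL expressions are assumed to be defined on the table):

```sql
CREATE TABLE `time_series_data` (
  `metadata` String TTL timestamp + INTERVAL 7 DAY
  -- other columns, including created_at and timestamp
) ENGINE = MergeTree
ORDER BY tuple()
TTL created_at + INTERVAL 30 DAY;
```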
Use sql() expressions when you need complex ClickHouse configurations that go beyond basic enums and types.