How Atlas Manages Schema Changes in ClickHouse Cluster Mode
Working with ClickHouse clusters requires understanding several key concepts to effectively manage schema changes with Atlas. While Atlas provides excellent support for both single-node and cluster environments, there are specific considerations and configuration tips you need to know when operating in cluster mode.
This guide covers the essential topics and best practices for managing schema changes across your ClickHouse cluster with Atlas, ensuring consistency and reliability in distributed environments.
Using a ClickHouse Cluster as a Dev Database
Atlas uses a separate ClickHouse instance as a dev database to normalize schema changes before applying them to your production cluster.
Atlas can natively spin up ephemeral single-node ClickHouse containers using the docker:// protocol.
If you need a multi-node dev environment, you must set it up manually. ClickHouse provides example Docker Compose recipes for quickly setting up a cluster. Depending on your needs, you can configure your dev database with one or more nodes.
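As a minimal sketch, a project configuration pointing Atlas at an ephemeral dev container could look like the following. The connection URL and the ClickHouse image tag here are illustrative assumptions; adjust them to match your environment:

```hcl
env "local" {
  # The cluster node Atlas connects to (hypothetical address).
  url = "clickhouse://localhost:9000/default"

  # Ephemeral single-node dev database; Atlas manages the container lifecycle.
  dev = "docker://clickhouse/23.11/dev"
}
```

Atlas uses the dev database only for normalization and validation, so a single-node container is sufficient for most workflows even when the target is a cluster.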
Distributed Schema Changes with the ON CLUSTER Clause
Sometimes, you need to apply schema changes to all nodes in a ClickHouse cluster. In these cases, use the ON CLUSTER clause in your migrations.
To enable this feature, configure your Atlas project file (atlas.hcl) to include the cluster configuration.
env "local" {
  diff "clickhouse" {
    cluster {
      name = "my_cluster" # The name of your ClickHouse cluster
    }
  }
}
If you want to dynamically set the cluster name based on the connected node, you can use ClickHouse macros:
env "local" {
  diff "clickhouse" {
    cluster {
      name = "{cluster}" # Use the predefined macro for the cluster name
    }
  }
}
When everything is set up, the generated SQL migration will include the ON CLUSTER clause:
CREATE TABLE IF NOT EXISTS `my_table` ON CLUSTER `my_cluster` (...) ENGINE = MergeTree()
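With the cluster configured, generating and applying migrations goes through the standard Atlas CLI, for example (assuming an env named "local" as above):

```shell
# Diff the desired schema against the dev database and write a new migration file.
atlas migrate diff --env local

# Apply pending migrations to the node defined in the env's url.
atlas migrate apply --env local
```

Because the ON CLUSTER clause is baked into the generated DDL, applying the migration against a single node propagates the change to every node in the cluster.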
Advanced HCL Schema with SQL Expressions
While Atlas supports defining engines as simple enums in HCL (such as MergeTree or SharedMergeTree), many ClickHouse features require more complex configurations. Atlas provides the sql() function to handle these advanced scenarios.
Database & Table Engines with Parameters
For engines that require parameters, you can use SQL expressions:
schema "default" {
  engine = sql("Replicated('/clickhouse/schemas/{cluster}', '{replica}')")
}

table "distributed_users" {
  schema = schema.default
  engine = sql("Distributed('{cluster}', 'my_database', 'users')")
  column "id" {
    type = UInt64
  }
  column "name" {
    type = String
  }
}

table "replicated_users" {
  schema = schema.default
  engine = sql("ReplicatedMergeTree('/clickhouse/tables/{cluster}/users', '{replica}')")
  column "id" {
    type = UInt64
  }
}
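As a rough sketch, the replicated_users definition above corresponds to DDL along these lines (the exact output depends on your Atlas version and cluster configuration; the sort key shown is illustrative, since MergeTree-family engines require one):

```sql
CREATE TABLE IF NOT EXISTS `replicated_users` ON CLUSTER `my_cluster` (
  `id` UInt64
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/users', '{replica}')
ORDER BY `id`;
```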
Other Attributes Supporting SQL Expressions
Beyond engines, several other attributes support sql() expressions:
Column Types and Defaults
table "advanced_example" {
  schema = schema.default
  engine = MergeTree
  column "data" {
    type = sql("Nested(name String, value UInt32)") # Complex nested type
  }
  column "created_at" {
    type    = DateTime
    default = sql("now()") # Function-based default
  }
  column "computed_field" {
    type         = String
    default      = sql("concat('prefix_', toString(id))")
    default_kind = MATERIALIZED
  }
}
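For reference, DEFAULT and MATERIALIZED column expressions map to ClickHouse DDL roughly as follows. This is a hand-written sketch, not Atlas's exact output, and it assumes an id column exists elsewhere in the table for the MATERIALIZED expression to reference:

```sql
CREATE TABLE `advanced_example` (
  `data` Nested(name String, value UInt32),
  `created_at` DateTime DEFAULT now(),
  `computed_field` String MATERIALIZED concat('prefix_', toString(id))
) ENGINE = MergeTree
ORDER BY tuple();
```

A MATERIALIZED column is computed on insert and stored on disk, whereas a DEFAULT expression is only evaluated when the column is omitted from the insert.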
TTL Expressions
table "time_series_data" {
  schema = schema.default
  engine = MergeTree
  ttl    = sql("created_at + INTERVAL 30 DAY") # Table-level TTL
  column "metadata" {
    type = String
    ttl  = sql("timestamp + INTERVAL 7 DAY") # Column-level TTL
  }
}
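In the resulting DDL, the column-level TTL attaches to the column definition while the table-level TTL appears after the engine settings, roughly like this (a sketch; the created_at and timestamp columns referenced by the TTL expressions are assumed to be defined on the table):

```sql
CREATE TABLE `time_series_data` (
  `metadata` String TTL timestamp + INTERVAL 7 DAY
  -- other columns, including created_at and timestamp
) ENGINE = MergeTree
ORDER BY tuple()
TTL created_at + INTERVAL 30 DAY;
```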
Use sql() expressions when you need complex ClickHouse configurations that go beyond basic enums and types.