Skip to main content

Migration Analyzers

The database is often the most critical component in software architectures. Being a stateful component, it cannot be easily rebuilt, scaled-out or fixed by a restart. Outages that involve damage to data or simply unavailability of the database are notoriously hard to manage and recover from, often taking long hours of careful work by a team's most senior engineers.

As most outages happen directly as a result of a change to a system, Atlas provides users with means to verify the safety of planned changes before they happen. The sqlcheck package provides interfaces for analyzing the contents of SQL files to generate insights on the safety of many kinds of changes to database schemas. With this package developers may define an Analyzer that can be used to diagnose the impact of SQL statements on the target database.

Using these interfaces, Atlas provides different Analyzer implementations that are useful for determining the safety of migration scripts.

Analyzers

Below are the Analyzer implementations currently supported by Atlas.

Non-Linear Changes

Non-additive changes, often referred to as non-linear changes, are changes to the migration directory that are not added in a sequential order. This is a bit like the linear history in version control systems, where migration files are commits and the migration directory is the repository. Let's explain with three examples why ensuring the linearity of the migration directory is important:

  • When a developer introduces a new migration file without having the latest state of the directory, there is a risk of generating an incorrect file that might conflict with the actual schema and cause a failure during deployment.
  • When a developer merges a feature branch to the main branch with a new migration file that is not positioned at the end of the directory, there is a risk that Atlas will skip this migration file, as there might be some database that already contains a higher version of the migration directory. This scenario can cause unexpected and surprising behavior during deployments.
  • Having non-linear history can make it challenging to roll back (or revert changes) to a specific version of the migration directory, as the state of the database might be different than the state of the migration directory. Applying the migration directory in consistent order promises deterministic behavior.

Luckily, Atlas detects non-linear and non-additive changes made to a migration directory. To enable this behavior in your project, integrate Atlas into your GitHub Actions or GitLab CI pipelines, and Atlas will automatically detect and report non-linear changes during the CI run.

Destructive Changes

Destructive changes are changes to a database schema that result in loss of data. For instance, consider a statement such as:

ALTER TABLE `users` DROP COLUMN `email_address`;

This statement is considered destructive because whatever data is stored in the email_address column will be deleted from disk, with no way to recover it. There are definitely situations where this type of change is desired, but they are relatively rare. Using the destructive (GoDoc) Analyzer, teams can detect this type of change and design workflows that prevent it from happening accidentally.

Running migration linting locally on in CI fails with exit code 1 in case destructive changes are detected. However, users can disable this by configuring the destructive analyzer in the atlas.hcl file:

atlas.hcl
lint {
destructive {
error = false
}
}

Data-dependent Changes

Data-dependent changes are changes to a database schema that may succeed or fail, depending on the data that is stored in the database. For instance, consider a statement such as:

ALTER TABLE `example`.`orders` ADD UNIQUE INDEX `idx_name` (`name`);

This statement is considered data-dependent because if the orders table contains duplicate values on the name column we will not be able to add a uniqueness constraint. Consider we added two records with the name atlas to the table:

mysql> create table orders ( name varchar(100) );
Query OK, 0 rows affected (0.11 sec)

mysql> insert into orders (name) values ("atlas");
Query OK, 1 row affected (0.06 sec)

mysql> insert into orders (name) values ("atlas");
Query OK, 1 row affected (0.01 sec)

Attempting to add a uniqueness constraint on the name column, will fail:

mysql> ALTER TABLE `example`.`orders` ADD UNIQUE INDEX `idx_name` (`name`);
ERROR 1062 (23000): Duplicate entry 'atlas' for key 'orders.idx_name'

This type of change is tricky because a developer trying to simulate it locally might succeed in performing it only to be surprised that their migration script fails in production, breaking a deployment sequence or causing other unexpected behavior. Using the data_depend (GoDoc) Analyzer, teams can detect this risk early and account for it in pre-deployment checks to a database.

By default, data-dependent changes are reported but not cause migration linting to fail. Users can change this by configuring the data_depend analyzer in the atlas.hcl file:

atlas.hcl
lint {
data_depend {
error = true
}
}

Backward Incompatible Changes

Backward-incompatible changes, also known as breaking changes, are schema changes that have the potential to break the contract with applications that rely on the old schema. For instance, renaming a column from email_address to email can cause errors during deployment (migration) phase if applications running the previous version of the schema reference the old column name in their queries.

By default, detected breaking changes are reported but do not cause migration linting to fail. Users can change this by configuring the incompatible analyzer in the atlas.hcl file:

atlas.hcl
lint {
incompatible {
error = true
}
}

Naming Conventions Policy

In database schema design, maintaining consistency and readability through naming conventions is a widely common practice. Atlas provides an analyzer that can help enforce naming conventions on a variety of schema resources, including tables, columns, and indexes.

Users can enable this by configuring the naming analyzer in their atlas.hcl file:

atlas.hcl
lint {
naming {
match = "^[a-z]+$"
message = "must be lowercase"
}
}

By default, detected naming violations are reported but do not cause migration linting to fail. Users can change this by configuring the naming analyzer in the atlas.hcl file:

atlas.hcl
lint {
naming {
error = true
match = "^[a-z]+$"
message = "must be lowercase"
}
}

Concurrent Index Policy (PostgreSQL)

Schema changes like CREATE INDEX or DROP INDEX can cause the database to lock the table against write operations. Luckily, PostgreSQL provides the CONCURRENTLY option that may be more resource-intensive, but allows normal database operations to continue while the index is built or dropped.

Atlas provides an analyzer that identifies non-concurrent index creation or deletion for tables not created within the same file, and recommends executing them concurrently.

Additionally, since indexes cannot be created or deleted concurrently within a transaction, Atlas ensures the atlas:txmode none directive exists in the file header to prevent this file from running in a transaction. This check can be disabled along with the other ones as follows:

atlas.hcl
lint {
concurrent_index {
check_create = false // `true` by default.
check_drop = false // `true` by default.
check_txmode = false // `true` by default.
}
}

By default, detected concurrent index violations are reported but do not cause migration linting to fail. Users can change this by configuring the concurrent_index analyzer in the atlas.hcl file:

atlas.hcl
lint {
concurrent_index {
error = true
}
}

Checks

The following schema change checks are provided by Atlas:

CheckShort Description
BCBackward incompatible changes
BC101Renaming a table
BC102Renaming a column
CDConstraint deletion changes
CD101Foreign-key constraint was dropped
DSDestructive changes
DS101Schema was dropped
DS102Table was dropped
DS103Non-virtual column was dropped
MFData-dependent changes (changes that might fail)
MF101Add unique index to existing column
MF102Modifying non-unique index to unique
MF103Adding a non-nullable column to an existing table
MF104Modifying a nullable column to non-nullable
PGPostgreSQL specific checks
PG110Creating table with non-optimal data alignment
PG1Concurrent Indexes  
PG101Missing the CONCURRENTLY in index creation
PG102Missing the CONCURRENTLY in index deletion
PG103Missing atlas:txmode none directive in file header
PG104PRIMARY KEY constraint creation that acquires an ACCESS EXCLUSIVE lock on the table
PG105UNIQUE constraint creation that acquires an ACCESS EXCLUSIVE lock on the table
PG3PostgreSQL-specific blocking table changes  
PG301Column type change that requires table and indexes rewrite
PG302Adding a column with a volatile DEFAULT value requires a rewrite of the table
PG303Modifying a nullable column to non-nullable requires a full table scan
PG304Adding a PRIMARY KEY on nullable columns implicitly set them to NOT NULL requires a full table scan
PG305Adding a CHECK constraint that requires a full table scan
PG306Adding a FOREIGN KEY constraint that requires a full table scan and blocks write operations
MYMySQL and MariaDB specific checks
MY101Adding a non-nullable column without a DEFAULT value to an existing table
MY102Adding a column with an inline REFERENCES clause has no actual effect
MY110Removing enum values from a column requires a table copy
MY111Reordering enum values of a colum requires a table copy
MY112Inserting new enum values not at the end requires a table copy
MY113Exceeding 256 enum values changes storage size and requires a table copy
MY120Removing set values from a column requires a table copy
MY121Reordering set values of a colum requires a table copy
MY122Inserting new set values not at the end requires a table copy
MY123Exceeding 8, 16, 24, 32 or 64 set values changes the storage size and requires a table copy
NMNaming Conventions
NM101Schema name violates the naming convention
NM102Table name violates the naming convention
NM103Column name violates the naming convention
NM104Index name violates the naming convention
NM105Foreign-key constraint name violates the naming convention
NM106Check constraint name violates the naming convention
LTSQLite specific checks
LT101Modifying a nullable column to non-nullable without a DEFAULT value

DS101

Destructive change that is reported when a database schema was dropped. For example:

DROP SCHEMA test;

DS102

A destructive change is reported when a table is dropped, posing a risk of data loss. For example:

DROP TABLE t;

The suggested solution is to back up the data before dropping the table, or add a pre-migration check to ensure that the table is empty before dropping it:

+ -- atlas:txtar
+
+ -- checks/destructive.sql --
+ SELECT NOT EXISTS (SELECT 1 FROM t1) AS t1_empty;
+
+ -- migration.sql --
DROP TABLE t1;

DS103

A destructive change is reported when a non-virtual column is dropped, posing a risk of data loss. For example:

ALTER TABLE t DROP COLUMN c;

The suggest solution to resolve these errors is to add a pre-migration check that ensures the column is empty before dropping it:

+ -- atlas:txtar
+
+ -- checks/destructive.sql --
+ SELECT NOT EXISTS (SELECT 1 FROM t WHERE c IS NOT NULL) AS t_c_empty;
+
+ -- migration.sql --
ALTER TABLE t DROP COLUMN c;

MF101

Adding a unique index to a table might fail in case one of the indexed columns contain duplicate entries. For example:

CREATE UNIQUE INDEX i ON t(c);

MF102

Modifying a non-unique index to be unique might fail in case one of the indexed columns contain duplicate entries.

note

Since index modification is done with DROP and CREATE, this check will be reported only when analyzing changes programmatically or when working with the declarative workflow.

MF103

Adding a non-nullable column to a table might fail in case the table is not empty. For example:

ALTER TABLE t ADD COLUMN c int NOT NULL;

MF104

Modifying nullable column to non-nullable might fail in case it contains NULL values. For example:

ALTER TABLE t MODIFY COLUMN c int NOT NULL;

The solution, in this case, is to backfill NULL values with a default value:

UPDATE t SET c = 0 WHERE c IS NULL;
ALTER TABLE t MODIFY COLUMN c int NOT NULL;

BC101

Renaming a table is a backward-incompatible change that can cause errors during deployment (migration) if applications running the previous version of the schema refer to the old name in their statements. For example:

ALTER TABLE `users` RENAME TO `Users`;

Unlike other checks, there is no single correct way to resolve this one. Here are some possible solutions:

  1. It's likely that this change was introduced when you renamed one of the entities in your ORM, and the linter helped you catch the potential problem this could cause. In such cases, most ORM frameworks allow you to rename the entity while still pointing to its previous table name. e.g., here is how you can do it in Ent and in GORM.

  2. If renaming is desired but the previous version of the application uses the old table name, a temporary VIEW can be created to mimic the previous schema version in the deployment phase. However, the downside of this solution is that mutations using the old table name will fail. Yet, if Atlas detects a consecutive statement with a CREATE VIEW <old_name>, it will ignore this check.

    ALTER TABLE `users` RENAME TO `Users`;
    CREATE VIEW `users` AS SELECT * FROM `Users`;
  3. If renaming the table is desired and no clients depend on it yet, or if it is acceptable to return errors during migration phase when traffic is minimal, you can configure Atlas to ignore this check with the following directive:

    -- atlas:nolint BC101
    ALTER TABLE `users` RENAME TO `Users`;

BC102

Renaming a column is a backward-incompatible change that can cause errors during deployment (migration) if applications running the previous version of the schema refer to the old column name in their statements. For example:

ALTER TABLE `users` RENAME COLUMN `user_name` TO `name`;

Unlike other checks, there is no single correct way to resolve this one. Here are some possible solutions:

  1. It's likely that this change was introduced when you renamed a field in one of the entities in your ORM, and the linter helped you catch the potential problem this could cause. In such cases, most ORM frameworks allow you to rename a field while still pointing to its previous column name. e.g., in Ent you can configure it using the StorageKey option, and in GORM you can set it by adding the column struct tag.

  2. If renaming is desired but the previous version of the application uses the old column name, a temporary VIRTUAL generated column can be created to mimic the previous schema version in the deployment phase. However, the downside of this solution is that mutations using the old column name will fail. Yet, if Atlas detects a consecutive command with such a column, it will ignore this check.

    -- For MySQL or MariaDB:
    ALTER TABLE `posts` RENAME COLUMN `id` TO `uid`, ADD COLUMN `id` int AS (`uid`);

    -- SQLite:
    ALTER TABLE `posts` RENAME COLUMN `id` TO `uid`;
    ALTER TABLE `posts` ADD COLUMN `id` int AS (`uid`);
  3. If renaming the column is desired and no clients depend on it yet, or if it is acceptable to return errors during the migration phase when traffic is minimal, you can configure Atlas to ignore this check with the following directive:

    -- atlas:nolint BC102
    ALTER TABLE `posts` RENAME COLUMN `id` TO `uid`;

CD101

Constraint deletion is reported when a foreign-key constraint was dropped. For example:

ALTER TABLE pets DROP CONSTRAINT owner_id;

MY101

Adding a non-nullable column to a table without a DEFAULT value implicitly sets existing rows with the column zero (default) value. For example:

ALTER TABLE t ADD COLUMN c int NOT NULL;
-- Append column `c` to all existing rows with the value 0.

MY102

Adding a column with an inline REFERENCES clause has no actual effect. Users should define a separate FOREIGN KEY specification instead. For example:

-CREATE TABLE pets (owner_id int REFERENCES users(id));
+CREATE TABLE pets (owner_id int, FOREIGN KEY (owner_id) REFERENCES users(id));

MY110

Removing enum values from a column changes the column type and requires a table copy. During this process, the table is locked for write operations. In addition, operation may fail if the column contains values that are not in the new enum definition.

MY111

Reordering enum values of a colum requires a table copy. During this process, the table is locked for write operations.

Note that since the order of the enum values defines how the table is sorted when using ORDER BY on the column, controlling the ordering behavior can be achieved using the DESC clause or expressions as follows:

-- Instead of reversing the enum values.
SELECT * FROM t ORDER BY enum_column DESC;

-- Instead of reordering the enum values.
SELECT * FROM t ORDER BY CASE enum_column WHEN 'a' THEN 1 WHEN 'b' THEN 2 ELSE 3 END;

MY112

Inserting new enum values not at the end requires table copy. During this process, the table is locked for write operations.

If possible, it is recommended to add new enum values at the end of the enum definition.

MY113

Exceeding 256 enum values changes storage size and requires a table copy. During this process, the table is locked for write operations.

MY120

Removing set values from a column requires a table copy. During this process, the table is locked for write operations. In addition, operation may fail if the column contains values that are not in the new set definition.

MY121

Reordering set values of a colum requires a table copy. During this process, the table is locked for write operations.

MY122

Inserting new set values not at the end requires table copy. During this process, the table is locked for write operations.

If possible, it is recommended to add new set values at the end of the set definition.

MY123

Adding set values to a column changes its storage size and requires a table copy, if the new amount changes the number of bytes needed for the bitmap. The storage size of a set column is 1, 2, 3, 4 or 8 bytes, depending on the number of values: One byte can store up to 8 values, two bytes can store up to 16 etc.

During this process, the table is locked for write operations.

LT101

Modifying a nullable column to non-nullable without setting a DEFAULT might fail in case it contains NULL values. The solution is one of the following:

1. Set a DEFAULT value on the modified column:

-- create "new_users" table
CREATE TABLE `new_users` (`a` int NOT NULL DEFAULT 1);
-- copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`a`) SELECT IFNULL(`a`, 1) FROM `users`;
-- drop "users" table after copying rows
DROP TABLE `users`;
-- rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`;

2. Backfill NULL values with a default value:

-- backfill previous rows
UPDATE `users` SET `a` = 1 WHERE `a` IS NULL;
-- disable the enforcement of foreign-keys constraints
PRAGMA foreign_keys = off;
-- create "new_users" table
CREATE TABLE `new_users` (`a` int NOT NULL);
-- copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`a`) SELECT `a` FROM `users`;
-- drop "users" table after copying rows
DROP TABLE `users`;
-- rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`;
-- enable back the enforcement of foreign-keys constraints
PRAGMA foreign_keys = on;

NM101

A schema has been given a name that violates the naming convention.

NM102

A table has been given a name that violates the naming convention.

NM103

A column has been given a name that violates the naming convention.

NM104

An index has been given a name that violates the naming convention.

NM105

A foreign-key constraint has been given a name that violates the naming convention.

NM106

A check constraint has been given a name that violates the naming convention.

PG110

Creating a table with optimal data alignment may help minimize the amount of required disk space. For example consider the next Postgres table on a 64-bit system:

CREATE TABLE accounts (
id bigint PRIMARY KEY,
premium boolean,
balance integer,
age smallint
);

Each tuple in the table takes 24 bytes of successive memory without the header. the id attribute takes 8 bytes, the premium takes 1 byte and 3 bytes of padding, the balance takes 4 bytes and the age takes 2 bytes, and lastly 6 bytes of padding allocated for the end of the row. In total 9 bytes of padding are allocated for each row.

Compared to same table with different ordering which only takes 16 bytes in memory with 1 byte of padding:

CREATE TABLE accounts (
id bigint PRIMARY KEY,
balance integer,
age smallint,
premium boolean
);

PG101

Creating an index non-concurrently acquires a SHARE lock on the table blocking writes but allowing reads during the operation.

PG102

Dropping an index non-concurrently acquires an ACCESS EXCLUSIVE lock on the table blocking both reads and writes during the operation.

PG103

Indexes cannot be created or deleted concurrently within a transaction. Add the atlas:txmode none directive to the header to prevent this file from running in a transaction.

PG104

Adding a PRIMARY KEY constraint (with its index) acquires an ACCESS EXCLUSIVE lock on the table, blocking all access during the operation. The solution is to add as follows:

  1. Create the UNIQUE INDEX concurrently:

    20240101000001_add_unique_index.sql
    CREATE UNIQUE INDEX CONCURRENTLY users_pkey ON users(id);
  2. Validate the index creation and ensure it is in VALID state. Otherwise, the constraint creation will fail with the following error:

    ERROR:  index "users_pkey" is not a valid

    Note, this step can be automated using a pre-migration check on the migration file below.

  3. After validating the index state, add the PRIMARY KEY constraint using the created index:

    20240101000002_add_primary_key.sql
    ALTER TABLE users ADD PRIMARY KEY USING INDEX users_pkey;

PG105

Adding a UNIQUE constraint (with its index) acquires an ACCESS EXCLUSIVE lock on the table, blocking all access during the operation. The solution is to add as follows:

  1. Create the UNIQUE INDEX concurrently:

    20240101000001_add_unique_index.sql
    CREATE UNIQUE INDEX CONCURRENTLY users_id_key ON users(id);
  2. Validate the index creation and ensure it is in VALID state. Otherwise, the constraint creation will fail with the following error:

    ERROR:  index "users_id_key" is not a valid

    Note, this step can be automated using a pre-migration check on the migration file below.

  3. After validating the index state, add the UNIQUE constraint using the created index:

    20240101000002_add_unique_const.sql
    ALTER TABLE users ADD UNIQUE USING INDEX users_id_key;

PG301

A change to the column type that requires adjusting the actual stored data and triggers a rewrite of the table and potentially indexes. During this operation, an ACCESS EXCLUSIVE lock is acquired on the table, preventing any level of access, including SELECT.

PG302

Adding a column with a volatile DEFAULT value requires a rewrite of the table. During this operation, an ACCESS EXCLUSIVE lock is acquired on the table, preventing any level of access, including SELECT.

PG303

Modifying a column from NULL to NOT NULL requires a full table scan. During this operation, an ACCESS EXCLUSIVE lock is acquired on the table, preventing any level of access, including SELECT.

Columns with CHECK constraints

If the table has a CHECK constraint that ensures NULL cannot exist, such as CHECK (c > 10 AND c IS NOT NULL), the table scan is skipped, and therefore this check is not reported.

PG304

Adding a PRIMARY KEY on a nullable columns implicitly sets them to NOT NULL, resulting a full table scan unless there is a CHECK constraint that ensures NULL cannot exist.

PG305

Adding a CHECK constraint without the NOT VALID clause requires scanning the table to verify that all rows satisfy the constraint. During this operation, an ACCESS EXCLUSIVE lock is acquired on the table, preventing any level of access, including SELECT.

PG306

Adding a FOREIGN KEY constraint without the NOT VALID clause requires a full table scan to verify that all rows satisfy the constraint. During this process, a SHARE ROW EXCLUSIVE lock is acquired on both tables blocking write operations.