Skip to main content

Safe NOT NULL Migrations on PostgreSQL: Avoid Aborted Deploys and Table Locks

The Problem

Adding a NOT NULL constraint to an existing column is a routine schema change, but on PostgreSQL, a plain ALTER TABLE ... ALTER COLUMN ... SET NOT NULL has two failure modes that are easy to miss when testing locally or against data that does not represent production:

  • Data-dependent change: The column might already hold NULLs in production. If it does, SET NOT NULL fails and the migration fails with it. Since success depends on the data, the migration can pass on a local or test database and still fail in production where the NULLs actually exist. Atlas flags this as MF104.
  • Blocking table change: Even when no NULLs exist, PostgreSQL still has to verify it. SET NOT NULL acquires an ACCESS EXCLUSIVE lock, which blocks reads as well as writes, and holds it while it scans every row in the table. The scan is instant on a small table, but on a large or busy one, the lock can be held long enough to stall queries and the deployment. Atlas reports this as PG303.

The safe way to make this change is well-known: backfill the data, then add the constraint without a blocking scan. Doing this by hand for every column is tedious and easy to get wrong, so Atlas automates it for you.

Enter: Atlas

Atlas can plan this change safely for you. You configure the behavior once in atlas.hcl, and it is applied to every migration Atlas generates, locally and in CI. The rest of this guide sets this up.

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

The not_null PostgreSQL diff policy is available only to Atlas Pro users. To use this feature, run:

atlas login

The Solution

By default, when Atlas detects a nullability change in your schema, it generates a migration with a plain SET NOT NULL:

migration.sql
-- Modify "users" table
ALTER TABLE "users" ALTER COLUMN "name" SET NOT NULL;

We already covered why this change is risky. Atlas's linting catches it too: atlas migrate lint reports both the data-dependent risk (MF104) and the full table scan (PG303):

Analyzing changes from version 20260616100000 to 20260616120000 (1 migration in total):

-- analyzing version 20260616120000
-- data dependent changes detected:
-- L3: Modifying nullable column "name" to non-nullable might fail in case it contains NULL values
https://atlasgo.io/lint/analyzers#MF104
-- blocking table changes detected:
-- L3: Modifying nullable column "name" to non-nullable requires a full table scan
https://atlasgo.io/lint/analyzers#PG303
-- ok (2.1ms)

-------------------------
-- 8.4ms
-- 1 version with warnings
-- 1 schema change
-- 2 diagnostics

Atlas can resolve both of these risks with its not_null diff policy, which we configure in two steps:

  1. Add a pre-migration check that stops a migration before it runs if a column still holds NULLs. This handles the data-dependent risk (MF104).
  2. Switch to the lock-safe strategy, which applies the NOT NULL without ever holding an ACCESS EXCLUSIVE lock for a table scan. This handles the blocking risk (PG303), with trade-offs which we cover in that step.

The following sections walk through each step.

Step 1: Gate the Change with a Pre-Migration Check

We want Atlas to verify that our column has no NULLs before the migration runs, and to stop the deployment if it finds any. We instruct it to do so by setting check in the not_null block in atlas.hcl:

atlas.hcl
diff "postgres" {
not_null {
check = true
}
}

With check on, Atlas no longer writes a plain SQL file. Instead, it writes the version as a txtar archive: a text-based format for a complete migration plan in which each section begins with a -- FILENAME -- marker. Atlas puts the pre-migration check in a checks.sql section and the DDL in a migration.sql section:

migrations/20260616120000.sql
-- atlas:txtar

-- checks.sql --
-- atlas:assert MF104
SELECT NOT EXISTS (SELECT 1 FROM "users" WHERE "name" IS NULL) AS "not_null";

-- migration.sql --
-- Modify "users" table
ALTER TABLE "users" ALTER COLUMN "name" SET NOT NULL;

The checks.sql assertion must pass before migration.sql runs, so if "name" contains any NULL values, the deployment stops upfront with a clear message rather than of failing halfway through. That solves the first problem: atlas migrate lint no longer reports MF104. The DDL itself is unchanged, though, so the full table scan (PG303) is still flagged:

Analyzing changes from version 20260616100000 to 20260616120000 (1 migration in total):

-- analyzing version 20260616120000
-- blocking table changes detected:
-- L3: Modifying nullable column "name" to non-nullable requires a full table scan
https://atlasgo.io/lint/analyzers#PG303
-- ok (1.8ms)

-------------------------
-- 7.9ms
-- 1 version with warnings
-- 1 schema change
-- 1 diagnostic

The data-dependent risk is solved, but the lock (PG303) is still there.

Step 2: Remove the ACCESS EXCLUSIVE Lock

Now, we want Atlas to make the column NOT NULL without holding an ACCESS EXCLUSIVE lock for a full table scan, which puts us at risk for downtime on a large or busy table. We instruct it to do so by setting lock_safe alongside check in the not_null block:

atlas.hcl
diff "postgres" {
not_null {
check = true
lock_safe = true
}
}

lock_safe relies on a property of PostgreSQL: a NOT VALID constraint is added without scanning the table, so it only constrains future rows. It can later be validated under a SHARE UPDATE EXCLUSIVE lock, which does not block reads or writes. Once a validated constraint proves the column has no NULLs, the actual SET NOT NULL is instant because PostgreSQL skips its own scan when an existing constraint already guarantees the invariant.

The exact form depends on your PostgreSQL version:

On PostgreSQL 18, NOT NULL is a first-class, nameable catalog constraint (contype = 'n') that supports the NOT VALID clause. Atlas uses it directly, so the validated constraint replaces SET NOT NULL entirely: there is no separate ALTER COLUMN step, and nothing to drop afterward.

migrations/20260616120000.sql
-- atlas:txtar

-- checks.sql --
-- atlas:assert MF104
SELECT NOT EXISTS (SELECT 1 FROM "users" WHERE "name" IS NULL) AS "not_null";

-- migration.sql --
-- atlas:txmode none

-- Add a not-null constraint to column "name" without scanning the table
ALTER TABLE "users" ADD CONSTRAINT "users_name_not_null" NOT NULL "name" NOT VALID;
-- Validate constraint "users_name_not_null"
ALTER TABLE "users" VALIDATE CONSTRAINT "users_name_not_null";

The constraint name users_name_not_null follows PostgreSQL's own convention for the implicit name a plain SET NOT NULL would generate (<table>_<column>_not_null), so a later inspection folds it back into the column's nullability with no drift.

In both versions the checks.sql is unchanged from Step 1; lock_safe only rewrites the highlighted migration.sql, so the change stays guarded against dirty data while shedding the lock.

Why the transaction is disabled

Both plans above, on PostgreSQL 18 and below, open with -- atlas:txmode none. That directive tells Atlas not to wrap the migration in a single transaction, but to run each statement on its own. The reason comes down to how PostgreSQL handles locks: once acquired, a lock is held until the end of the transaction.

ADD CONSTRAINT ... NOT VALID is instant on its own because it does not scan the table, but it still takes a brief ACCESS EXCLUSIVE lock to record the new constraint. Inside a single transaction, that lock would be held until the transaction commits. The VALIDATE that follows (which by itself takes only a SHARE UPDATE EXCLUSIVE lock) would run its table scan while the table is still under the ACCESS EXCLUSIVE lock held by the preceding ADD. That is the exact lock we set out to avoid.

Running each statement on its own lets the ADD commit and release its lock immediately, so VALIDATE scans under the lighter SHARE UPDATE EXCLUSIVE lock instead. With both options on, the migration applies the NOT NULL without an ACCESS EXCLUSIVE table scan, and atlas migrate lint passes with no diagnostics.

lock_safe builds on check

lock_safe cannot be used on its own. Removing the lock without first proving the data is clean would just move the abort from the scan to the SET NOT NULL itself, so Atlas requires check = true whenever lock_safe is set. Configuring lock_safe alone is an error:

atlas.hcl:3,5-13: attribute check is required when lock_safe is set

When Atlas Skips the Lock-Safe Plan

The full NOT VALID dance is not always needed. When a plain SET NOT NULL would not scan the table anyway, the extra statements buy nothing, so Atlas falls back to the simplest plan that is still lock-safe. This is usually a plain SET NOT NULL. This happens in the following cases:

The column is already covered by a CHECK

If the table already has a validated CHECK (col IS NOT NULL) constraint that this migration did not add, PostgreSQL skips the scan on SET NOT NULL by itself: a valid check already proves no NULLs exist, so there is nothing left to verify. On the other hand, a NOT VALID check would not help here, since PostgreSQL has not confirmed it holds for the existing rows. Atlas emits a plain SET NOT NULL, with no NOT VALID constraint and no txmode none:

migration.sql
-- Modify "users" table
ALTER TABLE "users" ALTER COLUMN "bio" SET NOT NULL;

check = true still adds the MF104 pre-check, and atlas migrate lint recognizes the covering constraint too, so it does not report PG303. The diff and the linter agree the change is already lock-safe.

The migration adds a covering CHECK

If the same migration both adds a CHECK (col IS NOT NULL) and sets the column NOT NULL, Atlas reuses that constraint instead of creating a throwaway one: it adds the check NOT VALID, validates it, sets the column NOT NULL, and keeps the check, since it is part of your desired schema:

migration.sql
-- atlas:txmode none

-- Add the "users_bio_check" constraint covering the nullability of column "bio" without scanning the table
ALTER TABLE "users" ADD CONSTRAINT "users_bio_check" CHECK ("bio" IS NOT NULL) NOT VALID;
-- Validate constraint "users_bio_check"
ALTER TABLE "users" VALIDATE CONSTRAINT "users_bio_check";
-- Modify "users" table
ALTER TABLE "users" ALTER COLUMN "bio" SET NOT NULL;

A single CHECK that covers several columns going NOT NULL in the same migration is added and validated only once.

Another change already rewrites the table

Some changes rewrite the whole table under an ACCESS EXCLUSIVE lock regardless: adding a column with a volatile default, a stored generated column, or an identity column; switching the table to UNLOGGED; or changing its access method.

When one of these is in the same migration, a separate NOT VALID and VALIDATE would only add a second scan, so Atlas lets the SET NOT NULL ride along on the combined ALTER as a plain statement, with no txmode none and no temporary constraint:

migration.sql
-- Modify "users" table
ALTER TABLE "users" SET UNLOGGED, ALTER COLUMN "name" SET NOT NULL;

Summary

Adding a NOT NULL constraint to a PostgreSQL column carries two risks: the migration can fail on existing NULLs, and SET NOT NULL can lock the table for a full scan.

The not_null diff policy removes both: check adds a pre-migration check that stops the deployment before any DDL when the column still holds NULLs, and lock_safe plans the change with a validated NOT VALID constraint so the NOT NULL is applied without an ACCESS EXCLUSIVE scan.

Without policycheckcheck + lock_safe
Existing NULLsMigration fails mid-runCaught upfront by MF104Caught upfront by MF104
Table scan under ACCESS EXCLUSIVEYesYesNo, validated under SHARE UPDATE EXCLUSIVE
Lint resultMF104 and PG303PG303Clean

For zero-downtime NOT NULL migrations on PostgreSQL, enable both knobs. Together they let the column become NOT NULL while the table stays readable and writable throughout, with no risk of a failed deploy:

atlas.hcl
diff "postgres" {
not_null {
check = true
lock_safe = true
}
}

Next Steps

Atlas plugs into every major CI platform to run migration linting on each pull request, so unsafe changes are caught in review before they reach production: