Unico is a leading digital identity technology provider in Brazil, developing secure and efficient digital identity
solutions for businesses and individuals. Their platform helps organizations streamline identity verification processes, delivering a seamless
user experience while enhancing security and reducing fraud.
The Missing Layer: Tackling Complexity, Outages, and Risks in Database Schema Management
At Unico, the Platform Engineering team, led by Luiz Casali, is focused on improving developer productivity. "Reducing complexity for developers
is one of our top priorities," Luiz explained.
Unico's Platform team had previously built solutions to automate CI/CD workflows for code using Bazel and GitHub Actions and for infrastructure
using Terraform and Atlantis. The team was missing a standardized solution for managing database schema changes.
This gap introduced several pressing issues:
Risky Manual Processes: Database schema changes (migrations) were performed manually, increasing the chance of human error.
Unreliable Deployments: Unplanned, database-related outages were common, emphasizing the need for a safer way to handle database changes.
Compliance Requirements: The team needed to document and review schema changes to maintain governance standards, but the lack of automation
made this challenging.
Determined to bridge this gap and establish a safer, more efficient solution for developers, Unico's Platform Engineering team began researching
the best database migration tools. Thiago da Silva Conceição, a Site Reliability Engineer (SRE) in the team, took the lead on this technical evaluation.
The Challenge of Managing Database Schema Migrations
Traditional schema migration tools posed significant challenges for Unico. As Thiago noted, "Automation
with our previous tool was never easy to maintain. It followed a specific pattern, and only a few team members were familiar with it." The
team faced limitations that affected usability, integration, and overall productivity:
Limited Usability and Adoption: The tool required extensive knowledge, and documentation was limited, making it difficult to adopt across
the team.
Lack of Automation Support: Automated migrations and reliable error-handling were lacking, leading to inconsistent deployments and a need
for additional manual oversight.
Compliance Difficulties: The absence of automated documentation and governance features made it challenging to maintain and provide records
for audit and compliance requirements.
With these challenges in mind, Unico needed a solution that could offer usability, integration with existing tools, and comprehensive metrics to
continuously monitor and improve database migrations.
"In the end, choosing Atlas was easy. It is a simple, yet powerful tool, offering a significant impact with many ready-made features that would
require customization in other tools." Thiago Silva Conceição, SRE, Unico
During the search for a new solution, Unico's engineering team prioritized several criteria:
Ease of Use: The tool needed to be straightforward and accessible for all developers, not just a few specialized team members.
Integration and Compatibility: It had to fit naturally with Unico's technology stack, particularly with Terraform, which was already in
heavy use.
Metrics and Alerts: Real-time insights and alerts were essential to monitor migrations effectively.
Thiago compared a few traditional solutions before selecting Atlas. Atlas's declarative schema-as-code approach, along with its HCL compatibility
and robust cloud management, aligned well with Unico's needs. It allowed the team to automate migrations, eliminate manual errors, and centralize
schema management, creating a unified experience across their projects. "Atlas allowed us to keep the DDL in HCL while still supporting SQL
scripts for specific use cases through its versioning model," Thiago shared.
Another key priority for Unico's Platform Engineering team was standardization. With multiple teams working across diverse programming
languages and databases, The Platform Engineering team needed a unified migration tool that would work for a wide array of use cases, without
sacrificing ease of use or reliability. To simplify the developer experience and streamline internal operations, they aimed to find a single
solution that could support all teams consistently and seamlessly.
Atlas emerged as the ideal fit by providing plugin support for various databases, ORMs and integrations, making it a flexible tool for Unico's
entire tech stack. The ability to standardize migration management with Atlas allowed Unico's Platform Engineering team to enforce consistent
practices across all projects. Atlas became the single source of truth for schema management, offering a centralized framework for
building policies, integrating into CI/CD pipelines, and supporting developers.
By implementing Atlas as a standard, the Platform Engineering team eliminated the need to train on or maintain multiple tools, reducing complexity
and operational overhead. Now, Unico's developers enjoy a unified experience, and the platform team has only one tool to integrate, support, and
scale as the company grows.
The migration to Atlas was seamless, with no need to recreate migration files or impose rigid formats. "We simply imported the schemas from the
production database, keeping the process straightforward and efficient," Thiago said. The team was able to quickly onboard Atlas and start seeing
results, with pre-built actions in Atlas Cloud providing essential metrics, notifications, and dashboards for tracking progress.
This success reinforced the decision to adopt Atlas:
"Month over month, we see smaller and smaller incidents.
— Luiz Casali, Senior Engineering Manager
Outcome: Faster Development Cycles, Increased Reliability, and Enhanced Compliance
With Atlas in place, Unico's Platform Engineering team has achieved several key outcomes:
Accelerated Development Cycles: Automation of database migrations streamlined the development process, enabling faster iterations and more
rapid deployments.
Increased Reliability: Atlas's linting and testing tools reduced migration errors and enhanced deployment stability, contributing to Unico's
goal of reducing incidents.
Enhanced Compliance: Atlas's automated documentation ensures that each migration step is recorded, simplifying compliance by providing a clear,
auditable record of all schema changes.
By automating these processes, the team has successfully reduced manual work and achieved a more predictable migration workflow. Now, as Unico grows,
they are assured that their migration practices will scale smoothly, maintaining operational costs without sacrificing speed or reliability.
Atlas brings the declarative mindset of infrastructure-as-code to database schema management, similar to Terraform, but focused on databases. Using
its unique schema-as-code approach, teams can quickly inspect existing databases and get started with minimal setup.
Like Unico, we recommend anyone looking for a schema migration solution to get started with Atlas by trying
it out on one or two small projects. Dive into the documentation, join our Discord community for support,
and start managing your schemas as code with ease.
Get the latest Atlas tips and updates in our newsletter.
We are excited to announce the release of Atlas v0.29, which
continues our journey to make working with database easier, safer and more reliable. This release includes several
significant updates that we are happy to share with you:
Approval flows for the Kubernetes Operator: Moving to a declarative way of managing database schemas has plenty of
advantages, but many teams want to see and approve changes before they are applied. Doing this from the CLI is
straightforward, but until recently it was not easy to provide this experience in Kubernetes-based workflows.
With the new approval flows, you can now review and approve schema migrations seamlessly, ensuring database
changes are well-governed while maintaining deployment velocity.
Prisma support: Following our integrations with some of the most popular ORMs in our industry, we are happy to
announce our official guide on using Atlas to manage database schemas for Prisma projects.
GitLab CI/CD Components: Integrating GitLab CI with Atlas just got much easier with the new GitLab CI/CD
components.
IntelliJ Plugin: Our IntelliJ plugin has been upgraded with code folding, inline SQL syntax highlighting and
suggestions, and syntax highlighting within heredoc clauses.
Timeseries Engine support for ClickHouse: ClickHouse users can now explore beta support for timeseries data in Atlas.
Constraint Triggers support for PostgreSQL: PostgreSQL users can now manage constraint triggers with Atlas.
Moving to a declarative way of managing database schemas has plenty of advantages, but many teams want to see and approve
changes before they are applied.
Providing flows for keeping a human-in-the-loop from the CLI is straightforward, but until recently it was not easy to
provide this experience in Kubernetes-based workflows.
Following our recent KubeCon session,
the Atlas Operator now includes approval flows for declarative schema
migrations, making database changes in Kubernetes safer:
Pre-approvals - with pre-approvals, teams enhance their CI pipelines to detect schema changes and integrate
their planning and approval in the code review process. The approved policies are then applied to the database by
the operator.
Ad-hoc Approvals - with ad-hoc approvals, the operator pauses the migration process and waits for human approval
before applying the schema change. This is useful for schema changes that were not approved in advance or for projects
that do not have a strict pre-approval policy.
Following popular demand from the Atlas community, we are excited to announce our official guide on using Atlas to manage
database schemas for Prisma projects.
Prisma already has an excellent migration tool called prisma migrate, so why would you want to use Atlas with Prisma?
In many cases, Prisma's migrate indeed checks all the boxes for managing your database schema. However, being tightly
coupled with the Prisma ORM, some use cases might require a dedicated schema management tool that can be used across
different ORMs and frameworks.
This guide shows how to load your existing prisma.schema file into Atlas, manage your schema changes, and apply them to
your database using the Atlas CLI.
GitLab CI/CD components are reusable YAML templates that you can use in your GitLab CI/CD pipelines to automate workflows
within your GitLab project. Our newly published components are designed to simplify the process of integrating Atlas
with GitLab CI/CD pipelines, enabling you to automate database schema management tasks with ease.
Our IntelliJ plugin has been upgraded with code folding, inline SQL syntax highlighting and suggestions, and syntax
highlighting within heredoc clauses. Our goal with these efforts is to make writing real world database applications
with Atlas easier and more enjoyable.
ClickHouse recently added support for an experimental TimeSeries engine, which is designed to
optimize storage and query performance for time-series data.
Atlas now supports this experimental feature, enabling ClickHouse users to manage schemas for their time-series data
tables with ease:
You can simply define a TimeSeries table in your Atlas schema.
The CONSTRAINT TRIGGER is a PostgreSQL extension of the SQL standard, which works like a regular trigger but allows its execution time to be dynamically controlled using the SET CONSTRAINTS command.
Starting with this version, users can define constraint triggers, and Atlas will manage their lifecycles. Their definitions are also supported in the Atlas HCL syntax:
For two decades now, the common practice for handling rollbacks of database schema migrations has been pre-planned
"down migration scripts". A closer examination of this widely accepted truth reveals critical gaps that result in
teams relying on risky, manual operations to roll back schema migrations in times of crisis.
In this post, we show why our existing tools and practices cannot deliver on the GitOps promise of "declarative" and
"continuously reconciled" workflows and how we can use the Operator Pattern to build a new solution for robust and safe
schema rollbacks.
One of the most liberating aspects of working on digital products is the ability to roll back changes. The Undo Button,
I would argue, is one of the most transformative features of modern computing.
Correcting mistakes on typewriters was arduous. You would roll the carriage back and type over any errors, leaving messy,
visible corrections. For bigger changes, entire pages had to be retyped. Correction fluid like whiteout offered a
temporary fix, but it was slow and imprecise, requiring careful application and drying time.
Digital tools changed everything. The Undo Button turned corrections into a simple keystroke, freeing creators to
experiment without fear of permanent mistakes. This shift replaced the stress of perfection with the freedom to try,
fail, and refine ideas.
When it comes to software delivery, having an Undo Button is essential as well. The ability to roll back changes
to a previous state is a critical safety net for teams deploying new features, updates, or bug fixes. Specifically,
rollbacks impact one of the key metrics of software delivery: Mean Time to Recovery (MTTR).
MTTR is a measure of how quickly a system can recover from failures. When a deployment fails, or a bug is discovered
in production, teams generally have two options: triage and fix the issue (roll forward), or roll back to a previous
known stable state.
When the fix to an issue is not immediately clear, or the issue is severe, rolling back is often the fastest way to
restore service. This is why having a reliable rollback mechanism is crucial for reducing MTTR and ensuring high
availability of services.
Undoing a change in a local environment like a word processor is straightforward. There are multiple ways to implement
an Undo Button, but they all rely on the same basic principle: the system keeps track of every change made and can
revert to a previous state.
In a distributed system like modern, cloud-native applications, things are not so simple. Changes are made across
multiple components with complex dependencies and configurations.
The key capability that enables rolling back changes is described in the seminal book, "Accelerate: The Science of
Lean Software and DevOps". The authors identify "Comprehensive Configuration Management" as one of the key technical
practices that enables high performance in software delivery:
"It should be possible to provision our environments and build, test, and deploy our software in a fully automated
fashion purely from information stored in version control.” 1
In theory, this means that if we can store everything there is to know about our system in version control, and have
an automated way to apply these changes, we should be able to roll back to a previous state by simply applying a
previous commit.
The principle of "Comprehensive Configuration Management" evolved over the years into ideas like "Infrastructure as
Code" and "GitOps". These practices advocate for storing all configuration and infrastructure definitions in version
control in a declarative format, and using automation to apply these changes to the system.
Projects like ArgoCD and Flux have popularized the
GitOps approach to managing Kubernetes clusters. By providing a structured way to define the desired state of your system
in Git (e.g., Kubernetes manifests), and automatically reconciling the actual state with it, GitOps tools
provide a structured and standardized way to manage satisfy this principle.
On paper, GitOps has finally brought us a working solution for rollbacks. Revert the commit that introduced the
issue, and all of your problems are gone!
Teams that have tried to fully commit to the GitOps philosophy usually find that the promise of "declarative" and
"continuously reconciled" workflows is not as straightforward as it seems. Let's consider why.
Declarative resource management works exceptionally well for stateless resources like containers. The way Kubernetes
handles deployments, services, and other resources is a perfect fit for GitOps. Consider how a typical deployment
rollout works:
A new replica set is created with the new version of the application.
Health checks are performed to ensure the new version is healthy.
Traffic is gradually shifted to healthy instances of the new version.
As the new version proves stable, the old version is scaled down and eventually removed.
But will this work for stateful resources like databases? Suppose we want to change the schema of a database.
Could we apply the same principles to roll back a schema migration? Here's what it would look like:
A new database is spun up with the new schema.
Health checks are performed to ensure the new schema is healthy.
Traffic is gradually shifted to the new database instance.
The old database is removed.
This would get the job done... but you would probably find yourself out of a job soon after.
Stateless resources are really great to manage because we can always throw out whatever isn't working for us
and start fresh. But databases are different. They are stateful, and they are comprised not only of a software component
(the database engine), the configuration (server parameters and schema), but also the data itself. The data itself
cannot, by definition, be provisioned from version control.
Stateful resources like databases require a different approach to manage changes.
The common practice for managing schema changes in databases is to use "up" and "down" migration scripts in tandem with
a migration tool (like Flyway or Liquibase). The idea is simple: when you want to make a change to the schema, you write
a script that describes how to apply the change (the "up" migration). Additionally, you write a script that describes
how to undo the change (the "down" migration).
For example, suppose you wanted to add a column named "short_bio" to a table named "users". Your up migration script
might look like this:
ALTERTABLE users ADDCOLUMN short_bio TEXT;
And your down migration script might look like this:
ALTERTABLE users DROPCOLUMN short_bio;
In theory, this concept is sound and satisfies the requirements of "Comprehensive Configuration Management". All
information needed to apply and roll back the change is stored in version control.
Theory, once again, is quite different from practice.
When you write a down migration, you are essentially writing a script that will be executed in the future to revert
the changes you are about to make. By definition, this script is written before the "up" changes are applied.
This means that the down migration is based on the assumption that the changes will be applied correctly.
But what if they are not?
Suppose the "up" migration was supposed to add two columns, the down file would be written to remove these two columns.
But what if the migration was partially applied and only one column was added? Running the down file would fail, and we
would be stuck in an unknown state.
Yes, some databases like PostgreSQL support transactional DDL, which means that if the migration fails, the changes are
rolled back, and you end up with a state this consistent with a specific revision. But even for PostgreSQL, some
operations cannot be run in a transaction, and the database can end up in an inconsistent state.
For MySQL, which does not support transactional DDL, the situation is even worse. If a migration fails halfway through,
you are left with only a partially applied migration and no way to roll back.
When you are working on a local database, without real traffic, having the up/down mechanism for migrations might feel
like hitting Undo and Redo in your favorite text editor. But in a real environment with real traffic, it is not the
case.
If you successfully rolled out a migration that added a column to a table, and then decided to revert it, its inverse
operation (DROP COLUMN) does not merely remove the column. It deletes all the data in that column. Re-applying the
migration would not bring back the data, as it was lost when the column was dropped.
For this reason, teams that want to temporarily deploy a previous version of the application, usually do not revert the
database changes, because doing so will result in data loss for their users. Instead, they need to assess the situation
on the ground and figure out some other way to handle the situation.
Many modern deployment practices like Continuous Delivery (CD) and GitOps advocate for the software delivery process to
be automated and repeatable. This means that the deployment process should be deterministic and should not require
manual intervention. A common way of doing this is to have a pipeline that receives a commit, and then automatically
deploys the build artifacts from that commit to the target environment.
As it is very rare to encounter a project with a 0% change failure rate, rolling back a deployment is something everyone
needs to be prepared for.
In theory, rolling back a deployment should be as simple as deploying the previous version of the application. When it
comes to versions of our application code, this works perfectly. We pull and deploy the container image corresponding
to the previous version.
This strategy does not work for the database, for two reasons:
For most migration tools, down or rollback is a separate command that needs to be executed specifically. This
means that the deployment machinery needs to know what the current version of the target database is in order to
decide whether to migrate up or down.
When we pull artifacts from a previous version, they do not contain the down files that are needed to revert the
database changes back to the necessary schema - they were only created in a future commit!
These gaps mean that teams are left with two options: either they need to manually intervene to roll back the database
changes, or they need to develop a custom solution that can handle the rollback in an automated way.
Going back to our main theme of exploring whether database rollbacks and GitOps can be compatible, let's expand on
this last point.
The ArgoCD documentation suggests
that the way to integrate schema migrations is to use a Kubernetes Job that executes your migration tool of choice,
and to annotate the Job as a PreSync hook:
This image will typically be built as part of your CI/CD pipeline, and will contain the migration tool and the migration
scripts for the relevant commit or release:
apiVersion: batch/v1 kind: Job metadata: name: db-migration annotations: argocd.argoproj.io/hook: PreSync argocd.argoproj.io/hook-delete-policy: HookSucceeded spec: template: spec: containers: -name: migrate image: your-migration-image:{{ .Values.image.tag }}# Example using Helm values restartPolicy: Never
When ArgoCD detects a new commit in the Git repository, it will create a new Job that runs the migration tool. If the
migration is successful, the Job will complete successfully, and the new version of the application will be deployed.
This will work for the up migration. But what happens when you need to roll back?
Teams commonly hit the two issues we mentioned above:
The deployment machinery does not know what the current version of the target database is, and therefore cannot
decide whether to migrate up or down.
Unless a team has carefully thought about this and implemented a mechanism inside the image to decide what to do, the
deployment machinery will always migrate up.
The image that is pulled for the rollback does not contain the down files that are needed to revert the database
changes back to the necessary schema. Most migration tools will silently keep the database in the current state.
What are the implications?
The database is no longer in sync with the current Git commit, violating all GitOps principles.
Teams that do need to roll back the database changes are left with a manual process that requires intervention and
coordination.
The Operator Pattern is a Kubernetes-native way to extend the Kubernetes API to manage additional resources. Operators
typically ship two main components: a Custom Resource Definition (CRD) that defines the new resource type, and a
controller that watches for changes to these resources and takes action accordingly.
The Operator Pattern is a perfect fit for managing stateful resources like databases. By extending the Kubernetes API
with a new resource type that represents a database schema, we can manage schema changes in a GitOps-friendly way.
A specialized controller can watch for changes to these resources and apply the necessary changes to the database in
a way that a naive Job cannot.
The Atlas Operator is a Kubernetes Operator that enables you to manage your database schemas natively from your
Kubernetes
cluster. Built on Atlas, a database schema-as-code tool (sometimes called "like Terraform for
databases"), the Atlas Operator extends the Kubernetes API to support database schema management.
Atlas has two core capabilities that are helpful to building a GitOps-friendly schema management solution:
A sophisticated migration planner that can generates migrations by diffing the desired state of the schema with
the current state of the database.
A migration analyzer that can analyze a migration and determine whether it is safe to apply and surface risks
before the migration is applied.
Atlas supports two kinds of flows for managing database schema changes: declarative and versioned. They are reflected
in the two main resources that the Atlas Operator manages:
The first resource type is AtlasSchema which is used to employ the declarative flow. With AtlasSchema, you define
the desired state of the database schema in a declarative way, and the connection string to the target database.
The Operator is then responsible for generating the necessary migrations to bring the database schema to the desired
state, and applying them to the database. Here is an example of an AtlasSchema resource:
apiVersion: db.atlasgo.io/v1alpha1 kind: AtlasSchema metadata: name: myapp spec: url: mysql://root:pass@mysql:3306/example schema: sql:| create table users ( id int not null auto_increment, name varchar(255) not null, email varchar(255) unique not null, short_bio varchar(255) not null, primary key (id) );
When the AtlasSchema resource is applied to the cluster, the Atlas Operator will calculate the diff between the
database at url and the desired schema, and generate the necessary migrations to bring the database to the desired
state.
Whenever the AtlasSchema resource is updated, the Operator will recalculate the diff and apply the necessary changes
to the database.
The second resource type is AtlasMigration which is used to employ the versioned flow. With AtlasMigration, you
define the exact migration that you want to apply to the database. The Operator is then responsible for applying any
necessary migrations to bring the database schema to the desired state.
Here is an example of an AtlasMigration resource:
apiVersion: db.atlasgo.io/v1alpha1 kind: AtlasMigration metadata: name: atlasmigration-sample spec: url: mysql://root:pass@mysql:3306/example dir: configMapRef: name:"migration-dir"# Ref to a ConfigMap containing the migration files
When the AtlasMigration resource is applied to the cluster, the Atlas Operator will apply the migrations in the
directory specified in the dir field to the database at url. Similarly to classic migration tools, Atlas uses
a metadata table on the target database to track which migrations have been applied.
The Atlas Operator is designed to handle rollbacks in a GitOps-friendly way. This is where the power of the Operator
Pattern really shines as it can make nuanced and intelligent decisions about how to handle changes to the managed
resources.
To roll back a schema change in an ArgoCD-managed environment, you would simply revert the AtlasSchema or
AtlasMigration resource to a previous version. The Atlas Operator would then analyze the changes and generate the
necessary migrations to bring the database schema back to the desired state.
In the discussion above we kept talking about edge cases that arise when rolling back database schema changes, and
concluded that they require manual consideration and intervention. What if we could automate this process?
The Operator Pattern is all about codifying operational knowledge into software. Let's consider how the Operator Pattern
can be used to address the challenges we discussed:
Understanding intent. The Operator can discern between up and down migrations. By comparing between the current
state of the database and the desired version, the operator decides whether to go up or down.
Having access to the necessary information. Contrary to a Job that only has access to the image it was built
with, the Operator stores metadata about the last execution as a ConfigMap via the Kubernetes API. This metadata
enables the operator to migrate down even though the current image does not information about the current state.
Intelligent Diffing. Because the Operator is built on top of Atlas's Schema-as-Code engine, it can calculate
correct migrations even if the database is in an inconsistent state.
Safety checks. The Operator can analyze the migration and determine whether it is safe to apply. This is a
critical feature that can prevent risky migrations from being applied. Depending on your policy, it can even
require manual approval for specific types of changes!
In this talk, we explored the challenges of rolling back database schema changes in a GitOps environment. We discussed
the limitations of the traditional up/down migration approach, and how the Operator Pattern can be used to build a
more robust and automated solution.
If you have any questions or would like to learn more, please don't hesitate to reach out to us on our
Discord server.
1: Forsgren, Nicole, Jez Humble, and Gene Kim. Accelerate: The Science of Lean Software and
DevOps. IT Revolution Press, 2018.
Get the latest Atlas tips and updates in our newsletter.
Today, we're excited to release the new schema plan command, which many of you have been eagerly awaiting.
Taking the declarative workflow to the next level, the schema plan command lets you review, analyze and even edit
declarative migration plans at pull-request stage, making schema apply much safer and predictable. Additionally,
several new features have been added to Atlas in this release, and we'll cover them in this blog post as well.
For those visiting us for the first time, Atlas is a language-agnostic tool for managing and migrating database schemas
using modern DevOps principles. Users define their desired database schema state declaratively, and Atlas handles the rest.
The "state" can be defined using SQL, HCL (Atlas flavor), your preferred ORM, another database, or a combination of all.
To get started, visit the getting-started doc.
Since the first release, Atlas supports declarative migrations. Using the schema apply command, users provide the desired schema,
and a URL (connection string) to the target database, and Atlas computes the migration plan, and applies it to the database
after the user approves it. This workflow is very similar to Terraform, but for databases schemas.
Although the declarative workflow feels magical, and works well for most cases, it had some inherent limitations:
Since changes are computed at runtime, reviews also happen at runtime, either by policy (explained below) or manually.
This creates a less predictable and streamlined deployment process compared to applications development, where code reviews
occur during the pull request (PR) stage. Since Atlas promotes the "Schema as Code" approach, we aim to bring the same
experience to database schema changes.
Another limitation of this workflow is that users can define the desired state but have no control on the exact steps to
reach it. Although Atlas provides a set of diff policies to fine-tune migration planning, users sometimes need more
control over how the migrations are applied.
Data changes, like back-filling columns with custom UPDATE statements, are difficult to express declaratively.
Fortunately, since Atlas provides also a versioned workflow, companies faced these limitations have been able to fall back
to it. While versioned migration has its own limitations (like history linearity), it still works well for most
cases. Combined with Atlas's automatic migration planning, the overall experience is closely to the declarative migration,
but not the same.
We believe that declarative migration is the future for most cases. It lets engineers focus on feature development, not
migrations. Additionally, this workflow allows schema transitions between any states, generating the most efficient plan,
unlike versioned migration, which relies on a linear history of changes.
We address these limitations by introducing the schema plan command. Let's dive in.
The atlas schema plan command allows users to pre-plan, review, and approve declarative migrations before executing them
on the database. It lets users modify the SQL migration plan (if necessary), involve team members in the review, and
ensure the approval is done at development stage, and no human intervention is needed during deployment (atlas schema apply)
stage.
How does it work? Users modify their schema code (e.g., ORM models, SQL or HCL) and open a PR with the changes. Then,
Atlas computes the migration plan, runs analysis, and simulates it on a dev-database. Lastly,
it comments on the PR with the results:
Plan Generated by atlas schema plan
Once the PR is approved and merged, the plan is saved in the Atlas Registry in a "ready to be applied" state.
During deployment (schema apply), Atlas checks for any pre-planned migration for the given schema transition
(State1 -> State2) and uses it if available, otherwise falling back to other approval policies.
This process can also be done locally, allowing users to plan and approve locally, then apply remotely.
If you follow our blog, you know we love practical examples. To maintain this tradition and demonstrate the new command,
let’s dive into an example.
Before running atlas schema plan, let's ensure that a schema repository named app exists in Atlas Registry and there
is a database containing the previous schema state (before our changes):
Schema Definition
Config File
schema.sql
CREATETABLE users ( id INTEGERPRIMARYKEY AUTOINCREMENT, name TEXT );
atlas.hcl
env "local"{ # URL to the target database. url="sqlite://main.db" # URL to the dev-database. dev="sqlite://dev?mode=memory" schema{ # Desired schema state. src="file://schema.sql" # Atlas Registry config. repo{ name="app" } } }
We run atlas schema push to create the schema in Atlas Registry:
Suppose we want to add a non-nullable email column to the users table. Let's update the schema.sql file and then run
atlas schema plan to generate a migration plan.
schema.sql
CREATETABLE users ( id INTEGERPRIMARYKEY AUTOINCREMENT, name TEXT, email TEXTNOTNULL );
We run atlas schema plan to generate a migration plan for adding the email column to the users table:
atlas schema plan --envlocal
The output looks like this:
Planning migration from local database to file://schema.sql (1 statement in total): -- add column "email" to table: "users": -> ALTER TABLE `users` ADD COLUMN `email` text NOT NULL; ------------------------------------------- Analyzing planned statements (1 in total): -- data dependent changes detected: -- L2: Adding a non-nullable "text" column "email" will fail in case table "users" is not empty https://atlasgo.io/lint/analyzers#MF103 -- ok (346.192µs) ------------------------- -- 5.038728ms -- 1 schema change -- 1 diagnostic ? Approve or abort the plan: ▸ Approve and push Abort
Atlas detects data-dependent changes in the migration plan and provides a diagnostic message. In this case, it warns
that adding the non-nullable email column, will fail if the users table is not empty. The recommended solution is to
provide a default value for the new column. Let's fix this by adding a default value to the email column and re-run the
atlas schema plan command.
schema.sql
CREATETABLE users ( id INTEGERPRIMARYKEY AUTOINCREMENT, name TEXT, email TEXTNOTNULLDEFAULT'unknown' );
Then, we run atlas schema plan again to generate a new migration plan, but this time, we approve it:
atlas schema plan --envlocal
Planning migration from local database to file://schema.sql (1 statement in total): -- add column "email" to table: "users": -> ALTER TABLE `users` ADD COLUMN `email` text NOT NULL DEFAULT 'unknown'; ------------------------------------------- Analyzing planned statements (1 in total): -- no diagnostics found ------------------------- -- 6.393773ms -- 1 schema change ? Approve or abort the plan: ▸ Approve and push Abort
Once approved, the migration plan will be pushed to the Atlas Registry, and can be applied using atlas schema apply.
At this stage, we can run atlas schema apply to apply the changes to the database, on any environment, without
re-calculating the SQL changes at runtime or requiring human intervention.
Applying approved migration using pre-planned file 20240923085308 (1 statement in total): -- add column "email" to table: "users" -> ALTER TABLE `users` ADD COLUMN `email` text NOT NULL DEFAULT 'unknown'; -- ok (749.815µs) ------------------------- -- 802.902µs -- 1 migration -- 1 sql statement
Starting with this release, Atlas Registry supports the declarative workflow. It allows you
to store, version, and maintain a single source of truth for your database schemas and their migration plans.
It is similar to DockerHub, but for your schemas and migrations. In addition to functioning as storage and Atlas state
management, it is schema-aware and provides extra capabilities such as ER diagrams, SQL diffing, schema docs, and more.
In addition to the schema plan command, we have added several new features and improvements to Atlas. Here are some highlights:
Users running atlas schema apply with a Pro license will now receive a detailed migration linting report and can control
the approval based on it. Read more about the Review and Approval Policies.
The schema apply command now supports the --edit flag, allowing users to safely edit the migration plan before applying
it. Note that if your manual changes are not in sync with the desired state, Atlas will detect schema drift and reject the
changes.
The GitHub Action and gh extension for
Atlas have been updated to support the new declarative workflow.
That's all for this release! But, we are already working on several features and improvements in the pipeline. To be
transparent with our community, here is a look at what's coming next:
Partition support for the PostgreSQL driver.
CircleCI, GitLab CI, Kubernetes Operator, and Terraform Provider will support the new declarative workflow.
A new schema lint command, allowing users to lint their schemas with built-in and custom analyzers.
A Prisma provider for Atlas, enabling Prisma users to import their Prisma schema into Atlas schema state.
We hope you enjoy the new features and improvements. As always, we would love to hear your
feedback and suggestions on our Discord server.
Get the latest Atlas tips and updates in our newsletter.