Skip to main content

14 posts tagged with "migrations"

View All Tags

Deep Dive into Declarative Migrations

· 15 min read
Rotem Tamir
Building Atlas

Prepared for an Atlas Community Webinar, October 2024

Introduction

In recent years, the shift to declarative resource management has transformed modern infrastructure practices. Groundbreaking projects like Terraform, for infrastructure as code, and Kubernetes, for container orchestration, have exemplified the power of this approach. By focusing on what the end state should be rather than how to achieve it, declarative methods make systems more scalable, predictable, and easier to maintain—essential qualities for handling today's complex environments.

However, when it comes to managing database schemas, the industry has been slow to adopt declarative workflows. Atlas was created almost four years ago to address this gap.

Atlas supports two kinds of database schema migration workflows:

  • Versioned Migrations - each change to the database is described as a migration script, essentially a SQL file containing the SQL commands to apply the change. Migrations are versioned and applied in order.

    Contrary to most existing migration tools, Atlas relies on users defining the desired state of the database schema in code Atlas generates the necessary migration scripts to bring the database to the desired state.

  • Declarative Migrations - the database schema is described in a declarative way, and changes are applied by comparing the desired schema with the current schema and generating the necessary SQL commands to bring the database to the desired state.

To date, most teams that used Atlas in production have used it's versioned migration workflow which synthesizes the simplicity and explicitness of classic migration tools with the benefit of automated migration generation.

Recent improvements to Atlas have addressed many of the challenges and concerns teams have expressed around using declarative migrations in production in the past. In this post, we'll take a deep dive into the declarative migration workflow

Declarative Migrations in Production?

Declarative migrations are a powerful concept, and it may surprise you, but, they are not new. In previous incarnations, they were often referred to as "state based" migrations, but they were never regarded as a production-grade solution. A quick look around the documentation of popular tools will reveal that they are often discouraged:

Hibernate ORM:

Although the automatic schema generation is very useful for testing and prototyping purposes, in a production environment, it’s much more flexible to manage the schema using incremental migration scripts.

EF Core Docs:

EnsureCreated will create the database if it doesn't exist and initialize the database schema. If any tables exist (including tables for another DbContext class), the schema won't be initialized.

Liquibase:

State-based deployments offer a quick, simplistic way to determine change scripts and are useful for analysis and drift detection. Due to its simplicity, though, it’s unreliable and risky to use as a primary change management approach.

Challenges of Declarative Migrations

Having interviewed many engineers and architects about their experience with schema management, we've identified several concerns that teams have with declarative migrations:

Approval Process

Versioned migrations serve a dual purpose, they are both an explicit set of instructions for the migration tool and a source-controlled artifact that can be examined and approved by human reviewers.

For many teams, the idea of only versioning the desired state of the database schema and not the actual migration scripts means forfeiting their main mechanism for review and approval of schema changes. This is a valid concern, and any production-grade declarative migration tool should provide a way to review and approve changes before they are applied.

Safety

Schema changes are a risky business, if you are not careful you can easily find yourself in a situation where you've accidentally deleted data, locked a table for writes for a long period of time or introduced a production outage via a breaking schema change.

By definition, migrations generated by declarative workflows are non-deterministic, as it depend as they depend on the current state of the database. This also one of their main advantages, as they can handle unexpected circumstances and reconcile drift automatically. However, many teams are uncomfortable with the idea of not knowing exactly what will be applied to their database.

As one user once wrote us on HackerNews:

I'll never be comfortable with any tool for that automatically generates schema changes, as I'm just never sure at what point it decides to delete parts of my prod db.

Customization

Another common concern is the lack of control over the generated migrations. Migration planning can be thought of as a navigation problem, how to get from point A to point B. Declarative migrations are like using a GPS, you tell it where you want to go, and it figures out the best route. But what if you want to travel through some specific coordinates?

When it comes to database schema management, there may be multiple ways to get from the current to the desired state, some of them completely undesirable. For instance, a simple column rename can also be achieved by dropping the column and creating a new one (thereby losing all the data in the column) or with PostgreSQL index creation may lock a table if not used with the CONCURRENTLY option.

Tools like Kubernetes and Terraform don't let the user specify the exact steps to get to the desired state, or require the development of custom plugins (such as Terraform Providers or Kubernetes Controllers) to achieve this. For database schema changes, this level of customization is often necessary.

Production-grade Declarative Migrations

Atlas was designed with these concerns in mind and in the next section, we'll explore how Atlas addresses these concerns to provide teams with a production-grade declarative migration workflow.

Pre-planning Migrations

To address users' need for knowing the exact migrations that will be applied to their database, Atlas provides the schema plan command. Users may run a command similar to:

atlas schema plan --env local --save

Atlas will calculate the plan, analyze it and present the user with a detailed report of the migrations that will be applied and their potential impact on the database:

Planning migration from local database to file://./schema.pg.hcl (1 statement in total):

-- create "t1" table:
-> CREATE TABLE "t1" (
"c1" text NOT NULL
);

-------------------------------------------

Analyzing planned statements (1 in total):

-- no diagnostics found

-------------------------
-- 19.389709ms
-- 1 schema change
? Approve or abort the plan:
▸ Approve and save
Abort

After saving, a "plan file" file is created:

plan "20241029120559" {
from = "PKrr2qiovsNurI70kgT+AW3rInDu74E1PSOSHgh8CrA="
to = "Szdpl/ADvG8kKrXBERSxjvU/I1nprpBejpPX7fLoWmc="
migration = <<-SQL
-- Create "t1" table
CREATE TABLE "t1" (
"c1" text NOT NULL
);
SQL
}

The plan block contains three relevant fields:

  • from - A "fingerprint" hash which encodes the "current" state of the database.
  • to - A similar hash for the "desired" state of the database.
  • migration - The SQL commands that will be applied to the database to bring it from the from state to the to state.

The from and to fields are significant because they allow Atlas to guarantee that the plan will only be executed if the database is in the known from state. This is a crucial safety feature that enables the deterministic execution of the plan.

The plan file can then be applied to the database using the schema apply command:

atlas schema apply --env local --plan file://20241029120559.plan.hcl

Customization

As we previously mentioned, there are often multiple ways to get from the current to the desired state of the database. To provide users with the ability to customize the generated migrations, Atlas provides two ways to control the process.

Diff Policies

Diff policies can be defined on the project level to control how Atlas generates migrations. For instance, a project may define a policy that indexes must be created or dropped concurrently:

env "local" {
diff {
// By default, indexes are not added or dropped concurrently.
concurrent_index {
add = true
drop = true
}
}
}

In some cases, users may want to tell Atlas to skip DROP operations completely:

env "local" {
diff {
skip {
drop_schema = true
drop_table = true
}
}
}
Editing Plans

In some cases, users may want to edit the generated plan before applying it. By using the --edit flag on the schema plan command, Atlas will open the plan in the user's default editor for review and modification:

When the user runs:

atlas schema plan --env local --edit --save

Atlas will calculate the plan, and open it in the user's default editor:

-- Create "t1" table
CREATE TABLE "t1" (
"c1" text NOT NULL
);

The user may edit the plan as needed, adding or removing statements, and then save the plan.

-- Create "t1" table
CREATE TABLE "t1" (
"c1" text NOT NULL
);
+ INSERT INTO "t1" ("c1") VALUES ('hello');

Atlas verifies that the edited plan is still valid and that the resulting schema is equivalent to the desired schema.

Suppose a user makes a change that is not in line with the desired schema, for instance, adding a column that is not in the desired schema:

-- Create "t1" table
CREATE TABLE "t1" (
"c1" text NOT NULL,
+ "c2" text NOT NULL
);

Atlas will detect the drift and present the user with an error message:

Abort: the planned state does not match the desired state after applying the file:

--- planned state
+++ desired state
@@ -1,3 +1,4 @@
CREATE TABLE "t1" (
- "c1" text NOT NULL
+ "c1" text NOT NULL,
+ "c2" text NOT NULL
);

Safety Checks

In addition to its advanced diffing capabilities, Atlas contains a migration analysis engine that can simulate changes and detect potential issues before they occur. Until recently, this engine was only available to users of the versioned migration workflow via the migrate lint command. However, in the latest release, we've included the analysis step in the schema plan and schema apply commands for declarative migrations.

When a user runs the schema plan command, Atlas will analyze the plan and present the user with a detailed report of the potential impact of the migrations.

For instance, if a user tries to drop a column or table, Atlas will present a warning:


Planning migration from local database to file://./schema.pg.hcl (1 statement in total):

-- drop "t1" table:
-> DROP TABLE "t1";

-------------------------------------------
Analyzing planned statements (1 in total):

-- destructive changes detected:
-- L2: Dropping table "t1"
https://atlasgo.io/lint/analyzers#DS102
-- suggested fix:
-> Add a pre-migration check to ensure table "t1" is empty before dropping it

-------------------------
-- 16.281417ms
-- 1 schema change
-- 1 diagnostic

? Approve or abort the plan:
▸ Approve and save
Abort

For a full list of diagnostics and suggested fixes, users can visit the Atlas Lint Analyzers page.

Modern CI/CD Workflows with Declartive Migrations

Atlas was designed to be used in modern CI/CD workflows, and the declarative migration workflow is no exception. In the final section of this post, we'll explore how teams can integrate Atlas into their CI/CD pipelines to ensure that:

  1. All changes to the database schema result from changes to the desired state.
  2. Changes are planned automatically during the CI process.
  3. Changes are analyzed automatically to detect potential issues.
  4. Changes are approved by the team before being applied to the database.
  5. Changes are applied to the database in a deterministic and safe manner.

Demo: CI/CD Workflow with Atlas

Here's a quick demo video of how to integrate Atlas into a CI/CD pipeline using GitHub Actions:

.

To summarize, our demo CI/CD workflow works as follows:

  1. The user changes the desired state of the database, in our case a file named schema.sql.
  2. The user commits the changes to the schema.sql file and creates a pull request.
  3. GitHub Actions runs the atlas schema plan command to calculate the migration plan.
  4. The Action creates a comment on the pull request with the following information:
    • The migration plan.
    • An analysis of the changes, including any potential issues.
    • Instructions for manually editing the plan if needed.
  5. The user reviews the plan and approves the PR.
  6. Once the PR is merged, GitHub Actions run another action that pushes the new state and plan to the Schema Registry.
  7. Once the desired state and plan are in the Schema Registry, the user can deploy the changes to the database using the atlas schema apply command.

A word about the Schema Registry

The Schema Registry is a fairly new concept introduced by Atlas. The registry is a central location where schemas and migration plans can be stored and shared among team members. You can think about it as a "DockerHub for Schema Management" or an "Artifact Repository for Database Schemas".

As we saw, it is possible to simply use files to store the desired state and migration plans, but the Schema Registry provides several advantages:

  • Source of Truth - the registry can be used as the source of truth for the database schema, ensuring that all team members are working with the same schema and that all changes are tracked and go through a structured review process. Similar to how teams use DockerHub to store and share container images, making sure that only code that went through a team's CI/CD pipeline makes it's way to production.
  • Streamlined Deployment - as we will see below, the registry isn't a static blob storage, but a dynamic service that can be queried and interacted with. This allows the Atlas CLI to only fetch relevant and approved migration plans, during deployment, enforcing the deterministic and safe execution of the migrations.
  • Visualizations and Insights - the registry can be used to generate visualizations of the schema, track changes over time, and provide insights into the state of the database. This can be useful for auditing, debugging, and planning future changes.

How everything fits together

As we saw in the demo, the Schema Registry is a central component of the CI/CD workflow. It acts as the source of truth for the database schema and migration plans, ensuring that all changes are tracked and reviewed before being applied to the database.

Here's what happens when a user runs the atlas schema apply command:

  1. Atlas inspects the target database and fetches the current state of the schema. Atlas assigns a "fingerprint hash".
  2. Atlas analyzes the desired state of the database and calculates the hash for that state.
  3. Atlas uses this tuple of (from, to) hashes to search the Schema Registry for a matching migration plan.
  4. If a plan is found, Atlas verifies that it is approved and safe to apply.
  5. If the plan is approved, Atlas applies the plan to the database.

What happens if an approved plan is not found?

By default, Atlas will not apply any changes to the database if an approved plan is not found. This is a safety feature that ensures that only changes that have been reviewed and approved by the team are applied to the database.

However, in some cases, users may want to apply changes to the database that have not been approved. After all, most schema changes are simple and additive and don't pose any risk to the database. Additionally, some environments are mostly used for testing and development and don't require the same level of scrutiny as production.

In cases where no approved plan is found, Atlas will calculate the plan and run it through analysis ("linting").

Atlas provides a mechanism called "Review Policies" to define what should happen based on the linting results.

Review policies are defined in the project configuration file using the lint.review attribute:

lint {
review = ERROR // ERROR | ALWAYS
}

The review policy can be set to one of the following values:

  1. ERROR - Atlas will require manual review and approval only if the linting report contains errors, i.e., one of the analyzers is configured to return an error. See the destructive-changes analyzer as an example.
  2. WARNING - Atlas will require manual review and approval if the linting report contains warnings (diagnostics) or errors. See the list of checks that can be detected by the analyzers.
  3. ALWAYS (default) - Atlas will always require manual review and approval, regardless of the linting report.

Wrapping Up

In this post, we've taken a deep dive into the declarative migration workflow in Atlas. Here's what we covered:

  1. Until recently, declarative migrations were considered a risky and unreliable way to manage database schemas.
  2. The top challenges teams have with declarative migrations are around approval, safety, and customization.
  3. Atlas addresses these concerns by providing a way to review and approve changes, analyze and simulate changes, and customize the generated migrations.
  4. Atlas can be integrated into modern CI/CD workflows to ensure that all changes to the database schema are tracked, reviewed, and applied in a deterministic and safe manner.
  5. The Schema Registry is a central component of the CI/CD workflow, acting as the source of truth for the database schema and migration plans.
  6. Atlas provides a mechanism called "Review Policies" to define what should happen based on the linting results.

We hope this post has given you a better understanding of how Atlas can be used to manage database schemas in a declarative way. If you have any questions or would like to learn more, please don't hesitate to reach out to us on our Discord server.

How Conceal.IO Manages 1,500+ Redshift Schemas Using Atlas

· 4 min read
Rotem Tamir
Building Atlas

"Everything on Atlas is just making too much sense for us."
— Kaushik Shanadi, Chief Architect

Conceal, a cybersecurity company, creates a secure browsing experience using a browser extension. With a lean engineering team, When Conceal shifted from serving individual consumers to working with managed service providers (MSPs), their clients' security requirements drove the need for a robust, multi-tenant architecture to ensure data isolation and scalability.

Kaushik Shanadi, VP and Chief Architect, led the charge in finding that solution.

Database per tenant testimonial redshift

Evaluating Alternative Solutions

To meet the growing business demands and provide sufficient isolation for each customer’s data in a scalable and secure manner, the team considered three alternatives:

  1. Keep a Single Database, Isolating on the Application Layer: This option was quickly dismissed because of the team's negative experience with this solution.

    Like many others, they found that adding a tenant_id column to every table and hoping that developers remember to filter by it on every query was a burden and risk they were not willing to take. Additionally, this was not acceptable to some of their customers, who required strict data isolation.

  2. Database-per-Tenant Approach: While this approach ensured both data isolation and scalability, the cost of maintaining a Redshift cluster for each customer made this alternative prohibitive.

  3. Hybrid Solution (Schema-per-Tenant): Ultimately, they chose a schema-per-tenant model, which kept the data isolated and secure without the high cloud costs. This approach also offered the flexibility to switch specific customers to their own isolated database if needed.

"It was way too easy to do that with Atlas compared to any other method," Kaushik remarked.

info

For a deeper analysis of the different multi-tenant architectures, read our blogpost and guide on the topic.

The Challenge of Schema-per-Tenant Architecture

Schema-per-tenant architectures present unique challenges, primarily around managing database schema migrations:

  • Migration duration scales linearly with tenant count.
  • Detecting inconsistencies becomes a needle in a haystack problem.
  • Rollbacks are difficult to orchestrate.

Atlas overcomes these challenges with its declarative schema-as-code approach. By automating migration planning, Atlas ensures that every schema remains consistent, aligned, and easy to manage.

Read more about the challenges and how Atlas solves them here

Safety First: Managing 1,500+ Schemas with 7 Engineers Using One Tool

Implementation

According to Kaushik, the implementation process was easy and smooth. Amazon Redshift was a requirement for both long-term storage and machine learning (ML) training data. "Migrating with old migration tools is a nightmare," said Kaushik. After discovering that Atlas supports Redshift, he ran a few POCs locally to test Atlas.

"I was able to get everything working and saw how fast the migration process was, so we pushed it to development," he explained.

Outcome

By leveraging Atlas' declarative Schema-as-Code capabilities and its ability to manage schema per tenant architectures, Conceal.IO successfully manages a fleet of over 1,500 identical Redshift schemas, isolating customers' data from one another as required. This unified approach ensures that all schemas are designed consistently, aligned, and kept in the same state. This alignment has resulted in:

  • Faster feature deployment across all customers
  • Improved analytics and ML preparation
  • Faster onboarding for new engineers
  • The ability to add more customers as needed without worrying about deployments becoming slower as they grow
  • Fixed operational costs, ensuring scalability without an exponential rise in expenses

All of this was achieved in a safe and secure environment.

"Having a lot of database schemas but only one tool to manage them all makes our lives so much easier", Kaushik added.

Next Steps

The Conceal.IO team plans to use the upcoming holiday season to migrate their production workloads running on PostgreSQL into Atlas as well. Tired of dealing with constant migration issues, Kaushik is confident that “using Atlas will make it so much easier.”

"Support has been awesome, and the speed has been incredible too," summarized Kaushik. "Everything I need from an enterprise solution is available out of the box. It was almost too easy to say — let’s just use Atlas."

Getting Started

Atlas applies the declarative mindset to database schema management, like Terraform, but for databases. Using its unique, schema-as-code approach, teams can automatically inspect existing databases and get started in no time.

Like Conceal.IO, we recommend anyone looking for a schema migration solution to get started with Atlas by trying it out on one or two small projects. Read the documentation, join our Discord community for support, and start managing your schemas as code.

Strategies for Reliable Schema Migrations

· 12 min read
Rotem Tamir
Building Atlas

Adapted from a talk given at Kube Native 2024:

Introduction

Database schema migrations are a critical part of the software development lifecycle. They allow us to evolve our data model as our application grows and changes. However, migrations can also be a significant source of risk and downtime if not handled carefully.

Despite teams applying all the common best practices, such as using a migration tool to automate changes and carefully examining each migration before approving it during code review, issues still slip through the cracks, making migrations a source of significant headaches for many engineering teams.

This post explores five strategies for making database schema migrations more reliable by introducing Atlas, a database schema-as-code tool that helps teams make their database schema changes both safer and easier to manage.

Strategies for Reliable Schema Migrations with atlas

Prerequisite: Automate

Having interviewed over a hundred engineering teams about their database migration practices, we found that a surprisingly large number of teams perform database migrations manually. This involves running SQL statements via scripts or a point-and-click interface directly against the database, often without proper testing or version control.

Manual migrations are error-prone and difficult to reproduce, leading to inconsistencies between environments and increased risk of downtime. Additionally, manual changes to the database are amongst the most stressful and dreaded tasks for developers, as they can lead to nasty outages if not done correctly.

Much has been written about the importance of automating database migrations, but it's worth reiterating here. Even if you do nothing else, please automate your schema changes!

Strategy 1: Schema-as-Code

Classic database migration tools like Flyway or Liquibase are great for automating the process of executing SQL scripts against a database. However, despite being categorized as "automation tools," they still require plenty of manual work to write, test and review the SQL scripts that define the schema changes.

Atlas takes a different approach by treating the database schema as code. Instead of manually defining the schema changes in SQL scripts, developers define the desired state of the database schema in a declarative format using code and let the tool handle the rest.

For instance, a developer may define the following schema in Atlas:

schema.sql
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);

Next, to apply this schema to a database, the developer runs the following command:

atlas schema apply --env local --to file://schema.sql

Atlas will connect to the database, inspect its current schema, calculate the diff between the current and desired state, and propose a migration plan to bring the database to the desired state. The developer can then review and approve the migration before applying it.

By automating the process of defining and applying schema changes, Atlas makes it easier to manage database migrations and reduces the risk of human error.

Strategy 2: Test DB logic like any other code

Modern databases are a lot more than just containers for data. They also embody complex business logic in the form of constraints, triggers, stored procedures, and functions.

To make sure that these database objects work as expected, and keep doing so after changes are made, it's important to test them like any other code.

Atlas provides a testing framework that allows developers to write tests with a simple syntax. For example, to test a function that returns true if the input is positive and false otherwise, a developer might write the following test:

schema.test.hcl
test "schema" "positive_func" {
parallel = true
assert {
sql = "SELECT positive(1)"
}
log {
message = "First assertion passed"
}
assert {
sql = <<SQL
SELECT NOT positive(0);
SELECT NOT positive(-1);
SQL
}
log {
message = "Second assertion passed"
}
}

By treating database objects as code and writing tests for them, developers can catch make sure that their database code works reliably and consistently and prevent regressions when making changes.

Strategy 3: Test data migrations

Most commonly, migrations deal with schema changes, such as adding or removing columns, creating tables, or altering constraints. However, as your application evolves, you may need to add or refactor data within the database, which is where data migrations come in. For instance, you may need to seed data in a table, backfill data for existing records in new columns, or somehow transform existing data to accommodate changes in your application.

Data migrations can be especially tricky to get right, and mistakes can be problematic and irreversible. For this reason testing data migrations is crucial. Testing data migrations typically involves the following steps:

  1. Setting up an empty database.
  2. Applying migrations up to the one before the test.
  3. Seeding test data.
  4. Running the migration under test.
  5. Making assertions to verify the results.

This process can be cumbersome to set up and buggy as it often involves writing an ad-hoc program to automate the steps mentioned above or manually testing the migration.

Atlas's migrate test command simplifies this by allowing you to define test cases in a concise syntax and acts as a harness to run these tests during local development and in CI.

Using Atlas, developers can write tests for data migrations in a simple format, making it easier to catch issues early and ensure that data migrations work as expected. For example, to test a data migration that backfills a new column with values from an existing column, a developer might write the following test:

migrate.test.hcl
test "migrate" "check_latest_post" {
migrate {
to = "20240807192632"
}
exec {
sql = <<-SQL
INSERT INTO users (id, email) VALUES (1, 'user1@example.com'), (2, 'user2@example.com');
INSERT INTO posts (id, title, created_at, user_id) VALUES (1, 'My First Post', '2024-01-23 00:51:54', 1), (2, 'Another Interesting Post', '2024-02-24 02:14:09', 2);
SQL
}
migrate {
to = "20240807192934"
}
exec {
sql = "select * from users"
format = table
output = <<TAB
id | email | latest_post_ts
----+-------------------+---------------------
1 | user1@example.com | 2024-01-23 00:51:54
2 | user2@example.com | 2024-02-24 02:14:09
TAB
}
log {
message = "Data migrated successfully"
}
}

In this test, the developer migrates the database to a specific version, seeds test data, runs the migration under test, and verifies the results. By automating this process, Atlas makes it easier to test data migrations and catch issues early.

Strategy 4: Automate Safety Checks

Even with the best intentions, mistakes can still happen during the development and review of migrations leading to downtime and data loss. To mitigate this risk, it's important to automate safety checks that catch common mistakes before applying the migration to the database.

Before we dive into strategies for reliable migrations, let's take a look at some common ways migrations can go wrong.

Destructive Changes

mysql> select * from dropped;

ERROR 1146 (42S02): Table 'default.dropped' doesn't exist

Migrations often involve DROP DDL statements that can lead to data loss if executed against a table or column that still contains valuable data. Unfortunately, modern databases do not have a built-in undo button, so once a destructive change is applied, it can be challenging (if not impossible) to recover the lost data.

This might sound like an obvious mistake to avoid, but it's surprisingly common in practice. For example, consider this published incident report from Resend, which states:

On February 21st, 2024, Resend experienced an outage that affected all users due to a database migration that went wrong. This prevented users from using the API (including sending emails) and accessing the dashboard from 05:01 to 17:05 UTC (about 12 hours).

The database migration accidentally deleted data from production servers. We immediately began the restoration process from a backup, which completed 6 hours later.

Constraint Violations

mysql> alter table candy ADD UNIQUE (name);

ERROR 1062 (23000): Duplicate entry 'm&m' for key 'candy.name'

Migrations that involve adding or modifying constraints can fail if the existing data does not meet the new constraints. For example, adding a NOT NULL constraint to a column that already contains NULL values will cause the migration to fail.

What makes this even more confusing is that such migrations will often succeed in a development or testing environment where the data is different from production. This can lead to a false sense of confidence that the migration is safe to apply in production.

Breaking Changes

mysql> select id, renamed_column from candy limit 1;

ERROR 1054 (42S22): Unknown column 'renamed_column' in 'field list'

A popular topic amongst data teams today is "data contracts" - the (sometimes implicit) agreement between the applications and their downstream consumers about the structure and semantics of the data. This is often mentioned in the context of data engineering teams building data pipelines, but the same concerns apply to the interface between your application backend and the database.

When a migration changes the structure of the database, it can break the contract between the application and the database, leading to runtime errors and potentially to data corruption. For example, renaming a column that is used by the application will cause queries to fail, leading to downtime and unhappy users.

Table locks

mysql> INSERT INTO candy (name, kind) VALUES ('kif-kef', 'chocolate');

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Migrations that involve large tables or complex operations can cause table locks that prevent other queries from executing. This can lead to timeouts, deadlocks, and other performance issues that affect the availability of the application.

For instance, suppose your MySQL table has an enum column with a million rows, and you want to add a new value to the enum. If you add the new value, not at the end of the enum, but in the middle (for example from ('a', 'b', 'c') to ('a', 'b', 'd', 'c')), MySQL will lock the table for the duration of the migration, while it is rewriting the column on disk.

Preventing Risky Migrations

To prevent these common mistakes, Atlas provides a set of safety checks that run automatically before applying a migration. These checks analyze the migration and the database schema to identify potential issues and warn the developer before applying the migration.

Such checks can be run either locally during development or in CI before approving the migration and merging it into the main branch. By catching issues early, these safety checks help prevent downtime and data loss caused by risky migrations.

For instance, Atlas might warn the developer about a potentially destructive change like dropping a table:

atlas migrate lint --env local --latest 1

Might produce the following output:

Analyzing changes from version 20240929125035 to 20240929125127 (1 migration in total):

-- analyzing version 20240929125127
-- destructive changes detected:
-- L2: Dropping non-virtual column "email"
https://atlasgo.io/lint/analyzers#DS103
-- suggested fix:
-> Add a pre-migration check to ensure column "email" is NULL before dropping it
-- ok (225.916µs)

-------------------------
-- 98.996916ms
-- 1 version with warnings
-- 1 schema change
-- 1 diagnostic

Being able to identify and fix these issues while working locally can save a lot of time and headaches down the road, but this is further amplified when running these checks in CI, where teams can ensure that no risky migrations are merged into the main branch.

Strategy 5: Pre-migration checks

As we mentioned above, safety checks can catch common mistakes before applying a migration, but they are not foolproof. Some changes depend on the state of the data in the database, which cannot be determined statically by analyzing the migration script.

For example, consider a migration that adds a NOT NULL constraint to a column that already contains NULL values. A safety check can warn the developer about this potential issue, but it cannot guarantee that the migration will succeed in all cases. Similarly, a migration that drops a column might be safe and reversible if the column is empty, but risky if it contains valuable data.

To handle these cases, Atlas provides a mechanism for defining pre-migration checks that run before applying the migration. These checks can analyze the state of the database and the data to determine if the migration is safe to apply. In case of an issue, the check can prevent the migration from running and provide guidance on how to resolve the issue.

For instance, the code below defines a pre-migration check that ensures the table users is empty before dropping it:

20240201131900_drop_users.sql
-- atlas:txtar

-- checks.sql --
-- The assertion below must be evaluated to true. Hence, table users must be empty.
SELECT NOT EXISTS(SELECT * FROM users);

-- migration.sql --
-- The statement below will be executed only if the assertion above is evaluated to true.
DROP TABLE users;

Summary

Database schema migrations are a critical part of the software development lifecycle, but they can also be a significant source of risk and downtime if not handled carefully. By following the strategies outlined in this post and using tools like Atlas, teams can make their database schema migrations more reliable and reduce the risk of downtime and data loss.

To summarize, here are the five strategies for reliable schema migrations:

StrategyImpact
Schema-as-codeAutomated migration planning
Test schema logicPrevent regressions in DB logic
Test data migrationsPrevent data corruption
Automated quality checksCatch risky changes in Dev/CI
Pre-migration checksVerify data dependent changes

We hope you enjoyed this post and found it useful. As always, we welcome your feedback and suggestions on our Discord server.

What's missing in EF Core Migrations? Announcing Atlas v0.26

· 6 min read
Rotem Tamir
Building Atlas

Hi everyone,

It's been about a month since our last release, and we're excited to announce that Atlas v0.26 is now available! In this release we are happy to introduce a new feature that has been requested by many of you: support for Entity Framework Core. As part of our ever going effort to improve the quality and coverage of our documentation, we have published a set of guides on testing database schemas and migrations as well as a new GORM Portal.

Additionally, we have published an official "Supported Version Policy" and made some changes to our EULA, described below.

To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

What's missing in EF Core Migrations?

EF Core is the most popular ORM used in the .NET community, supported by Microsoft. EF Core allows users to manage their database schemas using its migrations. EF Core's migrations have long been a popular and reliable choice for managing database schema changes in the C# ecosystem.

However, EF Core migrations have lacks some capabilities can make them difficult to work with:

  1. Support for advanced database features. Like many ORMs, EF Core is designed to be database-agnostic, which means it does not support all the features of every database it can connect to. This can make it difficult to use and manage database features such as triggers, stored procedures, Row-level security and custom data types.
  2. Testing migrations. Migrations are typically considered the most risky part of a deployment. Therefore, automatically verifying they are safe and correct is paramount. Like most ORMs, EF Core does not provide a way to automatically test migrations.
  3. Production Grade Declarative Flow. EF Core supports a very basic declarative flow name EnsureCreated that can be used to create the database without specifying migrations. However, as the documentation warns, this method should not be used in production. For teams that want to adapt a "Terraform-for-databases" approach, this can be a blocker.
  4. Integration with modern CI/CD pipelines. EF Core migrations are typically run using the dotnet ef command line tool. Migrations should be integrated into the software delivery pipeline to ensure that the database schema is always in sync with the application code. This can be difficult to achieve with EF Core migrations.

Atlas and EF Core Migrations

Atlas is a database schema as code tool that allows developers to inspect, plan, test, and execute schema changes to their database. Atlas can be used to replace EF Core migrations with a more modern DevOps approach.

Comparing Atlas to EF Core migrations:

  • Loading Core Models. Similarly to EF Core migrations, Atlas can load the schema of an EF Core project. EF Core users can keep using the EF Core models as the source of truth for their database schema. Using the Atlas EF Core Provider, Atlas can load the schema of an EF Core project and use it as the source of truth for the database schema.

  • Composing schemas. Atlas can compose schemas from multiple sources, including EF Core models, SQL files, and external schema datasources. This enables users to natively declare schemas that layer advanced database features (such as views, triggers) as part of the schema source of truth which is not possible with EF Core.

  • Automatic planning. Similarly to EF Core migrations, with its "versioned migrations" workflow, Atlas can automatically plan schema migrations by diffing the data model with the migration directory.

  • Declarative flow. Atlas supports a declarative flow that can be used to create the database schema from scratch without using migrations. This is useful for teams that want to adapt a "Terraform-for-databases" approach.

  • Testing migrations. Atlas can automatically lint and test migrations to ensure they are safe and correct. Using this capability teams can reduce the risk of deploying migrations to production.

  • Integration with CI/CD pipelines. Atlas can be integrated into modern CI/CD pipelines using native integrations with popular CI/CD tools like GitHub Actions, CircleCI, GitLab CI, Terraform, Kubernetes, ArgoCD, and more.

Getting Started with Atlas + EF Core

As part of this version, we are happy to release the Atlas EF Core Provider.

To get started with Atlas and EF Core, head over to the Official Guide.

New GORM Portal

Over the past year, we have seen a significant increase in the number of users using GORM with Atlas. To better support GORM users in the Atlas ecosystem, we have launched a new documentation portal that provides guides, tutorials, and examples for using GORM with Atlas.

You can find the new GORM portal here.

Testing Database Schemas and Migrations

Since starting Atlas, it has been Ariel and my belief that accurate and thorough documentation is essential for the success of any software project targeted at developers. To that end, our team has been working hard to improve the quality and coverage of our documentation covering one of the most important aspects of working with Database Schema-as-Code: Testing.

As part of this set of guides you can find:

Supported Version Policy

To ensure the best performance, security and compatibility, the Atlas team will only support the three most recent minor versions of the CLI. For example, if the latest version is v0.26, the supported versions will be v0.25 and v0.24 (in addition to any patch releases and the "canary" release which is built twice a day).

As part of our this policy, binaries for versions that were published more than 6 months ago will be removed from the CDN and Docker Hub.

EULA Changes

The standard Atlas binary is provided under the Atlas EULA. We have recently made some changes to the EULA to reflect new data privacy considerations described in our CLI Data Privacy document. As part of these changes Atlas may collect anonymous telemetry (aggregated, anonymized, non-personal) data to help us improve the product. If you wish to opt-out of telemetry, you may set the ATLAS_NO_ANON_TELEMETRY environment variable to true.

The updated EULA can be found here.

Wrapping Up

That's all for this release! We hope you try out (and enjoy) all of these new features and find them useful. Stay tuned for our next release which is going to include some exciting new features around declarative flows and database schema observability.

As always, we would love to hear your feedback and suggestions on our Discord server.

Why Your Team Needs a Database Schema-as-Code Tool

· 11 min read
Rotem Tamir
Building Atlas

The Evolution of Database Schema Management

In today's world, where software is developed and deployed at a rapid pace, selecting the right tools for your team is crucial. The right tools can help you move faster, be more productive, and reduce the risk of errors and bugs. Most backend applications are backed by a database, and maintaining the database schema is a critical part of the development process.

Our industry has seen many trends in the way database schemas evolve over the years. In the early days, DBAs would manually run SQL DDL commands directly against the database through a terminal. Then came GUI based, point-and-click database management tools that included schema editing capabilities.

In the early 2000s, we saw the rise of "database migration tools" that allowed developers to write versioned database upgrade scripts (usually in SQL), called "migrations", that would be executed by the tool itself. This trend was further intensified with the rise of DevOps and CI/CD, as it became the de-facto standard way for automating database schema management.

The Rise of Database Schema-as-Code

In recent years, we have witnessed a seismic shift in the way cloud infrastructure is managed. The rise of tools such as Terraform and Kubernetes has led to the concept of "Infrastructure-as-Code". As the complexity of infrastructure grew, it became clear that the only way to manage it effectively was to use a declarative approach and treat it "as code".

Infrastructure-as-code, in this context, means that the desired state of the infrastructure is defined in a formal language ("as code") and stored in a version control system (such as Git). The infrastructure is then managed by a tool that compares the desired state to the actual state and makes the necessary changes to bring the actual state to the desired state.

This approach has many advantages over the traditional imperative approach of manually running commands against the infrastructure. It allows for better collaboration between team members, it is more robust and less susceptible to errors, it is self documenting, and allows for better visibility, governance and control into the state of the infrastructure.

Following this trend, some projects such as Atlas and Skeema have started to apply the same principles to database schema management. Instead of using an imperative approach, where developers do the planning and tools are only used to execute the plan and keep track of what has been done, they bring to the table a simpler approach. This approach, which is gaining the name "Database Schema-as-Code", applies a declarative approach to database schema management: developers provide the desired state of the database and the tool automatically plans the required migrations.

A practical example

Let's use Atlas to show a quick example of how database schema-as-code tools work.

Atlas allows users to represent their database schema in plain SQL or in HCL. The schema is saved in a schema file, which holds the desired state of our database.

We will start with a simple schema that represents a users table, in which each user has an ID and a name:

schema.hcl
table "users" {
schema = schema.example
column "id" {
null = false
type = int
}
column "name" {
null = true
type = varchar(100)
}
primary_key {
columns = [column.id]
}
}

Now, let's apply this schema to an empty MySQL database:

atlas schema apply \
-u "mysql://root:pass@localhost:3306/example" \
--to file://schema.hcl \
--dev-url "docker://mysql/8/example"

Atlas will compare the desired state from our schema file to the database URL provided, and plan the migration:

-- Planned Changes:
-- Create "users" table
CREATE TABLE `users` (
`id` int NOT NULL,
`name` varchar(100) NULL,
PRIMARY KEY (`id`)
) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
? Are you sure?:
▸ Apply
Lint and edit
Abort

After applying this to our database, we will run the command once more and see that Atlas recognizes there is no drift between the two states:

Schema is synced, no changes to be made

Let's make a simple change to our schema by adding an email column:

schema.hcl
table "users" {
schema = schema.example
column "id" {
null = false
type = int
}
column "name" {
null = true
type = varchar(100)
}
column "email" {
null = true
type = varchar(255)
}
primary_key {
columns = [column.id]
}
}

Now let's run the schema apply command again, and view the migration plan that Atlas will create for us:

-- Planned Changes:
-- Modify "users" table
ALTER TABLE `users` ADD COLUMN `email` varchar(255) NULL;
? Are you sure?:
▸ Apply
Lint and edit
Abort

To conclude, in this example, we got a glimpse of how a tool like Atlas works with a database schema using the declarative approach.

Do you need a Database Schema-as-Code tool?

If you are using an older more traditional database migration tool or not using one at all, you might be wondering if you should consider switching to a Database Schema-as-Code tool. Here are five signs that you might need to consider it:

1. Your team is making changes to the database schema directly in production

The worst kind of tool for managing database schemas is no tool at all. Running migrations manually against production databases should be avoided at all costs. First, humans are notoriously prone to errors, and the risk of making a mistake that will cause downtime or data loss is very high. Second, any changes made directly in production are not tracked anywhere, so it is very hard to know what has been done and by whom. Third, it means that your deployments cannot be fully automated and you are missing out on the benefits of CI/CD.

Using a Database Schema-as-Code tool as the exclusive way of making changes to the database schema is a great way to avoid these problems. It allows you to automate your deployments, track all changes to the database schema in source control, and collaborate more effectively with your team.

2. You don't have a single source of truth for your database schema

Having a single source of truth for how your application is deployed is a key principle of DevOps. It allows you to have a clear picture of the state of your application at any given time, and it allows you to automate your deployments and avoid human errors.

By requiring all migrations to be checked in to source control, traditional migration tools can go a long way to ensure that the database schema is in a consistent state, but they suffer from two issues in this regard:

  1. They only keep track of the changes to the database ("migrations"). To figure out the actual schema of the database at any given version, you need to apply all the migrations up to that version to an empty database and inspect the result with an additional tool. Database Schema-as-Code tools, on the other hand, naturally document the schema of the database in any given version.

  2. They have no way to ensure that the actual database's schema is consistent with the migrations. If someone makes a change directly in the database, there is no way to know about it. When you apply your schema to a target database using a Database Schema-as-Code tool, it will compare the desired state to the actual state and make the necessary changes to bring the actual state to the desired state.

    This ensures that over time your actual databases will not drift from the desired state.

3. You don't have any database experts on your team

In the past, managing database schemas was under the responsibility of DBAs. They were the ones who were responsible for making sure that the database schema is consistent and that all changes are done safely.

In recent years, as cloud-native, microservices-based architectures have become more popular, the amount of databases that each team manages has grown significantly. This has led to a situation where many teams don't have a DBA , and the responsibility for managing the database schema falls on the shoulders of the developers.

If you maintain a small schema with a few tables, you might not feel the need for a tool to help you manage it. But as schemas grow in size and complexity, to contain thousands of database objects (tables, views, functions, stored procedures, etc.), it becomes very hard to manage them manually. A Database Schema-as-Code tool can help you manage your schema more effectively, even if you don't have any DBAs on your team.

4. You manually verify the safety of your database migrations

If your application is small and the stakes for making a mistake are low, you might be able to get away with messing up a schema change now and then. But if your workload is mission-critical, if application downtime or data loss can have a significant impact on your business, you need to make sure that your migrations are safe.

Many teams have been through the unpleasant experience of a failed migration that caused a significant outage. This is usually followed by a change in review processes where migrations are reviewed by multiple team members, and a manual verification process is added to make sure that the migration is safe to run.

This process is time-consuming and error-prone. Even worse, it turns team members with more database experience into bottlenecks, slowing down the development process and team velocity.

Database Schema-as-Code tools can help you automate this process and make it more robust. Atlas, for example, comes with built-in support for Migration Linting, Diff Policies, and other safety mechanisms that help you catch errors and issues before they lead to a full-blown production outage.

5. Your app and infra delivery are way better than your database deployments

When interviewing platform engineers about how their deployment infrastructure works, we keep hearing the same story: "Our deployments are 100% pure Infrastructure-as-Code GitOps goodness, we can do them 50 times a day!" And when we ask about databases… "Oh yeah, there’s that part... It’s not as great."

A lot has changed in the way applications are deployed since the early days of database migrations. The versioned migrations paradigm was a great improvement over the previous manual approach, but it is still very much an imperative approach. Modern application delivery principles such as GitOps and Infrastructure-as-Code work in a declarative way, and so they need a database schema management tool that works in a similar way.

If your application and infra delivery feel like a well-oiled machine, but your database deployments are still semi-manual, or feel clunky and error-prone, you should consider switching to a Database Schema-as-Code tool where you can manage your database schema in a Kubernetes Operator or a Terraform Provider.

Enter: Atlas

Atlas is a Database Schema-as-Code tool that allows you to manage your database schema in a declarative way. With Atlas, developers provide the desired state of the database schema and Atlas automatically plans the required migrations. To get started with Atlas, you can follow our quickstart guide..

Wrapping up

In this article, we have discussed the evolution of database schema management tools and the rise of Database Schema-as-Code tools. We have also discussed five signs that you might need to consider switching to a Database Schema-as-Code tool. I hope you found this article useful.

As always, we would love to hear your feedback and suggestions on the Atlas Discord server.

GitOps for Databases, Part 2: Atlas Operator and ArgoCD

· 7 min read
Rotem Tamir
Building Atlas
info

This is the second post in a two-part tutorial, which demonstrates how to use the Atlas Operator in tandem with Atlas Cloud and ArgoCD to create a slick, modern GitOps workflow for managing your database migrations natively in Kubernetes.

In part one, we demonstrated how to initialize an Atlas project, and create a CI/CD pipeline that automatically plans, verifies and stores your database migrations in Atlas Cloud using GitHub Actions.

In this part, we will show how to deploy these migrations using the Atlas Operator and ArgoCD to demonstrate a complete GitOps workflow for database migrations.

How to GitOps your Database Migrations on Kubernetes

"We can wrap existing schema management solutions into containers, and run them in Kubernetes as Jobs. But that is SILLY. That is not how we work in Kubernetes."

-Viktor Farcic, DevOps ToolKit

As applications evolve, so do their database schemas. The practice of automating the deployment of database schema changes has evolved hand in hand with modern devops principles into what is known as database migrations. As part of this evolution, hundreds of "migration tools" have been created to help developers manage their database migrations. These tools range from ORM and language specific tools like Alembic for Python, to language agnostic tools like Flyway and Liquibase.

When Kubernetes came along and teams started to containerize their applications, the knee-jerk reaction was to wrap these legacy tools in a container and run them as part of the application deployment process. We discussed some of the shortcomings of this approach in a recent KubeCon talk and earlier Webinar.

Atlas was created from the ground up to be a modern database migration tool that embodies modern DevOps principles and is designed to run natively in Kubernetes. The Atlas Operator enables teams to extend the native Kubernetes API with new resource types that represent database schemas and migrations. By using these capabilities it is possible to natively integrate database migrations into your GitOps workflow.

Prerequisites

  • A running Kubernetes cluster - for learning purposes, you can use Minikube, which is a tool that runs a single-node Kubernetes cluster on your laptop.
  • kubectl - a command-line tool for interacting with Kubernetes clusters.
  • Helm - a package manager for Kubernetes.

Setting up the Atlas Operator and ArgoCD

1. Install ArgoCD

To install ArgoCD run the following commands:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Wait until all the pods in the argocd namespace are running:

kubectl wait --for=condition=ready pod --all -n argocd

kubectl will print something like this:

pod/argocd-application-controller-0 condition met
pod/argocd-applicationset-controller-69dbc8585c-6qbwr condition met
pod/argocd-dex-server-59f89468dc-xl7rg condition met
pod/argocd-notifications-controller-55565589db-gnjdh condition met
pod/argocd-redis-74cb89f466-gzk4f condition met
pod/argocd-repo-server-68444f6479-mn5gl condition met
pod/argocd-server-579f659dd5-5djb5 condition met

For more information or if you run into some errors refer to the Argo CD Documentation.

2. Install the Atlas Operator

helm install atlas-operator oci://ghcr.io/ariga/charts/atlas-operator

Helm will print something like this:

Pulled: ghcr.io/ariga/charts/atlas-operator:0.3.6
Digest: sha256:7e29c15e846fa9c25164f4ad5a7cb7f25e9ead2882082f0352985e58c1976f99
NAME: atlas-operator
LAST DEPLOYED: Mon Dec 11 10:25:11 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Wait until the atlas-operator pod is running:

kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=atlas-operator -n default

kubectl will print something like this:

pod/atlas-operator-866dfbc56d-qkkkn condition met

For more information on the installation process, refer to the Atlas Operator Documentation

Step 2: Set up the Target Database

Start by deploying a simple PostgreSQL database using the following command:

kubectl apply -f https://raw.githubusercontent.com/ariga/atlas-operator/master/config/integration/databases/postgres.yaml

This will create a Deployment which runs a single (non-persistent) PostgreSQL instance and a Service that exposes it on port 5432. In addition, it will create a Secret that contains the database credentials.

Wait until the database pod is running:

kubectl wait --for=condition=ready pod -l app=postgres -n default

Step 3: Create the AtlasMigration resource

In order for the Atlas Operator to know which migrations to apply, we need to create an AtlasMigration resource that points to the Atlas Cloud project we created in part one. Create a new directory called manifests in your GitHub repository. In it, create a file called atlas-migration.yaml with the following contents:

manifests/atlas-migration.yaml
apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasMigration
metadata:
name: migration
spec:
urlFrom:
secretKeyRef:
key: url
name: postgres-credentials
cloud:
project: "atlasdemo" # Atlas Cloud project name
tokenFrom:
secretKeyRef:
name: atlas-credentials
key: token
dir:
remote:
name: "atlasdemo" # Migration directory name in your atlas cloud project
tag: "1d579be616db48803bb21713fd836a9165030f18" # See below on how to obtain this value for your project.

This resource tells the Atlas Operator to apply the migrations in the atlasdemo project in Atlas Cloud to the database specified in the postgres-credentials secret. Notice that the tokenFrom field references a secret called atlas-credentials. This secret will contain the Atlas Cloud API token that we created in part one.

To create it run:

kubectl create secret generic atlas-credentials --from-literal=token=aci_<replace with your token>
Obtaining the tag field

Notice the tag field in the dir section. This field tells the Atlas Operator which version of the migrations to apply. In this case, we are telling it to apply the migrations tagged with the commit hash 1d579be616db48803bb21713fd836a9165030f18 which is the commit hash of the merge commit that merged the pull request we created in part one.

To review which tags are available for your migrations, head over to you Atlas Cloud project and click on the Tags tab. You should see something like this:

Commit and push the changes to your GitHub repository.

Step 4: Create the ArgoCD Application

Now that we have created the AtlasMigration resource, we can create an ArgoCD application that will deploy it. Create a file called Application.yaml in the root of your GitHub repository with the following contents:

Application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: atlas-argocd-demo
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
source:
path: manifests
repoURL: 'https://github.com/<your gh user>/<your repo name>'
targetRevision: master
destination:
namespace: default
server: 'https://kubernetes.default.svc'
project: default
syncPolicy:
automated:
prune: true
selfHeal: true
retry:
limit: 5
backoff:
duration: 5s
maxDuration: 3m0s
factor: 2
syncOptions:
- CreateNamespace=true

Be sure to replace the repoURL field with the URL of your GitHub repository.

info

If your repository is private, you will need to create a GitHub Personal Access Token and tell ArgoCD about it by running the following command:

export CURRENT_NS=$(kubectl config view --minify --output 'jsonpath={..namespace}')
kubectl config set-context --current --namespace=argocd
argocd repo add https://github.com/<user>/<repo> --username <user> --password ghp_<your token>
kubectl config set-context --current --namespace=$CURRENT_NS

5. Step 5: Deploy!

Next, apply the application manifest:

kubectl apply -f Application.yaml

Wait until the application is deployed:

kubectl wait --for=condition=ready atlasmigration/migration

Observe the status of the migration object:

 kubectl get atlasmigration/migration -o jsonpath='{.status}' | jq

The output will look similar to:

{
"conditions": [
{
"lastTransitionTime": "2023-12-11T08:38:35Z",
"message": "",
"reason": "Applied",
"status": "True",
"type": "Ready"
}
],
"lastApplied": 1702283914,
"lastAppliedVersion": "20231206075118",
"observed_hash": "6e4feac15a35d20c38e705428de507835c7c58d487eacc84ed012a17b002981d"
}

You can also observe the status of the migration using the Atlas Cloud UI:

Wrapping Up

Let's review the flow that we have created, from end to end:

  • Developers modify the desired state of their schema and use atlas migrate diff locally to generate a migration plan.
  • Developers commit the migration plan to their GitHub repository and create a pull request.
  • GitHub Actions runs the Atlas Continuous Integration workflow, which verifies the migration plan is correct and safe.
  • Once the pull request is merged, a GitHub Actions workflow pushes the new migration to Atlas Cloud. It is tagged with the commit hash of the merge commit.
  • When we are ready to deploy our changes to production, we change the value of the tag field in the AtlasMigration resource to the most recent tag. We push this change to our GitHub repository.
  • ArgoCD detects the change and updates our AtlasMigration resource.
  • The Atlas Operator detects the change and applies the migrations to the database.
  • The database is now up to date with the desired state of our schema!

To summarize, in this tutorial we demonstrated how to use the Atlas Operator and ArgoCD to create a slick, modern GitOps workflow for managing your database migrations natively in Kubernetes.

As always, we would love to hear your feedback and suggestions on our Discord server.

GitOps for Databases, Part 1: CI/CD

· 10 min read
Rotem Tamir
Building Atlas
info

This is the first post in a two-part tutorial, which demonstrates how to use the Atlas Operator in tandem with Atlas Cloud and ArgoCD to create a slick, modern GitOps workflow for managing your database migrations natively in Kubernetes.

GitOps is a software development and deployment methodology that uses Git as the central repository for both code and infrastructure configuration, enabling automated and auditable deployments.

ArgoCD is a Kubernetes-native continuous delivery tool that implements GitOps principles. It uses a declarative approach to deploy applications to Kubernetes, ensuring that the desired state of the application is always maintained.

Kubernetes Operators are software extensions to Kubernetes that enable the automation and management of complex, application-specific, operational tasks with domain-specific knowledge within a Kubernetes cluster.

In this tutorial, we will use the Atlas Operator in tandem with Atlas Cloud and ArgoCD to create a slick, modern GitOps workflow for managing your database migrations natively in Kubernetes.

For the sake of brevity, we are going to split this guide into two parts:

  1. In part one, we will show how to initialize an Atlas project, and create a CI/CD pipeline that will automatically plan, verify and store your database migrations in Atlas Cloud using GitHub Actions.
  2. In part two, we will show how to deploy these migrations using the Atlas Operator and ArgoCD to demonstrate a complete GitOps workflow for database migrations.

Modern CI/CD for Database Migrations

Atlas was built to support a modern CI/CD workflow for database migrations based on the following principles:

  1. Changes to the database are planned automatically. Given the desired state of the database, the system should automatically generate a plan for how to get from the current state to the desired state.
  2. Changes to the database schema are stored in a versioned migration directory. All planned changes to the database are checked in to a versioned migration directory. This directory contains SQL scripts, which are executed in lexicographic order to apply the changes to the database.
  3. Changes to the database are validated during CI. All changes to the database are tested and evaluated against a set of governing policies.
  4. Changes to the database are deployed via automation. No manual steps are required to deploy changes to the database. All changes are deployed via a CI/CD pipeline.

To learn more about these principles, check out our guide to modern CI/CD for database migrations.

In this tutorial, we will show to apply the fourth principle to your database migrations using the Atlas Operator and ArgoCD.

Local Setup

With the principles of modern CI/CD for database migrations in mind, let's see how we can apply them to a simple application that uses a PostgreSQL database.

Prerequisites (for part one)

  1. A GitHub Account - we are going to be setting up some GitHub Actions workflows, so you will need a GitHub account.

  2. The most recent version of Atlas. To get Atlas on Linux or macOS run:

    curl -sSf https://atlasgo.sh | sh

For more installation options, see the docs 3. Docker. To install Docker, follow the instructions here.

  1. The GitHub CLI, gh. To install gh:
    brew install gh

Follow instructions for other platforms here.

Step 1: Define our desired state

Atlas advocates for the declarative approach in which users start their work by defining the desired state of their database and let the system figure out the implementation details. Atlas supports many different ways to define the desired state of your database, called "schema loaders". In this tutorial, we will use a simple SQL file to define our desired state.

In a fresh Git repo, create a file named schema.sql with the following contents:

schema.sql
create table users (
id int primary key,
name varchar(255) not null unique
);

In this project, whenever we want to change the database schema, we will update this file to reflect the desired state of the database.

Step 2: Plan the initial migration

Now that we have defined our desired state, we can use the Atlas CLI to plan the initial migration. Create the following file named atlas.hcl:

env "local" {
src = "file://schema.sql"
dev = "docker://postgres/15/dev"
migration {
dir = "file://migrations"
}
format {
migrate {
diff = "{{ sql . \" \" }}"
}
}
}

Next, run the following command to plan the initial migration:

atlas migrate diff --env local

Observe two new files that were created in the migrations directory:

.
├── atlas.hcl
├── migrations
│ ├── 20231204121249.sql
│ └── atlas.sum
└── schema.sql

Step 3: Push our migration directory to Atlas Cloud

Atlas Cloud is a hosted service that can serve as a central repository for your database migrations. Similarly to how DockerHub is used to store and distribute Docker images, Atlas Cloud can be used to store and distribute database migration directories. Atlas Cloud has a free tier that is suitable for small teams and personal projects which you can use to follow along with this tutorial.

Log in to Atlas Cloud using the following command:

atlas login

If you do not have an existing Atlas Cloud account, you will be prompted to create one.

Next, push your migration directory to Atlas Cloud using the following command:

atlas migrate push --env local atlasdemo

This will create a new project named atlasdemo on Atlas Cloud and push your migration directory to it. Atlas will print a URL to the project page on Atlas Cloud similar to the following:

https://rotemtam85.atlasgo.cloud/dirs/4294967359

Setup GitHub Actions

In this section, we will set up a GitHub Actions workflow that will add Atlas to your CI/CD pipeline.

Create a Bot Token

To write data to your Atlas Cloud account, you will need to supply your CI/CD pipelines with an API key that has write access to your Atlas Cloud account. To learn how to create a bot token, check out our guide on the topic. Use the instructions in this guide to create a token, and make a note of it. We will use it in the next steps.

Install the Atlas Extension

To streamline this process, we have created a gh command that will create the workflow for you. To install the latest version, run:

gh extension install ariga/gh-atlas

Ensure your gh CLI has sufficient permissions

Make sure you have the necessary permissions to configure your action:

gh auth refresh -s write:packages,workflow

Create a GitHub Actions Workflow

Once installed, let's use this extension to generate our GitHub Actions workflow. Run the following command:

gh atlas init-action --token <your-bot-token>  --dir-name="atlasdemo" --driver=postgres

Atlas will scan your repository (locally) for directories containing Atlas migrations and ask you which one you would like to use for CI. Select the desired directory and press "Enter":

Use the arrow keys to navigate: ↓ ↑ → ←
? choose migration directory:
▸ migrations

Atlas will then ask you which database driver this directory contains migrations for. Select the desired driver and press "Enter".

Next, the GitHub extension will save your bot token to a GitHub secret and create a pull request with the necessary configuration for the GitHub Action.

Screenshot Example

The PR contains a GitHub Actions workflow similar to this:

Code Example
name: Atlas
on:
push:
branches:
- master
paths:
- .github/workflows/ci-atlas.yaml
- 'migrations/*'
pull_request:
paths:
- 'migrations/*'
# Permissions to write comments on the pull request.
permissions:
contents: read
pull-requests: write
jobs:
atlas:
services:
# Spin up a postgres:15 container to be used as the dev-database for analysis.
postgres:
image: postgres:15
env:
POSTGRES_DB: dev
POSTGRES_PASSWORD: pass
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-start-period 10s
--health-timeout 5s
--health-retries 5
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- uses: ariga/setup-atlas@v0
with:
cloud-token: ${{ secrets.ATLAS_CLOUD_TOKEN_K6MJMK }}
- uses: ariga/atlas-action/migrate/lint@v1
with:
dir: 'file://migrations'
dir-name: 'atlasdemo'
dev-url: 'postgres://postgres:pass@localhost:5432/dev?search_path=public&sslmode=disable'
env:
GITHUB_TOKEN: ${{ github.token }}
- uses: ariga/atlas-action/migrate/push@v1
if: github.ref == 'refs/heads/master'
with:
dir: 'file://migrations'
dir-name: 'atlasdemo'
dev-url: 'postgres://postgres:pass@localhost:5432/dev?search_path=public&sslmode=disable'

After reviewing the changes, merge the pull request to enable the GitHub Action.

Testing our Pipeline

Now that we have set everything up, let's test our pipeline end-to-end. To do so, we will first plan a new change to our database schema.

Edit the Desired Database Schema

Edit the schema.sql file to add a new column to the users table:

schema.sql
create table users (
id int primary key,
name varchar(255) not null unique,
email varchar(255) not null unique
);

Generate a New Migration

Next, run the following command to automatically generate a new migration:

atlas migrate diff --env local add_email_column

This will create a new file in the migrations directory:

.
├── atlas.hcl
├── migrations
│ ├── 20231204121249.sql
│ ├── 20231206075118_add_email_column.sql
│ └── atlas.sum
└── schema.sql

Create a new Pull Request

Next, create a branch and push the changes to GitHub:

git checkout -b add-email-column
git add .
git commit -m "Add email column"
git push --set-upstream origin add-email

Next, use the gh CLI to create a new pull request:

gh pr create --title "migrations: add email column" --body "adding email column to users table"

Atlas Reviews the Pull Request

Based on the configuration in the GitHub Actions workflow we created, Atlas will automatically review your migration directory whenever a new pull request that affects it is opened. When Atlas is done running, it will comment on your PR with the results of the review:

Interesting! Atlas found some issues with our migration. Let's click on the report to see what they are:

Atlas warns us about two issues. The first is that adding a non-nullable varchar column "email" will fail in case the "users" table is not empty. The second is that creating an index non-concurrently causes write locks on the "users" table. Since we are in the early stages of development, we can safely ignore these issues for now. Let's merge the pull request and see what happens.

gh pr merge --squash

Atlas Pushes the Migrations to Atlas Cloud

Once GitHub Actions detects that a new push to the master branch has been merged, per our configuration, it will run the atlas migrate push command to push the migrations to Atlas Cloud. Once the push is complete, our schema will be updated in the Atlas Cloud schema viewer screen:

Wrapping Up Part One

That's it for part one! In this tutorial, we have shown how to use Atlas Cloud and GitHub Actions to create a slick, modern CI/CD pipeline for your database migrations. In part two, we will show how to deploy these migrations using the Atlas Operator and ArgoCD to demonstrate a complete GitOps workflow for database migrations.

As always, we would love to hear your feedback and suggestions on our Discord server.

Automatic migrations for Microsoft SQL Server

· 6 min read
Rotem Tamir
Building Atlas
TL;DR

In this guide we will demonstrate how to use Atlas to automatically generate migrations for Microsoft SQL Server databases.

To skip the intro and go straight to the demo, click here.

Introduction

As your application's data model evolves, you will need to make changes to your database schema. In today's world, where teams are expected to own their infrastructure and ship code faster than ever, it is important to have a reliable and repeatable process for managing database schema changes.

Atlas lets you manage your database schema as code. It is a modern schema management tool that applies concepts from modern DevOps tools to database schema management. Using Atlas, teams can automatically plan, verify, deploy and monitor database schema changes.

Microsoft SQL Server, one of the longest-standing database engines in our business, was first released by Microsoft in 1989. MS-SQL is the go-to database for Windows environments in many industries.

In this guide, we will demonstrate how to use Atlas to automatically generate migrations for your Microsoft SQL Server databases.

Setting up

  1. Start by installing the Atlas CLI, if you haven't already. On macOS and Linux simply run:

    curl -sSf https://atlasgo.sh | sh

    For other platforms, see the installation instructions.

  2. The SQL Server driver is currently available to users of Atlas Cloud Beta Program. To join the program (for free), first sign up for an Atlas Cloud account.

  3. Once your inside your Atlas account, go to the account settings by clicking your avatar. Then, select the "Microsoft SQL Server" and click the "Save" button.

  4. After you have opted-in to the Beta Program, log in to your Atlas account using the CLI:

    $ atlas login
    You are now connected to "a8m" on Atlas Cloud.

Demo time!

In this guide, we will demonstrate some of the basic capabilities of Atlas by working against a local Microsoft SQL Server database.

To spin up a local SQL Server instance using docker run:

docker run --rm -e 'ACCEPT_EULA=Y' -e 'MSSQL_SA_PASSWORD=P@ssw0rd0995' -p 1433:1433 -d mcr.microsoft.com/mssql/server:latest

Notice that by passing the ACCEPT_EULA environment variable, we are accepting the terms of Microsoft's EULA.

Managing your database schema as code

Atlas supports a workflow called declarative schema migrations. In this workflow, you first define the desired state of your database schema (in one of many supported formats and languages) and then let Atlas calculate the diff between the desired state and the actual state of your database. Atlas then generates the needed SQL commands that will bring your database to the desired state.

Let's see this in action. First, create a new file name schema.sql. This file will contain the desired state of our database in plain SQL.

schema.sql
-- Create the users table
CREATE TABLE users (
id INT PRIMARY KEY,
email NVARCHAR(255) UNIQUE,
display_name NVARCHAR(255)
);

-- Create the posts table with a custom name for the FK constraint
CREATE TABLE posts (
id INT PRIMARY KEY,
title NVARCHAR(255),
body TEXT,
author_id INT,
CONSTRAINT author_fk FOREIGN KEY (author_id) REFERENCES users(id)
);

Applying our schema

Next, let's apply this schema to our database. To do so, we will use the atlas schema apply command.

atlas schema apply -u "sqlserver://sa:P@ssw0rd0995@localhost:1433?database=master" \
--to file://schema.sql \
--dev-url "docker://sqlserver"

Atlas will connect to our target database to inspect it's current state. Next, it will use the dev-database to normalize our schema and finally, it will generate the SQL commands that will bring our database to the desired state:

-- Planned Changes:
-- Create "users" table
CREATE TABLE [users] ([id] int NOT NULL, [email] nvarchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [display_name] nvarchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, CONSTRAINT [PK_users] PRIMARY KEY CLUSTERED ([id] ASC));
-- Create index "UQ__users__AB6E61643C9DEB30" to table: "users"
CREATE UNIQUE NONCLUSTERED INDEX [UQ__users__AB6E61643C9DEB30] ON [users] ([email] ASC);
-- Create "posts" table
CREATE TABLE [posts] ([id] int NOT NULL, [title] nvarchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [body] text COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [author_id] int NULL, CONSTRAINT [PK_posts] PRIMARY KEY CLUSTERED ([id] ASC), CONSTRAINT [post_author_fk] FOREIGN KEY ([author_id]) REFERENCES [users] ([id]) ON UPDATE NO ACTION ON DELETE NO ACTION);

Atlas prompts us to approve the changes before applying them to the database:

Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure?:
▸ Apply
Abort

After applying the schema, Atlas confirms that the changes were applied:

✔ Apply

Next, let's re-run the atlas schema apply command. This time, Atlas will detect that the database is already in the desired state and will not generate any changes:

Schema is synced, no changes to be made.

Altering our schema

Now, let's make some changes to our schema. Open the schema.sql file and add a new column to the users table:

schema.sql
CREATE TABLE users (
id INT PRIMARY KEY,
email NVARCHAR(255) UNIQUE,
display_name NVARCHAR(255),
+ bio text
);

Next, let's re-run the atlas schema apply command. This time, Atlas will detect that the schema has changed and will generate the needed SQL commands to bring the database to the desired state:

-- Planned Changes:
-- Modify "users" table
ALTER TABLE [users] ADD [bio] text COLLATE SQL_Latin1_General_CP1_CI_AS NULL;

After applying the changes, Atlas confirms once again that the changes were applied:

✔ Apply

Visualizing our schema

One of the most useful features of Atlas is the ability to visualize your database schema. To do so, run the atlas schema inspect command with the -w (web) flag:

atlas schema inspect -u "sqlserver://sa:P@ssw0rd0995@localhost:1433?database=master" -w

Atlas will ask whether you would like to create your visualization publicly (in a publicly accessible URL) or privately (in your Atlas Cloud account):

Use the arrow keys to navigate: ↓ ↑ → ←
? Where would you like to share your schema visualization?:
▸ Publicly (gh.atlasgo.cloud)
Privately (rotemtam85.atlasgo.cloud)

For this demo, let's choose the public option. Atlas will create the visualization and open it in your default browser:

See it for yourself at: https://gh.atlasgo.cloud/explore/5e15289a

Wrapping up

In this guide we have demonstrated how to set up Atlas to manage your Microsoft SQL Server database schema. We have also demonstrated some of the basic capabilities of Atlas, such as declarative schema migrations, and schema visualization. These two features are just the tip of the iceberg. Atlas has many more features that can help you better manage your database! To learn more, check out the Atlas documentation.

As always, we would love to hear your feedback and suggestions on our Discord server.

Passwordless Schema Migrations on RDS with Atlas

· 6 min read
Rotem Tamir
Building Atlas

TL;DR

Atlas now supports AWS IAM authentication, which enables you to perform passwordless schema migrations on your RDS databases. To use it with Atlas, add the aws_rds_token data source to your atlas.hcl configuration file:

data "aws_rds_token" "mydb" {
endpoint = "mydb.123456789012.us-east-1.rds.amazonaws.com:3306"
username = "atlas"
}

To skip the intro and jump straight to the tutorial, click here.

Introduction

Passwords have long been the default mechanism for authentication, but they come with a set of known vulnerabilities. In recent years, our industry has shifted towards alternative authentication methods due to these weaknesses. For databases, which store critical data, this security concern is even more important.

Schema migrations require elevated permissions, making it even more essential to approach them with utmost care in order to prevent security breaches. In this post, we'll show how to use Atlas to perform passwordless schema migrations on databases hosted in AWS's RDS service.

The Problem with Passwords

Passwords are considered a weak authentication mechanism for humans logging in to systems since they can be leaked or guessed. For this reason, many services offer more robust authentication methods, such as multi-factor authentication or single sign-on.

In this post, we'll focus on the security concerns of passwords (or API Tokens) for automated systems (such as CI/CD pipelines), which are used to perform schema migrations. Such tokens pose a challenge to securing systems in a few ways:

  • Leaks. When stored in configuration files, passwords are typically in plain text, increasing the risk of leaks.
  • Granularity. When passwords are shared among multiple users, it becomes challenging to grant and revoke access for individual users based on role changes or emerging security concerns.
  • Visibility. Because passwords are usually visible to operators and are shared by multiple users, it's hard to track who performed which operation once authenticated.
  • Rotation. Because passwords tend to be long-lived, their rotation becomes a cumbersome task.

IAM Authentication

IAM, short for Identity and Access Management, is a framework that has been adopted by virtually all cloud providers for managing digital identities and their permissions. Unlike traditional password-based systems where credentials are stored and checked, IAM verifies who (or what) is making a request and then checks the permissions associated with that identity.

IAM services supply mechanisms for generating short-lived tokens based on the identity of the caller. In addition, these services provide a centralized way to manage permissions (by creating granular access policies and grouping them into roles) and auditing capabilities to track how subjects (users or services) use the system.

Configured correctly, under IAM, every subject can access exactly what it needs and nothing more, without ever having to use a password or some other token that might be leaked or stolen. When a person leaves your organization (or no longer needs access to a particular resource), you can revoke their access by updating their IAM role.

IAM authentication for Databases

Most databases in use today predate IAM and have developed their own internal mechanisms for authentication and authorization. In recent years, cloud vendors have worked to create a bridge between IAM and databases, allowing users to authenticate their identity to databases using IAM credentials. In this post, we'll focus on AWS's implementation of IAM authentication for RDS databases.

How does it work?

First, enable IAM authentication on your RDS instance. This installs a plugin on the database that allows it to authenticate users with IAM credentials instead of passwords. Read how to do this in the AWS documentation

Next, create a database user and grant it permission to authenticate using IAM.

In MySQL, execute a statement like this:

CREATE USER 'atlas' IDENTIFIED WITH AWSAuthenticationPlugin as 'RDS';

In PostgreSQL, execute a statement like this:

CREATE USER atlas; 
GRANT rds_iam TO atlas;

Finally, create an IAM policy that allows subjects to create RDS connection tokens. This policy can then be attached to roles for developers or services that need to connect to the database. Read how to do this in the AWS documentation.

IAM Authentication with Atlas

Tools that perform schema migrations such as Atlas require elevated permissions to perform their tasks. For example, they need to be able to inspect the database's information schema tables as well as create and drop resources. For this reason, any mechanism that can further protect the security of their credentials is essential, making IAM authentication a great fit. To support this use case, we have recently added support for AWS IAM authentication to Atlas.

Demo Time!

Let's see how to use Atlas to perform passwordless schema migrations on an RDS database.

For the purpose of this demo, we assume that we have a PostgreSQL database running in RDS with IAM authentication enabled. We also assume that we have a user named atlas that has been granted the rds_iam permission and that we have created an IAM policy that allows us to generate RDS tokens.

Start by creating a new file named atlas.hcl to store our project configuration and add the following content:

// Define local variables for the database endpoint and username.
locals {
endpoint = "atlas-demo.xyzxyz.us-east-1.rds.amazonaws.com:5432"
username = "atlas"
}

// Use the "aws_rds_token" data source to generate a token for the database.
data "aws_rds_token" "db" {
endpoint = local.endpoint
username = local.username
region = "us-east-1"
}

// Define an environment named "rds" that uses the generated token.
env "rds" {
url = "postgres://${local.username}:${urlescape(data.aws_rds_token.db)}@${local.endpoint}/postgres"
}

Lets break this example down:

  • The locals block defines two variables – endpoint and username – that we use to store the database endpoint and the username of the user created in the database.
  • Next, we define an aws_rds_token data source to generate a token for the database. To read more about this data source, see the documentation.
  • Finally, we define an environment named rds that uses the generated token. The url property defines the connection URL that Atlas will use to connect to the database. Notice that we use the urlescape function to escape the token before embedding it in the URL.

Now that we have our project configuration, let's use Atlas to inspect the database schema. Run the following command:

atlas schema inspect -c "file://atlas.hcl" --env rds

You should see output similar to the following:

schema "public" {
}

Amazing! This output indicates that Atlas was able to both connect to the database and inspect the schema without us having to provide it with any credentials!

Wrapping up

In this post, we discussed the security concerns around passwords and how IAM authentication can help mitigate them. We also demonstrated how to use Atlas to perform passwordless schema migrations on an RDS database using IAM authentication. If you use Atlas to perform schema migrations on RDS databases, we encourage you to give IAM authentication a try!

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Bridging the gap between IaC and Schema Management

· 11 min read
Rotem Tamir
Building Atlas

Introduction

When we started building Atlas a couple of years ago, we noticed that there was a substantial gap between what was then considered state-of-the-art in managing database schemas and the recent strides from Infrastructure-as-Code (IaC) to managing cloud infrastructure.

In this post, we review that gap and show how Atlas – along with its Terraform provider – can bridge the two domains.

As an aside, I usually try to keep blog posts practical and to the point, but occasionally think it’s worth it to zoom out and explain the grander ideas behind what we do.

If you’re looking for a quick and practical explanation of working with Atlas and Terraform, I recommend this YouTube video.

Why Infrastructure-as-Code

Infrastructure as Code (IaC) refers to the practice of managing and provisioning infrastructure through machine-readable configuration files, instead of utilizing traditional interactive configuration tools. This approach makes for automated, consistent, and repeatable deployment of environments that are faster and less error-prone than previous, more manual approaches.

Terraform, a popular open-source tool created by HashiCorp, is the most prominent implementation of the IaC concept. With Terraform, organizations can describe the desired state of their infrastructure in a simple configuration language (HCL) and let Terraform plan and apply these changes in an automated way.

Terraform (and IaC in general) has taken the software engineering world by storm in recent years. As someone who had the dubious pleasure of managing complex cloud infrastructure manually, using what is today jokingly called "ClickOps", I can mention a few properties of IaC that I believe contributed to this success:

  • Declarative – Terraform is built on a declarative workflow, which means that users only define the final (desired) state of their system. Terraform is responsible for inspecting the target environment, calculating the difference between the current and desired states, and building a plan for reconciling between those two states.

    Cloud infrastructures are becoming increasingly complex, comprising thousands of different, interconnected components. Declarative workflows greatly reduce the mental overhead of planning changes to such environments.

  • Automated – Many engineers can attest that manually provisioning a new environment used to take days, even weeks! Once Terraform generates a plan for changing environments, the process runs automatically and finishes in a matter of minutes.

  • Holistic – With Terraform, it is possible to capture all of the resources and configurations required to provision an application as one interconnected and formally defined dependency graph. Deployments become truly reproducible and automated, with no dangling or manually provisioned dependencies.

  • Self-healing – Finally, these three properties converge to support a self-healing tool that can detect and fix drift on its own. Whenever drift occurs, it is only a matter of re-running Terraform to shift from the current state back to the desired one.

Comparing IaC with Schema Management Tools

Next, let’s discuss the current state of database schema management tools (often called schema migration tools) by contrasting them with the properties of IaC.

  • Imperative – If Terraform embodies the declarative approach, then schema management tools often exemplify the opposite, imperative (or revision-based) approach. In this case, we don’t provide the tools with the what (the desired state of the database), but the how (what SQL commands need to run to migrate the database from the previous version to the next).

  • Semi-automated – Migration tools were revolutionary when they came out a decade ago. One idea stood as one of the harbingers of the GitOps philosophy: that database changes should not be applied manually but first checked into source control and then applied automatically by a tool.

    Today’s migration tools automate two aspects of schema management: 1) execution and 2) tracking which migrations were already executed on a target database.

    Compared to modern IaC tools, however, they are fairly manual. In other words, they leave the responsibility of planning and verifying the safety of changes to the user.

  • Fragmented – As we described above, one of the most pleasant aspects of adopting the IaC mindset is having a unified, holistic description of your infrastructure, to the point where you can entirely provision it from a single terraform apply command.

    For database schema management, common practices are anything but holistic. In some cases, provisioning the schema might happen 1) when application servers boot, before starting the application, or 2) while it runs as an init container on Kubernetes.

    In fact, some places (yes, even established companies) still have developers manually connect (with root credentials) to the production database to execute schema changes!

  • A pain to fix – When a migration deployment fails, many schema management tools will actually get in your way. Instead of worrying about fixing the issue at hand, you now need to worry about both your database and the way your migration tool sees it (which have now diverged).

Bridging the Gap

After describing the gap between IaC and database schema management in more detail, let’s delve into what it would take to bridge it. Our goal is to have schema management become an integral part of your day-to-day IaC pipeline so that you can enjoy all the positive properties we described above.

To integrate schema change management and IaC, we would need to solve two things:

  1. A diffing engine capable of supporting declarative migration workflows, such that an engine should be capable of:
    • Loading the desired schema of the database in some form
    • Inspecting the current schema of the database
    • Calculating a safe migration plan automatically
  2. A Terraform Provider that wraps the engine as a Terraform resource, which can then seamlessly integrate into your overall application infrastructure configuration.

How Atlas drives Declarative Migrations

Atlas is a language-agnostic tool for managing and migrating database schemas using modern DevOps principles. It is different from Terraform in many ways, but similar enough to have received the informal nickname "Terraform for Databases".

At its core lie three capabilities that make it ideal to apply a declarative workflow to schema management:

  1. Schema loaders
  2. Schema inspection
  3. Diffing and planning

Let’s discuss each of these capabilities in more detail.

Schema loaders

Every declarative workflow begins with the desired state - what we want the system to look like. Using a mechanism called "schema loaders" Atlas users can provide the desired schema in many ways. For example:

Plain SQL

Atlas users can describe the desired schema of the database using plain SQL DDL statements such as:

CREATE TABLE users (
Id int primary key,
Name varchar(255)
)

Atlas HCL

Alternatively, users can use Atlas HCL, a configuration language that shares Terraform’s configuration language foundations:

table "users" {
schema = schema.public
column "id" {
type = int
}
column "name" {
type = varchar(255)
}
column "manager_id" {
type = int
}
primary_key {
columns = [
column.id
]
}
}

A live database

In addition, users can provide Atlas with a connection to an existing database which in turn Atlas can inspect and use as the desired state of the database.

External Schemas (ORM)

Finally, Atlas has an easily extensible design which makes writing plugins to load schemas from external sources a breeze. For example, Atlas can read the desired schema of the database directly from your ORM, using a simple integration.

Schema inspection

Once Atlas understands the desired state of the database, it needs to inspect the existing database to understand its current schema. This is done by connecting to the target database and querying the database’s information schema to construct a schema graph (an in-memory representation of all the components in the database and their connections).

Diffing and planning

The next phase involves calculating the difference ("diffing") between the desired and current states and calculating an execution plan to reconcile this difference. Because resources are often interconnected, Atlas must create a sensible order of execution using algorithms such as Topological Sort to ensure, for example, that dependencies on a resource are removed before it is dropped.

In addition, each database engine has its own peculiarities and limitations to take into account when creating an execution plan. For example, adding a default value to a column in an SQLite database must be performed in a multiple-step plan that looks similar to this:

-- Planned Changes:
-- Create "new_users" table
CREATE TABLE `new_users` (`id` int NOT NULL, `greeting` text NOT NULL DEFAULT 'shalom')
-- Copy rows from old table "users" to new temporary table "new_users"
INSERT INTO `new_users` (`id`, `greeting`) SELECT `id`, IFNULL(`greeting`, 'shalom') AS `greeting` FROM `users`
-- Drop "users" table after copying rows
DROP TABLE `users`
-- Rename temporary table "new_users" to "users"
ALTER TABLE `new_users` RENAME TO `users`

Atlas in action

What does this workflow look like in practice? As you can see in Atlas's "Getting Started" guide, suppose we made a change to our desired schema that adds a new table named blog_posts (this change may be described in a plain SQL file, an HCL file or even in your ORM's data model).

To apply the desired schema on a target database you would use the schema apply command:

atlas schema apply \
-u "mysql://root:pass@localhost:3306/example" \
--to file://schema.sql \
--dev-url "docker://mysql/8/example"

After which Atlas will generate a plan:

-- Planned Changes:
-- Create "blog_posts" table
CREATE TABLE `example`.`blog_posts` (`id` int NOT NULL, `title` varchar(100) NULL, `body` text NULL, `author_id` int NULL, PRIMARY KEY (`id`), INDEX `author_id` (`author_id`), CONSTRAINT `author_fk` FOREIGN KEY (`author_id`) REFERENCES `example`.`users` (`id`))
Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure?:
▸ Apply
Abort

Observing this example, you may begin to understand how Atlas earned its nickname the "Terraform for Databases."

Integrating with Terraform

The second piece of bridging the gap is to create a Terraform Provider that wraps Atlas and allows users to define resources that represent the schema definition as part of your infrastructure.

Ariga (the company behind Atlas) is an official HashiCorp Tech Partner that publishes the Atlas Terraform Provider, which was created to solve this problem precisely.

Using the Atlas Terraform Provider, users can finally provision their database instance and its schema in one holistic definition. For example, suppose we provision a MySQL database using AWS RDS:

// Our RDS-based MySQL 8 instance.
resource "aws_db_instance" "atlas-demo" {
identifier = "atlas-demo"
instance_class = "db.t3.micro"
engine = "mysql"
engine_version = "8.0.28"
// Some fields skipped for brevity
}

Next, we load the desired schema from an HCL file, using the Atlas Provider:

data "atlas_schema" "app" {
src = "file://${path.module}/schema.hcl"
}

Finally, we use the atlas_schemaresource to apply our schema to the database:

// Apply the normalized schema to the RDS-managed database.
resource "atlas_schema" "hello" {
hcl = data.atlas_schema.app.hcl
url = "mysql://${aws_db_instance.atlas-demo.username}:${urlencode(random_password.password.result)}@${aws_db_instance.atlas-demo.endpoint}/"
}

You can find a full example here.

When we run terraform apply, this is what will happen:

  • Terraform will provision the RDS database using the AWS Provider
  • Terraform will use Atlas to inspect the existing schema of the database and load the desired state from a local HCL file.
  • Atlas will calculate for Terraform a SQL plan to reconcile between the two.

And this is how it may look like in the Terraform plan:

Terraform will perform the following actions:

# atlas_schema.hello will be created
+ resource "atlas_schema" "hello" {
+ hcl = <<-EOT
table "posts" {
schema = schema.app
column "id" {
null = false
type = int
}
column "user_id" {
null = false
type = int
}
column "title" {
null = false
type = varchar(255)
}
column "body" {
null = false
type = text
}
primary_key {
columns = [column.id]
}
foreign_key "posts_ibfk_1" {
columns = [column.user_id]
ref_columns = [table.users.column.id]
on_update = NO_ACTION
on_delete = CASCADE
}
index "user_id" {
columns = [column.user_id]
}
}
table "users" {
schema = schema.app
column "id" {
null = false
type = int
}
column "user_name" {
null = false
type = varchar(255)
}
column "email" {
null = false
type = varchar(255)
}
primary_key {
columns = [column.id]
}
}
schema "app" {
charset = "utf8mb4"
collate = "utf8mb4_0900_ai_ci"
}
EOT
+ id = (known after apply)
+ url = (sensitive value)
}

# aws_db_instance.atlas-demo will be created
+ resource "aws_db_instance" "atlas-demo" {
// .. redacted for brevity
+ }

And that's how you bridge the gap between IaC and schema management!

Conclusion

In this blog post, we reviewed some exceptional properties of Infrastructure-as-Code tools, such as Terraform, that have led to their widespread adoption and success in the industry. We then reviewed the current state of a similar problem, database schema management, in contrast to these properties. Finally, we showcased Atlas’s ability to adapt some IaC principles into the domain of schema management and how we can unify the two domains using the Atlas Terraform Provider.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.