Skip to main content

One post tagged with "dx"

View All Tags

· 11 min read
Rotem Tamir

Building a loveable migration tool

"I just love dealing with migrations!"

-No developer, ever.

Over the past three years, Ariel, my co-founder and I (along with the rest of our team at Ariga), have been working on Atlas, a database schema-as-code tool. After many years of building software professionally, we've come to realize that one of the most stressful, tedious and error-prone parts of building software is dealing with database migrations.

In case you are unfamiliar with the term, database migrations are the process of changing the structure of a database. When applications evolve, the database schema needs to evolve with them. This is commonly done by writing scripts that describe the changes to the database schema. These scripts are then executed in order to apply the changes to the database. This process has earned the name "migrations" and an infamous reputation among developers.

The secret to building a successful tool for developers is to be relentlessly focused on the user experience. When we started working on Atlas, we spent a long time researching the common issues developers face when dealing with migrations. We wanted to understand the root causes of these issues and design a tool that would solve them.

In this post, I'll share the top 5 usability issues we identified with migration tools and how we addressed them in Atlas.

Issue #1: The weirdest source of truth

It should be possible to provision our environments and build, test, and deploy our software in a fully automated fashion purely from information stored in version control

Forsgren PhD, Nicole; Humble, Jez; Kim, Gene. Accelerate (p. 72). IT Revolution Press. Kindle Edition.

One of the most important principles that came from the DevOps movement is the idea that to achieve effective automation, you need to be able to build everything, deterministically, from a single source of truth.

This is especially important when it comes to databases! The database schema is a super critical part of our application and we better have a way to ensure it is compatible with the code we are deploying.

Classic migration tools (like Flyway and Liquibase) were (to this day) an amazing step forward into automating schema changes, to the point that it is possible to run them as part of your CI/CD pipeline, satisfying the principle above.

But while technically correct, in terms of usability, they provide a very poor developer experience.

Consider this directory structure from a typical Flyway project:

├── V1__create_table1.sql
├── V2__create_table_second.sql
└── V3_1__add_comments.sql

To have a migration directory describe the current schema of a database is like describing your home by listing the needed steps to build it. It's correct, but it's not very useful.

Issue #1: Migrations as a source of truth

Problem: As a human, I can't understand the current state of the database by looking at the migration files. I need to run them in chronological order to understand the current state of the database.

Solution: Use a declarative schema-as-code approach to describe the current state of the database. Then, in every version of our project, we can read the current schema in plain text and understand it.

Issue #2: Manual planning

Classic migration tools typically require you to write the migration scripts manually. This process can be tedious and error-prone, especially if you have developers who don't have a lot of experience operating databases.

Classic migration tools were created at a time when tech-stacks were much simpler and less diverse. In addition, many organizations employed a DBA who could serve as a technical authority for database changes. Nowadays, developers are expected to "own" their databases and DBAs are seriously outnumbered. This means that developers are expected to write migration scripts, even if they don't have a lot of experience with databases.

Doing something, when it's not in your area of expertise, especially when its critical and risky like database migrations, can be a daunting and stressful task. Even if it is something you are good at, it still requires your attention and focus.

It's important to note that some ORMs (such as Django and Prisma) do stand out in their ability to generate migration scripts automatically (with some important limitations). But for the most part, developers are expected to write migration scripts manually.

Issue #2: Manual planning

Problem: Writing migration scripts can be stressful and error-prone, especially for developers who don't have a lot of experience with databases. They need to first know what the current state of the database is, then they need to know what the desired state is, and then they need to write a script that will take the database from the current state to the desired state. Some changes are trivial, but others require consideration and research.

Solution: Use a tool that can automatically generate migration scripts for you. This way, you can focus on the desired state of the database and let the tool figure out how to get there.

Issue #3: Working in parallel

When a project succeeds to the point that it has many developers working on it, it's common to see developers working on different features that require different changes to the database schema. This can lead to conflicts when developers try to merge their changes together.

Classic migration tools don't provide a good way to detect and handle this situation. Because each migration script is created in its own file, common conflict detection tools like git can't help you. Your source control system can't tell you if two developers are working on the same table, or if they are adding columns with the same name.

For this reason, it's common to see teams surprised in production when conflicting migrations are applied. Even worse, in some cases migrations may be applied out of order or skipped entirely, leading to an inconsistent and unknown state of the database.

Issue #3: Working in parallel

Problem: Source control systems can't help you detect conflicts between migration scripts. This can lead to undetected issues in production when they are deployed. To overcome this, teams develop their own processes to coordinate changes to the database schema which slow down development and are error-prone.

Solution: Use a tool that maintains a directory integrity file that will detect conflicting changes and force you to resolve them before you can proceed.

Issue #4: Tracking partial failures

Virtually all classic migration tools maintain a metadata table on the target database that tracks which migrations have been applied. Using this metadata, the tool can determine which migrations need to be applied to bring the database up to date.

When migrations succeed, everything is great. But in my experience, when migrations fail, especially when they fail partially, it can be a nightmare to recover from.

Suppose you planning a migration to version N+1 of the database, from version N. The migration script for version N contains 10 changes. The migration fails after applying the 5th change. If you use a database that supports fully transactional DDL (like Postgres), and all of your changes are transactional, then you are in luck - your migration tool can safely roll back the changes that were applied and your database (revisions table) will remain in version N1.

But if you are using a database that doesn't support transactional DDL, or if your changes are not transactional, then you are in trouble. The migration tool can't rollback the changes that were applied, and your database will in a state that is somewhere in the middle between version N1 and version N+1. All of the migration tools that I know of don't capture this interim state, and so you are left with a database that is in an unknown state and a revision table that is out of sync with the actual state of the database.

The most viewed question about golang-migrate on StackOverflow is about this error:

Dirty database version 2. Fix and force version.

The answer explains:

Before a migration runs, each database sets a dirty flag. Execution stops if a migration fails and the dirty state persists, which prevents attempts to run more migrations on top of a failed migration.

And the solution?

After you clean up you database you can also open schema_migrations table and change Dirty flag and rollback version number to last migration that was successfully applied.

Issue #4: Tracking partial failures

Problem: Classic migration tools don't handle partial failures well. When a migration fails, especially when it fails partially, it can leave your database in an unknown state with the revision table out of sync with the actual state of the database. Resolving this issue requires manual handling of the database which is dangerous and error-prone.

Solution: Use a tool that natively supports transactions where possible, and uses statement-level granularity to keep track of applied changes.

Issue #5: Pre-planned rollbacks

One of the most curious things about classic migration tools is the prevalence of the down migration. The idea is that whenever you write a migration script to take the database from version N to version N+1, then you should also write the down migration that will take the database from version N+1 back to version N.

Why is this curious? Because after interviewing many developers across many organizations in virtually every industry, I have learned that in practice nobody uses down migrations, especially in production.

We have recently written extensively on this topic, but the gist is that down migrations are never used because:

  • They are naive. Pre-planned rollbacks assume that all statements in the up migration have been applied successfully. In practice, rolling back a version happens when things did not work as planned, making the pre-planned rollback obsolete.

  • They are destructive. If you successfully rolled out a migration that added a column to a table, and then decided to revert it, its inverse operation (DROP COLUMN) does not merely remove the column. It deletes all the data in that column. Re-applying the migration would not bring back the data, as it was lost when the column was dropped.

  • They are incompatible with broader rollback mechanisms. In theory, rolling back a deployment should be as simple as deploying the previous version of the application. When it comes to versions of our application code, this works perfectly. We pull the container image that corresponds to the previous version, and we deploy it.

    But what about the database? When we pull artifacts from a previous version, they do not contain the down files that are needed to revert the database changes back to the necessary schema - they were only created in a future commit!

Issue #5: Pre-planned rollbacks

Problem: Pre-planned rollbacks ("down migrations") are never used in practice. They are naive, destructive, and incompatible with broader rollback mechanisms. This leads to a false sense of security which is unraveled when a rollback is actually needed and the user realizes that they will need to handle it manually.

Solution: Use a tool that supports dynamically-planned, safe rollbacks that can revert the database to a previous state without data loss even in cases of partial failures.

Try Atlas today

I believe these usability issues should not be taken lightly. Your team's ability to move fast, refactor and respond to changing requirements is limited by the agility in which you can evolve your database schema. One of the most interesting things that we see with teams that adopt Atlas is seeing them move from occasional, dreaded, avoided-at-all-costs migrations, to planning and deploying hundreds of schema changes a year. This means that they can move faster, respond to customer feedback more quickly, and innovate more effectively.

To try Atlas, head over to the Getting Started today!

Wrapping up

In this article, we have discussed the top 5 usability issues with migration tools today and hinted at how modern migration tools (like Atlas) can address them.

As always, we would love to hear your feedback and suggestions on the Atlas Discord server.