
3 posts tagged with "versioned-migrations"


The Hidden Bias of Alembic and Django Migrations (and when to consider alternatives)

· 9 min read
Rotem Tamir
Building Atlas

Python has been a top programming language for the past decade, known for its simplicity and rich ecosystem. Many companies use it to build web apps and server software, thanks to frameworks and libraries like Django and SQLAlchemy.

One of the most common (and often loathed) tasks when building backend applications is managing the database schema. As the app's data model evolves, developers need to modify the database schema to reflect those changes. This is where database schema migration tools come into play.

Why devs ❤️ Django Migrations and Alembic

As far as migration tools go, SQLAlchemy and Django have both built out robust solutions for managing database schemas through Alembic and Django Migrations, which stand out as some of the best in the field. Both have been around for a while and became popular due to their code-first approach:

  1. First, developers define their database schema as code through Python classes, which are also used at runtime to interact with the database.
  2. Next, the migration tool automatically generates the necessary SQL to apply those changes to the database.
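For instance, here is a minimal sketch of step 1 in Django (the model is illustrative, not from a real project):

# models.py -- the schema, defined as code.
from django.db import models

class Author(models.Model):
    # Django adds an implicit auto-incrementing primary key column.
    name = models.CharField(max_length=100)

# Step 2: `python manage.py makemigrations` generates the migration, and
# `python manage.py sqlmigrate <app_label> 0001` prints the SQL it will run.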

For most projects, these tools work well and are a great fit, but there are some cases where you should consider looking at a specialized schema management tool. In this article, we'll explore some of the limitations of ORM-based migration tools and present Atlas, a database schema-as-code tool that integrates natively with both Django and SQLAlchemy.

The bias of ORM-based migrations

ORMs are commonly distributed with schema management tools. Without a way to set up the database schema, an ORM cannot function, so each ORM must ship something that provides a viable developer experience.

The main purpose of ORMs is to abstract the database layer and deliver a roughly uniform experience across different systems (e.g., PostgreSQL, MySQL, SQL Server, etc.). As an abstraction layer, they tend to concentrate on the shared database features (such as tables, indexes, and columns) rather than on more advanced, database-specific capabilities.

Being ORM maintainers ourselves (the team behind Atlas maintains Ent), we can attest that, as ORM authors, we tend to see migrations as a "necessary evil": something we have to ship, but ultimately an ancillary requirement. ORMs exist to bridge code and DB - they are a runtime effort, not a CI/CD or resource management concern.

As such, ORM migration tools tend to be basic, suitable for the common case of applications that simply read and write data from tables. In projects that require a more involved schema management process, you might want to consider using a specialized tool like Atlas.

Advanced database features

ORM migration tools cover only the tip of the iceberg of database features

In many systems, the database is viewed simply as a persistence layer, but databases are capable of much more than just storing data. In recent years, there has been a growing trend of using databases for more than just CRUD operations. For example, you might want to use your database for:

  • Stored Procedures, Functions, and Triggers: Logic can be encapsulated in stored procedures or triggers that automatically execute on certain events (e.g., inserts, updates, or deletes). This ensures consistent data validation, auditing, or transformation at the database level.
  • Views and Materialized Views: Views are virtual tables generated from a SELECT query, while materialized views store the query results in a physical table that can be indexed. Materialized views can boost performance for frequently accessed or computationally expensive queries.
  • Custom Data Types: Some databases (e.g., PostgreSQL) allow defining custom data types for domain-specific constraints or storing complex structures that exceed standard built-in types.
  • Extensions: Many databases support extensions that add specialized capabilities. For example, PostGIS (an extension for PostgreSQL) provides advanced geospatial data handling and queries.
  • Row-Level Security (RLS): RLS lets you define policies to control which rows a user can see or modify. This is particularly useful for multi-tenant systems or sensitive data scenarios where granular, row-level permissions are required; a sketch follows this list.
  • Sharding: Sharding distributes data across multiple database nodes (or clusters). This approach can enhance performance, fault tolerance, and scalability, especially in high-traffic, large-volume applications.
  • Enumerated Types (ENUM): ENUM types allow you to define a constrained list of valid values for a column (e.g., "small", "medium", "large"). This can help enforce data consistency and prevent invalid entries.
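To make one of these concrete, here is a minimal PostgreSQL sketch of a row-level security policy. The orders table, tenant_id column, and app.current_tenant setting are illustrative assumptions, not part of any example above:

-- Hypothetical multi-tenant table: enable RLS so policies apply.
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Each session sees only rows whose tenant_id matches a per-session
-- setting (set by the application, e.g. SET app.current_tenant = '42').
CREATE POLICY tenant_isolation ON orders
  USING (tenant_id = current_setting('app.current_tenant')::int);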

Where Atlas comes in

ORMs typically do not provide a way to manage these advanced database features.

Using Atlas, ORM users can keep using their favorite ORM (e.g., SQLAlchemy) while also extending their data model with advanced database features. This is done with composite_schema, a feature that allows you to define your schema in multiple parts, each using a different schema source. For example:

data "external_schema" "sqlalchemy" {
program = [
"atlas-provider-sqlalchemy",
"--path", "./models",
"--dialect", "postgresql"
]
}

data "composite_schema" "example" {
// First, load the schema with the SQLAlchemy provider
schema "public" {
url = data.external_schema.sqlalchemy.url
}
// Next, load the additional schema objects from a SQL file
schema "public" {
url = "file://extra_resources.sql"
}
}

env "local" {
src = data.composite_schema.example.url
// ... other configurations
}

In this example, we define a composite schema that combines the SQLAlchemy schema with additional schema objects loaded from a SQL file. This allows you to use the full power of your database while still benefiting from the convenience of ORMs.

Using composite schemas, we can define a base table with SQLAlchemy and then use a SQL file to define a materialized view that aggregates data from it for faster querying. For instance, let's define a SQLAlchemy model for a user account:

from typing import Optional

from sqlalchemy import Integer, String
from sqlalchemy.orm import Mapped, mapped_column

# Base is the project's DeclarativeBase subclass.
class User(Base):
    __tablename__ = "user_account"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    team_name: Mapped[Optional[str]] = mapped_column(String(30))
    points: Mapped[int] = mapped_column(Integer)

Then use plain SQL to define a materialized view that aggregates the total points per team:

CREATE MATERIALIZED VIEW team_points AS
SELECT team_name, SUM(points) AS total_points
FROM user_account
GROUP BY team_name;
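Keep in mind that a materialized view is a snapshot: in PostgreSQL it must be refreshed explicitly (or on a schedule) to pick up new data:

-- Recompute the view's contents from the underlying user_account table.
REFRESH MATERIALIZED VIEW team_points;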

Atlas will read both the SQLAlchemy model and the SQL file and generate any necessary SQL migration scripts to apply the changes to the database.
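From there, generating the migration is a single CLI invocation. A minimal sketch, assuming the "local" env from the atlas.hcl above and a dev database configured in that env (the migration name is illustrative):

# Diff the composite schema against the migration directory and write
# a new migration file with the necessary changes.
atlas migrate diff add_team_points --env local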

CI/CD Pipelines

Although it happens more frequently than you might hope, database schema migrations should not be executed from a developer's workstation. Running migrations in such a way is error-prone and can lead to inconsistencies between your codebase and the database schema.

Instead, migrations should be applied as part of your CI/CD pipeline. This ensures that only code that was reviewed, approved and merged is deployed to production. Additionally, it reduces the need to grant developers direct access to the production database, which can be a security and compliance risk.

Django and SQLAlchemy are unopinionated about how you run migrations in your CI/CD pipeline. They provide the basic building blocks (e.g., manage.py migrate for Django) and leave it up to you to integrate them into your pipeline.
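Concretely, the "basic building blocks" are commands like these, typically wired into deploy scripts by hand:

# Apply pending Django migrations.
python manage.py migrate --noinput

# Or, with Alembic: upgrade the database to the latest revision.
alembic upgrade head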

For simple use-cases, this is fine. But as your project grows, you might find yourself needing more control over the migration process. For example, you might want to:

  1. Automate code review. Automatically verify that migrations are safe to apply before running them. Integrating automatic checks into your CI/CD pipeline can help catch issues early and prevent bad migrations from being applied.

  2. Integrate with CD systems. As systems evolve, organizations often adopt more complex deployment strategies that require advanced tooling (e.g., GitOps, Infrastructure as Code). Integrating migrations natively into these systems requires a substantial engineering effort (e.g., writing a Kubernetes Operator or Terraform provider).

  3. Monitor schema drift. As much as we'd like to believe that production environments are air-tight, and never touched by human hands, the reality is that things happen. Monitoring schema drift can help you catch unexpected changes to your database schema and take corrective action before they cause issues.

Atlas ships with native integrations for popular CI/CD systems like GitHub Actions, GitLab CI, Bitbucket Pipelines, Kubernetes, Terraform, and more. This allows you to easily integrate schema management into your existing CI/CD pipelines without having to write brittle custom scripts or plugins.
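For example, a migration-linting step in GitHub Actions might look roughly like this sketch; the action versions, migration directory path, and dev-database URL are assumptions, so consult the Atlas docs for the exact inputs:

# .github/workflows/atlas-ci.yaml -- lint migration changes on pull requests.
name: atlas-ci
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ariga/setup-atlas@v0
      # Run Atlas's safety checks against a disposable dev database.
      - uses: ariga/atlas-action/migrate/lint@v1
        with:
          dir: file://migrations
          dev-url: docker://mysql/8/dev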

One migration tool to rule them all

If your company's tech stack is uniform and everyone uses the same ORM and database system, you might not feel the need to standardize on a single migration tool. But as companies grow, the tech stack tends to become more diverse.

This is especially true when adopting a microservices architecture, as different teams might be using different ORMs, languages, or database systems. While this is great for flexibility, it can make it very difficult for platform, security, and compliance functions to ensure that all teams are following the same best practices. This is where choosing a single migration tool can help.

Atlas is designed to be a universal schema management tool that can be used across different ORMs, languages, and database systems. It provides a consistent experience for managing database schema, regardless of the underlying technology stack.

Through a plugin system that provides bindings to different ORMs and database systems, Atlas can serve as the common denominator for schema management across your organization.

Conclusion

Django Migrations and Alembic are great tools that have served Python developers well. They make schema changes simple and work seamlessly with their respective ORMs. But ORMs focus on abstracting databases, not managing them.

For teams that need more advanced database features, better CI/CD integration, or consistency across multiple stacks, a dedicated schema management tool like Atlas can help. It works alongside ORMs, letting developers define schema as code while keeping full control over the database.

If you're running into the limits of ORM-based migrations, give Atlas a try!

Versioned migrations on Kubernetes with the Atlas Operator

· 7 min read
Rotem Tamir
Building Atlas
TL;DR

You can now use the Atlas Kubernetes Operator to run versioned migrations natively from your Kubernetes cluster.


Introduction

The Atlas Kubernetes Operator is a Kubernetes operator that enables you to manage your database schemas natively from your Kubernetes cluster. By exposing custom resource definitions (CRDs), the operator extends the Kubernetes API to support database schema management.

In a previous blog post we demonstrated how to use the Atlas Operator for the declarative (state-based) workflow in which you define the desired state of your database schema in a Kubernetes manifest and the operator takes care of the rest.

State-based vs. versioned migrations is a common and unresolved debate in the database schema management world, and we built Atlas to support both from the get-go.

Today, we are happy to announce v0.2.0 of the Atlas Kubernetes Operator, which adds support for the versioned migration workflow.

In this blog post we will demonstrate how to use the Atlas Operator with this new workflow.

How it works

The Atlas Kubernetes Operator supports versioned migrations, in which the database schema is defined by a series of SQL scripts ("migrations") that are applied in order. The user specifies the version to run and the migration directory, which can reside in Atlas Cloud or be stored as a ConfigMap in your Kubernetes cluster.

In this workflow, after installing the Atlas Kubernetes Operator, the user defines the desired state of the database as an AtlasMigration resource, which connects a target database with a migration directory. The migration directory may be configured as a remote directory in Atlas Cloud or as a ConfigMap in your Kubernetes cluster.

The operator then reconciles the desired state with the actual state of the database by applying any pending migrations on the target database.

Demo time

In this demo we will use the Atlas Kubernetes Operator to run versioned migrations on a MySQL database.

Prerequisites

  1. A Kubernetes cluster - you can use Minikube to quickly spin up a local cluster.
  2. kubectl configured to connect to your cluster.
  3. Helm, the Kubernetes package manager, locally installed.
  4. The Atlas CLI tool, locally installed.

1. Install the Atlas Kubernetes Operator

The Atlas Kubernetes Operator is available as a Helm Chart. To install the chart with the release name atlas-operator:

helm install atlas-operator oci://ghcr.io/ariga/charts/atlas-operator

2. Install a database

Create a MySQL database and a secret with an Atlas URL to the database:

kubectl apply -f https://raw.githubusercontent.com/ariga/atlas-operator/65dce84761354d1766041c7f286b35cc24ffdddb/config/integration/databases/mysql.yaml

Result:

deployment.apps/mysql created
service/mysql created
secret/mysql-credentials created

In this example, we are using a plain MySQL pod as a database. In a real-world scenario, you would probably use a managed database service such as Amazon RDS or Google Cloud SQL.

3. Set up a migration directory

With the operator and the database running, let's set up the migration directory which we will use to manage our database schema.

You can use the directory from an existing project, but for the sake of this demo we will use the Atlas template repo which contains a simple migration directory.

git clone git@github.com:ariga/atlas-template.git versioned-demo

Observe that the cloned repository contains a migrations directory with a couple of migration scripts:

cd versioned-demo
tree migrations
migrations
├── 20230316085611.sql
├── 20230316090502.sql
└── atlas.sum

4. Create a ConfigMap with the migration directory

The operator supports two ways to manage your migration directory:

  • Atlas Cloud - a cloud-based directory that is managed by Atlas.
  • ConfigMap - a Kubernetes resource that contains the migration directory files as key-value pairs.

In this demo we will use a ConfigMap to manage our migration directory. To create a ConfigMap with the migration directory files:

kubectl create configmap migrations --from-file=migrations -o yaml --dry-run=client --save-config > migrations.yaml

The above command creates a YAML file named migrations.yaml containing the migration directory files. It should look something like this:

apiVersion: v1
data:
  20230316085611.sql: |
    -- Create "users" table
    CREATE TABLE `users` (
      `id` int NOT NULL,
      `user_name` varchar(255) NOT NULL,
      `email` varchar(255) NOT NULL,
      PRIMARY KEY (`id`)
    ) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
  20230316090502.sql: |
    -- Create "posts" table
    CREATE TABLE `posts` (
      `id` int NOT NULL,
      `user_id` int NOT NULL,
      `title` varchar(255) NOT NULL,
      `body` text NOT NULL,
      PRIMARY KEY (`id`),
      INDEX `user_id` (`user_id`),
      CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE
    ) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
  atlas.sum: |
    h1:XBXbh+rzLis8gknjlIqnxXLBkOZ+sN2v2p7KjyVFYYM=
    20230316085611.sql h1:br6W6LPEnnsejlz/7hRm9zthwStCzjN2vZkqVPxlmvo=
    20230316090502.sql h1:GfeRjkSeoCt3JVRtLQNa/r50lRfpAPXS7AqTU2ZNFgY=
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"migrations","creationTimestamp":null},"data":{"20230316085611.sql":"-- Create \"users\" table\nCREATE TABLE `users` (\n `id` int NOT NULL,\n `user_name` varchar(255) NOT NULL,\n `email` varchar(255) NOT NULL,\n PRIMARY KEY (`id`)\n) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;\n","20230316090502.sql":"-- Create \"posts\" table\nCREATE TABLE `posts` (\n `id` int NOT NULL,\n `user_id` int NOT NULL,\n `title` varchar(255) NOT NULL,\n `body` text NOT NULL,\n PRIMARY KEY (`id`),\n INDEX `user_id` (`user_id`),\n CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON UPDATE NO ACTION ON DELETE CASCADE\n) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;\n","atlas.sum":"h1:XBXbh+rzLis8gknjlIqnxXLBkOZ+sN2v2p7KjyVFYYM=\n20230316085611.sql h1:br6W6LPEnnsejlz/7hRm9zthwStCzjN2vZkqVPxlmvo=\n20230316090502.sql h1:GfeRjkSeoCt3JVRtLQNa/r50lRfpAPXS7AqTU2ZNFgY=\n"}}
  name: migrations

Apply the ConfigMap to your cluster:

kubectl apply -f migrations.yaml

Kubernetes will create a ConfigMap named migrations with the migration directory files:

configmap/migrations created

5. Create an AtlasMigration resource

Now that we have a database and a migration directory, we can create an AtlasMigration resource to manage our database schema. The AtlasMigration resource is a custom resource that you use to define the desired state of your database schema. The operator will then reconcile the actual state of your database schema with the desired state.

To create an AtlasMigration resource, create a YAML file named atlas-migration.yaml with the following content:

apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasMigration
metadata:
  name: atlas-migration
spec:
  urlFrom:
    secretKeyRef:
      key: url
      name: mysql-credentials
  dir:
    configMapRef:
      name: "migrations"

After you create the atlas-migration.yaml file, apply it to your cluster:

kubectl apply -f atlas-migration.yaml

Next, let's wait for the resource to enter a "Ready" state:

kubectl wait --for=condition=Ready atlasmigration/atlas-migration

When the operator finishes reconciling the AtlasMigration resource, the AtlasMigration resource will be ready:

atlasmigration.db.atlasgo.io/atlas-migration condition met
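You can also inspect the resource directly to see the operator-reported status; a quick sketch using the resource name from above:

# Print the conditions and reconciliation details for the migration resource.
kubectl describe atlasmigration atlas-migration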

6. Verify the migrations were applied

Finally, to verify the migrations were applied, connect to the database and check to see if the users table was created:

kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.users"

You should see the following output:

+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id        | int          | NO   | PRI | NULL    |       |
| user_name | varchar(255) | NO   |     | NULL    |       |
| email     | varchar(255) | NO   |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+

Check that the posts table was created as well:

kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.posts"

You should see the following output:

+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| id      | int          | NO   | PRI | NULL    |       |
| user_id | int          | NO   | MUL | NULL    |       |
| title   | varchar(255) | NO   |     | NULL    |       |
| body    | text         | NO   |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+

And that's it! You've successfully deployed the Atlas Operator and applied migrations to your database.

Conclusion

In this blog post, we showed you how to use the Atlas Operator to manage your database schema in Kubernetes using a versioned migrations workflow. To learn more about the Atlas Operator, check out the Atlas Operator GitHub repository as well as the documentation on the Atlas website.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Atlas Terraform Provider v0.4.0: HashiCorp partnerships, versioned migrations and more

· 5 min read
Tran Minh Giau
Software Engineer

Introduction

Today we are very excited to announce the release of Atlas Terraform Provider v0.4.0. This release brings some exciting new features and improvements to the provider which we will describe in this post.

In addition, this release is the first to be published under our new partnership with HashiCorp as a Technology Partner. Atlas is sometimes described as a "Terraform for databases", so we have high hopes that this partnership will open up better ways to integrate database schema management into IaC workflows.

What's new

When people first hear about integrating schema management into declarative workflows, many raise the concern that because making changes to the database is a high-risk operation, they would not trust a tool to do it automatically.

This is a valid concern, and this release contains three new features that we believe will help to address it:

  • SQL plan printing
  • Versioned migrations support
  • Migration safety verification

SQL Plan Printing

In previous versions of the provider, we displayed the plan as a textual diff showing which resources are added, removed or modified. With this version, the provider will also print the SQL statements that will be executed as part of the plan.

For example, suppose we have the following schema:

schema "market" {
charset = "utf8mb4"
collate = "utf8mb4_0900_ai_ci"
comment = "A schema comment"
}

table "users" {
schema = schema.market
column "id" {
type = int
}
column "name" {
type = varchar(255)
}
primary_key {
columns = [
column.id
]
}
}

And our Terraform module looks like this:

terraform {
  required_providers {
    atlas = {
      source  = "ariga/atlas"
      version = "0.4.0"
    }
  }
}

provider "atlas" {}

data "atlas_schema" "market" {
  src        = file("${path.module}/schema.hcl")
  dev_db_url = "mysql://root:pass@localhost:3307"
}

resource "atlas_schema" "market" {
  hcl        = data.atlas_schema.market.hcl
  url        = "mysql://root:pass@localhost:3306"
  dev_db_url = "mysql://root:pass@localhost:3307"
}

When we run terraform plan we will see the following output:

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Warning: Atlas Plan
│
│   with atlas_schema.market,
│   on main.tf line 17, in resource "atlas_schema" "market":
│   17: resource "atlas_schema" "market" {
│
│ The following SQL statements will be executed:
│
│ -- add new schema named "market"
│ CREATE DATABASE `market`
│ -- create "users" table
│ CREATE TABLE `market`.`users` (`id` int NOT NULL, `name` varchar(255) NOT NULL, PRIMARY KEY (`id`)) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci
╵

Versioned migrations

Atlas supports two types of workflows: Declarative and Versioned. With declarative workflows, the plan to migrate the database is generated automatically at runtime. Versioned migrations provide teams with a more controlled workflow where changes are planned, checked-in to source control and reviewed ahead of time. Until today, the Terraform provider only supported the declarative workflow. This release adds support for versioned migrations as well.

Suppose we have the following migration directory of two files:

20221101163823_create_users.sql:

CREATE TABLE `users` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `age` bigint(20) NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `age` (`age`)
);

atlas.sum:

h1:OlaV3+7xXEWc1uG/Ed2zICttHaS6ydHZmzI7Hpf2Fss=
20221101163823_create_users.sql h1:mZirkpXBoLLm+M73EbHo07muxclifb70fhWQFfqxjD4=

We can use the Terraform Atlas provider to apply this migration directory to a database:

terraform {
  required_providers {
    atlas = {
      source  = "ariga/atlas"
      version = "0.4.0"
    }
  }
}

provider "atlas" {}

// The `atlas_migration` data source loads the current state of the given database
// with regard to the migration directory.
data "atlas_migration" "hello" {
  dir = "migrations?format=atlas"
  url = "mysql://root:pass@localhost:3306/hello"
}

// The `atlas_migration` resource applies the migration directory to the database.
resource "atlas_migration" "hello" {
  dir     = "migrations?format=atlas"
  version = data.atlas_migration.hello.latest # Use latest to run all migrations
  url     = data.atlas_migration.hello.url
  dev_url = "mysql://root:pass@localhost:3307/test"
}

Running terraform plan will show the following output:

data.atlas_migration.hello: Reading...
data.atlas_migration.hello: Read complete after 0s [id=migrations?format=atlas]

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # atlas_migration.hello will be created
  + resource "atlas_migration" "hello" {
      + dev_url = (sensitive value)
      + dir     = "migrations?format=atlas"
      + id      = (known after apply)
      + status  = (known after apply)
      + url     = (sensitive value)
      + version = "20221101163823"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Linting

Atlas provides extensive support for linting database schemas. This release adds support for running these checks as part of the Terraform plan, so terraform plan now surfaces any linting errors in your schema changes. This is especially useful in the versioned migrations workflow, as it lets you catch problematic migrations before applying them.

Suppose we add the following migration:

20221101165036_change_unique.sql:

ALTER TABLE users
  DROP KEY age,
  ADD CONSTRAINT NAME UNIQUE (`name`);

If we run terraform plan after adding this migration, Terraform prints the following warning:


╷
│ Warning: data dependent changes detected
│
│   with atlas_migration.hello,
│   on main.tf line 20, in resource "atlas_migration" "hello":
│   20: resource "atlas_migration" "hello" {
│
│ File: 20221101165036_change_unique.sql
│
│   - MF101: Adding a unique index "NAME" on table "users" might fail in case column
│     "name" contains duplicate entries
╵

Atlas detected that the migration may fail if the name column contains duplicate entries! This is a very useful warning that can help you avoid unexpected deployment failures. Atlas supports many more safety checks, which you can read about here.
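The same analysis is also available outside Terraform via the Atlas CLI; a sketch, assuming the dev database used above and a migrations directory at the default location:

# Lint the most recent migration file against a dev database.
atlas migrate lint --dev-url "mysql://root:pass@localhost:3307/test" --latest 1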

Wrapping up

In this blog post we discussed three new features added to the Terraform Atlas provider, designed to make managing your database schemas with Terraform safer and more predictable. We hope you enjoy this release!

Have questions? Feedback? Feel free to reach out on our Discord server.