Skip to main content

One post tagged with "alembic"

View All Tags

The Hidden Bias of Alembic and Django Migrations (and when to consider alternatives)

· 9 min read
Rotem Tamir
Building Atlas

Python has been a top programming language for the past decade, known for its simplicity and rich ecosystem. Many companies use it to build web apps and server software, thanks to frameworks like Django and SQLAlchemy.

One of the most common (and often loathed) tasks when building backend applications is managing the database schema. As the app's data model evolves, developers need to modify the database schema to reflect those changes. This is where database schema migration tools come into play.

Why devs ❤️ Django Migrations and Alembic

As far as migration tools go, SQLAlchemy and Django have both built out robust solutions for managing database schema through Alembic and Django Migrations, which stand out as some of the best in the field. They have both been around for a while, becoming popular due to their code-first approach:

  1. First, developers define their database schema as code through Python classes, which are also used at runtime to interact with the database.
  2. Next, the migration tool automatically generates the necessary SQL to apply those changes to the database.

For most projects, these tools work well and are a great fit, but there are some cases where you should consider looking at a specialized schema management tool. In this article, we'll explore some of the limitations of ORM-based migration tools and present Atlas, a database schema-as-code tool that integrates natively with both Django and SQLAlchemy.

The bias of ORM-based migrations

ORMs are commonly distributed with schema management tools. Without a way to set up the database schema, the ORM cannot function, so each ORM must include something that provides a viable developer experience.

The main purpose of ORMs is to abstract the database layer and deliver a roughly uniform experience across different systems (e.g., PostgreSQL, MySQL, SQL Server, etc.). As an abstraction layer, they tend to concentrate on the shared database features (such as tables, indexes, and columns) rather than on more advanced, database-specific capabilities.

Being ORM maintainers ourselves (the team behind Atlas maintains Ent), we can attest that in our capacity as ORM authors, migrations are seen as a "necessary evil", something we have to ship, but really is just an annoying requirement. ORMs exist to bridge code and DB - they are a runtime effort, not a CI/CD or resource management concern.

As such, ORM migration tools tend to be basic, suitable just for the common cases of reading and writing data from tables. In projects that require a more involved schema management process, you might want to consider using a specialized tool like Atlas.

Advanced database features

ORMs scratch the tip of the iceberg of database features

In many systems, the database is viewed simply as a persistence layer, but databases are capable of much more than just storing data. In recent years, there is a growing trend of utilizing databases for more than just CRUD operations. For example, you might want to use your database for:

  • Stored Procedures, Functions, and Triggers: Logic can be encapsulated in stored procedures or triggers that automatically execute on certain events (e.g., inserts, updates, or deletes). This ensures consistent data validation, auditing, or transformation at the database level.
  • Views and Materialized Views: Views are virtual tables generated from a SELECT query, while materialized views store the query results in a physical table that can be indexed. Materialized views can boost performance for frequently accessed or computationally expensive queries.
  • Custom Data Types: Some databases (e.g., PostgreSQL) allow defining custom data types for domain-specific constraints or storing complex structures that exceed standard built-in types.
  • Extensions: Many databases support extensions that add specialized capabilities. For example, PostGIS (an extension for PostgreSQL) provides advanced geospatial data handling and queries.
  • Row-Level Security (RLS): RLS lets you define policies to control which rows a user can see or modify. This is particularly useful for multi-tenant systems or sensitive data scenarios where granular, row-level permissions are required.
  • Sharding: Sharding distributes data across multiple database nodes (or clusters). This approach can enhance performance, fault tolerance, and scalability, especially in high-traffic, large-volume applications.
  • Enumerated Types (ENUM): ENUM types allow you to define a constrained list of valid values for a column (e.g., "small", "medium", "large"). This can help enforce data consistency and prevent invalid entries.

Where Atlas comes in

ORMs typically do not provide a way to manage these advanced database features.

Using Atlas, ORM users can keep using their favorite ORM (e.g SQLAlchemy) but also extend their data model with advanced database features. This is done by utilizing composite_schema, a feature that allows you to define your schema in multiple parts, each part using a different schema source. For example:

data "external_schema" "sqlalchemy" {
program = [
"atlas-provider-sqlalchemy",
"--path", "./models",
"--dialect", "postgresql"
]
}

data "composite_schema" "example" {
// First, load the schema with the SQLAlchemy provider
schema "public" {
url = data.external_schema.sqlalchemy.url
}
// Next, load the additional schema objects from a SQL file
schema "public" {
url = "file://extra_resources.sql"
}
}

env "local" {
src = data.composite_schema.example.url
// ... other configurations
}

In this example, we define a composite schema that combines the SQLAlchemy schema with additional schema objects loaded from a SQL file. This allows you to use the full power of your database while still benefiting from the convenience of ORMs.

Using composite schemas, we can use SQLAlchemy to define a base table and then use a SQL file to define a materialized view that aggregates data from it for faster querying. For instance, let's define a SQLAlchemy model for a user account:

class User(Base):
__tablename__ = "user_account"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
team_name: Mapped[Optional[str]] = mapped_column(String(30))
points: Mapped[int] = mapped_column(Integer)

Then use plain SQL to define a materialized view that aggregates the total points per team:

CREATE MATERIALIZED VIEW team_points AS
SELECT team_name, SUM(points) AS total_points
FROM user_account
GROUP BY team_name;

Atlas will read both the SQLAlchemy model and the SQL file and generate any necessary SQL migration scripts to apply the changes to the database.

CI/CD Pipelines

Although it happens more frequently than you might hope, database schema migrations should not be executed from a developer's workstation. Running migrations in such a way is error-prone and can lead to inconsistencies between your codebase and the database schema.

Instead, migrations should be applied as part of your CI/CD pipeline. This ensures that only code that was reviewed, approved and merged is deployed to production. Additionally, it reduces the need to grant developers direct access to the production database, which can be a security and compliance risk.

Django and SQLAlchemy are unopinionated about how you run migrations in your CI/CD pipeline. They provide the basic building blocks (e.g., manage.py migrate for Django) and leave it up to you to integrate them into your pipeline.

For simple use-cases, this is fine. But as your project grows, you might find yourself needing more control over the migration process. For example, you might want to:

  1. Automate code review. Automatically verify that migrations are safe to apply before running them. Integrating automatic checks into your CI/CD pipeline can help catch issues early and prevent bad migrations from being applied.

  2. Integrate with CD systems. As systems evolve, organizations often adapt more complex deployment strategies that require advanced tooling (e.g GitOps, Infrastucture as Code). Integrating migrations natively into these systems requires a substantial engineering effort (e.g, writing a Kubernetes Operator or Terraform provider).

  3. Monitor schema drift. As much as we'd like to believe that production environments are air-tight, and never touched by human hands, the reality is that things happen. Monitoring schema drift can help you catch unexpected changes to your database schema and take corrective action before they cause issues.

Atlas ships with native integrations for popular CI/CD systems like GitHub Actions, GitLab CI, BitBucket Pipelines, Kubernetes, Terraform, and more. This allows you to easily integrate schema management into your existing CI/CD pipelines without having to write brittle custom scripts or plugins.

One migration tool to rule them all

If your company's tech stack is uniform and everyone is using the same ORM and database system, you might not be worried about the need to standardize on a single migration tool, but as companies grow, the tech stack can become more diverse.

This is especially true when adopting a microservices architecture as different teams might be using different ORMs, languages or database systems. While this is great for flexibility, it can make it can make it very difficult for platform, security, and compliance functions to ensure that all teams are following the same best practices. This is where choosing a single migration tool can help.

Atlas is designed to be a universal schema management tool that can be used across different ORMs, languages, and database systems. It provides a consistent experience for managing database schema, regardless of the underlying technology stack.

By providing a plugin system that can provide bindings to different ORMs and database systems, Atlas can be to be the common denominator for schema management across your organization.

Conclusion

Django Migrations and Alembic are great tools that have served Python developers well. They make schema changes simple and work seamlessly with their respective ORMs. But ORMs focus on abstracting databases, not managing them.

For teams that need more advanced database features, better CI/CD integration, or consistency across multiple stacks — a dedicated schema management tool like Atlas can help. It works alongside ORMs, letting developers define schema as code while keeping full control over the database.

If you're running into the limits of ORM-based migrations, give Atlas a try!