
Ariga raises $18M to build the future of working with databases

· 6 min read
Ariel Mashraki
Building Atlas

Today, we are delighted to share that Ariga (the company behind Atlas) has raised $18M in total from Tiger Global and TLV Partners to build our vision for the future of how organizations work with their databases.

This is an important milestone for us as it provides us with the resources we need to keep growing our open-source communities and build the products we envision for the industry. I want to take this opportunity to reflect on the work we've done in the past two+ years, and where we are going as a company.

Background

Databases play a crucial role in driving a company's success. From the outset, companies invest time in selecting a suitable database for building their product, modeling its schema, and building their application on top of it. As companies grow, they face challenges in performance, security, privacy, and more. Databases remain a vital component, accompanying businesses from the earliest stages and throughout their entire journey.

Ariga was founded after a decade of experience in building database infrastructure, and dev tooling frameworks. Our focus in the last two years has been on three fundamental pillars:

  • Pioneering the “Schema-as-Code” concept within the industry
  • Building an innovative database change management platform
  • Open sourcing a versatile framework for developing data-driven server applications

I want to elaborate on these pillars and describe how they shaped our mission statement as an engineering team and a company.

Schema-as-Code

One of the core principles we advocate for is “Schema-as-Code”, which refers to the practice of defining and managing the database schema through code. Similar to Infrastructure as Code (IaC), which is used to manage infrastructure resources, SaC is about defining database schema elements in code format.

As with IaC, being able to capture the desired state of application data topologies (which are comprised of the database schemas, the connections between them, and the policies that govern them) unlocks the ability to automate many operations around them such as:

  • Change management (provisioning, migrations, data migrations)
  • Monitoring (drift detection, performance)
  • Governance (enforcing access, privacy, and change policies)
  • Collaboration (notifications, workflows, catalog/visibility)
  • Runtime (API layers, caching, multi-tenancy, auditing, deletion)

This approach captures not only the schema structure but also its behavior, access permissions, methods of access, and more. Adopting it simplifies management, ensures the schema is version-controlled, and allows most database operations to be automated.
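
To make this concrete, here is a minimal sketch of what describing a schema as code can look like in Atlas's HCL syntax (the schema and table names here are purely illustrative):

schema "app" {}

table "users" {
  schema = schema.app
  column "id" {
    type = int
  }
  column "email" {
    type = varchar(255)
  }
  primary_key {
    columns = [column.id]
  }
  index "users_email" {
    unique  = true
    columns = [column.email]
  }
}

Because this definition lives in a regular file, it can be versioned, reviewed, and fed into the kinds of automation listed above.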

From this core concept, which we believe can have an outsized impact on the industry similar to that of IaC, stem our core activities:

Atlas, our OSS schema management engine, was developed to bring the SaC concept to life, understanding that without a practical implementation, the theoretical idea couldn't truly showcase its benefits. By making Atlas open-source, we encourage broader adoption of the SaC concept.

Ent, our OSS application framework, embodies the concept of schema-as-code: define your application data topology in code (entities, relations, policies, behaviors), and everything else is generated automatically from there.

Database Change Management

Databases are the most critical component of almost any software system; database downtime is equivalent to business downtime. Being stateful components, they cannot be easily rebuilt, scaled out, or fixed by a restart. Outages that involve damage to data or simply unavailability of the database are notoriously hard to manage and recover from, often taking long hours of careful work by a team's most senior engineers.

On the other hand, businesses are required to evolve faster than ever. Any change to the business carries a change to its underlying data model, which means modifying the database schema. Teams that can evolve their data model faster can respond quickly to market demands and customer feedback and beat the competition.

This puts organizations in a dilemma. To protect business stability, databases should seldom be changed to reduce the risk of an expensive outage. Simultaneously, database schemas must be constantly modified to accommodate the evolution of the business.

Ariga provides software engineering teams with the fastest and safest way to manage database schema changes. With our platform:

  • Software developers do not need to worry about planning safe changes to the database schema. By using a simple data definition language, developers declare the desired schema of their database and our engine plans and applies the changes for them.
  • Teams can define and enforce change policies during continuous integration to prevent dangerous changes way before they happen.
  • Operators can deploy schema changes to production with peace of mind as dangerous changes are verified and simulated against real production data.

Database Runtime Access

Ariga was founded by the creators of Ent, a Linux Foundation-backed project beloved by software engineers in companies of various sizes and industries, from top firms to small startups.

Ent is an entity framework that provides application developers with robust, privacy-aware APIs for managing data, reducing 90% of repetitive code by generating efficient and fast GraphQL, REST, or gRPC servers. Ent users value it for its exceptional development experience and time-saving capabilities, often returning to it as a tool of choice for new projects.

Even though we actively engage in our open-source communities, providing daily assistance to many users on our GitHub, Discord, and Slack channels, we've learned that some companies need additional support beyond the standard open-source options to thrive.

To address the needs of these organizations, we are launching our Ent-erprise Support program. Beyond dedicated support, our aim for this pillar is to solve the performance, privacy, and runtime issues that consume engineers' time and effort, allowing them to move much faster.

Some of the problems we address involve multi-tenancy, performance enhancements, secure database access, and improved GraphQL/REST layers that automatically connect to Ent and Atlas. These components are seamlessly integrated not only with Ent but also with other frameworks, making them suitable for a broad range of companies.

Our Mission Statement

In conclusion, Ariga's mission focuses on empowering businesses through integrated database management solutions that span the entire product lifecycle. Our technology covers the initial stages of data definition, ongoing database change management, and data access at application runtime.

Drawing on our expertise in these areas and our commitment to open source, we aim to revolutionize the way companies manage their database schemas, allowing them to respond quickly to customer demands and market changes while maintaining high-quality, stable infrastructure.

Best,

a8m

Announcing Atlas v0.12: Cloud State Management

· 5 min read
Ariel Mashraki
Building Atlas

It has been two months since we announced the Community Preview Plan for Atlas Cloud, and today I am thrilled to announce the next batch of features that we are releasing to open-source and to Atlas Cloud.

In summary, version v0.12 includes a few major features that are explained in detail below:

  1. We have added support for importing and running migration linting on GitHub PRs for external migration formats, such as Flyway and golang-migrate.
  2. Atlas now supports reading your migration directory directly from your Atlas Cloud account. This eliminates the need for users to build their Docker images with the directory content and makes running schema migrations in production much easier.
  3. By connecting Atlas CLI to Atlas Cloud, migration runs will be recorded in the cloud account, making it easier to monitor and troubleshoot executed migrations.
  4. A new Slack integration is now available for Community Plan accounts. Organizations that connect their migration directories to the cloud can receive notifications to Slack channels when the schemas are updated or deployed, among other events.
  5. A new look has been given to the CI report page. It will be enhanced with additional features in the next version.

Remote Directory State

One of the most common complaints we received from our users is that setting up migration deployments for real-world environments is time-consuming. For each service, users are required to build a Docker image that includes the content of the migration directory, which ensures it is available when Atlas is executed. After this, the built image must be pushed to a registry, and finally, the deployment process needs to be configured to use this newly created image.

Not only does this process add complexity to the setup, but it is also repetitive for each migration directory and involves setting up a CI/CD pipeline for each service, adding another layer of complexity.

Atlas supports the concept of Data Sources, which enables users to retrieve information stored in an external service or database. In this release, we are introducing a new data source called remote_dir. This feature allows users to configure Atlas to read the content of the migration directory directly from their cloud account, thereby eliminating the need to build Docker images with the directory content.

Here is an example of how to configure the remote_dir data source:

atlas.hcl
variable "cloud_token" {
  type = string
}

atlas {
  cloud {
    token = var.cloud_token
  }
}

data "remote_dir" "migrations" {
  // The name of the migration directory in Atlas Cloud.
  // In this example, the directory is named "graph".
  name = "graph"
}

env {
  // Set environment name dynamically based on --env value.
  name = atlas.env
  migration {
    dir = data.remote_dir.migrations.url
  }
}

With this configuration in place, migrations can be applied directly from the remote directory:

atlas migrate apply \
  --url "<DATABASE_URL>" \
  --config file://path/to/atlas.hcl \
  --env prod \
  --var cloud_token="<ATLAS_TOKEN>"

Visualizing Migration Runs

Schema migrations are an integral part of application deployments, yet the setup might vary between different applications and teams. Some teams may prefer using init-containers, while others run migrations from their CD pipelines. There are also those who opt for Helm upgrade hooks or use our Kubernetes operator. The differences also apply to databases. Some applications work with one database, while others manage multiple databases, as is often seen in multi-tenant applications.

However, across all these scenarios, there's a shared need for a single place to view and track the progress of executed schema migrations. This includes triggering alerts and providing the means to troubleshoot and manage recovery if problems arise.

Starting from version v0.12, if the cloud configuration was set with a valid token, Atlas will log migration runs in your cloud account. Here's a demonstration of how it looks in action:

We have several new features lined up for the Community Plan in the next release. If you're interested in them earlier, don't hesitate to reach out to me in our Discord community.

Slack Webhooks

In this release, we're making our Slack Webhooks integration available to all users, promoting better team collaboration and providing instant alerts when issues occur. This new feature allows different groups within the organization, such as the data engineering team, to receive notifications when the schema changes, or to ping the on-call engineer when a deployment fails.

If you're interested in enabling this feature for your project, please check out the documentation.

Screenshot example

What's next?

There's a lot more coming in the following months. Our next releases will be focused on making other database objects such as views, triggers, and policies accessible to all Atlas users. We'll also continue to make more features from our commercial product available to both open-source and community preview users.

As always, we value community feedback and strive to be responsive to it. Please feel free to reach out and share your feedback with us on our Discord if you think something is missing or could be improved. Cheers!

Quickly visualize your Django schemas with DjangoViz

· 3 min read
DjangoViz is Deprecated

DjangoViz has been deprecated. Please refer to the Atlas Django Provider documentation for up-to-date instructions.

Having a visual representation of your data model can be helpful as it allows for easier comprehension of complex data structures, and enables developers to better understand and collaborate on the data model of the application they are building.

Entity relationship diagrams (ERDs) are a common way to visualize data models, by showing how data is stored in the database. ERDs are graphical representations of the entities, their attributes, and the way these entities are related to each other.

Today we are happy to announce the release of DjangoViz, a new tool for automatically creating ERDs from Django data models.

Django is an open source Python framework for building web applications quickly and efficiently. In this blog post, I will introduce DjangoViz and demonstrate how to use it for generating Django schema visualizations using the Atlas playground.


Django ORM

Django ORM is a built-in module in the Django web framework. It offers a high-level abstraction layer that enables developers to define complex application data models with ease. Unlike traditional ORM frameworks that rely on tables and foreign keys, Django models are defined using Python objects and relationships:

from django.db import models


class User(models.Model):
    username = models.CharField(max_length=255)
    email = models.EmailField(unique=True)
    password = models.CharField(max_length=255)


class Post(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    author = models.ForeignKey(User, on_delete=models.CASCADE)

When the application runs, Django translates these Python models into database schemas, mapping each model to a corresponding database table and each field to a corresponding column in the table.
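
For instance, the User model above roughly corresponds to a table that Atlas can represent in its HCL syntax along the following lines. This is a hedged sketch: the table name, the implicit id column, and the exact column types are assumptions that depend on your Django settings and database backend.

schema "public" {}

table "app_user" {
  schema = schema.public
  column "id" {
    // Django adds an implicit auto-incrementing primary key.
    type = bigint
  }
  column "username" {
    type = varchar(255)
  }
  column "email" {
    // Django's EmailField defaults to a maximum length of 254.
    type = varchar(254)
  }
  column "password" {
    type = varchar(255)
  }
  primary_key {
    columns = [column.id]
  }
  index "app_user_email_key" {
    unique  = true
    columns = [column.email]
  }
}

DjangoViz and Atlas render a representation of this kind for you, which is much easier to reason about than mentally translating the Python code.
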
When working with schemas and making changes to them, understanding the full picture from code alone can quickly become complicated. To help developers better understand their schema, we have created DjangoViz.

Introducing DjangoViz

For the purpose of this demo, we will follow the Django getting started tutorial, and showcase how you can use DjangoViz to visualize the default models included by Django's startproject command.

First, install Django and create a new project:

pip install Django
django-admin startproject atlas_demo
cd atlas_demo

Install the DjangoViz package:

pip install djangoviz

Add DjangoViz to your Django project's INSTALLED_APPS in atlas_demo/settings.py:

INSTALLED_APPS = [
    ...,
    'djangoviz',
    ...
]

DjangoViz supports either PostgreSQL or MySQL; in this example, we will use PostgreSQL.

Install the PostgreSQL driver:

pip install psycopg2-binary

Configure the database to work with PostgreSQL in the settings.py file:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "postgres",
        "USER": "postgres",
        "PASSWORD": "pass",
        "HOST": "127.0.0.1",
        "PORT": "5432",
    }
}

Start a PostgreSQL container:

docker run --rm -p 5432:5432  -e POSTGRES_PASSWORD=pass -d postgres:15

Now, you can visualize your schema by running the djangoviz management command from your new project directory:

python manage.py djangoviz

You will get a public link to your visualization, which will present an ERD and the schema itself in SQL or HCL:

Here is a public link to your schema visualization:
https://gh.atlasgo.cloud/explore/ac523fef

When clicking on the link you will see the ERD of your new project:


Wrapping up

In this post, we discussed DjangoViz, a new tool that helps you quickly visualize Django schemas. With this tool, you can easily get a visual overview of your application's data model and schema. We would love to hear your thoughts and feedback if you decide to give it a go!

Have questions? Feedback? Find our team on our Discord server ❤️.

Announcing the Atlas HCL VSCode Extension

· 3 min read

Introduction

Atlas supports many ways to describe database schemas. Using Schema Loaders, Atlas gives users the flexibility to define schemas with plain SQL, a connection to another database, or Atlas HCL.

What is Atlas HCL?

Atlas includes a data-definition language based on a Terraform-like syntax. This language, which was designed specifically for describing database schemas, is based on HashiCorp's HCL toolkit for creating structured configuration languages.

This data definition language is easy to read, edit, and visualize, and it allows us to organize our schema data in an efficient way that has many benefits over using plain SQL.
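
As a quick illustration, a table definition in Atlas HCL looks roughly like this (the schema, table, and columns below are made up for this example):

schema "shop" {}

table "orders" {
  schema = schema.shop
  column "id" {
    type = int
  }
  column "status" {
    type = varchar(50)
  }
  primary_key {
    columns = [column.id]
  }
}

The extension described below understands this structure and can assist you while you type it.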

What is the Atlas HCL VSCode Extension?

To make it easier for users to define schemas using Atlas HCL, we're happy to announce the release of the official Atlas HCL VSCode Extension. This extension provides a rich set of features that will help users define schemas using Atlas HCL.

In this post, we'll take a closer look at the features of this extension and how it can benefit your work.

Features

Auto Completion

The extension provides auto completion support for all SQL resources available within Atlas. For a comprehensive list of the supported resources, please visit this document: SQL Resources.

Completion

Batch Completion

Referencing Qualified Tables

This refers to a feature that allows for referencing tables and their columns in a structured manner, either relative to the current location (e.g. column.id) or absolute to the table (e.g. table.user.column.id).
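
For example, a foreign key definition typically mixes both forms: columns of the current table are referenced relatively, while the referenced table's column is addressed by its fully qualified path. The table and column names below are illustrative:

table "posts" {
  schema = schema.app
  column "id" {
    type = int
  }
  column "author_id" {
    type = int
  }
  primary_key {
    columns = [column.id]
  }
  foreign_key "posts_author_fk" {
    columns     = [column.author_id]      // relative reference to this table's column
    ref_columns = [table.users.column.id] // qualified reference to another table's column
  }
}

The completion shown above offers both forms, depending on the context in which you are typing.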

Referencing Qualified Tables

Linting

To ensure that errors are caught from the beginning, the Atlas HCL VSCode Extension offers a range of validations that enhance type safety for users. The specific validations provided will vary depending on the SQL dialect being used. This feature is designed to help users identify and fix issues early on, resulting in a more efficient and effective schema-building process.

Linting

Usage

To use this extension, you can either install it directly from the VSCode desktop or download it from the Marketplace. The extension will appear in the extensions list as Atlas HCL.

Extension

After installing the Atlas HCL extension, it automatically registers for all the languages that Atlas supports. These languages include:

| Name | Language ID | Description |
| --- | --- | --- |
| Atlas SQLite | atlas-schema-sqlite | SQLite dialect |
| Atlas MySQL | atlas-schema-mysql | MySQL dialect |
| Atlas PostgreSQL | atlas-schema-postgresql | PostgreSQL dialect |
| Atlas MariaDB | atlas-schema-mariadb | MariaDB dialect |
| Atlas Config File | atlas-config | Project Configuration |

By default, the extension will auto-detect atlas.hcl files in your project. However, if you have any schema files in your project, you must choose the language for them. You can do this in three ways in VSCode:

  1. Click on "Change Language" for your schema files in the section at the bottom (See image below).

Installation

  2. Use the Command Palette and choose "Change language mode" (See image below)

Installation by command palette

  3. Use Language Identifiers

Wrapping Up

In this post, we've introduced the Atlas HCL VSCode Extension. There's much more inside, and we invite you to install it today and give it a try.

What next?

Announcing v0.11.0: Manage database schemas with Kubernetes and Atlas

· 8 min read
Rotem Tamir
Building Atlas
TL;DR

You can now use the Atlas Kubernetes Operator to safely manage your database schemas with Atlas from within your Kubernetes cluster.

See an example

Introduction

Today, we are excited to announce the release of Atlas v0.11.0, which introduces the Atlas Kubernetes Operator. This release is a major milestone in our mission to make Atlas the most robust and modern way to manage your database schemas. With the Atlas Kubernetes Operator, you can now manage your database schemas with Atlas from within your Kubernetes cluster.

In this release, we also introduce a new concept to Atlas - "Diff Policies" - which allow you to customize the way Atlas plans database migrations for you. This concept is directly related to the Kubernetes Operator, and we will explain how below.

What are Kubernetes Operators?

Kubernetes has taken the cloud infrastructure world by storm mostly thanks to its declarative API. When working with Kubernetes, developers provide their cluster's desired configuration to the Kubernetes API, and Kubernetes is responsible for reconciling the actual state of the cluster with the desired state. This allows developers to focus on the desired state of their cluster, and let Kubernetes handle the complexities of how to get there.

This works out incredibly well for stateless components, such as containers, network configuration and access policies. The benefit of stateless components is that they can be replaced at any time, and Kubernetes can simply create a new instance of the component with the desired configuration. For stateful resources, such as databases, this is not the case. Throwing away a running database and creating a new one with the desired configuration is not an option.

For this reason, reconciling the desired state of a database with its actual state can be a complex task that requires a lot of domain knowledge. Kubernetes Operators were introduced to the Kubernetes ecosystem to help users manage complex stateful resources by codifying this type of domain knowledge into a Kubernetes controller.

What is the Atlas Kubernetes Operator?

The Atlas Kubernetes Operator is a Kubernetes controller that uses Atlas to manage your database schema. The Atlas Kubernetes Operator allows you to define the desired schema and apply it to your database using the Kubernetes API.

Declarative schema migrations

The Atlas Kubernetes Operator supports declarative migrations. In declarative migrations, the desired state of the database is defined by the user and the operator is responsible for reconciling the desired state with the actual state of the database (planning and executing CREATE, ALTER and DROP statements).

Diffing policies

One of the common objections to applying declarative workflows to databases is that there are often multiple ways to achieve the same desired state. For example, if you are running a Postgres database, you may want to add an index to a table. Depending on your circumstances, you may want to add this index with or without the CONCURRENTLY option. When using a declarative workflow, you supply where you want to go, but not how to get there.

To address this concern, we have introduced the concept of "diff policies" to Atlas. Diff policies allow you to customize the way Atlas plans database schema changes for you. For example, you can define a diff policy that will always add the CONCURRENTLY option to CREATE INDEX statements. You can also define a diff policy that will skip certain kinds of changes (for example DROP COLUMN) altogether.

Diff policies can be defined in the atlas.hcl file you use to configure Atlas. For example:

env "local" {
  diff {
    // By default, indexes are not created or dropped concurrently.
    concurrent_index {
      create = true
      drop   = true
    }
  }
}
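
Skipping specific kinds of changes can be expressed in the same block. The following sketch assumes a skip block with a drop_column attribute; consult the documentation for the exact options supported by your Atlas version:

env "local" {
  diff {
    skip {
      // Never plan statements that drop columns.
      drop_column = true
    }
  }
}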

Diff policies are especially valuable when using the Atlas Kubernetes Operator, as they allow you to customize and constrain the way the operator manages your database to account for your specific needs. We will see an example of this below.

Demo time!

Let's see the Atlas Kubernetes Operator in action. In this demo, we will use the Atlas Kubernetes Operator to manage a MySQL database running in a Kubernetes cluster.

The Atlas Kubernetes Operator is available as a Helm Chart. To install the chart with the release name atlas-operator:

helm install atlas-operator oci://ghcr.io/ariga/charts/atlas-operator

After installing the operator, follow these steps to get started:

  1. Create a MySQL database and a secret with an Atlas URL to the database:
kubectl apply -f https://raw.githubusercontent.com/ariga/atlas-operator/65dce84761354d1766041c7f286b35cc24ffdddb/config/integration/databases/mysql.yaml

Result:

deployment.apps/mysql created
service/mysql created
secret/mysql-credentials created
  2. Create a file named schema.yaml containing an AtlasSchema resource to define the desired schema:
apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: atlasschema-mysql
spec:
  urlFrom:
    secretKeyRef:
      key: url
      name: mysql-credentials
  schema:
    sql: |
      create table users (
        id int not null auto_increment,
        name varchar(255) not null,
        email varchar(255) unique not null,
        short_bio varchar(255) not null,
        primary key (id)
      );
  3. Apply the schema:
kubectl apply -f schema.yaml

Result:

atlasschema.db.atlasgo.io/atlasschema-mysql created
  4. Check that our table was created:
kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.users"

Result:

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int          | NO   | PRI | NULL    | auto_increment |
| name      | varchar(255) | NO   |     | NULL    |                |
| email     | varchar(255) | NO   | UNI | NULL    |                |
| short_bio | varchar(255) | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

Hooray! We applied our desired schema to our target database.

Diff policies in action

Now let's see how we can use diffing policies to customize the way the operator manages our database. In this example, we will demonstrate how we can prevent the operator from dropping columns in our database. Modify the schema.yaml file:

apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: atlasschema-mysql
spec:
  urlFrom:
    secretKeyRef:
      key: url
      name: mysql-credentials
+ policy:
+   diff:
+     skip:
+       drop_column: true
  schema:
    sql: |
      create table users (
        id int not null auto_increment,
        name varchar(255) not null,
        email varchar(255) unique not null,
-       short_bio varchar(255) not null,
        primary key (id)
      );

In the example above we added a policy section to our AtlasSchema resource. In this section, we defined a diff policy that will skip DROP COLUMN statements. In addition, we dropped the short_bio column from our schema. Let's apply the updated schema:

kubectl apply -f schema.yaml

Next, wait for the operator to reconcile the desired state with the actual state of the database:

kubectl wait --for=condition=Ready atlasschema/atlasschema-mysql

Finally, let's check that the short_bio column was not dropped. Run:

kubectl exec -it $(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}') -- mysql -uroot -ppass -e "describe myapp.users"

Result:

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int          | NO   | PRI | NULL    | auto_increment |
| name      | varchar(255) | NO   |     | NULL    |                |
| email     | varchar(255) | NO   | UNI | NULL    |                |
| short_bio | varchar(255) | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

As you can see, the short_bio column was not dropped. This is because we defined a diffing policy that skips DROP COLUMN statements.

Linting policies

An alternative way to prevent the operator from dropping columns is to use a linting policy. Linting policies allow you to define rules that will be used to validate the changes to the schema before they are applied to the database. Let's see how we can define a policy that prevents the operator from applying destructive changes to the schema. Edit the schema.yaml file:

apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: atlasschema-mysql
spec:
  urlFrom:
    secretKeyRef:
      key: url
      name: mysql-credentials
  policy:
+   lint:
+     destructive:
+       error: true
-   diff:
-     skip:
-       drop_column: true
  schema:
    sql: |
      create table users (
        id int not null auto_increment,
        name varchar(255) not null,
        email varchar(255) unique not null,
        primary key (id)
      );

In the example above, we replaced the diff policy with a lint policy. In this policy, we defined a destructive rule that will cause the operator to fail if it detects a destructive change to the schema. Notice that the short_bio column is not present in the schema (we dropped it in our previous change).

Let's apply the updated schema:

kubectl apply -f schema.yaml

Next, let's wait for the operator to reconcile the desired state with the actual state of the database:

kubectl wait --for=condition=Ready atlasschema/atlasschema-mysql --timeout 10s

Notice that this time, the operator failed to reconcile the desired state with the actual state of the database:

error: timed out waiting for the condition on atlasschemas/atlasschema-mysql

Let's check the reason for this failure:

kubectl get atlasschema atlasschema-mysql -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'

Result:

destructive changes detected:
- Dropping non-virtual column "short_bio"

Hooray! We have successfully prevented the operator from applying destructive changes to our database.

Conclusion

In this post, we have presented the Atlas Operator and demonstrated how you can use it to manage your database schema. We also covered diffing and linting policies and showed how you can use them to customize the way the operator manages your database.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Manage database schemas with Terraform in plain SQL

· 9 min read
Rotem Tamir
Building Atlas
TL;DR

You can now use plain SQL to define the desired state of your database schema.

See an example

Earlier today, we released v0.5.0 of the Atlas Terraform Provider. This release includes two changes that, in my opinion, are a great improvement to the experience of working with the Atlas Provider.

In this post, I will discuss these two changes and how they can help you to manage your database schemas with Terraform:

  • Support for the docker:// driver for dev-databases.
  • Support for defining the desired state of your database schema in plain SQL (and any other schema loader supported by Atlas).

Improving the dev-database experience with the docker:// driver

Atlas relies on a connection to an empty database which it can use to perform various calculations and operations. This database is called the "dev-database", and it allows Atlas to do things like validate the correctness of user-provided code as well as normalize user-input to the way the database actually sees it.

In previous versions of the Atlas Provider, the dev-database needed to be provided by the user. This was a bit cumbersome, as the user needed to spin up a database (usually by running a local Docker container), and then provide the connection string to it in the dev_url field of the atlas_schema resource.

To improve this experience, we added support for the docker:// driver, which allows the user to only provide the database engine and version, and Atlas will spin up an ephemeral container with the correct database engine and version. In addition, starting this version, users may define the dev_url on the provider scope. For example:

provider "atlas" {
  dev_url = "docker://mysql/8/myapp"
}

Defining the desired state of the database schema in plain SQL

In earlier versions of the Atlas Provider, the atlas_schema resource required the user to provide an Atlas HCL file which describes the desired state of the database schema. Many users found this syntax, which resembles Terraform's own, to be clean and concise. However, others disliked it and asked for a way to define the desired state in plain SQL.

To support this use-case, and many others, we have announced support for "schema loaders" - components that can be used to load the desired schema from many kinds of sources (such as plain SQL, an existing database, or the data-model of an ORM). To use this capability, users may use the atlas_schema data source, which accepts a url field that points to the desired schema. The scheme of this URL determines which schema loader will be used, for instance:

  • file://schema.sql - loads the schema from a local SQL file.
  • mysql://root:pass@localhost:3306/myapp - loads the schema from an existing MySQL database.
  • ent://service/ent/schema - loads the schema from the schema of an Ent project.

Managing database schemas in plain SQL using Terraform

info

You can find the final code for this example here.

In the following example, we will show how you can use Terraform and the Atlas provider to manage a MySQL database schema in plain SQL.

Let's start by creating a Terraform file named main.tf that installs the Atlas Terraform provider:

terraform {
  required_providers {
    atlas = {
      source  = "ariga/atlas"
      version = "0.5.0"
    }
  }
}

In addition to installing the Atlas provider, we will also spin up a local MySQL database using Docker which will represent our target database that we will manage with Terraform. In a real-world scenario, you would probably use a managed database service such as AWS RDS or Google Cloud SQL, but for the purpose of brevity, a local database will suffice. Run:

docker run -d --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_DATABASE=myapp mysql:8

Now that we have a database to manage, we can define the desired state of the database schema. Add a file named "schema.sql" with the following content:

create table users (
  id int not null auto_increment primary key,
  name varchar(255) not null
);

Next, we will define an atlas_schema data source that will load the schema from the schema.sql file:

data "atlas_schema" "sql" {
  src = "file://${path.module}/schema.sql"
}

Finally, we will define an atlas_schema resource that will manage the schema in the target database. In addition, we will configure the Atlas provider to use the docker:// driver to spin up a temporary database container:

provider "atlas" {
  dev_url = "docker://mysql/8/myapp"
}

resource "atlas_schema" "mysql" {
  url = "mysql://root:pass@localhost:3306/myapp"
  hcl = data.atlas_schema.sql.hcl
}

Now that we have defined our Terraform configuration, we can run terraform init to install the required providers:

terraform init

This should output something like:

Initializing provider plugins...
- Finding ariga/atlas versions matching "0.4.7"...
- Installing ariga/atlas v0.5.0...
- Installed ariga/atlas v0.5.0 (signed by a HashiCorp partner, key ID 45441FCEAAC3770C)

# ...

Terraform has been successfully initialized!

Finally, we can run terraform apply to create the database schema:

terraform apply

Terraform will print the following plan:

data.atlas_schema.sql: Reading...
data.atlas_schema.sql: Read complete after 4s [id=qnUvTyupgQzivof5LYWDOQ]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
+ create

Terraform will perform the following actions:

  # atlas_schema.myapp will be created
  + resource "atlas_schema" "myapp" {
      + hcl = <<-EOT
            table "hello" {
              schema = schema.myapp
              column "world" {
                null = true
                type = text
              }
              column "thoughts" {
                null = true
                type = varchar(100)
              }
            }
            schema "myapp" {
              charset = "utf8mb4"
              collate = "utf8mb4_0900_ai_ci"
            }
        EOT
      + id  = (known after apply)
      + url = (sensitive value)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

│ Warning: Atlas Plan

│ with atlas_schema.myapp,
│ on main.tf line 18, in resource "atlas_schema" "myapp":
│ 18: resource "atlas_schema" "myapp" {

│ The following SQL statements will be executed:


│ CREATE TABLE `myapp`.`hello` (`world` text NULL, `thoughts` varchar(100) NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci



Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value:

Notice that the plan shows the SQL statements that will be executed to create the database schema as well as our loaded schema in its HCL representation - this was done by the schema loader that was used to load the schema from the schema.sql file.

If you are happy with the plan, type yes and press enter to apply the changes:

Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: yes

atlas_schema.myapp: Modifying... [id=mysql:///myapp]
atlas_schema.myapp: Modifications complete after 4s [id=mysql:///myapp]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Hooray! We have successfully created a database schema using Terraform and the Atlas provider.

Terraform's true power comes from its declarative nature - we feed it a desired state and it will make sure that the actual state matches the desired state. Atlas is a perfect match for this paradigm. Let's see what happens if we change the schema in the schema.sql file and run terraform apply again:

Update the contents of schema.sql to the following:

create table `groups` (
  id int not null auto_increment primary key,
  name varchar(255) not null
);

create table `users` (
  id int not null auto_increment primary key,
  name varchar(255) not null,
  group_id int not null,
  foreign key (group_id) references `groups` (id)
);

Re-apply the changes:

terraform apply

Observe that our plan includes the addition of the groups table as well as the foreign key constraint on the users table:

data.atlas_schema.sql: Reading...
data.atlas_schema.sql: Read complete after 4s [id=Qhci62i6CFYRQ2CmUOjMeA]
atlas_schema.myapp: Refreshing state... [id=mysql:///myapp]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
~ update in-place

Terraform will perform the following actions:

  # atlas_schema.myapp will be updated in-place
  ~ resource "atlas_schema" "myapp" {
      ~ hcl = <<-EOT
          + table "groups" {
          +   schema = schema.myapp
          +   column "id" {
          +     null           = false
          +     type           = int
          +     auto_increment = true
          +   }
          +   column "name" {
          +     null = false
          +     type = varchar(255)
          +   }
          +   primary_key {
          +     columns = [column.id]
          +   }
          + }
            table "users" {
              schema = schema.myapp
              column "id" {
                null           = false
                type           = int
                auto_increment = true
              }
              column "name" {
                null = false
                type = varchar(255)
              }
          +   column "group_id" {
          +     null = false
          +     type = int
          +   }
              primary_key {
                columns = [column.id]
              }
          +   foreign_key "users_ibfk_1" {
          +     columns     = [column.group_id]
          +     ref_columns = [table.groups.column.id]
          +     on_update   = NO_ACTION
          +     on_delete   = NO_ACTION
          +   }
          +   index "group_id" {
          +     columns = [column.group_id]
          +   }
            }
            schema "myapp" {
              charset = "utf8mb4"
              collate = "utf8mb4_0900_ai_ci"
            }
        EOT
        id = "mysql:///myapp"
        # (1 unchanged attribute hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

│ Warning: Atlas Plan

│ with atlas_schema.myapp,
│ on main.tf line 18, in resource "atlas_schema" "myapp":
│ 18: resource "atlas_schema" "myapp" {

│ The following SQL statements will be executed:


│ CREATE TABLE `myapp`.`groups` (`id` int NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, PRIMARY KEY (`id`)) CHARSET
│ utf8mb4 COLLATE utf8mb4_0900_ai_ci
│ ALTER TABLE `myapp`.`users` ADD COLUMN `group_id` int NOT NULL, ADD INDEX `group_id` (`group_id`), ADD CONSTRAINT
│ `users_ibfk_1` FOREIGN KEY (`group_id`) REFERENCES `myapp`.`groups` (`id`) ON UPDATE NO ACTION ON DELETE NO ACTION



Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value:

After typing yes and pressing enter, Terraform will apply the changes, bringing the actual state of the database schema in line with the desired state:

atlas_schema.myapp: Modifying... [id=mysql:///myapp]
atlas_schema.myapp: Modifications complete after 4s [id=mysql:///myapp]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Conclusion

In this tutorial, we have seen how to use Terraform to manage the schema of a MySQL database using the Atlas provider with plain SQL. Using this workflow, teams can bridge the gap between their database schema management flows and their Terraform workflows, allowing for simpler and more reliable software delivery.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

Troubleshooting schema migration issues with Atlas

· 9 min read
Rotem Tamir
Building Atlas

Database schema migrations are an essential part of software development, allowing teams to evolve and refine their application's data model over time. However, with schema changes, it's not always smooth sailing, and migration failures can be disruptive and challenging to resolve.

As much as we'd like to believe that our schema migrations will be executed flawlessly, the reality is that things can and do go wrong. Whether it's due to human error, unforeseen complications, or technical constraints, migration failures can be a significant source of frustration for development teams. Anticipating and preparing for these issues is essential to minimize their impact on your project.

In this blog post, we'll explore the common causes of migration failures and demonstrate how Atlas can help you quickly recover from such failures and easily get back on track.

Atlas: Optimized for MTTR

MTTR (mean-time-to-recovery) is a widely accepted metric for measuring the performance of teams delivering software. MTTR measures the mean time it takes to restore service when a production issue occurs. In the context of schema migrations, this would mean measuring how long it takes a team to detect, triage and resolve failures of schema migrations.

Contrary to existing tools, Atlas was designed with failure in mind and comes with some useful features to help your team get out of the mud if (and when) a schema migration fails. By utilizing these features, your team can greatly reduce MTTR for schema change related failures.

Why do migrations fail?

Let's begin our discussion of troubleshooting schema migration failures by mentioning the common causes for migration failures.

  1. Syntax errors - A surprisingly common cause for migration failures is syntax errors in the migration script: the migration tool tries to execute a statement and the database rejects it, causing the migration to fail. For example, adding an unnecessary comma at the end of a list:
mysql> create table users (   id int,   name varchar(255), );

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 1
  2. Schema-dependent changes - Incorrect assumptions about the current state of the target database can lead to failed migrations when those assumptions are not met. For example, trying to create a table that was already created:
mysql> create table users (   id int,   name varchar(255) );
ERROR 1050 (42S01): Table 'users' already exists
  3. Data-dependent changes - If migrations manipulate data or modify constraints, the operation may fail depending on existing data in the target database. For example, adding a NOT NULL constraint to a column may fail if that column contains null values:
mysql> alter table users modify bio varchar(100) not null;
ERROR 1138 (22004): Invalid use of NULL value
  4. Lost connection - In some cases, and depending on the state of the target database and network connectivity, the client executing the migration commands against the database may lose the connection to the database, causing the migration to fail:
mysql> create table t1 (c int);
No connection. Trying to reconnect...
ERROR 2003 (HY000): Can't connect to MySQL server on '0.0.0.0:3306' (61)
ERROR:
Can't connect to the server

Troubleshooting failures with Atlas

In the next section, we review the capabilities that Atlas provides operators to troubleshoot and resolve migration failures:

  • Status observability - how to understand the current state of the system after a failure.
  • Statement level granularity - how to recover from partial migration failures.
  • Declarative roll-forward - how to use Atlas to automatically create a recovery plan from a failure.

Status observability

The first step to solving any failure is being able to triage the issue at hand. To assist operators in diagnosing the current status of a target database, Atlas provides the migrate status command which can be used to understand the current situation. For instance, suppose we tried to run the following migration which contains a drop table statement for a non-existing table:

create table users (
  id int,
  name varchar(255)
);

drop table non_existing;

The migration will fail with the following error:

Error 1051 (42S02): Unknown table 'test.non_existing'

In many cases, the migration will not be applied from our workstation, so we may not have access to the execution logs. To check the migration status, we can run the migrate status command:

atlas migrate status -u mysql://root:pass@/test

Atlas will print:

Migration Status: PENDING
-- Current Version: 20230409114917 (1 statements applied)
-- Next Version: 20230409114917 (1 statements left)
-- Executed Files: 2 (last one partially)
-- Pending Files: 1

Last migration attempt had errors:
-- SQL: drop table non_existing;
-- ERROR: Error 1051 (42S02): Unknown table 'test.non_existing'

Observe that Atlas prints out some useful information:

  • Migration Status: PENDING - There are pending migrations.
  • -- Executed Files: 2 (last one partially) - the last file was partially applied.
  • The last migration failed with an error: ERROR: Error 1051 (42S02): Unknown table 'test.non_existing'

Statement-level granularity

As we saw in the example above, in cases where migrations partially fail (only some statements succeed), our database schema will be in a limbo state of sorts: it's neither at the previous nor the next version. To keep implementations simple, many migration tools have historically opted to treat migration files as opaque blobs, meaning they cannot provide any assistance in cases of partial failures.

Atlas, on the other hand, parses the migration files prior to executing them and can therefore provide information about failures on the statement (rather than the file) level. This is great for observability, but it is even more meaningful when trying to resolve issues.

Consider a situation similar to the one we presented above, where a migration fails halfway because of a constraint violation:

CREATE TABLE biographies (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  bio TEXT,
  FOREIGN KEY (user_id) REFERENCES users(id)
);

ALTER TABLE users modify bio varchar(100) not null;

In cases where the users.bio column already contains null values, this migration will partially fail:

  -- migrating version 20230409123337
-> CREATE TABLE biographies (
     id INT AUTO_INCREMENT PRIMARY KEY,
     user_id INT NOT NULL,
     bio TEXT,
     FOREIGN KEY (user_id) REFERENCES users(id)
   );
-> alter table users modify bio varchar(100) not null;

Error: Error 1138 (22004): Invalid use of NULL value

This can be solved by backfilling the table with non-null values in the relevant column. To do this, we can update our migration script to contain this UPDATE statement:

CREATE TABLE biographies (
  id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  bio TEXT,
  FOREIGN KEY (user_id) REFERENCES users(id)
);

update users set bio='' where bio is null;

alter table users modify bio varchar(100) not null;

Here's the good part: because Atlas operates at the statement level and remembers that we've already successfully applied the first CREATE TABLE statement, it will resume from where it stopped. If we run:

atlas migrate apply -u mysql://root:pass@/test

Atlas runs to completion:

Migrating to version 20230409123337 from 20230409123337 (1 migrations in total):

-- migrating version 20230409123337
-> update users set bio='' where bio is null;
-> alter table users modify bio varchar(100) not null;
-- ok (48.440861ms)

-------------------------
-- 56.051791ms
-- 1 migrations
-- 2 sql statements

Declarative roll-forward

One of the things people experienced with existing tools immediately notice when they start working with Atlas is the absence of down migrations. Many migration tools expect users to plan a down migration parallel to every migration, which contains the statements needed to roll back the schema changes for a version. In theory, this is done to allow users to seamlessly return to a previous version in case things go wrong with the new one.

Our decision to omit down migrations from Atlas deserves its own lengthy discussion, but even limited to the examples we just showed, it is easy to demonstrate that down migrations may themselves fail in cases of partial failures, since they rely on the database being in the state where all statements executed successfully.

Instead of down migrations, Atlas provides an alternative strategy for reverting to a previous version. As you may know, one of Atlas's core features is its support for declarative migrations - the ability to automatically plan schema changes from the current state of a database to some desired state (similar to Terraform, but for databases).

Suppose we want to revert the database to a known version. This can happen in cases where the database was somehow manually modified in a way that's preventing us from making progress, or if we simply want to revert to a previous version. Using Atlas's schema apply, we can automatically plan this change:

atlas schema apply \
  --url "mysql://root:pass@localhost:3306/example" \
  --to "file://migrations?version=targetVersion" \
  --dev-url "docker://mysql/8/example" \
  --exclude "atlas_schema_revisions"

Atlas plans the change for us:

-- Planned Changes:
-- Drop "biographies" table
DROP TABLE `biographies`;
-- Modify "users" table
ALTER TABLE `users` MODIFY COLUMN `bio` varchar(100) NULL;
Apply

Let's unpack this command:

  • url - is the URL of the target database that we want to modify.
  • to - describes the desired state, in this case the migration directory at file://migrations at version targetVersion - omitting this query parameter will set the desired state at the most recent revision.
  • dev-url - Atlas requires a connection to an empty dev-database, which it uses to normalize the desired schema. Using the docker:// URL scheme tells Atlas to spin up and use a fresh Docker container for this purpose.
  • exclude - tells Atlas to ignore atlas_schema_revisions, which is a metadata table maintained by Atlas and not described in the migration directory. Adding this argument prevents Atlas from accidentally producing a plan that drops this table.

Wrapping up

This blog post discussed the common causes of database schema migration failures and demonstrated how Atlas is equipped to handle them. Atlas offers features such as status observability, statement-level granularity, and declarative roll-forward capabilities, which enable development teams to efficiently recover from migration failures, reduce MTTR, and minimize disruption to their services.

How can we make Atlas better?

We would love to hear from you on our Discord server ❤️.

New Analyzers and Cloud Linting Configuration

· 5 min read
Jannik Clausen
Building Atlas

It's been only two weeks since the release of v0.10.0, but we are already back with more exciting features we want to share with you. Besides some more improvements and bug fixes, we added two new SQL analyzers to Atlas and the capability to have Atlas Cloud pick up a linting configuration file from your repository.

Concurrent Index Policy (Postgres)

One of the analyzers we added in this release is the Concurrent Index Policy Analyzer for Postgres. When creating or dropping indexes, Postgres will lock the table against writes. Depending on the amount of data, this lock might be in place for much longer than just a few moments, up to several hours. Therefore, Postgres provides the CONCURRENTLY option, which causes the index to be built without holding a lock for the whole time it is built. While it consumes more resources, this option is often preferred, and we are happy to share that the Atlas linting engine is now capable of detecting statements that create or drop an index without using the CONCURRENTLY option.

Naming Conventions Policy

Keeping naming consistent across database schema resources is a widely common practice. Atlas now has an analyzer that can detect names that don't comply with a given naming policy and will warn you in such cases. You can configure either a global policy or a resource-specific one. Read on to learn how to configure this analyzer, or have a look at the documentation.
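
As a rough sketch, a resource-specific policy might look like the following. The nested index block and its attributes are assumptions here; see the documentation for the exact options:

lint {
  naming {
    index {
      match   = "^[a-z]+_idx$"
      message = "index names must be lowercase and end with _idx"
    }
  }
}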

Cloud Linting Configuration

In our last post, @a8m introduced the Community Preview for Atlas Cloud and how to connect a migration directory in your GitHub repository to Atlas Cloud with just a few clicks. Until now, the Atlas Cloud linting reports added to your PRs used the default linting configuration. In this post, I will show you how to configure the linting by making use of both of the new analyzers mentioned above.

When connecting a new migration directory, Atlas Cloud will scan the repository for an existing atlas.hcl file and propose using that file for migration linting. If you don't have such a file, you can configure it manually as described below.

Enable the Analyzers

The Concurrent Index Analyzer will not report on creating or dropping indexes on tables that have been created in the same file. Therefore, let's ensure we have a table ready that we can add an index to. Our first migration file can look something like this:

1.sql
CREATE TABLE users
(
  id serial PRIMARY KEY,
  email VARCHAR(50) UNIQUE NOT NULL,
  first_name VARCHAR(50) NOT NULL
);

To configure the Atlas Cloud linter to warn about creating or dropping indexes without the CONCURRENTLY option and ensure that all our schema resources are named with lowercase letters only, use the following configuration:

note

The below configuration will also work with the latest release of the Atlas CLI.

atlas.hcl
lint {
  concurrent_index {
    error = true # block PR on violations instead of warning
  }

  naming {
    error   = true
    match   = "^[a-z]+$"                  # regex to match lowercase letters
    message = "must be lowercase letters" # message to return if a violation is found
  }
}

See It In Action

What is left to demonstrate is a migration file violating the above policies. Take the example below: the index name contains an underscore (_), which is not permitted by the naming policy, and the index is created non-concurrently.

2.sql
CREATE INDEX email_idx ON users (email);

After adding the above atlas.hcl configuration file and the new migration, create a Pull Request on GitHub and observe Atlas Cloud doing its magic wizardry:

Wonderful! As you can see, Atlas Cloud found the two issues with this simple statement. Since the Concurrent Index Analyzer is configured to error on violations, merging the PR is blocked (if you have this policy set on GitHub).

In addition to the comment on the Pull Request, you can find more detailed reporting in the Checks tab or have a look at the file annotations Atlas adds to your changes:

What next?

I sure hope the new analyzers will be useful to you. In the upcoming weeks, we will be rolling out several new major features that we have been working on lately, including schema drift detection, managed migration deployments, and much more. If any of these features sound interesting to you, please do not hesitate to contact us.

We would love to hear from you on our Discord server ❤️.

Schema change management for platform teams

· 8 min read
Rotem Tamir
Building Atlas

What is Platform Engineering

A few months ago, an article on The New Stack made a lot of waves in our industry, claiming that "DevOps Is Dead" and that it will be superseded by Platform Engineering. I had the pleasure of discussing this topic with Armon Dadgar, who serves on Ariga's advisory board (but more famously is the co-founder and CTO of HashiCorp).

The DevOps movement challenged the software industry with a bold vision, Armon recalled: "If only we could meld the functions of Dev and Ops into the same human, a super-engineer that writes backend applications for breakfast, configures build systems and CI pipelines for lunch, and then puts out a database production fire for dinner, we would finally get total ownership and accountability."

However, Armon pointed out, in practice, things were more complex: "Outside of a few select companies that could hire and retain them, these people didn't actually exist." This realization has given rise to Platform Engineering. "We need developers to be able to self-serve and be responsible for their applications, but the complexity of modern cloud-native architectures is too big. To be efficient, organizations need to abstract these things away."

We are seeing platform teams being created everywhere as a response to this complexity. These teams are chartered with maintaining an Internal Developer Platform, which acts as a flexible abstraction over the complexity of managing applications in cloud-native architectures.

Schema change management for Platform Teams

A good platform team treats its Developer Platform as a product by constantly looking for measurable ways to improve the efficiency of the engineering organization it serves.

This is often done by one of the following:

  • Making slow things fast – by solving technical bottlenecks or automating manual work.
  • Making hard things easy – by providing simpler ways of retrieving information or declarative workflows.
  • Making risky things safer – by preventing human error during CI.

One overlooked area in which platform teams can create serious technical leverage is providing a strong schema change management story. Schema change management refers to the collection of tools and processes that support the evolution of an application's data model and the ways in which it is stored in databases.

In the past, most applications consisted of a single enterprise database, often supported by an enterprise vendor, serving a monolithic backend. These were developed in a waterfall methodology and managed by a professional, well-trained DBA.

Today, applications are characterized by an explosion in microservices. Each microservice is backed by its own database (and sometimes multiple databases), developed and maintained by multiple autonomous teams, with varying (and sometimes very little) operational knowledge when it comes to managing their databases.

In other words, despite being a critical component of any architecture, the operational aspects of the backing databases are an afterthought in many cases. Organizations can easily spend hundreds of thousands of dollars a year to make sure devs have access to observability data, but when it comes to managing schema changes, developers are somehow expected to know all the intricacies of the database their team happens to work with.

What is the impact of not supporting schema change management?

Having interviewed engineers from dozens of companies, we've seen some serious issues repeat themselves in organizations that do not have a well-thought-out strategy when it comes to schema change management:

  • Backward-incompatible changes to the database schema break the contract between the database and the application backend, causing downtime.
  • Downstream consumers of the database schema (such as data engineering teams consuming CDC logs) are constantly surprised.
  • Tables are accidentally locked for writes, causing application downtime – sometimes for hours or days.
  • Developers connect to the database with root credentials to apply changes or troubleshoot them.
  • Deployments fail halfway because of constraint violations that are discovered only on production data.
  • Incidents and outages occur because of database behaviors unknown to many engineers.
  • Simple refactorings become complex projects that require senior engineering leadership to plan and carefully execute, making them less frequent, and leading to increased technical debt.
  • Frustration from (and fear of) database schema changes promotes anti-patterns such as pushing schema management to the application layer, effectively using SQL databases as NoSQL storage.
  • And many more.

Evolving beyond schema migration tools

Most existing schema management tools (often called Schema Migration Tools) were created in an era with very different problems. At the dawn of the DevOps movement, the idea of describing all schema changes in files that are committed to source control and automatically applied by a tool that knows which of them have already been applied was revolutionary.
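
To make this concrete, the part about knowing "which of them have already been applied" typically boils down to a small bookkeeping table in the target database. The sketch below is illustrative only; the table and column names are hypothetical, and each tool (including Atlas, with its own revisions table) uses its own layout:

-- Illustrative sketch of the metadata a migration tool keeps per applied file.
-- Names are hypothetical; real tools define their own table and columns.
CREATE TABLE schema_revisions (
    version     VARCHAR(255) PRIMARY KEY, -- e.g. the timestamp prefix of the migration file
    description VARCHAR(255) NOT NULL,    -- human-readable name of the change
    applied_at  TIMESTAMP    NOT NULL,    -- when the file was applied
    checksum    VARCHAR(255) NOT NULL     -- detects edits to files that were already applied
);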

However, as we've mentioned above, a few things changed drastically in the way software is built today, compared to when these tools were conceived:

  • The way we develop – Microservices brought an explosion in the number of databases and the diversity of storage technology organizations use, making it impractical for many organizations to have a professional DBA author or even review database schema changes. Teams are expected to be autonomous and self-sufficient so they can continuously make progress.
  • The way we operate – In the past, it was acceptable to bring an application down for maintenance: your bank's DBA had a few good hours between the last employee leaving the office and the first one arriving the next morning, during which they could shut the system down, upgrade it, and spin it back up. Managing an always-on system that serves traffic 24/7 is a different ordeal.
  • Who operates – In most cases, teams operate their own database, meaning the person on call for the system often has only basic knowledge of the operational aspects of the database.

What can platform teams do to increase developer efficiency with schema change management?

Given these changes, a modern solution for schema change management should address the following problems:

  • Planning changes – Today's tools expect developers of all technical backgrounds and levels of expertise to be able to plan correct, safe, and efficient changes to the database. Given the vast range of technology developers must deal with, this may not always be possible. Therefore, platforms can provide developers with an automated, declarative workflow for planning changes (a "terraform plan for databases"). Ideally, this workflow should support any ORM or framework developers use to build applications (see the CLI sketch after this list).

  • Verifying safety – Once a change leaves a developer's workstation and is submitted as a pull request, it becomes the team's responsibility to review and approve the correctness and safety of the change. Existing tools offer no support in this area, leaving it completely to manual review. Platforms can provide teams with automated verification of changes ("CI for schema changes"), to detect risky changes before they reach production.

  • Deploying changes – Existing tools are mostly centered on the machinery for describing and applying changes to the target database. This is a great start, but deployment machinery is seldom used in isolation anymore. Platforms need to figure out how to integrate these tools into their continuous delivery pipelines in a native way. In addition, delivery pipelines are responsible for verifying that the target environment is safe to deploy to before rolling out changes ("CD for schema changes").

    In addition, with microservices architectures, managing and coordinating schema migrations of various microservices within a single deployment unit is crucial to ensure safe rollout or recovery from failures.

  • Troubleshooting – Unfortunately, schema changes don't always succeed. Existing tools offer little to no support when it comes to helping developers out of the mud when things fail. This often requires engineers to connect to the database to diagnose what went wrong and then perform risky operations such as manually editing metadata tables. Platform teams should consider what they can do to support engineers when planned changes don't succeed or cause an outage.

  • Drift detection and schema monitoring – Once changes are successfully rolled out, it is valuable for teams to be able to detect drift between the expected state of the system and its actual state. Schema drift can happen because of technical issues or in cases where manual access to the database is permitted and can cause both operational and compliance issues. Platform teams should consider how they can provide their teams with the confidence that there is no schema drift in their application.
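
To make these categories concrete, here is a rough sketch of what such a workflow can look like with the Atlas CLI. The migration directory, schema file, and database URLs below are placeholders, and the exact flags may vary between versions:

# Plan: derive a new migration file from the desired state ("terraform plan for databases")
atlas migrate diff add_users_index \
  --dir "file://migrations" \
  --to "file://schema.hcl" \
  --dev-url "docker://postgres/15/dev?search_path=public"

# Verify: lint the newest migration against the team's policies ("CI for schema changes")
atlas migrate lint \
  --dir "file://migrations" \
  --dev-url "docker://postgres/15/dev?search_path=public" \
  --latest 1

# Deploy: apply pending migrations to the target database ("CD for schema changes")
atlas migrate apply \
  --dir "file://migrations" \
  --url "postgres://app:pass@prod-db:5432/app?sslmode=disable"

# Detect drift: compare the live database against the expected state
atlas schema diff \
  --from "postgres://app:pass@prod-db:5432/app?sslmode=disable" \
  --to "file://migrations" \
  --dev-url "docker://postgres/15/dev?search_path=public"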

Wrapping up

Surprisingly, database schema change management is an area in which little change or innovation has occurred since we embarked on the DevOps journey as an industry. At Ariga, we are building Atlas to provide solutions for many of the problems we described above.

If you’re a member of a Platform Engineering team and want to learn more about how we can help, please ping me (or my co-founder Ariel) on our Discord server.

Preventing breaking changes to your database with Atlas

· 5 min read
Rotem Tamir
Building Atlas

As applications evolve, database schema changes become inevitable. However, some types of changes can break the contract between the backend server and the database, leading to errors during deployment and application downtime. Ensuring that these breaking changes are detected and prevented is crucial for maintaining system stability and delivering a seamless user experience. In this post, we will explore how Atlas can help you prevent breaking schema changes from impacting your application.

What are breaking changes?

Backward-incompatible changes, also known as breaking changes, are schema changes that have the potential to break the contract with applications that rely on the old schema. Applications today are expected to be highly available, which means we cannot shut them down for maintenance; they must serve traffic continuously, 24/7.

The common strategy for dealing with this requirement is called a "rolling deployment" where we deploy new versions of the application in stages, gradually increasing the percentage of traffic served by the new version. This means that during the deployment phase of a new version we will always have old and new code running side by side.

This means, for example, that renaming a column from email_address to email will cause older versions of the application to fail when they try to access the column using the old name. This is a breaking change, and it can cause downtime if not detected and prevented.
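
A common way to avoid this kind of breakage is the expand/contract pattern: add the new column alongside the old one, migrate readers and writers gradually, and only drop the old column once no running version references it. A rough sketch, using the column names from the example above (the backfill statement is illustrative and would need batching on large tables):

-- Expand: add the new column alongside the existing one
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- Backfill existing rows from the old column
UPDATE users SET email = email_address WHERE email IS NULL;

-- ...deploy application versions that write to both columns and read from the new one...

-- Contract: once no running version references the old column, drop it
ALTER TABLE users DROP COLUMN email_address;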

Manually enforcing backward compatibility

Preventing breaking changes in production is crucial for maintaining a highly available system. To enforce backward compatibility, most teams rely on two things:

  1. GitOps: Making sure every change to the database is checked in to source control, and using automated schema migration tools to apply it.
  2. Manual code-review: Carefully reviewing every schema change to make sure it will not break the contract between the backend and the database.

Checking database artifacts into source control is a well-established technique; in fact, it is one of the first principles mentioned in the seminal Evolutionary Database Design (Fowler and Sadalage, 2016).

However, let's consider the impact of relying on human review for changes using the widely accepted DORA Metrics for assessing DevOps performance:

  1. Lead time increases - Lead time measures the time it takes to get a commit to production. If a change needs the careful attention and approval of an expert reviewer, that reviewer becomes a bottleneck.

  2. Change failure rate increases - Change failure rate measures the percentage of deployments causing a failure in production. Relying on humans to routinely evaluate and verify every change against dozens of rules and policies is naturally error-prone, especially if they are busy and under constant pressure to deliver (which expert reviewers almost always are).

  3. Deployment frequency decreases - Deployment frequency measures how often an organization successfully releases to production. When engineers learn that a certain type of change gets queued up in slow review cycles, there is a tendency to batch changes together, decreasing the frequency of deployments and increasing their riskiness.

This pattern is not unique to database changes; it looks exactly the same for any high-risk change. Since the emergence of the DevOps movement, manual verification of changes has gradually been replaced by automation in many fields. Can the same be done for database changes?

Automated detection of schema changes with Atlas

Atlas provides support for a process called migration linting, which many teams use to automatically verify the safety of schema changes against a predefined set of policies. Unlike most available SQL linters, Atlas's linter is focused on the semantic level, rather than the syntactic level, which involves formatting and coding style standards. The linter analyzes the meaning of the changes and their potential impact, rather than how the changes are written.

Atlas's migrate lint command exposes this mechanism, making the automatic detection of breaking schema changes simple and straightforward.

Suppose we wanted to run the following migration:

ALTER TABLE `users` RENAME TO `Users`;

Renaming a table is a backward-incompatible change that can cause errors during deployment (migration) if applications running the previous version of the code still refer to the old table name in their statements. To check whether our latest migration file contains any dangerous changes, we run:

atlas migrate lint --dev-url docker://mysql/8/dev --latest 1

Atlas prints:

20230330203525.sql: backward incompatible changes detected:

L1: Renaming table "users" to "Users"

Amazing! Atlas detected our breaking change automatically. In addition to breaking changes, Atlas supports many other safety checks that you can apply to your schema changes.

Linting can be used during development from the developer's workstation to detect issues locally, but it really shines when you connect it to your project's continuous integration pipeline, allowing you to prevent such changes from ever reaching production. Setting up CI for schema changes takes less than a minute using Atlas Cloud or a little longer using our GitHub Action.

Wrapping up

Detecting and preventing breaking changes in your database schema is essential for maintaining a high-quality user experience and ensuring system stability. Atlas provides you with a practical and efficient solution to analyze and detect potential breaking changes before they impact your application.

What's next?

Have questions or feedback? Feel free to reach out on our Discord server.