Skip to main content

Migrate Multi-Tenant Environments With Atlas

ยท 9 min read
Ariel Mashraki

Wikipedia defines Multi-tenancy as:

a software architecture in which a single instance of software runs on a server and serves multiple tenants.

In recent years, multitenancy has become a common topic in our industry as many organizations provide service to multiple customers using the same infrastructure. Multitenancy usually becomes an issue in software architecture because tenants often expect a decent level of isolation from one another.

In this post, I will go over different known approaches for achieving multi-tenancy and discuss the approach we took to build Ariga's cloud platform. In addition, I will demonstrate how we added built-in support for multi-tenant environments in Atlas to overcome some of the challenges we faced.

Introductionโ€‹

Throughout the last few years, I have had the opportunity to implement multi-tenancy in various ways. Some of them might be familiar to you:

  1. A separate environment (deployment) per tenant, where isolation is achieved at both compute and data layers.
  2. A schema (named database) per tenant, where there is one environment for compute (e.g., a K8S cluster), but tenants are stored in different databases or schemas. Isolation is achieved at the data layer while compute resources are shared.
  3. One environment for all tenants, including the data layer. Typically, in this case, each table holds a tenant_id column that is used to filter statements by the tenant. Both data and compute layers are shared, with isolation achieved at the logical, database query level.

Each approach has pros and cons, but I want to briefly list the main reasons we chose to build our cloud platform based on the second option: schema per tenant.

  1. Management: Easily delete, backup tenants, and allow them to export their data without affecting others.
  2. Isolation: Limit credentials, connection pooling, and quotas per tenant. This way, one tenant cannot cause the database to choke and interrupt other tenants in case they share the same physical database.
  3. Security and data privacy: In case it is required, some tenants can be physically separated from others. For example, data can be stored in the tenant's AWS account, and the application can connect to it using a secure connection, like VPC peering in AWS.
  4. Code-maintenance: Most of the application code is written in a way that it is unaware of the multi-tenancy. In our case, there is one layer "at the top" that attaches the tenant connection to the context, and the API layer (e.g., GraphQL resolver) extracts the connection from the context to read/write data. As a result, we are not concerned that API changes will cross tenant boundaries.
  5. Migration: Schema changes can be executed first on "test tenants" and fail-fast in case of error.

The primary con to this approach was that there was no elegant way to execute migrations on multiple databases (N times) in Atlas. In the rest of the post, I'll cover how we solved this problem in Ariga and added built-in support for multi-tenancy in Atlas.

Atlas config fileโ€‹

Atlas provides a convenient way to describe and interact with multiple environments using project files. A project file is a file named atlas.hcl and contains one or more env blocks. For example:

atlas.hcl
env "local" {
url = "mysql://root:pass@:3306/"
migrations {
dir = "file://migrations"
}
}

env "prod" {
// ... a different env
}

Once defined, a project's environment can be worked against using the --env flag. For example:

atlas schema apply --env local

The command above runs the schema apply against the database that is defined in the local environment.

Multi-Tenant environmentsโ€‹

The Atlas configuration language provides a few capabilities adopted from Terraform to facilitate the definition of multi-tenant environments. The first is the for_each meta-argument that allows defining a single env block that is expanded to N instances, one for each tenant. For example:

atlas.hcl
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}

variable "tenants" {
type = list(string)
}

env "local" {
for_each = toset(var.tenants)
url = urlsetpath(var.url, each.value)
migration {
dir = "file://migrations"
}
}

The above configuration expects a list of tenants to be provided as a variable. This can be useful when the list of tenants is dynamic and can be injected into the Atlas command. The urlsetpath function is a helper function that sets the path of the database URL to the tenant name. For example, if url is set to mysql://root:pass@:3306/?param=value and the tenant name is tenant1, the resulting URL will be mysql://root:pass@:3306/tenant1?param=value.

The second capability is Data Sources. This option enables users to retrieve information stored in an external service or database. For the sake of this example, let's extend the configuration above to use the SQL data source to retrieve the list of tenants from the INFORMATION_SCHEMA in MySQL:

atlas.hcl
// The URL of the database we operate on.
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}

// Schemas that match this pattern will be considered tenants.
variable "pattern" {
type = string
default = "tenant_%"
}

data "sql" "tenants" {
url = var.url
query = <<EOS
SELECT `schema_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` LIKE ?
EOS
args = [var.pattern]
}

env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
}

Exampleโ€‹

Let's demonstrate how managing migrations in a multi-tenant architecture is made simple with Atlas.

1. Install Atlas:

Get the latest release with Homebrew:

brew install ariga/tap/atlas

2. Create a migration directory with the following example content:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

3. Create two example tenants on a local database:

create database tenant_a8m;
create database tenant_rotemtam;

4. Run Atlas to execute the migration scripts on the tenants' databases:

atlas migrate apply --env local
tenant_a8m
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (36.803179ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (26.184177ms)

-------------------------
-- 72.899146ms
-- 2 migrations
-- 2 sql statements
tenant_rotemtam
Migrating to version 20220811074314 (2 migrations in total):

-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (61.987153ms)

-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (24.656515ms)

-------------------------
-- 95.233384ms
-- 2 migrations
-- 2 sql statements

Running the command again will not execute any migrations:

No migration files to execute
No migration files to execute

Migration loggingโ€‹

At Ariga, our services print structured logs (JSON) to feed our observability tools. That is why we felt obligated to add support for custom log formatting in Atlas. To continue the example from above, we present how we configure Atlas to emit JSON lines with the tenant name attached to them.

1. Add the log configuration to the local environment block:

atlas.hcl
env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
// Emit JSON logs to stdout and add the
// tenant name to each log line.
log {
migrate {
apply = format(
"{{ json . | json_merge %q }}",
jsonencode({
Tenant : each.value
})
)
}
}
}

2. Create a new script file in the migration directory:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

3. Run migrate apply in our "local" environment:

atlas migrate apply --env local
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.685899+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.655879+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.685899+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.647091+03:00","Target":"20221027125605","Tenant":"tenant_a8m","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_a8m","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_a8m","Scheme":"mysql","User":{}}}
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.787476+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.757463+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.787476+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.748399+03:00","Target":"20221027125605","Tenant":"tenant_rotemtam","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_rotemtam","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_rotemtam","Scheme":"mysql","User":{}}}

Next stepsโ€‹

Currently, Atlas uses a fail-fast policy, which means the process exits on the first tenant that returns an error. We built it this way because we find it helpful to execute migrations first on "test tenants" and stop in case the operation fails on any of them. However, this means the execution is serial and may be slow in cases where there is a large amount of tenants. Therefore, we aim to add more advanced approaches that will allow executing the first M tenants serially and the rest of the N-M tenants in parallel.

Have questions? Feedback? Feel free to reach out on our Discord server.