Wikipedia defines multi-tenancy as:

"a software architecture in which a single instance of software runs on a server and serves multiple tenants."
In recent years, multi-tenancy has become a common topic in our industry, as many organizations provide services to multiple customers using the same infrastructure. Multi-tenancy usually becomes an issue in software architecture because tenants often expect a decent level of isolation from one another.
In this post, I will go over several known approaches for achieving multi-tenancy and discuss the approach we took when building Ariga's cloud platform. In addition, I will demonstrate how we added built-in support for multi-tenant environments in Atlas to overcome some of the challenges we faced.
Introduction
Throughout the last few years, I have had the opportunity to implement multi-tenancy in various ways. Some of them might be familiar to you:
- A separate environment (deployment) per tenant, where isolation is achieved at both compute and data layers.
- A schema (named database) per tenant, where there is one environment for compute (e.g., a K8S cluster), but tenants are stored in different databases or schemas. Isolation is achieved at the data layer while compute resources are shared.
- One environment for all tenants, including the data layer. Typically, in this case, each table holds a tenant_id column that is used to filter statements by tenant. Both data and compute layers are shared, with isolation achieved at the logical, database query level (see the sketch after this list).
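To make the third approach concrete, here is a minimal SQL sketch of what shared-table tenancy might look like; the users table and tenant_id column are illustrative, not taken from a real schema:

-- Every table carries the tenant identifier.
CREATE TABLE `users` (
  `id`        int NOT NULL,
  `tenant_id` varchar(255) NOT NULL,
  `name`      varchar(255) NOT NULL
);

-- Every statement is scoped to a single tenant.
SELECT `id`, `name` FROM `users` WHERE `tenant_id` = ?;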
Each approach has pros and cons, but I want to briefly list the main reasons we chose to build our cloud platform based on the second option: schema per tenant.
- Management: Easily delete or back up tenants, and allow them to export their data without affecting others.
- Isolation: Limit credentials, connection pooling, and quotas per tenant. This way, one tenant cannot overload the database and interrupt other tenants that share the same physical database.
- Security and data privacy: When required, some tenants can be physically separated from others. For example, data can be stored in the tenant's AWS account, and the application can connect to it using a secure connection, such as VPC peering in AWS.
- Code-maintenance: Most of the application code is written in a way that is unaware of multi-tenancy. In our case, there is one layer "at the top" that attaches the tenant connection to the context, and the API layer (e.g., a GraphQL resolver) extracts the connection from the context to read and write data. As a result, we are not concerned that API changes will cross tenant boundaries (a minimal sketch of this pattern follows the list).
- Migration: Schema changes can be executed first on "test tenants" and fail-fast in case of error.
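To illustrate the code-maintenance point, here is a minimal Go sketch of the pattern described above; the middleware, header, and function names are hypothetical and not taken from our actual codebase:

package tenant

import (
	"context"
	"database/sql"
	"net/http"
)

// ctxKey is an unexported key type for storing the tenant connection.
type ctxKey struct{}

// WithTenant is a hypothetical middleware "at the top" that resolves the
// tenant from the request and attaches its connection to the context.
func WithTenant(pools map[string]*sql.DB, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		db := pools[r.Header.Get("X-Tenant")]
		ctx := context.WithValue(r.Context(), ctxKey{}, db)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// TenantDB extracts the tenant connection. The API layer (e.g., a GraphQL
// resolver) calls this instead of holding a global connection, so the rest
// of the code stays unaware of multi-tenancy.
func TenantDB(ctx context.Context) *sql.DB {
	db, _ := ctx.Value(ctxKey{}).(*sql.DB)
	return db
}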
The primary drawback of this approach was that Atlas had no elegant way to execute migrations on multiple databases (N times). In the rest of the post, I'll cover how we solved this problem at Ariga and added built-in support for multi-tenancy in Atlas.
Atlas config file
Atlas provides a convenient way to describe and interact with multiple environments using project files. A project file is a file named atlas.hcl that contains one or more env blocks. For example:
env "local" {
url = "mysql://root:pass@:3306/"
migrations {
dir = "file://migrations"
}
}
env "prod" {
// ... a different env
}
Once defined, a project's environment can be worked against using the --env flag. For example:

atlas migrate apply --env local

The command above runs migrate apply against the database that is defined in the local environment.
Multi-Tenant environments
The Atlas configuration language provides a few capabilities adopted from Terraform to facilitate the definition of multi-tenant environments. The first is the for_each meta-argument, which allows defining a single env block that is expanded to N instances, one for each tenant. For example:
variable "url" {
type = string
default = "mysql://root:pass@:3306/"
}
variable "tenants" {
type = list(string)
}
env "local" {
for_each = toset(var.tenants)
url = urlsetpath(var.url, each.value)
migration {
dir = "file://migrations"
}
}
The above configuration expects a list of tenants to be provided as a variable. This can be useful when the list of tenants is dynamic and can be injected into the Atlas command. The urlsetpath function is a helper that sets the path of the database URL to the tenant name. For example, if url is set to mysql://root:pass@:3306/?param=value and the tenant name is tenant1, the resulting URL will be mysql://root:pass@:3306/tenant1?param=value.
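For instance, the tenant list can be supplied on the command line when invoking Atlas. A hypothetical invocation, assuming that repeating the --var flag populates a list(string) variable:

atlas migrate apply --env local --var tenants=tenant_a8m --var tenants=tenant_rotemtam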
The second capability is Data Sources. This option enables users to retrieve information stored in an external service or database. For the sake of this example, let's extend the configuration above to use the SQL data source to retrieve the list of tenants from the INFORMATION_SCHEMA in MySQL:
// The URL of the database we operate on.
variable "url" {
  type    = string
  default = "mysql://root:pass@:3306/"
}

// Schemas that match this pattern will be considered tenants.
variable "pattern" {
  type    = string
  default = "tenant_%"
}

data "sql" "tenants" {
  url   = var.url
  query = <<EOS
SELECT `schema_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` LIKE ?
EOS
  args = [var.pattern]
}

env "local" {
  for_each = toset(data.sql.tenants.values)
  url      = urlsetpath(var.url, each.value)
}
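Conceptually, if the query returns two schemas named tenant_a8m and tenant_rotemtam, the for_each expansion behaves as if one environment were defined per schema. A sketch of the expanded form, shown as comments since it is not configuration you would write by hand:

// First instance of env "local":
//   url = "mysql://root:pass@:3306/tenant_a8m"
// Second instance of env "local":
//   url = "mysql://root:pass@:3306/tenant_rotemtam"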
Example
Let's demonstrate how managing migrations in a multi-tenant architecture is made simple with Atlas.
1. Install Atlas
macOS + Linux: To download and install the latest release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | sh

Homebrew: Get the latest release with Homebrew:

brew install ariga/tap/atlas

Docker: To pull the Atlas image and run it as a Docker container:

docker pull arigaio/atlas
docker run --rm arigaio/atlas --help

If the container needs access to the host network or a local directory, use the --net=host flag and mount the desired directory:

docker run --rm --net=host \
  -v $(pwd)/migrations:/migrations \
  arigaio/atlas migrate apply \
  --url "mysql://root:pass@:3306/test"

Windows and manual installation: Download the latest release and move the atlas binary to a file location on your system PATH.
2. Create a migration directory with the following example content:

20220811074144_create_users.sql:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

20220811074314_add_users_name.sql:

-- modify "users" table
ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;

atlas.sum:

h1:w2ODzVxhTKdBVBdzqntHw7rHV8lKQF98TmNevOEZfIo=
20220811074144_create_users.sql h1:KnMSZM/E4TBGidYCZ+UHxkHEWaRWeyuPIUjSHRybQqA=
20220811074314_add_users_name.sql h1:jUpaANgD0SjI5DjaHuJxtHZ6Wq98act0MmE5oZ+NRU0=
3. Create two example tenants on a local database:
create database tenant_a8m;
create database tenant_rotemtam;
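These schema names match the tenant_% pattern from the configuration above, so the sql data source will pick both of them up. You can sanity-check this by running the data source's query manually:

SELECT `schema_name`
FROM `information_schema`.`schemata`
WHERE `schema_name` LIKE 'tenant_%';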
4. Run Atlas to execute the migration scripts on the tenants' databases:
atlas migrate apply --env local
Migrating to version 20220811074314 (2 migrations in total):
-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (36.803179ms)
-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (26.184177ms)
-------------------------
-- 72.899146ms
-- 2 migrations
-- 2 sql statements
Migrating to version 20220811074314 (2 migrations in total):
-- migrating version 20220811074144
-> CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
-- ok (61.987153ms)
-- migrating version 20220811074314
-> ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;
-- ok (24.656515ms)
-------------------------
-- 95.233384ms
-- 2 migrations
-- 2 sql statements
Running the command again will not execute any migrations:
No migration files to execute
No migration files to execute
Migration logging
At Ariga, our services print structured logs (JSON) to feed our observability tools. That is why we felt obligated to add support for custom log formatting in Atlas. To continue the example from above, we present how we configure Atlas to emit JSON lines with the tenant name attached to them.
1. Add the format configuration to the local environment block:
env "local" {
for_each = toset(data.sql.tenants.values)
url = urlsetpath(var.url, each.value)
// Emit JSON logs to stdout and add the
// tenant name to each log line.
format {
migrate {
apply = format(
"{{ json . | json_merge %q }}",
jsonencode({
Tenant : each.value
})
)
}
}
}
2. Create a new script file in the migration directory, which now contains:

20220811074144_create_users.sql:

-- create "users" table
CREATE TABLE `users` (`id` int NOT NULL) CHARSET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

20220811074314_add_users_name.sql:

-- modify "users" table
ALTER TABLE `users` ADD COLUMN `name` varchar(255) NOT NULL;

20221027125605_create_pets.sql:

-- create "pets" table
CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));

atlas.sum:

h1:coAZ5kxfT6EpZLjsMRWZv+lYAuChLTnhOnRjAVbZOao=
20220811074144_create_users.sql h1:INH5s7+VEs2pO4+YqL1XNvJUPnF5abhReSEz75L6fz0=
20220811074314_add_users_name.sql h1:EHo+0gM1Cpfl1pe0Cid35qr9mWUhQ9FSXSr+pHUwQwk=
20221027125605_create_pets.sql h1:glWEZaQjERCv1N5evUDz154XvJ84ivlfORrQp/pL7yY=
3. Run migrate apply in our "local" environment:
atlas migrate apply --env local
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.685899+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.655879+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.685899+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.647091+03:00","Target":"20221027125605","Tenant":"tenant_a8m","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_a8m","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_a8m","Scheme":"mysql","User":{}}}
{"Applied":[{"Applied":["CREATE TABLE `pets` (`id` bigint, PRIMARY KEY (`id`));"],"Description":"create_pets","End":"2022-10-27T16:03:03.787476+03:00","Name":"20221027125605_create_pets.sql","Start":"2022-10-27T16:03:03.757463+03:00","Version":"20221027125605"}],"Current":"20220811074314","Dir":"migrations","Driver":"mysql","End":"2022-10-27T16:03:03.787476+03:00","Pending":[{"Description":"create_pets","Name":"20221027125605_create_pets.sql","Version":"20221027125605"}],"Start":"2022-10-27T16:03:03.748399+03:00","Target":"20221027125605","Tenant":"tenant_rotemtam","URL":{"ForceQuery":false,"Fragment":"","Host":":3308","OmitHost":false,"Opaque":"","Path":"/tenant_rotemtam","RawFragment":"","RawPath":"","RawQuery":"parseTime=true","Schema":"tenant_rotemtam","Scheme":"mysql","User":{}}}
Next steps
Currently, Atlas uses a fail-fast policy, which means the process exits on the first tenant that returns an error. We built it this way because we find it helpful to execute migrations first on "test tenants" and stop in case the operation fails on any of them. However, this means the execution is serial and may be slow when there is a large number of tenants. Therefore, we aim to add more advanced approaches that will allow executing the first M tenants serially and the rest of the N-M tenants in parallel.
Have questions? Feedback? Feel free to reach out on our Discord server.