Managing Google Spanner's protobuf Column Types with Atlas
If you have not set up Atlas for Spanner yet, start with Automatic Google Cloud Spanner Schema Migrations with Atlas.
Spanner supports columns with complex data type, such as protocol buffers (aka, protobufs), but it needs to know the data's structure beforehand. Since Atlas isn't capable of uploading those structures, you have to upload them manually using Google's tools first. Once that's done, Atlas can go ahead and create the columns that use them.
In this guide, we will create a table with a Payload column of protobuf type examples.shipping.Order, and then change
the column type to examples.shipping.Shipment.
Prerequisites
To follow along, you need:
- Atlas CLI with the Spanner driver available.
- macOS + Linux
- Docker
- Windows
To download and install the custom release of the Atlas CLI, simply run the following in your terminal:
curl -sSf https://atlasgo.sh | ATLAS_FLAVOR="spanner" sh
To pull the Atlas image and run it as a Docker container:
docker pull arigaio/atlas:latest-extended
docker run --rm arigaio/atlas:latest-extended --help
If the container needs access to the host network or a local directory, use the --net=host flag and mount the desired
directory:
docker run --rm --net=host \
-v $(pwd)/migrations:/migrations \
arigaio/atlas:latest-extended migrate apply \
--url "oracle://PDBADMIN:Pssw0rd0995@localhost:1521/FREEPDB1"
Download the custom release and move the atlas binary to a file location on your system PATH.
gcloudauthenticated to your GCP project, with access to an existing Spanner instance.protocto generate the descriptor set.- A local directory that contains
shipping.proto,schema-v1.hcl, andschema-v2.hcl.
Setup
This example uses the following directory setup:
.
├── proto/examples/shipping/shipping.proto
├── schema-v1.hcl
└── schema-v2.hcl
Files
The protobuf definitions used in this example are:
syntax = "proto3";
package examples.shipping;
message Order {
string order_id = 1;
string customer_id = 2;
}
message Shipment {
string shipment_id = 1;
string carrier = 2;
}
The first Atlas schema uses examples.shipping.Order:
schema "default" {
}
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Order")
}
primary_key {
columns = [column.OrderId]
}
}
The second schema changes that column type to examples.shipping.Shipment:
schema "default" {
}
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Shipment")
}
primary_key {
columns = [column.OrderId]
}
}
Spanner instance
If you do not already have a Spanner instance and database, you can create them either in the Google Cloud Console or with gcloud:
Set PROJECT_ID to your Google Cloud project ID before running the gcloud commands below.
gcloud spanner instances create proto-columns-instance \
--config=regional-asia-east2 \
--description="Proto columns example instance" \
--processing-units=100 \
--project="$PROJECT_ID"
gcloud spanner databases create proto-columns-db \
--instance=proto-columns-instance \
--project="$PROJECT_ID" \
--database-dialect=GOOGLE_STANDARD_SQL
Once the database exists, the Atlas URL is:
spanner://projects/$PROJECT_ID/instances/proto-columns-instance/databases/proto-columns-db
Replace proto-columns-instance and proto-columns-db with your own names. The commands below assume you are running from
the example directory and substituting your own $PROJECT_ID, $INSTANCE, $DATABASE, and $DB_URL values where needed.
export INSTANCE="proto-columns-instance"
export DATABASE="proto-columns-db"
export DB_URL="spanner://projects/$PROJECT_ID/instances/$INSTANCE/databases/$DATABASE"
Register protobuf message types in PROTO_BUNDLE
Spanner needs a descriptor set when you create or update a PROTO_BUNDLE. Start by generating one from your .proto file:
protoc --proto_path=proto --include_imports --descriptor_set_out=shipping.pb proto/examples/shipping/shipping.proto
This writes shipping.pb to the current directory.
Before Atlas can apply schema-v1.hcl, Spanner must already know about examples.shipping.Order.
Create the initial bundle entry:
gcloud spanner databases ddl update "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID" \
--ddl='CREATE PROTO BUNDLE (`examples.shipping.Order`)' \
--proto-descriptors-file='shipping.pb'
This command returns:
Schema updating...done.
Create a protobuf-typed column with Atlas
Once the message type exists in the bundle, Atlas can create the table with a protobuf-typed column:
atlas schema apply -u "$DB_URL" \
--to "file://schema-v1.hcl" \
--auto-approve \
--tx-mode none
Atlas plans and applies a single CREATE TABLE statement:
Planning migration statements (1 in total):
-- create "protoorders" table:
-> CREATE TABLE `ProtoOrders` (
`OrderId` INT64 NOT NULL,
`Payload` `examples.shipping.Order`
) PRIMARY KEY (`OrderId`)
Applying approved migration (1 statement in total):
-- create "protoorders" table
-> CREATE TABLE `ProtoOrders` (
`OrderId` INT64 NOT NULL,
`Payload` `examples.shipping.Order`
) PRIMARY KEY (`OrderId`)
-- ok
Inspecting the database with Atlas shows that the protobuf column returns as sql("examples.shipping.Order"):
atlas schema inspect -u "$DB_URL"
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Order")
}
primary_key {
columns = [column.OrderId]
}
}
schema "default" {
}
If you also want to inspect the database-side DDL, run:
gcloud spanner databases ddl describe "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"
CREATE PROTO BUNDLE (
`examples.shipping.Order`,
);
CREATE TABLE ProtoOrders (
OrderId INT64 NOT NULL,
Payload `examples.shipping.Order`,
) PRIMARY KEY(OrderId);
Change a protobuf column to a different message type
Before changing the column type, register the new protobuf message in the bundle:
gcloud spanner databases ddl update "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID" \
--ddl='ALTER PROTO BUNDLE INSERT (`examples.shipping.Shipment`)' \
--proto-descriptors-file='shipping.pb'
The command returns:
Schema updating...done.
Now apply the second Atlas schema:
atlas schema apply -u "$DB_URL" \
--to "file://schema-v2.hcl" \
--auto-approve \
--tx-mode none
Atlas plans and applies a single ALTER COLUMN statement:
Planning migration statements (1 in total):
-- modify "payload" in table "protoorders":
-> ALTER TABLE `ProtoOrders` ALTER COLUMN `Payload` `examples.shipping.Shipment`
Applying approved migration (1 statement in total):
-- modify "payload" in table "protoorders"
-> ALTER TABLE `ProtoOrders` ALTER COLUMN `Payload` `examples.shipping.Shipment`
-- ok
Inspecting the database again shows the updated type:
atlas schema inspect -u "$DB_URL"
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Shipment")
}
primary_key {
columns = [column.OrderId]
}
}
schema "default" {
}
The database DDL now contains both message types in the bundle:
gcloud spanner databases ddl describe "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"
CREATE PROTO BUNDLE (
`examples.shipping.Order`,
examples.shipping.Shipment,
);
CREATE TABLE ProtoOrders (
OrderId INT64 NOT NULL,
Payload examples.shipping.Shipment,
) PRIMARY KEY(OrderId);
Re-running the final apply confirms Atlas is in sync:
atlas schema apply -u "$DB_URL" \
--to "file://schema-v2.hcl" \
--auto-approve \
--tx-mode none
Schema is synced, no changes to be made
Missing type errors
If you run atlas schema apply before adding the message type to PROTO_BUNDLE, Spanner rejects the change with an error:
rpc error: code = NotFound desc = Type not found: examples.shipping.Order.
To fix this error, you must first update the bundle with gcloud spanner databases ddl update and a descriptor set generated
by protoc before running atlas schema apply.
Cleanup
Only run these cleanup commands for a test database and instance that you created for this guide. Do not use them on shared or production Spanner resources.
Spanner instances can be expensive to leave running. If you created a dedicated instance for this walkthrough, delete the database first and then delete the instance.
Delete the test database when you are done:
gcloud spanner databases delete "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"
If you created the instance just for this guide, delete it after the database is gone:
gcloud spanner instances delete "$INSTANCE" \
--project="$PROJECT_ID"
Need more help?
Click the Intercom bubble on the right side of the site or schedule a demo with our team.
You can also join the conversation on our Discord server.