Skip to main content

Managing Google Spanner's protobuf Column Types with Atlas

info

If you have not set up Atlas for Spanner yet, start with Automatic Google Cloud Spanner Schema Migrations with Atlas.

Spanner supports columns with complex data type, such as protocol buffers (aka, protobufs), but it needs to know the data's structure beforehand. Since Atlas isn't capable of uploading those structures, you have to upload them manually using Google's tools first. Once that's done, Atlas can go ahead and create the columns that use them.

In this guide, we will create a table with a Payload column of protobuf type examples.shipping.Order, and then change the column type to examples.shipping.Shipment.

Prerequisites

To follow along, you need:

  1. Atlas CLI with the Spanner driver available.

To download and install the custom release of the Atlas CLI, simply run the following in your terminal:

curl -sSf https://atlasgo.sh | ATLAS_FLAVOR="spanner" sh
  1. gcloud authenticated to your GCP project, with access to an existing Spanner instance.
  2. protoc to generate the descriptor set.
  3. A local directory that contains shipping.proto, schema-v1.hcl, and schema-v2.hcl.

Setup

This example uses the following directory setup:

.
├── proto/examples/shipping/shipping.proto
├── schema-v1.hcl
└── schema-v2.hcl

Files

The protobuf definitions used in this example are:

shipping.proto
syntax = "proto3";

package examples.shipping;

message Order {
string order_id = 1;
string customer_id = 2;
}

message Shipment {
string shipment_id = 1;
string carrier = 2;
}

The first Atlas schema uses examples.shipping.Order:

schema-v1.hcl
schema "default" {
}

table "ProtoOrders" {
schema = schema.default

column "OrderId" {
null = false
type = INT64
}

column "Payload" {
null = true
type = sql("examples.shipping.Order")
}

primary_key {
columns = [column.OrderId]
}
}

The second schema changes that column type to examples.shipping.Shipment:

schema-v2.hcl
schema "default" {
}

table "ProtoOrders" {
schema = schema.default

column "OrderId" {
null = false
type = INT64
}

column "Payload" {
null = true
type = sql("examples.shipping.Shipment")
}

primary_key {
columns = [column.OrderId]
}
}

Spanner instance

If you do not already have a Spanner instance and database, you can create them either in the Google Cloud Console or with gcloud:

note

Set PROJECT_ID to your Google Cloud project ID before running the gcloud commands below.

gcloud spanner instances create proto-columns-instance \
--config=regional-asia-east2 \
--description="Proto columns example instance" \
--processing-units=100 \
--project="$PROJECT_ID"

gcloud spanner databases create proto-columns-db \
--instance=proto-columns-instance \
--project="$PROJECT_ID" \
--database-dialect=GOOGLE_STANDARD_SQL

Once the database exists, the Atlas URL is:

spanner://projects/$PROJECT_ID/instances/proto-columns-instance/databases/proto-columns-db
note

Replace proto-columns-instance and proto-columns-db with your own names. The commands below assume you are running from the example directory and substituting your own $PROJECT_ID, $INSTANCE, $DATABASE, and $DB_URL values where needed.

export INSTANCE="proto-columns-instance"
export DATABASE="proto-columns-db"
export DB_URL="spanner://projects/$PROJECT_ID/instances/$INSTANCE/databases/$DATABASE"

Register protobuf message types in PROTO_BUNDLE

Spanner needs a descriptor set when you create or update a PROTO_BUNDLE. Start by generating one from your .proto file:

protoc --proto_path=proto --include_imports --descriptor_set_out=shipping.pb proto/examples/shipping/shipping.proto

This writes shipping.pb to the current directory.

Before Atlas can apply schema-v1.hcl, Spanner must already know about examples.shipping.Order.

Create the initial bundle entry:

gcloud spanner databases ddl update "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID" \
--ddl='CREATE PROTO BUNDLE (`examples.shipping.Order`)' \
--proto-descriptors-file='shipping.pb'

This command returns:

Schema updating...done.

Create a protobuf-typed column with Atlas

Once the message type exists in the bundle, Atlas can create the table with a protobuf-typed column:

atlas schema apply -u "$DB_URL" \
--to "file://schema-v1.hcl" \
--auto-approve \
--tx-mode none

Atlas plans and applies a single CREATE TABLE statement:

Planning migration statements (1 in total):

-- create "protoorders" table:
-> CREATE TABLE `ProtoOrders` (
`OrderId` INT64 NOT NULL,
`Payload` `examples.shipping.Order`
) PRIMARY KEY (`OrderId`)

Applying approved migration (1 statement in total):

-- create "protoorders" table
-> CREATE TABLE `ProtoOrders` (
`OrderId` INT64 NOT NULL,
`Payload` `examples.shipping.Order`
) PRIMARY KEY (`OrderId`)
-- ok

Inspecting the database with Atlas shows that the protobuf column returns as sql("examples.shipping.Order"):

atlas schema inspect -u "$DB_URL"
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Order")
}
primary_key {
columns = [column.OrderId]
}
}
schema "default" {
}

If you also want to inspect the database-side DDL, run:

gcloud spanner databases ddl describe "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"
CREATE PROTO BUNDLE (
`examples.shipping.Order`,
);

CREATE TABLE ProtoOrders (
OrderId INT64 NOT NULL,
Payload `examples.shipping.Order`,
) PRIMARY KEY(OrderId);

Change a protobuf column to a different message type

Before changing the column type, register the new protobuf message in the bundle:

gcloud spanner databases ddl update "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID" \
--ddl='ALTER PROTO BUNDLE INSERT (`examples.shipping.Shipment`)' \
--proto-descriptors-file='shipping.pb'

The command returns:

Schema updating...done.

Now apply the second Atlas schema:

atlas schema apply -u "$DB_URL" \
--to "file://schema-v2.hcl" \
--auto-approve \
--tx-mode none

Atlas plans and applies a single ALTER COLUMN statement:

Planning migration statements (1 in total):

-- modify "payload" in table "protoorders":
-> ALTER TABLE `ProtoOrders` ALTER COLUMN `Payload` `examples.shipping.Shipment`

Applying approved migration (1 statement in total):

-- modify "payload" in table "protoorders"
-> ALTER TABLE `ProtoOrders` ALTER COLUMN `Payload` `examples.shipping.Shipment`
-- ok

Inspecting the database again shows the updated type:

atlas schema inspect -u "$DB_URL"
table "ProtoOrders" {
schema = schema.default
column "OrderId" {
null = false
type = INT64
}
column "Payload" {
null = true
type = sql("examples.shipping.Shipment")
}
primary_key {
columns = [column.OrderId]
}
}
schema "default" {
}

The database DDL now contains both message types in the bundle:

gcloud spanner databases ddl describe "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"
CREATE PROTO BUNDLE (
`examples.shipping.Order`,
examples.shipping.Shipment,
);

CREATE TABLE ProtoOrders (
OrderId INT64 NOT NULL,
Payload examples.shipping.Shipment,
) PRIMARY KEY(OrderId);

Re-running the final apply confirms Atlas is in sync:

atlas schema apply -u "$DB_URL" \
--to "file://schema-v2.hcl" \
--auto-approve \
--tx-mode none
Schema is synced, no changes to be made

Missing type errors

If you run atlas schema apply before adding the message type to PROTO_BUNDLE, Spanner rejects the change with an error:

rpc error: code = NotFound desc = Type not found: examples.shipping.Order.

To fix this error, you must first update the bundle with gcloud spanner databases ddl update and a descriptor set generated by protoc before running atlas schema apply.

Cleanup

danger

Only run these cleanup commands for a test database and instance that you created for this guide. Do not use them on shared or production Spanner resources.

Spanner instances can be expensive to leave running. If you created a dedicated instance for this walkthrough, delete the database first and then delete the instance.

Delete the test database when you are done:

gcloud spanner databases delete "$DATABASE" \
--instance="$INSTANCE" \
--project="$PROJECT_ID"

If you created the instance just for this guide, delete it after the database is gone:

gcloud spanner instances delete "$INSTANCE" \
--project="$PROJECT_ID"

Need more help?

Click the Intercom bubble on the right side of the site or schedule a demo with our team.

You can also join the conversation on our Discord server.