
Programmatic Access to Column-Level Lineage

atlas cloud repo lingraph prints a repository’s column-level lineage to stdout, as a compact node/edge graph for rendering and traversal, or as OpenLineage RunEvents for ingestion into catalogs like Marquez and DataHub.

The column-level lineage Atlas Cloud renders in the browser is now available programmatically. The new atlas cloud repo lingraph command prints a repository's lineage graph to stdout, so you can render it, traverse it in a script, or feed it into a lineage catalog.
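Here is a minimal sketch of the scripting use case: a Python helper that shells out to the CLI and parses its stdout as JSON. It assumes atlas is on PATH, is already authenticated to Atlas Cloud, and exits with a non-zero status on failure. fetch_lingraph is a hypothetical name. Only the --slug and --open-lineage flags come from this post.

```python
import json
import subprocess
from typing import Any, Callable


def fetch_lingraph(slug: str, open_lineage: bool = False,
                   run: Callable[..., Any] = subprocess.run) -> Any:
    """Run `atlas cloud repo lingraph` for a repository and parse its stdout.

    Returns the node/edge dict by default, or the list of OpenLineage
    RunEvents when open_lineage is True. `run` defaults to subprocess.run
    and is a parameter only so the CLI call can be replaced in tests.
    """
    cmd = ["atlas", "cloud", "repo", "lingraph", "--slug", slug]
    if open_lineage:
        cmd.append("--open-lineage")
    # check=True raises CalledProcessError if the command exits non-zero.
    proc = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

Calling fetch_lingraph("my-repo") would return the node/edge dictionary. Calling fetch_lingraph("my-repo", open_lineage=True) would return the list of RunEvents.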

The machine-readable graph is also a ready source of context for AI coding agents. An agent asked to rename or drop a column can load the column-level dependency graph, trace every downstream view and column that reads from it, and either rewrite them or flag the change as breaking, without a human mapping the dependencies by hand.

A node/edge graph by default

The default format is a compact graph of nodes (tables, views, external datasets, and intermediate datasets such as CTEs) and edges. Column-level edges carry the source and target columns and the SQL projection that connects them:

$ atlas cloud repo lingraph --slug my-repo
{
  "nodes": [
    { "id": "schema/main/table/users", "type": "table", "name": "users", "schema": "main" },
    { "id": "schema/main/view/user_names", "type": "view", "name": "user_names", "schema": "main" },
    { "id": "schema/main/dataset/user_names/u_cte@5", "type": "dataset", "name": "u_cte@5", "schema": "main" }
  ],
  "edges": [
    { "from": "schema/main/table/users", "to": "schema/main/dataset/user_names/u_cte@5", "fromColumn": "name", "toColumn": "user_name", "expr": "u.name AS user_name" },
    { "from": "schema/main/dataset/user_names/u_cte@5", "to": "schema/main/view/user_names", "fromColumn": "user_name", "toColumn": "user_name", "expr": "u_cte.user_name AS user_name" }
  ]
}
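The rename-or-drop workflow described earlier comes down to a breadth-first walk over column-level edges. The sketch below assumes the JSON shape shown above. It treats edges without fromColumn/toColumn as table-level and skips them. downstream_columns is a hypothetical helper that walks through intermediate dataset nodes (such as CTEs) but reports only tables and views.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Col = Tuple[str, str]  # (node id, column name)


def downstream_columns(graph: dict, node_id: str, column: str) -> List[Col]:
    """Return every (node, column) that reads, directly or transitively,
    from `column` of `node_id`, in breadth-first order."""
    node_type = {n["id"]: n["type"] for n in graph["nodes"]}

    # Index column-level edges by their source (node, column).
    # Edges lacking fromColumn/toColumn are assumed table-level and skipped.
    out: Dict[Col, List[Col]] = {}
    for e in graph["edges"]:
        if "fromColumn" in e and "toColumn" in e:
            out.setdefault((e["from"], e["fromColumn"]), []).append(
                (e["to"], e["toColumn"]))

    start = (node_id, column)
    seen: Set[Col] = {start}
    queue = deque([start])
    result: List[Col] = []
    while queue:
        for nxt in out.get(queue.popleft(), []):
            if nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
            # Traverse through intermediate datasets (e.g. CTEs), but only
            # report columns that live on other node types (tables, views).
            if node_type.get(nxt[0]) != "dataset":
                result.append(nxt)
    return result
```

Run against the example output above, downstream_columns(graph, "schema/main/table/users", "name") returns [("schema/main/view/user_names", "user_name")]. The u_cte@5 dataset is traversed but not reported. An agent could use such a list to decide which views to rewrite before renaming users.name.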

OpenLineage for your catalog

Pass --open-lineage and Atlas emits the same lineage as a JSON array of OpenLineage RunEvents, one per view or derived dataset, ready to ingest into Marquez, DataHub, or any OpenLineage-compatible catalog. Each event carries the view's SQL, a jobType facet (VIEW or MATERIALIZED_VIEW), and a columnLineage facet that distinguishes pass-through columns (DIRECT/IDENTITY) from transformed ones (INDIRECT/TRANSFORMATION):

$ atlas cloud repo lingraph --slug my-repo --open-lineage
[
  {
    "eventType": "COMPLETE",
    "job": {
      "name": "active_users",
      "namespace": "atlas://my-repo/main",
      "facets": {
        "jobType": { "integration": "atlas", "jobType": "VIEW", "processingType": "BATCH" },
        "sql": { "query": "SELECT id, name FROM users" }
      }
    },
    "inputs": [ { "name": "users", "namespace": "atlas://my-repo/main" } ],
    "outputs": [
      {
        "name": "active_users",
        "namespace": "atlas://my-repo/main",
        "facets": {
          "columnLineage": {
            "fields": {
              "id": { "inputFields": [ { "name": "users", "field": "id", "transformations": [ { "type": "DIRECT", "subtype": "IDENTITY" } ] } ] },
              "name": { "inputFields": [ { "name": "users", "field": "name", "transformations": [ { "type": "DIRECT", "subtype": "IDENTITY" } ] } ] }
            }
          }
        }
      }
    ],
    "producer": "https://atlasgo.io",
    "run": { "runId": "82133796-1259-5d46-bb5f-92a056fa5413" }
  }
]
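For a quick look at the columnLineage facets without a catalog, the events can be flattened into one row per input/output column pair. This sketch relies only on the fields in the example above. ColumnEdge and column_edges are hypothetical names. Transformations are joined as type/subtype strings such as DIRECT/IDENTITY.

```python
from typing import List, NamedTuple


class ColumnEdge(NamedTuple):
    job: str            # job.name of the RunEvent (the view)
    target: str         # output dataset name
    target_field: str   # output column
    source: str         # input dataset name
    source_field: str   # input column
    kind: str           # e.g. "DIRECT/IDENTITY"


def column_edges(events: List[dict]) -> List[ColumnEdge]:
    """Flatten the columnLineage facets of OpenLineage RunEvents into rows."""
    rows: List[ColumnEdge] = []
    for ev in events:
        for out in ev.get("outputs", []):
            fields = out.get("facets", {}).get("columnLineage", {}).get("fields", {})
            for out_field, spec in fields.items():
                for inp in spec.get("inputFields", []):
                    # An input field may list several transformations; join them.
                    kind = ",".join(f'{t.get("type")}/{t.get("subtype")}'
                                    for t in inp.get("transformations", []))
                    rows.append(ColumnEdge(ev["job"]["name"], out["name"], out_field,
                                           inp["name"], inp["field"], kind))
    return rows
```

Filtering for rows with kind != "DIRECT/IDENTITY" leaves only the columns a view transforms rather than passes through. For the active_users example, that list is empty.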

Managed objects use the atlas://<repo>/<schema> namespace, while external sources use external://. Because run.runId is deterministic, diffs across runs reflect actual lineage changes rather than regenerated IDs.
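Deterministic output also makes two exports straightforward to compare. The sketch below diffs two --open-lineage exports at the column-edge level. Keying on dataset and column names rather than on runId is a choice made here, not something the post prescribes. edge_set and diff_exports are hypothetical helpers that read only the fields shown in the example.

```python
from typing import Iterable, Set, Tuple

# (output namespace, output dataset, output column,
#  input dataset, input column, transformations)
Edge = Tuple[str, str, str, str, str, str]


def edge_set(events: Iterable[dict]) -> Set[Edge]:
    """Collect every column-level edge from a list of OpenLineage RunEvents."""
    edges: Set[Edge] = set()
    for ev in events:
        for out in ev.get("outputs", []):
            fields = out.get("facets", {}).get("columnLineage", {}).get("fields", {})
            for out_field, spec in fields.items():
                for inp in spec.get("inputFields", []):
                    kinds = ",".join(f'{t.get("type")}/{t.get("subtype")}'
                                     for t in inp.get("transformations", []))
                    edges.add((out.get("namespace", ""), out["name"], out_field,
                               inp["name"], inp["field"], kinds))
    return edges


def diff_exports(old: Iterable[dict], new: Iterable[dict]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (added, removed) column edges between two exports."""
    before, after = edge_set(old), edge_set(new)
    return after - before, before - after
```

If you save one run's output and diff it against the next, a column that switched its source would show up as an added edge paired with a removed one.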

See the Column-Level Data Lineage docs for the full field reference and more examples.
