Database Schema as Context
AI agents that interact with databases (writing queries, reviewing migrations, generating code) perform better when they have access to structured schema context. Without it, agents guess table names, invent columns, miss constraints, and ignore relationships.
This guide covers how companies solve this problem and how Atlas organizes schema files to provide context for AI coding assistants like Claude Code, Codex, and Cursor. It also includes a benchmark comparing query accuracy with and without schema context.
How Companies Approach This
Several companies have published approaches to improving text-to-SQL accuracy through schema context:
- Pinterest uses a multi-stage pipeline: schema linking identifies relevant tables, retrieval finds similar past queries as examples, then SQL is generated with both DDL and examples in context.
- AWS recommends augmenting raw DDL with column descriptions and business rules as SQL comments.
- Tiger Data embeds descriptions in PostgreSQL using COMMENT ON statements. Their benchmark showed accuracy improving from 58% to 86% with semantic descriptions versus raw DDL.
- Google Cloud uses vector search to retrieve only the relevant subset of a large schema per query.
The common pattern: give the AI more structured context about the database. The differences are in how that context is organized and delivered.
Types of Database Context
AI agents need three kinds of context to work with databases:
| Context Type | What It Contains | Example |
|---|---|---|
| Schema context | Tables, columns, types, constraints, dependencies, views | tasks.assignee_id is a FK to users.id |
| Business context | Domain rules, naming conventions, data semantics | quality_status = 'rework' means the batch failed QA but can be salvaged |
| Usage context | Common query patterns, existing views | There's a daily_production_summary materialized view, so don't recompute it |
Most tools provide none of these. The agent has to discover everything on its own: which table stores production data, what valid status values exist, or whether there's already a view that answers the question.
Schema Code as Context
With a single atlas schema inspect command, Atlas exports any database into organized, version-controlled SQL files. Atlas supports PostgreSQL, MySQL and MariaDB, SQL Server, SQLite, ClickHouse, Redshift, CockroachDB, Oracle, Snowflake, Spanner, and more.
Instead of a single dump (like pg_dump or mysqldump), the output is modular SQL files with clear dependencies that
AI agents can read and work with like any other code in the repository.
To demonstrate, we ran a benchmark on an example schema
with 43 tables, 47 enum types, 64 functions, and 13 views across two schemas. After running atlas schema inspect,
the output looks like this:
sql/
├── main.sql # Import manifest
└── schemas/
├── manufacturing/
│ ├── manufacturing.sql # Schema definition
│ ├── tables/
│ │ ├── equipment.sql
│ │ ├── production_lines.sql
│ │ ├── production_runs.sql
│ │ └── maintenance_work_orders.sql
│ ├── types/
│ │ ├── enum_equipment_status.sql
│ │ └── enum_quality_status.sql
│ ├── functions/
│ │ ├── calculate_oee.sql
│ │ └── schedule_preventive_maintenance.sql
│ └── views/
│ ├── daily_production_summary.sql
│ └── equipment_health_dashboard.sql
└── public/
├── tables/
│ ├── users.sql
│ ├── projects.sql
│ └── tasks.sql
├── types/
│ ├── enum_task_status.sql
│ └── enum_priority_level.sql
├── functions/
│ └── calculate_project_progress.sql
└── views/
└── project_portfolio_dashboard.sql
Self-Describing Files
Each SQL file contains the complete DDL for one object: columns, types, constraints, indexes, and checks:
-- atlas:import ../manufacturing.sql
-- atlas:import production_lines.sql
-- atlas:import ../types/enum_quality_status.sql
-- atlas:import ../../public/tables/projects.sql
-- atlas:import ../../public/tables/users.sql
-- A production run tracks a single batch on one production line.
-- Runs with end_time = NULL are still in progress and should be
-- excluded from yield calculations.
CREATE TABLE "manufacturing"."production_runs" (
"id" serial NOT NULL,
"production_line_id" integer NOT NULL,
"project_id" integer NULL,
"assigned_user_id" integer NULL,
"product_code" character varying(50) NOT NULL,
"batch_number" character varying(100) NOT NULL,
"planned_quantity" integer NOT NULL,
"actual_quantity" integer NULL,
"quality_status" "manufacturing"."quality_status" NOT NULL DEFAULT 'pending',
"start_time" timestamptz NULL,
"end_time" timestamptz NULL,
"yield_percentage" numeric(5,2) NULL,
PRIMARY KEY ("id"),
CONSTRAINT "production_runs_production_line_id_fkey"
FOREIGN KEY ("production_line_id")
REFERENCES "manufacturing"."production_lines" ("id"),
CONSTRAINT "production_runs_project_id_fkey"
FOREIGN KEY ("project_id")
REFERENCES "public"."projects" ("id"),
CONSTRAINT "production_runs_planned_quantity_check"
CHECK (planned_quantity > 0),
CONSTRAINT "production_runs_time_order"
CHECK ((end_time IS NULL) OR (start_time IS NULL)
OR (end_time > start_time))
);
From this single file, an AI agent learns the exact column names and types, that production_line_id
references manufacturing.production_lines, that project_id crosses into the public schema,
that quantity must be positive, and that end_time being NULL means a run is still in progress.
When the agent sees a custom type like "manufacturing"."quality_status", the import directive
-- atlas:import ../types/enum_quality_status.sql tells it exactly where the valid values are defined.
Also, since these are regular source files (not database metadata), you can add code comments to
provide business context. Unlike COMMENT ON statements, these don't require a migration to update
and won't clutter your database catalog.
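To illustrate what this convention enables, here is a short sketch (a hypothetical helper, not part of Atlas) of how tooling might lift those business-rule comments out of a schema file, assuming comments sit between the import directives and the DDL as shown above:

```python
def leading_comments(sql_text: str) -> str:
    """Collect the business-rule comments that precede the DDL,
    skipping -- atlas:import directives (hypothetical helper)."""
    doc = []
    for line in sql_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- atlas:import"):
            continue  # directives are dependencies, not documentation
        if stripped.startswith("--"):
            doc.append(stripped.lstrip("- ").strip())
        elif stripped:
            break  # first non-comment line: the CREATE statement itself
    return " ".join(doc)


example = """-- atlas:import ../manufacturing.sql
-- A production run tracks a single batch on one production line.
-- Runs with end_time = NULL are still in progress.
CREATE TABLE "manufacturing"."production_runs" ("id" serial NOT NULL);
"""
print(leading_comments(example))
```

An agent (or a docs generator) gets the business rule as a single sentence without touching the database catalog, which is exactly what COMMENT ON metadata cannot offer without a live connection.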
Import Directives as Dependency Graphs
The -- atlas:import lines at the top of each file declare dependencies:
-- atlas:import ../types/enum_quality_status.sql -- needs the quality_status enum
-- atlas:import ../../public/tables/projects.sql -- cross-schema FK to projects
-- atlas:import ../../public/tables/users.sql -- cross-schema FK to users
This provides the relationship graph without requiring view, function body, trigger, or stored procedure parsing. When asked "what data is related to production runs?", the imports immediately show: production lines, quality status enum, projects, and users.
For more on organizing schema files with imports, see the SQL Schema documentation.
Benchmark
We ran three AI agent configurations (all using Claude Opus 4.6) against the same PostgreSQL database with the same 10 questions. Each agent had a different level of schema context.
Setup
- Database: PostgreSQL 16, 43 tables across 2 schemas (public and manufacturing), 47 enum types, 64 functions, 13 views
- Agent A (no context): given only the postgres:// connection string
- Agent B (single file): given the full schema DDL (3,615 lines) as a single SQL file exported by atlas schema inspect
- Agent C (schema directory): given the organized one-file-per-object directory with code comments explaining business rules
- Questions: 10 analytical queries ranging from simple lookups to cross-schema joins with business logic
Results
| Metric | Agent A (No Context) | Agent B (Single File) | Agent C (Schema Directory) |
|---|---|---|---|
| Correct answers | 7/10 | 9/10 | 10/10 |
| Silent wrong answers | 2 | 0 | 0 |
| Schema discovery queries | 10 | 0 | 0 |
| Schema files read | 0 | 1 (3,615 lines) | 20 of ~150 files |
| Total tokens consumed | 40,585 | 71,754 | 45,184 |
| Total tool calls | 31 | 20 | 37 |
| Duration | 189s | 174s | 149s |
Agent A ran 10 introspection queries before writing a single answer: listing tables, enumerating 226 enum values, mapping columns across schemas, and tracing 67 foreign key relationships.
Agent B loaded the entire 3,615-line DDL into context, consuming the most tokens of all three. Agent C read only the 20 files relevant to its questions and used 37% fewer tokens.
Where No-Context Went Wrong
All three agents produced valid SQL that executed without errors. The differences were semantic: queries that return plausible but incorrect data. Below are examples where the no-context agent got wrong results while the agents with schema context answered correctly.
Question 2: Production yield for WDG-100
One production run is still in progress (actual_quantity is NULL, quality_status is 'pending').
It should be excluded from yield calculations.
-- Agent A (no context): included incomplete run → 77.14% (WRONG)
SELECT ROUND(100.0 * SUM(actual_quantity) / SUM(planned_quantity), 2)
FROM manufacturing.production_runs
WHERE product_code = 'WDG-100';
-- Agent C (schema directory): filtered to completed runs → 93.55% (CORRECT)
-- Read the comment: "Runs with end_time = NULL are still in progress
-- and should be excluded from yield and efficiency calculations."
SELECT ROUND(AVG(yield_percentage), 2)
FROM manufacturing.production_runs
WHERE product_code = 'WDG-100'
AND end_time IS NOT NULL;
A difference of more than 16 percentage points is large enough to trigger false alarms on a production dashboard.
Agent C read the code comment in production_runs.sql and applied the filter. Agent A had no way
to know this convention.
Question 3: Supplier tier classification
All three agents found the same 2 at-risk suppliers, but Agent A guessed the tier thresholds:
-- Agent A: guessed thresholds
WHEN quality_rating >= 4.0 AND delivery_rating >= 4.0 THEN 'preferred'
WHEN quality_rating >= 3.0 AND delivery_rating >= 3.0 THEN 'approved'
-- Agents B and C: extracted from supplier_performance_scorecard view
WHEN quality_rating >= 4.5 AND delivery_rating >= 4.5 THEN 'preferred'
WHEN quality_rating >= 3.5 AND delivery_rating >= 3.5 THEN 'approved'
With different data, Agent A's thresholds would classify suppliers into the wrong tiers.
Question 4: Production line efficiency
Agent A counted 6 runs for Widget Line A with a 66.67% pass rate. Agents B and C counted 5 runs
with 80.00%. Same root cause: Agent A included a run where end_time is NULL (still running).
Why the Directory Approach Wins
The schema directory agent scored 10/10 while using 37% fewer tokens and finishing faster than the single-file agent. Context was necessary for correct answers, but the right structure also sped up context retrieval:
- Code comments as business rules. A two-line comment in production_runs.sql prevented the errors that tripped up the no-context agent. Unlike COMMENT ON statements, these comments live in source files and don't require migrations to update.
- Selective file reading. Instead of loading 3,615 lines into context, the agent read only the 20 files relevant to its 10 questions. For larger schemas that exceed context limits, this selective reading is the only option that works.
- Enums as vocabulary. Enum types tell the agent which values are valid. Without the equipment_status enum definition, the agent guesses (WHERE status = 'broken') or has to query pg_enum and pg_type at runtime. With the enum file, it uses the correct values immediately.
- Views as pre-answered questions. The daily_production_summary materialized view answers "what happened on the production floor today?" with yields, quality counts, and run durations already computed. An agent with this view in context queries it directly instead of writing a complex aggregation from scratch.
- Functions as business logic. The equipment_health_dashboard view references calculate_oee, a function that computes Overall Equipment Effectiveness (availability × performance × quality). Without its definition, the agent guesses the formula. With it, the function body and comments explain the classification thresholds. Atlas also provides a testing framework for database logic, giving agents context on expected behavior and edge cases without executing anything.
The no-context agent reported all answers as correct. Every query ran without errors and returned plausible results. But two of them were semantic errors: valid SQL that silently returns wrong data.
Providing Schema Context
1. Store Schema as Code
AI tools can query the database schema at runtime using atlas schema inspect with the --include
flag to retrieve specific parts. But storing the schema as version-controlled files means agents
can find them using the same tools they use for any code: file search (grep, glob), vector
embeddings (Cursor's codebase indexing), or agentic exploration (Claude Code's read-only sub-agents).
No live database connection needed, and you can add code comments and test files alongside the DDL.
To get started, export your existing database into a structured, context-rich schema directory
using atlas schema inspect:
atlas schema inspect -u "postgres://localhost:5432/mydb" --format '{{ sql . | split | write "sql" }}'
See the Export Database to Code guide for a full walkthrough.
2. Use One File Per Object
A single migration file with thousands of lines is poor context. Organized files are rich context:
sql/schemas/manufacturing/tables/equipment.sql
sql/schemas/manufacturing/types/enum_equipment_status.sql
sql/schemas/manufacturing/functions/calculate_oee.sql
An agent working on a manufacturing query can read just the relevant files instead of parsing the entire schema.
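A minimal sketch of that selection step (hypothetical code using naive keyword matching; the file names are the ones from this guide's example schema): with one file per object, picking relevant files reduces to matching question terms against file names.

```python
from pathlib import Path


def relevant_files(schema_dir: Path, question: str) -> list[Path]:
    """Naive keyword match: a stand-in for the file search (grep,
    glob, or embedding lookup) an agent performs before reading
    schema files into context."""
    words = {w.strip("?,.").lower() for w in question.split()}
    hits = []
    for f in sorted(schema_dir.rglob("*.sql")):
        # one-file-per-object naming means the stem carries meaning
        name_parts = set(f.stem.lower().split("_"))
        if words & name_parts:
            hits.append(f)
    return hits
```

Real agents use grep, glob, or codebase indexing for this step; the point is that one-file-per-object naming makes the match nearly trivial, while a monolithic dump forces a full-file read.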
3. Use Import Directives
Atlas's -- atlas:import directives declare dependencies between files:
-- atlas:import ../types/enum_quality_status.sql
-- atlas:import ../../public/tables/projects.sql
These are automatically generated when you run atlas schema inspect. If you maintain your schema
files manually, you can add import directives yourself to keep the dependency graph accurate.
Either way, an AI agent following imports can discover the full context chain for any object
without scanning the entire schema.
4. Include Functions and Views
Functions encode business logic. Views encode common query patterns. Both prevent the agent from reinventing complex calculations:
-- Instead of writing a 40-line OEE calculation:
SELECT manufacturing.calculate_oee(3, '2026-03-01');
-- Instead of a complex production aggregation:
SELECT * FROM manufacturing.daily_production_summary
WHERE production_date = CURRENT_DATE;
Use Atlas's testing framework to document expected behavior. Tests tell agents what a function does more clearly than comments:
test "schema" "oee_world_class" {
exec {
sql = "SELECT (manufacturing.calculate_oee(1, '2025-03-01'))->>'classification'"
output = "world_class"
}
}
5. Use Self-Explanatory Constraint Names
Named CHECK constraints are machine-readable business rules:
CONSTRAINT "projects_budget_spent_valid"
CHECK (budget_spent >= 0
AND (budget_allocated IS NULL
OR budget_spent <= budget_allocated * 1.1))
This tells the agent that budget overspend up to 10% is allowed, without requiring a comment.
Getting Started
Export an existing database to organized schema files:
# Install Atlas
curl -sSf https://atlasgo.sh | sh
# Export to a single SQL file
atlas schema inspect -u $DATABASE_URL --format '{{ sql . }}' > schema.sql
# Export to organized one-file-per-object structure
atlas schema inspect -u $DATABASE_URL --format '{{ sql . | split | write "sql" }}'
The second command generates the file structure described in this guide: one file per table, enum,
function, and view, with -- atlas:import directives declaring dependencies between them.
Once the files are in your repository, every AI tool that reads your codebase has access to the full schema context.
- SQL Schema: defining schema as SQL files with imports
- Inspecting Databases: exporting an existing database to code
- Export Database to Code: generating organized schema files