Data Contracts
An island never reaches for a file directly. It binds to a dataset: a named, typed contract declared in the manifest. The runtime resolves each dataset through a DuckDB query core, infers its columns and types from the live data, and checks every island binding against that shape. Your files are the source of truth; there are no snapshots to drift out of date.
Datasets
datasets maps a name to a source. A dataset is one of three shapes:
"datasets": {
"net_worth": { "source": "data/net_worth_monthly.csv" }, // a file you own
"allocation": { "sql": "models/transforms/allocation.sql" }, // a SQL transform over other datasets
"tracks": { "source": "data/library.sqlite", "table": "tracks" } // a table in a SQLite database
}
- A file source. source points at a CSV, JSON, JSONL, or Parquet file, relative to the project root. DuckDB reads it directly and infers the column types.
- A SQL transform. sql points at a .sql file under models/ (typically models/transforms/). This is a derived, read-only view (see below).
- A SQLite table. A .sqlite/.db source plus a table name. A SQLite source requires table; supplying table on any other source is a named validation error.
A file or SQLite-backed dataset is a source dataset: it can be written to (by actions
and connectors). A sql dataset is derived and never writable.
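The three shapes and the table rule can be sketched as a small classifier. This is a minimal illustration in Python, not the runtime's implementation: classify_dataset, its return value, and the error wording are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical helper: the names and messages here are illustrative only.
SQLITE_EXTS = {".sqlite", ".db"}


def classify_dataset(name: str, decl: dict) -> dict:
    """Return the dataset's shape and whether it can be written to."""
    if "sql" in decl:
        # A SQL transform is derived, so it is never writable.
        return {"kind": "sql", "writable": False}

    source = decl.get("source")
    if source is None:
        raise ValueError(f"dataset '{name}': needs 'source' or 'sql'")

    ext = PurePosixPath(source).suffix.lower()
    if ext in SQLITE_EXTS:
        # A SQLite source must say which table it exposes.
        if "table" not in decl:
            raise ValueError(f"dataset '{name}': a SQLite source requires 'table'")
        return {"kind": "sqlite", "writable": True}

    # Any other source is a plain file, where 'table' has no meaning.
    if "table" in decl:
        raise ValueError(f"dataset '{name}': 'table' is only valid on a SQLite source")
    return {"kind": "file", "writable": True}
```

Under these assumptions, the three declarations above classify as file, sql, and sqlite, and only allocation comes back as not writable.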
Shaping lives in SQL, never in island configs
A golden rule: islands stay declarative. You never put a sum, a filter, or a join inside an
island's config. Data shaping happens in the data layer, in a sql transform, so the
manifest only ever names fields that already exist.
Drop a .sql file in models/transforms/ and register it as a dataset. It's a plain DuckDB
SELECT over your other datasets (referenced by their dataset name):
-- models/transforms/allocation.sql
SELECT
class,
SUM(value_eur) AS value_eur,
SUM(value_eur) / SUM(SUM(value_eur)) OVER () AS share
FROM holdings
GROUP BY class
ORDER BY value_eur DESC;

"datasets": {
"holdings": { "source": "data/holdings.csv" },
"allocation": { "sql": "models/transforms/allocation.sql" }
}
Now a breakdown.treemap can bind cleanly to allocation with label: "class" and value: "value_eur". Every value it reads is a real column, computed once in SQL.
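To see what the allocation transform returns, here is a rough sketch that runs the same SELECT from Python. It uses the standard library's sqlite3 module as a stand-in for DuckDB (an assumption made only so the example needs no extra packages; it requires SQLite 3.25 or newer for window functions), and the holdings rows are made up for illustration.

```python
import sqlite3

# Made-up rows for illustration; a real project would read data/holdings.csv.
HOLDINGS = [
    ("stocks", 400.0),
    ("stocks", 200.0),
    ("bonds", 300.0),
    ("cash", 100.0),
]

# Same SELECT as models/transforms/allocation.sql.
ALLOCATION_SQL = """
SELECT
  class,
  SUM(value_eur) AS value_eur,
  SUM(value_eur) / SUM(SUM(value_eur)) OVER () AS share
FROM holdings
GROUP BY class
ORDER BY value_eur DESC
"""


def run_allocation(rows):
    """Load rows into an in-memory holdings table and run the transform."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE holdings (class TEXT, value_eur REAL)")
        con.executemany("INSERT INTO holdings VALUES (?, ?)", rows)
        return con.execute(ALLOCATION_SQL).fetchall()
    finally:
        con.close()


# One (class, value_eur, share) tuple per class, largest value first.
allocation = run_allocation(HOLDINGS)
```

With these sample rows the result has one row per class, stocks first with a share of about 0.6, and the shares sum to 1. Those are exactly the columns the treemap binds to.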
The contract check
This is the safety net. When you run validate (or an agent calls propose_edit), the
compiler materializes each dataset, asks DuckDB for its columns and types, and checks every
island binding against that live schema:
- Bind to a column that exists → the island is green.
- Bind to a column that doesn't → the build fails and names the page, the island index, the island type, and the missing field.
Because the check runs against the live data, not a cached schema, renaming a CSV column or changing a transform surfaces immediately as a named error, never as a silently-empty chart. The same machinery guards custom-island configs and page filter bindings.
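The check described above boils down to comparing each binding with the dataset's live column set. The sketch below shows that comparison in Python. It is an illustration, not the compiler's code: Island, check_contracts, the shape of pages and schemas, and the error wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Island:
    # Hypothetical in-memory shape of one island on a page.
    type: str
    dataset: str
    bindings: dict  # binding role -> column name, e.g. {"label": "class"}


def check_contracts(pages: dict, schemas: dict) -> list:
    """Return one error string per binding that names a missing column.

    pages:   page name -> list of Island
    schemas: dataset name -> {column name: type}, read from the live data
    """
    errors = []
    for page, islands in pages.items():
        for index, island in enumerate(islands):
            columns = schemas.get(island.dataset, {})
            for role, field in island.bindings.items():
                if field not in columns:
                    # Name the page, island index, island type, and field.
                    errors.append(
                        f"page '{page}', island {index} ({island.type}): "
                        f"'{role}' binds to missing field '{field}' "
                        f"in dataset '{island.dataset}'"
                    )
    return errors
```

An empty list means every island is green. Because schemas is rebuilt from the data on each run, a renamed column shows up as an error on the very next check.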
Markdown datasets
A Markdown file can be a dataset too, useful for source.doc islands that embed notes or
strategy docs alongside your charts. Point a dataset's source at a .md file and reference
it from a source.doc island, keeping prose under the same typed, file-owned contract as
your numbers.
Writing back: actions and connectors
Contracts aren't read-only. Two typed write paths feed source datasets through a
checkpointed pipeline, so writes are reversible:
- Actions. A manifest-declared, typed insert into a source dataset. Every row is validated against the dataset's inferred schema before anything is written.
- Connectors. Vendored integrations that sync an external provider's data into source datasets on a schedule.
Both run through the same snapshot-before-write machinery the agent edit loop uses. See
MCP Server for how an agent drives actions (run_action) and connectors
(run_sync) safely, and the Manifest Reference for their full
declaration shape.
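The validate-then-snapshot-then-write order can be sketched for a CSV-backed source dataset as follows. This is a simplified illustration under stated assumptions, not the actual pipeline: validate_row, insert_rows, undo, the .bak snapshot name, and the schema format (column name mapped to a Python type, in file column order) are all hypothetical.

```python
import csv
import shutil
from pathlib import Path


def validate_row(row: dict, schema: dict) -> None:
    """Raise ValueError unless the row has exactly the schema's columns and types."""
    if set(row) != set(schema):
        raise ValueError(f"columns {sorted(row)} do not match {sorted(schema)}")
    for column, expected in schema.items():
        if not isinstance(row[column], expected):
            raise ValueError(f"'{column}' expects {expected.__name__}")


def insert_rows(path: Path, rows: list, schema: dict) -> Path:
    """Validate every row, snapshot the file, then append. Returns the snapshot path."""
    for row in rows:
        # Validate all rows first so a bad row never causes a partial write.
        validate_row(row, schema)
    snapshot = path.with_name(path.name + ".bak")
    shutil.copy2(path, snapshot)  # checkpoint that makes the write reversible
    with path.open("a", newline="") as handle:
        # Assumes the file already has a header and ends with a newline.
        writer = csv.DictWriter(handle, fieldnames=list(schema))
        writer.writerows(rows)
    return snapshot


def undo(path: Path, snapshot: Path) -> None:
    """Restore the file from its pre-write snapshot."""
    shutil.copy2(snapshot, path)
```

Validating the whole batch before taking the snapshot means a rejected action leaves the file untouched, while an accepted one can still be rolled back with undo.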