Legacy snapshot configuration legacy
From dbt versions 1.8 and earlier, you were able to configure snapshots using jinja blocks in your .sql
files. This is considered legacy syntax and was replaced with a YAML-based configuration in dbt Cloud's "Latest" release track and dbt v1.9 for faster and more efficient management.
However, there are situations where you might still need to use the legacy syntax for snapshots in any dbt version or release track. This page details how you can use the legacy SQL-based configurations and provides a path to migrate to the more efficient YAML configuration.
For new snapshots, we recommend using these latest YAML-based configs. If applying them to existing snapshots, you'll need to migrate over.
The following outlines the differences between the legacy SQL-based syntax and the updated YAML-based syntax:
Snapshot configurations
Although you can use the more performant YAML-based configuration, you might still want to use the legacy configuration to define your snapshots if it suits your needs.
Snapshots can be configured in two main ways:
- Using snapshot-specific configurations
- Or using general configurations
These configurations allow you to control how dbt detects changes in your data and where snapshots are stored. Both types of configurations can coexist in your project in the same config
block (or from your dbt_project.yml
file or properties.yaml
file). You can also configure snapshots using strategies, which define how dbt knows if a row has changed.
Snapshot specific configurations
Snapshot-specific configurations are applicable to only one dbt resource type rather than multiple resource types. You can define these settings within the resource’s file using the {{ config() }}
macro (as well as in the project file (dbt_project.yml
) or a property file (models/properties.yml
for models, similarly for other resources)).
{ % snapshot orders_snapshot %}
{{ config(
target_schema="<string>",
target_database="<string>",
unique_key="<column_name_or_expression>",
strategy="timestamp" | "check",
updated_at="<column_name>",
check_cols=["<column_name>"] | "all"
invalidate_hard_deletes : true | false
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
General configuration
Use general configurations for broader operational settings applicable across multiple resource types. Like resource-specific configurations, these can also be set in the project file, property files, or within resource-specific files using a config block.
Snapshot strategies
Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt that require the strategy
parameter:
- Timestamp — Uses an
updated_at
column to determine if a row has changed. - Check — Compares a list of columns between their current and historical values to determine if a row has changed. Uses the
check_cols
parameter.
- Timestamp
- Check
The timestamp strategy uses an updated_at
field to determine if a row has changed. If the configured updated_at
column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action.
Example
{% snapshot orders_snapshot_timestamp %}
{{
config(
target_schema='snapshots',
strategy='timestamp',
unique_key='id',
updated_at='updated_at',
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
The check strategy is useful for tables which do not have a reliable updated_at
column. It requires the check_cols
parameter, which is a list of columns within the results of your snapshot query to check for changes. Alternatively, use all columns using the all value (however this may be less performant).
Example
{% snapshot orders_snapshot_check %}
{{
config(
strategy='check',
unique_key='id',
check_cols=['status', 'is_cancelled'],
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
Examples
Configure snapshots
In dbt versions 1.8 and earlier, snapshots are select
statements, defined within a snapshot block in a .sql
file (typically in your snapshots
directory or any other directory). You'll also need to configure your snapshot to tell dbt how to detect record changes.
The following table outlines the configurations available for snapshots in versions 1.8 and earlier:
Config | Description | Required? | Example |
---|---|---|---|
target_database | The database that dbt should render the snapshot table into | No | analytics |
target_schema | The schema that dbt should render the snapshot table into | Yes | snapshots |
strategy | The snapshot strategy to use. One of timestamp or check | Yes | timestamp |
unique_key | A primary key column or expression for the record | Yes | id |
check_cols | If using the check strategy, then the columns to check | Only if using the check strategy | ["status"] |
updated_at | If using the timestamp strategy, the timestamp column to compare | Only if using the timestamp strategy | updated_at |
invalidate_hard_deletes | Find hard deleted records in source, and set dbt_valid_to current time if no longer exists | No | True |
- A number of other configurations are also supported (like,
tags
andpost-hook
), check out the full list here. - Snapshots can be configured from both your
dbt_project.yml
file and aconfig
block, check out the configuration docs for more information. - Note: BigQuery users can use
target_project
andtarget_dataset
as aliases fortarget_database
andtarget_schema
, respectively.
Add snapshot to a project
To add a snapshot to your project:
- Create a file in your
snapshots
directory with a.sql
file extension. For example,snapshots/orders.sql
- Use a
snapshot
block to define the start and end of a snapshot:
{% snapshot orders_snapshot %}
{% endsnapshot %}
- Write a
select
statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can usesources
orrefs
here.
{% snapshot orders_snapshot %}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
-
Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the
updated_at
column reliably indicates record changes, so we can use thetimestamp
strategy. If your query result set does not have a reliable timestamp, you'll need to instead use thecheck
strategy — more details on this in the next step. -
Add configurations to your snapshot using a
config
block. You can also configure your snapshot from yourdbt_project.yml
file (docs).
{% snapshot orders_snapshot %}
{{
config(
target_database='analytics',
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
- Run the
dbt snapshot
command. For our example, a new table will be created atanalytics.snapshots.orders_snapshot
. You can change thetarget_database
configuration, thetarget_schema
configuration and the name of the snapshot (as defined in{% snapshot .. %}
) will change how dbt names this table.
dbt snapshot
Running with dbt=1.8.0
15:07:36 | Concurrency: 8 threads (target='dev')
15:07:36 |
15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN]
15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s]
15:07:36 |
15:07:36 | Finished running 1 snapshots in 0.68s.
Completed successfully
Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1
-
Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the snapshot meta fields as described earlier.
-
Run the
dbt snapshot
command again, and inspect the results. If any records have been updated, the snapshot should reflect this. -
Select from the
snapshot
in downstream models using theref
function.
select * from {{ ref('orders_snapshot') }}
- Snapshots are only useful if you run them frequently — schedule the
snapshot
command to run regularly.
Examples
This section outlines some examples of how to apply configurations to snapshots using the legacy method.
Migrate legacy snapshot configs
This section outlines the steps you need to follow to migrate legacy jinja block snapshot configurations into the updated YAML-based configuration format.
Why use the updated YAML spec?
- YAML-based configurations are processed faster by dbt, leading to improved performance, especially during parsing and compilation.
- Centralizing configuration in YAML makes it easier to manage and update snapshot settings without editing the SQL logic directly.
- YAML configuration aligns snapshot definitions with other dbt resources, such as models and seeds, leading to a more consistent project structure.
Considerations
- In versions prior to v1.9, the
target_schema
(required) andtarget_database
(optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. - In v1.9,
target_schema
became optional, allowing snapshots to be environment-aware. - By default, without
target_schema
ortarget_database
defined, snapshots now use thegenerate_schema_name
orgenerate_database_name
macros to determine where to build. - Developers can still set a custom location with schema and database configs, consistent with other resource types.
How to migrate
The latest YAML-based configuration syntax is best suited for new snapshots. If you're migrating existing snapshots, consider the following steps:
-
Migrate the previous snapshot to the new table schema and values.
- Create a backup copy of your snapshots.
- Use
alter
statements as needed (or a script to applyalter
statements) to ensure table consistency.
-
Convert any configurations currently written within the jinja block (like
unique_key
,strategy
,updated_at
, and so on) into the YAML file structure, one at a time and testing as you go.The configurations are structured similarly to how you would define a model in
dbt_project.yml.
:snapshots.ymlsnapshots:
- name: orders_snapshot
schema: snapshots
unique_key: id
strategy: timestamp
updated_at: updated_atNote: The
unique_key
, strategy, andupdated_at
fields must match the settings previously defined in your jinja block. -
Before removing the old jinja block, run the
dbt snapshot
command using the new YAML configuration to confirm that the snapshot behaves as expected.- Verify that the data is processed correctly (for example, no data loss or incorrect records).
- Make sure the performance is either the same or improved compared to the old configuration.
- After running the new snapshot, inspect the snapshot tables in your data warehouse to confirm the new snapshot records match the old data.
-
Once you’ve confirmed that the new YAML configuration works properly, safely remove the old snapshot jinja block from your
.sql
file. This keeps your codebase clean and fully migrated to the new method. -
If your snapshots require more complex transformations, consider using an ephemeral model to handle the transformations before referencing it in the snapshot. An ephemeral model can encapsulate transformations and simplify the snapshot query itself.
Example of using an ephemeral model:
models/ephemeral/orders_ephemeral.sql{{
config(materialized='ephemeral')
}}
select * from {{ source('jaffle_shop', 'orders') }}Example of the snapshot YAML configuration referencing the ephemeral model:
snapshots.ymlsnapshots:
- name: orders_snapshot
relation: ref('orders_ephemeral')
target_schema: snapshots
unique_key: id
strategy: timestamp
updated_at: updated_at
Full migration example
Here’s a complete example of migrating from a legacy jinja block snapshot to a YAML-based snapshot configuration:
Legacy method (jinja block)
{% snapshot orders_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at',
)
}}
select * from {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
Recommended method (YAML configuration)
snapshots:
- name: orders_snapshot
schema: snapshots
unique_key: id
strategy: timestamp
updated_at: updated_at
By following these steps, you can smoothly transition from legacy jinja-based snapshots to the modern, more efficient YAML-based configurations.