Model-specific CI configuration
Datafold leverages dbt YAML configuration to allow setting model-specific configuration for CI.
SQL Filters
SQL filters can be helpful in two scenarios:
When Production and Staging environment are not built using the same data. I.e., when Staging is built using a subset of production data, filters can be applied to ensure that both environments are on par and can be diffed.
To improve Datafold CI performance by reducing the volume of data compared, e.g. to only last 3 months.
are an effective technique to speed up diffs by narrowing the data diffed. SQL filter adds a WHERE
clause to allow you to filter data on both sides using standard SQL filter expressions. SQL filters can be added to dbt YAML under the meta.datafold.datadiff.filter
tag:
models:
- name: users
meta:
datafold:
datadiff:
filter: "user_id > 2350 AND source_timestamp >= current_date() - 7"
Time Travel
If your database supports time travel, you can diff tables from a particular point in time by specifying prod_time_travel
for a production model and pr_time_travel
for a PR model.
models:
- name: users
meta:
datafold:
datadiff:
prod_time_travel:
- 2022-02-07T00:00:00
pr_time_travel:
- 2022-02-07T00:00:00
Including/Excluding Columns
You can specify which columns to include or exclude in the diff.
models:
- name: users
meta:
datafold:
datadiff:
include_columns:
- user_id
- created_at
- name
exclude_columns:
- full_name
Excluding Models
You can exclude a model or a subdirectory of models using never_diff
.
models:
- name: users
meta:
datafold:
datadiff:
never_diff: true
Diff Timeline
You can specify a time_column
to visualize the match rate between tables for each column over time.
models:
- name: users
meta:
datafold:
datadiff:
time_column: created_at