Data Diff CI Triggers

Running data diff for specific PRs/MRs

By default, Datafold CI runs on every new pull/merge request and on every new commit to an existing pull/merge request. To run Datafold CI only when a user explicitly requests it, enable the Run only when tagged option in the Datafold app CI settings. With this option enabled, Datafold CI runs only if a datafold tag/label is assigned to the pull/merge request.

Running data diff on specific file changes

By default, Datafold CI runs on any file change in the repo. To skip Datafold CI runs for certain modified files (e.g., if the dbt code lives in the same repo as non-dbt code), you can specify files to ignore. The patterns use .gitignore syntax. Excluded files can be re-included with a negation pattern (a leading !).

Example

Let's say the dbt project is a folder in a repo that contains other code (e.g., Airflow). We want to run Datafold CI for changes to dbt models but skip it for other files. To do that, we exclude all files in the repo except those in the /dbt folder. We also want to filter out .md files in the /dbt folder:

*
!dbt/*
dbt/*.md
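
To see how the three patterns in this example (*, !dbt/* and dbt/*.md) interact, the following is a minimal Python sketch of the "last matching pattern wins" rule used by .gitignore-style lists. It is an illustration only, not Datafold's implementation: the is_ignored helper and the sample paths are hypothetical, and the standard-library fnmatch module only approximates .gitignore matching (for instance, its * also matches across / separators).

```python
from fnmatch import fnmatchcase

# The three patterns from the example above, in order.
PATTERNS = ["*", "!dbt/*", "dbt/*.md"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Return True if `path` would be skipped under the given patterns.

    Hypothetical helper: approximates .gitignore semantics by letting
    the last matching pattern decide, with a leading "!" re-including
    a path that an earlier pattern excluded.
    """
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # A later match overrides any earlier decision.
            ignored = not negated
    return ignored


if __name__ == "__main__":
    # Sample paths are assumptions chosen for illustration.
    for path in ["airflow/dags/etl.py", "dbt/models/users.sql", "dbt/README.md"]:
        print(path, "->", "skipped" if is_ignored(path) else "triggers CI")
```

Under these assumptions, a change to an Airflow file or to a Markdown file in the dbt folder is skipped, while a change to a dbt model file still triggers a run.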
Skipping specific dbt models

To skip diffing individual dbt models in CI, use the never_diff option in the Datafold dbt yaml config.

Running data diff on specific branches

By default, Datafold CI runs on every new pull/merge request and on every new commit to an existing pull/merge request. To run Datafold CI only on pull requests that have a specific base branch, set the Custom base branch option in the Datafold app CI settings. This is useful if you have multiple environments built from different branches, for example staging and production environments built from the staging and main branches respectively. With this option, you can have two different CI configurations in Datafold, one for each environment, and run each one only for pull requests against its corresponding branch.