dbt Cloud

Info: You will need a dbt Team account or higher to access the dbt Cloud API, which Datafold uses to connect the two accounts.

Prerequisites

Set up dbt Cloud CI

In dbt Cloud, set up dbt Cloud CI so that your Pull Request job runs whenever you open or update a Pull Request. This job gives Datafold information about the changes included in the PR.

Create an Artifacts Job in dbt Cloud

The Artifacts job generates the production manifest.json on a schedule, giving Datafold information about the state of production. The simplest method is to set up a dbt Cloud job that runs the dbt ls command every hour.

Note: dbt ls is preferred over dbt compile because it runs faster, and data diffing does not require fully compiled models.
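
To make the role of manifest.json more concrete, here is a minimal Python sketch that lists the models recorded in a manifest. It relies only on the manifest's top-level "nodes" mapping and each node's "resource_type" field. The project name my_project and the helper names are hypothetical, and this is an illustration of the artifact's contents, not a description of how Datafold reads it.

```python
import json
from pathlib import Path


def list_model_ids(manifest: dict) -> list[str]:
    """Return the sorted unique IDs of all model nodes in a parsed manifest."""
    nodes = manifest.get("nodes", {})
    return sorted(
        unique_id
        for unique_id, node in nodes.items()
        if node.get("resource_type") == "model"
    )


def load_manifest(path: str) -> dict:
    """Read and parse a manifest.json file (the path is supplied by the caller)."""
    return json.loads(Path(path).read_text())


# Tiny hand-written stand-in for a real manifest, for illustration only.
example_manifest = {
    "nodes": {
        "model.my_project.orders": {"resource_type": "model"},
        "model.my_project.customers": {"resource_type": "model"},
        "test.my_project.not_null_orders_id": {"resource_type": "test"},
    }
}

print(list_model_ids(example_manifest))
```

Only the two model entries are returned; the test node is filtered out.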

Example dbt Cloud artifact job settings and successful run:

Continuous Deployment
If you are interested in continuous deployment, you can use a Merge Trigger Production Job instead of the Artifacts Job listed above.

dbt Cloud API Key

You will need either a Service Token or a User Token to connect Datafold to your dbt Cloud account.

  • Service Token (Recommended): Navigate to Account Settings > Service Tokens > + New Token
    • Add a Permission Set and select Member or Developer
    • Select All Projects, or check only the projects to use with Datafold
    • Save
  • User Token: Navigate to Your Profile > API Access and copy the token.
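
Before pasting the token into Datafold, you may want to confirm that dbt Cloud accepts it. The sketch below builds a request to the dbt Cloud API v2 accounts endpoint using the Authorization: Token header. The host cloud.getdbt.com is an assumption (single-tenant and regional deployments use a different host), and the function name is hypothetical.

```python
import urllib.request


def build_accounts_request(
    token: str, host: str = "cloud.getdbt.com"
) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists accounts for a token.

    The default host is an assumption; replace it with your own dbt Cloud host.
    """
    url = f"https://{host}/api/v2/accounts/"
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder token for illustration; never hard-code a real token.
request = build_accounts_request("YOUR_DBT_CLOUD_TOKEN")
print(request.full_url)
```

Passing the request to urllib.request.urlopen sends it. A valid token should produce a successful JSON response, while an invalid one should raise an HTTPError.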

Create a dbt Cloud Integration in the Datafold app

  • Navigate to Settings > Integrations > dbt Cloud/Core and create a new dbt Cloud integration.

  • Enter the API Key (Service Token) you copied from dbt Cloud.

Configuration

Basic Settings

Advanced Settings

  • Enable Datafold in CI/CD: High-level switch to turn Datafold off or on in CI (but we hope you'll leave it on!).
  • Import dbt tags and descriptions: Populate our Lineage tool with dbt metadata. ⚠️ This feature is in development. ⚠️
  • Slim Diff: Only diff modified models in CI, instead of all models. Slim Diff is highly configurable through dbt yaml, so please read more about Slim Diff; each organization will need to choose a strategy that fits its data environment.
    • Downstream Hightouch models will be diffed even when Slim Diff is turned on.
  • Diff Hightouch Models: Hightouch customers can see diffs of downstream Hightouch assets in Pull Requests.
  • CI fails on primary key issues: The existence of null or duplicate primary keys causes the Datafold CI check to fail.
  • Pull Request Label: Run Datafold in CI only when a specific label is manually applied to the Pull Request in GitHub/GitLab.
  • Files to ignore: If at least one modified file doesn’t match the ignore pattern, Datafold CI diffs all changed models in the PR. If every modified file matches the ignore pattern, Datafold CI does not run in the PR. (Additional details.)
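
The Files to ignore rule can be sketched as a small decision function. This is a sketch of the rule as stated, not Datafold's implementation: it uses Python's fnmatch for pattern matching, which is an assumption and may differ from the pattern syntax Datafold accepts. The function name is hypothetical.

```python
from fnmatch import fnmatch


def datafold_ci_should_run(modified_files: list[str], ignore_patterns: list[str]) -> bool:
    """Return True if at least one modified file is NOT covered by an ignore pattern.

    Pattern matching uses fnmatch as a stand-in; the real pattern syntax may differ.
    """

    def is_ignored(path: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in ignore_patterns)

    return any(not is_ignored(path) for path in modified_files)


# One file falls outside the ignore pattern, so CI runs and diffs all changed models.
print(datafold_ci_should_run(["README.md", "models/orders.sql"], ["*.md"]))

# Every modified file is ignored, so CI does not run for this PR.
print(datafold_ci_should_run(["README.md", "docs/guide.md"], ["*.md"]))
```

The first call returns True and the second returns False. The result depends only on whether any modified file falls outside the ignore patterns, not on how many do.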

Click save, and that's it! 🎉

Now that you've set up a dbt Cloud integration, Datafold will diff your impacted tables whenever you push commits to a PR. A summary of the diff will appear in GitHub, and detailed results will appear in the Datafold app.