
dbt Cloud

Prerequisites

  • Before configuring dbt Cloud, you must connect a Data Source and a GitHub or GitLab account.
  • Datafold requires a dbt Cloud Team account or higher to access the dbt Cloud API.
  • You will need either a Service Token or a User Token:
    • Service Token (Recommended):
      • Navigate to Account Settings -> Service Tokens -> + New Token
        • Add a Token Name
        • Add a Permission Set:
          • Permission Set: Member
          • Project: All Projects, or select only the projects you will use with Datafold
          • Save
    • User Token:
      • Navigate to Your Profile -> API Access
        • Copy the API key
  • In dbt Cloud, configure a production job and a Pull Request job, and set up dbt Cloud CI so that your Pull Request job runs when you open or update a Pull Request.
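Datafold uses the token to call the dbt Cloud API, which is how it lists your accounts and jobs during setup. Before pasting a token into Datafold, you may want to check that it can see the right account. The sketch below only builds the requests; it assumes the default multi-tenant access URL https://cloud.getdbt.com (accounts in other regions or on single-tenant deployments use a different host) and the v2 accounts and jobs endpoints. The helper names are hypothetical.

```python
import urllib.request

# Assumption: the default multi-tenant dbt Cloud access URL. Accounts hosted in
# other regions or on single-tenant deployments use a different host.
DBT_CLOUD_API = "https://cloud.getdbt.com/api/v2"


def dbt_cloud_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request to the dbt Cloud v2 API."""
    return urllib.request.Request(
        DBT_CLOUD_API + path,
        headers={
            # The Service Token or User Token goes in the Authorization header.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def list_accounts_request(token: str) -> urllib.request.Request:
    """Hypothetical helper: request the accounts this token can access."""
    return dbt_cloud_request("/accounts/", token)


def list_jobs_request(token: str, account_id: int) -> urllib.request.Request:
    """Hypothetical helper: request the jobs (production and PR jobs) in one account."""
    return dbt_cloud_request(f"/accounts/{account_id}/jobs/", token)
```

Sending one of these with urllib.request.urlopen and parsing the body with json.load should return the accounts or jobs the token can access; an authorization error suggests the token or its permission set needs another look.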

Connecting dbt Cloud

  • To set up dbt Cloud, navigate to Admin -> Settings -> Orchestration in Datafold and click Add New Integration to enter your dbt Cloud details.

Configure dbt Cloud on Datafold

To complete the setup, connect your dbt Cloud account with an API key and specify settings for your dbt runs.

Complete the configuration by specifying the following fields:

Field Name | Description
Repository | Select the repository that generates the webhooks and where pull/merge requests will be raised.
Data Source | Select the data source where the code changed in the repository will run.
Name | An identifier used in Datafold for this CI configuration.
API Key | The Service Token or User Token created above.
Account name | Becomes selectable once a valid API key is entered. Select the account to use.
Job that builds production tables | Becomes selectable once a valid API key is entered. Select the job that builds production tables.
Skip for pull/merge requests | When selected, the Datafold CI pipeline won't run on pull/merge requests.
Job that builds pull requests | Becomes selectable once a valid API key is entered. Select the job that builds pull requests.
Primary key tag | See dbt Integration. Set this to a value such as primary-key; the same tag is used in your dbt project configuration (YAML files and config blocks).
Base branch commit selection strategy | Select "Merge Base" to compare your PR with the commit on the main branch from which the PR branch diverged. Select "Latest" to compare your PR branch with the latest commit on the main branch.
Sync metadata on every push to production | When selected, Datafold syncs the metadata from the dbt run every time a push happens on the default branch.
Sync metadata on a schedule | Set a schedule to synchronize dbt metadata (column and table descriptions, tags, owners, etc.). Use this when you run the production job on a schedule.
Files to ignore | If defined, files matching these patterns are ignored in PRs. The patterns use .gitignore syntax: excluded files can be re-included with a negation (!), and re-included files can later be excluded again to narrow the filter. For example, to exclude everything except the dbt/ folder, while still excluding the .md files inside it, use these three patterns, one per line: "*", "!dbt/*", "dbt/*.md".
CI Status Reporting | Enabled by default: errors in CI runs are reported to GitHub/GitLab as failures and may prevent PRs/MRs from being merged. If disabled, errors are reported as successes so the check stays "green" and does not block the PR/MR.
Slim CI | If checked, data diffs run only for models changed in a pull request, and you can automatically diff the downstream models within your PR.
Require 'datafold' label to be present on pull request | When selected, the Datafold CI process runs only when the 'datafold' label has been applied. This label must be created manually in GitHub or GitLab, and its name must match 'datafold' exactly.
Sampling tolerance | The tolerance to apply when sampling, for all data diffs.
Sampling confidence | The confidence level to apply when sampling.
Sampling Threshold | Sampling is disabled automatically for tables smaller than the specified threshold. If unspecified, a default value that depends on the Data Source type is used.
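The Primary key tag links a Datafold setting to your dbt project: in a model's YAML you tag the key column (for example tags: ["primary-key"] on order_id) so that the column can be identified as the model's primary key. As an illustration only, the sketch below shows how tagged columns could be found in dbt metadata. The node dictionaries are a hypothetical, trimmed-down version of the shape dbt writes to manifest.json, and primary_key_columns is a hypothetical helper, not Datafold's code.

```python
def primary_key_columns(nodes: dict, pk_tag: str = "primary-key") -> dict:
    """Return {model name: [columns tagged pk_tag]} for every model that has one."""
    result = {}
    for node in nodes.values():
        if node.get("resource_type") != "model":
            continue  # skip seeds, tests, snapshots, etc.
        keys = [
            col_name
            for col_name, col in node.get("columns", {}).items()
            if pk_tag in col.get("tags", [])
        ]
        if keys:
            result[node["name"]] = keys
    return result


# Hypothetical nodes, loosely shaped like entries in dbt's manifest.json.
nodes = {
    "model.shop.orders": {
        "resource_type": "model",
        "name": "orders",
        "columns": {
            "order_id": {"name": "order_id", "tags": ["primary-key"]},
            "amount": {"name": "amount", "tags": []},
        },
    },
    "model.shop.customers": {
        "resource_type": "model",
        "name": "customers",
        "columns": {"customer_id": {"name": "customer_id", "tags": []}},
    },
}
print(primary_key_columns(nodes))  # {'orders': ['order_id']}
```

Because the lookup matches the tag string exactly, the value entered in Datafold and the tag used in your YAML must be identical.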
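The two Base branch commit selection strategies differ only in which main-branch commit the PR is compared with. A simplified sketch, assuming a hypothetical parents mapping from each commit to its parent commits: "Latest" takes the tip of main, while "Merge Base" walks back from the tip of main to the first commit that is also in the PR's history, much like git merge-base main <pr-branch> does for simple histories (real git handles criss-cross merges more carefully).

```python
from collections import deque


def ancestors(commit: str, parents: dict) -> set:
    """All commits reachable from `commit` via parent links, including itself."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def base_commit(pr_head: str, main_head: str, parents: dict, strategy: str) -> str:
    """Pick the main-branch commit to compare a PR against."""
    if strategy == "latest":
        return main_head  # tip of the main branch
    if strategy == "merge_base":
        pr_history = ancestors(pr_head, parents)
        # Breadth-first walk back from the tip of main; in simple histories the
        # first commit also in the PR's history is where the PR branch diverged.
        queue = deque([main_head])
        seen = {main_head}
        while queue:
            commit = queue.popleft()
            if commit in pr_history:
                return commit
            for parent in parents.get(commit, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        raise ValueError("PR and main share no history")
    raise ValueError(f"unknown strategy: {strategy}")


# main: A <- B <- C <- D, and the PR branch E <- F was created from B.
parents = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"], "F": ["E"]}
print(base_commit("F", "D", parents, "merge_base"))  # B
print(base_commit("F", "D", parents, "latest"))      # D
```

With "Merge Base", commits C and D that landed on main after the PR branched are left out of the comparison; with "Latest", they are included.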
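The Files to ignore example relies on .gitignore's "last matching pattern wins" rule. The sketch below approximates that rule with Python's fnmatch so you can reason about a pattern list. It is a simplification, not Datafold's matcher: fnmatch's "*" also matches "/", whereas real .gitignore rules treat slashes and directories specially.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list) -> bool:
    """Simplified .gitignore-style check: the last matching pattern wins."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        # Caveat: fnmatch's "*" also matches "/", unlike real .gitignore rules.
        if fnmatchcase(path, glob):
            ignored = not negated
    return ignored


patterns = ["*", "!dbt/*", "dbt/*.md"]
for path in ["README.md", "dbt/models/orders.sql", "dbt/README.md"]:
    print(path, "ignored" if is_ignored(path, patterns) else "kept")
# README.md ignored
# dbt/models/orders.sql kept
# dbt/README.md ignored
```

"*" first excludes everything, "!dbt/*" re-includes the dbt folder, and "dbt/*.md" excludes its Markdown files again.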
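The three sampling settings work together: the threshold decides whether a table is sampled at all, and tolerance and confidence determine how large a sample is needed. Datafold's exact formula is not described here, so as an illustration only, the sketch below uses the textbook sample size for estimating a proportion, n = z^2 * p * (1 - p) / e^2, with the worst case p = 0.5 and tolerance e expressed as a fraction (0.01 for 1%). The function name and these unit choices are assumptions.

```python
import math
from statistics import NormalDist


def planned_sample_size(row_count: int, tolerance: float, confidence: float,
                        threshold: int):
    """Return None when the table is diffed in full, else a sample size (textbook formula)."""
    if row_count < threshold:
        return None  # smaller than the threshold: sampling disabled
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Worst-case proportion p = 0.5 gives p * (1 - p) = 0.25.
    n = math.ceil(z * z * 0.25 / (tolerance * tolerance))
    return min(n, row_count)  # never sample more rows than the table has
```

With these assumptions, planned_sample_size(10_000_000, 0.01, 0.95, 5_000) returns 9604, while a 1,000-row table with the same settings returns None and is diffed in full. Tightening the tolerance or raising the confidence both increase the sample size.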

Click Save. Now that you've set up the integration, Datafold will diff your impacted tables whenever you push commits to a PR. A summary of the diff will appear in GitHub or GitLab, and detailed results will appear in the Datafold app.
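Which tables count as "impacted" depends on the Slim CI setting: only the models changed in the PR, optionally together with the models downstream of them. A minimal sketch of that selection, assuming a hypothetical children mapping from each model to the models that select from it (in a dbt project this lineage comes from ref() calls); it is not Datafold's implementation.

```python
from collections import deque


def models_to_diff(changed: set, children: dict, include_downstream: bool = True) -> set:
    """Changed models, plus everything downstream of them when requested."""
    selected = set(changed)
    if not include_downstream:
        return selected
    # Breadth-first walk over the lineage graph, starting from the changed models.
    queue = deque(changed)
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


children = {
    "stg_orders": ["orders"],
    "orders": ["revenue"],
    "stg_customers": ["customers"],
}
print(sorted(models_to_diff({"stg_orders"}, children)))  # ['orders', 'revenue', 'stg_orders']
```

Changing stg_orders pulls in orders and revenue, while stg_customers and customers are left out because nothing in the PR touches them.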