Integrating Datafold with dbt
You need Datafold in addition to dbt tests because while dbt tests are effective for validating specific assertions about your data, they can’t catch all issues, particularly unknown unknowns. Datafold identifies value-level differences between staging and production datasets, which dbt tests might miss.
Unlike dbt tests, which require manual configuration and maintenance, Datafold automates this validation, ensuring continuous and comprehensive data quality coverage without additional overhead. This is all embedded within Datafold’s unified platform, which offers end-to-end data quality testing alongside our Column-level Lineage and Data Monitors.
We therefore recommend combining dbt tests with Datafold to achieve complete test coverage that addresses both known and unknown data quality issues, providing a robust safeguard against data integrity problems in your CI pipeline.
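To make the distinction concrete, here is a minimal dbt schema test asserting a known invariant (the orders model and order_id column are hypothetical):

    models:
      - name: orders
        columns:
          - name: order_id
            tests:
              - unique   # every order_id appears exactly once
              - not_null # no order is missing its key

Tests like these only catch violations you thought to write down in advance; Datafold’s value-level diffs surface the changes that no pre-written assertion anticipates.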
For dbt Core users, create an integration in Datafold, specify the necessary settings, obtain a Datafold API Key and CI config ID, and configure your CI scripts with the Datafold SDK to upload manifest.json files. Our detailed setup guide can be found here.
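As a rough sketch, the upload step in a production CI script might look like the following (the <ci-config-id> placeholder and the $GIT_SHA variable are assumptions to adapt to your setup; the same datafold dbt upload command appears in the workflow example further below):

    pip install datafold-sdk
    dbt run
    datafold dbt upload --ci-config-id <ci-config-id> --run-type production --commit-sha $GIT_SHA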
For dbt Cloud users, set up dbt Cloud CI to run Pull Request jobs and create an Artifacts Job that generates production manifest.json on merges to main/master. Obtain your dbt Cloud access URL and a Service Token, then create a dbt Cloud integration in Datafold using these credentials. Configure the integration with your repository, data connection, primary key tag, and relevant jobs. Our detailed setup guide can be found here.
Datafold is fully compatible with the custom PR schema that dbt Cloud creates for Slim CI jobs.
We outline effective strategies for efficient and scalable data diffing in our performance and scalability guide.
For dbt-specific diff performance, you can exclude certain columns or tables from data diffs in your CI/CD pipeline by adjusting the Advanced settings in your Datafold CI/CD configuration. This helps reduce processing load by focusing diffs on only the most relevant columns.
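If you prefer to keep exclusions in code rather than in the UI, per-model diff settings can also live in a dbt model’s meta block; a sketch, assuming the excluded_columns key and a hypothetical orders model (check Datafold’s documentation for the exact keys your version supports):

    models:
      - name: orders
        meta:
          datafold:
            datadiff:
              excluded_columns:
                - updated_at  # volatile timestamp that would otherwise diff on every run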
Some teams want to show Data Diff results in their tickets before a pull request is ready for review. This speeds up code reviews, as developers can QA code changes before requesting a PR review.
You can trigger a Data Diff by first creating a draft PR and then running the following command via the CLI:
dbt run && datafold diff dbt
This command runs dbt locally and then triggers a Data Diff, allowing you to preview data changes without pushing to Git.
To automate kicking off a Data Diff before code is ready for review, we recommend creating a GitHub Actions job that runs on draft PRs. For example:
name: Data Diff on draft dbt PR

on:
  pull_request:
    types: [opened, reopened, synchronize]
    branches-ignore:
      - main

jobs:
  run:
    if: github.event.pull_request.draft == true # Run only on draft PRs
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Set Up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'

      - name: Install requirements
        run: pip install -r requirements.txt

      - name: Install dbt dependencies
        run: dbt deps

      # Update with your S3 bucket details
      - name: Grab production manifest from S3
        run: |
          aws s3 cp s3://advanced-ci-manifest-demo/manifest.json ./manifest.json
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: us-east-1

      - name: Run dbt and Data Diff
        env:
          DATAFOLD_API_KEY: ${{ secrets.DATAFOLD_API_KEY }}
        run: |
          dbt run
          datafold diff dbt

      # Optional: Submit artifacts to Datafold for more analysis or logging
      - name: Submit artifacts to Datafold
        run: |
          set -ex
          # Replace 350 with your own CI config ID
          datafold dbt upload --ci-config-id 350 --run-type pull_request --commit-sha ${GIT_SHA}
        env:
          DATAFOLD_API_KEY: ${{ secrets.DATAFOLD_API_KEY }}
          GIT_SHA: "${{ github.event.pull_request.head.sha }}"