Our Fully-Automated CI integration enables you to automatically diff tables modified in a pull request so you know exactly how your data will change before going to production. We do this by analyzing the SQL in any changed files, extracting the relevant table names, and diffing those tables between your staging and production environments. We then post the results of those diffs, including any downstream impact, to your pull request for all to see. All without manual intervention.
Prerequisites
- Your code must be hosted in one of our supported version control integrations
- Your tables/views must be defined in SQL
- Your schema names must be parameterized (see below)
- You must be automatically generating staging data (more info)
Get Started
Get started in just a few easy steps.
1. Generate a Datafold API key
If you haven’t already generated an API key (you only need one), visit Settings > Account and select Create API Key. Save the key somewhere safe like a password manager, as you won’t be able to view it later.
2. Set up a version control integration
Open the Datafold app and navigate to Settings > Integrations > Repositories to connect the repository that contains the code you’d like to automatically diff.
3. Add a step to your CI workflow
This example assumes you’re using GitHub Actions, but the approach generalizes to any version control tool we support, including GitLab and Bitbucket.
4. Parameterize schema names
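If it’s not already the case, you’ll need to parameterize the schema for any table paths you’d like Datafold to diff. For example, let’s say you have a file called dim_orgs.sql that defines a table called DIM_ORGS in your warehouse. Rather than hard-coding the schema name in that file, the SQL should reference the schema through a parameter, so the same file can resolve to either your staging or your production table.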
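To make the idea of a parameterized schema concrete, here is a minimal Python sketch of one SQL file resolving to two different table paths. Everything specific in it is an assumption for illustration: the `${schema_name}` placeholder syntax, the `analytics` database, the column names, and the `prod` and `pr_123` schema names are hypothetical, and your own templating mechanism may look different.

```python
from string import Template

# Hypothetical contents of dim_orgs.sql. The ${schema_name} placeholder,
# the "analytics" database, and the columns are illustrative assumptions.
DIM_ORGS_SQL = Template(
    "CREATE OR REPLACE TABLE analytics.${schema_name}.dim_orgs AS\n"
    "SELECT org_id, org_name, created_at\n"
    "FROM raw.orgs"
)


def render(schema_name: str) -> str:
    """Fill in the schema placeholder for one environment."""
    return DIM_ORGS_SQL.substitute(schema_name=schema_name)


# The same file yields one table path per environment.
production_sql = render("prod")    # hypothetical production schema
staging_sql = render("pr_123")     # hypothetical per-PR staging schema

print(production_sql.splitlines()[0])
print(staging_sql.splitlines()[0])
```

Because only the schema segment of the path changes, the staging and production versions of DIM_ORGS can be paired up and compared without any per-table configuration.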
5. Provide primary keys (optional)
While this step is technically optional, we strongly recommend providing primary keys for any tables you’d like Datafold to diff. Primary keys are provided with a SQL comment that uses the -- datafold: pk=<your_pk> syntax.
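As a rough illustration of the primary key comment, the Python sketch below shows a hypothetical dim_orgs.sql that carries a `-- datafold: pk=...` line, together with a small helper that reads the declared key back out. The helper is not Datafold's parser. It is only an assumed pre-flight check you could run yourself, and the `org_id` column, the placement of the comment on the first line, and the `${schema_name}` placeholder are all assumptions.

```python
import re

# Hypothetical contents of dim_orgs.sql; org_id is an assumed key column
# and placing the comment on the first line is an assumption.
SQL_FILE = """\
-- datafold: pk=org_id
CREATE OR REPLACE TABLE analytics.${schema_name}.dim_orgs AS
SELECT org_id, org_name FROM raw.orgs
"""

# Matches a whole-line comment of the form "-- datafold: pk=<your_pk>".
PK_COMMENT = re.compile(r"^--\s*datafold:\s*pk=(\S+)\s*$", re.MULTILINE)


def declared_primary_key(sql_text: str):
    """Return the key named in the pk comment, or None if there is none."""
    match = PK_COMMENT.search(sql_text)
    return match.group(1) if match else None


print(declared_primary_key(SQL_FILE))
print(declared_primary_key("SELECT 1"))
```

A check along these lines could be used to flag changed SQL files that do not declare a primary key before they reach the diff step.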
