Learn how to set up and configure Datafold’s API for CI/CD testing.
Field Name | Description |
---|---|
Configuration name | Choose a name for your Datafold dbt integration. |
Repository | Select the repository you configured in step 1. |
Data Source | Select the data source your repository writes to. |

Field Name | Description |
---|---|
Diff Hightouch Models | Run data diffs for Hightouch models affected by your PR. |
CI fails on primary key issues | If null or duplicate primary keys exist, CI will fail. |
Pull Request Label | When this is selected, the Datafold CI process will only run when the ‘datafold’ label has been applied. |
CI Diff Threshold | Data diffs run automatically for a given CI run only if the number of diffs doesn’t exceed this threshold. |
Custom base branch | If defined, the Datafold CI process will only run on pull requests with the specified base branch. |
Files to ignore | Datafold CI diffs all changed models in the PR if at least one modified file doesn’t match the ignore pattern. Datafold CI doesn’t run in the PR if every modified file matches the ignore pattern. |
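
The "Files to ignore" rule can be illustrated with a short Python sketch. It only models the decision described in the table. The function name `should_run_ci` is hypothetical, and `fnmatch` is a stand-in for whatever pattern syntax Datafold actually uses, so treat the matching behavior as an assumption.

```python
from fnmatch import fnmatch


def should_run_ci(modified_files, ignore_patterns):
    """Return True if at least one modified file matches no ignore pattern.

    Hypothetical helper: fnmatch-style globbing is an assumption, not
    Datafold's documented pattern syntax.
    """
    def is_ignored(path):
        return any(fnmatch(path, pattern) for pattern in ignore_patterns)

    return any(not is_ignored(path) for path in modified_files)


patterns = ["docs/*", "*.md"]

# Every modified file is ignored, so CI would not run.
print(should_run_ci(["docs/index.md", "README.md"], patterns))

# One file falls outside the ignore patterns, so all changed models are diffed.
print(should_run_ci(["README.md", "models/orders.sql"], patterns))
```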

Field Name | Description |
---|---|
Enable sampling | Enable sampling for data diffs to make analysis of large datasets more efficient. |
Sampling tolerance | The tolerance to apply in sampling for all data diffs. |
Sampling confidence | The confidence to apply when sampling. |
Sampling threshold | Sampling is disabled automatically for tables smaller than the specified threshold. If unspecified, default values are used depending on the Data Source type. |

Configure your CI script to call the Datafold SDK `ci submit` command. Adapt the command to match your specific use case.
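
As a rough sketch, a CI script could assemble and run the command from Python. The flag names `--ci-config-id`, `--pr-num`, and `--diffs` are assumptions here and should be confirmed with `datafold ci submit --help`. The helper names are hypothetical.

```python
import os
import subprocess


def build_submit_command(ci_config_id, pr_num, diffs_path):
    """Build the argument list for the submit command.

    Flag names are assumptions; verify them against the installed SDK.
    """
    return [
        "datafold", "ci", "submit",
        "--ci-config-id", str(ci_config_id),
        "--pr-num", str(pr_num),
        "--diffs", str(diffs_path),
    ]


def submit(ci_config_id, pr_num, diffs_path):
    """Run the submit command, failing early if the API key is missing."""
    if "DATAFOLD_API_KEY" not in os.environ:
        raise RuntimeError("DATAFOLD_API_KEY is not set")
    subprocess.run(
        build_submit_command(ci_config_id, pr_num, diffs_path),
        check=True,  # raise if the CLI exits with a non-zero status
    )


# Example call from a CI job (values are placeholders):
# submit("<datafold_ci_config_id>", "<pr_num>", "diffs.json")
```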

Describe the diffs to run in a file that follows a specific `json` format. Datafold can then determine which models to diff in a CI run based on the `diffs.json` you pass to the Datafold SDK `ci submit` command.
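
A minimal sketch of producing such a file from Python follows. The schema shown (a list of objects with `prod`, `pr`, and `pk` keys) is an assumption that must be checked against Datafold's schema. The table names and the helper `write_diffs_file` are hypothetical.

```python
import json
from pathlib import Path


def write_diffs_file(diffs, path="diffs.json"):
    """Write the diff definitions to a JSON file and return its path."""
    target = Path(path)
    target.write_text(json.dumps(diffs, indent=2), encoding="utf-8")
    return target


# Assumed schema: one object per table pair to compare.
# Table names and primary key columns below are placeholders.
diffs = [
    {
        "prod": "ANALYTICS.PROD.ORDERS",    # production table
        "pr": "ANALYTICS.PR_123.ORDERS",    # table built from the pull request
        "pk": ["ORDER_ID"],                 # primary key column(s)
    }
]

# Preview the JSON that would be written by write_diffs_file(diffs).
print(json.dumps(diffs, indent=2))
```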

The JSON file is optional: you can achieve the same effect by passing the same content on standard input (stdin). For brevity, the examples in this guide use the JSON file approach.
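
For comparison, the stdin variant could be sketched as below. It assumes the CLI reads the diff definitions from stdin when no file is passed, and that the flags are named `--ci-config-id` and `--pr-num`. Both are assumptions to verify against the SDK, and the helper names are hypothetical.

```python
import json
import subprocess


def build_stdin_submission(ci_config_id, pr_num, diffs):
    """Return (argv, payload) for submitting diffs over stdin.

    Assumes the CLI falls back to stdin when no diffs file is given.
    """
    argv = [
        "datafold", "ci", "submit",
        "--ci-config-id", str(ci_config_id),
        "--pr-num", str(pr_num),
    ]
    payload = json.dumps(diffs)
    return argv, payload


def submit_from_stdin(ci_config_id, pr_num, diffs):
    """Pipe the JSON payload to the CLI instead of writing a file."""
    argv, payload = build_stdin_submission(ci_config_id, pr_num, diffs)
    subprocess.run(argv, input=payload, text=True, check=True)


# Example call (placeholder values):
# submit_from_stdin("<datafold_ci_config_id>", "<pr_num>", [{"prod": "...", "pr": "...", "pk": ["..."]}])
```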

Generating the `diffs.json` file is specific to your use case and therefore out of scope for this guide. The only requirement is to adhere to the JSON schema structure explained above.

Include the `datafold ci submit` step in your PR CI job.

Replace `<datafold_ci_config_id>` with the CI config ID value.

This guide does not cover how to generate `<path_to_diffs_json_file>`, as it heavily depends on your specific use case. However, ensure that the generated file adheres to the required schema outlined above.

Store your Datafold API key under the name `DATAFOLD_API_KEY` in your GitHub repository settings.

Once you’ve completed these steps, Datafold will run data diffs between production and development data on the next GitHub Actions CI run.

To skip Datafold in a CI run, include `datafold-skip-ci` in the last commit message.
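
If your own CI script needs to apply the same rule, for example to avoid generating `diffs.json` when the run will be skipped, the check might look like the sketch below. This illustrates the rule only and is not Datafold's implementation. The helper names are hypothetical.

```python
import subprocess

SKIP_MARKER = "datafold-skip-ci"


def should_skip_datafold(commit_message):
    """Return True if the commit message contains the skip marker."""
    return SKIP_MARKER in commit_message


def last_commit_message():
    """Read the full message of the most recent commit from git."""
    result = subprocess.run(
        ["git", "log", "-1", "--pretty=%B"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(should_skip_datafold("Fix typo in README [datafold-skip-ci]"))
print(should_skip_datafold("Add orders model"))
```

In a CI job, `should_skip_datafold(last_commit_message())` would combine the two helpers.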