A core component of Datafold Cloud is the integration of Datafold into your Continuous Integration (CI) process. This is how Datafold creates Data Diffs for all dbt model code changes, catching issues before they make it into production.
Put simply, Continuous Integration (or CI) is a process for building and testing changes to your code before deploying to production.
Without CI:
- Changes are manually coordinated, and often become a complex synchronization chore.
- Testing is done manually, if at all.
- Code changes are released at a slower cadence, and with higher rates of failure.
With CI, you can:
- Smoothly manage code changes, and scale as your team and codebase grow.
- Automate high-confidence test coverage.
- Boost the quantity and quality of developer output.
For Datafold to work in CI, you need to add a step that builds dbt staging data to your CI process in GitHub or GitLab.
Staging data is created using the version of the dbt code in your PR/MR branch, which contains the edits you're currently working on.
dbt in CI: Creating production and staging data
Datafold in CI automatically identifies data differences between production data and staging data.
Data Diff results are then written directly to GitHub/GitLab and can be viewed in the Datafold Cloud application.
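To make the idea of a data difference concrete, here is a toy sketch in Python. It is not how Datafold computes Data Diffs; it only assumes two small in-memory tables (lists of row dictionaries) that share a primary key, and the function name `diff_rows` is hypothetical.

```python
def diff_rows(prod, staging, key):
    """Compare two lists of row dicts by primary key.

    Returns the keys that were added in staging, removed from staging,
    or present in both but with different column values.
    """
    prod_by_key = {row[key]: row for row in prod}
    staging_by_key = {row[key]: row for row in staging}

    added = sorted(staging_by_key.keys() - prod_by_key.keys())
    removed = sorted(prod_by_key.keys() - staging_by_key.keys())
    changed = sorted(
        k
        for k in prod_by_key.keys() & staging_by_key.keys()
        if prod_by_key[k] != staging_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Illustrative rows only; real comparisons run against warehouse tables.
    production = [
        {"id": 1, "amount": 10},
        {"id": 2, "amount": 20},
        {"id": 3, "amount": 30},
    ]
    staging = [
        {"id": 1, "amount": 10},
        {"id": 2, "amount": 25},
        {"id": 4, "amount": 40},
    ]
    print(diff_rows(production, staging, key="id"))
```

In this example, row 2 changed, row 3 is missing from staging, and row 4 is new in staging, which is the kind of summary a reviewer would want to see on a PR/MR.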
dbt builds and updates production data in your warehouse. This is the data your dashboards, BI systems, and users depend on.
When you set up dbt in CI, dbt builds a version of the data in your warehouse that is based on your PR/MR code.
You can use either dbt Cloud or dbt Core to add a step in your CI process that builds staging data.
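As a rough sketch of what such a step might do with dbt Core, the Python below assembles a dbt command line for a staging build. Several things here are assumptions rather than requirements: the `pr_<number>` schema naming, the `prod-artifacts` directory holding the production `manifest.json`, and the choice to build only modified models and their descendants with `state:modified+`. How the schema name reaches dbt (for example, through an environment variable read in `profiles.yml`) depends on your project.

```python
import shlex


def staging_schema(pr_number: int, prefix: str = "pr") -> str:
    """Return a per-PR schema name (hypothetical naming convention)."""
    return f"{prefix}_{pr_number}"


def dbt_staging_command(prod_state_dir: str) -> list[str]:
    """Build the dbt invocation for a CI staging run.

    `--state` points dbt at a directory containing the production
    manifest.json; `state:modified+` selects models that changed relative
    to that manifest, plus everything downstream of them.
    """
    return [
        "dbt", "build",
        "--select", "state:modified+",
        "--state", prod_state_dir,
    ]


if __name__ == "__main__":
    # Print the values instead of running dbt, so the sketch has no side effects.
    print(staging_schema(123))
    print(shlex.join(dbt_staging_command("prod-artifacts")))
```

Building only modified models keeps CI runs short, but building the full project into the staging schema is an equally valid choice.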
Datafold in CI: Comparing production and staging data
Once you have a job in CI that builds staging data, you'll be ready to get started with Datafold in CI!
We'll walk through the setup steps in more detail in the Getting Started section.
How does Datafold in CI work?
- Two versions of your dbt project's manifest.json will be submitted to Datafold, representing the state of production code as well as PR/MR code.
- This submission of dbt artifacts happens out-of-the-box with dbt Cloud.
- dbt Core users can set this up by adding steps to their existing CI configuration in CircleCI, GitHub Actions, or GitLab.
- Datafold uses the two versions of the manifest.json to identify code differences.
- Datafold queries your warehouse and runs Data Diffs on modified models, and identifies downstream impacts on data apps like Hightouch, Mode, Looker, and Tableau.
- The results of the Data Diffs are then written directly to GitHub/GitLab, and more details can be viewed in the Datafold Cloud application.
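The manifest comparison described above can be approximated with a short Python sketch. This is a simplification and not Datafold's implementation: it assumes that a model counts as modified when it is new or when its file `checksum` differs between the two manifests, and it walks the manifest's `child_map` to find downstream nodes. The function names are hypothetical.

```python
import json
from collections import deque


def load_manifest(path):
    """Read a dbt manifest.json file into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def modified_models(prod_manifest, pr_manifest):
    """Return model IDs that are new or whose checksum changed in the PR."""
    prod_nodes = prod_manifest.get("nodes", {})
    changed = []
    for unique_id, node in pr_manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        prod_node = prod_nodes.get(unique_id)
        if prod_node is None or prod_node.get("checksum") != node.get("checksum"):
            changed.append(unique_id)
    return sorted(changed)


def downstream(pr_manifest, roots):
    """Return every node reachable from `roots` via child_map, excluding roots."""
    child_map = pr_manifest.get("child_map", {})
    root_set = set(roots)
    seen = set()
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in child_map.get(current, []):
            if child not in seen and child not in root_set:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    # Paths are placeholders for wherever your CI job stores each manifest.
    prod = load_manifest("prod-artifacts/manifest.json")
    pr = load_manifest("target/manifest.json")
    changed = modified_models(prod, pr)
    print("modified:", changed)
    print("downstream:", downstream(pr, changed))
```

A real comparison may take more than the checksum into account (configuration changes, for example), so treat the output as an illustration of the idea rather than a full impact analysis.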