CI/CD Testing
You can use SQL filters to ensure that Datafold compares equivalent subsets of data between your staging/dev and production environments, allowing for accurate data quality checks despite the difference in data volume.
Yes, you can use Datafold in development. It helps catch data quality issues early by comparing data changes in your development environment before they reach production. This proactive approach ensures that errors and inconsistencies are identified and resolved during the development process, enhancing overall data reliability and preventing potential issues in production. Data teams can leverage the Datafold SDK to run data diffs from the command line while developing and testing data models.
Data drift in CI occurs when the two data transformation builds that are compared by Datafold in CI have differing data outputs due to the upstream data changing over time.
We have a few recommended strategies for dealing with data drift in our docs here.
Some teams want to show Data Diff results in their tickets before creating a pull request. This speeds up code reviews as developers can QA code changes before requesting a PR review.
If you use dbt, we explain how you can automate this workflow here.