What is Datafold’s resource consumption footprint? How will Datafold affect my data warehouse costs?
Because efficient data reconciliation matters, Datafold provides several strategies to keep the diffing process as efficient as possible:

Efficient Algorithm

Datafold connects to any SQL source and target database, similar to how BI tools do. Datasets from both data connections are co-located in a centralized database to execute comparisons and identify the specific rows, columns, and values that differ. To run diffs faster at massive scale, users can apply sampling, filtering, and column selection.

Flexible Controls

Users can easily control the volume of data used in diffing by using:
- Filters: Focus on the most relevant part of the dataset
- Sampling: Set sampling as a percentage of rows or desired confidence level
- Slim Diff: Selectively diff only the models that have dbt code changes in your pull request
- On the Datafold side: Set desired concurrency
- On the database side: Most databases support workload management settings to ensure that Datafold does not consume more than X% CPU or Y% RAM
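
To make the sampling option above more concrete, here is a rough sketch of how a desired confidence level can translate into a number of rows to sample. It is an illustration only and not Datafold's actual sampling method: the function names `rows_to_sample` and `sample_percentage` are hypothetical, and the model assumes that rows are drawn independently and that a fixed fraction of rows (`diff_rate`) differs between the two datasets.

```python
import math


def rows_to_sample(confidence: float, diff_rate: float) -> int:
    """Smallest sample size n such that, if a fraction `diff_rate` of rows
    differs, at least one differing row is sampled with probability
    >= `confidence`. Assumes independent draws (hypothetical model, not
    Datafold's documented algorithm)."""
    if not (0.0 < confidence < 1.0):
        raise ValueError("confidence must be strictly between 0 and 1")
    if not (0.0 < diff_rate < 1.0):
        raise ValueError("diff_rate must be strictly between 0 and 1")
    # P(no differing row in n draws) = (1 - diff_rate) ** n <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - diff_rate))


def sample_percentage(total_rows: int, confidence: float, diff_rate: float) -> float:
    """Express the required sample as a percentage of the table,
    capped at 100% for small tables."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    n = min(rows_to_sample(confidence, diff_rate), total_rows)
    return 100.0 * n / total_rows


if __name__ == "__main__":
    n = rows_to_sample(confidence=0.95, diff_rate=0.01)
    pct = sample_percentage(total_rows=1_000_000, confidence=0.95, diff_rate=0.01)
    print(f"rows to sample: {n}, share of a 1M-row table: {pct:.4f}%")
```

Under these assumptions, the required sample size depends on the confidence level and the expected difference rate rather than on the table size, which is why sampling can cut warehouse compute so sharply on very large tables.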