Frequently Asked Questions

Resource Management

What is Datafold’s resource consumption footprint? How will Datafold affect my data warehouse costs?

Recognizing the importance of efficient data reconciliation, we provide a number of strategies to make the diffing process as efficient as possible:

Efficient Algorithm

The diffing algorithm itself leverages stochastic checksumming which is optimized for efficiency at scale. It provides detailed comparison by pushing down the compute to both source and target databases without requiring the extraction of datasets for comparison.

Flexible Controls

Users can easily control the volume of data used in diffing by using:

Filters: Focus on the most relevant part of the dataset
Sampling: Set sampling as a percentage of rows or desired confidence level
Slim Diff: Selectively diff only the models that have dbt code changes in your pull request.
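
To illustrate how filtering and sampling shrink the amount of data a diff has to touch, here is a minimal Python sketch. It is not Datafold's implementation: the function names (`in_sample`, `sampled_diff`), the hash-based sampling scheme, and the in-memory `{key: row}` dictionaries are all assumptions made for this example. The idea shown is that a deterministic hash of the primary key selects the same rows on both sides, so only that subset needs to be compared.

```python
import hashlib


def in_sample(key, percent):
    """Return True if `key` falls into a deterministic `percent`% sample.

    Hypothetical helper: hashing the key means source and target
    independently select the same rows without coordinating.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def sampled_diff(source, target, percent, row_filter=None):
    """Compare only sampled (and optionally filtered) rows.

    `source` and `target` are {primary_key: row} dicts standing in for
    two tables. Returns the sorted keys whose rows differ or are
    missing on one side.
    """
    differing = []
    for key in sorted(set(source) | set(target)):
        if not in_sample(key, percent):
            continue  # row is outside the sample
        src, tgt = source.get(key), target.get(key)
        if row_filter is not None:
            # Keep the key only if at least one existing row passes the filter.
            if not any(r is not None and row_filter(r) for r in (src, tgt)):
                continue
        if src != tgt:
            differing.append(key)
    return differing
```

With `percent=100` every row is compared. Lowering the percentage reduces the rows examined roughly in proportion, at the cost of possibly missing differences that fall outside the sample.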

Workload Management

Users can apply controls to enforce low diffing footprint:

On the Datafold side: Set desired concurrency
On the database side: Most databases support workload management settings to ensure that Datafold does not consume more than X% CPU or Y% RAM
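
As a rough illustration of what a concurrency limit does, the following Python sketch runs a batch of jobs with a fixed cap on how many execute at once. It does not use any Datafold API: `run_with_concurrency_limit` and the stand-in jobs are hypothetical, and in practice the concurrency setting is configured in Datafold itself rather than in user code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_concurrency_limit(jobs, max_concurrency):
    """Run callables with at most `max_concurrency` in flight.

    Returns (results in submission order, peak number of jobs that
    were observed running at the same time).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(job):
        # Record how many jobs are running so the cap can be verified.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return job()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the concurrency cap: extra jobs wait in the queue.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(tracked, jobs))
    return results, state["peak"]
```

A lower cap means fewer queries hit the warehouse simultaneously, which trades total wall-clock time for a smaller peak load.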

Also consider that catching issues before production with a data quality tool like Datafold can reduce costs over time, because it lowers the need for expensive reprocessing and troubleshooting. Features such as filtering, sampling, and Slim Diff ensure that only relevant datasets are tested, which minimizes the computational load on your data warehouse. This targeted approach can lead to more efficient resource usage and potentially lower data warehouse operating costs.
