    Data Reconciliation

    Datafold’s data diff connects to the source and target databases and performs a fast, accurate, and detailed comparison of datasets, providing aggregate summaries as well as column- and value-level insights into any discrepancies.

    1. Datafold connects to any SQL source and target database, similar to how BI tools do.
    2. Datafold does not need to extract the entire datasets for comparison. Instead, it relies on stochastic checksumming to identify discrepancies and extracts only those rows for analysis.
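    The checksum idea in item 2 can be illustrated with a minimal Python sketch. It swaps in a simple deterministic scheme: hash a key range on each side, bisect only the ranges whose hashes disagree, and extract rows only from small mismatching ranges. Datafold's actual stochastic checksumming may work differently. Tables are modeled here as in-memory dicts mapping an integer primary key to a row tuple, and the function names are hypothetical. In a real database the per-segment hash would be computed by a SQL aggregate inside each database, so only hashes, not rows, would cross the network.

```python
import hashlib
from typing import Dict, List

Table = Dict[int, tuple]  # hypothetical model: primary key -> row values


def segment_checksum(table: Table, lo: int, hi: int) -> str:
    """Hash every row whose key falls in [lo, hi), visiting keys in sorted order."""
    h = hashlib.sha256()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(repr((key, table[key])).encode("utf-8"))
    return h.hexdigest()


def diff_keys(source: Table, target: Table, lo: int, hi: int,
              min_segment: int = 4) -> List[int]:
    """Return keys in [lo, hi) whose rows differ or exist on only one side."""
    if segment_checksum(source, lo, hi) == segment_checksum(target, lo, hi):
        return []  # matching segment: nothing needs to be extracted
    if hi - lo <= max(min_segment, 1):
        # Small mismatching segment: extract its rows and compare them directly.
        keys = {k for k in source if lo <= k < hi} | {k for k in target if lo <= k < hi}
        return sorted(k for k in keys if source.get(k) != target.get(k))
    # Otherwise split the range in half and only descend into mismatching halves.
    mid = (lo + hi) // 2
    return (diff_keys(source, target, lo, mid, min_segment)
            + diff_keys(source, target, mid, hi, min_segment))
```

    For example, if `target` is a copy of `source` with one row changed, one row deleted and one row added, `diff_keys(source, target, 0, 200)` returns just those three keys while every matching segment is skipped after a single hash comparison.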

    Datafold’s cross-database diffing produces the following results:

    1. High-Level Summary:
      • Total number of different rows
      • Total number of rows (primary keys) that are present in one database, but not the other
      • Aggregate schema differences
    2. Schema Differences: Per-column mapping of data types, column order, etc.
    3. Primary Key Differences: Sample of specific rows that are present in one database, but not the other
    4. Value-Level Differences: Sample of differing values for each column with identified discrepancies; the full set of differences can be downloaded or materialized to the warehouse
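    To illustrate how these result categories relate, here is a hedged Python sketch that computes a comparable summary for two small in-memory datasets keyed by primary key. The function name, the output field names, and the choice to count as "differing rows" only rows whose key exists on both sides are assumptions for illustration, not Datafold's output format.

```python
from typing import Any, Dict, List, Tuple

Dataset = Dict[Any, Dict[str, Any]]  # primary key -> {column: value}


def summarize_diff(source: Dataset, target: Dataset,
                   sample_size: int = 3) -> Dict[str, Any]:
    """Summarize row-, key- and value-level differences between two keyed datasets."""
    # Primary keys present in one dataset but not the other.
    only_in_source = sorted(k for k in source if k not in target)
    only_in_target = sorted(k for k in target if k not in source)

    differing_rows = 0
    # column -> sample of (primary key, source value, target value)
    value_samples: Dict[str, List[Tuple[Any, Any, Any]]] = {}
    for key in sorted(k for k in source if k in target):
        s_row, t_row = source[key], target[key]
        row_differs = False
        for col in sorted(set(s_row) | set(t_row)):
            if s_row.get(col) != t_row.get(col):
                row_differs = True
                samples = value_samples.setdefault(col, [])
                if len(samples) < sample_size:  # keep only a sample per column
                    samples.append((key, s_row.get(col), t_row.get(col)))
        if row_differs:
            differing_rows += 1

    return {
        "differing_rows": differing_rows,
        "only_in_source": only_in_source,
        "only_in_target": only_in_target,
        "value_diff_samples": value_samples,
    }
```

    Keeping only a bounded sample per column mirrors the idea that the summary stays small while the complete set of differences is exported separately.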

    You can check out what the results look like in the App.

    You can run data diffs in three ways:

    1. Via Datafold’s interactive UI
    2. Via the Datafold API
    3. On a schedule (as a monitor) with optional alerting via Slack, email, PagerDuty, etc.

    Yes, users can run as many diffs as they like, with concurrency limited only by the underlying databases.
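    A client that launches many diffs at once can cap its own concurrency so it does not overload the databases. The sketch below assumes a hypothetical `run_diff` callable (for example, a wrapper around an API call) and uses a thread pool whose `max_workers` bounds how many diffs are in flight at any moment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

TablePair = Tuple[str, str]  # (source table, target table)


def run_diffs(pairs: List[TablePair],
              run_diff: Callable[[TablePair], str],
              max_concurrency: int = 4) -> Dict[TablePair, str]:
    """Run one diff per table pair with at most max_concurrency in flight.

    `run_diff` is a hypothetical callable supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves input order, so results line up with pairs.
        results = list(pool.map(run_diff, pairs))
    return dict(zip(pairs, results))
```

    Choosing `max_concurrency` per database (rather than globally) is a reasonable refinement when source and target have very different capacity.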

    When data is changing and being replicated live, we recommend watermarking: diffing only the rows created or updated within a specified time window (e.g., filtered on an updated_at timestamp).
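    A minimal sketch of watermarking in Python, assuming a DB-API driver that uses the `%s` placeholder style (e.g. psycopg2; other drivers use `?` or named placeholders). The helper names and the idea of ending the window one replication lag before "now" are illustrative assumptions. The same window must be applied to both source and target so that both sides describe the same set of rows.

```python
from datetime import datetime, timedelta
from typing import Tuple


def settled_window(now: datetime, lookback: timedelta,
                   lag: timedelta) -> Tuple[datetime, datetime]:
    """Return [start, end) ending `lag` before `now`, so rows that may still
    be in flight through replication are excluded from the comparison."""
    end = now - lag
    return end - lookback, end


def watermarked_query(table: str, ts_column: str, start: datetime,
                      end: datetime) -> Tuple[str, Tuple[datetime, datetime]]:
    """Build a query limited to rows created/updated in [start, end).

    Table and column names are interpolated directly, so they must come from
    trusted configuration; the window bounds are passed as bind parameters.
    """
    sql = f"SELECT * FROM {table} WHERE {ts_column} >= %s AND {ts_column} < %s"
    return sql, (start, end)
```

    The resulting `(sql, params)` pair would be executed with `cursor.execute(sql, params)` on both the source and the target connection.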

    Datafold performs best-effort type matching when deterministic type casting is possible, e.g., comparing a VARCHAR column with a STRING column. When automatic casting without information loss is not possible, users can define the casts manually by diffing in Query mode.
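    The decision between automatic matching and a manual cast can be sketched as a lookup of lossless type equivalences. The type families below are hypothetical examples, not Datafold's actual matching rules, and the check is deliberately conservative: parameterized types such as DECIMAL must match exactly.

```python
from typing import List, Set

# Hypothetical families of type names whose values can be compared without
# information loss; real matching rules are database-specific.
EQUIVALENT_TYPE_FAMILIES: List[Set[str]] = [
    {"VARCHAR", "STRING", "TEXT"},
    {"SMALLINT", "INT", "INTEGER", "BIGINT"},
]


def base_type(type_name: str) -> str:
    """Drop type parameters and normalize case: 'varchar(255)' -> 'VARCHAR'."""
    return type_name.split("(", 1)[0].strip().upper()


def can_auto_match(type_a: str, type_b: str) -> bool:
    """True if the types are identical or share a lossless family."""
    full_a, full_b = type_a.strip().upper(), type_b.strip().upper()
    if full_a == full_b:
        return True  # same type, including parameters such as precision
    a, b = base_type(full_a), base_type(full_b)
    return any(a in family and b in family for family in EQUIVALENT_TYPE_FAMILIES)
```

    When `can_auto_match` returns False, a Query mode diff would carry an explicit cast instead, for example `SELECT id, CAST(amount AS DECIMAL(18, 2)) AS amount FROM orders` (table and column names hypothetical).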

    Yes, users can reshape an input dataset by writing a SQL query and diffing in Query mode, bringing it into a shape that can be compared with the other dataset. Datafold also supports column remapping for tables whose column names differ.
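    Column remapping amounts to renaming columns on one side before the comparison. The Python sketch below works on in-memory rows, and the mapping format (old name to new name) is an assumption; in Query mode the same effect comes from SQL aliases such as `SELECT cust_id AS customer_id FROM customers` (names hypothetical).

```python
from typing import Any, Dict, List


def remap_columns(rows: List[Dict[str, Any]],
                  mapping: Dict[str, str]) -> List[Dict[str, Any]]:
    """Rename columns per `mapping` (old name -> new name); others are unchanged.

    Raises ValueError if two columns would end up with the same name.
    """
    remapped = []
    for row in rows:
        new_row: Dict[str, Any] = {}
        for col, val in row.items():
            name = mapping.get(col, col)
            if name in new_row:
                raise ValueError(f"column name collision on {name!r}")
            new_row[name] = val
        remapped.append(new_row)
    return remapped
```

    Raising on collisions avoids silently overwriting a value when a mapped name clashes with an existing column.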

    To make provisioning at scale easier, you can create data diffs programmatically via the Datafold API.
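    A hedged sketch of provisioning many diffs with Python's standard library. The endpoint URL, the `Authorization: Key ...` header format, and the payload field names (`data_source1_id`, `table1`, `pk_columns`, and so on) are assumptions; confirm them in the Datafold API reference before use. The function only builds the requests; sending each one with `urllib.request.urlopen(req)` is left to the caller.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumed endpoint: verify the path against the Datafold API reference.
API_URL = "https://app.datafold.com/api/v1/datadiffs"


def build_diff_requests(api_key: str, source_ds_id: int, target_ds_id: int,
                        tables: List[Tuple[str, str]],
                        pk_columns: List[str]) -> List[urllib.request.Request]:
    """Build one POST request per (source table, target table) pair.

    All payload field names below are assumptions for illustration.
    """
    reqs = []
    for source_table, target_table in tables:
        payload = {
            "data_source1_id": source_ds_id,
            "data_source2_id": target_ds_id,
            "table1": source_table.split("."),  # e.g. ["db", "schema", "table"]
            "table2": target_table.split("."),
            "pk_columns": pk_columns,
        }
        reqs.append(urllib.request.Request(
            API_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Authorization": f"Key {api_key}",  # assumed auth scheme
                     "Content-Type": "application/json"},
            method="POST",
        ))
    return reqs
```

    Generating the table list from a migration manifest or information schema query keeps the diffs in step with the tables being moved.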
