# What's a Data Diff?

> A data diff is the value-level comparison between two tables, used to identify critical changes to your data and guarantee data quality.

<Tip>
  Data Diffs functionality is available via [MCP](/api-reference/mcp-server-setup) — connect your AI assistant to Datafold and run diffs directly from your development environment.
</Tip>

When you run **git diff**, you compare two versions of your code files to see which lines were added, removed, or modified. Similarly, a **data diff** compares two versions of a dataset, or two tables in different databases, to identify differences down to individual cells.

<img src="https://mintcdn.com/datafold/Q7OqZ4fuuETHBSvX/images/data_diff/what_is_data_diff.png?fit=max&auto=format&n=Q7OqZ4fuuETHBSvX&q=85&s=9c6fdaf4be1e5f7ff097e35e236aad4c" alt="what's a data diff" width="2400" height="1350" data-path="images/data_diff/what_is_data_diff.png" />
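
To make the idea concrete, here is a minimal Python sketch of a value-level diff between two small in-memory tables. It is an illustration only, not how Datafold implements diffing: the `diff_tables` function, the sample rows, and the `id` key column are all hypothetical, and it assumes every row has a unique primary key.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(before: list[Row], after: list[Row], key: str) -> dict[str, list]:
    """Compare two tables row by row, matching rows on a primary key column.

    Hypothetical helper for illustration; assumes `key` is unique per table.
    """
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    # Rows whose primary key exists on only one side.
    removed = sorted(before_by_key.keys() - after_by_key.keys())
    added = sorted(after_by_key.keys() - before_by_key.keys())

    # For keys present on both sides, compare every column value.
    changed = []
    for pk in sorted(before_by_key.keys() & after_by_key.keys()):
        old, new = before_by_key[pk], after_by_key[pk]
        for column in sorted(old.keys() | new.keys()):
            if old.get(column) != new.get(column):
                changed.append((pk, column, old.get(column), new.get(column)))

    return {"added": added, "removed": removed, "changed": changed}


before = [
    {"id": 1, "email": "a@example.com", "plan": "free"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
    {"id": 3, "email": "c@example.com", "plan": "free"},
]
after = [
    {"id": 1, "email": "a@example.com", "plan": "pro"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},
    {"id": 4, "email": "d@example.com", "plan": "free"},
]

result = diff_tables(before, after, key="id")
print(result["added"])    # [4]
print(result["removed"])  # [3]
print(result["changed"])  # [(1, 'plan', 'free', 'pro')]
```

Like a line-level `git diff`, the output separates rows that were added or removed from rows that still exist but whose cell values changed.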

## Why do I need to diff data?

Just as diffing code and text is fundamental to software engineering and working with text documents, diffing data is essential to the data engineering workflow.

Why? In data engineering, both data and the code that processes it are constantly evolving. Without the ability to easily diff data, understanding and tracking data changes becomes challenging. This slows down the development process and makes it harder to ensure data quality.

There is a lot you can do with data diff:

* Test SQL code by comparing development or staging environment data to production
* Compare tables in source and target systems to identify discrepancies when migrating data between databases
* Detect value-level outliers, or unexpected changes, in data flowing through your ETL/ELT pipelines
* Verify that reports generated for regulatory compliance accurately reflect the underlying data by comparing report outputs with source data
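
As a rough sketch of the first use case, the Python snippet below compares a development table to its production counterpart using plain SQL in an in-memory SQLite database. The table names (`prod_orders`, `dev_orders`), columns, and sample rows are made up for illustration, and this hand-written approach is an assumption-laden stand-in rather than Datafold's method.

```python
import sqlite3

# In-memory database with hypothetical production and development tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prod_orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
    CREATE TABLE dev_orders  (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
    INSERT INTO prod_orders VALUES (1, 10.0, 'paid'), (2, 25.5, 'paid'), (3, 7.0, NULL);
    INSERT INTO dev_orders  VALUES (1, 10.0, 'paid'), (2, 20.5, 'paid'), (4, 3.0, 'open');
    """
)

# Primary keys that exist in only one of the two tables.
only_in_prod = [row[0] for row in conn.execute(
    "SELECT id FROM prod_orders EXCEPT SELECT id FROM dev_orders ORDER BY id"
)]
only_in_dev = [row[0] for row in conn.execute(
    "SELECT id FROM dev_orders EXCEPT SELECT id FROM prod_orders ORDER BY id"
)]

# Rows present in both tables whose values differ.
# SQLite's IS NOT is a NULL-safe inequality, so NULL vs. NULL counts as equal.
mismatched = conn.execute(
    """
    SELECT p.id, p.amount, d.amount, p.status, d.status
    FROM prod_orders AS p
    JOIN dev_orders AS d ON p.id = d.id
    WHERE p.amount IS NOT d.amount OR p.status IS NOT d.status
    ORDER BY p.id
    """
).fetchall()

print(only_in_prod)  # [3]
print(only_in_dev)   # [4]
print(mismatched)    # [(2, 25.5, 20.5, 'paid', 'paid')]
```

Hand-written queries like this work for small tables in a single database, but every column comparison has to be listed explicitly, and comparing tables across different databases requires a different approach.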

## Why Datafold?

Data diffing is a fundamental capability in data engineering that every engineer should have access to.

Datafold's [Data Diff](https://www.datafold.com/data-diff) compares datasets fast, within or across databases. As part of Datafold's data quality power tools, Data Diff is fully interoperable with AI agents via [MCP](/datafold-mcp) — so your coding agents can run diffs, validate their own work, and reconcile data across sources programmatically. Datafold offers an enterprise-ready solution for comparing datasets at scale, with comprehensive diffing, API access, and secure deployment options.

Here's how you can identify row-level discrepancies in Datafold:

<Frame>
  <img src="https://mintcdn.com/datafold/hQ4DukKOuaj6vjhH/images/data-diff.png?fit=max&auto=format&n=hQ4DukKOuaj6vjhH&q=85&s=414ebe833488adcc05e8c6f196db307a" alt="Identifying row-level discrepancies in Datafold" width="8271" height="6306" data-path="images/data-diff.png" />
</Frame>

Datafold provides end-to-end tooling for automated data testing, including column-level lineage, ML-based anomaly detection, and enterprise-scale infrastructure support. It is built for complex production scenarios, including:

* Automated and collaborative diffing and testing for data transformations in CI
* Data diffing informed by column-level lineage, and validation of code changes with visibility into BI applications
* Validating large data migrations or continuous replications with automated cross-database diffing capabilities

Here's a high-level overview of what Datafold offers:

|                                                        Feature Category                                                       |                     Datafold                     |
| :---------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------: |
|                      **Database Support**<br />*Databases that are supported for source-destination diff*                     | Any SQL database, inquire about specific support |
|                                    **Scale**<br />*Size of datasets supported for diffing*                                    | Unlimited with advanced performance optimization |
|               **Primary Key Data Type Support**<br />*Data types of primary keys that are supported for diffing*              |  Numerical, string, datetime, boolean, composite |
|                   **Data Types Diffing Support**<br />*Data types that are supported for per-column diffing*                  |                  All data types                  |
|               **Export Diff Results to Database**<br />*Materialize diffing results in your database of choice*               |           <Icon icon="square-check" />           |
|     **Value-level diffs**<br />*Investigate row-by-row column value differences between source and destination databases*     |     <Icon icon="square-check" /> (JSON & GUI)    |
|                **Diff UI**<br />*Explore diffs visually and easily share them with your team and stakeholders*                |           <Icon icon="square-check" />           |
|           **API Access**<br />*Automatically create diffs and receive results at scale using the Datafold REST API*           |           <Icon icon="square-check" />           |
| **Persisting Diff History**<br />*Persist the result history of diffs to know how your data and diffs have changed over time* |           <Icon icon="square-check" />           |
|                          **Scheduled Checks**<br />*Run scheduled diffs for a defined list of tables*                         |           <Icon icon="square-check" />           |
|      **Alerting**<br />*Receive automatic alerts about detected discrepancies between tables within or across databases*      |           <Icon icon="square-check" />           |
|                       **Security and Compliance**<br />*Run diffs in secure and compliant environments*                       |        HIPAA, SOC2 Type II, GDPR compliant       |
|            **Deployment Options**<br />*Deploy your diffs in secure environments that meet your security standards*           |     Multi-tenant SaaS or Single-tenant in VPC    |
|                **Support**<br />*Choose which channels offer the greatest support to your use cases and users*                |   Enterprise support from Datafold team members  |
|        **SLA**<br />*The types of SLAs that exist to guarantee your team can diff and interact with diffs as expected*        |    <Icon icon="square-check" /> (Coming soon)    |

## Three ways to learn more

If you're new to Datafold or data diffing, here are three easy ways to get started:

1. **Explore our CI integration guides**: See how Datafold fits into your continuous integration (CI) pipeline by checking out our guides for [No-Code](../deployment-testing/getting-started/universal/no-code), [API](../deployment-testing/getting-started/universal/api), or [dbt](../integrations/orchestrators) integrations.
2. **Try it yourself**: Use your own data with our [14-day free trial](https://app.datafold.com/) and experience Datafold in action.
3. **Book a demo**: Get a deeper technical understanding of how Datafold integrates with your company’s data infrastructure by [booking a demo](https://www.datafold.com/booktime) with our team.
