How Datafold Diffs Data


Running a diff requires three basic inputs: the data connections, the names or paths of the datasets to compare, and the primary key (one or more columns that uniquely identify rows in the datasets).
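Because the primary key must uniquely identify rows, it can help to check a candidate key before diffing. The snippet below is a minimal sketch of such a check in plain Python; the function `duplicate_keys` and the sample `orders` rows are hypothetical and are not part of Datafold.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once, sorted.

    An empty result means the chosen columns uniquely identify every row.
    """
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical sample data: order lines, where order_id alone repeats.
orders = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
]

single = duplicate_keys(orders, ["order_id"])
composite = duplicate_keys(orders, ["order_id", "line_no"])
```

Here `order_id` alone is not a valid primary key, while the composite key of `order_id` and `line_no` is.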

What types of data can data diffs compare?

Diffs can compare data in tables, views, and SQL queries (in relational databases and data lakes), as well as files such as CSV, Excel, and Parquet. Datafold supports a wide range of basic data types across major database systems, including Snowflake, Databricks, BigQuery, Redshift, and PostgreSQL, among many others.

Creating data diffs

Diffs can be created in several ways:

Interactively through the Datafold app
Programmatically via our REST API
As part of a Continuous Integration (CI) workflow for Deployment Testing

How in-database diffing works

When both datasets live in the same physical database or data lake namespace, Datafold compares them by executing SQL queries in the target database. It uses several JOIN-type queries and aggregate queries to surface differences at the row, value, and column levels, and to calculate differences in metrics and distributions.
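To illustrate the JOIN-based idea, here is a minimal sketch that uses SQLite as a stand-in for the target database. It is not Datafold's actual query set: the function `diff_tables`, the table names, and the sample rows are all assumptions made for illustration. Anti-joins find keys that exist on only one side, and an inner join with a NULL-safe comparison finds rows whose values changed.

```python
import sqlite3

def diff_tables(conn, left, right, key, columns):
    """Classify primary keys as removed, added, or changed between two tables.

    Table and column names are interpolated directly into the SQL,
    so this sketch assumes they come from a trusted source.
    """
    def keys(sql):
        return [row[0] for row in conn.execute(sql)]

    # Anti-join: keys in `left` with no matching row in `right`.
    removed = keys(
        f"SELECT l.{key} FROM {left} l LEFT JOIN {right} r ON l.{key} = r.{key} "
        f"WHERE r.{key} IS NULL ORDER BY l.{key}"
    )
    # Anti-join in the other direction: keys only in `right`.
    added = keys(
        f"SELECT r.{key} FROM {right} r LEFT JOIN {left} l ON l.{key} = r.{key} "
        f"WHERE l.{key} IS NULL ORDER BY r.{key}"
    )
    # Inner join: keys on both sides where at least one column differs.
    # SQLite's `IS NOT` treats NULLs as comparable values.
    mismatch = " OR ".join(f"l.{c} IS NOT r.{c}" for c in columns)
    changed = keys(
        f"SELECT l.{key} FROM {left} l JOIN {right} r ON l.{key} = r.{key} "
        f"WHERE {mismatch} ORDER BY l.{key}"
    )
    return {"removed": removed, "added": added, "changed": changed}

# Hypothetical sample tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prod_users (id INTEGER PRIMARY KEY, name TEXT, score INTEGER);
CREATE TABLE dev_users  (id INTEGER PRIMARY KEY, name TEXT, score INTEGER);
INSERT INTO prod_users VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
INSERT INTO dev_users  VALUES (1, 'a', 10), (2, 'b', 25), (4, 'd', 40);
""")

result = diff_tables(conn, "prod_users", "dev_users", "id", ["name", "score"])
```

In this sample, row 3 exists only in `prod_users`, row 4 exists only in `dev_users`, and row 2 has a different `score`. Because every query runs inside the database, only the differing keys travel back to the client.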

How cross-database diffing works

When comparing data across databases, Datafold uses checksumming and interval search to diff the data quickly and at minimal cost. A diff can both assess the overall magnitude of differences and identify the specific rows, columns, and values that differ, without copying entire datasets over the network. This efficiency lets cross-database diffing scale to datasets with trillions of rows or terabytes of data.
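The general technique of checksumming plus interval search can be sketched as follows. This is an illustrative approximation, not Datafold's implementation: two dictionaries keyed by an integer primary key stand in for the two databases, `segment_checksum` stands in for an aggregate query that each database would run locally, and the names and `threshold` parameter are assumptions. Segments whose checksums match are skipped; mismatching segments are split in half until they are small enough to compare row by row.

```python
import hashlib

def segment_checksum(table, lo, hi):
    """Return (row count, checksum) for rows with lo <= key < hi.

    In a real system this would be computed inside each database,
    so only the small result crosses the network.
    """
    digest = hashlib.sha256()
    count = 0
    for key in sorted(k for k in table if lo <= k < hi):
        digest.update(repr((key, table[key])).encode())
        count += 1
    return count, digest.hexdigest()

def diff_keys(a, b, lo, hi, threshold=4):
    """Return the sorted keys in [lo, hi) whose rows differ between a and b."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # identical segment: nothing to fetch
    if hi - lo <= threshold:
        # Small segment: compare the actual rows key by key.
        return [k for k in range(lo, hi) if a.get(k) != b.get(k)]
    mid = (lo + hi) // 2
    return diff_keys(a, b, lo, mid, threshold) + diff_keys(a, b, mid, hi, threshold)

# Hypothetical data: 1,000 rows, with one changed row and one missing row.
source = {k: (f"user{k}", k * 10) for k in range(1, 1001)}
target = dict(source)
target[137] = ("user137", -1)
del target[802]

differing = diff_keys(source, target, 1, 1001)
```

Only the segments that contain keys 137 and 802 are ever examined row by row; every other segment is ruled out by a single checksum comparison, which is why the approach avoids copying whole datasets.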
