Open Source Data Diff
Three Use Cases
Our open source data diff package, data-diff, has three modes, each with a different use case in mind:
- dbt for comparing dbt models within the same data source
- joindiff for comparing tables within the same data source
- hashdiff for comparing tables across different data sources (e.g., Postgres and Snowflake)
Getting Started
Install
To get started with any of the use cases above, install data-diff and the relevant database connector(s):
- Snowflake
pip install data-diff 'data-diff[snowflake]' -U
- BigQuery
pip install data-diff google-cloud-bigquery -U
Additional BigQuery details
Only dbt projects that use the OAuth via gcloud connection method are currently supported.
For example, run:
gcloud auth application-default login
before running a data-diff command.
- Redshift
pip install data-diff 'data-diff[redshift]' -U
- PostgreSQL
pip install data-diff 'data-diff[postgres]' -U
Supported for PostgreSQL >=10
If you need support for an earlier version, please open an issue.
- Databricks
pip install data-diff 'data-diff[databricks]' -U
- DuckDB
pip install data-diff 'data-diff[duckdb]' -U
Supported for DuckDB >=0.6
If you need support for an earlier version, please open an issue.
- Other
pip install data-diff 'data-diff[<database_name>]' -U
Additionally Supported Databases
The following databases are supported for hashdiff and joindiff, but not dbt:
- MySQL
- Clickhouse
- Presto
- Trino
- Vertica
- Oracle
For example, for MySQL, use 'data-diff[mysql]' in place of 'data-diff[<database_name>]'.
If you'd like to request support for another database, please open an issue.
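If you script your environment setup, the install commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of data-diff itself: the names `CONNECTOR_PACKAGES` and `pip_install_args` are hypothetical, and the package strings are copied from the commands in this section. It assumes that any database not listed follows the generic 'data-diff[<database_name>]' pattern.

```python
import shlex
from typing import List

# Package arguments copied from the install commands in this section.
# BigQuery is the one case that does not use a data-diff extra.
CONNECTOR_PACKAGES = {
    "snowflake": "data-diff[snowflake]",
    "bigquery": "google-cloud-bigquery",
    "redshift": "data-diff[redshift]",
    "postgres": "data-diff[postgres]",
    "databricks": "data-diff[databricks]",
    "duckdb": "data-diff[duckdb]",
}


def pip_install_args(database: str) -> List[str]:
    """Return the pip argument list for installing data-diff plus one connector."""
    key = database.lower()
    # Assumption: unlisted databases use the generic 'data-diff[<database_name>]' form.
    connector = CONNECTOR_PACKAGES.get(key, f"data-diff[{key}]")
    return ["pip", "install", "data-diff", connector, "-U"]


if __name__ == "__main__":
    # Print shell-quoted commands for a few databases.
    for name in ("snowflake", "bigquery", "mysql"):
        print(shlex.join(pip_install_args(name)))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting; `shlex.join` is only used for display.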
Run
Once you've installed data-diff, you can run it from the command line (see below) or via the Python API (see docs).
- dbt
caution
If you are a dbt user, check out our docs on Development Testing with Open Source.
- joindiff
data-diff <DB_URI> <TABLE_NAME_1> <TABLE_NAME_2> [OPTIONS]
- hashdiff
data-diff <DB_URI_1> <TABLE_NAME_1> <DB_URI_2> <TABLE_NAME_2> [OPTIONS]
info
You can find the URI string for your database, a full list of options, and joindiff and hashdiff examples in our Reference section.
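The two command forms differ only in whether a second database URI appears before the second table name. As a rough illustration, the sketch below assembles the argument list for either form. The helper name `build_data_diff_command` and the URIs and table names in the demo are hypothetical placeholders, not values from data-diff; check the Reference section for the actual URI format of your database.

```python
import shlex
from typing import List, Optional


def build_data_diff_command(
    db_uri_1: str,
    table_1: str,
    table_2: str,
    db_uri_2: Optional[str] = None,
    options: Optional[List[str]] = None,
) -> List[str]:
    """Return the argument list for a data-diff command-line invocation.

    One URI:  data-diff <DB_URI> <TABLE_NAME_1> <TABLE_NAME_2> [OPTIONS]
    Two URIs: data-diff <DB_URI_1> <TABLE_NAME_1> <DB_URI_2> <TABLE_NAME_2> [OPTIONS]
    """
    if db_uri_2 is None:
        # Same data source: both tables are resolved against one URI.
        argv = ["data-diff", db_uri_1, table_1, table_2]
    else:
        # Different data sources: each table is preceded by its own URI.
        argv = ["data-diff", db_uri_1, table_1, db_uri_2, table_2]
    return argv + list(options or [])


if __name__ == "__main__":
    # Hypothetical URIs and table names, for illustration only.
    same_source = build_data_diff_command(
        "postgresql://user@localhost/analytics", "orders", "orders_backup"
    )
    cross_source = build_data_diff_command(
        "postgresql://user@localhost/analytics",
        "orders",
        "orders",
        db_uri_2="postgresql://user@replica-host/analytics",
    )
    print(shlex.join(same_source))
    print(shlex.join(cross_source))
```

The resulting list can be printed for review, as here, or passed to `subprocess.run` once data-diff and the relevant connector are installed.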