Options

Info: dbt users should check out our dbt Integration, where you'll find everything you need to get started.

You can use the following options to specify the configuration of a data-diff run.

# Specify the default run parameters
[run.default]
verbose = true
stats = true
data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME --debug -v -k order_id
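As a rough sketch of how the TOML form might be used in practice, the script below writes a small config file and prints the command that would load it through the `--conf` and `--run` switches. The file path, the run name `orders_check`, and the `threads` and `limit` values are hypothetical; the keys are taken from the "Config key" column of the options table. The sketch leaves out the database and table settings that a real run also needs, so it only prints the command instead of executing it.

```shell
# Hypothetical file location and run name; adjust to your setup.
CONF_FILE="${TMPDIR:-/tmp}/datadiff_example.toml"
RUN_NAME="orders_check"

# Write the config file. The quoted 'EOF' keeps the shell from expanding anything inside.
cat > "$CONF_FILE" <<'EOF'
# Specify the default run parameters
[run.default]
verbose = true
stats = true

# A named run (hypothetical) that adds its own settings
[run.orders_check]
threads = 4
limit = 100
EOF

# Build and print the command rather than running it,
# because this sketch does not configure the databases or tables.
CMD="data-diff --conf $CONF_FILE --run $RUN_NAME"
echo "$CMD"
```

Keeping shared settings under `[run.default]` means the command line stays short and individual runs only state what differs.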
| Category | Config key | CLI switch | Description |
|---|---|---|---|
| | help | --help | Show help message and exit. |
| Schema | key_columns | -k or --key-columns | Name of the primary key column. If none provided, default is 'id'. |
| Schema | update_column | -t or --update-column | Name of updated_at/last_updated column. |
| Schema | columns | -c or --columns | Names of extra columns to compare. Can be used more than once in the same command. Accepts a name or a pattern, like in SQL. Example: -c col% -c another_col -c %foorb.r% |
| Schema | assume_unique_key | --assume-unique-key | Skip validating the uniqueness of the key column during joindiff, which is costly in non-cloud dbs. |
| Filtering | min_age | --min-age | Considers only rows older than specified. Useful for specifying replication lag. Example: --min-age=5min ignores rows from the last 5 minutes. Valid units: d, days, h, hours, min, minutes, mon, months, s, seconds, w, weeks, y, years |
| Filtering | max_age | --max-age | Considers only rows younger than specified. See --min-age. |
| Filtering | where | -w, --where | An additional 'where' expression to restrict the search space. |
| Performance | limit | -l or --limit | Maximum number of differences to find (limits maximum bandwidth and runtime). |
| Performance | threads | -j or --threads | Number of worker threads to use per database. Default=1. |
| Performance | algorithm | -a, --algorithm | Force algorithm choice. Options: auto, joindiff, hashdiff |
| Performance | bisection_threshold | --bisection-threshold | Minimal size of segment to be split. Smaller segments will be downloaded and compared locally. |
| Performance | bisection_factor | --bisection-factor | Segments per iteration. When set to 2, it performs binary search. |
| Output | stats | -s or --stats | Print stats instead of a detailed diff. |
| Output | debug | -d or --debug | Print debug info. |
| Output | interactive | -i or --interactive | Confirm queries, implies --debug |
| Output | verbose | -v or --verbose | Print extra info. |
| Output | json | --json | Print JSONL output for machine readability. |
| Output | sample_exclusive_rows | --sample-exclusive-rows | Sample several rows that only appear in one of the tables, but not the other. Use with -s. |
| Output | materialize_all_rows | --materialize-all-rows | Materialize every row, even if they are the same, instead of just the differing rows. |
| Output | materialize | -m, --materialize | Materialize the diff results into a new table in the database. If a table exists by that name, it will be replaced. Use %t in the name to place a timestamp. Example: -m test_mat_%t |
| Output | table_write_limit | --table-write-limit | Maximum number of rows to write when creating materialized or sample tables, per thread. Default=1000. |
| Settings | | --conf, --run | Specify the run and configuration from a TOML file. |
| Settings | no_tracking | --no-tracking | data-diff sends home anonymous usage data. Use this to disable it. |
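To illustrate how several of these switches might be combined, here is a minimal sketch that assembles one command and prints it for review. The connection URIs, the `orders` table, the `updated_at` column, and the `status%` pattern are placeholders, not values from this page; only the switches themselves come from the table above.

```shell
# Placeholder connection strings (hypothetical); substitute your own.
DB1_URI="postgresql://user:password@host1/db"
DB2_URI="postgresql://user:password@host2/db"

# Assemble the command as positional parameters so every argument stays one word.
set -- data-diff "$DB1_URI" orders "$DB2_URI" orders \
  -k order_id \
  -t updated_at \
  --min-age=5min \
  -c 'status%' \
  -j 4 \
  --stats

# Show the command for review; execute it for real with: "$@"
PREVIEW="$*"
echo "$PREVIEW"
```

Here `-k` and `-t` name the key and update columns, `--min-age=5min` skips rows from the last 5 minutes to allow for replication lag, `-c` adds extra columns matching a SQL-style pattern, `-j 4` uses four worker threads per database, and `--stats` prints statistics instead of a detailed diff.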