Overview

The top-level menu displays the diff status, job ID, creation and completion times, runtime, and data connection.

Match Score

The Match Score is the percentage shown in the Overview tab. It summarizes how similar the two datasets are across rows, columns, and values in a single number between 0% and 100%.

Formula

Match Score = max(0, total_cells - non_matching_cells) / total_cells
The result is clamped to the range 0%–100%. Total cells is the sum of cells in both tables:
total_cells = (rows_A × cols_A) + (rows_B × cols_B)
Non-matching cells is the sum of three contributions:
Source | Contribution to non-matching cells
Value differences (cells in both tables with different values) | value_diffs × 2
Exclusive rows (rows present in only one table) | exclusive_rows × shared_columns
Exclusive columns (columns present in only one table) | (extra_cols_A × rows_A) + (extra_cols_B × rows_B)
Value differences are multiplied by 2 because each differing cell is counted once on Table A’s side of the denominator and once on Table B’s side. This keeps the score on a consistent scale regardless of whether a discrepancy comes from a value change, a missing row, or a missing column.

Example

Table A has 100 rows × 10 columns (1,000 cells). Table B is identical in shape and content except for 4 differing values.
  • Total cells: 1,000 + 1,000 = 2,000
  • Non-matching cells: 4 × 2 = 8
  • Match Score: (2,000 − 8) / 2,000 = 99.6%
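
The formula and the worked example above can be sketched in a few lines of Python. This is a minimal illustration, not Datafold's implementation: the function name `match_score` and its parameter names are hypothetical, chosen to mirror the terms used in the formula. The zero-cell guard follows the empty-table rule described under Edge cases.

```python
def match_score(rows_a, cols_a, rows_b, cols_b,
                value_diffs=0, exclusive_rows=0, shared_columns=0,
                extra_cols_a=0, extra_cols_b=0):
    """Return the Match Score as a percentage between 0.0 and 100.0.

    All names here are hypothetical and only mirror the documented formula.
    """
    # Total cells is the sum of cells in both tables.
    total_cells = rows_a * cols_a + rows_b * cols_b
    if total_cells == 0:
        # Documented edge case: empty tables score 0%, not 100%.
        return 0.0

    # Sum of the three documented contributions.
    non_matching_cells = (
        value_diffs * 2
        + exclusive_rows * shared_columns
        + extra_cols_a * rows_a
        + extra_cols_b * rows_b
    )

    score = max(0, total_cells - non_matching_cells) / total_cells
    # Clamp to the documented 0%-100% range.
    return min(1.0, score) * 100


# Worked example: two 100 x 10 tables with 4 differing values.
print(round(match_score(100, 10, 100, 10, value_diffs=4), 1))  # 99.6
```

When sampling is enabled, the row counts passed in would be the sampled row counts rather than the full table sizes.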

Why an extra column lowers the score

Every contribution penalizes the score, not just value differences. A table with identical values but one extra column will not score 100%, because that column’s cells exist on only one side. For example, if Table A in the example above gains one extra column (100 rows × 1 column = 100 cells), those 100 cells are added to both the total and the non-matching count, dropping the score to (2,100 − 100) / 2,100 ≈ 95.2%.
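
To show how the three contributions could be counted from actual data, here is a small sketch that compares two in-memory tables. It is an illustration under assumptions, not how Datafold computes the score internally: the representation (a dict mapping each primary key to a dict of column values) and the function name `match_score_from_tables` are hypothetical, and every row in a table is assumed to have the same columns.

```python
def match_score_from_tables(table_a, table_b):
    """Compute a Match Score (0.0-100.0) for two small in-memory tables.

    Hypothetical representation: {primary_key: {column_name: value}}.
    """
    cols_a = set().union(*(row.keys() for row in table_a.values()))
    cols_b = set().union(*(row.keys() for row in table_b.values()))
    shared_cols = cols_a & cols_b
    shared_keys = table_a.keys() & table_b.keys()

    # Cells present in both tables whose values differ.
    value_diffs = sum(
        1
        for key in shared_keys
        for col in shared_cols
        if table_a[key][col] != table_b[key][col]
    )
    # Rows whose primary key appears in only one table.
    exclusive_rows = len(table_a.keys() ^ table_b.keys())

    total_cells = len(table_a) * len(cols_a) + len(table_b) * len(cols_b)
    if total_cells == 0:
        return 0.0  # documented edge case for empty tables

    non_matching_cells = (
        value_diffs * 2
        + exclusive_rows * len(shared_cols)
        + len(cols_a - cols_b) * len(table_a)  # columns only in A
        + len(cols_b - cols_a) * len(table_b)  # columns only in B
    )
    return min(1.0, max(0, total_cells - non_matching_cells) / total_cells) * 100


# Identical values, but table_b carries one extra column ("flag").
table_a = {k: {"id": k, "name": "x"} for k in (1, 2, 3)}
table_b = {k: {"id": k, "name": "x", "flag": True} for k in (1, 2, 3)}

# Total cells: 3*2 + 3*3 = 15; non-matching cells: 1 extra column * 3 rows = 3.
print(round(match_score_from_tables(table_a, table_b), 1))  # 80.0
```

Even though every shared value matches, the three cells of the extra column exist on one side only, so the score lands below 100%.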

Edge cases

  • Empty tables: if neither table has any cells, the Match Score is 0%, not 100%. There is nothing to match.
  • Sampling: when sampling is enabled, the Match Score is computed using the sampled row counts, not the full table sizes. The score reflects the sample.

Columns

The Columns tab displays a table with detailed column and type mappings from the two datasets being diffed, with status indicators for each column comparison (e.g., identical, percentage of values different). This provides a quick way to identify data inconsistencies and prioritize updates.

Primary keys

This tab highlights rows that are unique to the Target dataset in a data diff (“Rows exclusive to Target”). Because these rows exist in the Target dataset but not in the Source dataset, based on the primary key, they flag potential data discrepancies. The Clone diffs and materialize results button lets you rerun an existing data diff with its results materialized in the warehouse, along with any other modifications you want to make.

Values

This tab displays rows where at least one column value differs between the datasets being compared. It is useful for quickly assessing the extent of discrepancies between the two datasets. The Show filters button enables the following features:
  • Highlight characters: highlight value differences between tables
  • % of difference: filters and displays columns based on the specified percentage range of value differences