Overview

The top-level menu displays the diff status, job ID, creation and completion times, runtime, and data connection.

Columns

The Columns tab displays a table with detailed column and type mappings from the two datasets being diffed, with status indicators for each column comparison (e.g., identical, percentage of values different). This provides a quick way to identify data inconsistencies and prioritize updates.
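As a rough illustration of what a "percentage of values different" indicator means, the sketch below computes, for each non-key column, the share of differing values among rows whose primary key appears in both datasets. It is not Datafold's implementation. The function name `column_diff_rates`, the list-of-dicts row format, and the `key` parameter are assumptions made for this example.

```python
def column_diff_rates(main_rows, test_rows, key):
    """Return {column: percent of differing values} over keys present in both datasets.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    main_by_key = {row[key]: row for row in main_rows}
    test_by_key = {row[key]: row for row in test_rows}
    shared_keys = main_by_key.keys() & test_by_key.keys()

    # Compare every column except the primary key itself.
    columns = [c for c in main_rows[0] if c != key] if main_rows else []

    rates = {}
    for col in columns:
        differing = sum(
            1 for k in shared_keys
            if main_by_key[k].get(col) != test_by_key[k].get(col)
        )
        rates[col] = 100.0 * differing / len(shared_keys) if shared_keys else 0.0
    return rates
```

A column with a rate of 0.0 would correspond to an "identical" status. Any other value is the percentage of matched rows in which that column differs.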

Primary keys

This tab highlights rows that exist only in the Test dataset and not in the Main dataset, matched on the primary key (“Rows exclusive to Test”). These rows flag potential data discrepancies.

The Show filters button allows you to filter these rows by selected column(s).

The Clone diffs and materialize results button allows you to rerun an existing data diff with its results materialized in the warehouse, along with any other modifications you want to make.
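The idea behind "Rows exclusive to Test" can be sketched in a few lines: collect the primary keys in Main, then keep the Test rows whose key is not among them. This is only an illustration under stated assumptions, not how Datafold computes the result. The function name `rows_exclusive_to_test` and the list-of-dicts row format are hypothetical.

```python
def rows_exclusive_to_test(main_rows, test_rows, key):
    """Return Test rows whose primary key does not appear in Main.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    main_keys = {row[key] for row in main_rows}
    return [row for row in test_rows if row[key] not in main_keys]
```

For example, if Main holds keys 1 and 2 and Test holds keys 2 and 3, only the Test row with key 3 is returned.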

Column Profiles

Column Profiles displays aggregate statistics and distributions including averages, counts, ranges, and histogram charts representing column-level differences.

The Show filters button allows you to adjust chart values by relative (percentage) or absolute numbers.

Values

This tab displays rows where at least one column value differs between the datasets being compared. It is useful for quickly assessing the extent of discrepancies between the two datasets.

The Show filters button enables the following features:

  • Highlight characters: highlights value differences between tables
  • % of difference: filters and displays columns based on the specified percentage range of value differences
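The rule behind the Values tab, showing rows where at least one column value differs, can be sketched as follows. Rows are matched on the primary key, and rows that exist in only one dataset are skipped because they are a primary-key difference rather than a value difference. The function name `differing_rows` and the list-of-dicts row format are assumptions for this example, not Datafold's API.

```python
def differing_rows(main_rows, test_rows, key):
    """Return (key, [changed columns]) for matched rows with at least one difference.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    main_by_key = {row[key]: row for row in main_rows}
    result = []
    for test_row in test_rows:
        main_row = main_by_key.get(test_row[key])
        if main_row is None:
            continue  # Row exists only in Test: a key difference, not a value difference.
        changed = [
            col for col in test_row
            if col != key and main_row.get(col) != test_row[col]
        ]
        if changed:
            result.append((test_row[key], changed))
    return result
```

Returning the list of changed columns alongside each key mirrors what a reader looks for in this tab: which rows differ, and in which columns.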

Timeline

The Timeline tab is a specialized feature that only appears if the time-series dimension column has been selected. It graphically represents data differences over time to highlight discrepancies. It only displays columns with data differences, and differences are presented as the share of mismatched data (percentage mismatched).

Seeing differences plotted over time makes it easier to pinpoint when inconsistencies began, supports decisions with a visual representation of the data, and speeds up identifying and resolving data-related issues.

The Timeline feature is particularly useful when an incremental model is mismanaged, leading to improper backfilling. The graph lets you track the resulting inconsistencies over time and pinpoint the specific time frames where the errors occurred, so that you can take a more targeted approach to fixing them.

It is also useful in correlating data differences with specific time intervals that coincide with changing data connections. When switching over or stitching together different data connections, there’s often a shift in how data behaves over time. The Timeline graph helps flag the potential impact of the source change on data consistency and integrity.
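The "percentage mismatched" measure described for the Timeline tab can be approximated with a small sketch: group matched rows by a time bucket and compute the share whose value differs in each bucket. This is an illustration under stated assumptions, not Datafold's implementation. The function name `mismatch_share_by_period`, the parameters `time_col` and `value_col`, and the list-of-dicts row format are all hypothetical.

```python
from collections import defaultdict


def mismatch_share_by_period(main_rows, test_rows, key, time_col, value_col):
    """Return {period: percent of matched rows whose value_col differs}.

    Hypothetical helper for illustration only; the period is taken from Main's time_col.
    """
    main_by_key = {row[key]: row for row in main_rows}
    totals = defaultdict(int)
    mismatched = defaultdict(int)

    for test_row in test_rows:
        main_row = main_by_key.get(test_row[key])
        if main_row is None:
            continue  # Only rows present in both datasets are compared.
        period = main_row[time_col]
        totals[period] += 1
        if main_row[value_col] != test_row[value_col]:
            mismatched[period] += 1

    return {p: 100.0 * mismatched[p] / totals[p] for p in sorted(totals)}
```

A spike confined to a narrow range of periods would point to the kind of problem described above, such as a backfill that missed certain dates or a switch in the data connection at a known point in time.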

Downstream Impact

This tab displays all associated BI and data app dependencies, such as dashboards and views, linked to the compared datasets. This illustrates the impact of data changes on downstream data assets.

Each listed dependency is shown with a link to its lineage diagram within Datafold’s column-level lineage. You can filter by tables or columns within tables, or open this view in Data Explorer for further analysis.