Explore configuration options for CI/CD testing in Datafold.

Configuration

Datafold

Datafold is the unified platform for proactive data quality that combines automated data testing, data reconciliation, and observability to help data teams prevent data quality issues and accelerate their development velocity.

Welcome

A data diff is the value-level comparison between two tables, used to identify critical changes to your data and guarantee data quality.

What's a Data Diff?

Data diffs allow you to perform value-level comparisons between any two datasets within the same database, across different databases, or even between files.

How Datafold Diffs Data

Datafold allows you to diff files (e.g. CSV, Excel, Parquet, etc.) in a similar way to how you diff tables.

File Diffing

How connection budgets are enforced across data diffs in Datafold

Connection Budgets

Learn how Datafold integrates with your Continuous Integration (CI) process to create Data Diffs for all SQL code changes, catching issues before they make it into production.

How Datafold in CI Works

Manage Datafold monitors via version-controlled YAML for greater scalability, governance, and flexibility in code-based workflows.

Monitors as Code

The UI visually maps workflows and tracks column-level or tabular lineages, helping users understand the impact of upstream changes.

How It Works

Datafold offers a column-level and tabular lineage view.

Lineage

View a data profile that summarizes key table and column-level statistics, and any upstream dependencies.

Profile

Datafold provides full-cycle migration automation with SQL code translation and cross-database validation for data warehouse, transformation framework, and hybrid migrations.

Datafold for Migration Automation

Automatically migrate data environments of any scale and complexity with Datafold's Migration Agent.

Datafold Migration Agent

Validate migration parity with Datafold's cross-database diffing solution.

Cross-Database Diffing for Migrations

Set up OAuth App Connections in your supported data warehouses to securely execute data diffs on behalf of your users.

OAuth

OAuth Support

Datafold is a web-based application with multiple deployment options, including multi-tenant SaaS and dedicated cloud (either customer- or Datafold-hosted).

Deployment Options

Learn how to deploy Datafold in a Virtual Private Cloud (VPC) on AWS.

Datafold VPC Deployment on AWS

Learn how to deploy Datafold in a Virtual Private Cloud (VPC) on GCP.

Datafold VPC Deployment on GCP

Learn how to deploy Datafold in a Virtual Private Cloud (VPC) on Azure.

Azure

Datafold VPC Deployment on Azure

Compliance & Trust Center

Datafold supports multiple options to secure connections between your resources (e.g., databases and BI tools) and Datafold.

Securing Connections

Datafold uses role-based access control to manage user permissions and actions.

User Roles and Permissions

Datafold offers multiple support channels to assist users with troubleshooting and inquiries.

Support

Introduction

Datafold facilitates data diffing by supporting a wide range of basic data types across major database systems like BigQuery, PostgreSQL, Redshift, Databricks, and Snowflake.

Data Types

Datafold SDK

Get Audit Logs

List CI runs

Trigger a PR/MR run

Upload PR/MR changes

List data sources

Create a data source

Get data source testing results

List data source types

Get a data source

Get a data source summary

Test a data source connection

All fields accept multiple items, separated by commas.
Date fields also support ranges, using the following syntax:

- ``<DATETIME`` = before DATETIME
- ``>DATETIME`` = after DATETIME
- ``DATETIME`` = between DATETIME and DATETIME + 1 MINUTE
- ``DATE`` = start of that DATE until DATE + 1 DAY
- ``DATETIME1<<DATETIME2`` = between DATETIME1 and DATETIME2
- ``DATE1<<DATE2`` = between DATE1 and DATE2
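The range syntax above can be sketched as a few small string builders. This is a minimal illustration, not part of the Datafold SDK: the helper names (`before`, `after`, `between`, `any_of`) are hypothetical, and serializing values as ISO 8601 text is an assumption, since the list above only names `DATE` and `DATETIME` without fixing a format.

```python
from datetime import date, datetime
from typing import Union

# A filter value may be a calendar date or a full timestamp.
Moment = Union[date, datetime]


def _fmt(value: Moment) -> str:
    # Assumption: ISO 8601 text; the exact format the API expects is not stated above.
    return value.isoformat()


def before(value: Moment) -> str:
    """Build '<DATETIME': everything before the given moment."""
    return "<" + _fmt(value)


def after(value: Moment) -> str:
    """Build '>DATETIME': everything after the given moment."""
    return ">" + _fmt(value)


def between(start: Moment, end: Moment) -> str:
    """Build 'DATETIME1<<DATETIME2' (or 'DATE1<<DATE2')."""
    return _fmt(start) + "<<" + _fmt(end)


def any_of(*items: str) -> str:
    """Join several filter items with the comma delimiter."""
    return ",".join(items)


if __name__ == "__main__":
    print(before(date(2024, 1, 1)))
    print(between(date(2024, 1, 1), date(2024, 1, 31)))
    print(any_of("alice", "bob"))
```

Passing a bare date or timestamp without a prefix corresponds to the single-value forms in the list (a one-day or one-minute window), so no helper is needed for those cases.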

List data diffs

Create a data diff

Get a data diff

Update a data diff

Get a data diff summary

Retrieve a list of columns or tables which depend on the given column.

Get column downstreams

Retrieve a list of columns or tables which the given column depends on.

Get column upstreams

Retrieve a list of tables which depend on the given table.

Get table downstreams

Retrieve a list of tables which the given table depends on.

Get table upstreams

Return all integrations for Mode/Tableau/Looker

List all integrations

Create a dbt BI integration

Returns the integration with changed fields.

Update a dbt BI integration

Create a Hightouch integration

Only the schedule can be updated. Returns the integration with changed fields.

Update a Hightouch integration

Create a Looker integration

Update a Looker integration

Create a Mode Analytics integration

Update a Mode Analytics integration

Create a Power BI integration

Only the name can be updated. Returns the integration with changed fields.

Rename a Power BI integration

Create a Tableau integration

Update a Tableau integration

Returns the integration for Mode/Tableau/Looker/Hightouch by its ID.

Get an integration

Remove an integration

Start an unscheduled synchronization of the integration.

Sync a BI integration

List Monitors

Create a Data Diff Monitor

Create a Metric Monitor

Create a Schema Change Monitor

Create a Data Test Monitor

Get Monitor

Delete a Monitor

Trigger a run

List Monitor Runs

Get Monitor Run

Toggle a Monitor

Update a Monitor

Get answers to the most common questions regarding our product.

Overview

Data Diffing

CI/CD Testing

Data Migration Automation

Data Reconciliation

Data Monitoring and Observability

Integrating Datafold with dbt

Data Storage and Security

Performance and Scalability

Resource Management

About Datafold

Blog

API Reference

Frequently Asked Questions

Setting up a new data diff in Datafold is straightforward.

Creating a New Data Diff

Once your data diff is complete, Datafold provides a concise, high-level summary of the detected changes in the Overview tab.

Results

We share best practices that will help you get the most accurate and efficient results from your data diffs.

Best Practices

Datafold's Data Diff can compare data across databases (e.g., PostgreSQL <> Snowflake, or between two SQL Server instances) efficiently and with minimal possible egress by leveraging stochastic in-database checksumming.

When dealing with large datasets, it's crucial to approach diffing with specific optimization strategies in mind. We share best practices that will help you get the most accurate and efficient results from your data diffs.

Datafold's cross-database diffing algorithm efficiently compares datasets between different databases.

How Cross-Database Diffing Works

Learn how to set up CI/CD testing with Datafold by integrating your data connections, code repositories, and CI pipeline for automated testing.

Getting Started

Getting Started with CI/CD Testing

Datafold requires a primary key to perform data diffs. Using dbt metadata, Datafold identifies the column to use as the primary key for accurate data diffs.

Primary Key Inference

Specify column renaming in your git commit message so Datafold can map renamed columns to their original counterparts in production for accurate comparison.

Column Remapping

Explore best practices for CI/CD testing in Datafold.

Choose which downstream tables to diff to optimize time, cost, and performance.

Slim Diff

Ensure that Datafold in CI runs an apples-to-apples comparison between staging and production environments.

Handling Data Drift

Monitoring your data for unexpected changes is one of the cornerstones of data observability.

Monitor Types

Data Diff monitors compare datasets across or within databases, identifying row and column discrepancies with customizable scheduling and notifications.

Data Diff Monitors

Metric monitors detect anomalies in your data using ML-based algorithms or manual thresholds, supporting standard and custom metrics for tables or columns.

Metric Monitors

Data Tests validate your data against off-the-shelf checks or custom business rules.

Data Test Monitors

Schema Change monitors notify you when a table’s schema changes, such as when columns are added, removed, or data types are modified.

Schema Change Monitors

Datafold can automatically ingest dbt metadata from your production environment and display it in Data Explorer.

dbt Metadata Sync

Set up your Data Connection with Datafold.

Data Connections

Set Up Your Data Connection

Azure Data Lake Storage

Azure Data Lake Storage (ADLS)

Amazon S3

Athena

BigQuery

Databricks

Dremio

Google Cloud Storage (GCS)

MongoDB

MySQL

Netezza

Oracle

Snowflake

PostgreSQL

Redshift

SAP HANA

Microsoft SQL Server

Starburst

Teradata

Integrate Datafold with dbt Core, dbt Cloud, Airflow, or custom orchestrators to streamline your data workflows with automated monitoring, testing, and seamless CI integration.

Orchestrators

Integrate with Orchestrators

Set up Datafold’s integration with dbt Core to automate Data Diffs in your CI pipeline.

dbt Core

Integrate Datafold with dbt Cloud to automate Data Diffs in your CI pipeline, leveraging dbt jobs to detect changes and ensure data quality before merging.

dbt Cloud

Integrate Datafold with your custom orchestration using the Datafold SDK and REST API.

Custom Integrations

Visualize downstream Tableau dependencies and understand how warehouse changes impact your BI layer.

Tableau

Looker

Navigate to Settings > Integrations > Data Apps and add a Hightouch Integration.

Configuration

Primary Key Inference

Column Remapping

Datafold CI Triggers

Model-specific CI Configs