This feature is currently supported for Databricks, Snowflake, Redshift, and BigQuery.
How it works
1. Create a Data Diff
When you attempt to run a data diff, you will notice that it won’t run without authentication:
2. Authorize the Data Diff
Authorize the data diff by clicking the Authenticate button. This will redirect you to the data warehouse for authentication:
3. The Data Diff is now running

4. View the Data Diff results
The results reflect your permissions within the data warehouse:

5. Sharing Data Diffs
Data diff sharing enables you to share data diffs with other users. This is useful in scenarios such as compliance verification, where auditors can access specific data diffs without first requiring permissions to be set up in the data warehouse. Sharing can be accessed via the Actions dropdown on the data diff page:

Configuring OAuth
Navigate to Settings and click on your data connection. Then click on Advanced settings and, under OAuth, set the Client ID and Client Secret fields:
Example: Databricks
To create a new Databricks app connection:
1. Go to Settings and App connections.
2. Click Add connection in the top right of the screen, then fill in the required fields:
INFO: Datafold caches access tokens and uses refresh tokens to fetch new valid tokens, in order to complete diffs and reduce the number of times users need to authenticate against the data warehouse. One hour is sufficient for the access token lifetime. The refresh token lifetime determines how often users must reauthenticate, whether daily, weekly, or monthly.
3. Click Add to obtain the Client ID and Client Secret

4. Fill in the Client ID and Client Secret fields in Datafold’s Data Connection advanced settings:

5. Click Test and save OAuth
You will be redirected to Databricks to complete authentication. If you are already authenticated, you will be redirected back. This notification signals a successful OAuth configuration:
Additional steps for Databricks
To ensure that users have the correct access rights to temporary tables (stored in the Dataset for temporary tables provided in the Basic settings for the Databricks connection), follow these steps:
1. Update the permissions for the Dataset for temporary tables in Databricks.
2. Grant these permissions to Datafold users: USE SCHEMA and CREATE TABLE (see the sketch below).
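A sketch in Databricks SQL, assuming a hypothetical Unity Catalog schema main.datafold_tmp as the Dataset for temporary tables and a hypothetical user principal:

```sql
-- Sketch only: catalog, schema, and principal are placeholders.
GRANT USE SCHEMA ON SCHEMA main.datafold_tmp TO `user@example.com`;
GRANT CREATE TABLE ON SCHEMA main.datafold_tmp TO `user@example.com`;
```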

Example: Snowflake
To create a new Snowflake app connection:
1. Go to Snowflake and run this SQL:
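The statement below is a sketch, not the exact required configuration: the integration name, pre-authorized roles, and token validity are placeholders, and the redirect URI shown is the Datafold callback URL used elsewhere in this guide:

```sql
-- Sketch only: adjust names, roles, and validity to your account.
CREATE SECURITY INTEGRATION datafold_oauth
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https://app.datafold.com/api/internal/oauth_dwh/callback'
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 86400
  PRE_AUTHORIZED_ROLES_LIST = ('DATAFOLDROLE', 'ANALYST');
```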
CAUTION: PRE_AUTHORIZED_ROLES_LIST must include all roles allowed to use the security integration. By default, ACCOUNTADMIN, SECURITYADMIN, and ORGADMIN are not allowed to be included in PRE_AUTHORIZED_ROLES_LIST.
INFO: Datafold caches access tokens and uses refresh tokens to fetch new valid tokens, in order to complete diffs and reduce the number of times users need to authenticate against the data warehouse. OAUTH_REFRESH_TOKEN_VALIDITY can be in the range of 3600 (1 hour) to 7776000 (90 days).
2. To retrieve OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET, run the following SQL:
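For example, assuming the integration was named datafold_oauth as in the sketch above:

```sql
-- Returns the OAuth client ID and secrets for the named security integration.
SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('DATAFOLD_OAUTH');
```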
Example result:
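The output is a JSON object similar to the following (placeholder values, not real credentials):

```json
{
  "OAUTH_CLIENT_SECRET_2": "s3cr3t-2-placeholder",
  "OAUTH_CLIENT_SECRET": "s3cr3t-placeholder",
  "OAUTH_CLIENT_ID": "client-id-placeholder"
}
```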

3. Fill in the Client ID and Client Secret fields in Datafold’s Data Connection advanced settings:

4. Click Test and save OAuth.

Additional steps for Snowflake
To guarantee correct access rights to temporary tables (stored in the Dataset for temporary tables provided in the Basic settings for the Snowflake connection):
1. Grant the required privileges on the database and the TEMP schema to every role that will use the OAuth flow.
2. Revoke SELECT privileges on tables in the TEMP schema from every role that will use the OAuth flow (except the DATAFOLDROLE role), if they were previously granted. A sketch of both steps follows this list.
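A sketch, assuming a hypothetical database analytics, temporary-tables schema temp, and role analyst_role:

```sql
-- 1. Grant the required privileges to each role using the OAuth flow.
GRANT USAGE ON DATABASE analytics TO ROLE analyst_role;
GRANT USAGE, CREATE TABLE ON SCHEMA analytics.temp TO ROLE analyst_role;

-- 2. Revoke SELECT on TEMP tables from OAuth roles other than DATAFOLDROLE.
REVOKE SELECT ON ALL TABLES IN SCHEMA analytics.temp FROM ROLE analyst_role;
```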
CAUTION: If one of the roles has FUTURE GRANTS at the database level, that role will also have FUTURE GRANTS on the TEMP schema.

Example: Redshift
Redshift does not support OAuth 2.0. To execute data diffs on behalf of a specific user, that user needs to provide their own credentials to Redshift.
1. Configure permissions on the Redshift side. Grant the necessary access rights to temporary tables (stored in the Dataset for temporary tables provided in the Basic settings for the Redshift connection; a sketch of these grants follows this list):
2. As an Administrator, select the Enabled toggle in Datafold’s Redshift Data Connection Advanced settings:

3. As a User, add your Redshift credentials into Datafold. Click on your Datafold username to Edit Profile:
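The step 1 grants, as Redshift SQL (a sketch: the schema datafold_tmp and user analyst_user are placeholders):

```sql
-- Sketch only: replace the schema and user names with your own.
GRANT USAGE ON SCHEMA datafold_tmp TO analyst_user;
GRANT CREATE ON SCHEMA datafold_tmp TO analyst_user;
```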

Example: BigQuery
1. To create a new Google Cloud OAuth 2.0 Client ID, go to the Google Cloud console, navigate to APIs & Services, then Credentials, and click + CREATE CREDENTIALS:

2. When configuring the OAuth client, set the authorized redirect URI to https://app.datafold.com/api/internal/oauth_dwh/callback:

3. Activate BigQuery OAuth in Datafold by uploading the JSON OAuth credentials in the JSON OAuth keys file section, in Datafold’s BigQuery Data Connection Advanced settings:

Additional steps for BigQuery
1. Create a new temporary schema (dataset) for each OAuth user.

In the Google Cloud console, enter datafold_tmp_<username> as the Dataset ID and set the same region as configured for other datasets. Click CREATE DATASET:
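Equivalently, the dataset can be created with BigQuery SQL (a sketch: the project, username, and location are placeholders):

```sql
-- Sketch only: substitute your project, username, and region.
CREATE SCHEMA `my-project.datafold_tmp_jdoe`
  OPTIONS (location = 'US');
```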

2. Configure permissions for datafold_tmp_<username>. Each OAuth user needs write access to their datafold_tmp_<username> schema. This can be done by granting roles like BigQuery Data Editor or BigQuery Data Owner, or any custom role with the required permissions. In the Google Cloud console, navigate to BigQuery, select the datafold_tmp_<username> dataset, and open Manage Permissions (a SQL alternative is sketched below):


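The same access, as a BigQuery SQL sketch (project, dataset, and user are placeholders):

```sql
-- Sketch only: grants BigQuery Data Editor on the user's temporary dataset.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my-project.datafold_tmp_jdoe`
TO "user:jdoe@example.com";
```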
3. Configure the temporary schema in Datafold.
Go to https://app.datafold.com/users/me. If the user lacks credentials for BigQuery, click + Add credentials, select the BigQuery data source from the list, and click Create credentials:

You will be redirected to accounts.google.com to authenticate and then returned to the previous page:

Set the temporary schema to <project>.<datafold_tmp_<username>>, and click Update:

INFO: Users can update BigQuery credentials only if they have the correct permissions for datafold_tmp_<username>.