OAuth Support
Set up OAuth App Connections in your supported data warehouses to securely execute data diffs on behalf of your users.
OAuth support empowers users to run data diffs based on their individual permissions and roles configured within the data warehouses. This ensures that data access is governed by existing security policies and protocols.
How it works
1. Create a Data Diff
When you attempt to run a data diff, you will notice that it won’t run without authentication:
2. Authorize the Data Diff
Authorize the data diff by clicking the Authenticate button. This will redirect you to the data warehouse for authentication:
Upon successful authentication, you will be redirected back.
3. The Data Diff is now running
4. View the Data Diff results
The results reflect your permissions within the data warehouse:
Note that running the same data diff as a different user renders different results:
The masked values represent the data exactly as retrieved from the data warehouse; Datafold does not apply any post-processing:
By default, results are visible only to their authors. Other users can still clone a data diff, but their results might differ depending on their data warehouse access levels.
For example, as a different user, I won’t be able to access the results of Filip’s data diff:
5. Sharing Data Diffs
Data diff sharing is a feature that enables you to share data diffs with other users. This is useful in scenarios such as compliance verification, where auditors can access specific data diffs without first requiring permissions to be set up in the data warehouse.
Sharing can be accessed via the Actions dropdown on the data diff page:
Note that data diff sharing is disabled by default:
It can be enabled under Org Settings by clicking on Allow Data Diff sharing:
Once enabled, you can share data diffs with other users:
Configuring OAuth
Navigate to Settings and click on your data connection. Then, click on Advanced settings and under OAuth, set the Client Id and Client Secret fields:
Example: Databricks
To create a new Databricks app connection:
1. Go to Settings and App connections.
2. Click Add connection in the top right of the screen and fill in the required fields:
- Application Name:
- Redirect URLs:
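The Redirect URL presumably points at Datafold’s OAuth callback, the same `https://app.datafold.com/api/internal/oauth_dwh/callback` URI used in the BigQuery example below.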
INFO
Datafold caches access tokens and uses refresh tokens to fetch new valid tokens, so diffs can complete while reducing how often users need to authenticate against the data warehouse.
An access token lifetime of one hour is sufficient.
The refresh token lifetime determines how often users must reauthenticate, whether daily, weekly, or monthly.
3. Click Add to obtain the Client ID and Client Secret
4. Fill in the Client ID and Client Secret fields in Datafold’s Data Connection advanced settings:
5. Click Test and save OAuth
You will be redirected to Databricks to complete authentication. If you are already authenticated, you will be redirected back.
This notification signals a successful OAuth configuration:
Additional steps for Databricks
To ensure that users have the correct access rights to temporary tables (stored in the schema specified by Dataset for temporary tables in the Basic settings for the Databricks connection), follow these steps:
- Update the permissions for the Dataset for temporary tables in Databricks.
- Grant Datafold users these permissions: `USE SCHEMA` and `CREATE TABLE`.
This will ensure that materialization results from data diffs are only readable by their authors.
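As a rough Databricks SQL sketch, assuming the temporary dataset is a schema named `datafold_tmp` in the `main` catalog and `user@example.com` is a Datafold user (all hypothetical names):

```sql
-- Hypothetical names; substitute your "Dataset for temporary tables" and your users.
GRANT USE SCHEMA ON SCHEMA main.datafold_tmp TO `user@example.com`;
GRANT CREATE TABLE ON SCHEMA main.datafold_tmp TO `user@example.com`;
```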
Example: Snowflake
To create a new Snowflake app connection:
- Go to Snowflake and run this SQL:
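The exact statement will vary by account; below is a representative sketch of a custom OAuth security integration. The integration name, pre-authorized role, and token validity are assumptions; the redirect URI matches Datafold’s callback shown in the BigQuery example below.

```sql
-- Sketch only: names and values are assumptions; adjust to your account.
CREATE SECURITY INTEGRATION DATAFOLD_OAUTH
  TYPE = OAUTH
  ENABLED = TRUE
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https://app.datafold.com/api/internal/oauth_dwh/callback'
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 86400  -- allowed range: 3600 to 7776000 seconds
  PRE_AUTHORIZED_ROLES_LIST = ('DATAFOLDROLE');
```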
It should result in this message:
CAUTION
- `PRE_AUTHORIZED_ROLES_LIST` must include all roles allowed to use the current security integration.
- By default, `ACCOUNTADMIN`, `SECURITYADMIN`, and `ORGADMIN` are not allowed to be included in `PRE_AUTHORIZED_ROLES_LIST`.
INFO
Datafold caches access tokens and uses refresh tokens to fetch new valid tokens in order to complete the diffs and reduce the number of times users need to authenticate against the data warehouses.
`OAUTH_REFRESH_TOKEN_VALIDITY` can be in the range of 3600 seconds (1 hour) to 7776000 seconds (90 days).
- To retrieve `OAUTH_CLIENT_ID` and `OAUTH_CLIENT_SECRET`, run the following SQL:
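A sketch, assuming the security integration above is named `DATAFOLD_OAUTH`:

```sql
SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('DATAFOLD_OAUTH');
```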
Example result:
- Fill in the Client ID and Client Secret fields in Datafold’s Data Connection advanced settings:
- Click Test and save OAuth
You will be redirected to Snowflake to complete authentication.
INFO
Your default Snowflake role will be used for the generated access token.
This notification signals a successful OAuth configuration:
Additional steps for Snowflake
To guarantee correct access rights to temporary tables (stored in the schema specified by Dataset for temporary tables in the Basic settings for the Snowflake connection):
- Grant the required privileges on the database and the `TEMP` schema to every role that will use the OAuth flow.
- If `SELECT` privileges on tables in the `TEMP` schema were previously granted, revoke them from every role that will use the OAuth flow (except the `DATAFOLDROLE` role).
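A minimal Snowflake sketch, assuming a database named `DATAFOLD`, a schema named `TEMP`, and an OAuth role named `ANALYST` (all hypothetical):

```sql
-- Repeat for every role that will use the OAuth flow.
GRANT USAGE ON DATABASE DATAFOLD TO ROLE ANALYST;
GRANT USAGE, CREATE TABLE ON SCHEMA DATAFOLD.TEMP TO ROLE ANALYST;
-- Revoke any previously granted SELECT on TEMP tables (skip for DATAFOLDROLE).
REVOKE SELECT ON ALL TABLES IN SCHEMA DATAFOLD.TEMP FROM ROLE ANALYST;
```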
CAUTION
If one of the roles has `FUTURE GRANTS` at the database level, that role will also have `FUTURE GRANTS` on the `TEMP` schema.
Example: Redshift
Redshift does not support OAuth2. To execute data diffs on behalf of a specific user, that user needs to provide their own credentials to Redshift.
- Configure permissions on the Redshift side. Grant the necessary access rights to temporary tables (stored in the schema specified by Dataset for temporary tables in the Basic settings for the Redshift connection):
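A sketch of such grants, assuming the temporary schema is named `datafold_tmp` and the Redshift user is `analyst` (both hypothetical):

```sql
-- Allow the user to use the temporary schema and create tables in it.
GRANT USAGE ON SCHEMA datafold_tmp TO analyst;
GRANT CREATE ON SCHEMA datafold_tmp TO analyst;
```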
- As an Administrator, select the Enabled toggle in Datafold’s Redshift Data Connection Advanced settings:
Then, click the Test and Save button.
- As a User, add your Redshift credentials into Datafold. Click on your Datafold username to Edit Profile:
Then, click Add credentials and select the required Redshift data connection from the Data Connections list:
Finally, provide your Redshift username and password, and configure the Delete on field (after this date, your credentials will be removed from Datafold):
Click Create credentials.
Example: BigQuery
- To create a new Google Cloud OAuth 2.0 Client ID, go to the Google Cloud console, navigate to APIs & Services, then Credentials, and click + CREATE CREDENTIALS:
Select OAuth client ID:
From the Application type list, select Web application:
Provide a name in the Name field:
In Authorized redirect URIs, provide `https://app.datafold.com/api/internal/oauth_dwh/callback`:
Click CREATE. Then, download the OAuth Client credentials as a JSON file:
- Activate BigQuery OAuth in Datafold by uploading the JSON OAuth credentials in the JSON OAuth keys file section, in Datafold’s BigQuery Data Connection Advanced settings:
Click Test and Save.
Additional steps for BigQuery
- Create a new temporary schema (dataset) for each OAuth user.
Go to Google Cloud console, navigate to BigQuery, select your project in BigQuery, and click on Create dataset:
Provide `datafold_tmp_<username>` as the Dataset ID and set the same region as configured for other datasets. Click CREATE DATASET:
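Equivalently, a BigQuery SQL sketch (the project ID, username, and location are hypothetical):

```sql
-- Creates the per-user temporary dataset; match the region of your other datasets.
CREATE SCHEMA `my-project.datafold_tmp_jdoe`
OPTIONS (location = 'US');
```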
- Configure permissions for `datafold_tmp_<username>`.
Grant the user read/write/create/delete permissions on their `datafold_tmp_<username>` schema. This can be done by granting roles like BigQuery Data Editor or BigQuery Data Owner, or any custom role with the required permissions.
Go to the Google Cloud console, navigate to BigQuery, select the `datafold_tmp_<username>` dataset, and click Create dataset → Manage Permissions:
Click + ADD PRINCIPAL, specify the user and role, then click SAVE:
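The same grant expressed as a BigQuery SQL sketch (the role, project, dataset, and user are illustrative):

```sql
-- Grants dataset-level read/write access to a single user; adjust the role as needed.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my-project.datafold_tmp_jdoe`
TO "user:jdoe@example.com";
```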
CAUTION
Ensure that only the specified user (excluding admins) has read/write/create/delete permissions on `datafold_tmp_<username>`.
- Configure temporary schema in Datafold.
As a user, navigate to `https://app.datafold.com/users/me`. If the user lacks credentials for BigQuery, click + Add credentials, select the BigQuery datasource from the list, and click Create credentials:
The user will be redirected to `accounts.google.com` and then returned to the previous page:
Select the BigQuery credentials from the list, fill in the Temporary Schema field in the format `<project>.datafold_tmp_<username>`, and click Update:
INFO
Users can update BigQuery credentials only if they have the correct permissions for `datafold_tmp_<username>`.