Skip to main content

OAuth support documentation

To execute data diffs on behalf of your users, you can configure OAuth App Connections in your supported data warehouses.

OAuth support empowers users to run data diffs based on their individual permissions and roles configured within the data warehouses. This ensures that data access is governed by existing security policies and protocols.

How it works

1. Create a Data Diff.

When you attempt to run a data diff, you will notice that it won't run without authentication:

2. Authorize the Data Diff.

Authorize the data diff by clicking the Authenticate button. This will redirect you to the data warehouse for authentication:

Upon successful authentication, you will be redirected back.

3. The Data Diff is now running 🎉

4. View the Data Diff results.

The results reflect your permissions within the data warehouse:

Note that running the same data diff, as a different user, renders different results:

info

The masked values represent the data retrieved from the data warehouse. We do not conduct any post-processing:

Data diff results are only visible for their authors; however, they can be cloned for sharing the results with other users.

For example, as a DevOps user, I won't be able to access the data diff results for Filip's data diff:

Configuring OAuth

Navigate to Settings and click on your data source. Then, click on Advanced settings and under OAuth, set the Client Id and Client Secret fields:

Example: Databricks

To create a new Databricks app connection:

  1. Go to Settings and App connections.

  2. Click Add connection in the top right of the screen.

  1. Fill in the required fields:

Application Name:

Datafold OAuth connection

Redirect URLs:

https://app.datafold.com/api/internal/oauth_dwh/callback
info

Datafold caches access tokens and using refresh tokens fetches new valid tokens in order to complete the diffs and reduce the number of times users need to authenticate against the data warehouses.

One hour is sufficient for the access token.

The refresh token will determine the frequency of user reauthentication, whether it's daily, weekly, or monthly.

  1. Click Add to obtain the Client ID and Client Secret 🙌

  1. Fill in the Client ID and Client Secret fields in Datafold's Data Source advanced settings:

  1. Click Test and save OAuth

You will be redirected to Databricks to complete authentication. If you are already authenticated, you will be redirected back.

This notification signals a successful OAuth configuration:

Example: Snowflake

To create a new Snowflake app connection:

  1. Go to Snowflake and run this SQL:
CREATE SECURITY INTEGRATION DATAFOLD_OAUTH
TYPE = OAUTH
ENABLED = TRUE
OAUTH_CLIENT = CUSTOM
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://app.datafold.com/api/internal/oauth_dwh/callback'
PRE_AUTHORIZED_ROLES_LIST=(<ROLENAME1>, <ROLENAME2>, ...)
OAUTH_ISSUE_REFRESH_TOKENS = TRUE
OAUTH_REFRESH_TOKEN_VALIDITY = 604800
OAUTH_ENFORCE_PKCE=TRUE;

It should result in this message:

caution
  • PRE_AUTHORIZED_ROLES_LIST must include all roles allowed to use the current security integration.

  • By default, ACCOUNTADMIN, SECURITYADMIN, and ORGADMIN are not allowed to be included in PRE_AUTHORIZED_ROLES_LIST.

info

Datafold caches access tokens and uses refresh tokens to fetch new valid tokens in order to complete the diffs and reduce the number of times users need to authenticate against the data warehouses.

OAUTH_REFRESH_TOKEN_VALIDITY can be in the range of 3600 (1 hour) to 7776000 (90 days).

  1. To retrieve OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET, run the following SQL:
select system$show_oauth_client_secrets('DATAFOLD_OAUTH');

Example result:

  1. Fill in the Client ID and Client Secret fields in Datafold's Data Source advanced settings:

  1. Click Test and save OAuth

You will be redirected to Snowflake to complete authentication.

info

Your default Snowflake role will be used for the generated access token.

This notification signals a successful OAuth configuration:

Additional steps for Snowflake

To guarantee correct access rights to temporary tables (stored in Dataset for temporary tables provided in the Basic settings for Snowflake connection):

  • Grant the required privileges on the database and TEMP schema for all roles that will be using the OAuth flow. This must be done for all roles that will be utilizing the OAuth flow.

    GRANT USAGE ON WAREHOUSE <WH_NAME> TO ROLE <ROLENAME>;
    GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <ROLENAME>;
    GRANT USAGE ON ALL SCHEMAS IN DATABASE <DB_NAME> TO ROLE <ROLENAME>;
    GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <DB_NAME> TO ROLE <ROLENAME>;

    GRANT ALL ON SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> TO ROLE <ROLENAME>;
  • Revoke SELECT privileges for tables in the TEMP schema for all roles that will be using the OAuth flow (except for the DATAFOLDROLE role), if they were provided. This action must be performed for all roles utilizing the OAuth flow..

    -- Revoke SELECT privileges for the TEMP SCHEMA
    revoke SELECT ON ALL TABLES IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE TABLES IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;

    revoke SELECT ON ALL VIEWS IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE VIEWS IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;

    revoke SELECT ON ALL MATERIALIZED VIEWS IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE MATERIALIZED VIEWS IN SCHEMA <DB_NAME>.<TEMP_SCHEMA_NAME> FROM ROLE <ROLENAME>;

    -- Revoke SELECT privileges for a Database
    revoke SELECT ON ALL TABLES IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE TABLES IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;

    revoke SELECT ON ALL VIEWS IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE VIEWS IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;

    revoke SELECT ON ALL MATERIALIZED VIEWS IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;
    revoke SELECT ON FUTURE MATERIALIZED VIEWS IN DATABASE <DB_NAME> FROM ROLE <ROLENAME>;

caution

If one of the roles will have FUTURE GRANTS at the database level, this role will also will have FUTURE GRANTS on the TEMP schema.