Snowflake
NOTE: Datafold needs permissions in your Snowflake dataset to read your table data. You will need to be a Snowflake Admin in order to grant the required permissions.
Steps to complete:
- Create a user and role for Datafold
- Setup password-based or Use key-pair authentication
- Create a temporary schema
- Give the Datafold role access to your warehouse
- Configure your data connection in Datafold
Create a user and role for Datafold
A full script can be found at the bottom of this page.
It is best practice to create a separate role for the Datafold integration (e.g., DATAFOLDROLE
):
CREATE ROLE DATAFOLDROLE;
CREATE USER DATAFOLD DEFAULT_ROLE = "DATAFOLDROLE" MUST_CHANGE_PASSWORD = FALSE;
GRANT ROLE DATAFOLDROLE TO USER DATAFOLD;
To provide column-level lineage, Datafold needs to read & parse all SQL statements executed in your Snowflake account:
GRANT MONITOR EXECUTION ON ACCOUNT TO ROLE DATAFOLDROLE;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE DATAFOLDROLE;
Set up password-based authentication
Datafold supports username/password authentication, but also key-pair authentication.
ALTER USER DATAFOLD SET PASSWORD = 'SomethingSecret';
You can set the username/password in the Datafold web UI.
Use key-pair authentication
If you want to use key-pair authentication, please follow the steps of Snowflake. The public key will be set to the Snowflake user:
ALTER USER DATAFOLD SET rsa_public_key='abc..'
The private key needs to be uploaded to Datafold, and the optional passphrase of the private-key can be set to the user.
Create schema for Datafold
Datafold utilizes a temporary dataset to materialize scratch work and keep data processing in the your warehouse.
CREATE SCHEMA <database_name>.DATAFOLD_TMP;
GRANT ALL ON SCHEMA <database_name>.DATAFOLD_TMP TO DATAFOLDROLE;
Give the Datafold role access
Datafold will only scan the tables that it has access to. The snippet below will give Datafold read access to a database. If you have more than one database that you want to use in Datafold, rerun the script below for each one.
/* Repeat for every DATABASE to be usable in Datafold. This allows Datafold to
correctly discover, profile & diff each table */
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL MATERIALIZED VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE MATERIALIZED VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT ALL PRIVILEGES ON ALL DYNAMIC TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE DYNAMIC TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
Full Script
--Step 1: Create a user and role for Datafold
CREATE ROLE DATAFOLDROLE;
CREATE USER DATAFOLD DEFAULT_ROLE = "DATAFOLDROLE" MUST_CHANGE_PASSWORD = FALSE;
GRANT ROLE DATAFOLDROLE TO USER DATAFOLD;
GRANT MONITOR EXECUTION ON ACCOUNT TO ROLE DATAFOLDROLE;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE DATAFOLDROLE;
--Step 2a: Use password-based authentication
ALTER USER DATAFOLD SET PASSWORD = 'SomethingSecret';
--OR
--Step 2b: Use key-pair authentication
--ALTER USER DATAFOLD SET rsa_public_key='abc..'
--Step 3: Create schema for Datafold
CREATE SCHEMA <database_name>.DATAFOLD_TMP;
GRANT ALL ON SCHEMA <database_name>.DATAFOLD_TMP TO DATAFOLDROLE;
--Step 4: Give the Datafold role access to your data connection
/*
Repeat for every DATABASE to be usable in Datafold. This allows Datafold to
correctly discover, profile & diff each table
*/
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE TABLES IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL MATERIALIZED VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE MATERIALIZED VIEWS IN DATABASE <database_name> TO ROLE DATAFOLDROLE;
Validate Snowflake Grants for Datafold
Run these queries to validate that the grants have been set up correctly:
Note: More results may be returned than shown in the screenshots below if you have granted access to multiple roles/users
Example Placeholders:
<database_name>
=DEV
<warehouse_name>
=DEMO
-- Validate database usage for the DATAFOLDROLE
SHOW GRANTS ON DATABASE <database_name>;
-- Validate warehouse usage for the DATAFOLDROLE
SHOW GRANTS ON WAREHOUSE <warehouse_name>;
-- Validate schema permissions for the DATAFOLDROLE
SHOW GRANTS ON SCHEMA <database_name>.DATAFOLD_TMP;
A note on future grants
The above database grants will be insufficient if any future grants have been defined at the schema level, because schema-level grants will override database-level grants. In that case, you will need to execute future grants for every existing schema that Datafold will operate on.
GRANT SELECT ON FUTURE TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON FUTURE MATERIALIZED VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
GRANT SELECT ON ALL MATERIALIZED VIEWS IN SCHEMA <database_name>.<schema_name> TO ROLE DATAFOLDROLE;
Configure in Datafold
Field Name | Description |
---|---|
Name | A name given to the data connection within Datafold |
Account identifier | The Org name-Account name pair for your Snowflake account. This can be found in the browser address string. It may look like https://orgname-accountname.snowflakecomputing.com or https://app.snowflake.com/orgname/accountname. In the setup form, enter <orgname>-<accountname>. |
User | The username set in the Setup password-based authentication section |
Password | The password set in the Setup password-based authentication section |
Key Pair file | The key file generated in the Use key-pair authentication section |
Warehouse | The Snowflake warehouse name |
Schema for temporary tables | The schema name you created with our script (<database_name>.DATAFOLD_TMP) |
Role | The role you created for Datafold (Typically DATAFOLDROLE) |
Default DB | A database the role above can access. If more than one database was added, whichever you prefer to be the default |
Note: Please review the documentation for the account name. Datafold uses Format 1 (Preferred): https://docs.snowflake.com/en/user-guide/admin-account-identifier#using-an-account-locator-as-an-identifier
Click Create. Your data connection is ready!