Skip to main content
NOTE: Datafold needs catalog-level permissions in your Databricks workspace to read and write table data, query system tables, and deploy migration bundles. You will need workspace admin access to create a service principal and grant the required permissions. Steps to complete:
  1. Create a service principal and configure authentication
  2. Retrieve SQL warehouse connection details
  3. Grant permissions
  4. Configure your data connection in Datafold

Create a service principal and configure authentication

Create a dedicated service principal for the Datafold integration. This is the identity Datafold will use to connect to your workspace.
  1. Go to SettingsIdentity and accessService principals
  2. Click Add service principal and give it a name (e.g., datafold)
  3. Select the service principal, go to the Secrets tab, and click Generate secret
  4. Save the Client ID and Secret — the secret is only shown once
OAuth secrets are valid for up to 730 days. You can have a maximum of 5 active secrets per service principal. Rotate secrets before expiry to avoid connection interruptions.
Datafold also supports Personal Access Tokens as an alternative authentication method. PATs are considered legacy by Databricks — see the Databricks authentication documentation for details.

Retrieve SQL warehouse connection details

Navigate to SQL Warehouses under the SQL section in the left sidebar. Choose the preferred warehouse and copy the following fields from its Connection Details tab:
  • Server hostname
  • HTTP path
You also need to grant the service principal access to the SQL warehouse:
  1. On the warehouse page, click the Permissions tab
  2. Add the service principal and grant Can Use permission

Grant permissions

Run the following SQL statements to grant Datafold the permissions it needs. Replace <catalog_name> and <service_principal_id> with your values. Replace <schema_name> with the schema where you want to store the DMA bundle volume (e.g., default).
The <service_principal_id> is the application ID (also called Client ID) of your service principal. In Databricks SQL, service principal identifiers must be enclosed in backticks.
-- Catalog access
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT USE SCHEMA ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT CREATE TABLE ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT MODIFY ON CATALOG <catalog_name> TO `<service_principal_id>`;

-- Temporary schema for Datafold
CREATE SCHEMA IF NOT EXISTS <catalog_name>.datafold_tmp;

-- System tables access
GRANT USE CATALOG ON CATALOG system TO `<service_principal_id>`;
GRANT USE SCHEMA ON CATALOG system TO `<service_principal_id>`;
GRANT SELECT ON CATALOG system TO `<service_principal_id>`;

-- UC Volume for DMA bundle deployment
CREATE VOLUME IF NOT EXISTS <catalog_name>.<schema_name>.datafold_bundles;
GRANT READ VOLUME ON VOLUME <catalog_name>.<schema_name>.datafold_bundles TO `<service_principal_id>`;
GRANT WRITE VOLUME ON VOLUME <catalog_name>.<schema_name>.datafold_bundles TO `<service_principal_id>`;

Configure in Datafold

Select M2M OAuth / Service Principal (Recommended) as the authentication method and fill in the following fields:
FieldDescription
Connection nameA name for this data connection within Datafold
HostThe Server hostname from the warehouse Connection Details tab
HTTP pathThe HTTP path from the warehouse Connection Details tab
Authentication methodSelect M2M OAuth / Service Principal (Recommended)
Client IDThe Client ID of the service principal
Client SecretThe secret generated in the authentication step
CatalogThe default catalog name (e.g., hive_metastore or your Unity Catalog name)
Schema path for temporary tablesThe temp schema as <catalog_name>.datafold_tmp (e.g., demo.datafold_tmp)
Click Create. Your data connection is ready!

Validate your setup

Run these queries to verify that permissions are configured correctly:
-- Check all grants for the service principal
SHOW GRANTS TO `<service_principal_id>`;
-- Verify warehouse access (run from the SQL warehouse the service principal will use)
SELECT 1;
-- Verify temp schema access
SHOW TABLES IN <catalog_name>.datafold_tmp;
-- Verify system table access (should return a row)
SELECT * FROM system.query.history LIMIT 1;
-- Verify volume access
LIST '/Volumes/<catalog_name>/<schema_name>/datafold_bundles';