> ## Documentation Index
> Fetch the complete documentation index at: https://docs.datafold.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Databricks

> Connect Datafold to Databricks for data diffing, CI/CD testing, lineage, and migration validation. Includes setup instructions and required permissions.

**NOTE**: Datafold needs catalog-level permissions in your Databricks workspace to read and write table data, query system tables, and deploy migration bundles. You will need workspace admin access to create a service principal and grant the required permissions.

**Steps to complete:**

1. [Create a service principal and configure authentication](/integrations/databases/databricks#create-a-service-principal-and-configure-authentication)
2. [Retrieve SQL warehouse connection details](/integrations/databases/databricks#retrieve-sql-warehouse-connection-details)
3. [Grant permissions](/integrations/databases/databricks#grant-permissions)
4. [Configure your data connection in Datafold](/integrations/databases/databricks#configure-in-datafold)

## Create a service principal and configure authentication

Create a dedicated service principal for the Datafold integration. This is the identity Datafold will use to connect to your workspace.

1. Go to **Settings** → **Identity and access** → **Service principals**
2. Click **Add service principal** and give it a name (e.g., `datafold`)
3. Select the service principal, go to the **Secrets** tab, and click **Generate secret**
4. Save the **Client ID** and **Secret** — the secret is only shown once

<Note>
  OAuth secrets are valid for up to 730 days. You can have a maximum of 5 active secrets per service principal. Rotate secrets before expiry to avoid connection interruptions.
</Note>

<Note>
  Datafold also supports Personal Access Tokens as an alternative authentication method. PATs are considered legacy by Databricks — see the [Databricks authentication documentation](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) for details.
</Note>

## Retrieve SQL warehouse connection details

Navigate to **SQL Warehouses** under the **SQL** section in the left sidebar.

Choose the preferred warehouse and copy the following fields from its **Connection Details** tab:

* **Server hostname**
* **HTTP path**

You also need to grant the service principal access to the SQL warehouse:

1. On the warehouse page, click the **Permissions** tab
2. Add the service principal and grant **Can Use** permission

## Grant permissions

Run the following SQL statements to grant Datafold the permissions it needs. Replace `<catalog_name>` and `<service_principal_id>` with your values. Replace `<schema_name>` with the schema where you want to store the DMA bundle volume (e.g., `default`).

<Note>
  The `<service_principal_id>` is the application ID (also called Client ID) of your service principal. In Databricks SQL, service principal identifiers must be enclosed in backticks.
</Note>

```sql theme={null}
-- Catalog access
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT USE SCHEMA ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT SELECT ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT CREATE TABLE ON CATALOG <catalog_name> TO `<service_principal_id>`;
GRANT MODIFY ON CATALOG <catalog_name> TO `<service_principal_id>`;

-- Temporary schema for Datafold
CREATE SCHEMA IF NOT EXISTS <catalog_name>.datafold_tmp;

-- System tables access
GRANT USE CATALOG ON CATALOG system TO `<service_principal_id>`;
GRANT USE SCHEMA ON CATALOG system TO `<service_principal_id>`;
GRANT SELECT ON CATALOG system TO `<service_principal_id>`;

-- UC Volume for DMA bundle deployment
CREATE VOLUME IF NOT EXISTS <catalog_name>.<schema_name>.datafold_bundles;
GRANT READ VOLUME ON VOLUME <catalog_name>.<schema_name>.datafold_bundles TO `<service_principal_id>`;
GRANT WRITE VOLUME ON VOLUME <catalog_name>.<schema_name>.datafold_bundles TO `<service_principal_id>`;
```

## Configure in Datafold

Select **M2M OAuth / Service Principal (Recommended)** as the authentication method and fill in the following fields:

| Field                            | Description                                                                  |
| -------------------------------- | ---------------------------------------------------------------------------- |
| Connection name                  | A name for this data connection within Datafold                              |
| Host                             | The Server hostname from the warehouse Connection Details tab                |
| HTTP path                        | The HTTP path from the warehouse Connection Details tab                      |
| Authentication method            | Select **M2M OAuth / Service Principal (Recommended)**                       |
| Client ID                        | The Client ID of the service principal                                       |
| Client Secret                    | The secret generated in the authentication step                              |
| Catalog                          | The default catalog name (e.g., `hive_metastore` or your Unity Catalog name) |
| Schema path for temporary tables | The temp schema as `<catalog_name>.datafold_tmp` (e.g., `demo.datafold_tmp`) |

Click **Create**. Your data connection is ready!

## Validate your setup

Run these queries to verify that permissions are configured correctly:

```sql theme={null}
-- Check all grants for the service principal
SHOW GRANTS TO `<service_principal_id>`;
```

```sql theme={null}
-- Verify warehouse access (run from the SQL warehouse the service principal will use)
SELECT 1;
```

```sql theme={null}
-- Verify temp schema access
SHOW TABLES IN <catalog_name>.datafold_tmp;
```

```sql theme={null}
-- Verify system table access (should return a row)
SELECT * FROM system.query.history LIMIT 1;
```

```sql theme={null}
-- Verify volume access
LIST '/Volumes/<catalog_name>/<schema_name>/datafold_bundles';
```
