
Beta: SQL Proxy is currently in beta. Features and APIs may change.

Connectivity Setup

dbt Core

Update your profile to point to SQL Proxy:
# profiles.yml
my_project:
  outputs:
    prod:
      type: databricks
      host: sqlproxy.your-company.datafold.com
      http_path: /sql/1.0/warehouses/proxy
      token: "{{ env_var('PAT_OR_PROXY_TOKEN') }}"  # Databricks PAT or proxy token (sqlp_pcp_...)
      catalog: your_catalog
      schema: your_schema
Note: M2M OAuth is not supported in dbt. The Python databricks-sql-connector used by dbt does not support M2M OAuth. To use a Databricks service principal with dbt, register it via the Admin API and use a proxy token instead.

dbt Cloud

In your dbt Cloud environment settings:
| Setting | Value |
| --- | --- |
| Host | sqlproxy.your-company.datafold.com |
| HTTP Path | /sql/1.0/warehouses/proxy |
| Token | Databricks PAT or proxy token (sqlp_pcp_...) |
Test the connection:
dbt debug

Annotations

Add @datafold: directives in SQL comments to control routing.

Warehouse Size

-- @datafold:warehouse_size=L
SELECT * FROM {{ ref('large_source') }}

Jobs Compute

Use jobs compute for long-running transformations. Classic compute with spot instances is generally more cost-effective than serverless.
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=4
{{ config(materialized='table') }}
SELECT * FROM {{ ref('source') }}
See Routing Modes for full parameter reference.
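To make the directive grammar concrete, here is a minimal, illustrative parser for the `@datafold:` comment format shown above (a directive name, an optional `=value`, then space-separated `key=value` parameters). This is a sketch for understanding the syntax, not SQL Proxy's actual implementation:

```python
import re

# Matches "-- @datafold:<name>", an optional "=<value>", and any trailing
# space-separated key=value parameters, e.g.:
#   -- @datafold:warehouse_size=L
#   -- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=4
DIRECTIVE_RE = re.compile(r"--\s*@datafold:(\w+)(?:=(\S+))?((?:\s+\w+=\S+)*)")

def parse_directive(sql_line: str):
    """Return (name, value, params) for a @datafold: comment, or None."""
    m = DIRECTIVE_RE.search(sql_line)
    if not m:
        return None
    name, value, rest = m.group(1), m.group(2), m.group(3)
    params = dict(p.split("=", 1) for p in rest.split())
    return name, value, params

print(parse_directive("-- @datafold:warehouse_size=L"))
print(parse_directive(
    "-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=4"))
```

Note that complex parameter values such as `custom_tags={"test":"validation"}` would need JSON decoding on top of this sketch.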

Environment-Specific Routing

{%- if target.name == 'prod' -%}
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=8 cluster_policy_id=PROD_POLICY
{%- else -%}
-- @datafold:warehouse_size=S
{%- endif -%}

SELECT * FROM {{ ref('source') }}

Setting Defaults

Create a macro for default annotations:
-- macros/datafold_defaults.sql
{% macro datafold_default() %}
  {%- if target.name == 'prod' -%}
    -- @datafold:jobs_compute type=classic cluster_policy_id={{ var('datafold_policy_id') }}
  {%- endif -%}
{% endmacro %}
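A model can then invoke the macro at the top of its SQL so the annotation is emitted only on prod builds. The model and ref names below are illustrative:

```sql
-- models/my_model.sql
{{ datafold_default() }}
{{ config(materialized='table') }}
SELECT * FROM {{ ref('source') }}
```

Because the macro body is empty outside of prod, non-prod runs render no annotation and fall through to the default routing.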

Validation Steps

After setup, verify connectivity and routing:

1. Basic Connectivity

dbt debug

2. Simple Query

dbt run --select my_simple_model

3. Jobs Compute (Serverless)

-- models/test/test_jobs_serverless.sql
-- @datafold:jobs_compute
{{ config(materialized='table') }}
SELECT 1 as test_column

4. Jobs Compute (Classic)

-- models/test/test_jobs_classic.sql
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=2
{{ config(materialized='table') }}
SELECT 1 as test_column

5. Verify Tagging

-- models/test/test_jobs_tagged.sql
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=2 custom_tags={"test":"validation"}
{{ config(materialized='table') }}
SELECT 1 as test_column
Check Databricks Jobs UI to confirm tags are applied.

6. Job Policy

-- models/test/test_jobs_policy.sql
-- @datafold:jobs_compute type=classic cluster_policy_id=YOUR_POLICY_ID
{{ config(materialized='table') }}
SELECT 1 as test_column

Required Permissions

The Databricks credentials used (either your own with passthrough, or the principal’s with managed auth) need:
| Resource | Permission | Required For |
| --- | --- | --- |
| SQL Warehouse | CAN USE | Running queries |
| Jobs | CAN MANAGE RUN | Jobs compute routing |
| Unity Catalog | USE CATALOG, USE SCHEMA, SELECT | Data access |
If routing to multiple warehouses, grant CAN USE on each.
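As a sketch, the Unity Catalog grants above can be issued in Databricks SQL as follows. Catalog, schema, and principal names are placeholders:

```sql
-- Run as a metastore or catalog admin; names are illustrative.
GRANT USE CATALOG ON CATALOG your_catalog TO `dbt-service-principal`;
GRANT USE SCHEMA ON SCHEMA your_catalog.your_schema TO `dbt-service-principal`;
GRANT SELECT ON SCHEMA your_catalog.your_schema TO `dbt-service-principal`;
```

Warehouse CAN USE and jobs CAN MANAGE RUN permissions are granted through the Databricks UI or Permissions API rather than SQL.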