> **Beta Product:** SQL Proxy is currently in beta. Features and APIs may change.
## Connectivity Setup

### dbt Core

Update your profile to point to SQL Proxy:
```yaml
# profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: databricks
      host: sqlproxy.your-company.datafold.com
      http_path: /sql/1.0/warehouses/proxy
      token: "{{ env_var('PAT_OR_PROXY_TOKEN') }}" # Databricks PAT or proxy token (sqlp_pcp_...)
      catalog: your_catalog
      schema: your_schema
```
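The profile reads the token from the environment, so export it before invoking dbt (the variable name matches the `env_var` call above):

```bash
# Either a Databricks PAT or a SQL Proxy token works here
export PAT_OR_PROXY_TOKEN="sqlp_pcp_..."
```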
> **M2M OAuth is not supported in dbt.** The Python `databricks-sql-connector` used by dbt does not support M2M OAuth. To use a Databricks service principal with dbt, register it via the Admin API and use a proxy token.
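As a rough sketch of that registration flow, an Admin API call might look like the following; the endpoint path, payload fields, and `$ADMIN_TOKEN` are hypothetical placeholders, so consult the Admin API reference for the actual contract:

```bash
# Hypothetical endpoint and payload -- check the Admin API docs for the real contract
curl -X POST "https://sqlproxy.your-company.datafold.com/api/admin/service-principals" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"client_id": "your-sp-client-id", "client_secret": "your-sp-secret"}'
```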
### dbt Cloud

In your dbt Cloud environment settings:

| Setting | Value |
|---|---|
| Host | `sqlproxy.your-company.datafold.com` |
| HTTP Path | `/sql/1.0/warehouses/proxy` |
| Token | Databricks PAT or proxy token (`sqlp_pcp_...`) |
Test the connection with the **Test Connection** button before saving.
## Annotations

Add `@datafold:` directives in SQL comments to control routing.
### Warehouse Size

```sql
-- @datafold:warehouse_size=L
SELECT * FROM {{ ref('large_source') }}
```
### Jobs Compute

Use jobs compute for long-running transformations. Classic compute with spot instances is more cost-effective than serverless.

```sql
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=4
{{ config(materialized='table') }}
SELECT * FROM {{ ref('source') }}
```
See Routing Modes for the full parameter reference.

### Environment-Specific Routing
```sql
{%- if target.name == 'prod' %}
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=8 cluster_policy_id=PROD_POLICY
{%- else %}
-- @datafold:warehouse_size=S
{%- endif %}
SELECT * FROM {{ ref('source') }}
```

The `if`/`else`/`endif` tags end with `%}` rather than `-%}`: trimming the trailing newline would merge the annotation comment and the `SELECT` onto one line in the compiled SQL, commenting out the query.
### Setting Defaults

Create a macro for default annotations:

```sql
-- macros/datafold_defaults.sql
{% macro datafold_default() %}
{%- if target.name == 'prod' -%}
-- @datafold:jobs_compute type=classic cluster_policy_id={{ var('datafold_policy_id') }}
{%- endif -%}
{% endmacro %}
```
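Then call the macro at the top of each model; the annotation comment is rendered into the compiled SQL only for the prod target (the model below is illustrative):

```sql
-- models/my_model.sql
{{ datafold_default() }}
{{ config(materialized='table') }}
SELECT * FROM {{ ref('source') }}
```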
## Validation Steps
After setup, verify connectivity and routing:
1. Basic Connectivity
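For dbt Core, `dbt debug` validates the profile and opens a test connection through the proxy:

```bash
dbt debug
```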
2. Simple Query
```bash
dbt run --select my_simple_model
```
3. Jobs Compute (Serverless)
```sql
-- models/test/test_jobs_serverless.sql
-- @datafold:jobs_compute
{{ config(materialized='table') }}
SELECT 1 as test_column
```
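Run the model and confirm in the Databricks Jobs UI that it executed as a serverless job run:

```bash
dbt run --select test_jobs_serverless
```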
4. Jobs Compute (Classic)
```sql
-- models/test/test_jobs_classic.sql
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=2
{{ config(materialized='table') }}
SELECT 1 as test_column
```
5. Verify Tagging
```sql
-- models/test/test_jobs_tagged.sql
-- @datafold:jobs_compute type=classic node_type=i3.xlarge workers=2 custom_tags={"test":"validation"}
{{ config(materialized='table') }}
SELECT 1 as test_column
```
Check the Databricks Jobs UI to confirm the tags are applied.
6. Job Policy
```sql
-- models/test/test_jobs_policy.sql
-- @datafold:jobs_compute type=classic cluster_policy_id=YOUR_POLICY_ID
{{ config(materialized='table') }}
SELECT 1 as test_column
```
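The test models can also be run in one pass with a path selector (assuming they live under `models/test/` as above):

```bash
dbt run --select models/test
```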
## Required Permissions

The Databricks credentials used (either your own with passthrough, or the principal’s with managed auth) need:

| Resource | Permission | Required For |
|---|---|---|
| SQL Warehouse | CAN USE | Running queries |
| Jobs | CAN MANAGE RUN | Jobs compute routing |
| Unity Catalog | USE CATALOG, USE SCHEMA, SELECT | Data access |
If routing to multiple warehouses, grant CAN USE on each.
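As a sketch, the Unity Catalog side of those grants might look like this in Databricks SQL (catalog, schema, and principal names are placeholders):

```sql
GRANT USE CATALOG ON CATALOG your_catalog TO `sp-or-user@your-company.com`;
GRANT USE SCHEMA ON SCHEMA your_catalog.your_schema TO `sp-or-user@your-company.com`;
GRANT SELECT ON SCHEMA your_catalog.your_schema TO `sp-or-user@your-company.com`;
```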