Beta Product: SQL Proxy is currently in beta. Features and APIs may change.
Overview
Jobs compute routes SQL queries to Databricks Jobs instead of SQL warehouses. This is ideal for long-running transformations, DDL operations, and workloads that benefit from dedicated compute resources.
Admin API
Configure global jobs compute settings via the admin API.
Get Configuration
GET /admin/jobs-compute
curl -X GET \
-H "Authorization: Bearer <admin-token>" \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Update Configuration (Full Replace)
POST /admin/jobs-compute
curl -X POST \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{
"enabled": true,
"max_wait_ms": 3600000,
"executor_script_path": "/dbfs/datafold/executor.py",
"default_spark_version": "14.3.x-scala2.12",
"default_node_type": "i3.xlarge",
"default_num_workers": 4
}' \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Enable/disable jobs compute routing |
| max_wait_ms | integer | Yes | Maximum wait time for job completion (milliseconds) |
| executor_script_path | string | Yes | DBFS path to the executor script |
| default_spark_version | string | No | Default Databricks runtime version |
| default_node_type | string | No | Default EC2 instance type for workers |
| default_num_workers | integer | No | Default number of workers |
Update Configuration (Partial)
PUT /admin/jobs-compute
All fields are optional. Only provided fields are updated.
curl -X PUT \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"default_num_workers": 8}' \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Routing Directives
Use @datafold:jobs_compute in SQL comments to route queries to jobs compute.
-- @datafold:jobs_compute [options]
CREATE TABLE result AS SELECT ...
Compute Types
Serverless (Default)
-- @datafold:jobs_compute
CREATE TABLE result AS SELECT * FROM source
Uses Databricks serverless compute. Simple to use, but billed by DBU-hours.
Classic
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4
CREATE TABLE result AS SELECT * FROM source
Uses a dedicated cluster, giving more control over instance types and pricing (for example, spot instances).
Directive Parameters
All parameter names match the official Databricks Jobs API new_cluster schema.
Core Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| type | string | Compute type: serverless (default) or classic | type=classic |
Node Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| node_type_id | string | EC2 instance type for workers | node_type_id=i3.xlarge |
| driver_node_type_id | string | EC2 instance type for driver (defaults to node_type_id) | driver_node_type_id=i3.2xlarge |
Worker Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| num_workers | integer | Fixed number of workers | num_workers=4 |
| autoscale.min_workers | integer | Minimum workers for autoscaling | autoscale.min_workers=2 |
| autoscale.max_workers | integer | Maximum workers for autoscaling | autoscale.max_workers=10 |
Use either num_workers (fixed) or autoscale.min_workers/autoscale.max_workers (autoscaling), not both.
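For example, a directive that relies on autoscaling instead of a fixed worker count might look like the following (the bounds and table name are illustrative):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge autoscale.min_workers=2 autoscale.max_workers=8
CREATE TABLE autoscaled_result AS SELECT * FROM source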
Spark Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| spark_version | string | Databricks Runtime version | spark_version=14.3.x-scala2.12 |
| spark_conf.<key> | string | Spark configuration property | spark_conf.spark.sql.shuffle.partitions=200 |
| spark_env_vars.<key> | string | Environment variable | spark_env_vars.MY_VAR=value |
| runtime_engine | string | STANDARD or PHOTON | runtime_engine=PHOTON |
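As an illustration, a directive could combine a Spark property with an environment variable like this (the variable name and values are placeholders):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 spark_conf.spark.sql.shuffle.partitions=200 spark_env_vars.PIPELINE_ENV=staging
CREATE TABLE configured_result AS SELECT * FROM source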
Instance Pools
| Parameter | Type | Description | Example |
|---|---|---|---|
| instance_pool_id | string | Instance pool for workers | instance_pool_id=0123-456789-pool |
| driver_instance_pool_id | string | Instance pool for driver | driver_instance_pool_id=0123-456789-pool |
When using instance pools, node_type_id is not required as the pool defines instance types.
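A hypothetical directive that draws workers from an instance pool might look like this (the pool ID and table name are illustrative):
-- @datafold:jobs_compute type=classic instance_pool_id=0123-456789-pool num_workers=4
CREATE TABLE pooled_result AS SELECT * FROM source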
Cluster Policies
| Parameter | Type | Description | Example |
|---|---|---|---|
| policy_id | string | Databricks cluster policy ID | policy_id=0123456789ABCDEF |
| apply_policy_default_values | boolean | Use policy defaults for omitted fields | apply_policy_default_values=true |
Storage
| Parameter | Type | Description | Example |
|---|---|---|---|
| enable_elastic_disk | boolean | Enable autoscaling local storage | enable_elastic_disk=true |
| enable_local_disk_encryption | boolean | Encrypt local disks | enable_local_disk_encryption=true |
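For instance, a directive enabling both autoscaling local storage and local disk encryption could look like this (cluster sizing is illustrative):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 enable_elastic_disk=true enable_local_disk_encryption=true
CREATE TABLE encrypted_result AS SELECT * FROM source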
Security
| Parameter | Type | Description | Example |
|---|---|---|---|
| data_security_mode | string | NONE, SINGLE_USER, or USER_ISOLATION | data_security_mode=SINGLE_USER |
| single_user_name | string | Username for single-user mode | single_user_name=user@example.com |
Tags
| Parameter | Type | Description | Example |
|---|---|---|---|
| custom_tags.<key> | string | Resource tag key-value | custom_tags.cost_center=12345 |
Init Scripts
| Parameter | Type | Description | Example |
|---|---|---|---|
| init_scripts | string | Comma-separated paths (s3://, dbfs:/, or workspace) | init_scripts=s3://bucket/init.sh |
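For example, a directive referencing two init scripts might look like this (the paths are placeholders):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 init_scripts=s3://my-bucket/bootstrap.sh,dbfs:/init/setup.sh
CREATE TABLE bootstrapped_result AS SELECT * FROM source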
Docker
| Parameter | Type | Description | Example |
|---|---|---|---|
| docker_image.url | string | Custom container image URL | docker_image.url=my-registry/image:tag |
| docker_image.basic_auth.username | string | Registry auth username | docker_image.basic_auth.username=user |
| docker_image.basic_auth.password | string | Registry auth password | docker_image.basic_auth.password=secret |
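A hypothetical directive that runs the job on a custom container image might look like this (the registry URL and credentials are placeholders; note that credentials supplied this way appear in plain text in the query):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 docker_image.url=my-registry/spark-image:latest docker_image.basic_auth.username=ci-user docker_image.basic_auth.password=secret
CREATE TABLE containerized_result AS SELECT * FROM source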
AWS Attributes
All AWS-specific parameters use the aws_attributes. prefix.
| Parameter | Type | Description | Example |
|---|---|---|---|
| aws_attributes.availability | string | SPOT, ON_DEMAND, or SPOT_WITH_FALLBACK | aws_attributes.availability=SPOT |
| aws_attributes.zone_id | string | AWS availability zone | aws_attributes.zone_id=us-west-2a |
| aws_attributes.first_on_demand | integer | Number of on-demand nodes before spot | aws_attributes.first_on_demand=1 |
| aws_attributes.spot_bid_price_percent | integer | Spot bid as % of on-demand price | aws_attributes.spot_bid_price_percent=100 |
| aws_attributes.ebs_volume_type | string | EBS type: GENERAL_PURPOSE_SSD or THROUGHPUT_OPTIMIZED_HDD | aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD |
| aws_attributes.ebs_volume_count | integer | Number of EBS volumes per node | aws_attributes.ebs_volume_count=1 |
| aws_attributes.ebs_volume_size | integer | EBS volume size in GB | aws_attributes.ebs_volume_size=100 |
| aws_attributes.ebs_volume_iops | integer | IOPS for gp3/io1/io2 volumes | aws_attributes.ebs_volume_iops=3000 |
| aws_attributes.ebs_volume_throughput | integer | Throughput in MB/s for gp3 volumes | aws_attributes.ebs_volume_throughput=125 |
| aws_attributes.instance_profile_arn | string | IAM instance profile ARN | aws_attributes.instance_profile_arn=arn:aws:iam::123:instance-profile/role |
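For example, a directive that attaches general-purpose SSD EBS volumes to spot workers might look like this (volume sizing and cluster dimensions are illustrative):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD aws_attributes.ebs_volume_count=1 aws_attributes.ebs_volume_size=100
CREATE TABLE ebs_backed_result AS SELECT * FROM source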
Cluster Logging
Configure Spark log delivery using the cluster_log_conf. prefix.
| Parameter | Type | Description | Example |
|---|---|---|---|
| cluster_log_conf.destination | string | S3 or DBFS path for logs | cluster_log_conf.destination=s3://bucket/logs |
| cluster_log_conf.region | string | AWS region for S3 destination | cluster_log_conf.region=us-west-2 |
| cluster_log_conf.enable_encryption | boolean | Enable S3 server-side encryption | cluster_log_conf.enable_encryption=true |
The destination type (S3 vs DBFS) is auto-detected from the path prefix (s3:// or dbfs:/).
Examples
Cost-Optimized Spot Cluster
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.spot_bid_price_percent=100
CREATE TABLE aggregated_metrics AS
SELECT date, SUM(value) as total
FROM raw_metrics
GROUP BY date
Autoscaling Cluster with Policy
-- @datafold:jobs_compute type=classic policy_id=0123456789ABCDEF apply_policy_default_values=true autoscale.min_workers=2 autoscale.max_workers=10
CREATE TABLE large_join AS
SELECT a.*, b.details
FROM table_a a
JOIN table_b b ON a.id = b.id
Photon-Enabled with Custom Spark Config
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=8 runtime_engine=PHOTON spark_conf.spark.sql.shuffle.partitions=400
CREATE TABLE optimized_result AS
SELECT * FROM complex_transformation
Tagged for Cost Tracking
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 custom_tags.team=analytics custom_tags.cost_center=12345 custom_tags.environment=production
CREATE TABLE team_report AS SELECT ...
Single-User Security Mode
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=2 data_security_mode=SINGLE_USER single_user_name=user@example.com
CREATE TABLE secure_output AS SELECT ...
With Cluster Logging
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 cluster_log_conf.destination=s3://my-bucket/spark-logs cluster_log_conf.region=us-west-2
CREATE TABLE logged_job AS SELECT ...