Beta Product: SQL Proxy is currently in beta. Features and APIs may change.

Overview

Jobs compute routes SQL queries to Databricks Jobs instead of SQL warehouses. This is ideal for long-running transformations, DDL operations, and workloads that benefit from dedicated compute resources.

Admin API

Configure global jobs compute settings via the admin API.

Get Configuration

GET /admin/jobs-compute
curl -X GET \
  -H "Authorization: Bearer <admin-token>" \
  https://sqlproxy.your-company.datafold.com/admin/jobs-compute
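The endpoint returns the current configuration as JSON. The exact response shape may vary, but as a sketch it mirrors the fields accepted by the update endpoints (values here are taken from the full-replace example below):
{
  "enabled": true,
  "max_wait_ms": 3600000,
  "executor_script_path": "/dbfs/datafold/executor.py",
  "default_spark_version": "14.3.x-scala2.12",
  "default_node_type": "i3.xlarge",
  "default_num_workers": 4
}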

Update Configuration (Full Replace)

POST /admin/jobs-compute
curl -X POST \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "max_wait_ms": 3600000,
    "executor_script_path": "/dbfs/datafold/executor.py",
    "default_spark_version": "14.3.x-scala2.12",
    "default_node_type": "i3.xlarge",
    "default_num_workers": 4
  }' \
  https://sqlproxy.your-company.datafold.com/admin/jobs-compute

Request Body

Field | Type | Required | Description
enabled | boolean | Yes | Enable/disable jobs compute routing
max_wait_ms | integer | Yes | Maximum wait time for job completion (milliseconds)
executor_script_path | string | Yes | DBFS path to the executor script
default_spark_version | string | No | Default Databricks runtime version
default_node_type | string | No | Default EC2 instance type for workers
default_num_workers | integer | No | Default number of workers
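For example, a minimal payload that sets only the required fields (the executor script path is reused from the example above) might look like this. Note that POST replaces the entire configuration; use PUT (below) to change individual fields.
curl -X POST \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "max_wait_ms": 3600000,
    "executor_script_path": "/dbfs/datafold/executor.py"
  }' \
  https://sqlproxy.your-company.datafold.com/admin/jobs-compute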

Update Configuration (Partial)

PUT /admin/jobs-compute
All fields are optional. Only the provided fields are updated.
curl -X PUT \
  -H "Authorization: Bearer <admin-token>" \
  -H "Content-Type: application/json" \
  -d '{"default_num_workers": 8}' \
  https://sqlproxy.your-company.datafold.com/admin/jobs-compute

Routing Directives

Use @datafold:jobs_compute in SQL comments to route queries to jobs compute.
-- @datafold:jobs_compute [options]
CREATE TABLE result AS SELECT ...

Compute Types

Serverless (Default)

-- @datafold:jobs_compute
CREATE TABLE result AS SELECT * FROM source
Uses Databricks serverless compute. Simple to use, but billed by DBU consumption.

Classic

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4
CREATE TABLE result AS SELECT * FROM source
Uses a dedicated job cluster, giving more control over instance types and pricing (for example, spot instances).

Directive Parameters

All parameter names match the official Databricks Jobs API new_cluster schema.

Core Configuration

Parameter | Type | Description | Example
type | string | Compute type: serverless (default) or classic | type=classic

Node Configuration

Parameter | Type | Description | Example
node_type_id | string | EC2 instance type for workers | node_type_id=i3.xlarge
driver_node_type_id | string | EC2 instance type for driver (defaults to node_type_id) | driver_node_type_id=i3.2xlarge
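For example, to give the driver a larger instance than the workers (instance types are the illustrative values from the table above):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge driver_node_type_id=i3.2xlarge num_workers=4
CREATE TABLE result AS SELECT * FROM source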

Worker Configuration

Parameter | Type | Description | Example
num_workers | integer | Fixed number of workers | num_workers=4
autoscale.min_workers | integer | Minimum workers for autoscaling | autoscale.min_workers=2
autoscale.max_workers | integer | Maximum workers for autoscaling | autoscale.max_workers=10
Use either num_workers (fixed) or autoscale.min_workers/autoscale.max_workers (autoscaling), not both.
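For example, an autoscaling cluster (without a policy) might be requested like this:
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge autoscale.min_workers=2 autoscale.max_workers=10
CREATE TABLE result AS SELECT * FROM source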

Spark Configuration

Parameter | Type | Description | Example
spark_version | string | Databricks Runtime version | spark_version=14.3.x-scala2.12
spark_conf.<key> | string | Spark configuration property | spark_conf.spark.sql.shuffle.partitions=200
spark_env_vars.<key> | string | Environment variable | spark_env_vars.MY_VAR=value
runtime_engine | string | STANDARD or PHOTON | runtime_engine=PHOTON
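As an illustration, Spark properties and environment variables can be combined in a single directive (MY_VAR and its value are placeholders):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 spark_version=14.3.x-scala2.12 spark_conf.spark.sql.shuffle.partitions=200 spark_env_vars.MY_VAR=value
CREATE TABLE result AS SELECT * FROM source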

Instance Pools

Parameter | Type | Description | Example
instance_pool_id | string | Instance pool for workers | instance_pool_id=0123-456789-pool
driver_instance_pool_id | string | Instance pool for driver | driver_instance_pool_id=0123-456789-pool
When using instance pools, node_type_id is not required as the pool defines instance types.
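For example, to run on an existing pool (the pool ID is the placeholder from the table above):
-- @datafold:jobs_compute type=classic instance_pool_id=0123-456789-pool num_workers=4
CREATE TABLE result AS SELECT * FROM source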

Cluster Policies

Parameter | Type | Description | Example
policy_id | string | Databricks cluster policy ID | policy_id=0123456789ABCDEF
apply_policy_default_values | boolean | Use policy defaults for omitted fields | apply_policy_default_values=true

Storage

Parameter | Type | Description | Example
enable_elastic_disk | boolean | Enable autoscaling local storage | enable_elastic_disk=true
enable_local_disk_encryption | boolean | Encrypt local disks | enable_local_disk_encryption=true
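For example, to enable both options on a classic cluster:
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 enable_elastic_disk=true enable_local_disk_encryption=true
CREATE TABLE result AS SELECT * FROM source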

Security

Parameter | Type | Description | Example
data_security_mode | string | NONE, SINGLE_USER, or USER_ISOLATION | data_security_mode=SINGLE_USER
single_user_name | string | Username for single-user mode | single_user_name=user@example.com

Custom Tags

Parameter | Type | Description | Example
custom_tags.<key> | string | Resource tag key-value | custom_tags.cost_center=12345

Init Scripts

Parameter | Type | Description | Example
init_scripts | string | Comma-separated paths (s3://, dbfs:/, or workspace) | init_scripts=s3://bucket/init.sh
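For example, with two init scripts (the bucket and script paths are placeholders):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 init_scripts=s3://bucket/init.sh,dbfs:/scripts/setup.sh
CREATE TABLE result AS SELECT * FROM source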

Docker

Parameter | Type | Description | Example
docker_image.url | string | Custom container image URL | docker_image.url=my-registry/image:tag
docker_image.basic_auth.username | string | Registry auth username | docker_image.basic_auth.username=user
docker_image.basic_auth.password | string | Registry auth password | docker_image.basic_auth.password=secret
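For example, to run on a custom container image (registry, image, and credentials are placeholders):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 docker_image.url=my-registry/image:tag docker_image.basic_auth.username=user docker_image.basic_auth.password=secret
CREATE TABLE result AS SELECT * FROM source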

AWS Attributes

All AWS-specific parameters use the aws_attributes. prefix.
Parameter | Type | Description | Example
aws_attributes.availability | string | SPOT, ON_DEMAND, or SPOT_WITH_FALLBACK | aws_attributes.availability=SPOT
aws_attributes.zone_id | string | AWS availability zone | aws_attributes.zone_id=us-west-2a
aws_attributes.first_on_demand | integer | Number of on-demand nodes before spot | aws_attributes.first_on_demand=1
aws_attributes.spot_bid_price_percent | integer | Spot bid as % of on-demand price | aws_attributes.spot_bid_price_percent=100
aws_attributes.ebs_volume_type | string | EBS type: GENERAL_PURPOSE_SSD, THROUGHPUT_OPTIMIZED_HDD | aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD
aws_attributes.ebs_volume_count | integer | Number of EBS volumes per node | aws_attributes.ebs_volume_count=1
aws_attributes.ebs_volume_size | integer | EBS volume size in GB | aws_attributes.ebs_volume_size=100
aws_attributes.ebs_volume_iops | integer | IOPS for gp3/io1/io2 volumes | aws_attributes.ebs_volume_iops=3000
aws_attributes.ebs_volume_throughput | integer | Throughput in MB/s for gp3 volumes | aws_attributes.ebs_volume_throughput=125
aws_attributes.instance_profile_arn | string | IAM instance profile ARN | aws_attributes.instance_profile_arn=arn:aws:iam::123:instance-profile/role
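For example, to attach EBS storage to each worker on a spot cluster (volume count and size are illustrative values from the table above):
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD aws_attributes.ebs_volume_count=1 aws_attributes.ebs_volume_size=100
CREATE TABLE result AS SELECT * FROM source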

Cluster Logging

Configure Spark log delivery using the cluster_log_conf. prefix.
Parameter | Type | Description | Example
cluster_log_conf.destination | string | S3 or DBFS path for logs | cluster_log_conf.destination=s3://bucket/logs
cluster_log_conf.region | string | AWS region for S3 destination | cluster_log_conf.region=us-west-2
cluster_log_conf.enable_encryption | boolean | Enable S3 server-side encryption | cluster_log_conf.enable_encryption=true
The destination type (S3 vs DBFS) is auto-detected from the path prefix (s3:// or dbfs:/).

Examples

Cost-Optimized Spot Cluster

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.spot_bid_price_percent=100
CREATE TABLE aggregated_metrics AS
SELECT date, SUM(value) as total
FROM raw_metrics
GROUP BY date

Autoscaling Cluster with Policy

-- @datafold:jobs_compute type=classic policy_id=0123456789ABCDEF apply_policy_default_values=true autoscale.min_workers=2 autoscale.max_workers=10
CREATE TABLE large_join AS
SELECT a.*, b.details
FROM table_a a
JOIN table_b b ON a.id = b.id

Photon-Enabled with Custom Spark Config

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=8 runtime_engine=PHOTON spark_conf.spark.sql.shuffle.partitions=400
CREATE TABLE optimized_result AS
SELECT * FROM complex_transformation

Tagged for Cost Tracking

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 custom_tags.team=analytics custom_tags.cost_center=12345 custom_tags.environment=production
CREATE TABLE team_report AS SELECT ...

Single-User Security Mode

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=2 data_security_mode=SINGLE_USER single_user_name=user@example.com
CREATE TABLE secure_output AS SELECT ...

With Cluster Logging

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 cluster_log_conf.destination=s3://my-bucket/spark-logs cluster_log_conf.region=us-west-2
CREATE TABLE logged_job AS SELECT ...

See Also