Beta Product: SQL Proxy is currently in beta. Features and APIs may change.
Overview
Jobs compute routes SQL queries to Databricks Jobs instead of SQL warehouses. This is ideal for long-running transformations, DDL operations, and workloads that benefit from dedicated compute resources.
Admin API
Configure global jobs compute settings via the admin API.
Get Configuration
GET /admin/jobs-compute
curl -X GET \
-H "Authorization: Bearer <admin-token>" \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Update Configuration (Full Replace)
POST /admin/jobs-compute
curl -X POST \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{
"enabled": true,
"max_wait_ms": 3600000,
"executor_script_path": "/dbfs/datafold/executor.py",
"default_spark_version": "14.3.x-scala2.12",
"default_node_type": "i3.xlarge",
"default_num_workers": 4
}' \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Enable/disable jobs compute routing |
| max_wait_ms | integer | Yes | Maximum wait time for job completion (milliseconds) |
| executor_script_path | string | Yes | DBFS path to the executor script |
| default_spark_version | string | No | Default Databricks runtime version |
| default_node_type | string | No | Default EC2 instance type for workers |
| default_num_workers | integer | No | Default number of workers |
Update Configuration (Partial)
PUT /admin/jobs-compute
All fields are optional. Only provided fields are updated.
curl -X PUT \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d '{"default_num_workers": 8}' \
https://sqlproxy.your-company.datafold.com/admin/jobs-compute
Routing Directives
Use @datafold:jobs_compute in SQL comments to route queries to jobs compute.
-- @datafold:jobs_compute [options]
CREATE TABLE result AS SELECT ...
Compute Types
Serverless (Default)
-- @datafold:jobs_compute
CREATE TABLE result AS SELECT * FROM source
Uses Databricks serverless compute. Simple to use, but billed by the DBU-hour.
Classic
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4
CREATE TABLE result AS SELECT * FROM source
Uses a dedicated cluster, giving more control over instance types and pricing (for example, spot instances).
Directive Parameters
All parameter names match the official Databricks Jobs API new_cluster schema.
Core Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| type | string | Compute type: serverless (default) or classic | type=classic |
Node Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| node_type_id | string | EC2 instance type for workers | node_type_id=i3.xlarge |
| driver_node_type_id | string | EC2 instance type for driver (defaults to node_type_id) | driver_node_type_id=i3.2xlarge |
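The driver can be sized independently of the workers. A minimal sketch, with illustrative table names:

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge driver_node_type_id=i3.2xlarge num_workers=4
CREATE TABLE wide_result AS SELECT * FROM source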
Worker Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| num_workers | integer | Fixed number of workers | num_workers=4 |
| autoscale.min_workers | integer | Minimum workers for autoscaling | autoscale.min_workers=2 |
| autoscale.max_workers | integer | Maximum workers for autoscaling | autoscale.max_workers=10 |
Use either num_workers (fixed) or autoscale.min_workers/autoscale.max_workers (autoscaling), not both.
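A minimal autoscaling sketch, without a fixed worker count (table names are illustrative):

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge autoscale.min_workers=2 autoscale.max_workers=10
CREATE TABLE result AS SELECT * FROM source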
Spark Configuration
| Parameter | Type | Description | Example |
|---|---|---|---|
| spark_version | string | Databricks Runtime version | spark_version=14.3.x-scala2.12 |
| spark_conf.<key> | string | Spark configuration property | spark_conf.spark.sql.shuffle.partitions=200 |
| spark_env_vars.<key> | string | Environment variable | spark_env_vars.MY_VAR=value |
| runtime_engine | string | STANDARD or PHOTON | runtime_engine=PHOTON |
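For example, a directive that pins the runtime version and sets a Spark property and an environment variable, reusing the values from the table above (table names are illustrative):

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 spark_version=14.3.x-scala2.12 spark_conf.spark.sql.shuffle.partitions=200 spark_env_vars.MY_VAR=value
CREATE TABLE result AS SELECT * FROM source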
Instance Pools
| Parameter | Type | Description | Example |
|---|---|---|---|
| instance_pool_id | string | Instance pool for workers | instance_pool_id=0123-456789-pool |
| driver_instance_pool_id | string | Instance pool for driver | driver_instance_pool_id=0123-456789-pool |
When using instance pools, node_type_id is not required as the pool defines instance types.
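A minimal sketch that draws both workers and driver from a pool; the pool ID and table names are illustrative, and node_type_id is omitted because the pool supplies it:

-- @datafold:jobs_compute type=classic instance_pool_id=0123-456789-pool driver_instance_pool_id=0123-456789-pool num_workers=4
CREATE TABLE result AS SELECT * FROM source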
Cluster Policies
| Parameter | Type | Description | Example |
|---|---|---|---|
| policy_id | string | Databricks cluster policy ID | policy_id=0123456789ABCDEF |
| apply_policy_default_values | boolean | Use policy defaults for omitted fields | apply_policy_default_values=true |
Storage
| Parameter | Type | Description | Example |
|---|---|---|---|
| enable_elastic_disk | boolean | Enable autoscaling local storage | enable_elastic_disk=true |
| enable_local_disk_encryption | boolean | Encrypt local disks | enable_local_disk_encryption=true |
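For example, enabling both storage options on a classic cluster (table names are illustrative):

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 enable_elastic_disk=true enable_local_disk_encryption=true
CREATE TABLE result AS SELECT * FROM source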
Security
| Parameter | Type | Description | Example |
|---|---|---|---|
| data_security_mode | string | NONE, SINGLE_USER, or USER_ISOLATION | data_security_mode=SINGLE_USER |
| single_user_name | string | Username for single-user mode | single_user_name=user@company.com |
Tags
| Parameter | Type | Description | Example |
|---|---|---|---|
| custom_tags.<key> | string | Resource tag key-value | custom_tags.cost_center=12345 |
Init Scripts
| Parameter | Type | Description | Example |
|---|---|---|---|
| init_scripts | string | Comma-separated paths (s3://, dbfs:/, or workspace) | init_scripts=s3://bucket/init.sh |
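A sketch that runs two init scripts before the job starts; the script paths and table names are illustrative:

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=2 init_scripts=s3://bucket/init.sh,dbfs:/scripts/setup.sh
CREATE TABLE result AS SELECT * FROM source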
Docker
| Parameter | Type | Description | Example |
|---|---|---|---|
| docker_image.url | string | Custom container image URL | docker_image.url=my-registry/image:tag |
| docker_image.basic_auth.username | string | Registry auth username | docker_image.basic_auth.username=user |
| docker_image.basic_auth.password | string | Registry auth password | docker_image.basic_auth.password=secret |
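For example, running the job on a custom container image; the registry URL and table names are illustrative, and registry credentials would be supplied via the basic_auth parameters above:

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 docker_image.url=my-registry/image:tag
CREATE TABLE result AS SELECT * FROM source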
AWS Attributes
All AWS-specific parameters use the aws_attributes. prefix.
| Parameter | Type | Description | Example |
|---|---|---|---|
| aws_attributes.availability | string | SPOT, ON_DEMAND, or SPOT_WITH_FALLBACK | aws_attributes.availability=SPOT |
| aws_attributes.zone_id | string | AWS availability zone | aws_attributes.zone_id=us-west-2a |
| aws_attributes.first_on_demand | integer | Number of on-demand nodes before spot | aws_attributes.first_on_demand=1 |
| aws_attributes.spot_bid_price_percent | integer | Spot bid as % of on-demand price | aws_attributes.spot_bid_price_percent=100 |
| aws_attributes.ebs_volume_type | string | EBS type: GENERAL_PURPOSE_SSD or THROUGHPUT_OPTIMIZED_HDD | aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD |
| aws_attributes.ebs_volume_count | integer | Number of EBS volumes per node | aws_attributes.ebs_volume_count=1 |
| aws_attributes.ebs_volume_size | integer | EBS volume size in GB | aws_attributes.ebs_volume_size=100 |
| aws_attributes.ebs_volume_iops | integer | IOPS for gp3/io1/io2 volumes | aws_attributes.ebs_volume_iops=3000 |
| aws_attributes.ebs_volume_throughput | integer | Throughput in MB/s for gp3 volumes | aws_attributes.ebs_volume_throughput=125 |
| aws_attributes.instance_profile_arn | string | IAM instance profile ARN | aws_attributes.instance_profile_arn=arn:aws:iam::123:instance-profile/role |
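A sketch combining spot capacity with explicit EBS volumes, reusing the values from the table above (table names are illustrative):

-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.first_on_demand=1 aws_attributes.ebs_volume_type=GENERAL_PURPOSE_SSD aws_attributes.ebs_volume_count=1 aws_attributes.ebs_volume_size=100
CREATE TABLE result AS SELECT * FROM source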
Cluster Logging
Configure Spark log delivery using the cluster_log_conf. prefix.
| Parameter | Type | Description | Example |
|---|---|---|---|
| cluster_log_conf.destination | string | S3 or DBFS path for logs | cluster_log_conf.destination=s3://bucket/logs |
| cluster_log_conf.region | string | AWS region for S3 destination | cluster_log_conf.region=us-west-2 |
| cluster_log_conf.enable_encryption | boolean | Enable S3 server-side encryption | cluster_log_conf.enable_encryption=true |
The destination type (S3 vs DBFS) is auto-detected from the path prefix (s3:// or dbfs:/).
Examples
Cost-Optimized Spot Cluster
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 aws_attributes.availability=SPOT_WITH_FALLBACK aws_attributes.spot_bid_price_percent=100
CREATE TABLE aggregated_metrics AS
SELECT date, SUM(value) as total
FROM raw_metrics
GROUP BY date
Autoscaling Cluster with Policy
-- @datafold:jobs_compute type=classic policy_id=0123456789ABCDEF apply_policy_default_values=true autoscale.min_workers=2 autoscale.max_workers=10
CREATE TABLE large_join AS
SELECT a.*, b.details
FROM table_a a
JOIN table_b b ON a.id = b.id
Photon-Enabled with Custom Spark Config
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=8 runtime_engine=PHOTON spark_conf.spark.sql.shuffle.partitions=400
CREATE TABLE optimized_result AS
SELECT * FROM complex_transformation
Tagged for Cost Tracking
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 custom_tags.team=analytics custom_tags.cost_center=12345 custom_tags.environment=production
CREATE TABLE team_report AS SELECT ...
Single-User Security Mode
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=2 data_security_mode=SINGLE_USER single_user_name=analyst@company.com
CREATE TABLE secure_output AS SELECT ...
With Cluster Logging
-- @datafold:jobs_compute type=classic node_type_id=i3.xlarge num_workers=4 cluster_log_conf.destination=s3://my-bucket/spark-logs cluster_log_conf.region=us-west-2
CREATE TABLE logged_job AS SELECT ...