REST API
info
To use the Datafold REST API, you should first create a Datafold API Key.
Create A New Diff
Parameter | Value |
---|---|
Endpoint | /api/v1/datadiffs |
Method | POST |
Request Content-Type | application/json |
Response Content-Type | application/json |
Auth | API key header. Example: headers = {'Authorization': 'Key <api_key>'} |
- Request
- Response
- Curl Request Example
- Python Request Example
Example request body:
{
"data_source1_id": 1,
"data_source2_id": 1,
"table1": [
"DATABASE",
"SCHEMA",
"TABLE_1"
],
"table2": [
"DATABASE",
"SCHEMA",
"TABLE_2"
],
"query1": null,
"query2": null,
"materialize_dataset1": false,
"materialize_dataset2": false,
"materialization_destination_id": null,
"materialize_without_sampling": false,
"pk_columns": [
"PRIMARY_KEY"
],
"include_columns": null,
"exclude_columns": null,
"time_column": null,
"time_aggregate": null,
"filter1": null,
"filter2": null,
"time_travel_point1": null,
"time_travel_point2": null,
"time_interval_start": null,
"time_interval_end": null,
"sampling_tolerance": null,
"sampling_confidence": null,
"sampling_threshold": null,
"diff_tolerances_per_column": null
}
The response will contain all data from the request with missing values filled in with either nulls or defaults. In addition to that, it returns the diff id
to be used in subsequent API calls, and additional metadata.
{
"data_source1_id": 1,
"data_source2_id": 1,
"table1": [
"DATABASE",
"SCHEMA",
"TABLE_1"
],
"table2": [
"DATABASE",
"SCHEMA",
"TABLE_2"
],
"query1": null,
"query2": null,
"materialize_dataset1": false,
"materialize_dataset2": false,
"materialization_destination_id": null,
"materialize_without_sampling": false,
"pk_columns": [
"PRIMARY_KEY"
],
"include_columns": [],
"exclude_columns": [],
"time_column": null,
"time_aggregate": null,
"filter1": null,
"filter2": null,
"time_travel_point1": null,
"time_travel_point2": null,
"time_interval_start": null,
"time_interval_end": null,
"sampling_tolerance": null,
"sampling_confidence": null,
"sampling_threshold": null,
"diff_tolerances_per_column": null,
"tags": null,
"id": 544266,
"user_id": 3126,
"done": false,
"status": "waiting",
"source": "api",
"result_statuses": {},
"updated_at": "2022-10-12T20:06:27.653283+00:00",
"created_at": "2022-10-12T20:06:27.653283+00:00",
"diff_stats": {
"diff_pks": null,
"diff_rows": null,
"diff_values": null
}
}
curl --location --request POST 'https://app.datafold.com/api/v1/datadiffs' \
--header 'Authorization: <api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"data_source1_id": 1,
"data_source2_id": 1,
"table1": [
"DATABASE",
"SCHEMA",
"TABLE_1"
],
"table2": [
"DATABASE",
"SCHEMA",
"TABLE_2"
],
"query1": null,
"query2": null,
"materialize_dataset1": false,
"materialize_dataset2": false,
"materialization_destination_id": null,
"materialize_without_sampling": false,
"pk_columns": [
"PRIMARY_KEY"
],
"include_columns": null,
"exclude_columns": null,
"time_column": null,
"time_aggregate": null,
"filter1": null,
"filter2": null,
"time_travel_point1": null,
"time_travel_point2": null,
"time_interval_start": null,
"time_interval_end": null,
"sampling_tolerance": null,
"sampling_confidence": null,
"sampling_threshold": null,
"diff_tolerances_per_column": null
}'
import requests
import json
url = "https://app.datafold.com/api/v1/datadiffs"
payload = json.dumps({
"data_source1_id": 1,
"data_source2_id": 1,
"table1": [
"DATABASE",
"SCHEMA",
"TABLE_1"
],
"table2": [
"DATABASE",
"SCHEMA",
"TABLE_2"
],
"query1": None,
"query2": None,
"materialize_dataset1": False,
"materialize_dataset2": False,
"materialization_destination_id": None,
"materialize_without_sampling": False,
"pk_columns": [
"PRIMARY_KEY"
],
"include_columns": None,
"exclude_columns": None,
"time_column": None,
"time_aggregate": None,
"filter1": None,
"filter2": None,
"time_travel_point1": None,
"time_travel_point2": None,
"time_interval_start": None,
"time_interval_end": None,
"sampling_tolerance": None,
"sampling_confidence": None,
"sampling_threshold": None,
"diff_tolerances_per_column": None
})
headers = {
'Authorization': '<api_key>',
'Content-Type': 'application/json'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Get Diff Results
Parameter | Value |
---|---|
Endpoint | /api/v1/datadiffs/<id>/summary_results |
Method | GET |
Response Content-Type | application/json |
Auth | API key header. Example: headers = {'Authorization': 'Key <api_key>'} |
- Successful Response
- Error Response
- Curl Request Example
- Python Request Example
Immediately after the diff is submitted, the response will contain the following:
{
"status": "running"
}
Or
{
"status": "waiting"
}
When the diff is complete, the status will be set to success
and additional fields will contain high-level diff metadata:
{
"status": "success",
"pks": {
"total_rows": [
1509,
1761
],
"nulls": [
0,
0
],
"dupes": [
0,
303
],
"exclusives": [
0,
5
],
"distincts": [
1509,
1514
]
},
"values": {
"total_rows": 1453,
"rows_with_differences": 0,
"total_values": 10171,
"values_with_differences": 0,
"compared_columns": 7,
"columns_with_differences": 0
},
"dependencies": {
"deps": {}
},
"schema": {
"columns_mismatched": [
0,
0
],
"column_type_mismatches": 0,
"column_reorders": 0,
"column_counts": [
8,
8
]
}
}
In the event of an error, the status will return as failed
. And the error
object will contain additional details.
Example:
{
"status": "failed",
"error": {
"error_type": "NoSuchTableException",
"error_value": "DATABASE.SCHEMA.TABLE_1"
}
}
Example curl command:
curl --location --request GET 'https://app.datafold.com/api/v1/datadiffs/<id>/summary_results' \
--header 'Authorization: <api_key>'
Example python request:
import requests
url = `https://app.datafold.com/api/v1/datadiffs/<id>/summary_results`
payload={}
headers = {
'Authorization': '<api_key>'
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)