Data Test Monitors
Data Test monitors validate your data against business rules written as SQL queries, materializing failed records in a temporary table for review in Datafold.
Data Test monitors allow you to validate your data with business rules written as custom SQL queries. Failed records are materialized to a temporary table in your warehouse, and you can view a sample of the records in Datafold.
INFO
Please contact support@datafold.com if you’d like to enable this feature for your organization.
Think of these monitors as pass/fail—either the query returns no records (pass) or it returns at least one record (fail). Data Tests are extremely flexible, but common use cases are validating custom business rules, referential integrity between tables, and data formatting (see examples below).
Create a Data Test monitor
There are two ways to create a Data Test monitor:
- Open the Monitors page, select Create new monitor, and then choose Data Test.
- Clone an existing Data Test monitor by clicking Actions and then Clone. This will pre-fill the form with the existing monitor configuration.
Set up your monitor
Give your monitor a descriptive name. For example, if your test confirms that records in a table of transactions have an amount that’s greater than zero, you might call it Transaction amount > 0
.
Then, select your data connection and write your test. Keep in mind your query should return records that fail the test.
Continuing with the example above, if transaction amounts should be greater than zero, your query should return records with amounts less than or equal to zero:
Add a schedule
Customize the frequency and timing of monitor runs. You can choose a specific hourly or daily time in UTC, or input a cron tab expression for more complex scheduling:
Add notifications
Receive notifications via Slack or email when at least one record fails your test:
Example queries
Custom business rule
Say your company defines active users as individuals who have signed into your application at least 3 times in the past week. You could write a test that validates this logic by checking for users marked as active who haven’t reached this threshold:
Referential integrity
Now let’s assume you have contacts and accounts in your warehouse. Every contact belongs to exactly one account, and you want to confirm that every account referenced in the contacts table exists in the accounts table. In this case, you’d look for references to accounts that are missing from the accounts table:
Data formatting
Finally, if you wanted to validate that all phone numbers in your contacts table are 10 digits and only contain numbers, you’d return records that are not 10 digits or use non-numeric characters:
Attach CSVs to notifications
Datafold allows attaching a CSV of failed records to notifications in Slack, email, etc. This is useful if, for example, you have business users who don’t have a Datafold license but need to know about records that fail your tests.
This option is configured separately per notification destination as shown here:
CSV attachments are limited to the lesser of 1,000 rows or 1 MB in file size.
Attaching CSVs in Slack
In order to attach CSVs to Slack notifications, you need to complete 1-2 additional steps:
- If you installed the Datafold Slack app prior to October 2024, you’ll need to reinstall the app by visiting Settings > Integrations > Notifications, selecting your Slack integration, then Reinstall Slack integration.
- Invite the Datafold app to the channel you wish to send notifications to using the
/invite
command shown below:
Run Tests in CI
Standard Data Tests run on a schedule against your production data. But often it’s useful to test data before it gets to production as part of your deployment workflow. For this reason, Datafold supports running tests in CI.
Data Tests in CI work very similarly to our Monitors as Code feature, in the sense that you define your tests in a version-controled YAML file. You then use the Datafold SDK to execute those tests as part of your CI workflow.
Write your tests
First, create a new file (e.g. tests.yaml
) in the root of your repository. Then write your tests use the same format described in our Monitors as Code docs with two exceptions:
- Add a
run_in_ci
flag to each test and set it totrue
(assuming you’d like to run the test) - (Optional) Add placeholders for variables that you’d like to populate dynamically when executing your tests
Here’s an example:
Execute your tests
INFO
This section describes how to get started with GitHub Actions, but the same concepts apply to other hosted version control platforms like GitLab and Bitbucket. Contact us if you need help getting started.
If you’re using GitHub Actions, create a new YAML file under .github/workflows/
using the following template. Be sure to tailor it to your particular setup:
View the results
When your CI workflow is triggered (e.g. by a pull request), you can view the terminal output for your test results:
Need help?
If you have any questions about how to use Data Test monitors, please reach out to our team via Slack, in-app chat, or email us at support@datafold.com.