INFO

Please contact support@datafold.com if you’d like to enable this feature for your organization.

Managing monitors as code is particularly useful if any of the following are true:

  • You have (or plan to have) hundreds or thousands of monitors
  • Your team is accustomed to managing things in code
  • Strict governance and change management are important to you

Getting started

INFO

This section describes how to get started with GitHub Actions, but the same concepts apply to other hosted version control platforms like GitLab and Bitbucket. Contact us if you need help getting started.

Set up version control integration

To start using monitors as code, you’ll need to decide which repository will contain your YAML configuration.

If you’ve already connected a repository to Datafold, you can use that one. Otherwise, follow the instructions here to connect a new repository.

Generate a Datafold API key

If you’ve already got a Datafold API key, use it. Otherwise, you can create a new one in the app by visiting Settings > Account and selecting Create API Key.

Create monitors config

In your chosen repository, create a new YAML file where you’ll define your monitors config.

For this example, we’ll name the file monitors.yaml and place it in the root directory, but neither of these choices is a hard requirement.

Leave the file blank for now—we’ll come back to it in a moment.

Add CI workflow

If you’re using GitHub Actions, create a new YAML file under .github/workflows/ using the following template. Be sure to tailor it to your particular setup:

name: Apply monitors as code config to Datafold

on:
  push:
    branches:
      - main # or master

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12' # quoted so YAML does not read it as a float
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install datafold-sdk
      - name: Update monitors
        run: datafold monitors provision monitors.yaml # use the correct file name/path
        env:
          DATAFOLD_HOST: https://app.datafold.com # different for dedicated deployments
          DATAFOLD_API_KEY: ${{ secrets.DATAFOLD_API_KEY }} # remember to add to secrets
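
If the repository holds more than the monitors config, it can help to run the workflow only when the config itself changes. The sketch below is a possible variation of the on: block in the template above, using the GitHub Actions paths filter and an optional manual trigger. The file names are assumptions: monitors.yaml comes from this guide, and apply-monitors.yaml is a hypothetical name for the workflow file, so adjust both to your setup.

```yaml
on:
  push:
    branches:
      - main # or master
    paths:
      - monitors.yaml # assumed config path from this guide
      - .github/workflows/apply-monitors.yaml # hypothetical workflow file name
  workflow_dispatch: # optional: allows manual runs from the Actions tab
```

With this filter, pushes to the main branch that don't touch either file skip the job entirely.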

Create a monitor

Now return to your YAML configuration file and add your first monitor. Refer to the examples below and pick one that makes sense for your organization.

Examples

INFO

These examples are intended to serve as inspiration and don’t demonstrate every possible configuration. Contact us if you have any questions.

Data Diff

monitors:
  replication_test:
    type: diff
    enabled: true
    schedule:
      interval:
        every: hour
    datadiff:
      dataset_a:
        connection_id: 734
        table: db.schema.table
        time_travel_point: '2020-01-01'
      dataset_b:
        connection_id: 736
        table: db.schema.table1
        time_travel_point: '2020-01-01'
      primary_key:
        - pk_column
      columns_to_compare:
        - col1
      materialize_results: true
      column_remapping:
        col1: col2
      sampling:
        rate: 0.1
      ignore_string_case: true

  replication_test_w_alert:
    type: diff
    enabled: true
    schedule:
      interval:
        every: hour
    datadiff:
      dataset_a:
        connection_id: 734
        table: db.schema.table
      dataset_b:
        connection_id: 736
        table: db.schema.table2
        materialize: false
        session_parameters:
          k: v
      primary_key:
        - pk_column
      egress_limit: 100
      per_column_diff_limit: 10
    alert:
      different_rows_count: 100
      different_rows_percent: 10

  replication_test_w_alert_and_notifications:
    type: diff
    enabled: true
    schedule:
      interval:
        every: hour
    datadiff:
      dataset_a:
        connection_id: 734
        table: db.schema.table
      dataset_b:
        connection_id: 736
        table: db.schema.table3
      primary_key:
        - pk_column
    notifications:
      - type: email
        recipients:
          - valentin@datafold.com
#      - type: slack
#        integration: 123
#        channel: alerts
#      - type: pagerduty
#        integration: 123
#      - type: webhook
#        integration: 123
    alert:
      different_rows_count: 100
      different_rows_percent: 10
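
The commented-out lines in the last example hint at other notification channels. As a sketch, assuming those fields are accepted exactly as they appear in the comments, a monitor that notifies both email and Slack might look like the following. The monitor name is hypothetical, and the integration ID 123 and channel name alerts are placeholders carried over from the example rather than real values.

```yaml
monitors:
  replication_test_w_slack: # hypothetical monitor name
    type: diff
    enabled: true
    schedule:
      interval:
        every: hour
    datadiff:
      dataset_a:
        connection_id: 734
        table: db.schema.table
      dataset_b:
        connection_id: 736
        table: db.schema.table3
      primary_key:
        - pk_column
    notifications:
      - type: email
        recipients:
          - valentin@datafold.com
      - type: slack
        integration: 123 # placeholder integration ID
        channel: alerts # placeholder channel name
    alert:
      different_rows_count: 100
      different_rows_percent: 10
```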

Metric

monitors:
  table_monitor:
    type: metric
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    metric:
      type: table
      table: db.schema.table
      filter: deleted is false
      metric: freshness_minutes
    alert:
      type: automatic
      sensitivity: 10

  column_monitor:
    type: metric
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    metric:
      type: column
      table: db.schema.table
      column: col
      filter: deleted is false
      metric: sum
    alert:
      type: absolute
      max: 100
      min: 0

Data Test

monitors:
  data_test_monitor:
    type: test
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    query: select 1 from db.schema.table

Schema Change

monitors:
  schema_change_monitor:
    type: schema
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    table: db.schema.table
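
Each example above starts with its own top-level monitors key. Because a YAML mapping can't repeat a key, a single file that mixes monitor types would presumably list them all under one monitors key, the same way the Data Diff example holds three monitors. A sketch under that assumption, reusing the names and placeholder connection IDs from the examples:

```yaml
monitors:
  schema_change_monitor:
    type: schema
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    table: db.schema.table

  data_test_monitor:
    type: test
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    query: select 1 from db.schema.table

  table_monitor:
    type: metric
    enabled: true
    schedule:
      interval:
        every: hour
    connection_id: 736
    metric:
      type: table
      table: db.schema.table
      filter: deleted is false
      metric: freshness_minutes
    alert:
      type: automatic
      sensitivity: 10
```

Monitor names double as mapping keys here, so each one needs to be unique within the file.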

FAQ

Need help?

If you have any questions about how to use monitors as code, please reach out to our team via Slack, in-app chat, or email at support@datafold.com.