Introduction into alerting to detect unexpected changes in data before
Within Datafold it is easy to create alerts to monitor the data that's flowing through the pipeline. On the alerts page, you will see an overview of the configured alerts.
This table shows important information such as, which metrics are being tracked, when the alert was triggered for the last time, and when it ran for the last time.
It is both very easy to track metrics using SQL. At the same time, SQL allows you to do aggregations and grouping to also facilitate complex KPIs. In the alert we track certain metrics of Datafold's public dbt-beers pipeline. In this case the average basket size, and the prices per day.
SQL query – a query that produces a time series metric with the following parameters:
  • The first column of the query should be date time (e.g. hour, day, week etc.)
  • The second to nth columns are metrics (e.g. count of rows).
  • Secondary dimensions (i.e. non-metric columns) are currently not supported.
  • For best performance, limit the query results to 300 data points.
  • Make sure your query doesn't contain NULLS
  • If your query returns incomplete data for the most recent time period, it may be necessary to filter it out to avoid false positive alerts
The query is executed daily at midnight UTC, and alerts are sent to the #datafold-alerts slack channel. Furthermore, it shows when the alert was changed for the last time.
The UI shows the historical trend of the average price of the order per day. In the screen above we see that the alert would have trigerred on two historical events; 22 and 28 of August. There are three types of checks that you can do:
  • Min/Max values. This is a very simple, yet effective way of detecting that something is wrong. For example, having hard limits, or when a certain threshold exceeds. For example, you expect a dump to come in every week, and the latest record in the table is 8 days old.
  • Min/Max increase/decrease. Applies to a % change in the metric value from the previous period (e.g. day-over-day). For example, if you track a median or average of a value, and you don't expect rapid changes. An example would be the number of sales for a day of week.
  • Anomaly detection. Datafold comes with anomaly detection that takes into account seasonality and trend in your data and will construct a confidence interval (pale blue in the picture). You can tune the significance to your liking. It will give you an estimate based on the historical data on how many alerts to expect.
Once triggered, an alert can be sent to a Slack channel or Email (custom integrations e.g. PagerDuty, OpsGenie, etc. are available on request). When configuring the integrations, it is always a good idea to test them using the arrow button (see above) to make sure that the alerts can be delivered successfully.
Not all the datasets have a time component. Next to the time series queries, Datafold can also track the data at a certain point in time and build the history inside of Datafold:
Copy link