AI Code Reviews bring LLM-powered analysis directly into your CI pipeline, automatically reviewing every pull request for violations of SQL and data pipeline best practices.
[Figure: AI Code Review posting an AI Overview comment on a pull request, summarizing code changes and their impact on data.]
When combined with Data Diffs, AI Code Reviews give your team both code-level and data-level validation on every PR — catching logic errors, anti-patterns, and unintended data changes before they reach production.
AI Code Reviews are an optional add-on to Datafold’s CI integration. If you prefer to use only Data Diffs, no changes are needed — your existing CI setup continues to work as before.

How It Works

  1. PR is opened or updated
     When a pull request is created or updated, Datafold’s CI runner detects the change and checks if AI Code Reviews are enabled for your organization.

  2. Code diff is analyzed
     Datafold fetches the git diff, annotates it with line numbers, and identifies the affected files.

  3. LLM reviews the changes
     The code diff is sent to an LLM, which analyzes added lines for potential issues while considering the full context of removed and unchanged lines. The model identifies issues, explains them, and suggests specific code improvements referencing exact lines in the diff.

  4. Results are validated
     A review supervisor validates the findings, merging or refining them to reduce noise and ensure actionable feedback.

  5. Feedback is posted to the PR
     The AI-generated review is posted as a summary comment and inline annotations on the pull request, providing actionable feedback and suggested code changes.
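
To make step 2 more concrete, here is a minimal sketch of annotating a git unified diff with line numbers and collecting the affected file. It is an illustration only, not Datafold's implementation: the function name `annotate_diff`, the tuple layout, and the sample diff are all assumptions introduced here, and the parser assumes standard `git diff` output.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,3 +10,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def annotate_diff(diff_text):
    """Return (file, old_line, new_line, kind, text) tuples for each diff line.

    kind is "+" for added, "-" for removed, and " " for unchanged context.
    Hypothetical helper for illustration; assumes `git diff` output.
    """
    annotated = []
    current_file = None
    old_no = new_no = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section begins
            continue
        if not in_hunk and line.startswith("+++ "):
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_no, new_no = int(match.group(1)), int(match.group(2))
            in_hunk = True
            continue
        if not in_hunk:
            continue  # skip "index", "---", and other file header lines
        if line.startswith("+"):
            annotated.append((current_file, None, new_no, "+", line[1:]))
            new_no += 1
        elif line.startswith("-"):
            annotated.append((current_file, old_no, None, "-", line[1:]))
            old_no += 1
        elif line.startswith(" "):
            annotated.append((current_file, old_no, new_no, " ", line[1:]))
            old_no += 1
            new_no += 1
    return annotated


# Made-up sample diff for a hypothetical dbt model.
SAMPLE_DIFF = "\n".join([
    "diff --git a/models/orders.sql b/models/orders.sql",
    "index 1111111..2222222 100644",
    "--- a/models/orders.sql",
    "+++ b/models/orders.sql",
    "@@ -1,3 +1,4 @@",
    " select",
    "-    *",
    "+    order_id,",
    "+    amount",
    " from raw.orders",
])

if __name__ == "__main__":
    # Show only added lines, tagged with their new-file line numbers.
    for file, _old, new, kind, text in annotate_diff(SAMPLE_DIFF):
        if kind == "+":
            print(f"{file}:{new}: {text}")
```

Keeping removed and context lines in the output, rather than only added lines, mirrors the description in step 3: added lines are the review targets, while the rest supplies context.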

What AI Code Reviews Check

AI Code Reviews are tuned for SQL and data pipeline code. The LLM analyzes your changes for common issues, including:
  • SQL anti-patterns — inefficient joins, missing filters, implicit type coercion
  • Data quality risks — missing WHERE clauses on DELETE/UPDATE, unintended cross joins, SELECT * in production models
  • dbt best practices — model naming conventions, ref usage, materialization choices
  • Schema changes — column additions, removals, or type changes that may break downstream consumers
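
As a rough illustration of two of the data quality risks listed above, the sketch below flags `DELETE`/`UPDATE` statements without a `WHERE` clause and uses of `SELECT *`. This is a deliberately naive, rule-based stand-in and not how the LLM review works: the function name `find_risky_statements` and the finding labels are assumptions made for this example, and splitting on `;` will misfire on semicolons inside string literals.

```python
import re


def find_risky_statements(sql):
    """Return (label, statement) pairs for a few easily detected risks.

    Hypothetical helper for illustration; not a real SQL parser.
    """
    findings = []
    for statement in sql.split(";"):
        text = " ".join(statement.split())  # collapse whitespace and newlines
        if not text:
            continue
        upper = text.upper()
        # DELETE or UPDATE that never mentions WHERE touches every row.
        if re.match(r"(DELETE|UPDATE)\b", upper) and not re.search(r"\bWHERE\b", upper):
            findings.append(("missing-where", text))
        # SELECT * makes the output depend on upstream column changes.
        if re.search(r"\bSELECT\s+\*", upper):
            findings.append(("select-star", text))
    return findings


# Made-up statements for demonstration.
EXAMPLE_SQL = """
DELETE FROM orders;
UPDATE orders SET status = 'shipped' WHERE id = 1;
SELECT * FROM orders
"""

if __name__ == "__main__":
    for label, statement in find_risky_statements(EXAMPLE_SQL):
        print(label, "->", statement)
```

Fixed rules like these only cover patterns that can be matched textually; issues such as inefficient joins or unsuitable materialization choices depend on context that a simple pattern match does not see.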

AI Code Reviews + Data Diffs

AI Code Reviews and Data Diffs complement each other:
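|                 | AI Code Reviews                                       | Data Diffs                                                 |
|-----------------|-------------------------------------------------------|------------------------------------------------------------|
| What it checks  | The code itself (SQL, dbt, pipeline logic)            | The actual data output (row and column-level differences)  |
| When it runs    | On the first CI run for each PR                       | After staging data is built                                |
| What it catches | Logic errors, anti-patterns, best practice violations | Unintended data changes, row count shifts, value drift     |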
Used together, they provide comprehensive validation — the AI reviews catch code issues early, while Data Diffs verify the actual data impact.
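
To show what "row and column-level differences" means in practice, here is a toy comparison of two in-memory versions of a table keyed by a primary key. It is a conceptual sketch only and does not represent how Datafold computes Data Diffs: the function name `diff_rows`, the result layout, and the sample rows are assumptions made for this example.

```python
def diff_rows(before, after, key):
    """Compare two lists of row dicts and report row and column differences.

    Hypothetical helper for illustration; assumes `key` is unique per row.
    """
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    # Row-level differences: keys present on only one side.
    added = sorted(after_by_key.keys() - before_by_key.keys())
    removed = sorted(before_by_key.keys() - after_by_key.keys())

    # Column-level differences: shared keys whose values differ.
    changed = {}
    for k in sorted(before_by_key.keys() & after_by_key.keys()):
        old, new = before_by_key[k], after_by_key[k]
        columns = sorted(c for c in set(old) | set(new) if old.get(c) != new.get(c))
        if columns:
            changed[k] = columns

    return {"added": added, "removed": removed, "changed": changed}


# Made-up production and staging rows.
PROD_ROWS = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
STAGING_ROWS = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]

if __name__ == "__main__":
    print(diff_rows(PROD_ROWS, STAGING_ROWS, key="id"))
```

In this example the code change could look harmless in review, yet the comparison surfaces one new row and one changed `amount` value, which is the kind of effect only a data-level check reveals.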

Enabling AI Code Reviews

AI Code Reviews require Datafold’s CI integration to be set up with your Git provider and data warehouse. To enable the feature:
  1. Ensure your Git provider and data warehouse are connected in Datafold.
  2. Verify your CI configuration is set up under Settings > CI/CD.
  3. Contact Datafold support to enable AI Code Reviews for your organization.
Once enabled, AI Code Reviews will automatically run on the first CI run for each new pull request — no additional CI pipeline changes are required.

Using Data Diffs Only

If your team prefers to use only Data Diffs without AI Code Reviews, no action is needed. Your existing CI configuration will continue to run Data Diffs as before. AI Code Reviews are an opt-in feature and do not affect Data Diff behavior.