AI Code Reviews - Datafold

AI Code Reviews bring LLM-powered analysis directly into your CI pipeline, automatically reviewing every pull request for violations of SQL and data pipeline best practices.

Figure: AI Code Review posting an AI Overview comment on a pull request, summarizing code changes and their impact on data.

When combined with Data Diffs, AI Code Reviews give your team both code-level and data-level validation on every PR — catching logic errors, anti-patterns, and unintended data changes before they reach production.

AI Code Reviews are an optional add-on to Datafold’s CI integration. If you prefer to use only Data Diffs, no changes are needed — your existing CI setup continues to work as before.

How It Works

PR is opened or updated

When a pull request is created or updated, Datafold’s CI runner detects the change and checks whether AI Code Reviews are enabled for your organization.

Code diff is analyzed

Datafold fetches the git diff, annotates it with line numbers, and identifies the affected files.
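
To make this step concrete, here is a minimal sketch of annotating a unified git diff with line numbers and collecting the affected files. It is an illustration only, not Datafold's implementation: the function name `annotate_diff` and the tuple layout are hypothetical, and it assumes the input is standard `git diff` output in which each file section starts with a `diff --git` line.

```python
import re
from typing import List, Optional, Tuple

# Matches a unified-diff hunk header such as "@@ -10,3 +12,4 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")

# (old line number, new line number, marker, text). A number is None when
# the line does not exist on that side of the diff.
AnnotatedLine = Tuple[Optional[int], Optional[int], str, str]


def annotate_diff(diff_text: str) -> Tuple[List[str], List[AnnotatedLine]]:
    """Return the affected file paths and each hunk line with its line numbers.

    Hypothetical helper: assumes `git diff` style input.
    """
    files: List[str] = []
    annotated: List[AnnotatedLine] = []
    old_no = new_no = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section begins
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:]
            if path != "/dev/null":  # deleted files have no new path
                files.append(path[2:] if path.startswith("b/") else path)
        elif (m := HUNK_RE.match(line)):
            old_no, new_no = int(m.group(1)), int(m.group(2))
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            annotated.append((None, new_no, "+", line[1:]))
            new_no += 1
        elif in_hunk and line.startswith("-"):
            annotated.append((old_no, None, "-", line[1:]))
            old_no += 1
        elif in_hunk and line.startswith(" "):
            annotated.append((old_no, new_no, " ", line[1:]))
            old_no += 1
            new_no += 1
    return files, annotated
```

Keeping both the old and the new line number for each line lets a later step refer to an exact position in the diff, whether the line was added, removed, or unchanged.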

LLM reviews the changes

The code diff is sent to an LLM, which analyzes added lines for potential issues while considering the full context of removed and unchanged lines. The model identifies issues, explains them, and suggests specific code improvements referencing exact lines in the diff.

Results are validated

A review supervisor validates the findings, merging or refining them to reduce noise and ensure actionable feedback.
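
How the review supervisor merges findings is not described here. As one plausible illustration of noise reduction, the sketch below collapses findings that point at the same file and line into a single finding and drops exact duplicates. The `Finding` type and `merge_findings` function are hypothetical names introduced for this example, not part of Datafold's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of a single review finding."""
    path: str     # file the finding refers to
    line: int     # line number in the new version of the file
    message: str  # explanation of the issue


def merge_findings(findings: List[Finding]) -> List[Finding]:
    """Collapse findings on the same file and line into one finding."""
    grouped: Dict[Tuple[str, int], List[str]] = {}
    for f in findings:
        messages = grouped.setdefault((f.path, f.line), [])
        if f.message not in messages:  # drop exact duplicates
            messages.append(f.message)
    # One finding per (path, line), ordered by file and then line number.
    return [
        Finding(path, line, " ".join(messages))
        for (path, line), messages in sorted(grouped.items())
    ]
```

With this approach, a reviewer sees one comment per line instead of several overlapping ones.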

Feedback is posted to the PR

The AI-generated review is posted as a summary comment and inline annotations on the pull request, providing actionable feedback and suggested code changes.

What AI Code Reviews Check

AI Code Reviews are tuned for SQL and data pipeline code. The LLM analyzes your changes for common issues, including:

SQL anti-patterns — inefficient joins, missing filters, implicit type coercion
Data quality risks — missing WHERE clauses on DELETE/UPDATE, unintended cross joins, SELECT * in production models
dbt best practices — model naming conventions, ref usage, materialization choices
Schema changes — column additions, removals, or type changes that may break downstream consumers
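
To illustrate two of the data quality risks listed above, here is a small, deliberately naive sketch that flags them with regular expressions. AI Code Reviews use an LLM rather than pattern matching, so this only shows the kind of issue being looked for. The function name `flag_risky_sql` is hypothetical, and the checks would miss many real cases (subqueries, comments, string literals).

```python
import re
from typing import List


def flag_risky_sql(statement: str) -> List[str]:
    """Return human-readable warnings for a single SQL statement."""
    sql = " ".join(statement.split())  # collapse newlines and repeated spaces
    issues: List[str] = []
    # A DELETE or UPDATE with no WHERE clause touches every row in the table.
    if re.match(r"(?i)^(delete|update)\b", sql) and not re.search(r"(?i)\bwhere\b", sql):
        issues.append("DELETE/UPDATE without a WHERE clause")
    # SELECT * makes a model's output depend on upstream column changes.
    if re.search(r"(?i)\bselect\s+\*", sql):
        issues.append("SELECT * instead of an explicit column list")
    return issues
```

For example, `flag_risky_sql("DELETE FROM orders")` returns one warning, while `flag_risky_sql("delete from orders where id = 1")` returns an empty list.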

AI Code Reviews + Data Diffs

AI Code Reviews and Data Diffs complement each other:

	AI Code Reviews	Data Diffs
What it checks	The code itself (SQL, dbt, pipeline logic)	The actual data output (row and column-level differences)
When it runs	On the first CI run for each PR	After staging data is built
What it catches	Logic errors, anti-patterns, best practice violations	Unintended data changes, row count shifts, value drift

Used together, they validate both sides of a change: AI Code Reviews catch code issues early, while Data Diffs verify the actual impact on the data.
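
For readers unfamiliar with data-level validation, the sketch below shows the basic idea of a row and column-level comparison between a production table and its staging counterpart, keyed on a primary key. It is a toy in-memory version under stated assumptions, not how Datafold computes Data Diffs: the function `diff_rows` and its return shape are hypothetical, and it assumes key values are unique and sortable.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_rows(prod: List[Row], staging: List[Row], key: str) -> Dict[str, list]:
    """Compare two tables row by row using `key` as the primary key."""
    prod_by_key = {row[key]: row for row in prod}
    staging_by_key = {row[key]: row for row in staging}

    # Keys present on only one side are added or removed rows.
    added = sorted(staging_by_key.keys() - prod_by_key.keys())
    removed = sorted(prod_by_key.keys() - staging_by_key.keys())

    # For keys on both sides, record every column whose value differs.
    changed = []
    for k in sorted(prod_by_key.keys() & staging_by_key.keys()):
        old, new = prod_by_key[k], staging_by_key[k]
        for column in sorted(old.keys() | new.keys()):
            if old.get(column) != new.get(column):
                changed.append((k, column, old.get(column), new.get(column)))

    return {"added": added, "removed": removed, "changed": changed}
```

A code review can tell you that a filter or join changed. A comparison like this tells you which rows and values actually moved as a result.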

Enabling AI Code Reviews

AI Code Reviews require Datafold’s CI integration to be set up with your Git provider and data warehouse. To enable the feature:

Ensure your Git provider and data warehouse are connected in Datafold.
Verify your CI configuration is set up under Settings > CI/CD.
Contact Datafold support to enable AI Code Reviews for your organization.

Once enabled, AI Code Reviews will automatically run on the first CI run for each new pull request — no additional CI pipeline changes are required.

Using Data Diffs Only

If your team prefers to use only Data Diffs without AI Code Reviews, no action is needed. Your existing CI configuration will continue to run Data Diffs as before. AI Code Reviews are an opt-in feature and do not affect Data Diff behavior.
