When your PR renames columns, specify the renames in your git commit message using the syntax below. This tells Datafold to compare each renamed column with the production column that has the original name.

Example

By specifying the column remapping in the commit message, you let Datafold recognize that the column has been renamed, instead of interpreting the change as removing one column and adding another.

Syntax for column remapping

You can add any of the following syntax styles as a single line in a commit message to instruct Datafold in CI to remap a column from oldcol to newcol. To limit the remapping to specific models or tables, append a shell-like glob pattern.

# All models/tables in the PR:
datafold remap oldcol newcol
X-Datafold: rename oldcol newcol
/datafold renamed oldcol newcol
datafold: remapped oldcol newcol

# Only models/tables matching a shell-like glob:
datafold remap oldcol newcol model_NAME
X-Datafold: rename oldcol newcol TABLE
/datafold renamed oldcol newcol VIEW_*
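To make the accepted forms concrete, here is a minimal Python sketch of a parser for these lines. It is not Datafold's implementation: the names `parse_remaps`, `Remap`, and `applies_to` are hypothetical, and the sketch assumes that each listed prefix can be combined with each listed verb and that glob matching is case-sensitive.

```python
import re
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional

# Hypothetical parser for the remap lines listed above (not Datafold's code).
# Assumption: any listed prefix can be combined with any listed verb.
REMAP_LINE = re.compile(
    r"^(?:datafold:?|X-Datafold:|/datafold)\s+"
    r"(?:remap|remapped|rename|renamed)\s+"
    r"(?P<old>\S+)\s+(?P<new>\S+)(?:\s+(?P<pattern>\S+))?$"
)


class Remap(NamedTuple):
    old: str
    new: str
    pattern: Optional[str]  # None means "all models/tables in the PR"

    def applies_to(self, model: str) -> bool:
        # Assumption: shell-like glob matching is case-sensitive.
        return self.pattern is None or fnmatchcase(model, self.pattern)


def parse_remaps(commit_message: str) -> List[Remap]:
    """Return every remap instruction found in a commit message, in order."""
    remaps = []
    for line in commit_message.splitlines():
        match = REMAP_LINE.match(line.strip())
        if match:
            remaps.append(Remap(match["old"], match["new"], match["pattern"]))
    return remaps


message = "Rename customer columns\n\ndatafold remap oldcol newcol\nX-Datafold: rename a b VIEW_*\n"
for remap in parse_remaps(message):
    print(remap, remap.applies_to("VIEW_orders"))
```

Lines that do not start with one of the listed prefixes, such as the commit subject above, are ignored, so the remap line can sit anywhere in the message body.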

Chaining together column name updates

Remapping instructions in successive commit messages are chained to reflect sequential changes, so a rename in one commit does not lock you in: a later commit can rename the column again.

For example, if your commit history looks like this:

Datafold will understand that the production column name has been renamed to first_name in the PR branch.
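The chaining rule can be sketched in Python as folding an ordered list of `(old, new)` pairs into a mapping from production name to final name. This is a hypothetical model of the behavior described here, not Datafold's code; `resolve_chained_renames` is an illustrative name, and dropping a column that is renamed back to its production name is an assumption.

```python
from typing import Dict, Iterable, Tuple


def resolve_chained_renames(renames: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse renames applied in commit order into production name -> final name."""
    current: Dict[str, str] = {}  # production name -> name after the renames seen so far
    for old, new in renames:
        # If `old` is itself the result of an earlier rename, continue that chain.
        origin = next((prod for prod, cur in current.items() if cur == old), old)
        current[origin] = new
    # Assumption: a column renamed back to its production name is not a rename.
    return {prod: cur for prod, cur in current.items() if prod != cur}


# col1 > col2 in one commit, then col2 > col3 in a later commit.
print(resolve_chained_renames([("col1", "col2"), ("col2", "col3")]))
```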

Handling column renaming in git commits and PR comments

Git commits

Renames specified in git commits are tracked change by change. The commit history is linearized on the assumption that merged branches introduce new changes on top of the base/current branch (the first parent).
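The linearization assumption can be modeled as a depth-first walk that emits a commit's first-parent history before the history of any branch it merged, and the commit itself last. The sketch below is a hypothetical illustration, not Datafold's algorithm: the commit graph is an in-memory dict mapping each commit to its parents (first parent first), and `base` is an assumed set of commits already on the production branch, which are skipped.

```python
from typing import Dict, List, Optional, Set


def linearize(head: str, parents: Dict[str, List[str]], base: Optional[Set[str]] = None) -> List[str]:
    """Order commits oldest-first, placing merged-in branches after the first parent's history."""
    order: List[str] = []
    seen: Set[str] = set(base or ())  # base commits are treated as already applied
    stack = [(head, False)]
    while stack:
        commit, parents_done = stack.pop()
        if parents_done:
            order.append(commit)  # all ancestors are emitted, so emit the commit itself
            continue
        if commit in seen:
            continue
        seen.add(commit)
        stack.append((commit, True))
        # Push in reverse so the first parent is popped (and fully walked) first.
        for parent in reversed(parents.get(commit, [])):
            if parent not in seen:
                stack.append((parent, False))
    return order


# A is on production; B is a PR commit; merge commit D has first parent B and merges in C.
graph = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(linearize("D", graph, base={"A"}))
```

Renames from the commit messages would then be applied in this order, so changes from the merged branch land on top of the first parent's renames.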

PR comments

PR comments apply changes to the entire changeset.

When to use git commits or PR comments?

When handling chained renames:

  • Git commits: Sequential renames (col1 > col2 > col3) result in the final rename (col1 > col3).
  • PR comments: It’s best to specify the final result directly (col1 > col3). Sequential renames (col1 > col2 > col3) can also work, but stating the final state directly makes the change easier to understand during review.

| Aspect | Git commits | PR comments |
|---|---|---|
| Tracking changes | Tracks changes on a change-by-change basis. | Applies changes to the entire changeset. |
| History linearization | Linearizes history, assuming merged branches introduce new changes on top of the base/current branch (1st parent). | N/A |
| Chained renames | Sequential renames (col1 > col2 > col3) result in the final rename (col1 > col3). | It’s best to specify the final result directly (col1 > col3). Sequential renames (col1 > col2 > col3) can also work, but specifying the final state simplifies understanding during review. |
| Precedence | Renames specified in git commits are applied in sequence unless overridden by subsequent commits. | PR comments take precedence over renames specified in git commits if applied during the review process. |
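Putting the comparison together, one way to model the combined behavior is to resolve each source's renames in order and then let PR-comment renames replace commit-derived renames for the same production column. This is a hypothetical Python sketch, not Datafold's implementation; the function names and the assumption that PR-comment renames are keyed by production column name are illustrative.

```python
from typing import Dict, Iterable, Tuple

Rename = Tuple[str, str]  # (old name, new name)


def fold(renames: Iterable[Rename]) -> Dict[str, str]:
    """Collapse sequential renames (col1 > col2, col2 > col3) into {col1: col3}."""
    mapping: Dict[str, str] = {}
    for old, new in renames:
        origin = next((prod for prod, cur in mapping.items() if cur == old), old)
        mapping[origin] = new
    return {prod: cur for prod, cur in mapping.items() if prod != cur}


def effective_renames(commit_renames: Iterable[Rename], pr_comment_renames: Iterable[Rename]) -> Dict[str, str]:
    """Commit renames apply in sequence; PR-comment renames take precedence."""
    result = fold(commit_renames)
    result.update(fold(pr_comment_renames))  # a PR comment wins for the same production column
    return result


print(effective_renames([("col1", "col2")], [("col1", "col3")]))
```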

Following these guidelines keeps column renames consistent and easy to review when several people contribute to the same PR, and ensures that Datafold compares each renamed column against the correct production column.