Datafold offers column-level and tabular lineage views.
Datafold’s column-level lineage helps users trace and document the history, transformations, dependencies, and both downstream and upstream processes of a specific data column within an organization’s data assets. This feature allows you to pinpoint the origins of data validation issues and comprehensively identify downstream data processes and applications.
To view column-level lineage, click on the Columns dropdown menu of the selected asset.
Lineage Graph Columns Dropdown
To highlight the column path between assets, click the specific column. Reset the view by clicking the Exit the selected path button.
Selected Path in Lineage Graph
Datafold also offers a tabular lineage view.
You can sort lineage information by depth, asset type, identifier, and owner. Click on the Actions button for further options:
Tabular Lineage Actions Dropdown
Drill down into the data node or column of interest.
Access the SQL query associated with the selected column to understand how the data was queried from the source:
Show SQL Query in Tabular Lineage
Access the column’s read count, write count, and cumulative read count (its own read count plus the read counts of its downstream columns) for the previous 7 days:
Usage Details in Tabular Lineage
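The cumulative read figure can be illustrated with a small sketch. This is not Datafold's implementation; it assumes the lineage is available as a mapping from each column to its direct downstream columns, and that each downstream column is counted once even if it is reachable by several paths. The column names and counts are hypothetical.

```python
from collections import deque


def cumulative_read(column, read_counts, downstream):
    """Return the read count of `column` plus the read counts of all
    columns reachable downstream of it (each counted once)."""
    seen = {column}
    queue = deque([column])
    total = 0
    while queue:
        current = queue.popleft()
        total += read_counts.get(current, 0)
        for child in downstream.get(current, ()):
            if child not in seen:  # avoid double counting on diamond-shaped paths
                seen.add(child)
                queue.append(child)
    return total


# Hypothetical 7-day read counts per column.
read_counts = {
    "raw.orders.amount": 3,
    "stg.orders.amount": 5,
    "mart.revenue.total": 12,
    "mart.kpi.revenue": 4,
}
# Hypothetical lineage edges: column -> direct downstream columns.
downstream = {
    "raw.orders.amount": ["stg.orders.amount"],
    "stg.orders.amount": ["mart.revenue.total", "mart.kpi.revenue"],
    "mart.revenue.total": ["mart.kpi.revenue"],
}

print(cumulative_read("raw.orders.amount", read_counts, downstream))  # 24
```

A column with no downstream columns has a cumulative read equal to its own read count.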
Datafold offers powerful search and filtering capabilities to help users quickly locate specific data assets and isolate data connections of interest.
In both the graphical and tabular lineage views, you can filter by tables or columns within tables, allowing you to go as granular as needed.
Search and Filter in Tabular Lineage
Simply enter the table’s name in the search bar to filter and display all relevant information associated with that table.
To focus specifically on columns, you can search using a combination of keywords. For instance, searching “column table” will display columns associated with a table, while a query like “column dim customer” narrows the search to columns within the “dim customer” table.
You can configure the settings for Lineage under Settings > Data Connections > Advanced Settings:
Lineage Advanced Settings
Customize how often and when the indexes on database schemas are updated. The schedule is defined by a cron expression.
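As a reminder of how such a schedule reads, the sketch below splits an expression into its named fields. It assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week); whether Datafold accepts extensions to that syntax is not stated here, and `describe_cron` is a hypothetical helper, not part of any Datafold tooling.

```python
# Field names of a standard five-field cron expression (an assumption
# about the syntax used in the settings form).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" reads as: minute 0 of hour 2, every day, i.e. daily at 02:00.
print(describe_cron("0 2 * * *"))
```

For example, `0 */6 * * *` would run at minute 0 of every sixth hour, and `30 1 * * 0` at 01:30 every Sunday.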
You can filter to include and/or exclude specific tables to be shown in Lineage.
When the inclusion list is set, only the tables specified in this list will be visible in the lineage and search results.
When the inclusion list is not set, all tables will be visible by default, except for those explicitly specified in the exclusion list.
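The two rules above can be sketched as a small predicate. This is an illustration rather than Datafold's code: it assumes tables are matched by exact name (the real settings may accept patterns), and because the text does not say how the two lists interact when both are set, the sketch follows the first rule literally and consults only the inclusion list in that case.

```python
def is_visible(table, include=None, exclude=None):
    """Decide whether `table` appears in lineage and search results.

    Assumptions: exact-name matching; when an inclusion list is set,
    the exclusion list is not consulted.
    """
    include = set(include or ())
    exclude = set(exclude or ())
    if include:
        # Inclusion list set: only listed tables are visible.
        return table in include
    # No inclusion list: everything is visible except excluded tables.
    return table not in exclude


# Hypothetical table names.
print(is_visible("analytics.dim_customer", include=["analytics.dim_customer"]))  # True
print(is_visible("analytics.tmp_scratch", include=["analytics.dim_customer"]))   # False
print(is_visible("analytics.tmp_scratch", exclude=["analytics.tmp_scratch"]))    # False
print(is_visible("analytics.fct_orders", exclude=["analytics.tmp_scratch"]))     # True
```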
Customize how often and when Datafold scans the query history of your data warehouse to build and update the data lineage. The schedule is defined by a cron expression.
How is lineage computed?
Datafold computes column-level lineage by:
Is there a programmatic way to retrieve lineage?
Currently, the schema of the Datafold GraphQL API, which we use to expose lineage information, is not yet stable and is considered to be in beta. Therefore, we do not include this API in our public documentation.
If you would like to access lineage information programmatically, you can explore our GitHub repository, which contains a few examples: datafold/datafold-api-examples. Clone the repository and follow the instructions provided in the README.md file.