Steps to complete:

  1. Create an S3 bucket
  2. Run SQL Script and create a schema for Datafold
  3. Configure your data connection in Datafold

Create an S3 bucket

If you don’t already have an S3 bucket for your cluster, you’ll need to create one. Datafold uses this bucket to create temporary tables and store their data. You can learn how to create an S3 bucket in AWS by referring to the AWS documentation.
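
If you prefer to script this step rather than use the AWS console, the following is a minimal sketch built on the boto3 S3 client. The helper name create_staging_bucket, the bucket name, and the region are placeholders chosen for illustration and are not part of Datafold or AWS tooling. It assumes AWS credentials allowed to create buckets are already configured.

```python
def create_staging_bucket(s3_client, bucket_name, region):
    """Create an S3 bucket and return the S3 API response.

    s3_client is expected to be a boto3 S3 client, for example
    boto3.client("s3", region_name=region). The helper itself is hypothetical.
    """
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location, so the constraint is only
    # passed for other regions.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**params)


# Example usage (requires `pip install boto3` and configured credentials):
#   import boto3
#   client = boto3.client("s3", region_name="us-west-2")
#   create_staging_bucket(client, "my-datafold-athena-staging", "us-west-2")
```

The client is passed in as an argument so the helper can be exercised with a stub before it is pointed at a real AWS account.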

Run SQL Script and Create Schema for Datafold

To connect to AWS Athena, you must generate an AWS Access Key ID and an AWS Secret Access Key. These keys should grant read-only access to all tables in all schemas and write access to the Datafold-specific schema for temporary tables. If you don’t have these keys yet, follow the steps outlined in the AWS documentation.

Datafold uses a temporary dataset to materialize scratch work and keep data processing in your warehouse.

/* Datafold uses a temporary dataset to materialize scratch work and keep data processing within your data warehouse. */

CREATE SCHEMA IF NOT EXISTS awsdatacatalog.datafold_tmp;
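
The statement can be run from the Athena query editor. As an alternative, here is a hedged sketch that submits it through the boto3 Athena client. The helper name run_create_schema and its default arguments are assumptions for illustration; the staging directory is assumed to be an s3:// URI that Athena can write query results to.

```python
def run_create_schema(athena_client, staging_dir,
                      catalog="awsdatacatalog", schema="datafold_tmp"):
    """Submit CREATE SCHEMA to Athena and return the query execution ID.

    athena_client is expected to be a boto3 Athena client, for example
    boto3.client("athena", region_name="us-west-2"). Hypothetical helper.
    """
    sql = f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}"
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Catalog": catalog},
        # Athena writes query results under this S3 location.
        ResultConfiguration={"OutputLocation": staging_dir},
    )
    return response["QueryExecutionId"]


# Example usage (requires `pip install boto3` and configured credentials):
#   import boto3
#   athena = boto3.client("athena", region_name="us-west-2")
#   query_id = run_create_schema(athena, "s3://my-datafold-athena-staging/results/")
```

Athena runs queries asynchronously, so the returned ID only confirms submission. The final state can be checked afterwards with the client's get_query_execution call.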

Configure in Datafold

Field Name                   Description
AWS Access Key ID            Your AWS Access Key ID, which can be found in your AWS account.
AWS Secret Access Key        Your AWS Secret Access Key (generate it in your AWS account if you don’t have it yet).
S3 Staging Directory         The S3 bucket where table data is stored.
AWS Region                   The region of your Athena cluster.
Catalog                      The catalog, typically awsdatacatalog.
Database                     The database or schema containing your tables, typically default.
Schema for Temporary Tables  The schema (datafold_tmp) created by the SQL script above.
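
Before filling in the form, it can help to gather the values in one place and check them. The sketch below is purely illustrative: AthenaConnectionFields and find_problems are hypothetical names, not a Datafold API, and the check that the staging directory is an s3:// URI is an assumption about how the value is entered.

```python
from dataclasses import dataclass, fields


@dataclass
class AthenaConnectionFields:
    """Hypothetical container mirroring the form fields listed above."""
    aws_access_key_id: str
    aws_secret_access_key: str
    s3_staging_dir: str
    aws_region: str
    catalog: str = "awsdatacatalog"    # typical default catalog
    database: str = "default"          # typical default database
    temp_schema: str = "datafold_tmp"  # schema created by the SQL script


def find_problems(cfg):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"{f.name} is empty"
        for f in fields(cfg)
        if not getattr(cfg, f.name).strip()
    ]
    # Assumption: the staging directory is entered as an s3:// URI.
    staging = cfg.s3_staging_dir.strip()
    if staging and not staging.startswith("s3://"):
        problems.append("s3_staging_dir should start with s3://")
    return problems
```

For example, find_problems(AthenaConnectionFields("AKIAEXAMPLE", "secret", "s3://my-bucket/staging/", "us-west-2")) returns an empty list, while leaving the access key blank or omitting the s3:// prefix produces one message per issue.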

Click Create to complete the setup of your data connection in Datafold.