Steps to complete:

  1. Create a user with access to S3
  2. Assign the user to the S3 bucket
  3. Create an access key for the user
  4. Configure your data connection in Datafold

Create a user with access to S3

To connect your Amazon S3 bucket, you will need to create a user for Datafold to use.

  • Navigate to the AWS Console.
  • Click on the search bar in the top header, then find the IAM service and click on it.
  • Click on the Users item of the Access Management section.
  • Click on the Create user button.
  • Create a user named Datafold.
  • Attach the AmazonS3FullAccess policy to the user.
  • When done, keep the user’s ARN handy, as you’ll need it in the next step.
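
If you script your AWS setup, the console steps above can be approximated with boto3. This is a minimal sketch rather than part of Datafold's tooling: it assumes boto3 is installed and that your local AWS credentials are allowed to manage IAM. The function name `create_datafold_user` is hypothetical; `create_user` and `attach_user_policy` are standard boto3 IAM client calls.

```python
# AWS-managed policy named in the console steps above.
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def create_datafold_user(iam_client, user_name="Datafold"):
    """Create the IAM user, attach AmazonS3FullAccess, and return the user's ARN.

    `iam_client` is expected to be a boto3 IAM client, i.e. boto3.client("iam").
    """
    response = iam_client.create_user(UserName=user_name)
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=S3_FULL_ACCESS_ARN)
    # Keep this ARN handy: the bucket policy in the next step references it.
    return response["User"]["Arn"]


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(create_datafold_user(boto3.client("iam")))
```

Passing the client in as an argument keeps the function easy to exercise with a stub before running it against a real account.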

Assign the user to the S3 bucket

  • Go to the S3 panel and select the bucket.
  • Click on the Permissions tab.
  • Click on Edit next to the Bucket Policy.
  • Add the following policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account-id>:user/Datafold"
          },
          "Action": [
            "s3:GetObject",
            "s3:PutObject"
          ],
          "Resource": [
            "arn:aws:s3:::your-bucket-name/*",
            "arn:aws:s3:::your-bucket-name"
          ]
        }
      ]
    }
  • Before saving, replace the Principal value with your user's ARN, and replace your-bucket-name with your bucket's name in both Resource ARNs.
  • Remove s3:PutObject if you're not planning to use this data connection as a destination for materialized diff results.
    

The Datafold user requires the following roles and permissions:

  • s3:GetObject for read access.
  • s3:PutObject for write access if you’re planning to use this data connection as a destination for materialized diff results.
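
Because the Bucket Policy editor expects strict JSON, it can help to generate the policy instead of editing it by hand. The sketch below uses only the Python standard library; the function name `build_bucket_policy`, the `allow_write` flag, and the account ID in the example ARN are illustrative assumptions, not part of Datafold or AWS.

```python
import json


def build_bucket_policy(user_arn, bucket_name, allow_write=False):
    """Return the bucket policy described in this section as a Python dict."""
    actions = ["s3:GetObject"]  # read access
    if allow_write:
        # Only needed when the bucket is a destination for materialized diff results.
        actions.append("s3:PutObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": actions,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


# Placeholder values; substitute your own user ARN and bucket name.
policy = build_bucket_policy(
    "arn:aws:iam::123456789012:user/Datafold", "your-bucket-name", allow_write=True
)
policy_json = json.dumps(policy, indent=2)  # paste this into the Bucket Policy editor
print(policy_json)
```

Leaving `allow_write` at its default produces a read-only policy, which matches the case where diff results are not written back to the bucket.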

Create an access key for the user

Next, go back to the IAM page to generate a key for Datafold.

  • Click on the Users page.
  • Click on the Datafold user.
  • Click on the Security Credentials tab.
  • Click on Create access key and select Create new access key.
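  • Complete the remaining prompts to create the key, then save the access key ID and secret access key for the next step. The secret access key is shown only once.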
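
The same key can be created programmatically. The following is a hedged sketch, assuming boto3 is installed and your credentials may manage IAM access keys; `create_datafold_access_key` is a hypothetical helper, while `create_access_key` is a standard boto3 IAM client call.

```python
def create_datafold_access_key(iam_client, user_name="Datafold"):
    """Create an access key for the user and return (key_id, secret_access_key).

    `iam_client` is expected to be a boto3 IAM client, i.e. boto3.client("iam").
    """
    response = iam_client.create_access_key(UserName=user_name)
    key = response["AccessKey"]
    # AWS returns the secret only at creation time, so store it securely right away.
    return key["AccessKeyId"], key["SecretAccessKey"]


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   key_id, secret = create_datafold_access_key(boto3.client("iam"))
```

The two returned values correspond to the Key ID and Secret Access Key fields used when configuring the data connection in Datafold.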

Configure in Datafold

| Field Name | Description |
|---|---|
| Connection name | A name given to the data connection within Datafold. |
| Bucket Name | The name of the bucket you want to connect to. |
| Bucket region | The region of the bucket you want to connect to. |
| Key ID | The access key ID generated in the Create an access key for the user step. |
| Secret Access Key | The secret access key generated in the Create an access key for the user step. |
| Directory for writing diff results | Optional. The directory in the bucket where diff results will be written. The Datafold user should have write access to this directory. |
| Default maximum number of rows to include in diff results | Optional. The maximum number of rows that a file with materialized results will contain. |

Click Create. Your data connection is ready!