Encryption

When Datafold connects to your database (e.g., BigQuery) to query your data, communications are secured using HTTPS encryption.

IP Whitelisting

If access to your data connection is restricted to IP addresses on an allowlist, you will need to manually add Datafold’s addresses in order to use our product. Otherwise, you will receive a connection error when setting up your data connection.

For SaaS (app.datafold.com) deployments, whitelist the following IP addresses:

  • 23.23.71.47
  • 35.166.223.86
  • 52.11.132.23
  • 54.71.177.163
  • 54.185.25.103
  • 54.210.34.216

Note that at any given time, you will only see one of these addresses in use. However, the active IP address can change, so you should add them all to your IP whitelist to ensure no interruptions in service.
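
If your allowlist is long, or uses CIDR ranges instead of single addresses, a short script can confirm that every address above is covered. The following is a minimal Python sketch using the standard `ipaddress` module. The function name `missing_from_allowlist` and the sample allowlist are hypothetical and not part of any Datafold tooling.

```python
import ipaddress

# Datafold SaaS addresses, as listed above.
DATAFOLD_SAAS_IPS = [
    "23.23.71.47",
    "35.166.223.86",
    "52.11.132.23",
    "54.71.177.163",
    "54.185.25.103",
    "54.210.34.216",
]


def missing_from_allowlist(allowlist_cidrs, required_ips=DATAFOLD_SAAS_IPS):
    """Return the required IPs that no allowlist entry covers.

    allowlist_cidrs: strings such as "54.71.177.163/32", "52.11.132.0/24",
    or a bare address like "23.23.71.47" (treated as a single host).
    """
    networks = [ipaddress.ip_network(c, strict=False) for c in allowlist_cidrs]
    missing = []
    for ip in required_ips:
        addr = ipaddress.ip_address(ip)
        # An address is covered if it falls inside any allowlisted network.
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


if __name__ == "__main__":
    # Hypothetical allowlist that is missing one Datafold address.
    current = [
        "23.23.71.47/32",
        "35.166.223.86/32",
        "52.11.132.0/24",
        "54.71.177.163/32",
        "54.185.25.103/32",
    ]
    print(missing_from_allowlist(current))  # → ['54.210.34.216']
```

An empty result means all six addresses are allowed, so a change in the active address will not interrupt service.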

AWS PrivateLink

AWS PrivateLink allows you to connect Datafold to your databases without exposing data to the internet. This option is available for both Datafold SaaS Cloud and all Datafold Dedicated Cloud options.

The following diagram shows the architecture for a customer with a High Availability RDS setup:

[Diagram: SaaS with PrivateLink]

Setup

Supported Databases

The following setup assumes you have an RDS/Aurora database you want to connect to. Datafold also supports PrivateLink connections to other databases, such as Snowflake, that should only be accessible from your VPC. Please contact support@datafold.com for help connecting to your specific database.

Our support team will send you the following:

  • The role ARN to establish the PrivateLink connection.
  • Datafold SaaS Cloud VPC CIDR range.

You will need to complete the following steps:

  1. Send us the region(s) where your database(s) are located.
  2. Create a Network Load Balancer (NLB) that forwards to your database, and a VPC Endpoint Service that uses it.
  3. Add the provided role ARN as ‘Allowed Principal’ on the VPC Endpoint Service.
  4. Allow ingress from the Datafold SaaS Cloud VPC.
  5. Send us the:
    • Service name(s), e.g. com.amazonaws.vpce.us-west-2.vpce-svc-0cfd2f258c4395ad6.
    • Availability Zone ID(s) used in the VPCE Service(s), e.g. use1-az6 or usw2-az3.
    • RDS/Aurora hostname(s), e.g. datafold.c2zezoge6btk.us-west-2.rds.amazonaws.com.
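
If you manage AWS with scripts, steps 2 to 4 map onto a few EC2 API calls. The sketch below uses boto3's EC2 client and assumes that an internal NLB forwarding to your RDS/Aurora instance already exists (creating the NLB and its target group is not shown). The helper names, the default port 5432 (the PostgreSQL default), and which security group you pass in are assumptions to adapt to your network layout. The client is passed in so you can construct it yourself, for example with `boto3.client("ec2", region_name="us-west-2")`.

```python
"""Sketch of PrivateLink setup steps 2-4 with a boto3 EC2 client.

All ARNs, IDs, CIDRs and ports are placeholders (hypothetical): take the
role ARN and Datafold CIDR from Datafold support, the rest from your account.
"""


def create_endpoint_service(ec2, nlb_arn):
    """Step 2: expose an existing NLB as a VPC Endpoint Service.

    Returns (service_id, service_name); the service name is what you send
    to Datafold in step 5.
    """
    resp = ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=True,  # endpoint connections wait for your approval
    )
    config = resp["ServiceConfiguration"]
    return config["ServiceId"], config["ServiceName"]


def allow_datafold_principal(ec2, service_id, role_arn):
    """Step 3: add Datafold's role ARN as an allowed principal."""
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=service_id,
        AddAllowedPrincipals=[role_arn],
    )


def allow_datafold_ingress(ec2, security_group_id, datafold_cidr, db_port=5432):
    """Step 4: allow TCP traffic on the database port from Datafold's CIDR."""
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": db_port,
            "ToPort": db_port,
            "IpRanges": [{"CidrIp": datafold_cidr, "Description": "Datafold PrivateLink"}],
        }],
    )
```

Because `AcceptanceRequired=True`, Datafold's endpoint connection stays pending until you accept it, which matches the acceptance step described below.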

Once setup is complete, the database hostname used to configure the data source will still be the original RDS/Aurora hostname; with private DNS resolution, we resolve that hostname to the VPC Endpoint. Our support team will let you know when everything is set up, at which point you can accept the PrivateLink connection and start configuring the data source.

Cross-Region PrivateLink

Datafold SaaS Cloud supports cross-region PrivateLink for all North American regions. Datafold SaaS Cloud is located in us-west-2. Datafold manages the cross-region networking, allowing you to connect to a VPC Endpoint in the same region as your VPC Endpoint Service. For Datafold Dedicated Cloud customers, deployment occurs in your chosen region. If you need to connect to databases in multiple regions, Datafold also supports this through cross-region PrivateLink.

The setup will be similar to the regular PrivateLink setup.

VPC Peering (SaaS)

VPC Peering is easier to set up than PrivateLink, but its drawback is that the two networks are joined, so their IP ranges must not overlap. For Datafold SaaS Cloud, this setup is an AWS-only option.
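
Because peering requires non-overlapping ranges, you can check your VPC's CIDR blocks against the range Datafold uses before you start. A minimal Python sketch with the standard `ipaddress` module follows; the function name and the `10.0.128.0/20` peer range are hypothetical placeholders, since Datafold tells you its actual CIDR.

```python
import ipaddress


def conflicting_cidrs(vpc_cidrs, peer_cidr):
    """Return the VPC CIDR blocks that overlap the peer network's CIDR.

    vpc_cidrs: every CIDR block associated with your VPC (a VPC can have
    secondary blocks, so pass them all).
    """
    peer = ipaddress.ip_network(peer_cidr)
    return [c for c in vpc_cidrs if ipaddress.ip_network(c).overlaps(peer)]


if __name__ == "__main__":
    # Hypothetical ranges: the first VPC block contains the peer range.
    print(conflicting_cidrs(["10.0.0.0/16", "172.31.0.0/16"], "10.0.128.0/20"))
    # → ['10.0.0.0/16']
```

An empty list means peering is possible from an addressing point of view.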

The basics of VPC peering are covered here.

To set up VPC peering, please contact support@datafold.com and provide us with the following information:

  • AWS region where your database is hosted.
  • ID of the VPC that you would like to connect.
  • CIDR of the VPC.

If there are no address collisions, we’ll send you a peering request along with the CIDR we use on our end, and whitelist the CIDR range for your organization. You’ll need to set up routing to our CIDR through the peering connection.
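
Accepting the request and adding the route can also be scripted. This sketch uses boto3's EC2 client (`accept_vpc_peering_connection` and `create_route`). The function name is hypothetical, and the peering connection ID, route table IDs, and Datafold CIDR are placeholders you would take from the AWS console and from Datafold support; which route tables need the route depends on which subnets your database lives in.

```python
def accept_and_route(ec2, peering_id, route_table_ids, datafold_cidr):
    """Accept Datafold's peering request and route its CIDR over the peering.

    ec2: a boto3 EC2 client, e.g. boto3.client("ec2", region_name=...).
    route_table_ids: route tables of the subnets your database lives in.
    """
    # Accept the pending peering request that Datafold sent.
    ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=peering_id)
    # Send traffic destined for Datafold's CIDR through the peering connection.
    for route_table_id in route_table_ids:
        ec2.create_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock=datafold_cidr,
            VpcPeeringConnectionId=peering_id,
        )
```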

If you enable DNS resolution on your side of the peering connection, you can use the private DNS hostname to connect. Otherwise, you need to use the IP address.
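
On AWS, enabling DNS on your side corresponds to the peering connection's DNS resolution option. Assuming your VPC is the accepter (Datafold sends the request), a boto3 sketch looks like this; the function name is hypothetical. AWS documents that both VPCs need DNS resolution and DNS hostnames enabled for this option to take effect.

```python
def enable_remote_dns_resolution(ec2, peering_id):
    """Let Datafold's VPC resolve your private DNS hostnames to private IPs.

    Assumes your VPC is the accepter of the peering connection, which is
    the case when Datafold sends you the peering request.
    """
    ec2.modify_vpc_peering_connection_options(
        VpcPeeringConnectionId=peering_id,
        # Accepter-side setting: queries from the peer (requester) VPC for
        # your hostnames resolve to private IP addresses.
        AccepterPeeringConnectionOptions={
            "AllowDnsResolutionFromRemoteVpc": True,
        },
    )
```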

VPC Peering (Dedicated Cloud)

VPC Peering is a supported option for all cloud providers, both for Datafold-hosted and customer-hosted deployments. Basic information for each cloud provider can be found here:

VPC vs VNet

We use the term VPC across all major cloud providers. However, Azure calls this concept a Virtual Network (VNet).

SSH Tunnel

To set up a tunnel, please contact our team at support@datafold.com and provide the following information:

  • Hostname of your bastion host and port number used for SSH service.
  • Hostname and port number of your database.
  • SSH fingerprint of the bastion host (optional).
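
If you choose to send the fingerprint, running `ssh-keygen -lf` on the bastion's host public key file prints it. The Python sketch below computes the same SHA256 fingerprint format from a public key line: the SHA-256 digest of the base64-decoded key blob, re-encoded as base64 without padding. The function name is hypothetical, and the host key path in the docstring is a common OpenSSH default rather than a requirement.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(pubkey_line):
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    pubkey_line is the text of a .pub file, e.g. the bastion's
    /etc/ssh/ssh_host_ed25519_key.pub: "<key-type> <base64-blob> [comment]".
    """
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])  # raw key bytes
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```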

We’ll get back to you with:

  • SSH public key that you need to add to ~/.ssh/authorized_keys on the bastion host.
  • IP address and port to use for data connection configuration in the Datafold application.
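
Adding the key is essentially a one-line append, but an idempotent helper avoids duplicate entries and sets the permissions sshd expects. A minimal Python sketch follows, meant to be run as the SSH user on the bastion host. The function name is hypothetical; the default path `~/.ssh/authorized_keys` is OpenSSH's standard location for authorized public keys.

```python
import os
from pathlib import Path


def add_authorized_key(public_key, authorized_keys=None):
    """Append public_key to authorized_keys unless it is already present.

    Returns True if the key was added, False if it was already there.
    """
    path = Path(authorized_keys) if authorized_keys else Path.home() / ".ssh" / "authorized_keys"
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    key = public_key.strip()
    text = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a new line if the file does not already end with one.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + key + "\n")
    # 0600 is the usual mode; sshd's StrictModes rejects files others can write.
    os.chmod(path, 0o600)
    return True
```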

Reverse SSH Tunnel

The reverse SSH tunnel design sets up a tunnel from a customer’s private subnet, allowing Datafold to establish connections with resources in that subnet.

This design can be preferable because it avoids exposing those resources directly to the Internet, even when strict filtering rules are applied on the firewalls or security groups. The reverse SSH tunnel also gives you slightly more control over which connections are established.
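
For illustration, an outbound reverse tunnel can be opened with OpenSSH's `-R` option: the client inside your subnet starts an SSH session to a tunnel endpoint, and connections arriving at a port on that endpoint are forwarded back to your database. The Python sketch below only builds such a command. The host names, ports, user, and function name are hypothetical, and the CloudFormation stacks or setup script Datafold provides may configure the tunnel differently.

```python
import shlex


def reverse_tunnel_command(tunnel_server, remote_port, db_host, db_port,
                           user="tunnel", identity_file=None):
    """Build an OpenSSH command that keeps a reverse tunnel open.

    Connections made to remote_port on tunnel_server are carried back over
    this outbound SSH session and forwarded to db_host:db_port in your network.
    """
    cmd = [
        "ssh",
        "-N",  # no remote command; only forward ports
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote forward fails
        "-o", "ServerAliveInterval=30",  # probe the server to detect dead sessions
        "-R", f"{remote_port}:{db_host}:{db_port}",
    ]
    if identity_file:
        cmd += ["-i", identity_file]
    cmd.append(f"{user}@{tunnel_server}")
    return cmd


if __name__ == "__main__":
    print(shlex.join(reverse_tunnel_command("tunnel.example.com", 15432, "db.internal", 5432)))
    # → ssh -N -o ExitOnForwardFailure=yes -o ServerAliveInterval=30 -R 15432:db.internal:5432 tunnel@tunnel.example.com
```

Since the session is initiated from inside your subnet, no inbound firewall rule to the database is needed; the client only needs outbound SSH access.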

At the moment, this solution is exclusive to Datafold SaaS Cloud and Datafold Dedicated Cloud, both hosted on AWS. To deploy it on your AWS account, Datafold will send you two CloudFormation stacks that you can use to set up the SSH tunnel client VM. If you need to connect to a database in a different public cloud or on-premises environment, Datafold will send you a script you can run on an Ubuntu server in your VPC to set up the SSH tunnel client.

Please contact our team at support@datafold.com for documentation specific to your cloud/on-premises setup.

IPsec Tunnel

Please contact our team at support@datafold.com for more information.