Securing Connections

Datafold supports multiple options for securing connections between Datafold Cloud and your resources, such as databases and BI tools.

Encryption

When you connect to Datafold to query your data in a database (e.g., BigQuery), communications are secured through HTTPS encryption.

IP Whitelisting

If access to your data source is restricted to IP addresses on an allowlist, you will need to manually add Datafold's addresses in order to use our product. Otherwise, you will receive a connection error when setting up your data source.

For SaaS (app.datafold.com) deployments, whitelist the following IP addresses:

  • 23.23.71.47
  • 35.166.223.86
  • 52.11.132.23
  • 54.71.177.163
  • 54.185.25.103
  • 54.210.34.216

Only one of these addresses is in use at any given time. However, the active IP address can change at any time, so add all of them to your allowlist to avoid interruptions in service.
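As a minimal sketch of how the list above might be prepared for a firewall or security group, the snippet below turns each address into a single-host CIDR block and checks whether a source IP belongs to the list. The function names `to_cidr_rules` and `is_datafold_ip` are hypothetical helpers written for this illustration, not part of any Datafold tooling.

```python
import ipaddress

# Addresses copied from the SaaS list above.
DATAFOLD_SAAS_IPS = [
    "23.23.71.47",
    "35.166.223.86",
    "52.11.132.23",
    "54.71.177.163",
    "54.185.25.103",
    "54.210.34.216",
]


def to_cidr_rules(addresses):
    """Return each address as a single-host CIDR block, e.g. 23.23.71.47/32."""
    return [str(ipaddress.ip_network(addr)) for addr in addresses]


def is_datafold_ip(candidate, addresses=DATAFOLD_SAAS_IPS):
    """Check whether a source IP (for example, from your logs) is in the list."""
    allowed = {ipaddress.ip_address(addr) for addr in addresses}
    return ipaddress.ip_address(candidate) in allowed


if __name__ == "__main__":
    for rule in to_cidr_rules(DATAFOLD_SAAS_IPS):
        print(rule)
```

Most firewalls and cloud security groups accept rules in this `/32` form, so the printed values can usually be pasted into an inbound rule for your database port.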

AWS PrivateLink

This option is available for both Datafold SaaS Cloud and all Datafold Dedicated Cloud options. See the diagram here for how it works:

Documentation on how to set this up is available here.

Cross-Region PrivateLink

Datafold SaaS Cloud supports cross-region PrivateLink for all North American regions. Datafold SaaS Cloud is located in us-west-2. Datafold manages the cross-region networking, allowing you to connect to a VPC Endpoint in the same region as your VPC Endpoint Service. For Datafold Dedicated Cloud customers, deployment occurs in your chosen region. If you need to connect to databases in multiple regions, Datafold also supports this through cross-region PrivateLink.

The setup will be similar to the regular PrivateLink setup.

Setup

Supported Databases

The following setup assumes you have an RDS database you want to connect to. Datafold also supports PrivateLink connections to other databases such as Snowflake, which should only be accessed from your VPC. Please contact support@datafold.com to get assistance with connecting to your specific database.

The setup creates a Network Load Balancer and a VPC endpoint service in the VPC that contains your RDS database. Datafold creates a VPC endpoint in the Datafold account and connects to that endpoint's name instead of connecting to the RDS database directly.

AWS then routes traffic from the VPC endpoint over the PrivateLink to the Network Load Balancer on your side, which targets the RDS database.

The above article contains a YAML CloudFormation stack that reduces the engineering effort needed to set up that connection. Make sure to use the correct region when deploying the stack.

Please contact support@datafold.com to discuss further details.

Tips:

  • After the stack is successfully deployed, you should run the Lambda once using the "Test" button to update the IP in the target group.
  • By default, Lambdas do not have internet access. Make sure to deploy or redeploy the Lambda in a subnet with a NAT gateway.
  • The security group used by the RDS database may need to be updated to allow access from the new Network Load Balancer.
  • When creating the endpoint in the Datafold VPC, the user performing that operation must have permission to see the service endpoint. This is determined by the target role ARN and by whether the user can assume that role.

Finally, use the DNS name of Datafold's service endpoint to set up the data source. Our support team will let you know the name of that connection.

VPC Peering (SaaS)

VPC Peering is easier to set up than PrivateLink, but it has a drawback: both networks are joined, so their IP ranges must not overlap. For Datafold SaaS Cloud, this setup is an AWS-only option.

The basics of VPC peering are covered here.

To set up VPC peering, please contact support@datafold.com and provide us with the following information:

  • AWS region where your database is hosted.
  • ID of the VPC that you would like to connect.
  • CIDR of the VPC.

If there are no address collisions, we'll send you a peering request and CIDR that we use on our end, and whitelist the CIDR range for your organization. You'll need to set up routing to this CIDR through the peering connection.
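Before sending your CIDR, you may want to check it for collisions yourself once you know the range on the other side. The following is a small sketch using Python's standard `ipaddress` module. The CIDR values in the usage example are placeholders chosen for illustration, not ranges that Datafold actually uses, and `find_collisions` is a hypothetical helper.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def find_collisions(your_cidrs, peer_cidr):
    """Return the subset of your CIDR ranges that overlap the peer's range."""
    return [cidr for cidr in your_cidrs if cidrs_overlap(cidr, peer_cidr)]


if __name__ == "__main__":
    # Placeholder values: replace with your VPC CIDRs and the CIDR you receive.
    mine = ["10.0.0.0/16", "172.31.0.0/16"]
    peer = "10.0.128.0/20"
    print(find_collisions(mine, peer))
```

An empty result means none of your ranges collide with the peer range, so routing through the peering connection can be set up without ambiguity.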

If you activate DNS on your side of the peering connection, you can use the private DNS hostname to connect. Otherwise, you need to use the IP.

VPC Peering (Dedicated Cloud)

VPC Peering is a supported option for all cloud providers, both for Datafold-hosted and customer-hosted deployments. Basic information for each cloud provider can be found here:

VPC vs VNet

We use the term VPC across all major cloud providers. However, Azure calls this concept a Virtual Network (VNet).

SSH Tunnel

To set up a tunnel, please contact our team at support@datafold.com and provide the following information:

  • Hostname of your bastion host and port number used for SSH service.
  • Hostname and port number of your database.
  • SSH fingerprint of the bastion host (optional).
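If you choose to provide the optional fingerprint, one common format is the SHA256 fingerprint that `ssh-keygen -lf` prints for a host public key. The sketch below computes that value from a public key line in the usual `<key-type> <base64-key> [comment]` form. It assumes the SHA256 format is acceptable; the document does not state which format Datafold expects, so confirm with support. The function name `ssh_fingerprint` is hypothetical.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line):
    """Return the SHA256 fingerprint of an OpenSSH public key line.

    The input looks like '<key-type> <base64-key> [comment]', as found in
    files such as /etc/ssh/ssh_host_ed25519_key.pub on the bastion host.
    """
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<key-type> <base64-key> [comment]'")
    key_blob = base64.b64decode(parts[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a 'SHA256:' prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Pass the contents of the bastion's host public key file to `ssh_fingerprint` and share the returned string.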

We'll get back to you with:

  • SSH public key that you need to add to ~/.ssh/authorized_keys.
  • IP address and port to use for data source configuration in the Datafold application.
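OpenSSH reads permitted public keys from `~/.ssh/authorized_keys` on the bastion host, and each line may carry options that limit what the key can do. As a hedged sketch, the helper below builds a line that allows only port forwarding to a single database host and port. It assumes OpenSSH 7.2 or later (for the `restrict` option); the key, hostname, and port in the usage example are placeholders, and `authorized_keys_entry` is a hypothetical helper, not something Datafold provides.

```python
def authorized_keys_entry(public_key, db_host, db_port):
    """Build one authorized_keys line limited to forwarding to db_host:db_port."""
    public_key = public_key.strip()
    if "\n" in public_key:
        raise ValueError("public key must be a single line")
    # 'restrict' disables all features; 'port-forwarding' re-enables forwarding;
    # 'permitopen' limits forwarding to the one destination given.
    options = f'restrict,port-forwarding,permitopen="{db_host}:{db_port}"'
    return f"{options} {public_key}"


if __name__ == "__main__":
    # Placeholder key and database address for illustration only.
    print(authorized_keys_entry(
        "ssh-ed25519 AAAAEXAMPLE datafold",
        "db.internal.example.com",
        5432,
    ))
```

Restricting the key this way is optional, but it keeps the tunnel account from being used for shell access or for reaching other hosts behind the bastion.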

Reverse SSH Tunnel

The reverse proxy design sets up a tunnel from a customer's private subnet to allow Datafold to establish connections with resources in that subnet. This design can be preferable because it avoids exposing those resources directly to the Internet, even when strict filtering rules are applied on the firewalls or security groups. The reverse SSH tunnel also gives you slightly more control over which connections are established.

At the moment, this solution is exclusive to Datafold SaaS Cloud and Datafold Dedicated Cloud, both hosted on AWS. To deploy it on your AWS account, Datafold will send you two CloudFormation stacks that you can use to set up the SSH tunnel client VM. If you need to connect to a database in a different public cloud or on-premises environment, Datafold will send you a script you can run on an Ubuntu server in your VPC to set up the SSH tunnel client.

Please contact support@datafold.com for documentation specific to your cloud/on-premises setup.

IPsec Tunnel

Please contact our team at support@datafold.com for more information.