Securing Connections

Datafold supports several options for securing connections between Datafold Cloud and your resources, such as databases and BI tools.

IP Whitelisting

If access to your data source is restricted to IP addresses on an allowlist, you will need to manually add Datafold's addresses in order to use our product. Otherwise, you will receive a connection error when setting up your data source.

For SaaS (app.datafold.com) deployments, whitelist the following IP addresses:

  • 23.23.71.47
  • 35.166.223.86
  • 52.11.132.23
  • 54.71.177.163
  • 54.185.25.103
  • 54.210.34.216

Only one of these addresses is in use at any given moment. However, the active IP address can change at any time, so add all of them to your IP whitelist to avoid interruptions in service.
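
As a quick sanity check, the addresses listed above can be compared against the entries already present in your firewall or security group. The following is a minimal sketch using only the Python standard library. The function name `missing_from_allowlist` and the sample entries are illustrative assumptions and not part of any Datafold tooling.

```python
import ipaddress

# Addresses copied from the list above (SaaS deployment, app.datafold.com).
DATAFOLD_IPS = [
    "23.23.71.47",
    "35.166.223.86",
    "52.11.132.23",
    "54.71.177.163",
    "54.185.25.103",
    "54.210.34.216",
]


def missing_from_allowlist(allowlist_entries):
    """Return the Datafold addresses not covered by any allowlist entry.

    Each entry may be a single address ("23.23.71.47") or a CIDR block
    ("23.23.71.47/32"). The entries are hypothetical inputs that you would
    export from your own firewall or security group.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist_entries]
    return [
        ip
        for ip in DATAFOLD_IPS
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]
```

For example, `missing_from_allowlist(["23.23.71.47", "35.166.223.86/32"])` returns the four remaining addresses, which would still need to be added.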

AWS PrivateLink

AWS PrivateLink is an AWS-only option and is available only when both parties are in the same cloud region. How the solution works is given in the diagram here:

Documentation on how to set this up is available here:

https://aws.amazon.com/blogs/database/access-amazon-rds-across-vpcs-using-aws-privatelink-and-network-load-balancer/

In your VPC containing the RDS database, the setup creates a Network Load Balancer and a VPC endpoint service. A VPC endpoint in the Datafold account establishes a secure link to that service, and the Datafold VPC reaches the RDS database by connecting to the VPC endpoint name instead of the database's own address.

AWS then routes traffic from the VPC endpoint over the PrivateLink to the Network Load Balancer on your side, which targets the RDS database.

The article linked above contains a YAML CloudFormation stack that reduces the engineering effort needed to set up that connection. Make sure to use the correct region when deploying the stack.

Please contact support@datafold.com to discuss further details.

Tips:

  • After the stack is successfully deployed, you should run the Lambda once using the "Test" button to update the IP in the target group.
  • By default, Lambdas do not have internet access. Make sure to deploy or redeploy the Lambda in a subnet with a NAT gateway.
  • The security group used by the RDS database may need to be updated to allow access from the new Network Load Balancer.
  • When creating the endpoint in the Datafold VPC, the user performing the operation must have permission to see the service endpoint. This depends on the role in the target ARN and on whether the user can assume that role.
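
When troubleshooting the security group or target group tips above, it can help to confirm from a host inside your own VPC that the Network Load Balancer accepts TCP connections on the database port. The following is a minimal sketch using only the Python standard library. The function name `can_connect` is an illustrative assumption, and the host and port you pass in are your own values.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that the port accepts connections. It does not
    authenticate against the database or validate the PrivateLink itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect(nlb_dns_name, 5432)` with the DNS name of your load balancer (a placeholder variable here, with 5432 assumed as a PostgreSQL port) returns `False` if the security group still blocks the load balancer or the target group has no healthy target.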

Once the setup is complete, use the DNS name of Datafold's service endpoint to set up the data source. Our support team will let you know that name.

VPC Peering

VPC Peering is easier to set up than PrivateLink, but a drawback is that both networks are joined and the IP ranges must not overlap. This setup is an AWS-only option.

The basics of VPC peering are covered here:

https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-basics.html

To set up VPC peering, please contact support@datafold.com and provide us with the following information:

  • AWS region where your database is hosted.
  • ID of the VPC that you would like to connect.
  • CIDR of the VPC.

If there are no address collisions, we'll send you a peering request and CIDR that we use on our end, and whitelist the CIDR range for your organization. You'll need to set up routing to this CIDR through the peering connection.

If you activate DNS on your side of the peering connection, you can use the private DNS hostname to connect. Otherwise, you need to use the IP address.
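
Because peering requires that the IP ranges of the two networks do not overlap, you may want to check your VPC CIDR against any range you already know about before contacting support. The following is a minimal sketch using the Python standard library. The CIDR values are hypothetical examples, not the ranges Datafold actually uses.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share at least one address."""
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)
    return network_a.overlaps(network_b)


# Hypothetical ranges, for illustration only.
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/20"))  # True: the /20 sits inside the /16
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))    # False: disjoint ranges
```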

SSH Tunnel

To set up a tunnel, please contact our team at support@datafold.com and provide the following information:

  • Hostname of your bastion host and the port number used for the SSH service.
  • Hostname and port number of your database.
  • SSH fingerprint of the bastion host (optional).

We'll get back to you with:

  • SSH public key that you need to add to ~/.ssh/authorized_keys.
  • IP address and port to use for data source configuration in the Datafold application.
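
If you choose to provide the optional SSH fingerprint of the bastion host, it can be derived from the host's public key. The following is a minimal sketch of the SHA256 fingerprint format that OpenSSH prints (the base64-encoded SHA-256 digest of the decoded key blob, without padding). The function name `sha256_fingerprint` is an illustrative assumption, and you would pass it the contents of the host's public key file.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line.

    The line is expected in the usual "<type> <base64-blob> [comment]" form,
    as found in a .pub file.
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH strips the trailing '=' padding from the base64 digest.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

The result should match the hash shown by `ssh-keygen -lf` for the same public key file, which is a useful cross-check before sending the fingerprint to support.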

Reverse SSH Tunnel

The reverse SSH tunnel design sets up a tunnel from your private subnet so that Datafold can establish connections with resources in that subnet. This design can be preferable because it avoids exposing those resources directly to the Internet, even when strict filtering rules are applied on firewalls or security groups. The reverse SSH tunnel also gives you slightly more control over which connections are established.

Please contact support@datafold.com for documentation specific to your cloud setup.

At the moment, this is an AWS-only solution. Datafold will send you two CloudFormation stacks that you can use to set up the SSH tunnel client VM.

IPsec Tunnel

Please contact our team at support@datafold.com for more information.