INFO: VPC deployments are an Enterprise feature. Please email sales@datafold.com to enable your account.
Create a Domain Name (optional)
You can either use your own domain (for example, datafold.domain.tld) or a Datafold-managed domain (for example, yourcompany.dedicated.datafold.com).
Customer Managed Domain Name
Create a DNS A-record for the domain where Datafold will be hosted. For the DNS record, there are two options:
- Public-facing: When the domain is publicly available, we will provide an SSL certificate for the endpoint.
- Internal: It is also possible to have Datafold disconnected from the internet. This would require an internal DNS (for example, Azure DNS) record that points to the Datafold instance. It is possible to provide your own certificate for setting up the SSL connection.
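Once the A-record exists, a quick check can show which of the two options it actually implements. The following is a minimal Python sketch, not part of the Datafold tooling. The function names are hypothetical, and it assumes that an internal deployment resolves to a private (for example RFC 1918) address while a public-facing one resolves to a publicly routable address.

```python
import ipaddress
import socket


def classify_address(ip: str) -> str:
    """Return 'internal' for private/non-global addresses, 'public' otherwise."""
    return "internal" if ipaddress.ip_address(ip).is_private else "public"


def resolve_and_classify(hostname: str) -> dict[str, str]:
    """Resolve the hostname's IPv4 A-records and classify each address.

    Requires working DNS resolution from the machine running the check.
    """
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # info[4] is the socket address tuple (ip, port); keep one entry per IP.
    return {info[4][0]: classify_address(info[4][0]) for info in infos}
```

Calling `resolve_and_classify` with your own record (for example the placeholder datafold.domain.tld used above) from inside the network should list each resolved address together with its classification. For an internal record, run it from a machine that uses the internal DNS.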
Create a New Subscription
For isolation reasons, it is best practice to create a new subscription within your Microsoft Entra directory/tenant. Please call it something like yourcompany-datafold to make it easy to identify.
Set IAM Permissions
Go to Microsoft Entra ID and navigate to Users. Click Add, User, Invite external user and add the Datafold engineers.
- Navigate to the subscription you just created and go to the Access control (IAM) tab in the sidebar. Under Add, select Add role assignment.
- Under Role, navigate to Privileged administrator roles and select Owner.
- Under Members, click Select members and add the Datafold engineers.
- When you are done, select Review + assign.
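The portal steps above can also be scripted. As a hedged sketch, the Python helper below only builds the Azure CLI invocation (`az role assignment create`) and does not run it. The function name and the example values are placeholders, and it assumes the invited engineer can be referenced by the object ID or sign-in name shown in your directory.

```python
def owner_assignment_command(assignee: str, subscription_id: str) -> list[str]:
    """Build the az CLI arguments that grant Owner on a single subscription.

    assignee: object ID or sign-in name of the invited user (placeholder).
    subscription_id: ID of the subscription created for Datafold (placeholder).
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee", assignee,
        "--role", "Owner",
        "--scope", f"/subscriptions/{subscription_id}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` after reviewing it, once per engineer.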
Required APIs
The following Azure APIs need to be enabled to run Datafold:
- Microsoft.ContainerService
- Microsoft.Network
- Microsoft.Compute
- Microsoft.KeyVault
- Microsoft.Storage
- Microsoft.DBforPostgreSQL
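The entries above are Azure resource provider namespaces, which can be registered on a subscription with `az provider register --namespace <name>`. The sketch below is one possible way to script that from Python. The helper names are hypothetical, it defaults to a dry run that only prints the commands, and it assumes the Azure CLI is installed and already logged in to the new subscription.

```python
import subprocess

# Namespaces copied from the list above.
REQUIRED_PROVIDERS = [
    "Microsoft.ContainerService",
    "Microsoft.Network",
    "Microsoft.Compute",
    "Microsoft.KeyVault",
    "Microsoft.Storage",
    "Microsoft.DBforPostgreSQL",
]


def register_commands(providers=REQUIRED_PROVIDERS) -> list[list[str]]:
    """Build one `az provider register` command per namespace."""
    return [["az", "provider", "register", "--namespace", ns] for ns in providers]


def register_all(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in register_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI call fails.
            subprocess.run(cmd, check=True)
```

Running `register_all()` prints the six commands for review; `register_all(dry_run=False)` executes them.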
Datafold Azure infrastructure details
This document provides detailed information about the Azure infrastructure components deployed by the Datafold Terraform module, explaining the architectural decisions and operational considerations for each component.
Managed disks
The Datafold application requires three managed disks for persistent storage, each deployed as an encrypted Azure managed disk in the primary availability zone. As a consequence, pods cannot be deployed outside the availability zone of these disks, because nodes elsewhere would not be able to attach them.
The ClickHouse data disk serves as the analytical database storage for Datafold. ClickHouse is a columnar database that excels at analytical queries. The default 40 GB allocation is usually sufficient for typical deployments and can be scaled up based on data volume requirements. The StandardSSD_LRS disk type with configurable IOPS and throughput ensures consistent performance for analytical workloads.
The ClickHouse logs disk stores ClickHouse’s internal logs and temporary data. Keeping logs on a separate disk prevents log writes from consuming the IOPS and throughput of the data disk.
The Redis data disk provides persistent storage for Redis, which handles task distribution and distributed locks in the Datafold application. Redis is memory-first but benefits from persistence for data durability across restarts. The 50 GB default size accommodates typical caching needs while remaining cost-effective.
All managed disks are encrypted by default using Azure-managed encryption keys, ensuring data security at rest. The disks are deployed in the first availability zone to minimize latency and simplify backup strategies. For Premium and Ultra SSD disk types, IOPS and throughput can be configured to optimize performance for specific workloads.
Application Gateway
The Application Gateway serves as the primary entry point for all external traffic to the Datafold application. The module offers two deployment strategies, each with different operational characteristics and trade-offs.
External Application Gateway deployment (the default approach) creates an Azure Application Gateway through Terraform. This approach provides centralized control over load balancer configuration and integrates well with existing Azure infrastructure. The Application Gateway automatically handles SSL termination, health checks, and traffic distribution across Kubernetes pods. This method is ideal for organizations that prefer infrastructure-as-code management and want consistent load balancer configurations across environments.
Kubernetes-managed Application Gateway deployment sets deploy_lb = false and relies on the Azure Application Gateway
Ingress Controller (AGIC) running within the AKS cluster. This approach leverages Kubernetes-native load balancer management,
allowing for dynamic scaling and easier integration with Kubernetes ingress resources. The controller automatically provisions
and manages Application Gateways based on Kubernetes service definitions, which can be more flexible for applications that
need to scale load balancer resources dynamically.
Both deployment options apply the currently recommended and strictest SSL policy (AppGwSslPolicy20220101S) and security
settings.
The choice between these approaches often depends on operational preferences and existing infrastructure patterns. External
deployment provides more predictable resource management, while Kubernetes-managed deployment offers greater flexibility for
dynamic workloads.
Security: A network security group shared between the Application Gateway and the AKS nodes allows traffic to reach only
the AKS nodes and nothing else. The Application Gateway allows traffic to land directly in the AKS private subnet.
Certificate: The certificate can be pre-created by the customer and then attached, or a cloud-managed certificate can be
created on the fly. The application will not function without HTTPS, so a certificate is mandatory. After the certificate is
created, either manually or through this repository, it must be validated by the DNS administrator by adding a CNAME record.
This puts the certificate in the “Issued” state. The certificate cannot be found while it is still provisioning.