Our MongoDB integration is still in beta. Some features, such as column-level lineage, are not yet supported. Please contact us if you need assistance.
Steps to complete:
  1. Configure user in MongoDB
  2. Configure your data connection in Datafold
  3. Diff your data

Configure user in MongoDB

To connect to MongoDB, create a user with read-only access to all databases you plan to diff.
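
As a rough sketch of what such a user could look like, the snippet below builds a MongoDB `createUser` command that grants the built-in `read` role on each database you plan to diff. The user name `DATAFOLD` comes from the example in this guide. The password, the database names, and the helper `build_create_user_command` are placeholders for illustration, not part of Datafold. Running the command needs a driver such as `pymongo` and an account allowed to create users, so that step appears only as a comment.

```python
def build_create_user_command(user, password, databases):
    """Build a MongoDB createUser command granting read-only access.

    `databases` lists the databases you plan to diff (placeholder names here).
    """
    if not databases:
        raise ValueError("list at least one database to grant read access on")
    return {
        "createUser": user,
        "pwd": password,
        # The built-in "read" role allows reads but no writes on that database.
        "roles": [{"role": "read", "db": name} for name in databases],
    }


command = build_create_user_command("DATAFOLD", "change-me", ["main", "analytics"])
print(command["roles"])

# To apply it (assumes pymongo is installed and you connect as a user admin):
#   from pymongo import MongoClient
#   MongoClient("mongodb://admin-host:27017")["main"].command(command)
```

The database the command runs against (`main` in the comment) becomes the user's authentication database, which is the value to enter later in the Authentication Database field.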

Configure your data connection in Datafold

| Field Name | Description |
| --- | --- |
| Connection Name | The name you’d like to assign to this connection in Datafold |
| Host | The hostname for your MongoDB instance |
| Port | MongoDB endpoint port (default value is 27017) |
| User ID | User ID (e.g. DATAFOLD) |
| Password | Password for the user provided above |
| Database | Database to connect to |
| Authentication Database | Database name associated with the user credentials (e.g. main) |

Click Create. Your data connection is now ready!
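
If you want to check the values before entering them, one option is to assemble them into a standard MongoDB connection string and try it with `mongosh` or a driver. This guide doesn't describe how Datafold uses the fields internally, so the mapping below is an assumption based on the standard URI format, where `authSource` names the authentication database. The host and password are placeholders, and `build_mongodb_uri` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, user, password, database, auth_database, port=27017):
    """Assemble a standard MongoDB connection URI from the form fields."""
    # Percent-encode credentials so characters like "@" or ":" stay valid.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return (
        f"mongodb://{credentials}@{host}:{port}/{database}"
        f"?authSource={quote_plus(auth_database)}"
    )


uri = build_mongodb_uri(
    host="mongo.example.com",  # placeholder host
    user="DATAFOLD",
    password="p@ss:word",      # placeholder password with special characters
    database="main",
    auth_database="main",
)
print(uri)
```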

Diff your data

Write your MongoDB query

MongoDB works a bit differently from our other integrations. Under the hood, we flatten your collections into datasets you can query with SQL. Here’s how to diff your MongoDB data:
  1. Create a new data diff
  2. Select your MongoDB data connection
  3. Select Query diff (Table diffs aren’t supported at this time)
  4. Write a SQL query against the flattened dataset, including a PRAGMA statement with the collection name on the first line. Here’s an example:
    PRAGMA mongodb_collections('tracks_v1_1m');
    
    SELECT point_id,
        device_id,
        timestamp,
        location.longitude as longitude,
        location.latitude as latitude
    FROM mongo_tracks_v1_1m
    WHERE point_id < 100000;
    
  5. Configure the rest of your diff and run it!
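
To make the idea of flattening more concrete, here is a minimal Python sketch that turns a nested document into a single-level row with dotted keys, which is the shape the `location.longitude` and `location.latitude` references in the query above suggest. It only illustrates the general technique. Datafold's actual flattening logic isn't documented here and may handle arrays, types, and naming differently. The sample document and the `flatten` helper are made up for the example.

```python
def flatten(document, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical document shaped like the fields used in the example query.
doc = {
    "point_id": 42,
    "device_id": "dev-7",
    "location": {"longitude": 13.4, "latitude": 52.5},
}
row = flatten(doc)
print(row)
```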