To compare datasets across two databases, Datafold uses a proprietary stochastic checksumming algorithm that identifies discrepancies down to individual primary keys and column values while minimizing the amount of data sent over the network. As a result, most of the comparison runs in place inside the underlying databases, with no need to export the entire dataset and compare it elsewhere.
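
Datafold's algorithm is proprietary, so the following is not its implementation. It is a minimal sketch of the general idea of checksum-based diffing, assuming integer primary keys and using in-memory dictionaries as stand-ins for the two tables. The names `segment_checksum` and `diff_keys` are hypothetical. Each side computes a checksum per key range, and only ranges whose checksums disagree are split further, so rows are fetched only for small mismatching segments.

```python
import hashlib


def segment_checksum(rows, lo, hi):
    """Return (row count, XOR of row hashes) for primary keys in [lo, hi).

    Assumption: in a real setup this would be one aggregate query executed
    inside the database, so only the small result crosses the network.
    """
    digest = 0
    count = 0
    for pk, values in rows.items():
        if lo <= pk < hi:
            row_hash = hashlib.md5(repr((pk, values)).encode()).hexdigest()
            digest ^= int(row_hash, 16)
            count += 1
    return count, digest


def diff_keys(a, b, lo, hi, threshold=4):
    """Return the primary keys in [lo, hi) whose rows differ between a and b."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # checksums agree: treat the segment as identical
    if hi - lo <= threshold:
        # Segment is small enough: fetch its rows and compare them directly.
        keys = {k for k in a if lo <= k < hi} | {k for k in b if lo <= k < hi}
        return sorted(k for k in keys if a.get(k) != b.get(k))
    mid = (lo + hi) // 2
    return diff_keys(a, b, lo, mid, threshold) + diff_keys(a, b, mid, hi, threshold)


# Hypothetical example data: two copies of a table with two discrepancies.
source = {i: ("user%d" % i, i * 10) for i in range(1000)}
target = dict(source)
target[137] = ("user137", 0)  # changed column value
del target[842]               # row missing on one side

print(diff_keys(source, target, 0, 1000))  # prints [137, 842]
```

In this sketch a matching checksum is taken as proof that a segment is identical. Collisions are unlikely but possible, which is one reason a production system would need a more careful scheme.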