Relaxed Functional Dependency Discovery in Heterogeneous Data Lakes
Functional dependencies are important for defining constraints and relationships that every database instance must satisfy. Relaxed functional dependencies (RFDs) can be used for data exploration and profiling in datasets with lower data quality. In this work, we present an approach for RFD discovery in heterogeneous data lakes. More specifically, the goal of this work is to discover RFDs in structured, semi-structured, and graph data. Our solution is novel in three respects: (1) We introduce a generic metamodel to the problem of RFD discovery, which allows us to define and detect RFDs for data stored in heterogeneous sources in an integrated manner. (2) We apply clustering techniques during RFD discovery for partitioning and pruning. (3) We performed an extensive evaluation with nine datasets, which shows that our approach is effective for discovering meaningful RFDs, reducing redundancy, and detecting inconsistent data.
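To make the notion of a relaxed dependency concrete, the following minimal sketch checks whether a dependency holds when a bounded fraction of rows may violate it. It is an illustration only and not the discovery method of this work: it assumes one common relaxation, the g3 error (the smallest fraction of rows that must be removed for the dependency to hold exactly), and the names `g3_error`, `holds_relaxed`, `max_error`, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Smallest fraction of rows to remove so that lhs -> rhs holds exactly.

    rows: list of dicts; lhs: list of attribute names; rhs: one attribute name.
    """
    if not rows:
        return 0.0
    # Group rows by their left-hand-side values and count right-hand-side values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # In each group, keep only the rows carrying the most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1.0 - kept / len(rows)


def holds_relaxed(rows, lhs, rhs, max_error=0.1):
    """True if lhs -> rhs holds up to the tolerated error (assumed threshold)."""
    return g3_error(rows, lhs, rhs) <= max_error


# Hypothetical sample data: one misspelled city breaks the exact dependency.
rows = [
    {"zip": "52062", "city": "Aachen"},
    {"zip": "52062", "city": "Aachen"},
    {"zip": "52062", "city": "Aachen"},
    {"zip": "52062", "city": "Achen"},
    {"zip": "50667", "city": "Cologne"},
]

exact = holds_relaxed(rows, ["zip"], "city", max_error=0.0)
relaxed = holds_relaxed(rows, ["zip"], "city", max_error=0.25)
```

In this sample, one of five rows conflicts with the majority value for its zip code, so the exact dependency `zip -> city` fails while the relaxed one with a tolerance of 0.25 holds. The conflicting row is also a candidate for inconsistent data.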
The authors would like to thank the German Research Foundation DFG for the kind support within the Cluster of Excellence “Internet of Production” (Project ID: EXC 2023/390621612).