Abstract
Inclusion dependencies within and across databases are important relationships for many applications, including anomaly detection, schema (re-)design, query optimization, and data integration. When such dependencies are not available as explicit metadata, scalable and efficient algorithms have to discover them from a given data instance.
We introduce a new idea for clustering the attributes of database relations. Based on this idea, we have developed S-indd, an efficient algorithm for discovering all unary inclusion dependencies in large datasets. S-indd scales both in the number of attributes and in the number of rows. We show that previous approaches are special cases of S-indd. We evaluate S-indd's scalability extensively on many datasets with several thousand attributes and up to one million rows. The experiments show that S-indd is up to 11x faster than previous approaches.
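To make the problem statement concrete: a unary inclusion dependency A ⊆ B holds when every value of attribute A also occurs in attribute B. The following is a minimal sketch of a generic discovery baseline that groups attributes by shared value. It is not the S-indd algorithm, whose details the abstract does not give. The function name `unary_inds`, the column names, the sample data, and the choice to ignore nulls are all assumptions made for illustration.

```python
from collections import defaultdict


def unary_inds(columns):
    """Return all pairs (a, b), a != b, where every non-null value of a occurs in b.

    `columns` maps an attribute name to its list of values (hypothetical input format).
    """
    # Group attributes by value: value -> set of attributes that contain it.
    attrs_by_value = defaultdict(set)
    for attr, values in columns.items():
        for v in values:
            if v is not None:  # assumption: nulls are ignored
                attrs_by_value[v].add(attr)

    # Start with every other attribute as a candidate superset, then intersect
    # with each group: a ⊆ b survives only if b is in every group containing a.
    candidates = {a: set(columns) - {a} for a in columns}
    for group in attrs_by_value.values():
        for a in group:
            candidates[a] &= group

    return sorted((a, b) for a, refs in candidates.items() for b in refs)


# Illustrative data only; not taken from the paper's experiments.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.age": [30, 41, 30, None],
}
result = unary_inds(columns)
print(result)
```

Under these assumptions an attribute with no non-null values is trivially included in every other attribute. The sketch keeps the whole value index in memory, so it only illustrates the definition and does not address the scalability that the paper targets.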
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Shaabani, N., Meinel, C. (2015). Scalable Inclusion Dependency Discovery. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_25
DOI: https://doi.org/10.1007/978-3-319-18120-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science (R0)