Scalable Inclusion Dependency Discovery

Shaabani, Nuhad; Meinel, Christoph

doi:10.1007/978-3-319-18120-2_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9049)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Inclusion dependencies within and across databases are an important relationship for many applications in anomaly detection, schema (re-)design, query optimization or data integration. When such dependencies are not available as explicit metadata, scalable and efficient algorithms have to discover them from a given data instance.

We introduce a new idea for clustering the attributes of database relations. Based on this idea, we have developed S-indd, an efficient and scalable algorithm for discovering all unary inclusion dependencies in large datasets. S-indd scales in both the number of attributes and the number of rows. We show that previous approaches are special cases of S-indd. We exhaustively evaluate S-indd’s scalability using many datasets with several thousand attributes and up to one million rows. The experiments show that S-indd is up to 11x faster than previous approaches.
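
To make the notion of a unary inclusion dependency concrete, here is a minimal Python sketch. It is not S-indd, whose details are not given in this abstract. It is a simple inverted-index baseline: an attribute A is included in an attribute B when every value of A also occurs in B. The function name `unary_inds`, the dictionary layout, the sample column names, and the choice to ignore NULLs (`None`) are all assumptions made for illustration.

```python
from collections import defaultdict


def unary_inds(columns):
    """Return all pairs (a, b), a != b, where every value of a occurs in b.

    `columns` maps a hypothetical attribute name to its list of values.
    """
    # Inverted index: value -> set of attributes whose column contains it.
    attrs_of = defaultdict(set)
    for name, values in columns.items():
        for v in values:
            if v is not None:  # assumption: NULLs are ignored
                attrs_of[v].add(name)

    # Start with every other attribute as a candidate, then keep only
    # those that contain each value of the dependent attribute.
    refs = {name: set(columns) - {name} for name in columns}
    for group in attrs_of.values():
        for name in group:
            refs[name] &= group

    return sorted((a, b) for a, bs in refs.items() for b in bs)


# Illustrative data only; these column names are made up.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.age": [30, 41, 30, 25],
}
print(unary_inds(columns))
```

On this sample, the only dependency found is that `orders.customer_id` is included in `customers.id`. The sketch keeps the whole inverted index in memory, so it only illustrates the definition and says nothing about the scalability results reported above.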

Author information

Authors and Affiliations

Hasso-Plattner-Institut, University of Potsdam, Prof.-Dr.-Helmert-Str. 2-3, 14482, Potsdam, Germany
Nuhad Shaabani & Christoph Meinel

Corresponding author

Correspondence to Nuhad Shaabani.

Editor information

Editors and Affiliations

Universität München, München, Germany
Matthias Renz
University of Southern California, Los Angeles, USA
Cyrus Shahabi
University of Queensland, Brisbane, Australia
Xiaofang Zhou
Monash University, Clayton, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Shaabani, N., Meinel, C. (2015). Scalable Inclusion Dependency Discovery. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9049. Springer, Cham. https://doi.org/10.1007/978-3-319-18120-2_25

DOI: https://doi.org/10.1007/978-3-319-18120-2_25
Published: 09 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18119-6
Online ISBN: 978-3-319-18120-2
eBook Packages: Computer Science, Computer Science (R0)
