Efficient Algorithms for Mining Inclusion Dependencies

  • Conference paper
Advances in Database Technology — EDBT 2002 (EDBT 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2287)

Abstract

Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, algorithms need to be devised to discover foreign keys. One of the underlying problems is known to be the inclusion dependency (IND) inference problem. In this paper, a new data mining algorithm for computing unary INDs is given. Starting from unary INDs, we also propose a levelwise algorithm to discover all remaining INDs, where candidate INDs of size i + 1 are generated from satisfied INDs of size i (i > 0). These algorithms have been implemented and tested against synthetic databases. To our knowledge, this paper is the first to address this data mining problem in a comprehensive manner, from algorithms to experimental results.
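The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the in-memory database layout, the function names (`unary_inds`, `holds`, `next_level`, `mine_inds`), the value-to-column inverted index used for the unary step, and the Apriori-style join and prune used for candidate generation are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Assumed layout: db maps relation name -> (attribute names, list of row tuples).

def unary_inds(db):
    """Return pairs (col_a, col_b) such that every value of col_a occurs in col_b."""
    columns_of = defaultdict(set)  # value -> columns in which it appears
    all_columns = set()
    for rel, (attrs, rows) in db.items():
        for i, attr in enumerate(attrs):
            all_columns.add((rel, attr))
            for row in rows:
                columns_of[row[i]].add((rel, attr))
    # Start by assuming each column is included in every column, then
    # intersect with the set of columns sharing each of its values.
    rhs = {c: set(all_columns) for c in all_columns}
    for cols in columns_of.values():
        for c in cols:
            rhs[c] &= cols
    return {(a, b) for a, bs in rhs.items() for b in bs if a != b}

def holds(db, ind):
    """Check an IND ((r, xs), (s, ys)) by comparing the two projections."""
    (r, xs), (s, ys) = ind

    def project(rel, names):
        attrs, rows = db[rel]
        idx = [attrs.index(n) for n in names]
        return {tuple(row[i] for i in idx) for row in rows}

    return project(r, xs) <= project(s, ys)

def next_level(satisfied):
    """Generate candidates of size i + 1 from satisfied INDs of size i."""
    candidates = set()
    for p, q in combinations(sorted(satisfied), 2):
        ((r, xs), (s, ys)), ((r2, xs2), (s2, ys2)) = p, q
        # Join step: same relations, same prefix, different last attributes.
        if (r, s) != (r2, s2) or xs[:-1] != xs2[:-1] or ys[:-1] != ys2[:-1]:
            continue
        if xs[-1] == xs2[-1] or ys[-1] == ys2[-1]:
            continue
        cx, cy = xs + (xs2[-1],), ys + (ys2[-1],)
        # Prune step: every size-i projection of the candidate must be satisfied.
        if all(((r, cx[:j] + cx[j + 1:]), (s, cy[:j] + cy[j + 1:])) in satisfied
               for j in range(len(cx))):
            candidates.add(((r, cx), (s, cy)))
    return candidates

def mine_inds(db):
    """Levelwise search: validate each level against the data, then go one size up."""
    level = {((r, (a,)), (s, (b,))) for (r, a), (s, b) in unary_inds(db)}
    result = set()
    while level:
        result |= level
        level = {c for c in next_level(level) if holds(db, c)}
    return result

# Hypothetical toy database, used only to exercise the sketch.
example_db = {
    "orders": (("cust_id", "item"), [(1, "pen"), (2, "ink")]),
    "stock": (("owner", "item_name"), [(1, "pen"), (2, "ink"), (3, "pad")]),
}
for ind in sorted(mine_inds(example_db)):
    print(ind)
```

In this sketch the left-hand side of each IND is kept in sorted attribute order so that permutations of the same dependency are generated only once, and the data is consulted once per candidate that survives pruning.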

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Marchi, F., Lopes, S., Petit, JM. (2002). Efficient Algorithms for Mining Inclusion Dependencies. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_30

  • DOI: https://doi.org/10.1007/3-540-45876-X_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43324-8

  • Online ISBN: 978-3-540-45876-0

  • eBook Packages: Springer Book Archive
