Efficient Algorithms for Mining Inclusion Dependencies

  • Fabien De Marchi
  • Stéphane Lopes
  • Jean-Marc Petit
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2287)

Abstract

Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, algorithms need to be devised to discover foreign keys. One of the underlying problems is known to be the inclusion dependency (IND) inference problem. In this paper a new data mining algorithm for computing unary INDs is given. From unary INDs, we also propose a levelwise algorithm to discover all remaining INDs, where candidate INDs of size i + 1 are generated from satisfied INDs of size i (i > 0). These algorithms have been implemented and tested against synthetic databases. To our knowledge, this paper is the first to address this data mining problem comprehensively, from algorithms to experimental results.
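
The levelwise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe how unary INDs are computed, so the sketch assumes plain set containment for that step, and it assumes an Apriori-style join and prune for candidate generation. The function names (`unary_inds`, `holds`, `levelwise_inds`), the in-memory representation, and the sample data are all hypothetical.

```python
from itertools import combinations

def unary_inds(db):
    """Return pairs ((R, a), (S, b)) such that R[a] is included in S[b].

    Assumption: db maps a relation name to a list of rows (dicts), and
    every row of a relation has the same string attribute names.
    """
    values = {
        (rel, attr): {row[attr] for row in rows}
        for rel, rows in db.items() if rows
        for attr in rows[0]
    }
    return {
        (lhs, rhs)
        for lhs in values for rhs in values
        if lhs != rhs and values[lhs] <= values[rhs]
    }

def holds(db, rel_l, rel_r, pairs):
    """Check R[a1..ak] included in S[b1..bk] by comparing projected tuples."""
    left = {tuple(row[a] for a, _ in pairs) for row in db[rel_l]}
    right = {tuple(row[b] for _, b in pairs) for row in db[rel_r]}
    return left <= right

def levelwise_inds(db):
    """Collect satisfied INDs of every size, one level at a time.

    An IND is stored as (R, S, pairs), where pairs is a sorted tuple of
    (attribute of R, attribute of S) couples.
    """
    level = {(rl, rr, ((a, b),)) for (rl, a), (rr, b) in unary_inds(db)}
    result = set(level)
    while level:
        size = len(next(iter(level))[2])
        next_level = set()
        for x, y in combinations(sorted(level), 2):
            # Join step: same relations, same first (size - 1) couples.
            if x[:2] != y[:2] or x[2][:-1] != y[2][:-1]:
                continue
            p, q = x[2][-1], y[2][-1]
            if p[0] == q[0] or p[1] == q[1]:
                continue  # an attribute may appear only once per side
            pairs = x[2] + (q,)
            # Prune step: every sub-IND of the current size must be satisfied.
            if not all((x[0], x[1], sub) in level
                       for sub in combinations(pairs, size)):
                continue
            # Validation step: test the surviving candidate against the data.
            if holds(db, x[0], x[1], pairs):
                next_level.add((x[0], x[1], pairs))
        result |= next_level
        level = next_level
    return result

# Hypothetical sample data, for illustration only.
db = {
    "orders": [{"cust": 1, "city": "Lyon"}, {"cust": 2, "city": "Paris"}],
    "customers": [
        {"id": 1, "town": "Lyon"},
        {"id": 2, "town": "Paris"},
        {"id": 3, "town": "Lyon"},
    ],
}
inds = levelwise_inds(db)
```

On this sample, `inds` contains three unary INDs and the binary IND orders[city, cust] ⊆ customers[town, id]. The prune step is what makes the search levelwise: a candidate of size i + 1 is checked against the data only if all of its size-i projections were already found to hold.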

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Fabien De Marchi
    • 1
  • Stéphane Lopes
    • 2
  • Jean-Marc Petit
    • 1
  1. Laboratoire LIMOS, Université Blaise Pascal - Clermont-Ferrand II, France
  2. Laboratoire PRISM, France