Abstract
Foreign keys are among the most fundamental constraints in relational databases. Because they are not always declared in existing databases, algorithms are needed to discover them. One of the underlying problems is the inclusion dependency (IND) inference problem. This paper presents a new data mining algorithm for computing unary INDs. Starting from the unary INDs, we also propose a levelwise algorithm to discover all remaining INDs, in which candidate INDs of size i + 1 are generated from satisfied INDs of size i (i > 0). These algorithms have been implemented and tested against synthetic databases. To our knowledge, this paper is the first to address this data mining problem comprehensively, from algorithms to experimental results.
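The levelwise idea summarized in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm: it uses a naive set-comparison test for unary INDs and an Apriori-style join for candidate generation, and it assumes a toy in-memory database in which every row of a relation carries all of that relation's attributes and no null values. The names `unary_inds`, `levelwise_inds`, `well_formed`, `satisfied`, and `example_db` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# An attribute is (relation, name). A unary IND is a pair (a, b) meaning
# values(a) is a subset of values(b). An IND of size k is a sorted tuple of
# k such pairs, read as R[a1..ak] included in S[b1..bk].

def unary_inds(db):
    """Naive unary IND discovery: compare the value sets of all attribute pairs."""
    values = {}
    for rel, rows in db.items():
        for row in rows:
            for attr, v in row.items():
                values.setdefault((rel, attr), set()).add(v)
    return {(a, b) for a in values for b in values
            if a != b and values[a] <= values[b]}

def well_formed(ind):
    """Each side must use distinct attributes taken from a single relation."""
    lhs = [a for a, _ in ind]
    rhs = [b for _, b in ind]
    return (len({a[0] for a in lhs}) == 1 and len({b[0] for b in rhs}) == 1
            and len(set(lhs)) == len(lhs) and len(set(rhs)) == len(rhs))

def satisfied(db, ind):
    """Check the IND against the data by comparing the two projections."""
    lhs = [a for a, _ in ind]
    rhs = [b for _, b in ind]
    left = {tuple(row[a[1]] for a in lhs) for row in db[lhs[0][0]]}
    right = {tuple(row[b[1]] for b in rhs) for row in db[rhs[0][0]]}
    return left <= right

def levelwise_inds(db):
    """Generate candidates of size i + 1 from satisfied INDs of size i."""
    level = {(u,) for u in unary_inds(db)}
    result = set(level)
    while level:
        candidates = set()
        for p, q in combinations(sorted(level), 2):
            if p[:-1] != q[:-1]:
                continue  # join only INDs that share their first i - 1 pairs
            cand = tuple(sorted(set(p) | set(q)))
            # Prune: every size-i sub-IND must already be known to hold.
            if well_formed(cand) and all(
                    sub in level for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
        level = {c for c in candidates if satisfied(db, c)}
        result |= level
    return result

# Hypothetical toy data: orders[cust, city] is included in customers[id, town].
example_db = {
    "orders": [{"cust": 1, "city": "Lyon"}, {"cust": 2, "city": "Paris"}],
    "customers": [{"id": 1, "town": "Lyon"}, {"id": 2, "town": "Paris"},
                  {"id": 3, "town": "Lyon"}],
}
```

Calling `levelwise_inds(example_db)` returns the satisfied unary INDs together with the binary IND from `orders` into `customers`. The pruning step relies on the fact that a satisfied IND of size i + 1 implies all of its size-i projections, which is what makes a levelwise search sound; a practical implementation would replace the pairwise set comparisons with something that scales to large relations.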
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Marchi, F., Lopes, S., Petit, JM. (2002). Efficient Algorithms for Mining Inclusion Dependencies. In: Jensen, C.S., et al. Advances in Database Technology — EDBT 2002. EDBT 2002. Lecture Notes in Computer Science, vol 2287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45876-X_30
DOI: https://doi.org/10.1007/3-540-45876-X_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43324-8
Online ISBN: 978-3-540-45876-0
eBook Packages: Springer Book Archive