Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases

De Marchi, Fabien; Flouvat, Frédéric; Petit, Jean-Marc

doi:10.1007/11615576_5

Fabien De Marchi²¹,
Frédéric Flouvat²² &
Jean-Marc Petit²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3848))

306 Accesses
7 Citations

Abstract

Given the theoretical framework of Mannila and Toivonen [26], we are interested in the discovery of the positive border of interesting patterns, also called the most specific interesting patterns. Many approaches have been proposed among which we quote the levelwise algorithm and the Dualize and Advance algorithm. In this paper, we propose an adaptive strategy – complementary to these two algorithms – based on four steps: 1) In order to initialize the discovery, eliciting some elements of the negative border, for instance using a levelwise strategy until a certain level k. 2) From the negative border found so far, inferring the optimistic positive border by dualization, i.e. the set of patterns whose all specializations are known to be not interesting patterns. 3) Estimating the distance between the positive border to be discovered and the optimistic positive border. 4) Based on these estimates, carrying out an adaptive search either bottom-up (the jump was too optimistic) or top-down (the solution should be very close).

We have instantiated this proposition to the problem of inclusion dependency (IND) discovery. IND is a generalization of the well known concept of foreign keys in databases and is very important in practice. We will first point out how the problem of IND discovery fits into the theoretical framework of [26]. Then, we will describe an instantiation of our adaptive strategy for IND discovery, called Zigzag, from which some experiments were conducted on synthetic databases. The underlying application of this work takes place in a project called DBA Companion devoted to the understanding of existing databases at the logical level using data mining techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abiteboul, S., Hull, R., Vianu, V.: Fondements des bases de données. Addison-Wesley, Reading (2000)
Google Scholar
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases, Santiago de Chile, Chile, pp. 487–499 (1994)
Google Scholar
Albert-Lorincz, H., Boulicaut, J.-F.: Mining frequent sequential patterns under regular expressions: A highly adaptive strategy for pushing contraints. In: Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA. SIAM (2003)
Google Scholar
Bailey, J., Manoukian, T., Ramamohanarao, K.: A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In: IEEE International Conference on Data Mining (ICDM 2003), Floride, USA, pp. 485–488. IEEE Computer Society, Los Alamitos (November 2003)
Google Scholar
Bayardo, R.: Efficiently mining long patterns from databases. In: Haas, L.M., Tiwary, A. (eds.) SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, June 2-4, pp. 85–93. ACM Press, New York (1998)
Chapter Google Scholar
Bell, S., Brockhausen, P.: Discovery of Data Dependencies in Relational Databases. Technical report, LS-8 Report 14, University of Dortmund, p. 18 (April 1995)
Google Scholar
Berge, C.: Graphs and Hypergraphs, 2nd rev. edn. North-Holland Mathematical Library 6. American Elsevier, Amsterdam (1976)
Google Scholar
Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Adaptive constraint pushing in frequent pattern mining. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 47–58. Springer, Heidelberg (2003)
Chapter Google Scholar
Boros, E., Elbassioni, K., Gurvich, V., Khachiyan, L.: An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation. special issue of Discrete Applied Mathematics
Google Scholar
Burdick, D., Calimlim, M., Gehrke, J.: Mafia: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th IEEE International Conference on Data Engineering, Heidelberg, Germany, pp. 443–452. IEEE Computer Society, Los Alamitos (2001)
Google Scholar
De Marchi, F., Lopes, S., Petit, J.-M.: Efficient algorithms for mining inclusion dependencies. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 464–476. Springer, Heidelberg (2002)
Chapter Google Scholar
De Marchi, F., Lopes, S., Petit, J.-M., Toumani, F.: Analysis of existing databases at the logical level: the dba companion project. ACM Sigmod Record 32(1), 47–52 (2003)
Article Google Scholar
Demetrovics, J., Thi, V.: Some remarks on generating Armstrong and inferring functional dependencies relation. Acta Cybernetica 12(2), 167–180 (1995)
MATH MathSciNet Google Scholar
Eiter, T., Gottlob, G., Makino, K.: New results on monotone dualization and generating hypergraph transversals. In: STOC 2002, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, Montreal, Quebec, Canada, June 2-4, 1998, pp. 14–22. ACM Press, New York (2002)
Chapter Google Scholar
Flouvat, F., Marchi, F.D., Petit, J.-M.: Abs: Adaptive borders search of frequent itemsets. In: FIMI 2004 (2004)
Google Scholar
Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent itemsets. In: ICDM 2001, Proceedings IEEE International Conference on Data Mining, pp. 163–170. ACM Press, New York (2001)
Chapter Google Scholar
Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Transaction on Database Systems 28(2), 140–174 (2003)
Article Google Scholar
Kantola, M., Mannila, H., Räihä, K.-J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems 7, 591–607 (1992)
Article MATH Google Scholar
Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoritical Computer Science 149(1), 129–149 (1995)
Article MATH MathSciNet Google Scholar
Koeller, A., Rundensteiner, E.: Discovery of high-dimentional inclusion dependencies (poster). In: International Conference on Data Engineering (ICDE 2003). IEEE Computer Society, Los Alamitos (2003)
Google Scholar
Levene, M., Loizou, G.: A Guided Tour of Relational Databases and Beyond. Springer, Heidelberg (1999)
Book Google Scholar
Lin, D.-I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 105–119. Springer, Heidelberg (1998)
Chapter Google Scholar
Lopes, S., De Marchi, F., Petit, J.-M.: DBA companion: A tool for logical database tuning (demo). In: 20th Proceedings of the IEEE International Conference on Data Engineering, Boston, USA, p. 859. IEEE Computer Society, Los Alamitos (2004)
Google Scholar
Lopes, S., Petit, J.-M., Lakhal, L.: Functional and approximate dependencies mining: Databases and FCA point of view. Special issue of Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 93–114 (2002)
Article MATH Google Scholar
Lopes, S., Petit, J.-M., Toumani, F.: Discovering interesting inclusion dependencies: Application to logical database tuning. Information Systems 17(1), 1–19 (2002)
Article Google Scholar
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)
Article Google Scholar
Mitchell, J.-C.: The implication problem for functional and inclusion dependencies. Information and Control 56(3), 154–173 (1983)
Article MATH MathSciNet Google Scholar
Orlando, S., Palmerini, P., Perego, R., Silvestri, F.: Adaptive and resource-aware mining of frequent sets. In: International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 338–345. IEEE Computer Society, Los Alamitos (2002)
Chapter Google Scholar
Rymon, R.: Search through systematic set enumeration. In: Nebel, B., Rich, C., Swartout, W.R. (eds.) International Conference on Principles of Knowledge Representation and Reasoning (KR 1992), Cambridge, USA, pp. 539–550. Morgan Kaufmann, San Francisco (1992)
Google Scholar
Uno, T., Satoh, K.: Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization. In: Goethals, B., Zaki, M.J. (eds.) FIMI 2003, Frequent Itemset Mining Implementations, ICDM 2003 Workshop, Melbourne, Florida, USA. CEUR Workshop Proceedings, vol. 90 (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

LIRIS, UMR CNRS 5205, Université Lyon 1, 69622, Villeurbanne, France
Fabien De Marchi
LIMOS, UMR CNRS 6158, Université Clermont-Ferrand II, 63177, Aubière, France
Frédéric Flouvat & Jean-Marc Petit

Authors

Fabien De Marchi
View author publications
You can also search for this author in PubMed Google Scholar
Frédéric Flouvat
View author publications
You can also search for this author in PubMed Google Scholar
Jean-Marc Petit
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
HIIT, Helsinki University of Technology and, University of Helsinki, Finland
Heikki Mannila

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

De Marchi, F., Flouvat, F., Petit, JM. (2006). Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science(), vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_5

Download citation

DOI: https://doi.org/10.1007/11615576_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics