Skip to main content

Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases

  • Conference paper
Constraint-Based Mining and Inductive Databases

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3848))

Abstract

Given the theoretical framework of Mannila and Toivonen [26], we are interested in the discovery of the positive border of interesting patterns, also called the most specific interesting patterns. Many approaches have been proposed among which we quote the levelwise algorithm and the Dualize and Advance algorithm. In this paper, we propose an adaptive strategy – complementary to these two algorithms – based on four steps: 1) In order to initialize the discovery, eliciting some elements of the negative border, for instance using a levelwise strategy until a certain level k. 2) From the negative border found so far, inferring the optimistic positive border by dualization, i.e. the set of patterns whose all specializations are known to be not interesting patterns. 3) Estimating the distance between the positive border to be discovered and the optimistic positive border. 4) Based on these estimates, carrying out an adaptive search either bottom-up (the jump was too optimistic) or top-down (the solution should be very close).

We have instantiated this proposition to the problem of inclusion dependency (IND) discovery. IND is a generalization of the well known concept of foreign keys in databases and is very important in practice. We will first point out how the problem of IND discovery fits into the theoretical framework of [26]. Then, we will describe an instantiation of our adaptive strategy for IND discovery, called Zigzag, from which some experiments were conducted on synthetic databases. The underlying application of this work takes place in a project called DBA Companion devoted to the understanding of existing databases at the logical level using data mining techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Fondements des bases de données. Addison-Wesley, Reading (2000)

    Google Scholar 

  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases, Santiago de Chile, Chile, pp. 487–499 (1994)

    Google Scholar 

  3. Albert-Lorincz, H., Boulicaut, J.-F.: Mining frequent sequential patterns under regular expressions: A highly adaptive strategy for pushing contraints. In: Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA. SIAM (2003)

    Google Scholar 

  4. Bailey, J., Manoukian, T., Ramamohanarao, K.: A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In: IEEE International Conference on Data Mining (ICDM 2003), Floride, USA, pp. 485–488. IEEE Computer Society, Los Alamitos (November 2003)

    Google Scholar 

  5. Bayardo, R.: Efficiently mining long patterns from databases. In: Haas, L.M., Tiwary, A. (eds.) SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, June 2-4, pp. 85–93. ACM Press, New York (1998)

    Chapter  Google Scholar 

  6. Bell, S., Brockhausen, P.: Discovery of Data Dependencies in Relational Databases. Technical report, LS-8 Report 14, University of Dortmund, p. 18 (April 1995)

    Google Scholar 

  7. Berge, C.: Graphs and Hypergraphs, 2nd rev. edn. North-Holland Mathematical Library 6. American Elsevier, Amsterdam (1976)

    Google Scholar 

  8. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Adaptive constraint pushing in frequent pattern mining. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 47–58. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  9. Boros, E., Elbassioni, K., Gurvich, V., Khachiyan, L.: An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation. special issue of Discrete Applied Mathematics

    Google Scholar 

  10. Burdick, D., Calimlim, M., Gehrke, J.: Mafia: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th IEEE International Conference on Data Engineering, Heidelberg, Germany, pp. 443–452. IEEE Computer Society, Los Alamitos (2001)

    Google Scholar 

  11. De Marchi, F., Lopes, S., Petit, J.-M.: Efficient algorithms for mining inclusion dependencies. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 464–476. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  12. De Marchi, F., Lopes, S., Petit, J.-M., Toumani, F.: Analysis of existing databases at the logical level: the dba companion project. ACM Sigmod Record 32(1), 47–52 (2003)

    Article  Google Scholar 

  13. Demetrovics, J., Thi, V.: Some remarks on generating Armstrong and inferring functional dependencies relation. Acta Cybernetica 12(2), 167–180 (1995)

    MATH  MathSciNet  Google Scholar 

  14. Eiter, T., Gottlob, G., Makino, K.: New results on monotone dualization and generating hypergraph transversals. In: STOC 2002, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, Montreal, Quebec, Canada, June 2-4, 1998, pp. 14–22. ACM Press, New York (2002)

    Chapter  Google Scholar 

  15. Flouvat, F., Marchi, F.D., Petit, J.-M.: Abs: Adaptive borders search of frequent itemsets. In: FIMI 2004 (2004)

    Google Scholar 

  16. Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent itemsets. In: ICDM 2001, Proceedings IEEE International Conference on Data Mining, pp. 163–170. ACM Press, New York (2001)

    Chapter  Google Scholar 

  17. Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., Sharma, R.S.: Discovering all most specific sentences. ACM Transaction on Database Systems 28(2), 140–174 (2003)

    Article  Google Scholar 

  18. Kantola, M., Mannila, H., Räihä, K.-J., Siirtola, H.: Discovering functional and inclusion dependencies in relational databases. International Journal of Intelligent Systems 7, 591–607 (1992)

    Article  MATH  Google Scholar 

  19. Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoritical Computer Science 149(1), 129–149 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  20. Koeller, A., Rundensteiner, E.: Discovery of high-dimentional inclusion dependencies (poster). In: International Conference on Data Engineering (ICDE 2003). IEEE Computer Society, Los Alamitos (2003)

    Google Scholar 

  21. Levene, M., Loizou, G.: A Guided Tour of Relational Databases and Beyond. Springer, Heidelberg (1999)

    Book  Google Scholar 

  22. Lin, D.-I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Schek, H.-J., Saltor, F., Ramos, I., Alonso, G. (eds.) EDBT 1998. LNCS, vol. 1377, pp. 105–119. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  23. Lopes, S., De Marchi, F., Petit, J.-M.: DBA companion: A tool for logical database tuning (demo). In: 20th Proceedings of the IEEE International Conference on Data Engineering, Boston, USA, p. 859. IEEE Computer Society, Los Alamitos (2004)

    Google Scholar 

  24. Lopes, S., Petit, J.-M., Lakhal, L.: Functional and approximate dependencies mining: Databases and FCA point of view. Special issue of Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 93–114 (2002)

    Article  MATH  Google Scholar 

  25. Lopes, S., Petit, J.-M., Toumani, F.: Discovering interesting inclusion dependencies: Application to logical database tuning. Information Systems 17(1), 1–19 (2002)

    Article  Google Scholar 

  26. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)

    Article  Google Scholar 

  27. Mitchell, J.-C.: The implication problem for functional and inclusion dependencies. Information and Control 56(3), 154–173 (1983)

    Article  MATH  MathSciNet  Google Scholar 

  28. Orlando, S., Palmerini, P., Perego, R., Silvestri, F.: Adaptive and resource-aware mining of frequent sets. In: International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, pp. 338–345. IEEE Computer Society, Los Alamitos (2002)

    Chapter  Google Scholar 

  29. Rymon, R.: Search through systematic set enumeration. In: Nebel, B., Rich, C., Swartout, W.R. (eds.) International Conference on Principles of Knowledge Representation and Reasoning (KR 1992), Cambridge, USA, pp. 539–550. Morgan Kaufmann, San Francisco (1992)

    Google Scholar 

  30. Uno, T., Satoh, K.: Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization. In: Goethals, B., Zaki, M.J. (eds.) FIMI 2003, Frequent Itemset Mining Implementations, ICDM 2003 Workshop, Melbourne, Florida, USA. CEUR Workshop Proceedings, vol. 90 (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

De Marchi, F., Flouvat, F., Petit, JM. (2006). Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science(), vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_5

Download citation

  • DOI: https://doi.org/10.1007/11615576_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31331-1

  • Online ISBN: 978-3-540-31351-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics