Bisociative Knowledge Discovery, pp 104–121

Cover Similarity Based Item Set Mining

  • Marc Segond
  • Christian Borgelt
  • Chapter
  • Open Access

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7250)

Abstract

In standard frequent item set mining, one tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions. We instead strive to find item sets for which the similarity of the covers of the items (that is, the sets of transactions containing the items) exceeds a user-defined threshold. This approach yields a much better assessment of the association strength of the items, because it takes additional information about their occurrences into account. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. In addition, standard frequent item set mining turns out to be a special case of this flexible framework. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets, we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high-quality and more useful item sets.
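The difference between support and cover similarity can be illustrated with a minimal sketch. It assumes that the generalized Jaccard index of an item set is the size of the intersection of the item covers divided by the size of their union; the function names `cover`, `support` and `generalized_jaccard` and the toy transactions are hypothetical and chosen only for illustration. This is not the Eclat-inspired mining algorithm mentioned in the abstract, only the evaluation of a single item set.

```python
def cover(item, transactions):
    """Return the indices of all transactions that contain `item`."""
    return {tid for tid, t in enumerate(transactions) if item in t}


def support(items, transactions):
    """Absolute support: number of transactions containing all items."""
    covers = [cover(i, transactions) for i in items]
    return len(set.intersection(*covers)) if covers else 0


def generalized_jaccard(items, transactions):
    """Assumed definition: |intersection of covers| / |union of covers|."""
    covers = [cover(i, transactions) for i in items]
    if not covers:
        return 0.0
    union = set.union(*covers)
    if not union:
        return 0.0
    return len(set.intersection(*covers)) / len(union)


# Toy database (made up for this example, not taken from the chapter).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b"},
    {"a", "b", "d"},
]

print(support(["a", "b"], transactions))              # 3
print(generalized_jaccard(["a", "b"], transactions))  # 0.6
```

Here the items `a` and `b` occur together in 3 transactions, but at least one of them occurs in all 5, so the cover similarity is 3/5. A support threshold sees only the numerator, whereas the similarity also accounts for the transactions in which the items occur separately.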


Author information

Authors and Affiliations

  1. European Centre for Soft Computing, Calle Gonzalo Gutiérrez Quirós s/n, E-33600, Mieres, Asturias, Spain

    Marc Segond & Christian Borgelt

Editor information

Editors and Affiliations

  1. Department of Computer and Information Science, University of Konstanz, Konstanz, Germany

    Michael R. Berthold

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2012 The Author(s)

About this chapter

Cite this chapter

Segond, M., Borgelt, C. (2012). Cover Similarity Based Item Set Mining. In: Berthold, M.R. (ed.) Bisociative Knowledge Discovery. Lecture Notes in Computer Science, vol 7250. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31830-6_8

  • DOI: https://doi.org/10.1007/978-3-642-31830-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31829-0

  • Online ISBN: 978-3-642-31830-6

  • eBook Packages: Computer Science, Computer Science (R0)
