Significant Frequent Item Sets Via Pattern Spectrum Filtering

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 335)

Abstract

Frequent item set mining often suffers from the problem that the number of frequent item sets can be huge, even when the output is restricted to closed or maximal item sets: in some cases the output can even exceed the size of the transaction database under analysis. To overcome this problem, several approaches have been suggested that reduce the output by statistical assessment, so that only significant frequent item sets (or association rules derived from them) are reported. In this paper we propose a new method along these lines, which combines data randomization with so-called pattern spectrum filtering, as developed for neural spike train analysis. The former serves to implicitly represent the null hypothesis of independent items, while the latter helps to cope with the multiple testing problem that results from statistically evaluating the found patterns.
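The combination of data randomization and pattern spectrum filtering can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: surrogate data sets are generated by re-assigning each item to random transactions while keeping its frequency (one simple way to represent independent items), a pattern's signature is taken to be the pair (size, support), and a pattern is kept only if its signature never occurs in any surrogate. All function names are hypothetical, and the brute-force miner merely stands in for a real frequent item set mining algorithm.

```python
import random
from itertools import combinations


def frequent_item_sets(transactions, min_support, max_size=4):
    """Brute-force enumeration of item sets (size >= 2) with their support.

    Illustration only: exponential in the number of items.
    """
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(2, max_size + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                result[frozenset(cand)] = support
    return result


def shuffle_items(transactions, rng):
    """Assumed randomization scheme: place each item into randomly chosen
    transactions, preserving how often the item occurs. This destroys
    dependencies between items (null hypothesis of independent items)."""
    n = len(transactions)
    surrogate = [set() for _ in range(n)]
    for item in sorted({i for t in transactions for i in t}):
        count = sum(1 for t in transactions if item in t)
        for idx in rng.sample(range(n), count):
            surrogate[idx].add(item)
    return surrogate


def pattern_spectrum(transactions, min_support, n_surrogates, rng):
    """Collect all (size, support) signatures seen in surrogate data sets."""
    spectrum = set()
    for _ in range(n_surrogates):
        surrogate = shuffle_items(transactions, rng)
        for item_set, support in frequent_item_sets(surrogate, min_support).items():
            spectrum.add((len(item_set), support))
    return spectrum


def significant_item_sets(transactions, min_support=2, n_surrogates=100, seed=0):
    """Keep only item sets whose signature never occurred in a surrogate."""
    transactions = [set(t) for t in transactions]
    rng = random.Random(seed)
    spectrum = pattern_spectrum(transactions, min_support, n_surrogates, rng)
    found = frequent_item_sets(transactions, min_support)
    return {s: c for s, c in found.items() if (len(s), c) not in spectrum}
```

Because the filter compares signatures rather than testing each pattern individually, the number of statistical decisions no longer grows with the number of found patterns, which is how this sketch reflects the multiple testing concern mentioned above. Calling `significant_item_sets(db, min_support=3)` on a list of transactions `db` returns a dictionary mapping each retained item set to its support.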

Author information

Correspondence to Christian Borgelt.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Borgelt, C., Picado-Muiño, D. (2016). Significant Frequent Item Sets Via Pattern Spectrum Filtering. In: Collan, M., Fedrizzi, M., Kacprzyk, J. (eds) Fuzzy Technology. Studies in Fuzziness and Soft Computing, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-319-26986-3_4

  • DOI: https://doi.org/10.1007/978-3-319-26986-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26984-9

  • Online ISBN: 978-3-319-26986-3

  • eBook Packages: Engineering, Engineering (R0)
