Fuzzy Technology, pp. 73–84

Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 335)

Significant Frequent Item Sets Via Pattern Spectrum Filtering

Chapter

Abstract

Frequent item set mining often suffers from the grave problem that the number of frequent item sets can be huge, even if they are restricted to closed or maximal item sets: in some cases the output can even exceed the size of the transaction database to be analyzed. To overcome this problem, several approaches have been suggested that reduce the output by statistical assessments, so that only significant frequent item sets (or association rules derived from them) are reported. In this paper we propose a new method along these lines, which combines data randomization with so-called pattern spectrum filtering, as it has been developed for neural spike train analysis. The former serves to implicitly represent the null hypothesis of independent items, while the latter helps to cope with the multiple testing problem that results from a statistical evaluation of the found patterns.
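
To make the described combination concrete, the following Python sketch shows one possible realization. It is only an illustration under simplifying assumptions: a naive item set enumerator instead of an efficient miner, item-wise permutation of occurrences as the randomization scheme, and a simple all-or-nothing spectrum test. All function names are hypothetical and are not taken from the chapter or from any published implementation.

    # Sketch of significant frequent item sets via pattern spectrum filtering.
    # Transactions are represented as sets of items.
    import random
    from itertools import combinations
    from collections import defaultdict

    def frequent_item_sets(transactions, min_support, max_size=4):
        """Naively enumerate frequent item sets with 2..max_size items.
        Returns a dict mapping each frequent item set to its support."""
        items = sorted({i for t in transactions for i in t})
        result = {}
        for size in range(2, max_size + 1):
            for cand in combinations(items, size):
                support = sum(1 for t in transactions if set(cand) <= t)
                if support >= min_support:
                    result[frozenset(cand)] = support
        return result

    def surrogate(transactions, rng):
        """Create a surrogate data set by permuting each item's occurrences
        across transactions: item frequencies are preserved, co-occurrences
        are destroyed (implicit null hypothesis of independent items)."""
        n = len(transactions)
        occurrences = defaultdict(list)
        for idx, t in enumerate(transactions):
            for item in t:
                occurrences[item].append(idx)
        surro = [set() for _ in range(n)]
        for item, rows in occurrences.items():
            for idx in rng.sample(range(n), len(rows)):
                surro[idx].add(item)
        return surro

    def pattern_spectrum(transactions, min_support, n_surrogates, seed=0):
        """Collect all (size, support) signatures that occur in any of the
        surrogate data sets, i.e. that can be explained as chance events."""
        rng = random.Random(seed)
        spectrum = set()
        for _ in range(n_surrogates):
            found = frequent_item_sets(surrogate(transactions, rng), min_support)
            spectrum.update((len(s), c) for s, c in found.items())
        return spectrum

    def significant_item_sets(transactions, min_support, n_surrogates=100):
        """Keep only those frequent item sets whose (size, support) signature
        never occurred in the surrogate data sets."""
        spectrum = pattern_spectrum(transactions, min_support, n_surrogates)
        found = frequent_item_sets(transactions, min_support)
        return {s: c for s, c in found.items() if (len(s), c) not in spectrum}

In an actual application one would replace the naive enumeration by an efficient frequent item set miner (such as FP-growth or LCM) and choose a randomization scheme that matches the properties of the data, for example swap randomization, which also preserves the transaction sizes.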


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

European Centre for Soft Computing, Mieres, Spain
