Data Mining and Knowledge Discovery, Volume 28, Issue 3, pp 593–633

ParaMiner: a generic pattern mining algorithm for multi-core architectures

  • Benjamin Negrevergne
  • Alexandre Termier
  • Marie-Christine Rousset
  • Jean-François Méhaut

Abstract

In this paper, we present ParaMiner, a generic and parallel algorithm for closed pattern mining. ParaMiner is built on the principles of pattern enumeration in strongly accessible set systems. Its efficiency is due to a novel dataset reduction technique (which we call EL-reduction), combined with a novel technique for performing dataset reduction in a parallel execution on a multi-core architecture. We illustrate ParaMiner’s genericity by using it to solve three different pattern mining problems: frequent itemset mining, frequent connected relational graph mining, and gradual itemset mining. We prove the soundness and completeness of ParaMiner. Furthermore, our experiments show that, despite being a generic algorithm, ParaMiner can compete with specialized state-of-the-art algorithms designed for the pattern mining problems mentioned above. Moreover, for the particular problem of gradual itemset mining, ParaMiner outperforms the state-of-the-art algorithm by two orders of magnitude.
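
To make the notion of a closed frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not ParaMiner: it uses no EL-reduction and no parallelism, and it enumerates every candidate itemset, which is only feasible on toy data. The function names (`support`, `closure`, `closed_frequent_itemsets`) and the toy dataset are assumptions made for illustration. An itemset is treated as closed when it equals the intersection of all transactions that contain it.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # No transaction contains the itemset; return it unchanged.
        return frozenset(itemset)
    return frozenset.intersection(*covering)


def closed_frequent_itemsets(transactions, min_support):
    """Brute force: keep frequent itemsets that equal their own closure.

    Illustrative only (hypothetical helper, exponential in the number
    of items); it does not reflect how ParaMiner enumerates patterns.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            s = support(candidate, transactions)
            if s >= min_support and closure(candidate, transactions) == candidate:
                result[candidate] = s
    return result


# Toy dataset (assumed for illustration): four transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
closed = closed_frequent_itemsets(toy, min_support=2)
for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

On this toy data, `{a}` is frequent but not closed, because every transaction containing `a` also contains `b`; the closed itemset `{a, b}` carries the same support information.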

Keywords

  • Data mining
  • Closed pattern mining
  • Parallel pattern mining
  • Multi-core architectures

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Benjamin Negrevergne (1)
  • Alexandre Termier (1)
  • Marie-Christine Rousset (1)
  • Jean-François Méhaut (1)

  1. LIG Laboratory, University of Grenoble, Grenoble, France
