ParaMiner: a generic pattern mining algorithm for multi-core architectures

Published in: Data Mining and Knowledge Discovery

Abstract

In this paper, we present ParaMiner, a generic and parallel algorithm for closed pattern mining. ParaMiner is built on the principles of pattern enumeration in strongly accessible set systems. Its efficiency is due to a novel dataset reduction technique (which we call EL-reduction), combined with a novel technique for performing dataset reduction during parallel execution on a multi-core architecture. We illustrate ParaMiner's genericity by using it to solve three different pattern mining problems: frequent itemset mining, frequent connected relational graph mining, and gradual itemset mining. We prove the soundness and completeness of ParaMiner. Furthermore, our experiments show that, despite being a generic algorithm, ParaMiner can compete with specialized state-of-the-art algorithms designed for each of these problems. For gradual itemset mining in particular, ParaMiner outperforms the state-of-the-art algorithm by two orders of magnitude.
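To make the notion of a closed pattern concrete for the simplest of the three problems (frequent itemset mining), the following is a minimal sequential sketch. It is not ParaMiner: it performs no EL-reduction and no parallel execution, and it removes duplicates with a plain set rather than with the enumeration strategy of strongly accessible set systems. The function names `support_set`, `closure` and `closed_frequent_itemsets`, as well as the toy dataset, are assumptions introduced here for illustration only.

```python
from typing import FrozenSet, List, Set

Itemset = FrozenSet[int]


def support_set(pattern: Itemset, dataset: List[Itemset]) -> List[Itemset]:
    """Return the transactions that contain every item of `pattern`."""
    return [t for t in dataset if pattern <= t]


def closure(pattern: Itemset, dataset: List[Itemset]) -> Itemset:
    """Largest itemset occurring in exactly the same transactions as `pattern`."""
    covering = support_set(pattern, dataset)
    if not covering:
        return pattern
    return frozenset.intersection(*covering)


def closed_frequent_itemsets(dataset: List[Itemset], min_support: int) -> Set[Itemset]:
    """Naive depth-first enumeration of closed frequent itemsets (illustrative only)."""
    items = sorted(set().union(*dataset)) if dataset else []
    found: Set[Itemset] = set()

    def expand(pattern: Itemset) -> None:
        for item in items:
            if item in pattern:
                continue
            candidate = pattern | {item}
            # Frequency is the selection criterion in this illustration.
            if len(support_set(candidate, dataset)) < min_support:
                continue
            closed = closure(candidate, dataset)
            if closed not in found:  # duplicate check via a set, for simplicity
                found.add(closed)
                expand(closed)

    root = closure(frozenset(), dataset)
    if len(dataset) >= min_support:
        found.add(root)
    expand(root)
    return found


# Hypothetical toy dataset: four transactions over items 1..4.
toy_dataset = [frozenset(t) for t in ({1, 2, 3}, {1, 2}, {1, 2, 4}, {3, 4})]
toy_result = closed_frequent_itemsets(toy_dataset, min_support=2)
print(sorted(sorted(p) for p in toy_result))
```

On this toy dataset with a minimum support of 2, the itemsets {1} and {2} are frequent but not closed, because both occur in exactly the same transactions as {1, 2}. The sketch therefore reports only the empty set, {1, 2}, {3} and {4}.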

Notes

  1. http://membres-liglab.imag.fr/negrevergne/

Author information

Correspondence to Benjamin Negrevergne.

Additional information

Responsible editor: Jian Pei.

About this article

Cite this article

Negrevergne, B., Termier, A., Rousset, MC. et al. ParaMiner: a generic pattern mining algorithm for multi-core architectures. Data Min Knowl Disc 28, 593–633 (2014). https://doi.org/10.1007/s10618-013-0313-2

  • DOI: https://doi.org/10.1007/s10618-013-0313-2
