PGLCM: efficient parallel mining of closed frequent gradual itemsets

Abstract

Numerical data (e.g., DNA micro-array data, sensor data) pose a challenging problem for existing frequent pattern mining methods, which handle them poorly. In this context, gradual patterns have recently been proposed to extract covariations of attributes, such as: “When X increases, Y decreases”. Some algorithms exist for mining frequent gradual patterns, but they do not scale to real-world databases. In this paper we present GLCM, the first algorithm for mining closed frequent gradual patterns, which offers strong complexity guarantees: the mining time is linear in the number of closed frequent gradual itemsets. Our experimental study shows that GLCM is two orders of magnitude faster than the state of the art, with constant, low memory usage. We also present PGLCM, a parallelization of GLCM capable of exploiting multicore processors, with good scale-up properties on complex datasets. These are the first algorithms capable of mining large real-world datasets to discover gradual patterns.
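
To make the notion of a gradual pattern such as “When X increases, Y decreases” concrete, the following is a minimal sketch of how the support of one such pattern could be measured. It assumes that support is the length of the longest chain of records that can be ordered so that every attribute varies in the required direction, which is one definition used in the gradual pattern literature; the abstract itself does not define support. The function names, the `(attribute, direction)` encoding and the toy records are hypothetical and are not taken from GLCM or PGLCM, and the sketch does no closed-pattern enumeration or parallelization.

```python
from functools import lru_cache


def respects(a, b, pattern):
    """Return True if going from record a to record b follows every variation."""
    return all(
        (b[attr] > a[attr]) if direction == "+" else (b[attr] < a[attr])
        for attr, direction in pattern
    )


def gradual_support(records, pattern):
    """Length of the longest chain of records ordered by the pattern.

    pattern is a non-empty list of (attribute, direction) pairs, where
    direction is "+" (increases) or "-" (decreases). This is an assumed
    definition of support, chosen for illustration only.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one variation")
    n = len(records)

    @lru_cache(maxsize=None)
    def longest_from(i):
        # Strict comparisons make the "can follow" relation acyclic,
        # so the recursion always terminates.
        return 1 + max(
            (longest_from(j) for j in range(n)
             if respects(records[i], records[j], pattern)),
            default=0,
        )

    return max((longest_from(i) for i in range(n)), default=0)


# Hypothetical toy data: four records with two numerical attributes.
records = [
    {"X": 1, "Y": 9},
    {"X": 2, "Y": 7},
    {"X": 3, "Y": 8},
    {"X": 4, "Y": 5},
]
print(gradual_support(records, [("X", "+"), ("Y", "-")]))
```

On this toy data the second and third records cannot both appear in one chain, because Y rises from 7 to 8 while X increases, so the longest chain for “X increases, Y decreases” skips one of them. The pairwise check is quadratic in the number of records, which is acceptable for an illustration but is not how a scalable miner would proceed.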

Notes

  1. The interested reader will have noticed that this is the same example database as in [22]. However, the support value is 3 here instead of 2 in [22], hence the difference in output.

  2. http://movielens.umn.edu/

  3. Source: Yahoo! Finance, http://finance.yahoo.com/

  4. http://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill

  5. http://en.wikipedia.org/wiki/Anadarko_Petroleum_Corporation

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, pp 487–499

  2. Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) PrefixSpan: mining sequential patterns by prefix-projected growth. ICDE 2001:215–224

  3. Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: Proceedings of the second SIAM international conference on data mining (SDM2002), Arlington, VA, April 2002, pp 158–174

  4. Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. PKDD 2000:13–23

  5. Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: SIGMOD Conference 1996:1–12

  6. Aumann Y, Lindell Y (2003) A statistical theory for quantitative association rules. J Intell Inf Syst 20(3):255–283

  7. Washio T, Mitsunaga Y, Motoda H (2005) Mining quantitative frequent itemsets using adaptive density-based subspace clustering. ICDM 2005:793–796

  8. Di Jorio L, Laurent A, Teisseire M (2009) Mining frequent gradual itemsets from large databases. In: International conference on intelligent data analysis, IDA’09

  9. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24:25–46

  10. Goethals B (2003–2004) FIMI repository website. http://fimi.cs.helsinki.fi/

  11. Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI

  12. Negrevergne B, Termier A, Mehaut J-F, Uno T (2010) Discovering closed frequent itemsets on multicore: parallelizing computations and optimizing memory accesses. In: The 2010 International Conference on High Performance Computing and Simulation (HPCS 2010)

  13. Negrevergne B (2011) A generic and parallel pattern mining algorithm for multi-core architectures. PhD dissertation

  14. Arimura H, Uno T (2005) An output-polynomial time algorithm for mining frequent closed attribute trees. In: 15th international conference on inductive logic programming (ILP’05)

  15. Berzal F, Cubero J-C, Sanchez D, Vila M-A, Serrano JM (2007) An alternative approach to discover gradual dependencies. Int J Uncertain Fuzziness Knowl Based Syst (IJUFKS) 15(5):559–570

  16. Hüllermeier E (2002) Association rules for expressing gradual dependencies. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery, PKDD’02. Springer-Verlag, pp 200–211

  17. Laurent A, Négrevergne B, Sicard N, Termier A (2010) PGP-mc: towards a multicore parallel approach for mining gradual patterns. In: DASFAA (1), pp 78–84

  18. Ayouni S, Laurent A, Yahia SB, Poncelet P (2010) Mining closed gradual patterns. In: 10th international conference on artificial intelligence and soft computing, ICAISC 2010, ser. LNCS, vol 6113, pp 267–274

  19. Laurent A, Lesot M-J, Rifqi M (2009) GRAANK: exploiting rank correlations for extracting gradual dependencies. In: Proceedings of FQAS’09

  20. Kendall M, Babington Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287

  21. Lucchese C, Orlando S, Perego R (2004) DCI Closed: a fast and memory efficient algorithm to mine frequent closed itemsets. In: FIMI

  22. Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating closed patterns in transaction databases. Discov Sci 2004:16–31

  23. Gelernter D (1989) Multiple tuple spaces in Linda, pp 20–27. doi:10.1007/3-540-51285-3_30

  24. Dubois D, Prade H (1996) What are fuzzy rules and how to use them. Fuzzy Sets Syst 84(2):169–185

  25. Dubois D, Prade H (1992) Gradual inference rules in approximate reasoning. Inf Sci 61:103–122

  26. Dubois D, Prade H, Grabisch M (1995) Gradual rules and the approximation of control laws. In: Theoretical aspects of fuzzy control, pp 147–181

  27. Dubois D, Prade H, Ughetto L (2003) A new perspective on reasoning with fuzzy rules. Int J Intell Syst 18(5):541–567

  28. Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: International conference on data engineering, pp 717–725

Author information

Corresponding author

Correspondence to Alexandre Termier.

About this article

Cite this article

Do, T.D.T., Termier, A., Laurent, A. et al. PGLCM: efficient parallel mining of closed frequent gradual itemsets. Knowl Inf Syst 43, 497–527 (2015). https://doi.org/10.1007/s10115-014-0749-8

Keywords

  • Data mining
  • Frequent pattern mining
  • Gradual itemsets
  • Parallelism