Knowledge and Information Systems, Volume 43, Issue 3, pp 497–527

PGLCM: efficient parallel mining of closed frequent gradual itemsets

  • Trong Dinh Thac Do
  • Alexandre Termier
  • Anne Laurent
  • Benjamin Negrevergne
  • Behrooz Omidvar-Tehrani
  • Sihem Amer-Yahia
Regular Paper

Abstract

Numerical data (e.g., DNA microarray data, sensor data) pose a challenging problem for existing frequent pattern mining methods, which handle them poorly. In this context, gradual patterns have recently been proposed to extract covariations of attributes, such as “When X increases, Y decreases”. Some algorithms exist for mining frequent gradual patterns, but they do not scale to real-world databases. In this paper we present GLCM, the first algorithm for mining closed frequent gradual patterns. It offers strong complexity guarantees: its mining time is linear in the number of closed frequent gradual itemsets. Our experimental study shows that GLCM is two orders of magnitude faster than the state of the art, with constant, low memory usage. We also present PGLCM, a parallelization of GLCM that exploits multicore processors and shows good scale-up on complex datasets. These are the first algorithms capable of mining large real-world datasets for gradual patterns.
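Support and closedness of gradual itemsets are not defined on this page. The sketch below is only an illustration and assumes a pair-based (Kendall-tau-style) semantics, which may differ from the definition GLCM actually uses:

- A gradual item such as `("Y", "-")` holds on an ordered record pair (i, j) when Y strictly decreases from record i to record j.
- The support of an itemset is the number of pairs on which all its items hold, divided by n(n−1)/2.
- The closure of an itemset is the set of every gradual item that holds on all of its supporting pairs.

The representation and the function names (`holds`, `supporting_pairs`, `pair_support`, `closure`) are hypothetical, and the toy data is invented for illustration.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation (not from the paper): a gradual item is
# (attribute, direction), where '+' means "increases" and '-' means "decreases".
GradualItem = Tuple[str, str]
Row = Dict[str, float]
Pair = Tuple[int, int]


def holds(rows: List[Row], item: GradualItem, i: int, j: int) -> bool:
    """True if the gradual item is respected when going from record i to record j."""
    attr, direction = item
    a, b = rows[i][attr], rows[j][attr]
    return a < b if direction == "+" else a > b


def supporting_pairs(rows: List[Row], itemset: Iterable[GradualItem]) -> Set[Pair]:
    """Ordered record pairs (i, j), i != j, on which every item of the itemset holds."""
    items = list(itemset)
    return {
        (i, j)
        for i, j in permutations(range(len(rows)), 2)
        if all(holds(rows, item, i, j) for item in items)
    }


def pair_support(rows: List[Row], itemset: Iterable[GradualItem]) -> float:
    """Assumed support: supporting pairs divided by the n(n-1)/2 unordered pairs."""
    n = len(rows)
    return len(supporting_pairs(rows, itemset)) / (n * (n - 1) / 2)


def closure(rows: List[Row], itemset: Iterable[GradualItem]) -> FrozenSet[GradualItem]:
    """All gradual items (over every attribute, both directions) that hold on
    every pair supporting the itemset."""
    pairs = supporting_pairs(rows, itemset)
    candidates = [(attr, d) for attr in rows[0] for d in "+-"]
    return frozenset(
        item for item in candidates
        if all(holds(rows, item, i, j) for i, j in pairs)
    )


# Invented toy data: four records over three numerical attributes.
rows = [
    {"X": 1, "Y": 9, "Z": 5},
    {"X": 2, "Y": 7, "Z": 3},
    {"X": 3, "Y": 8, "Z": 4},
    {"X": 4, "Y": 2, "Z": 1},
]
x_up_y_down = {("X", "+"), ("Y", "-")}  # "When X increases, Y decreases"
print(pair_support(rows, x_up_y_down))     # 0.8333333333333334 (5 of 6 pairs)
print(sorted(closure(rows, x_up_y_down)))  # [('X', '+'), ('Y', '-'), ('Z', '-')]
```

On this toy data, “When X increases, Y decreases” holds on 5 of the 6 record pairs. The itemset {(X, +), (Y, -)} is not closed, because Z also decreases on every one of those pairs, so its closure adds (Z, -). A miner of closed frequent gradual itemsets reports only itemsets that equal their own closure.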

Keywords

Data mining · Frequent pattern mining · Gradual itemsets · Parallelism

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Trong Dinh Thac Do (1, 2)
  • Alexandre Termier (1)
  • Anne Laurent (2)
  • Benjamin Negrevergne (3)
  • Behrooz Omidvar-Tehrani (1)
  • Sihem Amer-Yahia (1)

  1. LIG, CNRS UMR, University of Grenoble, Grenoble, France
  2. LIRMM, CNRS UMR, University of Montpellier II, Montpellier, France
  3. Department of Computer Science, KU Leuven, Leuven, Belgium
