Knowledge and Information Systems, Volume 46, Issue 3, pp 491–514

Frequent pattern mining in attributed trees: algorithms and applications

  • Claude Pasquier
  • Jérémy Sanhes
  • Frédéric Flouvat
  • Nazha Selmaoui-Folcher
Regular Paper

Abstract

Frequent pattern mining is an important data mining task with a broad range of applications. Initially focused on the discovery of frequent itemsets, studies were later extended to mine structural forms such as sequences, trees or graphs. In this paper, we introduce a new domain of patterns, attributed trees (atrees), and a method to extract these patterns from a forest of atrees. Attributed trees are trees in which vertices are associated with itemsets. Mining this type of pattern (called asubtrees), which combines tree mining and itemset mining, requires the exploration of a huge search space. To make our approach scalable, we investigate the mining of condensed representations. For attributed trees, the classical concept of closure involves both itemset closure and structural closure. We present three algorithms for mining all patterns, closed patterns w.r.t. itemsets (content) and/or structure in attributed trees. We show that, for low support values, mining content-closed attributed trees is a good compromise between non-redundancy of solutions and execution time.
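As a rough illustration of the definition above, the following Python sketch represents an attributed tree as vertices that each carry an itemset, and counts the per-tree support of the simplest possible pattern, a single vertex. The names `ANode`, `vertices` and `support` are hypothetical and not taken from the paper, and the matching rule (a pattern vertex matches a data vertex whose itemset contains the pattern's itemset) is an assumption made for this example; the paper's algorithms handle full asubtrees and closure, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List


@dataclass
class ANode:
    """A vertex of an attributed tree: an itemset plus a list of children."""
    items: FrozenSet[str]
    children: List["ANode"] = field(default_factory=list)


def vertices(root: ANode) -> Iterator[ANode]:
    """Yield every vertex of the tree rooted at `root` in pre-order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))


def support(itemset: FrozenSet[str], forest: List[ANode]) -> int:
    """Count the trees in which at least one vertex contains `itemset`.

    Assumption: a one-vertex pattern occurs in a tree when some vertex's
    itemset is a superset of the pattern's itemset.
    """
    return sum(
        any(itemset <= v.items for v in vertices(tree))
        for tree in forest
    )


# A toy forest of three attributed trees (made-up data).
forest = [
    ANode(frozenset("ab"), [ANode(frozenset("c")), ANode(frozenset("abd"))]),
    ANode(frozenset("a"), [ANode(frozenset("bc"))]),
    ANode(frozenset("cd")),
]

print(support(frozenset("ab"), forest))
print(support(frozenset("c"), forest))
```

Even in this toy setting, every subset of every vertex itemset is a candidate pattern, which hints at why combining itemset mining with tree mining leads to the large search space the abstract mentions.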

Keywords

Tree mining, Frequent pattern mining, Attributed tree, Condensed representation

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Claude Pasquier (1, 2, 3)
  • Jérémy Sanhes (3)
  • Frédéric Flouvat (3)
  • Nazha Selmaoui-Folcher (3)

  1. University of Nice Sophia Antipolis, I3S, UMR 7271, Sophia Antipolis, France
  2. CNRS, I3S, UMR 7271, Sophia Antipolis, France
  3. Pôle Pluridisciplinaire de la Matière et de l’Environnement (PPME), University of New Caledonia, Nouméa, New Caledonia
