Abstract
Frequent pattern mining is an important data mining task with a broad range of applications. Research initially focused on the discovery of frequent itemsets and was later extended to structural forms such as sequences, trees, and graphs. In this paper, we introduce a new domain of patterns, attributed trees (atrees), and a method to extract these patterns from a forest of atrees. Attributed trees are trees in which each vertex is associated with an itemset. Mining this type of pattern (called asubtrees) combines tree mining and itemset mining, and therefore requires the exploration of a huge search space. To make our approach scalable, we investigate the mining of condensed representations. For attributed trees, the classical concept of closure involves both itemset closure and structural closure. We present three algorithms for mining all patterns and patterns that are closed with respect to itemsets (content) and/or structure in attributed trees. We show that, for low support values, mining content-closed attributed trees is a good compromise between non-redundancy of solutions and execution time.
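To make the notion of an attributed tree concrete, the following is a minimal sketch of a forest whose vertices carry itemsets, together with a per-tree support count and a simple occurrence-based closure for single-vertex patterns. The names `ANode`, `support`, and `occurrence_closure` are hypothetical and chosen for illustration only; the sketch covers the itemset (content) side for a single vertex and is not one of the three algorithms presented in the paper.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List


@dataclass
class ANode:
    """A vertex of an attributed tree: an itemset plus child vertices (hypothetical type)."""
    items: FrozenSet[str]
    children: List["ANode"] = field(default_factory=list)


def vertices(root: ANode) -> Iterator[ANode]:
    """Yield every vertex of the tree rooted at `root` (iterative depth-first walk)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def support(itemset: FrozenSet[str], forest: List[ANode]) -> int:
    """Count the trees that have at least one vertex whose itemset contains `itemset`."""
    return sum(any(itemset <= v.items for v in vertices(t)) for t in forest)


def occurrence_closure(itemset: FrozenSet[str], forest: List[ANode]) -> FrozenSet[str]:
    """Intersect the itemsets of all vertices containing `itemset`.

    Assumption: this is the classical itemset closure applied to vertices
    treated as transactions; structural closure is not modelled here.
    """
    matches = [v.items for t in forest for v in vertices(t) if itemset <= v.items]
    if not matches:
        return frozenset(itemset)
    return frozenset.intersection(*matches)


# A toy forest of two attributed trees (made-up data).
t1 = ANode(frozenset("ab"), [ANode(frozenset("abc")), ANode(frozenset("d"))])
t2 = ANode(frozenset("e"), [ANode(frozenset("abd"))])
forest = [t1, t2]

print(support(frozenset("a"), forest))
print(sorted(occurrence_closure(frozenset("a"), forest)))
```

In this toy forest every vertex that contains `a` also contains `b`, so the single-vertex pattern `{a}` is not content-closed while `{a, b}` is. Extending the idea to multi-vertex asubtrees additionally requires a structure-preserving mapping between pattern and tree, which this sketch leaves out.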
Cite this article
Pasquier, C., Sanhes, J., Flouvat, F. et al. Frequent pattern mining in attributed trees: algorithms and applications. Knowl Inf Syst 46, 491–514 (2016). https://doi.org/10.1007/s10115-015-0831-x
DOI: https://doi.org/10.1007/s10115-015-0831-x