Abstract
The number of frequent subtrees usually grows exponentially with tree size because of combinatorial explosion, so there are too many frequent subtrees for users to manage and use. To solve this problem, we generalize a compression framework based on δ-clusters to the problem of compressing frequent-subtree sets, and propose an algorithm, RPTlocal, that mines a compressed set of frequent subtrees directly. The algorithm sacrifices theoretical bounds but still achieves good compression quality. By pruning the search space and generating frequent subtrees directly, it is also efficient. Experimental results show that the set of representative subtrees mined by RPTlocal is almost two orders of magnitude smaller than the whole collection of closed subtrees, and that RPTlocal is more efficient than CMTreeMiner, an algorithm for mining both closed and maximal frequent subtrees.
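As a rough illustration of the δ-cluster idea the abstract relies on, the sketch below tests whether one pattern is "δ-covered" by a representative pattern. It assumes the usual formulation from the compressed frequent-pattern literature (a Jaccard-style distance over supporting transactions, plus a containment requirement); the abstract itself does not spell this out. Representing a subtree as a set of labeled edges is a simplifying assumption, since a real implementation would need a proper subtree-inclusion test, and the names `pattern_distance` and `is_delta_covered` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a subtree is approximated by its set of (parent, child) label edges.
Edge = Tuple[str, str]


def pattern_distance(support_a: Set[int], support_b: Set[int]) -> float:
    """Jaccard distance between the transaction-id sets supporting two patterns."""
    union = support_a | support_b
    if not union:
        return 0.0
    return 1.0 - len(support_a & support_b) / len(union)


def is_delta_covered(
    pattern: FrozenSet[Edge],
    pattern_support: Set[int],
    representative: FrozenSet[Edge],
    representative_support: Set[int],
    delta: float,
) -> bool:
    """True if `pattern` is contained in `representative` and the two
    patterns' supporting transactions are within distance `delta`."""
    contained = pattern <= representative  # simplified stand-in for subtree inclusion
    close = pattern_distance(pattern_support, representative_support) <= delta
    return contained and close


# Toy data: the smaller subtree occurs in trees 1-4, the larger one in trees 1-3.
small = frozenset({("a", "b")})
large = frozenset({("a", "b"), ("a", "c")})
small_support = {1, 2, 3, 4}
large_support = {1, 2, 3}

covered_loose = is_delta_covered(small, small_support, large, large_support, 0.3)
covered_tight = is_delta_covered(small, small_support, large, large_support, 0.2)
```

With this toy data the distance is 1 - 3/4 = 0.25, so the smaller subtree is covered when δ = 0.3 but not when δ = 0.2. A larger δ lets each representative stand in for more subtrees, which is the source of the compression.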
References
Wang Ke, Liu Huiqing. Schema Discovery for Semistructured Data [EB/OL]. [2008-02-10]. http://www.iscs.nus.edu.sg/~wangk/pub/kdd1.pdf/wang97schema.pdf.
Miyahara T, Shoudai T, Uchida T, et al. Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents [EB/OL].[2008-02-10]. http://www.springerlink.com/content/40b73r9aqpepp1yr/fulltext.pdf.
Zaki M J. Efficiently Mining Frequent Trees in a Forest [EB/OL]. [2008-02-10]. http://www.lans.ece.utexas.edu/course/ee380l/03sp/papers/71.pdf.
Asai T, Abe K, Kawasoe S, et al. Efficient Substructure Discovery from Large Semi-Structured Data [EB/OL]. [2008-02-10]. http://www.siam.org/meetings/sdm02/proceedings/sdm02-10.pdf.
Wang Chen, Hong Mingsheng, Wang Wei, et al. Chopper: Efficient Algorithm for Tree Mining[J]. 2004, (3): 309–319.
Zhu Yongtai, Wang Chen, Hong Mingsheng, et al. ESPM—An Algorithm to Mine Frequent Subtrees[J]. Computer Research and Development, 2004, 4(10): 1720–1726(Ch).
Pei Jian, Han Jiawei, Lu Hongjun, et al. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Database [EB/OL]. [2007-08-10]. http://www-sal.cs.uiuc.edu/~hanj/pdf/hmine01.pdf.
Pei Jian, Han Jiawei, Behzad Mortazavi-Asl, et al. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth [EB/OL]. [2007-08-10]. http://www-sal.cs.uiuc.edu/~hanj/pdf/span01.pdf.
Chi Yun, Yang Yirong, Xia Yi, et al. CMTreeMiner: Mining both Closed and Maximal Frequent Subtrees [EB/OL]. [2007-08-10]. http://springerlink.metapress.com/content/2yl0d4nn1e57478u/fulltext.pdf.
Xin Dong, Han Jiawei, Yan Xifeng, et al. Mining Compressed Frequent-Pattern Sets [EB/OL]. [2007-08-10]. http://www.cs.uiuc.edu/~hanj/pdf/vldb05.pdf.
Additional information
Foundation item: Supported by the National Natural Science Foundation of China (70371015)
Cite this article
Zhao, C., Wang, X., Sun, Z. et al. Mining compressed frequent subtrees set. Wuhan Univ. J. Nat. Sci. 14, 29–34 (2009). https://doi.org/10.1007/s11859-009-0107-y
DOI: https://doi.org/10.1007/s11859-009-0107-y