
Mining compressed frequent subtrees set

Wuhan University Journal of Natural Sciences

Abstract

The number of frequent subtrees usually grows exponentially with tree size because of combinatorial explosion. As a result, there are too many frequent subtrees for users to manage and use. To solve this problem, we generalize a compression framework based on δ-clusters to the problem of compressing frequent-subtree sets, and propose an algorithm, RPTlocal, which mines the compressed set of frequent subtrees directly. The algorithm sacrifices theoretical bounds but still achieves good compression quality. By pruning the search space and generating frequent subtrees directly, it is also efficient. Experimental results show that the number of representative subtrees mined by RPTlocal is almost two orders of magnitude smaller than the whole collection of closed subtrees, and that RPTlocal is more efficient than CMTreeMiner, an algorithm for mining both closed and maximal frequent subtrees.
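
To make the idea of δ-cluster compression concrete, the following is a minimal sketch, not the paper's RPTlocal algorithm. It assumes the usual δ-cover notion from the frequent-pattern compression framework of reference 10: a representative covers a pattern if it contains the pattern and the Jaccard distance between their supporting tree sets is at most δ. The `Pattern` class, the use of edge sets as a stand-in for a real subtree-inclusion test, and the greedy cover over already-mined patterns are all illustrative assumptions. RPTlocal, by contrast, mines the representatives directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    # Hypothetical stand-in for a subtree: a set of (parent, child) label pairs.
    # A real miner would use a proper subtree-inclusion test instead.
    edges: frozenset
    # Ids of the database trees that contain this pattern.
    tids: frozenset


def distance(a, b):
    """Jaccard distance between the supporting tree sets of two patterns."""
    union = a.tids | b.tids
    if not union:
        return 0.0
    return 1.0 - len(a.tids & b.tids) / len(union)


def delta_covers(rep, pat, delta):
    """True if `rep` contains `pat` and their support sets are within delta."""
    return pat.edges <= rep.edges and distance(rep, pat) <= delta


def greedy_representatives(patterns, delta):
    """Greedy set cover: repeatedly pick the pattern covering most uncovered ones.

    Every pattern covers itself (distance 0), so the loop always terminates.
    """
    uncovered = set(patterns)
    reps = []
    while uncovered:
        best = max(
            patterns,
            key=lambda r: sum(delta_covers(r, p, delta) for p in uncovered),
        )
        reps.append(best)
        uncovered -= {p for p in uncovered if delta_covers(best, p, delta)}
    return reps
```

A smaller δ keeps representatives closer in support to the patterns they stand for, at the cost of a larger representative set. With δ = 0, a representative can only cover contained patterns that have exactly the same supporting trees.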


References

  1. Wang Ke, Liu Huiqing. Schema Discovery for Semistructured Data [EB/OL].[2008-02-10]. http:zSzzSzwww.iscs.nus.edu.sgzSz:_wangkzSzpubzSzkdd1.pdf/wang97schema.pdf.

  2. Miyahara T, Shoudai T, Uchida T, et al. Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents[EB/OL]. [2008-02-10]. http://www.springerlink.com/content/40b73r9aqpepp1yr/fulltext.pdf.

  3. Zaki M J. Efficiently Mining Frequent Trees in a Forest[EB/OL]. [2008-02-10]. http://www.lans.ece.utexas.edu/course/ee380l/03sp/papers/71.pdf.

  4. Asai T, Abe K, Kawasoe S, et al. Efficient Substructure Discovery from Large Semi-Structured Data[EB/OL]. [2008-02-10]. http://www.siam.org/meeings/sdm02/proceedings/sdm02-10.pdf.

  5. Wang Chen, Hong Mingsheng, Wang Wei, et al. Chopper: Efficient Algorithm for Tree Mining[J]. 2004, (3): 309–319.

  6. Zhu Yongtai, Wang Chen, Hong Mingsheng, et al. ESPM: An Algorithm to Mine Frequent Subtrees[J]. Computer Research and Development, 2004, 4(10): 1720–1726 (Ch).

  7. Pei Jian, Han Jiawei, Lu Hongjun, et al. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Database[EB/OL]. [2007-08-10]. http://www-sal.cs.uiuc.edu/~hanj/pdf/hmine01.pdf.

  8. Pei Jian, Han Jiawei, Behzad Mortazavi-Asl, et al. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth[EB/OL]. [2007-08-10]. http://www-sal.cs.uiuc.edu/~hanj/pdf/span01.pdf.

  9. Chi Yun, Yang Yirong, Xia Yi, et al. CMTreeMiner: Mining both Closed and Maximal Frequent Subtrees[EB/OL]. [2007-08-10]. http://springerlink.metapress.com/content/2yl0d4nn1e57478u/fulltext.pdf.

  10. Xin Dong, Han Jiawei, Yan Xifeng, et al. Mining Compressed Frequent-Pattern Sets[EB/OL]. [2007-08-10]. http://www.cs.uiuc.edu/~hanj/pdf/vldb05.pdf.

Author information

Corresponding author

Correspondence to Chuanshen Zhao.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (70371015)

About this article

Cite this article

Zhao, C., Wang, X., Sun, Z. et al. Mining compressed frequent subtrees set. Wuhan Univ. J. Nat. Sci. 14, 29–34 (2009). https://doi.org/10.1007/s11859-009-0107-y

  • DOI: https://doi.org/10.1007/s11859-009-0107-y
