Mining rooted ordered trees under subtree homeomorphism

Haghir Chehreghani, Mostafa; Bruynooghe, Maurice

doi:10.1007/s10618-015-0439-5

Published: 19 October 2015

Volume 30, pages 1249–1272, (2016)

Data Mining and Knowledge Discovery

Mostafa Haghir Chehreghani¹ &
Maurice Bruynooghe¹

439 Accesses
13 Citations

Abstract

Mining frequent tree patterns has many applications in areas such as XML data, bioinformatics and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns from the occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant.
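To make the matching operator concrete, the following is a minimal brute-force sketch of counting occurrences of a rooted ordered pattern under subtree homeomorphism. It is not TPMiner and does not use the occ data-structure or its join operations; it only illustrates the definition. The tree encoding as `(label, children)` tuples, the helper names `flatten` and `count_occurrences`, and the reading of "ordered" (the whole subtree below the image of an earlier sibling must precede the image of a later sibling) are assumptions made for this illustration.

```python
def flatten(tree):
    """Number the vertices of a (label, children) tree in preorder.

    Returns three lists indexed by preorder number: the label, the largest
    preorder number inside the vertex's subtree (the upper bound of its
    scope), and the parent's preorder number (-1 for the root).
    """
    labels, end, parent = [], [], []

    def visit(node, par):
        label, children = node
        idx = len(labels)
        labels.append(label)
        end.append(idx)
        parent.append(par)
        for child in children:
            visit(child, idx)
        end[idx] = len(labels) - 1  # last descendant visited

    visit(tree, -1)
    return labels, end, parent


def count_occurrences(pattern, data):
    """Count mappings of `pattern` into `data` where every pattern edge
    maps onto an ancestor-descendant pair and sibling order is kept."""
    plabels, _, pparent = flatten(pattern)
    dlabels, dend, _ = flatten(data)
    n = len(plabels)

    # Closest left sibling of each pattern vertex (-1 if there is none).
    prev_sibling = [-1] * n
    last_child = {}
    for k in range(1, n):
        prev_sibling[k] = last_child.get(pparent[k], -1)
        last_child[pparent[k]] = k

    image = [0] * n  # image[k] = data vertex matched to pattern vertex k

    def extend(k):
        if k == n:
            return 1  # all pattern vertices are matched
        if k == 0:
            lo, hi = 0, len(dlabels) - 1  # the root may map anywhere
        else:
            anchor = image[pparent[k]]
            lo, hi = anchor + 1, dend[anchor]  # proper descendants only
            if prev_sibling[k] != -1:
                # Stay to the right of the left sibling's whole subtree.
                lo = max(lo, dend[image[prev_sibling[k]]] + 1)
        total = 0
        for v in range(lo, hi + 1):
            if dlabels[v] == plabels[k]:
                image[k] = v
                total += extend(k + 1)
        return total

    return extend(0)


# Data tree a(b(c), c); preorder numbers: a=0, b=1, c=2, c=3.
data = ("a", [("b", [("c", [])]), ("c", [])])
print(count_occurrences(("a", [("c", [])]), data))                # 2
print(count_occurrences(("a", [("b", []), ("c", [])]), data))     # 1
print(count_occurrences(("a", [("c", []), ("b", [])]), data))     # 0
```

The pattern a(c) has two occurrences here because the edge a-c may map onto the path a-b-c as well as onto the direct edge a-c; a matcher that required parent-child edges would find only one. The exhaustive search is exponential in the worst case, which is the cost that a compact encoding of occurrences is meant to avoid.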

Notes

The upper bound of the scope of the last vertex is already available in scope; for convenience of presentation, the information is duplicated in RP.

Acknowledgments

We are grateful to Professor Mohammed Javeed Zaki for providing the VTreeMiner code, the CSLOGS datasets and the TreeGenerator program, to Dr Henry Tan for providing the MB3Miner code, to Dr Fedja Hadzic for providing the Prions dataset and to Professor Jun-Hong Cui for providing the NASA dataset. Finally, we would like to thank Dr Morteza Haghir Chehreghani for his discussion and suggestions.

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, 3001, Leuven, Belgium
Mostafa Haghir Chehreghani & Maurice Bruynooghe

Corresponding author

Correspondence to Mostafa Haghir Chehreghani.

Additional information

Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge and Concha Bielza.

About this article

Cite this article

Haghir Chehreghani, M., Bruynooghe, M. Mining rooted ordered trees under subtree homeomorphism. Data Min Knowl Disc 30, 1249–1272 (2016). https://doi.org/10.1007/s10618-015-0439-5

Received: 23 April 2015
Accepted: 02 October 2015
Published: 19 October 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10618-015-0439-5
