Tree Mining

Nijssen, Siegfried

doi:10.1007/978-0-387-30164-8_851


Definition

Tree mining is an instance of constraint-based pattern mining that studies the discovery of tree patterns in data represented as a single tree structure or as a set of tree structures. Minimum frequency is the most studied constraint.
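To make the minimum frequency constraint concrete, the following minimal sketch counts in how many trees of a small database a pattern occurs and compares that count to a threshold. Several choices here are assumptions made for illustration and are not taken from the entry: trees are rooted, ordered, and labeled; they are encoded as `(label, [children])` tuples; a pattern occurs when its labels match a node and its children match a left-to-right subsequence of that node's children; and the names `support`, `is_frequent`, and `min_support` are hypothetical. Other tree types and occurrence relations would need a different matching routine.

```python
from typing import List, Tuple

# Illustrative encoding (an assumption): a rooted, ordered, labeled tree
# is a pair (label, [child trees]).
Tree = Tuple[str, list]


def matches_at(pattern: Tree, node: Tree) -> bool:
    """True if the pattern's root maps to this node with equal labels and the
    pattern's children match a left-to-right subsequence of the node's children."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    j = 0
    for p_child in p_children:
        # Greedily take the earliest remaining child that matches.
        while j < len(n_children) and not matches_at(p_child, n_children[j]):
            j += 1
        if j == len(n_children):
            return False
        j += 1
    return True


def occurs_in(pattern: Tree, tree: Tree) -> bool:
    """True if the pattern matches at the root or at any descendant."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])


def support(pattern: Tree, database: List[Tree]) -> int:
    """Number of database trees that contain at least one occurrence."""
    return sum(1 for tree in database if occurs_in(pattern, tree))


def is_frequent(pattern: Tree, database: List[Tree], min_support: int) -> bool:
    """Minimum frequency constraint: support must reach the threshold."""
    return support(pattern, database) >= min_support


# Toy database of parse-tree-like structures (made-up data).
database: List[Tree] = [
    ("S", [("NP", [("DT", []), ("NN", [])]),
           ("VP", [("VB", []), ("NP", [("NN", [])])])]),
    ("S", [("NP", [("NN", [])]), ("VP", [("VB", [])])]),
    ("NP", [("DT", []), ("NN", [])]),
]

noun_phrase: Tree = ("NP", [("NN", [])])
sentence: Tree = ("S", [("NP", []), ("VP", [])])

print(support(noun_phrase, database))      # trees containing NP -> NN
print(is_frequent(sentence, database, 3))  # does S -> NP VP reach the threshold?
```

A practical tree miner would not test candidate patterns one by one like this; it would enumerate candidates systematically and prune those that cannot be frequent. The sketch only illustrates what the constraint checks.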

Motivation and Background

Tree mining is motivated by the availability of many types of data that can be represented as tree structures. Tree types vary widely and include ordered trees, unordered trees, rooted trees, unrooted (free) trees, labeled trees, unlabeled trees, and binary trees; each of these has its own application areas. One example is treebanks, which store sentences annotated with parse trees. In such data, it is of interest to find not only commonly occurring sets of words (for which frequent itemset miners could be used) but also commonly occurring parses of these words. Tree miners aim at finding patterns in this structured information. The patterns can be interesting in their own right,...


References

Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., & Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In Proceedings of the second SIAM international conference on data mining (pp. 158–174). SIAM.
Berka, P. (1999). Workshop notes on discovery challenge PKDD-99 (Tech. Rep.). Prague, Czech Republic: University of Economics.
Chalmers, R., & Almeroth, K. (2003). On the topology of multicast trees. In IEEE/ACM transactions on networking (Vol. 11, pp. 153–165). IEEE Press/ACM Press.
Chi, Y., Nijssen, S., Muntz, R. R., & Kok, J. N. (2005). Frequent subtree mining - An overview. In Fundamenta Informaticae (Vol. 66, pp. 161–198). IOS Press.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. In Computational linguistics (Vol. 19, pp. 313–330). MIT Press.
Morell, V. (1996). TreeBASE: The roots of phylogeny. In Science (Vol. 273, p. 569).
Punin, J., Krishnamoorthy, M., & Zaki, M. J. (2002). LOGML - log markup language for web usage mining. In WEBKDD 2001 - mining web log data across all customers touch points. Third international workshop. Lecture notes in artificial intelligence (Vol. 2356, pp. 88–112). Springer.
Sekine, S. (1998). Corpus-based parsing and sublanguages studies. Ph.D. dissertation. New York University, New York.
Wang, K., & Liu, H. (1998). Discovering typical structures of documents: A road map approach. In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval (pp. 146–154). ACM Press.
Zaki, M. J. (2002). Efficiently mining frequent trees in a forest. In Proceedings of the 8th international conference on knowledge discovery and data mining (KDD) (pp. 71–80). ACM Press.
Zhang, S., & Wang, J. (2005). Frequent agreement subtree mining. http://aria.njit.edu/mediadb/fast/.


Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2052
Claude Sammut
Faculty of Information Technology, Clayton School of Information Technology, Monash University, P.O. Box 63, Victoria, Australia, 3800
Geoffrey I. Webb


Nijssen, S. (2011). Tree Mining. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_851


DOI: https://doi.org/10.1007/978-0-387-30164-8_851
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
