Definition
Tree mining is an instance of constraint-based pattern mining that studies the discovery of tree patterns in data represented as a single tree or as a set of trees. Minimum frequency is the most studied constraint.
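To make the minimum frequency constraint concrete, the following is a minimal sketch, not a full tree miner. It assumes rooted, ordered, labeled trees encoded as `(label, [children])` tuples, and it considers only the smallest non-trivial patterns, single parent-child edges. The function names `edges` and `frequent_edges`, the parameter `min_support`, and the toy data are all hypothetical and chosen for illustration only.

```python
from collections import Counter

# Assumed encoding (hypothetical): a tree is (label, [child_tree, ...]).

def edges(tree):
    """Yield every (parent_label, child_label) pair found in the tree."""
    label, children = tree
    for child in children:
        yield (label, child[0])
        yield from edges(child)

def frequent_edges(trees, min_support):
    """Return the edge patterns that occur in at least min_support trees."""
    support = Counter()
    for tree in trees:
        # A pattern counts once per tree, however often it repeats inside it.
        support.update(set(edges(tree)))
    return {pattern: n for pattern, n in support.items() if n >= min_support}

# Toy parse trees, invented for this example.
treebank = [
    ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VB", [])])]),
    ("S", [("NP", [("NN", [])]), ("VP", [("VB", []), ("NP", [("NN", [])])])]),
    ("S", [("VP", [("VB", [])])]),
]

result = frequent_edges(treebank, min_support=2)
```

With `min_support=2`, the edges `("NP", "DT")` and `("VP", "NP")` are discarded because each appears in only one tree. Practical tree miners generalize this idea to larger patterns and to the other tree types, which requires considerably more careful enumeration and matching than this sketch shows.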
Motivation and Background
Tree mining is motivated by the availability of many types of data that can be represented as tree structures. Tree types vary widely: ordered trees, unordered trees, rooted trees, unrooted (free) trees, labeled trees, unlabeled trees, and binary trees, each with its own application areas. One example is the parse trees in treebanks, which store sentences annotated with their parses. In such data, it is of interest to find not only commonly occurring sets of words (for which frequent itemset miners could be used), but also commonly occurring parses of these words. Tree miners aim to find patterns in this structured information. The patterns can be interesting in their own right,...
© 2011 Springer Science+Business Media, LLC
Nijssen, S. (2011). Tree Mining. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_851
DOI: https://doi.org/10.1007/978-0-387-30164-8_851
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering