Tree mining is an instance of constraint-based pattern mining that studies the discovery of tree patterns in data represented as a tree structure or as a set of tree structures. Minimum frequency is the most studied constraint.
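To make the minimum frequency constraint concrete, the following minimal sketch counts in how many trees of a small database a given pattern occurs, and checks that count against a threshold. Everything here is an illustrative assumption rather than part of the entry: the tuple-based tree representation, the names `T`, `matches_at`, `occurs_in`, `support`, and `is_frequent`, and the choice of matching notion (rooted, ordered, labeled trees, where a pattern node's children must map in order onto a subsequence of the matched node's children). Other tree types and occurrence definitions would need different matching code.

```python
def T(label, *children):
    """Build a rooted, ordered, labeled tree as (label, children)."""
    return (label, children)


def matches_at(pattern, node):
    """True if the pattern occurs with its root mapped onto this node.

    Pattern children must match, in order, a subsequence of the node's
    children. Greedy left-to-right matching is enough for this check.
    """
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    i = 0  # index of the next pattern child still to be matched
    for child in n_children:
        if i < len(p_children) and matches_at(p_children[i], child):
            i += 1
    return i == len(p_children)


def occurs_in(pattern, tree):
    """True if the pattern occurs anywhere in the tree."""
    if matches_at(pattern, tree):
        return True
    return any(occurs_in(pattern, child) for child in tree[1])


def support(pattern, database):
    """Number of trees in the database that contain the pattern."""
    return sum(1 for tree in database if occurs_in(pattern, tree))


def is_frequent(pattern, database, min_support):
    """Minimum frequency constraint: support must reach the threshold."""
    return support(pattern, database) >= min_support


# Toy database of parse-tree-like structures (hypothetical data).
database = [
    T("S", T("NP", T("DT"), T("NN")), T("VP", T("VB"), T("NP", T("NN")))),
    T("S", T("NP", T("NN")), T("VP", T("VB"))),
    T("NP", T("DT"), T("NN")),
]

sentence_pattern = T("S", T("NP"), T("VP"))
noun_phrase_pattern = T("NP", T("NN"))

print(support(sentence_pattern, database))
print(is_frequent(noun_phrase_pattern, database, min_support=3))
```

The sketch only evaluates a pattern that is already given. An actual tree miner would also enumerate candidate patterns, which this example does not attempt.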
Motivation and Background
Tree mining is motivated by the availability of many types of data that can be represented as tree structures. Tree types vary widely and include ordered trees, unordered trees, rooted trees, unrooted (free) trees, labeled trees, unlabeled trees, and binary trees; each of these has its own application areas. One example is the trees in tree banks, which store sentences annotated with parse trees. In such data, it is of interest to find not only commonly occurring sets of words (for which frequent itemset miners could be used), but also commonly occurring parses of these words. Tree miners aim at finding patterns in this structured information. The patterns can be interesting in their own right,...