Learning Stochastic Tree Edit Distance
Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, or the conversion of tree-structured documents. In this context, many applications require computing similarities between pairs of trees. The most studied distance is probably the tree edit distance (ED), for which algorithms of improved complexity have been proposed during the last decade. However, the classic ED usually relies on edit costs fixed a priori, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on learning a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs. We carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.
Keywords: Stochastic tree edit distance, EM algorithm, generative models, discriminative models