Learning Stochastic Tree Edit Distance

  • Marc Bernard
  • Amaury Habrard
  • Marc Sebban
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, or the conversion of tree-structured documents. In this context, many applications require computing similarities between pairs of trees. The most studied distance is probably the tree edit distance (ED), whose algorithmic complexity has been improved during the last decade. However, the classic ED usually relies on edit costs fixed a priori, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on learning a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs. We carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.
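
The abstract contrasts edit costs fixed a priori with learned stochastic costs. The following is a minimal Python sketch of that contrast under stated assumptions: trees are hypothetical `(label, children)` tuples, the function name `tree_edit_score` is illustrative, and the edit model is Selkow's (1977) restricted tree edit distance, in which deleting or inserting a node deletes or inserts its whole subtree. That model is chosen only because it fits a short dynamic program; it is not necessarily the model learned in the paper, and it is simpler than the general Zhang-Shasha edit distance. The same recursion runs in two semirings: min-plus with costs gives a classic distance, while sum-product with per-operation probabilities sums the weight of all edit scripts, from which a distance such as -log p can be derived. The probabilities below are made up for illustration, and the sketch omits both the normalization constraints of a true stochastic model and the EM re-estimation step.

```python
import math
import operator
from functools import lru_cache, reduce

# Hypothetical tree encoding: (label, children), where children is a tuple of trees.


def tree_edit_score(t1, t2, sub, dele, ins, plus, times, one):
    """Selkow-style edit score between two ordered, labeled trees.

    Deleting (inserting) a node removes (adds) its whole subtree, and the
    two roots are always aligned by a substitution. The semiring decides
    what the score means:
      plus=min, times=operator.add, one=0.0  -> minimum total edit cost
      plus=sum, times=operator.mul, one=1.0  -> summed weight of all edit scripts
    """

    def prod(values):
        # Semiring product of an iterable; returns `one` when it is empty.
        return reduce(times, values, one)

    @lru_cache(maxsize=None)
    def del_tree(t):
        # Weight of deleting t's root label and, recursively, all its descendants.
        return times(dele(t[0]), prod(del_tree(k) for k in t[1]))

    @lru_cache(maxsize=None)
    def ins_tree(t):
        # Weight of inserting t's root label and, recursively, all its descendants.
        return times(ins(t[0]), prod(ins_tree(k) for k in t[1]))

    @lru_cache(maxsize=None)
    def tree(a, b):
        # Roots are substituted, then the two child forests are aligned.
        return times(sub(a[0], b[0]), forest(a[1], b[1]))

    def forest(f1, f2):
        # String-edit-style DP over the two ordered child sequences.
        n, m = len(f1), len(f2)
        d = [[one] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = times(d[i - 1][0], del_tree(f1[i - 1]))
        for j in range(1, m + 1):
            d[0][j] = times(d[0][j - 1], ins_tree(f2[j - 1]))
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = plus([
                    times(d[i - 1][j], del_tree(f1[i - 1])),
                    times(d[i][j - 1], ins_tree(f2[j - 1])),
                    times(d[i - 1][j - 1], tree(f1[i - 1], f2[j - 1])),
                ])
        return d[n][m]

    return tree(t1, t2)


# Usage: the tree a(b, c) versus the tree a(b).
t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))

# Classic ED with unit costs fixed a priori.
cost = tree_edit_score(
    t1, t2,
    sub=lambda x, y: 0.0 if x == y else 1.0,
    dele=lambda x: 1.0, ins=lambda y: 1.0,
    plus=min, times=operator.add, one=0.0,
)
print(cost)  # -> 1.0 (delete the leaf c)

# Stochastic variant with illustrative, made-up operation probabilities.
p = tree_edit_score(
    t1, t2,
    sub=lambda x, y: 0.5 if x == y else 0.1,
    dele=lambda x: 0.2, ins=lambda y: 0.2,
    plus=sum, times=operator.mul, one=1.0,
)
print(p, -math.log(p))
```

In an EM setting such as the one the abstract describes, one would, roughly, accumulate the expected number of times each primitive operation is used under the sum-product scores and renormalize those counts into new probabilities; that step is not shown here.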


Keywords: stochastic tree edit distance, EM algorithm, generative models, discriminative models

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marc Bernard (1)
  • Amaury Habrard (2)
  • Marc Sebban (1)

  1. EURISE, Université Jean Monnet de Saint-Etienne, Saint-Etienne, France
  2. LIF, Université de Provence, Marseille, France
