A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4201)

Abstract

Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed by using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a form of context-sensitive edit distance.
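As an illustration of the edit distance the abstract refers to, the following is a minimal sketch of the classical dynamic-programming computation with configurable costs for the three primitive operations. It is not the learning algorithm of the paper: the function name `edit_distance`, the cost callbacks `ins_cost`, `del_cost` and `sub_cost`, and the example cost values are assumptions made here for illustration, and the costs are supplied by hand rather than learned.

```python
from typing import Callable


def edit_distance(
    x: str,
    y: str,
    ins_cost: Callable[[str], float] = lambda b: 1.0,
    del_cost: Callable[[str], float] = lambda a: 1.0,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Minimum total cost of turning x into y with insertions,
    deletions and substitutions (hypothetical helper, costs hand-set)."""
    m, n = len(x), len(y)
    # d[i][j] holds the cheapest way to turn x[:i] into y[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(x[i - 1]),               # delete x[i-1]
                d[i][j - 1] + ins_cost(y[j - 1]),               # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
    return d[m][n]


def cheap_purine_swap(a: str, b: str) -> float:
    # Illustrative hand-tuned cost (an assumption): A<->G swaps cost 0.5.
    if a == b:
        return 0.0
    return 0.5 if {a, b} == {"A", "G"} else 1.0


unit = edit_distance("kitten", "sitting")
weighted = edit_distance("GATT", "AATT", sub_cost=cheap_purine_swap)
```

With the default unit costs this is the Levenshtein distance; passing a different `sub_cost` shows how a hand-tuned cost table changes the result, which is the tuning step the paper proposes to replace by learning.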

This work was supported in part by the IST Programme of the European Community, under the Pascal Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bernard, M., Janodet, JC., Sebban, M. (2006). A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_20

  • DOI: https://doi.org/10.1007/11872436_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45264-5

  • Online ISBN: 978-3-540-45265-2

  • eBook Packages: Computer Science, Computer Science (R0)
