A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer

Bernard, Marc; Janodet, Jean-Christophe; Sebban, Marc

doi:10.1007/11872436_20


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4201)

Included in the following conference series:

International Colloquium on Grammatical Inference

546 Accesses
4 Citations

Abstract

Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. The learned transducers are neither deterministic nor stochastic in the standard terminology; moreover, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a form of context-sensitive edit distance.
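The contrast the abstract draws between hand-tuned edit costs and learned edit probabilities can be illustrated with a minimal Python sketch. It is not the authors' conditional transducer. The first function is the classic Wagner-Fischer dynamic program with caller-supplied costs, taking a minimum over edit scripts. The second is a memoryless, joint stochastic edit model in the spirit of Ristad and Yianilos, whose forward pass instead sums the probabilities of all edit scripts. The parameter tables (`p_ins`, `p_del`, `p_sub`, `p_end`) and their toy values are hypothetical; learning them (e.g. by EM) and making them conditional and context-dependent, as the paper does, is not shown.

```python
import math
from typing import Callable, Dict, Tuple


def weighted_edit_distance(x: str, y: str,
                           ins: Callable[[str], float],
                           dele: Callable[[str], float],
                           sub: Callable[[str, str], float]) -> float:
    """Wagner-Fischer DP: minimum total cost of turning x into y,
    using arbitrary (e.g. hand-tuned) per-operation costs."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete every symbol of x[:i]
        d[i][0] = d[i - 1][0] + dele(x[i - 1])
    for j in range(1, m + 1):          # insert every symbol of y[:j]
        d[0][j] = d[0][j - 1] + ins(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + dele(x[i - 1]),
                          d[i][j - 1] + ins(y[j - 1]),
                          d[i - 1][j - 1] + sub(x[i - 1], y[j - 1]))
    return d[n][m]


def stochastic_edit_probability(x: str, y: str,
                                p_ins: Dict[str, float],
                                p_del: Dict[str, float],
                                p_sub: Dict[Tuple[str, str], float],
                                p_end: float) -> float:
    """Forward algorithm for a hypothetical memoryless edit model:
    sum, over all edit scripts turning x into y, of the product of
    the operation probabilities, times a termination probability."""
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:                  # last operation deletes x[i-1]
                total += alpha[i - 1][j] * p_del.get(x[i - 1], 0.0)
            if j > 0:                  # last operation inserts y[j-1]
                total += alpha[i][j - 1] * p_ins.get(y[j - 1], 0.0)
            if i > 0 and j > 0:        # last operation substitutes (or matches)
                total += alpha[i - 1][j - 1] * p_sub.get((x[i - 1], y[j - 1]), 0.0)
            alpha[i][j] = total
    return alpha[n][m] * p_end


# Usage: unit costs recover the plain Levenshtein distance.
unit_cost = lambda _c: 1.0
unit_sub = lambda a, b: 0.0 if a == b else 1.0
print(weighted_edit_distance("kitten", "sitting", unit_cost, unit_cost, unit_sub))  # 3.0

# Toy, hand-picked probabilities (hypothetical, not learned).
p = stochastic_edit_probability("a", "a", {"a": 0.1}, {"a": 0.1}, {("a", "a"): 0.5}, 0.2)
print(-math.log(p))  # a distance-like score derived from the probability
```

Taking -log of the stochastic probability yields a distance-like score; unlike the minimum-cost program, it accounts for every alignment of the two strings rather than only the cheapest one.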

This work was supported in part by the IST Programme of the European Community, under the Pascal Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.



Author information

Authors and Affiliations

EURISE, Université Jean Monnet de Saint-Etienne, 23, rue Paul Michelon, 42023, Saint-Etienne, France
Marc Bernard, Jean-Christophe Janodet & Marc Sebban


Editor information

Editors and Affiliations

Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, 223-8522, Yokohama, Japan
Yasubumi Sakakibara
Dept. of Computer Science, Kyoto Sangyo University, Kamigamo Motoyama, Kita-ku, Kyoto, Japan
Satoshi Kobayashi
Japan Biological Informatics Consortium, 10F TIME24 Building, 2-45 Aomi, Koto-ku, 135-8073, Tokyo, Japan
Kengo Sato
Department of Information and Communication Engineering, Graduate School of Electro-Communications, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, 182-8585, Tokyo, Japan
Tetsuro Nishino
Department of Information and Communication Engineering, Faculty of Electro-Communications, The University of Electro-Communications, Chofugaoka 1–5–1, Chofu, 182-8585, Tokyo, Japan
Etsuji Tomita


About this paper

Cite this paper

Bernard, M., Janodet, JC., Sebban, M. (2006). A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_20


DOI: https://doi.org/10.1007/11872436_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science; Computer Science (R0)
