1 Introduction

A proportional analogy is a relationship between four objects, denoted \([{{x} : {y}\,{: :}\,{z} : {t}}]\), which reads as “\(x\) is to \(y\) as \(z\) is to \(t\)”. While some studies have been devoted to handling semantic relationships [1, 2], we focus in this work on formal proportional analogies (hereafter formal analogies or simply analogies), that is, proportional analogies involving relationships at the formal level, such as [miracle : miraculous : : fable : fabulous]. This is different from the analogies considered by computational models of analogy-making such as Copycat [3] and Metacat [4, 5]. These systems learn to solve letter-string puzzles such as [\(abc : abd \,{: :}\,mrrjj :\,?\)]. In these studies, letters are nothing more than abstractions with a few meaningful relations (i.e., successorship, predecessorship and sameness) with which the system reasons.

Early work on formal analogies for Natural Language Processing (NLP) was devoted to proposing computational definitions of proportional analogies. Yvon [6] proposed a definition where a prefixation or a suffixation operation was allowed between forms. In [7], Lepage proposed a richer model allowing prefixation, suffixation, and infixation operations at the same time. His model is characterized in terms of an edit-distance constraint that four entities in (formal) analogical relation must satisfy. Later on, Yvon et al. [8, 9] proposed a model of analogy which generalizes the model of [7] thanks to finite-state machines. In particular, this model can account for inversions (i.e. Paul gave an apple to Mary is to Mary received an apple from Paul as Paul gave a letter to Mary is to Mary received a letter from Paul). Stroppa and Yvon [10] further extended this model to various algebraic structures, notably trees, which are ubiquitous in NLP. Also, Miclet et al. [11] built on the definition of [9] and defined the notion of analogical dissimilarity on forms. Presumably, allowing near analogies might be of interest in several AI applications. An extension of analogical dissimilarity to tree structures has recently been proposed in [12].

Another thread of studies is devoted to applications of analogical learning to NLP tasks. Lepage [7] proposed early on an analogical model of parsing which uses a treebank (a database of syntactically analyzed sentences), and conducted proof-of-concept experiments. Yvon [6] addressed the task of grapheme-to-phoneme conversion, a problem which continues to be studied thoroughly (e.g. [13]). In [14], the authors address the task of identifying morphologically related word forms in a lexicon, the main task of the MorphoChallenge evaluation campaign [15]. Their approach, which capitalizes on formal analogy to learn relations between words, proved competitive with state-of-the-art approaches (e.g. [16]) and ranked first on the Finnish language according to the EMMA metric (see [17]), which has been the official metric since MorphoChallenge 2010. Stroppa and Yvon [18] applied analogical learning to the computation of morphosyntactic properties associated with a word form (lemma, part-of-speech, and additional features such as number, gender, case, tense, mood, etc.). The performance of the analogical learner on the Dutch language was as good as or better than the one reported in [19].

Lepage and Denoual [20] pioneered the application of analogical learning to Machine Translation. Different variants of the system they proposed have been tested in a number of evaluation campaigns (see for instance [21]). Langlais and Patry [22] investigated the more specific task of translating unknown words, a problem simultaneously studied in [23]. In [24], the authors applied analogical learning to translating terms of the medical domain in different language directions, including some that do not share the same scripts (e.g. Russian/English). The precision of the analogical engine was higher than that of a state-of-the-art phrase-based statistical engine [25] trained at the character level, but its recall was lower. A simple combination of the two systems significantly outperformed both engines. See [26] for a technical discussion of those works. Very recently, Gosme and Lepage [27] studied the use of formal analogy for smoothing n-gram language models.

Analogical learning has also been applied to various other purposes, among which terminology management [28], query expansion in Information Retrieval [29], classification of nominal and binary data, handwritten character recognition [11], as well as solving Raven IQ tests [30]. All these studies attest that analogical learning based on formal analogies can lead to state-of-the-art performance in a number of applications. Still, it encompasses a number of issues that seriously hinder its widespread use in NLP [26]. This motivates the present chapter.

In the remainder of this chapter, we first describe in Sect. 2 the principles behind analogical learning. Then, we provide in Sect. 3 an account of the issues involved in deploying analogical inference over strings in typical NLP tasks. Section 4 reports our experiments in applying analogical learning to the task of transliterating English proper names into Chinese. In Sect. 5, we conclude our work and identify a number of avenues that deserve further investigation.

2 Principles

In order to understand the methodology, we first clarify the process of analogical learning. Let \(\mathcal{L} = \{\left( i(x_k),o(x_k)\right) _k\}\) be a training set gathering pairs of input \(i(x_k)\) and output \(o(x_k)\) representations for instances \(x_k\). We call input set, denoted \(\mathcal{I} = \{ i(x_k) \}_{k}\), the set of input-space representations in the training set. Given an element \(t\) for which only \(i(t)\) (or alternatively \(o(t)\)) is known, analogical learning works by:

  1.

    building \(\mathcal{E}_{i}(t) = \{(x,y,z) \in \mathcal{L}^{3} \,\,|\,\, [{{i(x)} : {i(y)}\,{: :}\,{i(z)} : {i(t)}}]\}\), the set of triplets in the training set that stand in analogical proportion with \(t\) in the input space,

  2.

    building \(\mathcal{E}_{o}(t) = \{ u \,\,|\,\, [{{o(x)} : {o(y)}\,{: :}\,{o(z)} : {u}}] \text{ and } (x,y,z) \in \mathcal{E}_{i}(t)\}\), the set of solutions to the analogical equations obtained in the output space,

  3.

    aggregating the solutions in \(\mathcal{E}_{o}(t)\) in order to select \(o(t)\).

In this description, \([{{x} : {y}\,{: :}\,{z} : {t}}]\) is our notation for a (formal) proportional analogyFootnote 1; and \([{{x} : {y}\,{: :}\,{z} : {?}}]\) is called an analogical equation and represents the set of its solutions. In the sequel, we call \(x\)-form, \(y\)-form, \(z\)-form and \(t\)-form the first, second, third and fourth forms respectively of \([{{x} : {y}\,{: :}\,{z} : {t}}]\). Also, we sometimes refer to the first two steps of the inference procedure as the generator, and to the third one as the aggregator.

Let us illustrate this on a tiny example where the task is to associate a sequence of part-of-speech (POS) tags to any given sentence, viewed as a sequence of words. Let \(\mathcal{L} = \{\)(he loves her, prp vbz prp), (she loved him, prp vbd prp), (he smiles at her, prp vbz in prp)\(\}\) be our training set which maps sequences of words (input) to sequences of POS tags (output). Tagging a (new) sentence such as she smiled at him, involves: (i) identifying analogies in the input space: [he loves her : she loved him \(: :\) he smiles at her : she smiled at him] would be found, (ii) solving the corresponding equations in the output space:[prp vbz prp : prp vbd prp \(: :\) prp vbz in prp : ?] would be solved, and (iii) selecting the solution. Here, prp vbd in prp would be the only solution produced.
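The three steps can be sketched on this toy example. The code below is our own brute-force illustration (not the scalable machinery discussed in Sect. 3): it treats sentences and tag sequences as plain strings, decides whether an analogy holds via the shuffle/complementation characterization given later in Sect. 3.3 (Theorem 3.1), and aggregates solutions by their generation frequency. All function names are ours.

```python
from collections import Counter
from functools import lru_cache
from itertools import product

def is_analogy(x, y, z, t):
    """True iff [x : y :: z : t]; uses t ∈ (y ∘ z) \\ x (cf. Theorem 3.1, Sect. 3.3)."""
    if len(x) + len(t) != len(y) + len(z):
        return False                      # necessary length condition
    @lru_cache(maxsize=None)
    def ok(j, k, l):
        i = j + k - l                     # number of symbols of x cancelled so far
        if j == len(y) and k == len(z):
            return i == len(x) and l == len(t)
        if j < len(y):
            if i < len(x) and y[j] == x[i] and ok(j + 1, k, l):
                return True               # y[j] cancels the next symbol of x
            if l < len(t) and y[j] == t[l] and ok(j + 1, k, l + 1):
                return True               # y[j] is emitted as the next symbol of t
        if k < len(z):
            if i < len(x) and z[k] == x[i] and ok(j, k + 1, l):
                return True
            if l < len(t) and z[k] == t[l] and ok(j, k + 1, l + 1):
                return True
        return False
    return ok(0, 0, 0)

def solve_equation(x, y, z):
    """All solutions of [x : y :: z : ?], by memoized enumeration of (y ∘ z) \\ x.
    Keep the strings short: the solution set can be huge."""
    @lru_cache(maxsize=None)
    def rec(i, j, k):
        if j == len(y) and k == len(z):
            return frozenset({""}) if i == len(x) else frozenset()
        out = set()
        if j < len(y):
            if i < len(x) and y[j] == x[i]:
                out |= rec(i + 1, j + 1, k)
            out |= {y[j] + r for r in rec(i, j + 1, k)}
        if k < len(z):
            if i < len(x) and z[k] == x[i]:
                out |= rec(i + 1, j, k + 1)
            out |= {z[k] + r for r in rec(i, j, k + 1)}
        return frozenset(out)
    return rec(0, 0, 0)

def infer(train, t_in):
    """Brute-force analogical inference; returns a Counter over candidate outputs."""
    agg = Counter()
    for (xi, xo), (yi, yo), (zi, zo) in product(train, repeat=3):
        if is_analogy(xi, yi, zi, t_in):            # step 1: input analogies
            agg.update(solve_equation(xo, yo, zo))  # step 2: output equations
    return agg                                      # step 3: e.g. agg.most_common(1)

train = [("he loves her", "prp vbz prp"),
         ("she loved him", "prp vbd prp"),
         ("he smiles at her", "prp vbz in prp")]
agg = infer(train, "she smiled at him")
```

On this toy training set, the counter contains prp vbd in prp, but the exhaustive enumeration also yields spurious character-level solutions, which is precisely the aggregation issue discussed in Sect. 3.4.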

There are three important aspects to consider when deploying the above learning procedure. First, the search stage (step-1) has a time complexity which is prohibitive in most applications of interest (cubic in the size of \(\mathcal{I}\)). Second, the aggregation (step-3) of the possibly numerous spuriousFootnote 2 solutions produced during step-2 is difficult. Last, it might happen that the overall approach does not produce any solution at all, simply because no input analogy is identified during step-1, or because the input analogies identified do not lead to analogies in the output space (failure of the inductive bias).

3 Issues with Analogical Learning

From now on, we focus our discussion on the case of formal analogies over strings. Section 5 expands the discussion to (formal) analogies over other structures, such as trees.

3.1 Formal Analogy

We mentioned that several definitions of formal analogy have been proposed. Two of them stand out, in the sense that they can account for a larger variety of relations than the others: the proposal of [7] and the one introduced in [8], then generalized in [9]. The latter, which encompasses the former, is defined in terms of \(d\)-factorization. A \(d\)-factorization of a string \(x\) over an alphabet \(\varSigma \), noted \(f_{x}\), is a sequence of \(d\) factors \(f_x \equiv (f_{x}^{1},\dots ,f_{x}^{d})\), where \(f_x^i \in \varSigma ^*\) for all \(i\), and such that \(f_x^{1} \odot f_x^{2} \odot \dots \odot f_x^{d} \equiv x\); \(\odot \) denotes the concatenation operator. In [9], the authors define an analogy as follows.

Definition: \(\forall \, x, y, z\) and \(t \in \varSigma ^{\star }\), \([{{x} : {y}\,{: :}\,{z} : {t}}]\) iff there exists a 4-tuple of \(d\)-factorizations \((f_x, f_y, f_z, f_t)\) of \(x\), \(y\), \(z\) and \(t\) respectively, such that \(\forall i \in [1,d]\), \((f_y^{i},f_z^{i}) \in \left\{ (f_x^{i},f_t^{i}),(f_t^{i},f_x^{i})\right\} \). The smallest \(d\) for which this definition holds is called the degree of the analogy.

For instance, \([\) This guy drinks too much \(:\) This boat sinks \(: :\) These guys drank too much \(:\) These boats sank \(]\) is true, because of the following 4-tuple of 6-factorizations, whose factors are aligned column-wise for clarity, and where spaces (underlined) are treated as regular characters:

$$\begin{array}{llllllll} f_{x} & \equiv & (\text{This}, & \text{\_guy}, & \epsilon , & \text{\_dr}, & \text{inks}, & \text{\_too\_much}) \\ f_{y} & \equiv & (\text{This}, & \text{\_boat\_}, & \epsilon , & \text{s}, & \text{inks}, & \epsilon ) \\ f_{z} & \equiv & (\text{These}, & \text{\_guy}, & \text{s}, & \text{\_dr}, & \text{ank}, & \text{\_too\_much}) \\ f_{t} & \equiv & (\text{These}, & \text{\_boat\_}, & \text{s}, & \text{s}, & \text{ank}, & \epsilon ) \\ \end{array}$$

There is no 4-tuple of \(d\)-factorizations with \(d\) smaller than 6 that verifies the definition. Therefore, the degree of this analogy is 6. Note that there are many 4-tuples of 6-factorizations that verify the definition, as well as many 4-tuples of \(d\)-factorizations for \(d\) greater than 6.

Although the choice of the definition to work with has some practical impact (the more general the definition, the more complex the machinery needed to recognize an analogy), we will use the definition given above. Still, the discussion in this chapter generally applies to all sensible definitions we are aware of.

3.2 Searching in the Input Space

Identifying analogies in the input space (step-1) has a cubic complexity with respect to the size of \(\mathcal{I}\). Clearly, a brute-force approach would be manageable for toy problems only. This explains why several authors have worked out alternative strategies, as briefly discussed in this section. We refer the reader to [31] for a comparison of those strategies.

A Quadratic Search Procedure The search for input analogies can be reduced to solving a quadratic number of analogical equations [20] thanks to the symmetry property of analogical relations (\([{{x} : {y}\,{: :}\,{z} : {t}}] \Leftrightarrow [{{y} : {x}\,{: :}\,{t} : {z}}]\)). Unfortunately, this solution barely scales to sets of a few thousand representatives and is not practical: performing analogies on words for realistic NLP applications typically requires working with vocabularies in the order of \(10^5\) words. A possible workaround is to use sampling techniques.

More precisely, when performing inference for \(t\), we solve analogical equations \([{{y} : {x}\,{: :}\,{i(t)} : {?}}]\) only for some pairs \(\langle x,y\rangle \) belonging to the neighborhood of \(i(t)\). Solutions belonging to the input space are the \(z\)-forms we are looking for. This strategy reduces the search procedure to the resolution of a number of analogical equations which grows quadratically with the size of the neighbor set \(\mathcal{N}\).

For instance, in [22] the authors deal with an input space in the order of tens of thousands of forms by sampling \(x\) and \(y\) among the closest neighbors of \(t\), where neighborhoods are defined in terms of the standard edit-distance.
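As a rough sketch of this neighborhood-based sampling (function names and parameter choices are ours, not those of [22]), one can rank input forms by edit distance to \(t\) and pair up the closest ones:

```python
def edit_distance(a, b):
    """Standard Levenshtein distance, O(|a||b|) time, O(|b|) space."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]

def neighborhood(t, input_space, eta):
    """The eta forms of the input space closest to t (ties broken arbitrarily)."""
    return sorted(input_space, key=lambda f: edit_distance(f, t))[:eta]

def candidate_pairs(t, input_space, eta):
    """Ordered pairs <x, y> to try in the equations [y : x :: t : ?]."""
    near = neighborhood(t, input_space, eta)
    return [(x, y) for x in near for y in near if x != y]
```

Each retained pair \(\langle x,y\rangle \) then feeds one analogical equation, so the cost grows quadratically with \(\eta \) rather than cubically with \(|\mathcal{I}|\).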

Exhaustive Tree-Count Search In [31], the authors developed an alternative approach for scaling up the search procedure. The main idea is to exploit a property of formal analogies [7]:

$$\begin{aligned}{}[{{x} : {y}\,{: :}\,{z} : {t}}] \Rightarrow |x|_{c} + |t|_{c} = |y|_{c} + |z|_{c} \;\;\; \forall c \in \varSigma \end{aligned}$$
(1)

where \(\varSigma \) is the alphabet on which the forms are built, and \(|x|_{c}\) stands for the number of occurrences of character \(c\) in \(x\). In the sequel, we denote \(\mathcal{C}(\langle x,t\rangle ) = \{\langle y,z\rangle \in \mathcal{I}^{2} \; | \; \forall c \in \varSigma , |x|_{c} + |t|_{c} = |y|_{c} + |z|_{c} \}\) the set of pairs satisfying the count property with respect to \(\langle x,t\rangle \).

Their strategy consists in first selecting an \(x\)-form in the input space. This enforces a set of necessary constraints on the counts of characters that any two forms \(y\) and \(z\) must satisfy for \([{{x} : {y}\,{: :}\,{z} : {t}}]\) to hold. By considering all forms \(x\) in turn,Footnote 3 one can collect a set of candidate triplets for \(t\); a verification that those triplets actually define an analogy with \(t\) must then be carried out. Formally, they build:

$$\mathcal{E}_{i}(t) = \left\{ (x,y,z) \in \mathcal{L}^{3} \,\,|\,\, \langle i(y),i(z)\rangle \in \mathcal{C}(\langle i(x),i(t)\rangle ) \text{ and } [{{i(x)} : {i(y)}\,{: :}\,{i(z)} : {i(t)}}]\right\} $$

This strategy will only work if (i) the number of quadruplets to check is much smaller than the number of triplets we can form in the input space (which happens to be the case in practice), and if (ii) we can efficiently identify the pairs \(\langle y,z\rangle \) that satisfy a set of constraints on character counts. To this end, the authors proposed to organize the input space thanks to a data structure they call a tree-count (hence the name of the search procedure), which is easy to build and supports efficient runtime retrieval.Footnote 4
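A minimal sketch of this count-property retrieval is given below; a plain hash index over character-count signatures stands in for the actual tree-count structure of [31], and all names are ours:

```python
from collections import Counter

def signature(form):
    """Hashable character-count vector of a form."""
    return frozenset(Counter(form).items())

def build_index(input_space):
    """Map each count signature to the forms having it."""
    index = {}
    for f in input_space:
        index.setdefault(signature(f), []).append(f)
    return index

def pairs_satisfying_counts(x, t, input_space, index):
    """All <y, z> with |y|_c + |z|_c = |x|_c + |t|_c for every character c."""
    target = Counter(x) + Counter(t)
    out = []
    for y in input_space:
        cy = Counter(y)
        rest = target - cy          # Counter subtraction drops negative counts...
        if cy + rest != target:     # ...so detect overshoot on some character
            continue
        for z in index.get(frozenset(rest.items()), []):
            out.append((y, z))
    return out
```

Only the pairs returned here need to go through the full (and much more expensive) analogy verification.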

Sampled Tree-Count Search The tree-count search strategy makes it possible to solve step 1 exhaustively for reasonably large input spaces (tens of thousands of forms). However, computing analogies in very large input spaces (hundreds of thousands of forms) remains computationally demanding, as the retrieval algorithm must be carried out \(O(|\mathcal{I}|)\) times. For such cases, the authors of [31] proposed to sample the \(x\)-forms.

The sampling strategy selects \(x\)-forms that share with \(t\) some sequences of symbols. To this end, input forms are represented in a \(\kappa \)-dimensional vector space, whose dimensions are frequent symbol \(n\)-grams, where \(n \in [{{\mathtt{min}}}; {{\mathtt{max}}}]\). A form is thus encoded as a binary vector of dimension \(\kappa \), in which the \(i\)th component indicates whether the form contains an occurrence of the \(i\)th \(n\)-gram. At runtime, the sampling selects the \(\eta \) forms that are the closest to \(t\), according to some distance measure (e.g. the cosine).Footnote 5
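The following sketch illustrates this selection step under simplifying assumptions of ours: profiles keep all character \(n\)-grams (instead of only the \(\kappa \) most frequent ones), and the binary vectors are represented as sets:

```python
import math

def ngram_profile(form, nmin=2, nmax=3):
    """Set of character n-grams of the form, for nmin <= n <= nmax."""
    return {form[i:i + n]
            for n in range(nmin, nmax + 1)
            for i in range(len(form) - n + 1)}

def cosine(p, q):
    """Cosine similarity between two binary profiles represented as sets."""
    if not p or not q:
        return 0.0
    return len(p & q) / math.sqrt(len(p) * len(q))

def sample_x_forms(t, input_space, eta):
    """The eta input forms sharing the most n-grams with t."""
    pt = ngram_profile(t)
    return sorted(input_space,
                  key=lambda f: cosine(ngram_profile(f), pt),
                  reverse=True)[:eta]
```

In a realistic setting the profiles would be precomputed once for the whole input space rather than recomputed at each query.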

Checking for Analogies Some of the aforementioned search strategies require verifying whether each quadruplet in the candidate lists actually defines an analogical relation. Stroppa [32] proposed a dynamic programming algorithm for checking [x : y : : z : t] when the definition in [9] is used. The complexity of this algorithm is in \(O(|x| \times |y| \times |z| \times |t|)\). Since a large number of calls to the analogy checking algorithm must be performed during step 1 of analogical learning, the following property may help [31]:

$$\begin{aligned}{}[{{x} : {y}\,{: :}\,{z} : {t}}] \Rightarrow \left\{ \begin{array}{l} \left( x[1] \in \{y[1],z[1]\}\right) \,\,\vee \,\, \left( t[1] \in \{y[1],z[1]\}\right) \\ (x[\$] \in \{y[\$],z[\$]\}) \,\,\vee \,\, (t[\$] \in \{y[\$],z[\$]\})\\ \end{array} \right. \end{aligned}$$
(2)

where \(s[\$]\) indicates the last symbol of \(s\), and \(s[1]\) the first. A simple trick consists in checking the proportionality condition only for quadruplets that pass this test.
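A possible implementation of this pre-filter (the function name is ours) simply checks the first and last symbols before running the full quadratic check:

```python
def endpoint_filter(x, y, z, t):
    """Necessary condition (2): check first and last symbols of the quadruplet.
    Quadruplets failing this cheap test cannot be in analogical proportion."""
    if not all((x, y, z, t)):
        return True        # be permissive with empty forms
    first_ok = x[0] in (y[0], z[0]) or t[0] in (y[0], z[0])
    last_ok = x[-1] in (y[-1], z[-1]) or t[-1] in (y[-1], z[-1])
    return first_ok and last_ok
```

Quadruplets rejected here are discarded without ever running the \(O(|x| \times |y| \times |z| \times |t|)\) verification.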

Open Issues One can already go a long way with the sampled tree-count approach described above. Still, it is unclear which sampling strategy should be considered for a given application. The vector space model proposed in [31] seems to work well in practice, but more experiments are needed to confirm this.

More fundamentally, none of the search procedures proposed so far takes into account the fact that many analogies might be redundant. For instance, to relate the masculine French noun directeur to its feminine form directrice, it is enough to consider [recteur : rectrice : : directeur : directrice]. Other analogies (e.g. [fondateur : fondatrice : : directeur : directrice]) would simply confirm this relation. In [32], Stroppa formalizes this redundancy with the concept of analogical support set. Formally, \(A\) is an analogical support set of \(E\) iff:

$$ \{[{{x} : {y}\,{: :}\,{z} : {?}}] : \langle x,y,z\rangle \in A^3\} \supseteq E $$

This raises the question of whether it would be possible to identify a minimal subset of the training set such that analogical learning would perform equally well on this subset. Determining such a subset would reduce computation time drastically. Also, it would be invaluable for modelling how forms in an input system are related to forms in an output one. We are not aware of studies addressing this issue.

3.3 Solving Equations

Algorithms for solving analogical equations have been proposed for both definitions of interest mentioned above. For the definition of [9], it can be shown [8] that the set of solutions to an analogical equation is a rational language:

Theorem 3.1

\(t\) is a solution to \([{{x} : {y}\,{: :}\,{z} : {?}}]\)   iff \(t\) belongs to \((y \circ z)\!\setminus \!x\).

The shuffle of two strings \(w\) and \(v\), noted \(w \circ v\), is the rational language containing all strings obtained by selecting (without replacement) sequences of characters in a left-to-right manner, alternately in \(w\) and \(v\). For instance, \(\underline{spondy}ondonti\underline{lalgia}tis\) and \(ond\underline{spond}on\underline{ylal}titis\underline{gia}\) are two strings in the set \(\underline{spondylalgia} \, \circ \,ondontitis\). The complementary set of \(w\) with respect to \(v\), noted \(w \setminus v\), is the set of strings formed by removing from \(w\), in a left-to-right manner, the symbols in \(v\). For instance, \(spondylitis\) and \(spydoniltis\) belong to \(spondyondontilalgiatis \setminus ondontalgia\). This operation can be straightforwardly extended to complement a rational set. The pseudo-code of our implementation of these two operations is provided in Algorithm 1.

[Algorithm 1: pseudo-code of the shuffle and complementary-set operations]

Since these two operations preserve rationality, it is possible to build a finite-state machine encoding those solutions. In practice, however, the automaton is highly non-deterministic, and in the worst case, enumerating the solutions can be exponential in the length of the sequences involved in the equation. The solution proposed in [24] consists in sampling this automaton without building it: the more we sample, the more solutions we produce. In our implementation, we call sampling rate (\(\rho \)) the number of samples in \(y \circ z\) that are considered. This is formalized in Algorithm 2.

[Algorithm 2: pseudo-code of the sampling-based equation solver]
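For illustration, here is a self-contained re-implementation in the spirit of Algorithms 1 and 2; the details (e.g. the uniform coin flips) are our own choices, not necessarily those of the original implementation. Each sample draws a random string of \(y \circ z\) and complements it by \(x\) on the fly, with random cancellation choices:

```python
import random
from collections import Counter

def sample_solution(x, y, z, rng):
    """Draw one random string of y ∘ z, then complement it by x left to right.
    Returns None when the symbols of x cannot all be cancelled."""
    w, i, j = [], 0, 0
    while i < len(y) or j < len(z):            # random interleaving of y and z
        from_y = i < len(y) and (j == len(z) or rng.random() < 0.5)
        if from_y:
            w.append(y[i]); i += 1
        else:
            w.append(z[j]); j += 1
    out, k = [], 0
    for c in w:                                # left-to-right complementation by x
        if k < len(x) and c == x[k] and rng.random() < 0.5:
            k += 1                             # cancel this occurrence (random choice)
        else:
            out.append(c)
    return "".join(out) if k == len(x) else None

def solve(x, y, z, rho=400, seed=0):
    """Sample rho strings; return the solutions with their generation frequencies."""
    rng = random.Random(seed)
    sols = Counter()
    for _ in range(rho):
        s = sample_solution(x, y, z, rng)
        if s is not None:
            sols[s] += 1
    return sols
```

With x = even, y = usual, z = unevenly and a large enough \(\rho \), the valid form unusually shows up among the sampled solutions, together with many spurious ones.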

It is important to note that a solver typically produces several solutions to an equation, most of them spurious, which means that even though they obey the definition of formal analogy, they are not linguistically valid. To illustrate this, Fig. 1 reports solutions produced for the equation [even : usual : : unevenly : ?] by our implementation of Algorithm 2. As can be observed, most solutions are not valid forms in English: indeed, the definition recognizes no less than 72 different legitimate solutions, which we were able to produce with enough sampling (\(\rho = 2000\)) in a few tenths of a millisecond.Footnote 6

Fig. 1 The 3 most frequent solutions to [even : usual : : unevenly : ?], along with their frequencies, as produced by our solver, as a function of the sampling rate \(\rho \). \(nb\) stands for the total number of solutions produced

The problem of multiple solutions to an equation is exacerbated when we deal with longer forms. In such cases, the number of spurious solutions can become quite large. As a simple illustration of this, consider the equation \(e=\) [this guy drinks too much : this boat sinks : : those guys drink too much : ?] where forms are considered as strings of characters (the space character does not have any special meaning here). Figure 2 reports the number of solutions produced as a function of the sampling rate. For small values of \(\rho \) (i.e. \(\rho \le 20\)), the expected solution might be missed by the solver. For larger sampling rates, the expected solution typically appears (with frequent exceptions) among the most frequently generated ones. Note that the number of solutions generated also increases drastically. Clearly, enumerating all the solutions is not a good idea: there are too many of them, and it is too time consuming.

Fig. 2 The 3 most frequent solutions produced by our solver at different sampling rates for the equation \(e\). \(r\) indicates the position of the expected solution in the list if present (\(\phi \) otherwise). \(nb\) indicates the number of solutions produced, and \(t\) the time in seconds taken by the solver. For readability, spaces are represented with the symbol _.

The fact that a solver can (and typically does) produce spurious solutions means that we must devise a way to distinguish “good” solutions from spurious ones. We defer this issue to the next section. Yet, we want to stress that currently, our sampling of the automaton that recognizes the solutions to an equation is completely random. It would be much more efficient to learn to sample the automaton, such that more likely solutions are enumerated first. Several algorithms might be applied for this task, among which the Expectation-Maximization algorithm for transducers described in [33].

3.4 Aggregating Solutions

Step-3 of analogical learning consists in aggregating all the solutions produced. We saw in the previous section that the number of solutions to an analogical equation can be quite large. Also, there might be quite a large number of analogical equations to solve during step-2, which further increases the number of solutions gathered in \(\mathcal{E}_o(t)\). In many works we know of, this problem is not discussed; yet, our experiments indicate that this is an important issue. In [34], Lepage and Lardilleux filter out solutions containing sequences of symbols not seen in the output space of the training set. This typically leaves many solutions alive, including spurious ones. In [20], Lepage and Denoual propose to keep the most frequently generated solutions, the rationale being that forms generated by various analogical equations are more likely to be good ones. Also, Ando and Lepage [35] show that the closeness of objects in analogical relations is another interesting feature for ranking candidate solutions.

In [24], the authors investigate the use of a classifier trained in a supervised fashion to distinguish correct solutions from bad ones. This approach improved the selection mechanism over several baselines (such as selecting the most frequently generated solution), but proved difficult to implement: many examples have to be classified, which is time consuming, and most of the solutions in \(\mathcal{E}_o(t)\) are spurious, leaving us with a very unbalanced and therefore challenging classification task. Last but not least, the best classifiers trained were using features computed on the whole set \(\mathcal{E}_o(t)\), such as the frequency with which a solution is proposed. This means that the classifier cannot be used to filter out the unlikely solutions generated for a test form \(t\) in the early stages.

Improving the classifier paradigm deserves further investigations. Notably, in [24], only a small number of features have been considered. Better feature engineering, as well as more systematic tests on different tasks must be carried out to better understand the limitations of the approach.

As discussed in [35], it is intuitively more natural to see the problem of separating correct from spurious solutions as a ranking problem. Ranking is an active research topic in machine learning. We refer the reader to the LETOR (LEarning TO Rank) website for an extensive list of resources on this subject.Footnote 7 Ranking the solutions proposed by the first two steps of analogical learning should be investigated as a replacement for the classification approach proposed in [24].

3.5 Dealing with Silence

In most experiments we conducted, we faced the problem that the learning mechanism might fail to produce a solution for a given test form. This can happen because no input analogy is found, or because the input analogies identified yield output equations that have no solution. Depending on the nature of the input space and the training material available, this problem can be rather important.

On a task of translating medical terms [24], the authors submitted the silent cases to another approach (in their case a statistical translation engine). Combining analogical learning with statistical machine translation has also been investigated in [36]. In [20], the authors proposed to split the form to treat into two parts and to apply analogical learning to the two subforms separately. This raises a number of issues which do not seem to have received much attention. Knowing where to split the input form in order to maximize the chance of solving the two new sub-problems is one of them.

4 Case Study

4.1 Settings

In order to illustrate some of the elements discussed in the previous sections, we applied analogical learning to the task of transliterating English proper names into Chinese. The task we study is part of the NEWS evaluation campaign conducted in 2009 [37]. Transliteration is generally defined as the phonetic transcription of names across languages, and is often thought of as a critical technology in many domains, such as machine translation and cross-language information retrieval or extraction [37].

Table 1 Examples of transliterations of English proper names into Chinese taken from the NEWS 2009 English-Chinese corpus

Examples of transliterations of English proper names into Chinese are reported in Table 1. The segmentations (represented by \(+\) signs) are ours, and were produced by inspection of the English-into-Chinese transcription table available on Wikipedia.Footnote 8 As can be observed, the transcription is written with monosyllabic characters which may not correspond exactly to syllables in English. A Chinese character may correspond to different English ones, and vice versa. For instance, the letter A is transcribed into

[a Chinese character]

in Acoca, but not in Alessendroni (where the sound “ya” is assumed). Note also that special rules are used to encode initial characters; therefore, it is not advisable to convert English names into lower case, as was done in [38]. Also, a transcription sometimes reflects the meaning as well as the sound of the transcribed word. For instance, the common ending -va in Slavic female family names is usually transcribed as

[a Chinese character]

(girl), as in Navratilova.

The organizers of the NEWS campaign kindly provided us with the data that was distributed to the participants of the English-into-Chinese transliteration task. Its main characteristics are reported in Table 2. The distribution of Chinese characters is typically Zipfian, and 116 out of the 370 different characters seen in the training set appear less than 10 times (39 characters appear only once).

Table 2 Main characteristics of the English-into-Chinese data provided in NEWS 2009

In order to transliterate the English proper names of the test set, we gathered a training set \(\mathcal{L}_1 = train+dev\) by concatenating the training set and the development set that were released, that is, 34,857 pairs of English and Chinese proper names. Including the development set in the training material is fine, since there is no training involved when generating the set of solutions. In parallel to this, we also generated solutions for the development set (\(dev\)), using the training material only (\(\mathcal{L}_2 = train\)); the solutions produced were used for training a classifier to distinguish good solutions from spurious ones. This classifier was then applied to the solutions produced for the test set (\(test\)) thanks to \(\mathcal{L}_1\).

We mainly ran two configurations of our generator. The first one, named full-tc, corresponds to the exhaustive tree-count setting described in Sect. 3.2. The second one, named samp-tc, corresponds to the sampled version described in Sect. 3.2, where the \(\eta ={1,\!000}\) closest input forms to each English test form were considered, based on a vector space representing the \(\kappa =5,\!000\) most frequent 3-grams of characters observed in \(\mathcal{I}\), and the cosine distance. In both cases, the solver was run with a sampling rate of \(\rho =400\). We decided to keep up to the 100 most frequently generated solutions for a given test form. A solution is typically generated several times per equation (with a frequency that depends on \(\rho \)), and by several equations.

Regarding the classifier, we followed [24] and trained a voted-perceptron [39]. We computed a total of 33 (partly redundant) features including the frequency of a solution, its rank in the list, average degrees of the input and output analogies leading to a solution, likelihoods of several n-gram language models trained at the (Chinese) character level, etc. We trained the classifier over 5,000 epochs. A greedy search over the feature set revealed that half of these features are enough for optimal performance.
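For reference, here is a minimal sketch of the voted perceptron [39] on dense feature vectors. The two-dimensional features used in the usage example below are purely illustrative stand-ins (e.g. a frequency score and a rank score), not the 33 features mentioned above:

```python
def train_voted_perceptron(data, epochs=20):
    """data: list of (feature vector, label in {-1, +1}).
    Returns the list of (weights, bias, survival count) snapshots."""
    dim = len(data[0][0])
    w, b, c = [0.0] * dim, 0.0, 1
    votes = []
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:            # mistake: store the outgoing vector
                votes.append((w[:], b, c))
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b, c = b + y, 1
            else:
                c += 1                    # the current vector survives one more step
    votes.append((w, b, c))
    return votes

def predict(votes, x):
    """Each snapshot votes with a weight equal to its survival count."""
    total = sum(c * (1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1)
                for w, b, c in votes)
    return 1 if total > 0 else -1
```

The voting scheme gives long-surviving weight vectors more say at prediction time, which is what distinguishes the voted perceptron from the plain one.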

4.2 Monitoring Analogical Inference

In the following, we describe in detail the full-tc configuration, while Table 3 reports the figures of interest for both configurations we tested. For the exhaustive configuration, most of the time is spent during the search for input analogies. Roughly 8 s is spent on average per input form in order to identify an average of 4,097 analogies (and a maximum of 47,176). For some forms, the time for identifying input analogies can be as long as 37 s. This suggests a timeout strategy when too many candidate analogies are observed. Also, it is likely not necessary to consider all input analogies, which raises the issue of ranking the analogies to be treated first. For 37 of the test forms, we could not identify any input analogy, leading to no response in those cases. Solving all output equations led to an average of 278 solutions per test form (minimum 0, maximum 1,918). Note that many equations solved led to no solution, which explains why, on average, the number of solutions is lower than the number of equations solved. The average time for solving the equations per form was 0.7 s, with a worst case of 7.3 s (because many equations needed to be solved). It is interesting to note the discrepancy between the number of input analogies identified and the number of output equations effectively solved, which is much lower. This indicates either that the input analogies were in large part fortuitous, or that the inference bias (one analogy in the input space corresponds to an analogy in the output space) does not apply well to this task. In the end, our inference procedure remained silent for 106 input forms (3.7 %).

As expected, the samp-tc configuration runs much faster, identifying far fewer input analogies (179 on average) and leaving 337 input forms (11.6 %) without an answer. This shows that while effective at reducing computation time, the tree-count search strategy described in Sect. 3.2 leaves room for improvement.

Table 3 Details of the two configurations tested

4.3 Evaluation

In this section, we analyze the different variants of the analogical devices we developed. We start with a detailed analysis of the core variants tested, followed by a broader evaluation conducted with the evaluation scripts available on the NEWS 2009 website. In both cases, we consider a transliteration experiment as a set of \(N\) (here, \(N = 2896\)) triplets, \((e_i,r_i,a^i_{1,\ldots ,n_i} \equiv \{c_1^i,\ldots ,c_{n_i}^i\})_{i \in [1,N]}\), where \(e_i\) is the \(i\)th test form, \(r_i\) its reference transliteration,Footnote 9 and \(a^i_{1,\ldots ,n_i}\) the ranked list of the \(n_i\) (here, \(0 \le n_i \le 100\)) solutions proposed by analogical learning, ranked in decreasing order of likeliness.

Detailed Evaluation Let \(n_k\) be the number of test forms for which the reference solution appears among the \(k\)-first solutions proposed: \(n_k \equiv \sum _{i=1}^N \delta (\{r_i\} \cap a^i_{1,\ldots ,k} \ne \emptyset )\), where \(\delta (x)\) is the indicator function, the value of which is \(1\) if \(x\) is true, and \(0\) otherwise. We define \(ratio_k\) as the ratio of test forms with a sanctioned solution ranked in the \(k\)-first solutions to the number of test forms with a sanctioned solution. We define \(prec_k\) as the percentage of test forms with at least one candidate transliteration for which a sanctioned solution is ranked in the \(k\)-first positions, and \(rec_k\) as the percentage of all test forms with a sanctioned transliteration proposed in the \(k\)-first positions. Formally:

$$ \begin{array}{lll} ratio_k & = & n_k / \sum \nolimits _{i=1}^N \delta (\{r_i\} \cap a_{1,\ldots ,n_i}^{i} \ne \emptyset )\\ prec_k & = & n_k / \sum \nolimits _{i=1}^N \delta (n_i > 0)\\ rec_k & = & n_k / N\\ \end{array} $$

Intuitively, \(prec_k\) is a measure of precision and \(rec_k\) a measure of recall, while \(ratio_k\) provides the distribution of the rank of the correct solutions identified. Those figures (expressed as percentages) are reported in Table 4 for the two configurations of the generator we tested: full-tc and samp-tc. The first line of the first column indicates, for instance, that 45.1 % of the test forms with at least one solution have the sanctioned transliteration produced in the first position. This represents 52.9 % of the test forms with a sanctioned solution, and 43.4 % of all test forms.
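The three measures are straightforward to compute from the per-form outputs. A minimal sketch, assuming each test session is stored as a (reference, ranked candidate list) pair (the names are illustrative, not those of our implementation):

```python
def metrics_at_k(sessions, k):
    """sessions: list of (reference, ranked_candidates) pairs.
    Returns (ratio_k, prec_k, rec_k) as defined above."""
    N = len(sessions)
    # n_k: test forms whose reference appears among the k-first candidates
    n_k = sum(1 for ref, cands in sessions if ref in cands[:k])
    # denominator of ratio_k: forms with a sanctioned solution anywhere
    with_solution = sum(1 for ref, cands in sessions if ref in cands)
    # denominator of prec_k: forms with at least one candidate proposed
    answered = sum(1 for _, cands in sessions if len(cands) > 0)
    ratio_k = n_k / with_solution if with_solution else 0.0
    prec_k = n_k / answered if answered else 0.0
    rec_k = n_k / N
    return ratio_k, prec_k, rec_k
```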

Table 4 Quality of the 100-top frequent solutions proposed by the full-tc and the samp-tc configurations

This table calls for several comments. We noted earlier that the inference procedure leaves some forms without solution (3.7 % for full-tc and 11.6 % for samp-tc). Furthermore, we observe that recall plateaus at 82.1 % for full-tc and 58.5 % for samp-tc. Therefore, there is a fair number of cases that are not handled appropriately by the inference mechanism. For the samp-tc configuration, this simply shows the shortcomings of an overly aggressive sampling strategy. For the full-tc variant, while part of this recall failure can be explained by the fact that we only consider the 100 most frequent solutions generated, it mainly illustrates the silence issue discussed in Sect. 3.

Considering only the most frequently generated solution,Footnote 10 recall drops to 43.4 % and 27.4 %, respectively. This shows that being able to distinguish good from spurious solutions has the potential to improve the overall approach by more than 30 absolute points in both configurations.

Table 5 Impact of the classifier applied in the aggregation step of the full-tc configuration

The effect of applying the classifier described above is analyzed in Table 5 for the full-tc configuration (the most effective one). The number of test forms with the sanctioned solution ranked first increases significantly: the sanctioned solution is ranked first for another 319 forms, which translates into an increase of precision and recall at rank \(1\) of more than 11 absolute points. Precision and recall at rank \(2\) also gain over 3 absolute points. This illustrates that classifying solutions is feasible and leads to better performance than simply picking the most frequently generated form. Note, however, that this gain comes at the cost of precision and recall at higher ranks: some correct solutions are actually removed by our classifier.

NEWS 2009 Table 6 reports the results of several of the transliteration devices we devised, as measured by the official metrics of the NEWS 2009 evaluation campaign [37]. Since the MAP\(_{ref}\) metric always equals ACC in our experiments, we only report the first three metrics output by the evaluation scriptFootnote 11 we used: ACC, which corresponds to \(rec_1\);Footnote 12 F-score, which gives a partial credit proportional to the longest common subsequence between the reference transliteration and the first candidate one; and the Mean Reciprocal Rank (MRR), where 100/MRR roughly indicates the average rank of the correct solution over the session. See [37] for the details of their computation. Those figures are expressed as percentages.
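The following sketch illustrates how these three metrics can be computed. It assumes a single reference per test form and omits the special cases handled by the official NEWS script, so it is an approximation of the real evaluation, not a reimplementation of it:

```python
def lcs_len(a, b):
    # classic dynamic program for the longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def news_metrics(sessions):
    """sessions: list of (reference, ranked_candidates) pairs.
    Returns (ACC, F-score, MRR), each averaged over all test forms."""
    N = len(sessions)
    acc = sum(1 for r, c in sessions if c and c[0] == r) / N
    fscore = 0.0
    mrr = 0.0
    for r, cands in sessions:
        if cands:
            # partial credit from the LCS between reference and top candidate
            l = lcs_len(r, cands[0])
            p, q = l / len(cands[0]), l / len(r)
            fscore += 2 * p * q / (p + q) if p + q else 0.0
        if r in cands:
            mrr += 1.0 / (cands.index(r) + 1)
    return acc, fscore / N, mrr / N
```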

Table 6 Evaluation of different configurations with the metrics used at the NEWS 2009 evaluation campaign

Table 6 also calls for several comments. First, we tested different variants of the samp-tc strategy (lines 1–6). The dimension \(\kappa \) of the space in which the (input) forms are represented matters: setting \(\kappa \) to 10,000 (line 2) improves the accuracy by almost 4 absolute points compared to setting it to 5,000 (line 1), as we did in the previous section. Increasing \(\kappa \) further does not help, but ultimately, this parameter should be adjusted for each task. Naturally, considering a larger number (\(\eta \)) of neighbors leads to better performance, as shown by lines 2–6. Still, considering all the input forms (line 7) yields the best performance.
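As an illustration of this retrieval step, here is a toy sketch in which input forms are embedded into a \(\kappa \)-dimensional space and the \(\eta \) most similar forms are retrieved by cosine similarity. The hashed character n-gram embedding is an illustrative stand-in, not the actual tree-count representation of Sect. 3.2:

```python
import numpy as np

def embed(form, kappa, n=3):
    # hash each character n-gram into one of kappa buckets (illustrative)
    v = np.zeros(kappa)
    padded = "^" + form + "$"
    for i in range(len(padded) - n + 1):
        v[hash(padded[i:i+n]) % kappa] += 1.0
    return v

def nearest_neighbors(query, forms, kappa=10_000, eta=5):
    """Return the eta forms most cosine-similar to the query form."""
    q = embed(query, kappa)
    scored = []
    for f in forms:
        v = embed(f, kappa)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scored.append((q.dot(v) / denom if denom else 0.0, f))
    return [f for _, f in sorted(scored, reverse=True)[:eta]]
```

With such a scheme, raising \(\kappa \) reduces hash collisions between distinct n-grams, which is one intuition for why a larger dimension can help up to a point.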

Second, we tested the influence of the maximum degree accepted for an analogy (input or output), which we call the \(d\)-limit in the sequel. Considering only degree-2 analogies (line 10) yields a much higher score than considering all analogies, even though the number of unsolved test forms increases. The optimal \(d\)-limit should be adjusted for each task. For instance, in their experiments on unsupervised morphology acquisition, Lavallée and Langlais [14] found that a \(d\)-limit of 5 was optimal for German, while a \(d\)-limit of 2 was enough for capturing English morphology. Since identifying low-degree analogies can be implemented more efficiently, and seems to produce correct solutions here, this suggests a cascading strategy in which degree-2 analogies are searched first; degree-3 analogies are then searched only for the test forms for which no solution has been provided yet, and so on, until the \(d\)-limit (if any) is met. This is simulated in line 11 of Table 6. By doing so, we increase the recall of analogical inference, which benefits the overall performance of the device.
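The cascading strategy just described can be sketched as follows, with `solve_with_degree` a hypothetical hook into the inference engine that returns the solutions obtained with analogies of a given degree:

```python
def cascade(test_forms, solve_with_degree, d_limit=5):
    """Try degree-2 analogies first, then fall back to higher degrees
    only for the forms still left without a solution."""
    answers = {}
    pending = list(test_forms)
    for d in range(2, d_limit + 1):
        still_silent = []
        for form in pending:
            solutions = solve_with_degree(form, d)
            if solutions:
                answers[form] = solutions
            else:
                still_silent.append(form)
        pending = still_silent
        if not pending:
            break
    return answers, pending   # pending holds the forms left unanswered
```

The early exit means the cheap degree-2 search handles most forms, and the more expensive higher-degree searches run only on the residue.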

Third, we tested the influence of the classifier during the aggregation step. We trained and tested different flavors of the voted-perceptron algorithm, varying the feature representation as well as the number of epochs. We report two of these variants in lines 12 and 13 of Table 6. The first one was trained on all the features we computed (vp \(_{all}\)) and reaches an accuracy (ACC) of 53.3 %, while the second one obtains 54.5 % by using only a subset of the features (vp \(_{subset}\)).Footnote 13 Note that for these two variants, when all the solutions of a test form are pruned by the classifier, we keep the one with the largest classification score. The classifier trained on fewer features outperforms the other, showing that feature selection should be considered for optimal performance. In both cases, the gain in accuracy over the full-tc configuration in line 7 is large (above 10 absolute points), which shows the importance of the aggregation step.

Fourth, we tested a last configuration by cascading 8 variants: the ones described in lines 7 to 10, that is, the full-tc configuration with varying \(d\)-limits, and their variants using a classifier. While we could have trained a classifier specific to each variant, we used the same classifier here (the one with all the features, vp \(_{all}\)) for all the aggregation steps. The order of the cascade reflected the intuition that classified variants should be consulted first (since their precision is expected to be higher), in increasing order of their \(d\)-limit (2, then 3, etc.). The results are reported in line 14 and show a slight improvement over the cascading configuration reported in line 11.

Last, we observe that our best system is not among the leading ones at NEWS 2009 (see line 16); 18 systems participated in the 2009 English-into-Chinese transliteration exercise. In fact, we would have ranked 12th according to accuracy (ACC), the official metric at NEWS.

Since our major goal was to monitor analogical learning, we did not put effort into improving those figures, although there are straightforward things that could be done, such as always providing 10 candidate solutions, even when the classifier retains far fewer (all metrics but accuracy assume a list of 10 candidates). Also, we did not attempt anything to deal with silent test forms. In [36], the authors show that combining analogical learning with statistical machine translation in a simple way can improve upon the performance of the individual systems. Last, it is shown in [36] that representing examples as sequences of syllables instead of characters (as we did here) leads to a significant improvement of analogical learning on an English-into-Hindi transliteration task.

4.4 Transliteration Session

As an example, we illustrate in Figs. 3 and 4 how the name Zemansky was transliterated into its Chinese counterpart (inline figure f).Footnote 14 A total of 1,247 candidate analogies were identified thanks to the tree-count strategy; 405 of them were filtered out thanks to the property defined by Eq. 2. Verifying the 842 remaining candidates took 0.15 s, and 53 (input) analogies were finally identified. After projection, this led to 7 (output) equations with at least one solution. These equations, together with the input analogies that fired them, are reported in the figure. Only two equations yielded the sanctioned transliteration, both involving analogies of degree 2. Altogether, this session took slightly less than 4 s, mostly spent searching for the (input) candidates.

Fig. 3
figure 3

Details of the two productive pairs of analogies involved in solving Zemansky. Factors that commute are aligned vertically for readability

Fig. 4
figure 4

Log of the full-tc transliteration session for the English proper name Zemansky. 31 solutions have been identified; the one underlined (actually the most frequently generated) is the sanctioned one. (f=\(\bullet \)) indicates the frequency of a solution, while (d=\(\bullet \)) indicates the degree of an analogy

The two pairs of analogies yielding the sanctioned transliteration are illustrated in Fig. 3. We observe that the first pair of analogies passively captures the fact that the two substrings Sch and Z correspond respectively to the Chinese sequences shown in inline figures g and h. Similarly, the second pair of analogies somehow captures that the substrings U and Ze correspond respectively to the sequences shown in inline figures i and j. However, there is no attempt to learn such a mapping.

5 Discussion

In this study, we presented in Sect. 1 a number of works applying formal analogy to various NLP tasks. We described in Sect. 2 the analogical inference procedure and discussed in Sect. 3 a number of issues that we feel remain to be investigated for the approach to gain wider acceptance in the NLP community. In particular, we presented a number of approaches for tackling large input spaces and discussed their limitations. We pointed out the noise produced by the inference procedure, which must be dealt with. We also noted that the inference procedure suffers from a recall problem for which very few solutions have been proposed. In Sect. 4, we presented a case study, the transliteration of proper names, for which we reported encouraging results. More importantly, we used this case study to illustrate some of the issues behind the scenes of analogical learning.

We believe that analogical inference over sequences has not yet delivered its full potential. We already discussed a number of issues that remain to be investigated. In particular, our experiments in Sect. 4 show that reducing the time complexity of the search for analogies without impacting performance is still challenging. Identifying low-degree analogies first might help here, since they are easier to spot. We also observed that there is considerable room for improvement in the aggregation step. The experiments we conducted in this study with a classifier trained to recognize correct analogies have just scratched the surface. We also discussed the problem of silence that the approach meets in typical NLP applications, including the transliteration task we visited in this work. Here again, low-degree analogies could be put to use to guide the splitting of forms into parts that analogical inference can handle.

We are currently investigating three follow-up issues. First, we are engineering a much larger feature set than the one considered in this work. In particular, we are considering features extracted from frequently co-occurring pairs of (English/Chinese) sequences of symbols. This drastically raises the dimensionality of the representations given to the classifier, hopefully for better discriminative power. Second, we are investigating different machine learning algorithms. Preliminary results indicate that support vector machines [40] slightly outperform the voted-perceptron algorithm we used in this study. We are also testing a number of rescoring approaches that seem to lead to higher overall performance; further investigation is needed to be conclusive. Last but not least, we mentioned that our equation solver produces many (spurious) solutions. Filtering them out a posteriori, as we tried in this study, is fruitful, even though the classifier we trained is far from perfect. Still, one may legitimately argue that a better solution consists in fixing the solver in the first place. Currently, our solver ranks the solutions it produces in decreasing order of frequency. Some preliminary work we conducted on solving analogical equations among English proper names shows that ranking the solutions with a language model is a much better option (the correct solution is ranked higher in the list of solutions). This observation, which likely depends on the domain investigated as well as the language considered, at the very least suggests that guiding the solver to produce fewer, but more promising, solutions is feasible.
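To make the last point concrete, here is a toy sketch of reranking solver outputs with a character-level bigram language model instead of raw frequency. The training forms, the add-one smoothing, and the n-gram order are all illustrative choices, not those of our preliminary experiments:

```python
from collections import Counter
import math

def train_bigram_lm(words):
    """Train a character bigram model with word-boundary markers."""
    bigrams, unigrams = Counter(), Counter()
    for w in words:
        chars = "^" + w + "$"
        unigrams.update(chars[:-1])
        bigrams.update(zip(chars, chars[1:]))
    def logprob(word):
        chars = "^" + word + "$"
        score = 0.0
        for a, b in zip(chars, chars[1:]):
            # add-one smoothing so unseen bigrams get a small probability
            score += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams) + 2))
        return score
    return logprob

def rerank(solutions, logprob):
    # order solver outputs by language-model score, best first
    return sorted(solutions, key=logprob, reverse=True)
```

A well-formed candidate such as mansky outscores a scrambled one such as mnsaky because its bigrams were observed in training, which is exactly the signal frequency-based ranking misses.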

Learning to solve equations is therefore a promising avenue that we plan to explore. Currently, our solver randomly reads out paths of the non-deterministic finite-state automaton which recognizes all possible solutions to an equation, according to the definition of formal analogy given in Sect. 3.1. While learning to weight its arcs might be tempting (see for instance the algorithm described in [41]), we must not forget that each equation leads to a particular automaton, and that building the union of all such automata is not realistic, which seriously questions the applicability of a learning algorithm that assumes a given topology as input. One way to tackle the problem consists in discriminatively learning a function whose features depend on characteristics of each automaton (for instance, features based on n-gram sequences), and using this function to guide the sampling of paths in the automaton. Another way of producing fewer but better solutions is to introduce some knowledge directly into the solver. In a way, this is what Lepage [42] did with his solver, which produces only a subset of the solutions that ours would generate. Of course, introducing knowledge into the solver raises the issue of its generality (across domains and languages). This clearly deserves investigation.

While we concentrated in this study on analogical learning over sequences of symbols, it should be stressed that this learning paradigm is not limited to this scenario. Stroppa [32] showed that it can be generalized to structures of interest in NLP such as trees and lattices. In particular, he proposed a definition of formal analogy on trees, based on the notion of tree factorization, very much in line with the definition of formal analogies between sequences of symbols given in [9]. Based on this definition, the authors of [10] described an exact algorithm for solving an analogical equation on trees whose complexity is at least exponential in the number of nodes of the largest tree in the equation. They also proposed two approximate solvers obtained by constraining the type of analogies captured (notably, passive/active alternations are no longer possible). Ben Hassena [12] proposed a solution for reasoning with trees based on tree alignment. The constraints imposed over the possible alignments are much more restrictive than those of [10], but the author reports a solver (a dynamic programming algorithm) with polynomial complexity. Unfortunately, none of the aforementioned approaches scales to even medium-sized corpora of trees. For instance, in [12] the author applied analogical learning to a training set of fewer than 300 tree structures, a very small corpus by today's standards. See also the work of Ando and Lepage [35] for a very similar setting.