Abstract
The purpose of this paper is to summarize some of the results obtained over many years of research in proportional analogy applied to natural language processing. We recall some mathematical formalizations based on general axioms drawn from a study of the history of the notion from Euclid to modern linguistics. The formalization relies on two articulative notions, conformity and ratio, and on two constitutive notions, similarity and contiguity. These notions are applied to a series of objects ranging from sets to strings of symbols, through multisets and vectors, so as to obtain a mathematical formalization for each of these types of objects. Thanks to these formalizations, we present some results obtained in structuring language data made of characters (bitmaps), words or short sentences in several languages such as Chinese or English. An important point in using such formalizations, which rely on form only, concerns the truth of the analogies retrieved or produced, i.e., whether they are valid on both the levels of form and meaning. Results of evaluation on this aspect are recalled. We also mention how the formalization on strings of symbols can be applied to two main tasks that would correspond to ‘langage’ and ‘parole’ in Saussurian terms: structuring language data and generating language data. The results presented have been obtained from reasonably large amounts of language data, such as several thousand Chinese characters or a hundred thousand sentences in English or other languages.
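As a minimal illustration of how ratio and conformity can be instantiated on two of the object types listed above, the Python sketch below assumes a common formalization in which the ratio between two vectors is their componentwise difference (for multisets, the difference in element counts), and conformity is the equality of ratios: A : B :: C : D iff A − B = C − D. This is an assumption made for illustration, not a transcription of the chapter's own definitions, and all function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence, Tuple


def vector_ratio(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Ratio between two vectors, assumed here to be their componentwise difference a - b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return tuple(x - y for x, y in zip(a, b))


def is_vector_analogy(a, b, c, d) -> bool:
    """Conformity: A : B :: C : D holds when the ratios are equal, i.e. A - B == C - D."""
    return vector_ratio(a, b) == vector_ratio(c, d)


def solve_vector_analogy(a, b, c) -> Tuple[int, ...]:
    """Solve the analogical equation A : B :: C : x, giving x = C - A + B componentwise."""
    if not len(a) == len(b) == len(c):
        raise ValueError("vectors must have the same dimension")
    return tuple(ci - ai + bi for ai, bi, ci in zip(a, b, c))


def is_multiset_analogy(a: Iterable[Hashable], b, c, d) -> bool:
    """Multisets: the difference in counts must be the same for every element."""
    ca, cb, cc, cd = Counter(a), Counter(b), Counter(c), Counter(d)
    elements = set(ca) | set(cb) | set(cc) | set(cd)
    # Counter returns 0 for absent elements, so negative differences are kept as-is.
    return all(ca[e] - cb[e] == cc[e] - cd[e] for e in elements)
```

For instance, is_vector_analogy((1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1)) is true, and solve_vector_analogy((1, 0, 1), (1, 1, 1), (0, 0, 1)) returns (0, 1, 1). Treating words as multisets of letters, is_multiset_analogy("walk", "walked", "talk", "talked") is true; this test ignores symbol order, which a formalization on strings of symbols must additionally take into account.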
Notes
- 1.
We use this term to denote the notion. In doing so, we follow the German tradition: proportionale Analogiebildung (Osthoff 1979, p. 132; Paul 1920, p. 132) and Proportionalanalogie (Becker 1990, p. 14). The other term, analogical proportion, is usually understood as the result, i.e., the lists of forms or paradigms (Welcomme 2010, p. 91), i.e., analogical clusters for us (see Fig. 2 or Table 1). However, this last term is also used for the notion itself by some authors [e.g., Richard and Prade (2014)].
- 2.
This assumption comes from the observation of language data and formal data. It is not derived from any theoretical consideration and should thus be added to the set of basic properties for strings of symbols as such.
- 3.
The alternative proposal for the definition of proportional analogies between strings of symbols found in Stroppa (2005; Stroppa and Yvon 2004; Yvon et al. 2004) does not share this property. That proposal allows the particular proportional analogy a : a :: ab : ba, which is barred by our definition, and more generally all proportional analogies of the type \( p_{1}.u.s_{1} : p_{1}.u.s_{1} :: p_{2}.u.v.s_{2} : p_{2}.v.u.s_{2} \), which do not hold in general under our definition (notice the inversion of u and v in the right part of the proportional analogy, although the first two members of the analogy are equal). According to that proposal, A : B :: C : D iff \( A \bullet D \cap B \bullet C \ne \emptyset \), where \( \bullet \) is the shuffle of two strings. Now,
$$ \left\{ \begin{aligned} a.ba &= aba \in a \bullet ba \\ ab.a &= aba \in a \bullet ab \end{aligned} \right. \;\Rightarrow\; a \bullet ba \cap a \bullet ab \ne \emptyset \tag{9} $$
and
$$ \left\{ \begin{aligned} p_{1}.p_{2}.u.vu.s_{1}.s_{2} &= p_{1}p_{2}uvus_{1}s_{2} \in A \bullet D \\ p_{1}.p_{2}.uv.u.s_{1}.s_{2} &= p_{1}p_{2}uvus_{1}s_{2} \in B \bullet C \end{aligned} \right. \;\Rightarrow\; A \bullet D \cap B \bullet C \ne \emptyset \tag{10} $$
Consequently, our definition of proportional analogy between strings of symbols puts more constraints on parallel infixing, thanks to the limitations induced by distances.
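The contrast drawn in this note can be checked mechanically on short strings. The Python sketch below enumerates the shuffle \( \bullet \) of two strings to test the shuffle-based condition quoted above and, for comparison, a count-and-distance test. That second test is an assumption: it takes the ratio to be the difference in symbol counts and requires \( \delta(A,B) = \delta(C,D) \) and \( \delta(A,C) = \delta(B,D) \) for the LCS (insertion/deletion) edit distance \( \delta \). It is meant only as a stand-in in the spirit of "limitations induced by distances", not as the exact definition of this chapter. All function names are hypothetical.

```python
from functools import lru_cache


def shuffle(x: str, y: str) -> frozenset:
    """All interleavings of x and y that preserve the internal order of each string."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> frozenset:
        if i == len(x):
            return frozenset({y[j:]})
        if j == len(y):
            return frozenset({x[i:]})
        take_x = {x[i] + rest for rest in go(i + 1, j)}
        take_y = {y[j] + rest for rest in go(i, j + 1)}
        return frozenset(take_x | take_y)

    return go(0, 0)


def shuffle_analogy(a: str, b: str, c: str, d: str) -> bool:
    """Shuffle-based test quoted in this note: A : B :: C : D iff A•D ∩ B•C is non-empty."""
    return bool(shuffle(a, d) & shuffle(b, c))


def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence (dynamic programming, one row at a time)."""
    prev = [0] * (len(y) + 1)
    for xc in x:
        cur = [0]
        for j, yc in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if xc == yc else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(x: str, y: str) -> int:
    """Edit distance with insertions and deletions only: |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


def distance_analogy(a: str, b: str, c: str, d: str) -> bool:
    """ASSUMED count-and-distance test (hypothetical stand-in, not the chapter's definition)."""
    symbols = set(a + b + c + d)
    counts_ok = all(a.count(s) - b.count(s) == c.count(s) - d.count(s) for s in symbols)
    return (
        counts_ok
        and lcs_distance(a, b) == lcs_distance(c, d)
        and lcs_distance(a, c) == lcs_distance(b, d)
    )
```

On a : a :: ab : ba, the shuffle-based test succeeds (aba is a common interleaving, as in (9)), whereas the count-and-distance test fails because \( \delta(a, a) = 0 \) but \( \delta(ab, ba) = 2 \). Instantiating the general pattern with single letters (p1 = x, u = u, s1 = y, p2 = z, v = v, s2 = w) gives the same contrast, while an ordinary case such as walk : walked :: talk : talked passes both tests.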
- 4.
This is also trivially implied by the definition mentioned in the above footnote.
References
Anttila, R. (1989). Historical linguistics and comparative linguistics. Amsterdam: John Benjamins.
Becker, T. (1990). Analogie und morphologische Theorie. München: Wilhelm Fink. Retrieved from http://opus4.kobv.de/opus4-bamberg/files/4324/BeckerThomasDissocrseA2.pdf.
Claveau, V., & L’Homme, M. C. (2005). Terminology by analogy-based machine learning. In Proceedings of the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, Copenhagen (Denmark).
Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press (Oxford Linguistics). Retrieved from http://books.google.co.jp/books?id=ESa_E-q8hbwC.
Dandapat, S., Morrissey, S., Naskar, S. K., & Somers, H. (2010). Mitigating problems in analogy-based EBMT with SMT and vice versa: A case study with named entity transliteration. In Proceedings of the 24th Pacific Asia Conference on Language Information and Computation (PACLIC 2010), Sendai, Japan (pp. 146–153).
Denoual, E. (2007). Analogical translation of unknown words in a statistical machine translation framework. In Proceedings of Machine Translation Summit XI. Copenhagen.
Gentner, D. (1983). Structure mapping: A theoretical model for analogy. Cognitive Science, 7(2), 155–170.
Gil, D. (2002). From repetition to reduplication in Riau Indonesian. Paper presented at the Graz Reduplication Conference, p. s: 2. Retrieved from http://www-classic.uni-graz.at/ling2www/veranst/redup2002/abstracts/gil.pdf.
Gosme, J., & Lepage, Y. (2011). Structure des trigrammes inconnus et lissage par analogie. In M. Lafourcade & V. Prince (Eds.), Actes de TALN-2011 (vol. articles longs, pp. 345–356). ATALA.
Hoffman, R. R. (1995). Monster analogies. AI Magazine, 11, 11–35.
Itkonen, E. (2005). Analogy as structure and process: Approaches in linguistics, cognitive psychology and philosophy of science. In M. Dascal, R. W. Gibbs, & J. Nuyts (Eds.), Human cognitive processing (Vol. 14, p. 250). Amsterdam: John Benjamins.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Tenth Machine Translation Summit (MT Summit X), Phuket, Thailand (pp. 79–86). Retrieved from http://www.mt-archive.info/MTS-2005-Koehn.pdf.
Langlais, P. (2013). Mapping source to target strings without alignment by analogical learning: A case study with transliteration. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers, pp. 684–689). Association for Computational Linguistics, Sofia, Bulgaria. Retrieved from http://www.aclweb.org/anthology/P13-2120.
Langlais, P., & Patry, A. (2007). Translating unknown words by analogical learning. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 877–886). Retrieved from http://www.aclweb.org/anthology/D/D07/D07-1092.
Langlais, P., Yvon, F., & Zweigenbaum, P. (2008). Analogical translation of medical words in different languages. In Gotal’08: Proceedings of the 6th international conference on Advances in Natural Language Processing, Lecture Notes in Artificial Intelligence, (Vol. 5221, pp. 284–295). Berlin: Springer. doi: 10.1007/978-3-540-85287-2.
Lavie, R. J. (2003). Le locuteur analogique ou la grammaire mise à sa place. Thèse de doctorat, Université de Nanterre—Paris X. Retrieved from http://tel.archives-ouvertes.fr/tel-00285173.
Lehmann, W. P. (1967). A reader in nineteenth-century historical Indo-European linguistics. Bloomington: Indiana University.
Lepage, Y. (2001). Analogy and formal languages. Proceedings of FG/MOL 2001, Helsinki (pp. 1–12).
Lepage, Y. (2003). De l’analogie rendant compte de la commutation en linguistique. Mémoire d’habilitation à diriger les recherches, Université de Grenoble. Retrieved from http://tel.ccsd.cnrs.fr/tel-00004372.
Lepage, Y. (2004). Analogy and formal languages. Electronic Notes in Theoretical Computer Science, 53, 180–191. Retrieved from http://www.sciencedirect.com/.
Lepage, Y. (2004). Lower and higher estimates of the number of “true analogies” between sentences contained in a large multilingual corpus. In Proceedings of COLING-2004, Geneva (Vol. 1, pp. 736–742). Retrieved from http://aclweb.org/anthology//C/C04/C04-1106.pdf.
Lepage, Y. (2014). Analogy between binary images: Application to Chinese characters. In H. Prade & G. Richard (Eds.), Computational approaches to analogical reasoning: Current trends. (pp. 1–33). Berlin: Springer.
Lepage, Y., & Denoual, E. (2005). Adding paraphrases of the same quality to the C-STAR BTEC. In 11th Conference in Natural Language Processing (pp. 1141–1144). Yokohama University. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/nlpj05-1.pdf.gz.
Lepage, Y., & Denoual, E. (2005). BLEU in characters: Towards automatic evaluation in languages without word delimiters. In Companion Volume to the Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-05) (pp. 81–86). Jeju. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/ijcnlp05.pdf.gz.
Lepage, Y., & Denoual, E. (2005). Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19, 251–282. Retrieved from http://www.springerlink.com/content/tqj32n0m5v8w3m6u/fulltext.pdf.
Lepage, Y., Migeot, J., & Guillerm, E. (2007). A corpus study on the number of true proportional analogies between chunks in two typologically different languages. In Proceedings of the seventh international Symposium on Natural Language Processing (SNLP 2007) (pp. 117–122). Pattaya, Thailand: Kasetsart University (ISBN 978-974-623-062-9).
Lepage, Y., Migeot, J., & Guillerm, E. (2009). A measure of the number of true analogies between chunks in Japanese. Lecture Notes in Artificial Intelligence, 5603, 154–164.
Lepage, Y., & Peralta, G. (2004). Using paradigm tables to generate new utterances similar to those existing in linguistic resources. In Proceedings of the 4th international conference on Language Resources and Evaluation (LREC 2004) (Vol. 1, pp. 243–246), Lisbon.
Luo, J., & Lepage, Y. (2013). A comparison of association and estimation approaches to alignment in word-to-word translation. In Proceedings of the tenth international Symposium on Natural Language Processing (SNLP 2013) (pp. 181–186), Phuket, Thailand.
Nagao, M. (1984). A framework of a mechanical translation between Japanese and English by analogy principle. In A. Elithorn & R. Banerji (Eds.), Artificial & Human Intelligence (pp. 173–180). Amsterdam: Elsevier.
Osthoff, H. (1979). Kleine Beiträge zur Declinationslehre. Morphologische Untersuchungen auf dem Gebiete der Indogermanischen Sprachen II.
Paul, H. (1920). Prinzipien der Sprachgeschichte. Tübingen: Niemeyer.
Richard, G., & Prade, H. (2014). A short introduction to computational trends in analogical reasoning. In H. Prade & G. Richard (Eds.), Computational Approaches to Analogical Reasoning: Current Trends (pp. i–xx). Berlin: Springer.
de Saussure, F. (1995). Cours de Linguistique Générale (1st ed. 1916). Lausanne and Paris: Payot.
Skousen, R. (1989). Analogical Modeling of Language. Dordrecht: Kluwer.
Stankiewicz, E. (1986). Baudouin de Courtenay I Podstawy Współczesnego Językoznawstwa. Wrocław: Ossolineum.
Stroppa, N. (2005). Définitions et caractérisation de modèles à base d’analogies pour l’apprentissage automatique des langues naturelles. Thèse de doctorat, École nationale supérieure des télécommunications.
Stroppa, N., & Yvon, F. (2004). Analogies dans les séquences: un solveur à états finis. In Actes de la 11e Conférence Annuelle sur le Traitement Automatique des Langues Naturelles (TALN-2004) [no page numbers]. Fès. Retrieved from http://aune.lpl.univ-aix.fr/jep-taln04/proceed/actes/taln2004-Fez/Stroppa-Yvon.pdf.
Stroppa, N., & Yvon, F. (2005). An analogical learner for morphological analysis. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005) (pp. 120–127). Ann Arbor, MI.
Takeya, K., & Lepage, Y. (2011). A study of the number of proportional analogies between marker-based chunks in 11 European languages. In Z. Vetulani (Ed.), Proceedings of the 5th Language & Technology Conference (LTC’11) (pp. 284–288). Poznań: Fundacja uniwersytetu im. Adama Mickiewicza.
Takeya, K., & Lepage, Y. (2013). Marker-based chunking in eleven European languages for analogy-based translation. Lecture Notes in Artificial Intelligence, 8387, pp. XX–YY.
Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., & Yamamoto, S. (2002). Toward a broad coverage bilingual corpus for speech translation of travel conversation in the real world. In Proceedings of LREC 2002 (pp. 147–152). Las Palmas.
Tkaczyk, B. (2004). Re ‘cloning’, i.e. ‘reduplicative’ process. In KUL (Ed.), 36th Poznań Linguistics meetings.
Turney, P. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008) (pp. 905–912). Manchester, UK: Coling 2008 Organizing Committee. Retrieved from http://aclweb.org/anthology//C/C08/C08-1114.pdf.
Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics, 32(2), 379–416. Retrieved from http://aclweb.org/anthology//P/P06/P06-1040.pdf.
Turney, P. D., & Littman, M. L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3), 251–278. Retrieved from http://www.citebase.org/abstract?id=oai:arXiv.org:cs/0508103.
Varro, M. T. (1954). De lingua latina (J. Collart, Trans.). Paris: Les Belles Lettres.
Veale, T., & Chen, S. (2006). Learning to extract semantic content from the orthographic structure of Chinese words. In Proceedings of the 17th Irish conference on Artificial Intelligence and Cognitive Science (AICS2006). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.6982.
Veale, T., & Li, G. (2014). Analogy as an organizational principle in the construction of large knowledge-bases. In H. Prade & G. Richard (Eds.), Computational Approaches to Analogical Reasoning: Current Trends, Studies in Computational Intelligence Vol. 548 (pp. 58–77). Berlin: Springer.
Welcomme, A. (2010). Hermann Paul et le concept d’analogie. Círculo de Lingüística Aplicada a la Comunicación (clac), 43, 49–122. Retrieved from http://pendientedemigracion.ucm.es/info/circulo/no43/welcomme.pdf.
Yang, W., Wang, H., & Lepage, Y. (2013). Using analogical associations to acquire Chinese-Japanese quasi-parallel sentences. In Proceedings of the tenth symposium on natural language processing (SNLP 2013) (pp. 86–93). Phuket, Thailand.
Yvon, F., Stroppa, N., Miclet, L., & Delhay, A. (2004). Solving analogical equations on words. Rapport technique ENST2004D005, ENST.
Acknowledgments
I would like to acknowledge the contribution of the many colleagues, researchers and students who helped obtain many of the experimental results presented here over the years. They are listed in alphabetical order below: Nicolas Auclerc, Etienne Denoual, Chooi Ling Goh, Erwan Guillerm, Juan Luo, Jin Matsuoka, Kota Matsushita, Julien Migeot, Guilhem Peralta, Kota Takeya, Hao Wang, Wei Yang.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Lepage, Y. (2015). Proportional Analogy in Written Language Data. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_10
DOI: https://doi.org/10.1007/978-3-319-08043-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)