Proportional Analogy in Written Language Data

Language Production, Cognition, and the Lexicon

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

The purpose of this paper is to summarize some of the results obtained over many years of research on proportional analogy applied to natural language processing. We recall some mathematical formalizations derived from general axioms, which were themselves drawn from a study of the history of the notion from Euclid to modern linguistics. The resulting formalization relies on two articulative notions, conformity and ratio, and on two constitutive notions, similarity and contiguity. These notions are applied to a series of objects ranging from sets to strings of symbols, through multisets and vectors, so as to obtain a mathematical formalization for each of these types of objects. Using these formalizations, we present results obtained in structuring language data at the level of characters (bitmaps), words, or short sentences in several languages, such as Chinese or English. An important point in using formalizations that rely on form only concerns the truth of the analogies retrieved or produced, i.e., whether they are valid at both the level of form and the level of meaning. Results of evaluations on this aspect are recalled. We also mention how the formalization on strings of symbols can be applied to two main tasks that correspond to 'langage' and 'parole' in Saussurian terms: structuring language data and generating language data. The results presented were obtained from reasonably large amounts of language data, such as several thousand Chinese characters or a hundred thousand sentences in English or other languages.
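The abstract names ratio and conformity as the articulative notions and applies them to multisets and vectors. As an illustration only, the sketch below assumes that on count vectors (multisets) the ratio A : B is the componentwise difference A − B and that conformity means equality of ratios, so that A : B :: C : D holds when A − B = C − D. These definitions are an assumption for this sketch, not a reproduction of the chapter's formalization, and the function names `ratio` and `is_proportion` are hypothetical.

```python
from collections import Counter
from typing import Mapping


def ratio(a: Mapping[str, int], b: Mapping[str, int]) -> dict:
    """Assumed ratio A : B on count vectors: the componentwise difference A - B.

    Zero components are dropped so that two ratios compare equal as dicts.
    """
    keys = set(a) | set(b)
    diff = {k: a.get(k, 0) - b.get(k, 0) for k in keys}
    return {k: v for k, v in diff.items() if v != 0}


def is_proportion(a, b, c, d) -> bool:
    """Assumed conformity: A : B :: C : D holds iff ratio(A, B) == ratio(C, D)."""
    return ratio(a, b) == ratio(c, d)


# Words viewed as multisets of characters, i.e., count vectors indexed by characters.
print(is_proportion(Counter("walk"), Counter("walked"), Counter("talk"), Counter("talked")))  # True
print(is_proportion(Counter("walk"), Counter("walked"), Counter("talk"), Counter("talks")))   # False
print(is_proportion(Counter("a"), Counter("a"), Counter("ab"), Counter("ba")))               # True
```

The last call shows that once strings are reduced to multisets, a : a :: ab : ba holds, because character order is invisible at that level; on strings of symbols, Note 3 explains that the author's definition bars this analogy.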

Notes

  1.

    We use this term to denote the notion. In doing so, we follow the German tradition: proportionale Analogiebildung ('proportional analogy formation') (Osthoff 1979, p. 132), proportionale Analogiebildung (Paul 1920, p. 132), Proportionalanalogie (Becker 1990, p. 14). The other term, analogical proportion, is usually understood as the result, i.e., the lists of forms or paradigms (Welcomme 2010, p. 91), which we call analogical clusters (see Fig. 2 or Table 1). However, some authors also use this latter term for the notion itself [e.g., Richard and Prade (2014)].

  2.

    This assumption comes from the observation of language data and formal data. It is not derived from any theoretical consideration and should thus be added to the set of basic properties for strings of symbols as such.

  3.

    The alternative proposal for the definition of proportional analogies between strings of symbols found in Stroppa (2005; Stroppa and Yvon 2004; Yvon et al. 2004) does not share this property. That proposal allows the particular proportional analogy a : a :: ab : ba, which our definition bars, and more generally all proportional analogies of the type \( p_1.u.s_1 : p_1.u.s_1 :: p_2.u.v.s_2 : p_2.v.u.s_2 \), which do not hold in general under our definition (notice the inversion of u and v in the right part of the proportional analogy, although its first two members are equal). According to that proposal, A : B :: C : D iff \( A \bullet D \cap B \bullet C \ne \emptyset \), where \( \bullet \) is the shuffle of two strings. Now,

    $$ \left\{ \begin{aligned} a.ba &= aba \in a \bullet ba \\ ab.a &= aba \in a \bullet ab \end{aligned} \right. \Rightarrow a \bullet ba \cap a \bullet ab \ne \emptyset \tag{9} $$

    and

    $$ \left\{ \begin{aligned} p_1.p_2.u.vu.s_1.s_2 &= p_1 p_2 uvu s_1 s_2 \in A \bullet D \\ p_1.p_2.uv.u.s_1.s_2 &= p_1 p_2 uvu s_1 s_2 \in B \bullet C \end{aligned} \right. \Rightarrow A \bullet D \cap B \bullet C \ne \emptyset \tag{10} $$

    Consequently, our definition of proportional analogy between strings of symbols puts more constraints on parallel infixing thanks to the limitations induced by distances.

  4.

    This is also trivially implied by the definition mentioned in the above footnote.
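Note 3 contrasts the shuffle-based proposal, A : B :: C : D iff A•D ∩ B•C ≠ ∅, with a definition constrained by distances. The sketch below implements the shuffle test directly. For the distance side, the notes do not spell out the conditions, so the sketch assumes, as a hypothetical stand-in, that d(A, B) = d(C, D) and d(A, C) = d(B, D) must hold for the insertion/deletion (LCS-based) edit distance. Under that assumption it reproduces the contrast stated in Note 3: the shuffle test accepts a : a :: ab : ba (equation 9) and an instance of the p₁.u.s₁ family (equation 10), while the distance check rejects both. All function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle(x: str, y: str) -> frozenset:
    """All interleavings of x and y that keep the order of each string's symbols."""
    if not x:
        return frozenset([y])
    if not y:
        return frozenset([x])
    return frozenset(
        {x[0] + w for w in shuffle(x[1:], y)} | {y[0] + w for w in shuffle(x, y[1:])}
    )


def shuffle_analogy(a: str, b: str, c: str, d: str) -> bool:
    """Shuffle-based proposal recalled in Note 3: A : B :: C : D iff A•D ∩ B•C ≠ ∅."""
    return bool(shuffle(a, d) & shuffle(b, c))


def indel_distance(x: str, y: str) -> int:
    """Insertion/deletion edit distance, computed as |x| + |y| - 2 * LCS(x, y)."""
    prev = [0] * (len(y) + 1)  # LCS lengths for the previous prefix of x
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y):
            cur.append(prev[j] + 1 if cx == cy else max(prev[j + 1], cur[j]))
        prev = cur
    return len(x) + len(y) - 2 * prev[-1]


def distance_conditions(a: str, b: str, c: str, d: str) -> bool:
    """Hypothetical distance constraints: d(A, B) = d(C, D) and d(A, C) = d(B, D)."""
    return (indel_distance(a, b) == indel_distance(c, d)
            and indel_distance(a, c) == indel_distance(b, d))


# Equation (9): a : a :: ab : ba
print(shuffle_analogy("a", "a", "ab", "ba"), distance_conditions("a", "a", "ab", "ba"))  # True False
# Equation (10) with p1="x", u="u", s1="y", p2="z", v="v", s2="w"
print(shuffle_analogy("xuy", "xuy", "zuvw", "zvuw"),
      distance_conditions("xuy", "xuy", "zuvw", "zvuw"))  # True False
# An ordinary analogy passes both tests
print(shuffle_analogy("walk", "walked", "talk", "talked"),
      distance_conditions("walk", "walked", "talk", "talked"))  # True True
```

The shuffle is enumerated exhaustively, so this is only practical for short strings; it is meant to make the counterexamples of equations (9) and (10) checkable, not to be an efficient solver.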

References

  • Anttila, R. (1989). Historical linguistics and comparative linguistics. Amsterdam: John Benjamins.

  • Becker, T. (1990). Analogie und morphologische Theorie. München: Wilhelm Fink. Retrieved from http://opus4.kobv.de/opus4-bamberg/files/4324/BeckerThomasDissocrseA2.pdf.

  • Claveau, V., & L’Homme, M. C. (2005). Terminology by analogy-based machine learning. In Proceedings of the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, Copenhagen (Denmark).

  • Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press (Oxford Linguistics). Retrieved from http://books.google.co.jp/books?id=ESa_E-q8hbwC.

  • Dandapat, S., Morrissey, S., Naskar, S. K., & Somers, H. (2010). Mitigating problems in analogy-based EBMT with SMT and vice versa: A case study with named entity transliteration. In Proceedings of the 24th Pacific Asia Conference on Language Information and Computation (PACLIC 2010), Sendai, Japan (pp. 146–153).

  • Denoual, E. (2007). Analogical translation of unknown words in a statistical machine translation framework. In Proceedings of Machine Translation Summit XI. Copenhagen.

  • Gentner, D. (1983). Structure mapping: A theoretical model for analogy. Cognitive Science, 7(2), 155–170.

  • Gil, D. (2002). From repetition to reduplication in Riau Indonesian. Paper presented at the Graz Reduplication Conference, p. s: 2. Retrieved from http://www-classic.uni-graz.at/ling2www/veranst/redup2002/abstracts/gil.pdf.

  • Gosme, J., & Lepage, Y. (2011). Structure des trigrammes inconnus et lissage par analogie. In M. Lafourcade & V. Prince (Eds.), Actes de TALN-2011 (vol. articles longs, pp. 345–356). ATALA.

  • Hoffman, R. R. (1995). Monster analogies. AI Magazine, 11, 11–35.

  • Itkonen, E. (2005). Analogy as structure and process: Approaches in linguistics, cognitive psychology and philosophy of science. In M. Dascal, R. W. Gibbs, & J. Nuyts (Eds.), Human cognitive processing (Vol. 14, p. 250). Amsterdam: John Benjamins.

  • Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Tenth Machine Translation Summit (MT Summit X), Phuket, Thailand (pp. 79–86). Retrieved from http://www.mt-archive.info/MTS-2005-Koehn.pdf.

  • Langlais, P. (2013). Mapping source to target strings without alignment by analogical learning: A case study with transliteration. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers, pp. 684–689). Association for Computational Linguistics, Sofia, Bulgaria. Retrieved from http://www.aclweb.org/anthology/P13-2120.

  • Langlais, P., & Patry, A. (2007). Translating unknown words by analogical learning. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 877–886). Retrieved from http://www.aclweb.org/anthology/D/D07/D07-1092.

  • Langlais, P., Yvon, F., & Zweigenbaum, P. (2008). Analogical translation of medical words in different languages. In Gotal’08: Proceedings of the 6th international conference on Advances in Natural Language Processing, Lecture Notes in Artificial Intelligence, (Vol. 5221, pp. 284–295). Berlin: Springer. doi: 10.1007/978-3-540-85287-2.

  • Lavie, R. J. (2003). Le locuteur analogique ou la grammaire mise à sa place. Thèse de doctorat, Université de Nanterre—Paris X. Retrieved from http://tel.archives-ouvertes.fr/tel-00285173.

  • Lehmann, W. P. (1967). A reader in nineteenth-century historical indo-european linguistics. Bloomington: Indiana University.

  • Lepage, Y. (2001). Analogy and formal languages. Proceedings of FG/MOL 2001, Helsinki (pp. 1–12).

  • Lepage, Y. (2003). De l’analogie rendant compte de la commutation en linguistique. Mémoire d’habilitation à diriger les recherches, Université de Grenoble. Retrieved from http://tel.ccsd.cnrs.fr/tel-00004372.

  • Lepage, Y. (2004). Analogy and formal languages. Electronic Notes in Theoretical Computer Science, 53, 180–191. Retrieved from http://www.sciencedirect.com/.

  • Lepage, Y. (2004). Lower and higher estimates of the number of “true analogies” between sentences contained in a large multilingual corpus. In Proceedings of COLING-2004, Geneva (Vol. 1, pp. 736–742). Retrieved from http://aclweb.org/anthology//C/C04/C04-1106.pdf.

  • Lepage, Y. (2014). Analogy between binary images: Application to Chinese characters. In H. Prade & G. Richard (Eds.), Computational approaches to analogical reasoning: Current trends. (pp. 1–33). Berlin: Springer.

  • Lepage, Y., & Denoual, E. (2005). Adding paraphrases of the same quality to the C-STAR BTEC. In 11th Conference in Natural Language Processing (pp. 1141–1144). Yokohama University. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/nlpj05-1.pdf.gz.

  • Lepage, Y., & Denoual, E. (2005). BLEU in characters: Towards automatic evaluation in languages without word delimiters. In Companion Volume to the Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-05) (pp. 81–86). Jeju. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/ijcnlp05.pdf.gz.

  • Lepage, Y., & Denoual, E. (2005). Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19, 251–282. Retrieved from http://www.springerlink.com/content/tqj32n0m5v8w3m6u/fulltext.pdf.

  • Lepage, Y., Migeot, J., & Guillerm, E. (2007). A corpus study on the number of true proportional analogies between chunks in two typologically different languages. In Proceedings of the seventh international Symposium on Natural Language Processing (SNLP 2007) (pp. 117–122). Pattaya, Thailand: Kasetsart University (ISBN 978-974-623-062-9).

  • Lepage, Y., Migeot, J., & Guillerm, E. (2009). A measure of the number of true analogies between chunks in Japanese. Lecture Notes in Artificial Intelligence, 5603, 154–164.

  • Lepage, Y., & Peralta, G. (2004). Using paradigm tables to generate new utterances similar to those existing in linguistic resources. In Proceedings of the 4th international conference on Language Resources and Evaluation (LREC 2004) (Vol. 1, pp. 243–246), Lisbon.

  • Luo, J., & Lepage, Y. (2013). A comparison of association and estimation approaches to alignment in word-to-word translation. In Proceedings of the tenth international Symposium on Natural Language Processing (SNLP 2013) (pp. 181–186), Phuket, Thailand.

  • Nagao, M. (1984). A framework of a mechanical translation between Japanese and English by analogy principle. In A. Elithorn & R. Banerji (Eds.) Artificial & Human Intelligence (pp. 173–180). Amsterdam: Elsevier.

  • Osthoff, H. (1979). Kleine Beiträge zur Declinationslehre. Morphologische Untersuchungen auf dem Gebiete der Indogermanischen Sprachen II.

  • Paul, H. (1920). Prinzipien der Sprachgeschichte. Tübingen: Niemeyer.

  • Richard, G., & Prade, H. (2014). A short introduction to computational trends in analogical reasoning. In H. Prade & G. Richard (Eds.) Computational Approaches to Analogical Reasoning: Current Trends, (pp. i–xx). Berlin: Springer.

  • de Saussure, F. (1995). Cours de Linguistique Générale (1st ed. 1916). Lausanne and Paris: Payot.

  • Skousen, R. (1989). Analogical Modeling of Language. Dordrecht: Kluwer.

  • Stankiewicz, E. (1986). Baudouin de Courtenay i Podstawy Współczesnego Językoznawstwa. Wrocław: Ossolineum.

  • Stroppa, N. (2005). Définitions et caractérisation de modèles à base d’analogies pour l’apprentissage automatique des langues naturelles. Thèse de doctorat, École nationale supérieure des télécommunications.

  • Stroppa, N., & Yvon, F. (2004). Analogies dans les séquences: un solveur à états finis. In Actes de la 11e Conférence Annuelle sur le Traitement Automatique des Langues Naturelles (TALN-2004) [no page numbers]. Fès. Retrieved from http://aune.lpl.univ-aix.fr/jep-taln04/proceed/actes/taln2004-Fez/Stroppa-Yvon.pdf.

  • Stroppa, N., & Yvon, F. (2005). An analogical learner for morphological analysis. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005), (pp. 120–127). Ann Arbor, MI.

  • Takeya, K., & Lepage, Y. (2011). A study of the number of proportional analogies between marker-based chunks in 11 European languages. In Z. Vetulani (Ed.) Proceedings of the 5th Language & Technology Conference (LTC’1), (pp. 284–288). Poznań: Fundacja uniwersytetu im. Adama Mickiewicza.

  • Takeya, K., & Lepage, Y. (2013). Marker-based chunking in eleven European languages for analogy-based translation. Lecture Notes in Artificial Intelligence 8387, pp. XX–YY.

  • Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., & Yamamoto, S. (2002). Toward a broad coverage bilingual corpus for speech translation of travel conversation in the real world. In Proceedings of LREC 2002, (pp. 147–152). Las Palmas.

  • Tkaczyk, B. (2004). Re ‘cloning’, i.e. ‘reduplicative’ process. In KUL (Ed.) 36th Poznań Linguistics meetings.

  • Turney, P. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), (pp. 905–912). Manchester, UK: Coling 2008 Organizing Committee. Retrieved from http://aclweb.org/anthology//C/C08/C08-1114.pdf.

  • Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics 32(2), 379–416. Retrieved from http://aclweb.org/anthology//P/P06/P06-1040.pdf.

  • Turney, P. D., & Littman, M. L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning 60(1–3), 251–278. Retrieved from http://www.citebase.org/abstract?id=oai:arXiv.org:cs/0508103.

  • Varro, M. T. (1954). De lingua latina. Coll. Belles-lettres, Paris. Translated by J. Collart.

  • Veale, T., & Chen, S. (2006). Learning to extract semantic content from the orthographic structure of Chinese words. In Proceedings of the 17th Irish conference on Artificial Intelligence and Cognitive Science (AICS2006). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.6982.

  • Veale, T., & Li, G. (2014). Analogy as an organizational principle in the construction of large knowledge-bases. In H. Prade & G. Richard (Eds.) Computational Approaches to Analogical Reasoning: Current Trends, Studies in Computational Intelligence Vol. 548 (pp. 58–77). Berlin: Springer.

  • Welcomme, A. (2010). Hermann Paul et le concept d’analogie. CÍRCULO de Lingüística Aplicada a la Comunicación (clac) 43, 49–122. Retrieved from http://pendientedemigracion.ucm.es/info/circulo/no43/welcomme.pdf.

  • Yang, W., Wang, H., & Lepage, Y. (2013). Using analogical associations to acquire Chinese-Japanese quasi-parallel sentences. In Proceedings of the tenth symposium on natural language processing (SNLP2013), (pp. 86–93). Phuket, Thailand.

  • Yvon, F., Stroppa, N., Miclet, L., & Delhay, A. (2004). Solving analogical equations on words. Rapport technique ENST2004D005, ENST.

Acknowledgments

I would like to acknowledge the contributions of the many colleagues, researchers, and students who helped obtain many of the experimental results presented here over the years. They are listed below in alphabetical order: Nicolas Auclerc, Etienne Denoual, Chooi Ling Goh, Erwan Guillerm, Juan Luo, Jin Matsuoka, Kota Matsushita, Julien Migeot, Guilhem Peralta, Kota Takeya, Hao Wang, Wei Yang.

Author information

Correspondence to Yves Lepage.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lepage, Y. (2015). Proportional Analogy in Written Language Data. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_10

  • DOI: https://doi.org/10.1007/978-3-319-08043-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08042-0

  • Online ISBN: 978-3-319-08043-7

  • eBook Packages: Computer Science, Computer Science (R0)
