Proportional Analogy in Written Language Data

Lepage, Yves

doi:10.1007/978-3-319-08043-7_10

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

This paper summarizes some of the results obtained over many years of research on proportional analogy applied to natural language processing. We recall mathematical formalizations based on general axioms drawn from a study of the history of the notion, from Euclid to modern linguistics. The formalization relies on two articulative notions, conformity and ratio, and on two constitutive notions, similarity and contiguity. These notions are applied to a series of objects ranging from sets to strings of symbols, through multi-sets and vectors, so as to obtain a mathematical formalization for each of these types of objects. Thanks to these formalizations, we present results obtained in structuring language data at the level of characters (bitmaps), words, or short sentences in several languages such as Chinese or English. An important point in using formalizations that rely on form only concerns the truth of the analogies retrieved or produced, i.e., whether they are valid on the levels of both form and meaning. Evaluation results on this aspect are recalled. We also mention how the formalization on strings of symbols can be applied to two main tasks that would correspond to ‘langage’ and ‘parole’ in Saussurian terms: structuring language data and generating language data. The results presented were obtained from reasonably large amounts of language data, such as several thousand Chinese characters or on the order of a hundred thousand sentences in English or other languages.

Notes

1.
We use this term to denote the notion. In doing so, we follow the German tradition: proportionale Analogiebildung (Osthoff 1979, p. 132), proportionale Analogiebildung (Paul 1920, p. 132), Proportionalanalogie (Becker 1990, p. 14). The other term, analogical proportion, is usually understood as the result, i.e., the lists of forms or paradigms (Welcomme 2010, p. 91), which for us are analogical clusters (see Fig. 2 or Table 1). However, some authors also use this last term for the notion itself [e.g., Richard and Prade (2014)].
2.
This assumption comes from the observation of language data and formal data. It is not derived from any theoretical consideration and should thus be added to the set of basic properties for strings of symbols as such.
3.
The alternative definition of proportional analogies between strings of symbols proposed in (Stroppa 2005; Stroppa and Yvon 2004; Yvon et al. 2004) does not share the same property. That proposal allows the particular proportional analogy a : a :: ab : ba, which is barred by our definition, and more generally all proportional analogies of the type $ p_{1}.u.s_{1} : p_{1}.u.s_{1} :: p_{2}.u.v.s_{2} : p_{2}.v.u.s_{2} $, which do not hold in general under our definition (notice the inversion of u and v in the right part of the proportional analogy although the first two members of the analogy are equal). According to that proposal, A : B :: C : D iff $ A \bullet D \cap B \bullet C \ne \emptyset $, where $ \bullet $ is the shuffle of two strings. Now,
$$ \left. \begin{aligned} a.ba = aba &\in a \bullet ba \\ ab.a = aba &\in a \bullet ab \end{aligned} \right\} \Rightarrow a \bullet ba \cap a \bullet ab \ne \emptyset $$
(9)
and, with $ A = B = p_{1}.u.s_{1} $, $ C = p_{2}.u.v.s_{2} $ and $ D = p_{2}.v.u.s_{2} $,
$$ \left. \begin{aligned} p_{1}.p_{2}.u.vu.s_{1}.s_{2} = p_{1} p_{2} uvu s_{1} s_{2} &\in A \bullet D \\ p_{1}.p_{2}.uv.u.s_{1}.s_{2} = p_{1} p_{2} uvu s_{1} s_{2} &\in B \bullet C \end{aligned} \right\} \Rightarrow A \bullet D \cap B \bullet C \ne \emptyset $$
(10)

Consequently, our definition of proportional analogy between strings of symbols puts more constraints on parallel infixing thanks to the limitations induced by distances.
4.
This is also trivially implied by the definition mentioned in the above footnote.
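
The shuffle-based criterion recalled in note 3 can be checked mechanically on short strings. The following Python sketch is only an illustration of that criterion as stated in the note (A : B :: C : D iff the shuffles of A with D and of B with C share at least one string). The function names `shuffle` and `holds_by_shuffle` are hypothetical, and the exhaustive enumeration is an assumption made for clarity: it is exponential in the string lengths and is not the finite-state solver of the cited authors.

```python
from functools import lru_cache


def shuffle(x: str, y: str) -> frozenset:
    """Return every interleaving of x and y that keeps the
    internal order of the symbols of each string."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> frozenset:
        # One string is exhausted: only the rest of the other remains.
        if i == len(x):
            return frozenset({y[j:]})
        if j == len(y):
            return frozenset({x[i:]})
        # Otherwise the next symbol comes either from x or from y.
        take_x = {x[i] + w for w in go(i + 1, j)}
        take_y = {y[j] + w for w in go(i, j + 1)}
        return frozenset(take_x | take_y)

    return go(0, 0)


def holds_by_shuffle(a: str, b: str, c: str, d: str) -> bool:
    """Shuffle criterion of note 3: the shuffle of A with D and the
    shuffle of B with C have a non-empty intersection."""
    return bool(shuffle(a, d) & shuffle(b, c))


# Case of Eq. (9): aba belongs to both shuffles, so the criterion holds.
print(holds_by_shuffle("a", "a", "ab", "ba"))  # True

# Case of Eq. (10) with single-letter stand-ins (an arbitrary choice):
# p1 = "x", u = "u", s1 = "y", p2 = "z", v = "v", s2 = "w".
print(holds_by_shuffle("xuy", "xuy", "zuvw", "zvuw"))  # True
```

Both calls return True, in line with Eqs. (9) and (10): the shuffle criterion alone accepts the inversion of u and v even though the first two members are equal.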

References

Anttila, R. (1989). Historical linguistics and comparative linguistics. Amsterdam: John Benjamins.
Becker, T. (1990). Analogie und morphologische Theorie. München: Wilhelm Fink. Retrieved from http://opus4.kobv.de/opus4-bamberg/files/4324/BeckerThomasDissocrseA2.pdf.
Claveau, V., & L’Homme, M. C. (2005). Terminology by analogy-based machine learning. In Proceedings of the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, Copenhagen (Denmark).
Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford University Press (Oxford Linguistics). Retrieved from http://books.google.co.jp/books?id=ESa_E-q8hbwC.
Dandapat, S., Morrissey, S., Naskar, S. K., & Somers, H. (2010). Mitigating problems in analogy-based EBMT with SMT and vice versa: A case study with named entity transliteration. In Proceedings of the 24th Pacific Asia Conference on Language Information and Computation (PACLIC 2010), Sendai, Japan (pp. 146–153).
Denoual, E. (2007). Analogical translation of unknown words in a statistical machine translation framework. In Proceedings of Machine Translation Summit XI. Copenhagen.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.
Gil, D. (2002). From repetition to reduplication in Riau Indonesian. Paper presented at the Graz Reduplication Conference, p. s: 2. Retrieved from http://www-classic.uni-graz.at/ling2www/veranst/redup2002/abstracts/gil.pdf.
Gosme, J., & Lepage, Y. (2011). Structure des trigrammes inconnus et lissage par analogie. In M. Lafourcade & V. Prince (Eds.), Actes de TALN-2011 (vol. articles longs, pp. 345–356). ATALA.
Hoffman, R. R. (1995). Monster analogies. AI Magazine, 11, 11–35.
Itkonen, E. (2005). Analogy as structure and process: Approaches in linguistics, cognitive psychology and philosophy of science. In M. Dascal, R. W. Gibbs, & J. Nuyts (Eds.), Human cognitive processing (Vol. 14, p. 250). Amsterdam: John Benjamins.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Tenth Machine Translation Summit (MT Summit X), Phuket, Thailand (pp. 79–86). Retrieved from http://www.mt-archive.info/MTS-2005-Koehn.pdf.
Langlais, P. (2013). Mapping source to target strings without alignment by analogical learning: A case study with transliteration. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers, pp. 684–689). Association for Computational Linguistics, Sofia, Bulgaria. Retrieved from http://www.aclweb.org/anthology/P13-2120.
Langlais, P., & Patry, A. (2007). Translating unknown words by analogical learning. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 877–886). Retrieved from http://www.aclweb.org/anthology/D/D07/D07-1092.
Langlais, P., Yvon, F., & Zweigenbaum, P. (2008). Analogical translation of medical words in different languages. In Gotal’08: Proceedings of the 6th international conference on Advances in Natural Language Processing, Lecture Notes in Artificial Intelligence, (Vol. 5221, pp. 284–295). Berlin: Springer. doi: 10.1007/978-3-540-85287-2.
Lavie, R. J. (2003). Le locuteur analogique ou la grammaire mise à sa place. Thèse de doctorat, Université de Nanterre—Paris X. Retrieved from http://tel.archives-ouvertes.fr/tel-00285173.
Lehmann, W. P. (1967). A reader in nineteenth-century historical Indo-European linguistics. Bloomington: Indiana University.
Lepage, Y. (2001). Analogy and formal languages. Proceedings of FG/MOL 2001, Helsinki (pp. 1–12).
Lepage, Y. (2003). De l’analogie rendant compte de la commutation en linguistique. Mémoire d’habilitation à diriger les recherches, Université de Grenoble. Retrieved from http://tel.ccsd.cnrs.fr/tel-00004372.
Lepage, Y. (2004). Analogy and formal languages. Electronic Notes in Theoretical Computer Science, 53, 180–191. Retrieved from http://www.sciencedirect.com/.
Lepage, Y. (2004). Lower and higher estimates of the number of “true analogies” between sentences contained in a large multilingual corpus. In Proceedings of COLING-2004, Geneva (Vol. 1, pp. 736–742). Retrieved from http://aclweb.org/anthology//C/C04/C04-1106.pdf.
Lepage, Y. (2014). Analogy between binary images: Application to Chinese characters. In H. Prade & G. Richard (Eds.), Computational approaches to analogical reasoning: Current trends. (pp. 1–33). Berlin: Springer.
Lepage, Y., & Denoual, E. (2005). Adding paraphrases of the same quality to the C-STAR BTEC. In 11th Conference in Natural Language Processing (pp. 1141–1144). Yokohama University. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/nlpj05-1.pdf.gz.
Lepage, Y., & Denoual, E. (2005). BLEU in characters: Towards automatic evaluation in languages without word delimiters. In Companion Volume to the Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-05) (pp. 81–86). Jeju. Retrieved from http://www.slt.atr.co.jp/~lepage/pdf/ijcnlp05.pdf.gz.
Lepage, Y., & Denoual, E. (2005). Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation, 19, 251–282. Retrieved from http://www.springerlink.com/content/tqj32n0m5v8w3m6u/fulltext.pdf.
Lepage, Y., Migeot, J., & Guillerm, E. (2007). A corpus study on the number of true proportional analogies between chunks in two typologically different languages. In Proceedings of the seventh international Symposium on Natural Language Processing (SNLP 2007) (pp. 117–122). Pattaya, Thailand: Kasetsart University (ISBN 978-974-623-062-9).
Lepage, Y., Migeot, J., & Guillerm, E. (2009). A measure of the number of true analogies between chunks in Japanese. Lecture Notes in Artificial Intelligence, 5603, 154–164.
Lepage, Y., & Peralta, G. (2004). Using paradigm tables to generate new utterances similar to those existing in linguistic resources. In Proceedings of the 4th international conference on Language Resources and Evaluation (LREC 2004) (Vol. 1, pp. 243–246), Lisbon.
Luo, J., & Lepage, Y. (2013). A comparison of association and estimation approaches to alignment in word-to-word translation. In Proceedings of the tenth international Symposium on Natural Language Processing (SNLP 2013) (pp. 181–186), Phuket, Thailand.
Nagao, M. (1984). A framework of a mechanical translation between Japanese and English by analogy principle. In A. Elithorn & R. Banerji (Eds.) Artificial & Human Intelligence (pp. 173–180). Amsterdam: Elsevier.
Osthoff, H. (1979). Kleine Beiträge zur Declinationslehre. Morphologische Untersuchungen auf dem Gebiete der Indogermanischen Sprachen II.
Paul, H. (1920). Prinzipien der Sprachgeschichte. Tübingen: Niemeyer.
Richard, G., & Prade, H. (2014). A short introduction to computational trends in analogical reasoning. In H. Prade & G. Richard (Eds.) Computational Approaches to Analogical Reasoning: Current Trends, (pp. i–xx). Berlin: Springer.
de Saussure, F. (1995). Cours de Linguistique Générale, [1ère éd. 1916] edn. Lausanne et Paris: Payot.
Skousen, R. (1989). Analogical Modeling of Language. Dordrecht: Kluwer.
Stankiewicz, E. (1986). Baudouin de Courtenay I Podstawy Współczesnego Językoznawstwa. Wrocław: Ossolineum.
Stroppa, N. (2005). Définitions et caractérisation de modèles à base d’analogies pour l’apprentissage automatique des langues naturelles. Thèse de doctorat, École nationale supérieure des télécommunications.
Stroppa, N., & Yvon, F. (2004). Analogies dans les séquences: un solveur à états finis. In Actes de la 11e Conférence Annuelle sur le Traitement Automatique des Langues Naturelles (TALN-2004) [unpaginated]. Fès. Retrieved from http://aune.lpl.univ-aix.fr/jep-taln04/proceed/actes/taln2004-Fez/Stroppa-Yvon.pdf.
Stroppa, N., & Yvon, F. (2005). An analogical learner for morphological analysis. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005), (pp. 120–127). Ann Arbor, MI.
Takeya, K., & Lepage, Y. (2011). A study of the number of proportional analogies between marker-based chunks in 11 European languages. In Z. Vetulani (Ed.) Proceedings of the 5th Language & Technology Conference (LTC’1), (pp. 284–288). Poznań: Fundacja uniwersytetu im. Adama Mickiewicza.
Takeya, K., & Lepage, Y. (2013). Marker-based chunking in eleven European languages for analogy-based translation. Lecture Notes in Artificial Intelligence 8387, pp. XX–YY.
Takezawa, T., Sumita, E., Sugaya, F., Yamamoto, H., & Yamamoto, S. (2002). Toward a broad coverage bilingual corpus for speech translation of travel conversation in the real world. In Proceedings of LREC 2002, (pp. 147–152). Las Palmas.
Tkaczyk, B. (2004). Re ‘cloning’, i.e. ‘reduplicative’ process. In KUL (Ed.) 36th Poznań Linguistics meetings.
Turney, P. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), (pp. 905–912). Manchester, UK: Coling 2008 Organizing Committee. Retrieved from http://aclweb.org/anthology//C/C08/C08-1114.pdf.
Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics 32(2), 379–416. Retrieved from http://aclweb.org/anthology//P/P06/P06-1040.pdf.
Turney, P. D., & Littman, M. L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning 60(1–3), 251–278. Retrieved from http://www.citebase.org/abstract?id=oai:arXiv.org:cs/0508103.
Varro, M. T. (1954). De lingua latina. Coll. Belles-lettres, Paris. Trad. J. Collart.
Veale, T., & Chen, S. (2006). Learning to extract semantic content from the orthographic structure of Chinese words. In Proceedings of the 17th Irish conference on Artificial Intelligence and Cognitive Science (AICS2006). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.6982.
Veale, T., & Li, G. (2014). Analogy as an organizational principle in the construction of large knowledge-bases. In H. Prade & G. Richard (Eds.) Computational Approaches to Analogical Reasoning: Current Trends, Studies in Computational Intelligence Vol. 548 (pp. 58–77). Berlin: Springer.
Welcomme, A. (2010). Hermann Paul et le concept d’analogie. Círculo de Lingüística Aplicada a la Comunicación (clac) 43, 49–122. Retrieved from http://pendientedemigracion.ucm.es/info/circulo/no43/welcomme.pdf.
Yang, W., Wang, H., & Lepage, Y. (2013). Using analogical associations to acquire Chinese-Japanese quasi-parallel sentences. In Proceedings of the tenth symposium on natural language processing (SNLP2013), (pp. 86–93). Phuket, Thailand.
Yvon, F., Stroppa, N., Miclet, L., & Delhay, A. (2004). Solving analogical equations on words. Rapport technique ENST2004D005, ENST.

Acknowledgments

I would like to mention the contribution of many colleagues, researchers or students in getting many of the experimental results presented here over the years. They are listed in alphabetical order below: Nicolas Auclerc, Etienne Denoual, Chooi Ling Goh, Erwan Guillerm, Juan Luo, Jin Matsuoka, Kota Matsushita, Julien Migeot, Guilhem Peralta, Kota Takeya, Hao Wang, Wei Yang.

Author information

Authors and Affiliations

Graduate School of Information, Production and Systems, Waseda University, Hibikino 2–7, Wakamatsu-Ku, Kitakyushu-Shi, Fukuoka-Ken, 808-0135, Japan
Yves Lepage

Corresponding author

Correspondence to Yves Lepage.

Editor information

Editors and Affiliations

CNRS-LIF, UMR 7279, Aix-Marseille University, City, France
Núria Gala
CNRS-LIF, UMR 7279, Aix-Marseille University and University of Mainz, Marseille, France
Reinhard Rapp
CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Gemma Bel-Enguix

About this chapter

Cite this chapter

Lepage, Y. (2015). Proportional Analogy in Written Language Data. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_10

DOI: https://doi.org/10.1007/978-3-319-08043-7_10
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)
