Clues to Compare Languages for Morphosyntactic Analysis: A Study Run on Parallel Corpora and Morphosyntactic Lexicons

  • Helena Blancafort
  • Claude de Loupy
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)


The aim of the present work is to find clues for comparing how difficult five languages are for morphosyntactic analysis and for the development of lexicographic resources, by running a comparative corpus and lexical study on multilingual parallel corpora and morphosyntactic lexicons. First, we ran corpus-based experiments without any other type of knowledge, following classical measures used in lexical statistics. Then we carried out further experiments on the corpora using morphosyntactic lexicons. Finally, we plotted diagrams that combine the different clues to offer an overview of the difficulty of a language for the development of morphosyntactic resources.
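The abstract does not say which lexical-statistics measures were used. As a minimal sketch, assuming the usual candidates (type/token ratio, hapax legomena count, and a vocabulary growth curve), the corpus-only step could look like the following. The function names and the sample sentence are hypothetical and do not come from the paper.

```python
from collections import Counter


def lexical_stats(tokens):
    """Return basic lexical-statistics measures for a list of tokens."""
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Hapax legomena: word forms that occur exactly once.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapax_legomena": hapaxes,
    }


def vocabulary_growth(tokens, step=1):
    """Return (tokens seen, types seen) pairs sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


# Hypothetical toy input; a real study would tokenize each side of a parallel corpus.
sample = "the cat sat on the mat and the dog sat too".split()
stats = lexical_stats(sample)
growth = vocabulary_growth(sample, step=4)
```

Running the same functions on each language side of a parallel corpus gives figures that can be compared across languages, since the texts carry the same content; a steeper growth curve would suggest a richer inventory of word forms.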


morphosyntactic lexicons; multilingual resources; lexical statistics; language typology for NLP





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Helena Blancafort (1, 2)
  • Claude de Loupy (1)
  1. Syllabs, Paris, France
  2. Universitat Pompeu Fabra, Barcelona, Spain
