Automatic Recognition of Czech Derivational Prefixes

  • Alfonso Medina Urrea
  • Jaroslava Hlaváčová
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3406)


This paper describes the application of a method for the automatic, unsupervised recognition of derivational prefixes of Czech words. The technique combines two statistical measures: Entropy and the Economy Principle. The data were taken from the list of almost 170 000 lemmas of the Czech National Corpus.
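The abstract names an entropy measure without giving its formula. As a rough illustration only, the sketch below computes the Shannon entropy of the letter that follows a candidate prefix across a lemma list; a high value suggests that many different continuations follow the candidate, which is one common signal of a morpheme boundary. The function name `successor_entropy`, the choice of next-letter distributions, and the toy lemma list are assumptions made for this example and are not taken from the paper. The Economy Principle is not sketched, since the abstract does not define it.

```python
import math
from collections import Counter


def successor_entropy(lemmas, prefix):
    """Shannon entropy (in bits) of the letter following `prefix`.

    Hypothetical helper: the paper's actual entropy measure may be
    defined over a different unit (for example, whole word remainders).
    """
    # Count which letter comes right after the candidate prefix.
    counts = Counter(
        word[len(prefix)]
        for word in lemmas
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0  # prefix never occurs with a continuation
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy lemma list, invented for illustration (not data from the paper).
toy_lemmas = ["nadepsat", "nadhodit", "nadmout", "nadnést"]

for candidate in ("na", "nad"):
    print(candidate, successor_entropy(toy_lemmas, candidate))
```

In this toy list every word continues "na" with the same letter, while "nad" is followed by four different letters, so the longer candidate scores higher. A real run would use the full lemma list and combine the score with the second measure.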


Keywords: Entropy Measurement; Economy Principle; Automatic Recognition; Word Segmentation; Good Segmentation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Alfonso Medina Urrea (1)
  • Jaroslava Hlaváčová (2)
  1. GIL IINGEN UNAM, Coyoacán, DF, Mexico
  2. ÚFAL MFF UK, Praha, Czech Republic
