Affix Discovery by Means of Corpora: Experiments for Spanish, Czech, Ralámuli and Chuj

Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 209)


Although the focus on morpheme discovery techniques originated within those linguistic schools which inherited from Franz Boas the concern for the unknown languages of the New World, automatic, unsupervised morphological segmentation remains a field of interest for the computational processing and engineering of natural languages, as well as for the plain exercise of getting to know them intimately.


Keywords: Word Segmentation; Word Fragment; Left Segment; Tense Marker; Minimal Distance Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer 2007

Authors and Affiliations

  1. Universidad Nacional Autónoma de México, México
