Extrinsic Evaluation on Automatic Summarization Tasks: Testing Affixality Measurements for Statistical Word Stemming

  • Carlos-Francisco Méndez-Cruz
  • Juan-Manuel Torres-Moreno
  • Alfonso Medina-Urrea
  • Gerardo Sierra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7630)


This paper presents experiments evaluating a statistical stemming algorithm based on morphological segmentation. The method estimates the affixality of word fragments by combining three indexes associated with possible cuts. This unsupervised, language-independent method was easily adapted to produce an effective morphological stemmer. The stemmer was coupled with Cortex, an automatic summarization system, to generate summaries in English, Spanish and French, and the summaries were evaluated using ROUGE. The results of this extrinsic evaluation show that our stemming algorithm outperforms several classical systems.
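
To illustrate why stemming can matter in a ROUGE-based evaluation, the following is a minimal sketch of ROUGE-N recall (clipped n-gram overlap divided by the number of n-grams in the reference). It is a simplification and not the ROUGE package itself. The function names, the example sentences, and `toy_stem` are hypothetical; `toy_stem` is a stand-in that strips a final "s" and is not the affixality-based stemmer described above. In the paper the stemmer is coupled with the summarizer, whereas here stemming is applied at scoring time only to show how conflating word forms changes token matching.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1, stem=lambda w: w):
    """Simplified ROUGE-N recall: clipped n-gram matches / reference n-grams."""
    cand = ngrams([stem(w) for w in candidate.lower().split()], n)
    ref = ngrams([stem(w) for w in reference.lower().split()], n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def toy_stem(word):
    # Hypothetical stand-in stemmer (assumption): strips a final "s".
    # It is NOT the statistical stemmer evaluated in the paper.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


reference = "the stemmer improves summaries"
candidate = "the stemmers improve summary quality"

print(rouge_n_recall(candidate, reference))                 # no stemming
print(rouge_n_recall(candidate, reference, stem=toy_stem))  # with the toy stemmer
```

With these toy sentences, only "the" matches without stemming, while the stand-in stemmer also conflates "stemmers"/"stemmer" and "improves"/"improve", so recall rises. A real evaluation would use the ROUGE package and reference summaries written by humans.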


Automatic summarization, Affixality Measurements, Morphological Segmentation, Statistical Stemming, CORTEX





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Carlos-Francisco Méndez-Cruz (1, 4)
  • Juan-Manuel Torres-Moreno (1, 2)
  • Alfonso Medina-Urrea (3)
  • Gerardo Sierra (4)

  1. LIA-Université d’Avignon et des Pays de Vaucluse, France
  2. École Polytechnique de Montréal, Canada
  3. El Colegio de México A.C., México
  4. GIL-Instituto de Ingeniería UNAM, México
