Extrinsic Evaluation on Automatic Summarization Tasks: Testing Affixality Measurements for Statistical Word Stemming

Méndez-Cruz, Carlos-Francisco; Torres-Moreno, Juan-Manuel; Medina-Urrea, Alfonso; Sierra, Gerardo

doi:10.1007/978-3-642-37798-3_5


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7630)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

This paper presents experiments evaluating a statistical stemming algorithm based on morphological segmentation. The method estimates the affixality of word fragments by combining three indexes associated with possible cuts. Because it is unsupervised and language-independent, the method was easily adapted to produce an effective morphological stemmer. The stemmer was coupled with Cortex, an automatic summarization system, to generate summaries in English, Spanish and French, and the summaries were evaluated using ROUGE. The results of this extrinsic evaluation show that our stemming algorithm outperforms several classical systems.
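
As a rough illustration of the kind of score ROUGE reports, the sketch below computes a simplified ROUGE-1 recall: the fraction of reference-summary unigrams that also appear in a candidate summary, with counts clipped. It is a minimal sketch under stated assumptions, not the ROUGE package and not the evaluation setup used in the paper. The function name `rouge1_recall`, the whitespace tokenization and the lowercasing are all hypothetical choices made for this example.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall against a single reference summary.

    Assumptions (not from the paper): tokens are lowercased and split on
    whitespace, and only one reference summary is used.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        # No reference unigrams to recall; define the score as 0.0.
        return 0.0
    # Clipped overlap: a reference unigram is matched at most as many
    # times as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token])
                  for token, count in ref_counts.items())
    return overlap / total_ref


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat"
    print(rouge1_recall(candidate, reference))
```

In an extrinsic evaluation of this kind, the summarizer would be run once per stemmer, and the resulting summaries would be compared through scores such as this one.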

Author information

Authors and Affiliations

LIA-Université d’Avignon et des Pays de Vaucluse, France
Carlos-Francisco Méndez-Cruz & Juan-Manuel Torres-Moreno
École Polytechnique de Montréal, Canada
Juan-Manuel Torres-Moreno
El Colegio de México A.C., México
Alfonso Medina-Urrea
GIL-Instituto de Ingeniería UNAM, México
Carlos-Francisco Méndez-Cruz & Gerardo Sierra

Editor information

Editors and Affiliations

Mexican Petroleum Institute, Eje Central Lazaro Cardenas Norte, 152, Col. San Bartolo Atepehuacan, CP 07730, México D.F., Mexico
Ildar Batyrshin
Tecnológico de Monterrey, Campus Estado de México, Carretera Lago de Guadalupe Km 3.5, CP 52926, Atizapán de Zaragoza, Estado de México, Mexico
Miguel González Mendoza

About this paper

Cite this paper

Méndez-Cruz, CF., Torres-Moreno, JM., Medina-Urrea, A., Sierra, G. (2013). Extrinsic Evaluation on Automatic Summarization Tasks: Testing Affixality Measurements for Statistical Word Stemming. In: Batyrshin, I., Mendoza, M.G. (eds) Advances in Computational Intelligence. MICAI 2012. Lecture Notes in Computer Science, vol 7630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37798-3_5

DOI: https://doi.org/10.1007/978-3-642-37798-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37797-6
Online ISBN: 978-3-642-37798-3
eBook Packages: Computer Science, Computer Science (R0)
