Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization

Conference paper

DOI: 10.1007/978-3-319-19581-0_4

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9103)
Cite this paper as:
Oufaida H., Blache P., Nouali O. (2015) Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization. In: Biemann C., Handschuh S., Freitas A., Meziane F., Métais E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham

Abstract

The multilingual summarization task aims to develop summarization systems that are fully or partly language-independent. Extractive techniques are at the center of such systems: they use statistical features to score and extract the most relevant sentences to form a summary within a size limit. In this paper, we investigate recently released multilingual distributed word representations combined with mRMR discriminant analysis to score terms and then sentences. We also propose a novel sentence extraction algorithm to deal with the redundancy issue. We present experimental results of our system applied to three languages (English, Arabic and French) using the TAC MultiLing 2011 dataset. Our results demonstrate that word representations enhance the summarization system, and that its MeMoG and ROUGE results are comparable to those of recent state-of-the-art systems.
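As a rough illustration of the generic extractive pipeline described above (score sentences, then select them under a size limit while avoiding redundancy), the following Python sketch uses plain term frequency and bag-of-words cosine similarity. It is not the paper's method: it uses neither distributed word representations nor mRMR, and the names `summarize`, `max_words` and `redundancy_threshold` are assumptions made for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, max_words, redundancy_threshold=0.5):
    """Greedy extractive selection (illustrative stand-in, not the paper's algorithm).

    Sentences are scored by the average collection frequency of their
    distinct terms, then picked in descending score order while the word
    budget allows and while they are not too similar to a chosen sentence.
    """
    bags = [Counter(tokenize(s)) for s in sentences]

    # Collection-wide term frequencies serve as a simple relevance signal.
    term_freq = Counter()
    for bag in bags:
        term_freq.update(bag)

    scores = [
        sum(term_freq[t] for t in bag) / max(len(bag), 1) for bag in bags
    ]
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])

    chosen, used = [], 0
    for i in order:
        length = sum(bags[i].values())
        if used + length > max_words:
            continue  # would exceed the summary size limit
        if any(cosine(bags[i], bags[j]) > redundancy_threshold for j in chosen):
            continue  # too close to a sentence already in the summary
        chosen.append(i)
        used += length

    # Restore the original document order for readability.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat today",
        "dogs bark loudly",
    ]
    print(summarize(docs, max_words=20))
```

In this sketch the second sentence is skipped because it nearly duplicates the first, which is the kind of redundancy a dedicated extraction algorithm has to handle. Replacing the frequency score with embedding-based or mRMR-based term scores would be a separate design step that this example does not attempt.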

Keywords

Multilingual summarization; Distributed word representations; Discriminant analysis; Minimum redundancy; Maximum relevance

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Ecole Nationale Supérieure D’Informatique ESI, Algiers, Algeria
  2. Aix Marseille Université, CNRS, Aix En Provence, France
  3. Centre de Recherche Sur L’Information Scientifique Et Technique CERIST, Algiers, Algeria
