Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization

Oufaida, Houda; Blache, Philippe; Nouali, Omar

doi:10.1007/978-3-319-19581-0_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

Abstract

The multilingual summarization task aims to develop summarization systems that are fully or partly language-independent. Extractive techniques are at the center of such systems: they use statistical features to score sentences and extract the most relevant ones to form a summary within a size limit. In this paper, we investigate recently released multilingual distributed word representations combined with mRMR discriminant analysis to score terms and then sentences. We also propose a novel sentence extraction algorithm to deal with the redundancy issue. We present experimental results of our system applied to three languages (English, Arabic and French) using the TAC MultiLing 2011 dataset. Our results demonstrate that word representations enhance the summarization system, and that its MeMoG and ROUGE results are comparable to those of recent state-of-the-art systems.
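
The abstract does not spell out the extraction algorithm, so the following is only a minimal sketch of the general idea of redundancy-aware extraction under a size limit, not the authors' method. It assumes each sentence already has a vector (for example, an average of word embeddings) and a relevance score; the function names, the cosine-similarity redundancy penalty (swapped in here for the mutual-information terms of mRMR), and the word-budget parameter are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sentences(vectors, relevance, lengths, max_words):
    """Greedily pick sentence indices under a word budget (illustrative only).

    At each step, among the sentences that still fit in the budget, pick the
    one maximizing: relevance minus mean cosine similarity to the sentences
    already selected. This is an assumed max-relevance / min-redundancy
    criterion, not the scoring used in the paper.
    """
    selected, used = [], 0
    remaining = set(range(len(vectors)))
    while remaining:
        best, best_score = None, -math.inf
        for i in sorted(remaining):
            if used + lengths[i] > max_words:
                continue  # sentence would exceed the size limit
            if selected:
                redundancy = sum(
                    cosine(vectors[i], vectors[j]) for j in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # nothing left that fits
        selected.append(best)
        used += lengths[best]
        remaining.remove(best)
    return selected


# Toy input: sentences 0 and 1 are near-duplicates, sentence 2 is different.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_relevance = [0.9, 0.8, 0.5]
toy_lengths = [5, 5, 5]
summary_ids = select_sentences(toy_vectors, toy_relevance, toy_lengths, max_words=10)
```

On this toy input the second pick is the less relevant but non-redundant sentence 2 rather than the duplicate sentence 1, which is the behavior a redundancy-aware extractor is meant to show.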

Author information

Authors and Affiliations

Ecole Nationale Supérieure D’Informatique ESI, Algiers, Algeria
Houda Oufaida
Aix Marseille Université, CNRS, LPL UMR 7309, 13604, Aix En Provence, France
Philippe Blache
Centre de Recherche Sur L’Information Scientifique Et Technique CERIST, Algiers, Algeria
Omar Nouali

Corresponding author

Correspondence to Houda Oufaida.

Editor information

Editors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Chris Biemann
Universität Passau, Passau, Germany
Siegfried Handschuh
Universität Passau, Passau, Germany
André Freitas
University of Salford, Salford, United Kingdom
Farid Meziane
Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Oufaida, H., Blache, P., Nouali, O. (2015). Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_4

DOI: https://doi.org/10.1007/978-3-319-19581-0_4
Published: 04 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19580-3
Online ISBN: 978-3-319-19581-0
eBook Packages: Computer Science, Computer Science (R0)
