Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9103)

Abstract

The multilingual summarization task aims to develop summarization systems that are fully or partly language independent. Extractive techniques are at the center of such systems: they use statistical features to score and extract the most relevant sentences, forming a summary within a size limit. In this paper, we investigate recently released multilingual distributed word representations combined with mRMR discriminant analysis to score terms and then sentences. We also propose a novel sentence extraction algorithm to deal with the redundancy issue. We present experimental results of our system applied to three languages (English, Arabic and French) using the TAC MultiLing 2011 dataset. Our results demonstrate that word representations enhance the summarization system; MeMoG and ROUGE results are comparable to recent state-of-the-art systems.
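The general extractive pattern mentioned in the abstract (score sentences, then pick the best ones within a size limit while avoiding redundancy) can be illustrated with a minimal sketch. This is not the authors' extraction algorithm, which the abstract does not detail. It assumes sentence scores are already given, uses plain bag-of-words cosine similarity as a stand-in for any similarity measure, and the names `extract_summary`, `max_words` and `max_sim` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_summary(sentences, scores, max_words=250, max_sim=0.5):
    """Greedily pick high-scoring sentences, skipping near-duplicates.

    Illustrative only: `scores` are assumed to come from some upstream
    scoring step, and `max_sim` is an arbitrary redundancy threshold.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    # Visit sentences from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > max_words:
            continue  # would exceed the size limit
        if any(cosine(bags[i], bags[j]) > max_sim for j in chosen):
            continue  # too similar to a sentence already selected
        chosen.append(i)
        used += length
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `extract_summary(sentences, scores, max_words=250)` returns the selected sentences in their original order; lowering `max_sim` makes the redundancy filter stricter.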

Notes

  1. http://www.internetworldstats.com

Author information

Corresponding author

Correspondence to Houda Oufaida.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Oufaida, H., Blache, P., Nouali, O. (2015). Using Distributed Word Representations and mRMR Discriminant Analysis for Multilingual Text Summarization. In: Biemann, C., Handschuh, S., Freitas, A., Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2015. Lecture Notes in Computer Science, vol 9103. Springer, Cham. https://doi.org/10.1007/978-3-319-19581-0_4

  • DOI: https://doi.org/10.1007/978-3-319-19581-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19580-3

  • Online ISBN: 978-3-319-19581-0

  • eBook Packages: Computer Science, Computer Science (R0)
