
UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2017, CCL 2017)

Abstract

In this paper, we put forward UIDS, a new high-performing, extensible framework for extractive multilingual document summarization. Our approach regards each document in a multilingual corpus as a set of item sequences, in which each sentence is an item sequence and each item is a minimal semantic unit. We then formulate extractive summarization as a summary diversity sampling problem that considers topic diversity and redundancy at the same time: topic diversity is captured with hierarchical topic models, redundancy is measured by similarity, and summary diversity is enhanced with Determinantal Point Processes. Finally, we show how this method yields a framework that can compute summaries for both multilingual single-document and multi-document summarization. Experiments on the MultiLing summarization task datasets demonstrate the effectiveness of our approach.
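To make the sampling idea concrete, here is a minimal sketch of DPP-based sentence selection, assuming a standard L-ensemble kernel L = diag(q) S diag(q), where S is cosine similarity over bags of items (modelling redundancy) and q is a per-sentence quality score. In this sketch q stands in for whatever the hierarchical topic model contributes; it is simply supplied by the caller. The names `greedy_dpp_summary`, `cosine` and `log_det` are hypothetical, the budget is counted in sentences rather than characters, and the selection uses the common greedy MAP approximation. This is not presented as the authors' exact UIDS procedure.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-items vectors (redundancy measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def log_det(m: List[List[float]]) -> float:
    """Log-determinant of a symmetric PSD matrix via Cholesky decomposition.

    Returns -inf when the matrix is (numerically) singular, i.e. when the
    chosen sentences are linearly dependent in the kernel's feature space.
    """
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    return -math.inf
                chol[i][i] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def greedy_dpp_summary(sentences: Sequence[Sequence[str]],
                       quality: Sequence[float],
                       budget: int) -> List[int]:
    """Hypothetical greedy MAP selection under a DPP with L = diag(q) S diag(q).

    sentences: each sentence as a sequence of items (minimal semantic units).
    quality:   assumed per-sentence scores, e.g. derived from a topic model.
    budget:    maximum number of sentences to extract (an assumption).
    """
    bags = [Counter(s) for s in sentences]
    n = len(bags)
    kernel = [[quality[i] * cosine(bags[i], bags[j]) * quality[j]
               for j in range(n)] for i in range(n)]
    selected: List[int] = []
    while len(selected) < min(budget, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            score = log_det(sub)  # log-probability (unnormalized) of the subset
            if score > best_score:
                best, best_score = i, score
        if best is None:  # every remaining candidate would be fully redundant
            break
        selected.append(best)
    return selected
```

For example, with `sentences = [["dpp", "sampling", "diversity"], ["dpp", "sampling", "diversity"], ["topic", "hierarchy", "model"]]`, `quality = [1.0, 0.9, 0.8]` and a budget of 2, the function returns `[0, 2]`: the exact duplicate of the first sentence makes the kernel submatrix singular, so the determinant term rejects it in favour of the lower-scored but non-redundant third sentence. This is how a DPP trades off per-sentence quality against diversity within a single objective.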



Acknowledgements

This work was supported by the National Social Science Foundation of China under Grant 16ZDA055; the National Natural Science Foundation of China under Grants 91546121, 71231002, and 61202247; the EU FP7 IRSES MobileCloud Project 612212; the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education; the project of Beijing Institute of Science and Technology Information; and the project of CapInfo Company Limited.

Author information

Corresponding author

Correspondence to Yazhao Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Li, L., Zhang, Y., Chi, J., Huang, Z. (2017). UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_29

  • DOI: https://doi.org/10.1007/978-3-319-69005-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69004-9

  • Online ISBN: 978-3-319-69005-6

  • eBook Packages: Computer Science, Computer Science (R0)
