UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics

Li, Lei; Zhang, Yazhao; Chi, Junqi; Huang, Zuying

doi:10.1007/978-3-319-69005-6_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10565)

Abstract

In this paper, we put forward UIDS, a new high-performing, extensible framework for extractive MultiLingual Document Summarization. Our approach treats each document in a multilingual corpus as a set of item sequences, in which each sentence is an item sequence and each item is a minimal semantic unit. We then formalize extractive summarization as a summary diversity sampling problem that considers topic diversity and redundancy at the same time. Topic diversity is captured with hierarchical topic models, redundancy is measured with similarity, and summary diversity is enhanced with Determinantal Point Processes. We then show how this method yields a framework that can compute summaries for both multilingual single-document and multi-document settings. Experiments on the MultiLing summarization task datasets demonstrate the effectiveness of our approach.
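
The abstract does not specify how the Determinantal Point Process kernel is built or how a summary is drawn from it, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a kernel of the common form L[i][j] = q_i * sim(i, j) * q_j, where the quality scores q_i are hypothetical placeholders (for instance, a topic-based relevance score) and sim is cosine similarity over bags of items. It then selects sentences with a simple greedy approximation that adds, at each step, the sentence giving the largest determinant of the selected submatrix. The function names and the example data are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bags of items (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return result


def build_kernel(sentences, quality):
    """Assumed kernel: L[i][j] = q_i * cosine(i, j) * q_j.

    `sentences` is a list of item sequences (lists of tokens);
    `quality` holds one hypothetical relevance score per sentence.
    """
    bags = [Counter(s) for s in sentences]
    n = len(bags)
    return [
        [quality[i] * cosine(bags[i], bags[j]) * quality[j] for j in range(n)]
        for i in range(n)
    ]


def greedy_dpp(kernel, k):
    """Greedily pick up to k indices, maximizing det of the selected submatrix."""
    selected = []
    while len(selected) < k:
        best, best_det = None, 1e-12  # ignore candidates with (near-)zero volume
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[kernel[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # every remaining sentence is redundant
            break
        selected.append(best)
    return selected


# Illustrative data: sentences 0 and 1 are duplicates, sentence 2 is distinct.
sentences = [
    ["topic", "models", "capture", "themes"],
    ["topic", "models", "capture", "themes"],
    ["determinantal", "processes", "favor", "diverse", "subsets"],
]
kernel = build_kernel(sentences, quality=[1.0, 0.9, 0.8])
print(greedy_dpp(kernel, 2))
```

In this toy run the duplicate sentence is skipped even though its quality score is higher than that of the third sentence, because adding it would collapse the determinant to zero. This is the sense in which a determinantal model trades relevance against redundancy.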

Acknowledgements

This work was supported by the National Social Science Foundation of China under Grant 16ZDA055; the National Natural Science Foundation of China under Grants 91546121, 71231002 and 61202247; the EU FP7 IRSES MobileCloud Project 612212; the 111 Project of China under Grant B08004; the Engineering Research Center of Information Networks, Ministry of Education; the project of Beijing Institute of Science and Technology Information; and the project of CapInfo Company Limited.

Author information

Authors and Affiliations

Center for Intelligence Science and Technology, School of Computer, Beijing University of Posts and Telecommunications, Beijing, People’s Republic of China
Lei Li, Yazhao Zhang, Junqi Chi & Zuying Huang

Corresponding author

Correspondence to Yazhao Zhang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Peking University, Beijing, China
Baobao Chang
Soochow University, Suzhou, China
Deyi Xiong

About this paper

Cite this paper

Li, L., Zhang, Y., Chi, J., Huang, Z. (2017). UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2017, CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_29

DOI: https://doi.org/10.1007/978-3-319-69005-6_29
Published: 07 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69004-9
Online ISBN: 978-3-319-69005-6
eBook Packages: Computer Science, Computer Science (R0)
