TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization

Zhang, Zhiming; Li, Hongjie; Huang, Lian’en

doi:10.1007/978-3-642-38562-9_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Included in the following conference series:

International Conference on Web-Age Information Management


Abstract

Multi-document summarization attempts to select the most important information from a collection of documents and generate a compressed summary. From the perspective of data reconstruction, a good summary should also be able to reconstruct the original documents well. A document generally contains a variety of information centered on a main topic and covers different aspects of that topic. In this paper we propose a novel model, named TopicDSDR, that combines data reconstruction and topic decomposition to summarize documents; it not only best reconstructs the original documents but also captures their semantic similarity and main topics. We discuss two kinds of reconstruction: linear reconstruction and nonnegative reconstruction. We use the generalized Kullback-Leibler (KL) divergence as the loss function to evaluate summary quality for both linear and nonnegative reconstruction, and develop a new algorithm for each. We conduct experiments on the DUC2006 and DUC2007 summarization data sets, and the experimental results demonstrate the effectiveness of our proposed methods.
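As an illustration of the loss named in the abstract, the generalized KL divergence between two nonnegative vectors x and y is commonly defined as D(x || y) = sum_i (x_i log(x_i / y_i) - x_i + y_i). The sketch below is not the authors' algorithm: the term-count vectors, the fixed unit weights, and the names `generalized_kl`, `doc`, `good`, and `bad` are all assumptions made for illustration. It only shows how such a loss could score how well a set of candidate summary sentences reconstructs a document vector.

```python
import numpy as np

def generalized_kl(x, y, eps=1e-12):
    """Generalized KL divergence D(x || y) = sum(x*log(x/y) - x + y).

    Both inputs are nonnegative vectors. Entries with x_i = 0 contribute
    only y_i, following the convention 0 * log(0) = 0. `eps` (an assumed
    smoothing constant) guards against division by zero in y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ratio = np.where(x > 0, x / np.maximum(y, eps), 1.0)
    return float(np.sum(x * np.log(ratio) - x + y))

# Hypothetical toy data: term counts over a 4-word vocabulary.
doc = np.array([2.0, 1.0, 0.0, 1.0])   # document to be reconstructed
s1 = np.array([1.0, 1.0, 0.0, 0.0])    # candidate sentence 1
s2 = np.array([1.0, 0.0, 0.0, 1.0])    # candidate sentence 2
s3 = np.array([0.0, 0.0, 1.0, 0.0])    # candidate sentence 3

# Two candidate summaries, each reconstructing the document as a
# nonnegative combination (here simply a sum) of its sentences.
good = s1 + s2   # matches the document exactly
bad = s1 + s3    # misses one term and adds an unrelated one

loss_good = generalized_kl(doc, good)
loss_bad = generalized_kl(doc, bad)
```

Under these assumptions, `loss_good` is zero because the reconstruction equals the document, while `loss_bad` is much larger, so a selection procedure that minimizes this loss would prefer the first summary.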


Author information

Authors and Affiliations

Shenzhen Key Lab for Cloud Computing Technology and Applications, Peking University Shenzhen Graduate School, Shenzhen, Guangdong, P.R. China
Zhiming Zhang, Hongjie Li & Lian’en Huang

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jianyong Wang
Management Science and Information Systems Department, Rutgers, the State University of New Jersey, 1, Washington Park, 07102, Newark, NJ, USA
Hui Xiong
Department of Information Engineering, Nagoya University, 464-8601, Nagoya, Japan
Yoshiharu Ishikawa
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Jianliang Xu
School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Junfeng Zhou

Cite this paper

Zhang, Z., Li, H., Huang, L. (2013). TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_35

DOI: https://doi.org/10.1007/978-3-642-38562-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)
