Abstract
Multi-document summarization aims to select the most important information from a collection of documents and generate a compressed summary. From the perspective of data reconstruction, a good summary should also be able to reconstruct the original documents well. A document generally contains a variety of information centered on a main topic, covering different aspects of that topic. In this paper, we propose TopicDSDR, a novel summarization model that combines data reconstruction and topic decomposition. It not only reconstructs the original documents as well as possible but also captures semantic similarity and the main topics. We discuss two kinds of reconstruction: linear reconstruction and nonnegative reconstruction. We use the generalized Kullback-Leibler (KL) divergence as the loss function for evaluating summary quality under both kinds of reconstruction, and develop a new algorithm for each. Experiments on the DUC2006 and DUC2007 summarization data sets demonstrate the effectiveness of the proposed methods.
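The abstract does not spell out the optimization, so what follows is only a minimal illustrative sketch of nonnegative reconstruction under the generalized KL divergence, D(V || R) = sum_ij (V_ij log(V_ij / R_ij) - V_ij + R_ij). It assumes a term-by-sentence count matrix V and a greedy selection strategy. At each step it adds the sentence whose inclusion lets all sentence vectors be reconstructed best as nonnegative combinations of the selected ones. The coefficients are fitted with Lee and Seung's multiplicative updates for KL-based NMF, holding the basis fixed to the selected sentence columns. The function names (generalized_kl, fit_coefficients, greedy_summary) and the toy data are hypothetical. The topic-decomposition part of TopicDSDR is omitted entirely, so this is not the authors' algorithm.

```python
import numpy as np

EPS = 1e-10  # guards against log(0) and division by zero


def generalized_kl(V, R):
    """Generalized KL divergence D(V || R) = sum(V*log(V/R) - V + R).

    Entries with V == 0 contribute only R, because V*log(...) is 0 there.
    """
    V = np.asarray(V, dtype=float)
    R = np.asarray(R, dtype=float)
    return float(np.sum(V * np.log((V + EPS) / (R + EPS)) - V + R))


def fit_coefficients(V, S, n_iter=200):
    """Fit nonnegative A minimizing D(V || S @ A), with S held fixed.

    V: term x sentence matrix; S: term x k matrix of selected sentences.
    Uses the Lee-Seung multiplicative update for the KL objective,
    which keeps A nonnegative if it starts nonnegative.
    """
    rng = np.random.default_rng(0)  # fixed seed for reproducibility
    A = rng.random((S.shape[1], V.shape[1])) + 0.1
    col_sums = S.sum(axis=0)[:, None] + EPS  # sum_i S_ia, shape (k, 1)
    for _ in range(n_iter):
        R = S @ A + EPS
        A *= (S.T @ (V / R)) / col_sums
    return A


def greedy_summary(V, k, n_iter=200):
    """Greedily pick k sentence columns of V that best reconstruct
    all columns of V under the generalized KL divergence."""
    V = np.asarray(V, dtype=float)
    k = min(k, V.shape[1])
    selected = []
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in range(V.shape[1]):
            if j in selected:
                continue
            S = V[:, selected + [j]]
            A = fit_coefficients(V, S, n_iter)
            loss = generalized_kl(V, S @ A)
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Hypothetical 4-term x 4-sentence count matrix: sentences 0 and 1
    # share one topic, sentences 2 and 3 share another.
    V = np.array([[2, 2, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 3, 3],
                  [0, 0, 1, 1]])
    print(greedy_summary(V, k=2))
```

On this toy matrix, a two-sentence budget should select one sentence from each topic. Together, those two sentences can reconstruct every column almost exactly. The greedy loop refits coefficients for every candidate, which keeps the sketch simple but is quadratic in the number of sentences. A practical system would likely use a joint optimization instead.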
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Z., Li, H., Huang, L. (2013). TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_35
DOI: https://doi.org/10.1007/978-3-642-38562-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science, Computer Science (R0)