TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization

  • Conference paper
Web-Age Information Management (WAIM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Abstract

Multi-document summarization attempts to select the most important information from a collection of documents and generate a compressed summary. From the perspective of data reconstruction, a good summary should also reconstruct the original documents well. A document generally contains a variety of information centered on a main topic and covers different aspects of that topic. In this paper we propose a novel model, named TopicDSDR, that combines data reconstruction and topic decomposition to summarize documents; it aims not only to best reconstruct the original documents but also to capture their semantic similarity and main topics. We discuss two kinds of reconstruction: linear reconstruction and nonnegative reconstruction. We use the generalized Kullback-Leibler (KL) divergence as the loss function to evaluate summary quality for both linear and nonnegative reconstruction, and we develop a new algorithm for each. We conduct experiments on the DUC2006 and DUC2007 summarization data sets; the experimental results demonstrate the effectiveness of the proposed methods.
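To make the reconstruction loss concrete, here is a minimal sketch of the standard generalized KL divergence, D(p || q) = Σ_i (p_i log(p_i / q_i) − p_i + q_i), applied to a document vector and a linear combination of summary sentence vectors. This is the textbook definition only; the exact objective and optimization used by TopicDSDR are not given in the abstract. The function names (`generalized_kl`, `reconstruct`), the toy term-frequency vectors, and the weights are all hypothetical and chosen for illustration.

```python
import math


def generalized_kl(p, q):
    """Generalized KL divergence: sum_i (p_i * log(p_i / q_i) - p_i + q_i)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi < 0 or qi < 0:
            raise ValueError("entries must be nonnegative")
        if pi > 0:
            if qi == 0:
                # A positive entry approximated by zero gives infinite loss.
                return math.inf
            total += pi * math.log(pi / qi)
        # Convention 0 * log(0 / q) = 0: a zero entry in p contributes only q_i.
        total += qi - pi
    return total


def reconstruct(weights, sentences):
    """Linear combination of sentence vectors: sum_j weights[j] * sentences[j]."""
    dim = len(sentences[0])
    return [sum(w * s[k] for w, s in zip(weights, sentences)) for k in range(dim)]


# Toy term-frequency vectors (hypothetical data, for illustration only).
document = [2.0, 1.0, 0.0, 1.0]
summary_sentences = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0]]

# Nonnegative weights, as in a nonnegative reconstruction setting.
approx = reconstruct([1.0, 1.0], summary_sentences)
loss = generalized_kl(document, approx)
print(approx, loss)
```

In this toy case the two sentences reconstruct the document exactly, so the loss is zero; a worse choice of sentences or weights would yield a strictly positive value, which is what a reconstruction-based summarizer would try to minimize.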

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Z., Li, H., Huang, L. (2013). TopicDSDR: Combining Topic Decomposition and Data Reconstruction for Summarization. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_35

  • DOI: https://doi.org/10.1007/978-3-642-38562-9_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38561-2

  • Online ISBN: 978-3-642-38562-9

  • eBook Packages: Computer Science, Computer Science (R0)