Query-Focused Summarization by Combining Topic Model and Affinity Propagation

  • Dewei Chen
  • Jie Tang
  • Limin Yao
  • Juanzi Li
  • Lizhu Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5446)


The goal of query-focused summarization is to extract a summary for a given query from a document collection. Although much work has been done on this problem, challenging issues remain: (1) the length of the summary must be predefined, for example by the number of word tokens or the number of sentences; (2) a query usually asks for information from several perspectives (topics), yet existing methods cannot capture the topical aspects with respect to the query. In this paper, we propose a novel approach that combines a statistical topic model with affinity propagation. Specifically, the topic model, called qLDA, simultaneously models the documents and the query. Moreover, affinity propagation automatically discovers key sentences from the document collection without predefining the length of the summary. Experimental results on the DUC05 and DUC06 data sets show that our approach is effective and that its summarization performance is better than that of the baseline methods.
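To illustrate how affinity propagation can pick key sentences without a preset summary length, here is a minimal sketch in plain Python. It is not the authors' implementation: the sentence similarity is an assumed bag-of-words cosine rather than anything derived from qLDA, the toy sentences are invented, and the names `cosine` and `affinity_propagation`, the damping factor, and the iteration count are illustrative choices. The message updates follow the standard responsibility/availability rules of affinity propagation, with the shared preference set to the median similarity.

```python
import math
from collections import Counter
from statistics import median

def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed similarity)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def affinity_propagation(S, damping=0.7, iters=200):
    """Run damped message passing on similarity matrix S (needs n >= 2).

    Returns (exemplars, labels), where labels[i] is the exemplar chosen by i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos_sum = sum(pos) - pos[k]
            for i in range(n):
                new = pos_sum if i == k else min(0.0, R[k][k] + pos_sum - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each sentence picks the candidate k that maximizes a(i,k) + r(i,k).
    labels = [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
    return sorted(set(labels)), labels

# Invented toy sentences, for illustration only.
sentences = [
    "topic models describe documents as mixtures of topics",
    "latent topics in documents are learned by topic models",
    "a topic model assigns topics to words in documents",
    "message passing selects exemplars among data points",
    "exemplars emerge from message passing between points",
    "data points exchange messages to choose exemplars",
]
vecs = [Counter(s.lower().split()) for s in sentences]
n = len(vecs)
S = [[cosine(vecs[i], vecs[k]) for k in range(n)] for i in range(n)]
# Shared preference s(k,k): the median off-diagonal similarity.
pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = pref

exemplars, labels = affinity_propagation(S)
for k in exemplars:
    print(sentences[k])  # the selected sentences form the extractive summary
```

The number of selected sentences is never passed in; it follows from the similarities and the preference value. Raising the preference tends to yield more exemplars and lowering it fewer, so the preference, rather than a fixed length, is the knob that influences summary size in this sketch.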


Topic Model, Latent Dirichlet Allocation, Affinity Propagation, Document Cluster, Latent Semantic Indexing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dewei Chen (1)
  • Jie Tang (1)
  • Limin Yao (2)
  • Juanzi Li (1)
  • Lizhu Zhou (1)

  1. Department of Computer Science and Technology, Tsinghua University, China
  2. Department of Computer Science, University of Massachusetts Amherst, USA
