Multi-Document Automatic Summarization Based on the Hierarchical Topics

  • Yong-Dong Xu
  • Fang Xu
  • Guang-Ri Quan
  • Ya-Dong Wang
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 135)


The concept of hierarchical topics is proposed for the multi-document automatic summarization task; it uses a multi-layer topic tree to represent the text set. Each node in the topic tree represents a specific topic and contains multiple similar sentences from the text set. The structure can accurately describe the similarity between sentences at different levels of granularity. It therefore reflects the real content of the text set better than a single-layer topic set, and it can be used to find the important sentences in the important topics, which compose the summary of the text set. Concretely, a series of algorithms is proposed, covering tree building, tree-based key sentence extraction, and summary generation. The summarization system is evaluated in several sets of experiments and shows good results.
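As a rough illustration of the pipeline the abstract outlines (tree building, key sentence extraction, summary generation), the following minimal Python sketch may help. It is not the authors' algorithm: the bag-of-words cosine similarity, the greedy single-pass clustering, the thresholds, the ranking of topics by size, and all names (`TopicNode`, `build_topic_tree`, `key_sentence`, `summarize`) are assumptions made only for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


def bag_of_words(sentence):
    """Lower-cased word counts (assumed stand-in for a real sentence model)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class TopicNode:
    sentences: list  # indices of the sentences grouped under this topic
    children: list = field(default_factory=list)


def build_topic_tree(sentences, thresholds=(0.2, 0.5)):
    """Split sentences into topics, using a stricter threshold at each level.

    The thresholds are arbitrary illustrative values, not taken from the paper.
    """
    vectors = [bag_of_words(s) for s in sentences]

    def split(indices, level):
        node = TopicNode(sentences=list(indices))
        if level == len(thresholds) or len(indices) < 2:
            return node
        clusters = []
        for i in indices:  # greedy single-pass clustering (assumption)
            for cluster in clusters:
                if cosine(vectors[i], vectors[cluster[0]]) >= thresholds[level]:
                    cluster.append(i)
                    break
            else:
                clusters.append([i])
        node.children = [split(c, level + 1) for c in clusters]
        return node

    return split(range(len(sentences)), 0), vectors


def key_sentence(node, vectors):
    """Index of the sentence most similar, in total, to its topic's sentences."""
    def centrality(i):
        return sum(cosine(vectors[i], vectors[j]) for j in node.sentences)
    return max(node.sentences, key=centrality)


def summarize(sentences, max_sentences=2):
    """Take one key sentence from each of the largest top-level topics."""
    root, vectors = build_topic_tree(sentences)
    topics = sorted(
        root.children or [root], key=lambda n: len(n.sentences), reverse=True
    )
    picked = [key_sentence(t, vectors) for t in topics[:max_sentences]]
    return [sentences[i] for i in sorted(picked)]  # keep original order
```

Calling `summarize(list_of_sentences, max_sentences=3)` returns up to three sentences in their original order. Treating topic size as a proxy for topic importance is a simplification chosen for the sketch; the paper's own importance measure is not given in the abstract.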


Keywords: Key sentences extraction; Multiple document summarization; Hierarchical topic



This work is supported by the China National Natural Science Foundation (60803092), the Promotive Research Fund for Excellent Young and Middle-aged Scientists of Shandong Province (2010BSA10014), and the WeiHai City Science & Technology Fund Planning Project (2010-3-96).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Yong-Dong Xu (1)
  • Fang Xu (1)
  • Guang-Ri Quan (1)
  • Ya-Dong Wang (2)
  1. Harbin Institute of Technology at WeiHai, Wei Hai, China
  2. Harbin Institute of Technology, Harbin, China
