Multiple Documents Summarization Based on Genetic Algorithm

  • Derong Liu
  • Yongcheng Wang
  • Chuanhan Liu
  • Zhiqi Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4223)


With the growing volume of online information, it is increasingly important to extract the core content from many information sources automatically. We propose a model for multiple-document summarization that maximizes topic coverage and minimizes content redundancy. Based on a Chinese concept lexicon and corpus, the proposed model analyzes the topic of each document, the relationships among the documents, and the central theme of the collection in order to evaluate sentences. We present different approaches for determining which sentences are appropriate for extraction, based on sentence weights and their relevance to the related documents. A genetic algorithm is designed to improve the quality of the summarization. The experimental results indicate that using a genetic algorithm is useful and effective for improving the quality of multiple-document summarization.
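The abstract does not specify the encoding or the fitness function, so the following is only a minimal sketch of how a genetic algorithm could select sentences so that total weight (coverage) is high and overlap (redundancy) is low. Everything in it is an assumption made for illustration: the names `jaccard`, `fitness`, `repair`, and `summarize`, the bit-mask encoding, the parameter values, and above all the word-overlap similarity, which stands in for the concept-lexicon-based relevance that the paper describes.

```python
import random

def jaccard(a, b):
    """Word-overlap similarity between two token sets (assumed stand-in for concept relevance)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def fitness(mask, sentences, weights, redundancy_penalty=1.0):
    """Sum of selected sentence weights minus pairwise overlap among the selected sentences."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    coverage = sum(weights[i] for i in chosen)
    redundancy = sum(
        jaccard(sentences[i], sentences[j])
        for pos, i in enumerate(chosen)
        for j in chosen[pos + 1:]
    )
    return coverage - redundancy_penalty * redundancy

def random_mask(n, k, rng):
    """Random bit mask of length n with exactly k bits set."""
    picked = set(rng.sample(range(n), k))
    return [1 if i in picked else 0 for i in range(n)]

def repair(mask, k, rng):
    """Flip random bits until exactly k sentences are selected."""
    mask = list(mask)
    ones = [i for i, bit in enumerate(mask) if bit]
    zeros = [i for i, bit in enumerate(mask) if not bit]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > k:
        mask[ones.pop()] = 0
    while len(ones) < k:
        i = zeros.pop()
        mask[i] = 1
        ones.append(i)
    return mask

def summarize(sentences, weights, k, generations=60, pop_size=30,
              mutation_rate=0.1, seed=0):
    """Return the indices of k sentences chosen by a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(sentences)

    def score(mask):
        return fitness(mask, sentences, weights)

    population = [random_mask(n, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(repair(child, k, rng))
        population = parents + children
    best = max(population, key=score)
    return [i for i, bit in enumerate(best) if bit]

# Toy input: each sentence is a set of tokens with an assumed precomputed weight.
sentences = [
    {"genetic", "algorithm", "summary"},
    {"genetic", "algorithm", "summary"},
    {"topic", "coverage"},
    {"entropy", "concept"},
]
weights = [1.0, 1.0, 0.8, 0.7]
picked = summarize(sentences, weights, k=2)
```

In this toy input the first two sentences are identical, so selecting both scores lower than pairing either of them with a distinct sentence; that trade-off between weight and overlap is what the fitness function is meant to capture.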


Genetic Algorithm, Information Entropy, Concept Space, Concept Cohesion, Theme Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Derong Liu (1, 2)
  • Yongcheng Wang (1)
  • Chuanhan Liu (1)
  • Zhiqi Wang (1)
  1. Dept. of Comp. Sci. and Engineering, Shanghai Jiao Tong University
  2. Merchant Marine College, Shanghai Maritime University
