Multi-document Summarization Based on Sentence Features and Frequent Itemsets

Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 166)


Information retrieval is the process of searching for information and related knowledge within collected documents or on the web. Users are presented with vast amounts of information that suffer from redundancy and irrelevance, and searching this huge collection for the required information is a tiresome task. This has motivated researchers to produce high-quality summaries that let users quickly locate the desired information. In this paper, an attempt is made to improve the performance of summarization using sentence features such as length, position, centroid and noun, together with a new feature, the noun-verb pair. A second technique uses a modified FIS (Frequent Itemset Sequence) generation algorithm for summarization. Redundancy elimination techniques are applied to obtain an efficient summary from multiple documents. The performance of the proposed algorithms is compared with the existing MEAD summarization technique using the F-measure. Introducing the noun-verb pair feature improves summary quality compared with both the existing MEAD technique and the proposed FIS technique.
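Feature-based sentence scoring followed by redundancy elimination and F-measure evaluation can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature definitions are assumptions made for illustration: a linear position decay, a simplified frequency-based centroid standing in for MEAD's TF-IDF centroid, and a noun-verb pair counted as a noun immediately followed by a verb. The equal weights, the 0.7 cosine-similarity threshold and the sentence-level F-measure are likewise assumptions. Part-of-speech tags are assumed to come from an external tagger, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation (an assumption, not from the paper): a document is a
# list of sentences, a sentence is a non-empty list of (word, tag) pairs with tag in
# {"N", "V", "O"} (noun, verb, other), supplied by some external POS tagger.

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def feature_scores(docs, weights=(1, 1, 1, 1, 1)):
    """Score each sentence as a weighted sum of five features (assumed definitions)."""
    sents = [(d, i, s) for d, doc in enumerate(docs) for i, s in enumerate(doc)]
    # Simplified centroid: average frequency of each word over all sentences.
    freq = Counter(w for _, _, s in sents for w, _ in s)
    cent = {w: c / len(sents) for w, c in freq.items()}
    max_len = max(len(s) for _, _, s in sents)
    raw = {}
    for d, i, s in sents:
        words = [w for w, _ in s]
        tags = [t for _, t in s]
        length = len(s) / max_len
        position = 1 - i / len(docs[d])  # earlier sentences score higher
        centroid = sum(cent[w] for w in words)
        noun = tags.count("N") / len(s)
        # Noun-verb pair (assumed definition): a noun immediately followed by a verb.
        nv = sum(1 for a, b in zip(tags, tags[1:]) if (a, b) == ("N", "V"))
        raw[(d, i)] = [length, position, centroid, noun, nv]
    # Scale the centroid and noun-verb columns to [0, 1] by their maxima.
    for col in (2, 4):
        top = max(f[col] for f in raw.values()) or 1
        for f in raw.values():
            f[col] /= top
    return {k: sum(w * x for w, x in zip(weights, f)) for k, f in raw.items()}

def summarize(docs, k, threshold=0.7):
    """Greedily take the k best-scoring sentences, skipping near-duplicates."""
    scores = feature_scores(docs)
    chosen = []
    for key in sorted(scores, key=scores.get, reverse=True):
        words = [w for w, _ in docs[key[0]][key[1]]]
        if all(cosine(words, [w for w, _ in docs[c[0]][c[1]]]) < threshold
               for c in chosen):
            chosen.append(key)
        if len(chosen) == k:
            break
    return chosen

def f_measure(selected, reference):
    """Sentence-level F1 between system and reference sentence sets."""
    hits = len(set(selected) & set(reference))
    if not hits:
        return 0.0
    p, r = hits / len(selected), hits / len(reference)
    return 2 * p * r / (p + r)

# Usage on a toy, hand-tagged two-document cluster.
docs = [
    [[("market", "N"), ("rose", "V"), ("sharply", "O")],
     [("analysts", "N"), ("expect", "V"), ("growth", "N")]],
    [[("market", "N"), ("rose", "V"), ("sharply", "O")],
     [("weather", "N"), ("was", "V"), ("mild", "O"), ("today", "N")]],
]
summary = summarize(docs, k=2)
print(summary)                                # → [(0, 0), (1, 1)]
print(f_measure(summary, [(0, 0), (0, 1)]))   # → 0.5
```

In this toy run the duplicated sentence in the second document ties for the top score but is dropped by the similarity check, which is the role redundancy elimination plays in a multi-document setting.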


Multi-document summarization; Query-based summary; Generic summary; Frequent itemset





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Computer Science & Engineering, Pondicherry Engineering College, Puducherry, India
  2. Department of Information Technology, Pondicherry Engineering College, Puducherry, India
