Soft Computing, Volume 21, Issue 7, pp 1785–1801

Query-based multi-documents summarization using linguistic knowledge and content word expansion

  • Asad Abdi
  • Norisma Idris
  • Rasim M. Alguliyev
  • Ramiz M. Aliguliyev
Methodologies and Application

Abstract

This paper introduces a query-based summarization method that combines semantic relations between words with their syntactic composition to extract meaningful sentences from document sets. Current statistical methods fail to capture meaning when comparing a sentence with a user query, so the extracted sentences often conflict with users’ requirements. The proposed method can improve the quality of document summaries because it avoids extracting a sentence whose similarity with the query is high but whose meaning is different. It computes the semantic and syntactic similarity between sentences and between each sentence and the query. To reduce redundancy in the summary, it uses a greedy algorithm that imposes a diversity penalty on the sentences. In addition, it expands the words in both the query and the sentences to tackle the problem of limited information, which bridges the lexical gaps between semantically similar contexts that are expressed in different wording. The experimental results show that the proposed method improves on the performance of the participating systems in DUC 2006, and that it performs better than other existing techniques on the DUC 2005 and DUC 2006 datasets.
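The abstract gives no formulas, so the following is only a minimal sketch of the general idea: expand the words of the query and the sentences, score each sentence against the query, then pick sentences greedily while penalizing candidates that resemble sentences already chosen. Everything in it is an assumption made for illustration. Jaccard word overlap stands in for the paper's combined semantic and syntactic similarity, a hand-written synonym table stands in for whatever lexical resource the authors use, and the names `tokens`, `expand`, `jaccard`, `summarize`, and `penalty` are hypothetical.

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words, dropping edge punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def expand(words: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Add listed synonyms to a word set (toy stand-in for content word expansion)."""
    expanded = set(words)
    for w in words:
        expanded |= synonyms.get(w, set())
    return expanded


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity; NOT the paper's semantic/syntactic measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(sentences: List[str], query: str,
              synonyms: Dict[str, Set[str]],
              k: int = 2, penalty: float = 1.0) -> List[str]:
    """Greedily select up to k sentences, with a diversity penalty after each pick."""
    q = expand(tokens(query), synonyms)
    sent_words = [expand(tokens(s), synonyms) for s in sentences]
    # Initial relevance of every sentence to the (expanded) query.
    scores = [jaccard(w, q) for w in sent_words]

    selected: List[int] = []
    candidates = set(range(len(sentences)))
    while candidates and len(selected) < k:
        # Highest score wins; ties go to the earlier sentence.
        best = max(candidates, key=lambda i: (scores[i], -i))
        selected.append(best)
        candidates.remove(best)
        # Lower the score of each remaining sentence in proportion to
        # how similar it is to the sentence that was just selected.
        for i in candidates:
            overlap = jaccard(sent_words[i], sent_words[best])
            scores[i] -= penalty * overlap * scores[best]
    return [sentences[i] for i in selected]
```

With `penalty=0.0` the selection is a plain top-k ranking, so two paraphrases of the same sentence can both be chosen; a positive `penalty` pushes the second paraphrase down in favor of a sentence that adds different content. Word expansion is what lets "car" in the query match "automobile" in a sentence in this sketch.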

Keywords

  • Query-based multi-document summarization
  • Graph-based sentence ranking
  • Query expansion
  • Extractive summarization

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Asad Abdi (1)
  • Norisma Idris (1)
  • Rasim M. Alguliyev (2)
  • Ramiz M. Aliguliyev (2)
  1. Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
  2. Institute of Information Technology, Azerbaijan National Academy of Sciences, Baku, Azerbaijan
