Query-based multi-documents summarization using linguistic knowledge and content word expansion

Abstract

This paper introduces a query-based summarization method that combines semantic relations between words with their syntactic composition to extract meaningful sentences from document sets. Current statistical methods fail to capture meaning when comparing a sentence with a user query, so the extracted sentences often conflict with users’ requirements. The proposed method can improve the quality of document summaries because it avoids extracting a sentence whose similarity with the query is high but whose meaning is different. It computes semantic and syntactic similarity both between sentences and between each sentence and the query. To reduce redundancy in the summary, it uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, the method expands the words in both the query and the sentences to tackle the problem of limited information, bridging the lexical gap between semantically similar contexts that are expressed in different wording. Experimental results show that the proposed method improves performance compared with the systems that participated in DUC 2006, and that it performs better than other existing techniques on the DUC 2005 and DUC 2006 datasets.
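The abstract does not give the scoring formulas, so the following is only a minimal sketch of how the three ingredients it names (word expansion, combined semantic and syntactic similarity, and a greedy diversity penalty) could fit together. Everything here is an assumption made for illustration: the toy `SYNONYMS` table stands in for a real lexical resource, Jaccard overlap stands in for the semantic measure, pairwise word-order agreement stands in for the syntactic measure, and the weights `alpha` and `penalty` are hypothetical parameters, not values from the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy synonym table; a real system would use a lexical resource.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"}, "automobile": {"car"},
    "buy": {"purchase"}, "purchase": {"buy"},
}

def expand(words: List[str]) -> Set[str]:
    """Content word expansion: add the known synonyms of every word."""
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded

def semantic_sim(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of the expanded word sets (assumed stand-in measure)."""
    ea, eb = expand(a), expand(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0

def syntactic_sim(a: List[str], b: List[str]) -> float:
    """Fraction of shared word pairs that keep the same relative order."""
    pos_b: Dict[str, int] = {}
    for idx, w in enumerate(b):
        pos_b.setdefault(w, idx)          # first occurrence in b
    shared: List[str] = []
    for w in a:
        if w in pos_b and w not in shared:
            shared.append(w)              # shared words, in a's order
    pairs = [(x, y) for i, x in enumerate(shared) for y in shared[i + 1:]]
    if not pairs:
        return 0.0
    agree = sum(1 for x, y in pairs if pos_b[x] < pos_b[y])
    return agree / len(pairs)

def combined_sim(a: List[str], b: List[str], alpha: float = 0.5) -> float:
    """Linear mix of the two scores; alpha is a hypothetical weight."""
    return alpha * semantic_sim(a, b) + (1 - alpha) * syntactic_sim(a, b)

def greedy_select(sentences: List[List[str]], query: List[str],
                  k: int, penalty: float = 1.0) -> List[int]:
    """Greedily pick k sentence indices, penalizing similarity to earlier picks."""
    selected: List[int] = []
    candidates = list(range(len(sentences)))

    def score(i: int) -> float:
        relevance = combined_sim(sentences[i], query)
        redundancy = max(
            (combined_sim(sentences[i], sentences[j]) for j in selected),
            default=0.0,
        )
        return relevance - penalty * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In this sketch, "dog bites man" and "man bites dog" share every word, so the semantic score is 1.0, but no word pair keeps its order, so the syntactic score is 0.0 and the combined score drops to 0.5. That is the kind of case the abstract describes: high lexical similarity with a different meaning. The greedy loop similarly skips an exact duplicate of an already selected sentence in favour of a differently worded one.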

Author information

Corresponding author

Correspondence to Asad Abdi.

Ethics declarations

Conflict of interest

I hereby declare, on behalf of all co-authors, that all the authors agreed to submit the article exclusively to this journal and that there is no conflict of interest regarding the publication of this article.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Abdi, A., Idris, N., Alguliyev, R.M. et al. Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput 21, 1785–1801 (2017). https://doi.org/10.1007/s00500-015-1881-4

Keywords

  • Query-based multi-document summarization
  • Graph-based sentence ranking
  • Query expansion
  • Extractive summarization