Query-based multi-documents summarization using linguistic knowledge and content word expansion

  • Methodologies and Application
Soft Computing

Abstract

This paper introduces a query-based summarization method that combines semantic relations between words with their syntactic composition to extract meaningful sentences from document sets. Current statistical methods fail to capture meaning when comparing a sentence with a user query, so the extracted sentences often conflict with users’ requirements. The proposed method can improve the quality of document summaries because it avoids extracting a sentence whose similarity with the query is high but whose meaning is different. It computes semantic and syntactic similarity both between sentences and between each sentence and the query. To reduce redundancy in the summary, it uses a greedy algorithm that imposes a diversity penalty on the sentences. In addition, the method expands the words in both the query and the sentences to tackle the problem of limited information, bridging the lexical gap between semantically similar contexts that are expressed with different wording. Experimental results show that the proposed method improves on the participating systems in DUC 2006 and performs better than other existing techniques on the DUC 2005 and DUC 2006 datasets.
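
The selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the preview does not give the paper's semantic and syntactic similarity measures, so a Jaccard overlap of expanded word sets stands in for them. The `SYNONYMS` table, the `penalty` weight, and all function names are hypothetical and chosen only for illustration. The sketch shows the two ideas named above: word expansion lets differently worded texts match, and a greedy pass lowers the score of sentences that resemble one already selected.

```python
from typing import Dict, List, Set

# Hypothetical toy synonym table; a real system would draw on a lexical
# resource instead. This is an assumption made for illustration only.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
    "buy": {"purchase"},
    "purchase": {"buy"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of punctuation-free words."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def expand(words: Set[str]) -> Set[str]:
    """Add known synonyms so that differently worded texts can still match."""
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of expanded word sets (a stand-in similarity measure)."""
    wa, wb = expand(tokenize(a)), expand(tokenize(b))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def greedy_summary(sentences: List[str], query: str, k: int,
                   penalty: float = 0.7) -> List[int]:
    """Return indices of up to k sentences, in the order they were selected."""
    # Initial score: similarity of each sentence to the query.
    scores = {i: similarity(s, query) for i, s in enumerate(sentences)}
    selected: List[int] = []
    while scores and len(selected) < k:
        best = max(scores, key=scores.get)
        selected.append(best)
        del scores[best]
        # Diversity penalty: lower the score of every remaining sentence in
        # proportion to its similarity with the sentence just selected.
        for i in scores:
            scores[i] -= penalty * similarity(sentences[i], sentences[best])
    return selected


if __name__ == "__main__":
    docs = [
        "The man wants to buy a car.",
        "A man wants to purchase an automobile.",
        "Electric cars need charging stations.",
    ]
    print(greedy_summary(docs, "buy car", k=2))
```

In this toy input the second sentence paraphrases the first, so the penalty pushes it below the third sentence once the first has been chosen. Setting `penalty=0` disables the diversity step and lets the paraphrase through.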

Author information

Corresponding author

Correspondence to Asad Abdi.

Ethics declarations

Conflict of interest

On behalf of all co-authors, I hereby declare that all authors agreed to submit the article exclusively to this journal and that there is no conflict of interest regarding the publication of this article.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Abdi, A., Idris, N., Alguliyev, R.M. et al. Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput 21, 1785–1801 (2017). https://doi.org/10.1007/s00500-015-1881-4

  • DOI: https://doi.org/10.1007/s00500-015-1881-4
