Graph-based extractive text summarization based on single document

Yadav, Avaneesh Kumar; Ranvijay; Yadav, Rama Shankar; Maurya, Ashish Kumar

doi:10.1007/s11042-023-16199-8

Published: 25 July 2023

Volume 83, pages 18987–19013 (2024)

Multimedia Tools and Applications

Avaneesh Kumar Yadav (ORCID: orcid.org/0000-0001-9600-7490)¹, Ranvijay¹, Rama Shankar Yadav¹ & Ashish Kumar Maurya¹

Abstract

The amount of online and offline text data from sources such as legal documents, medical documents, and news articles is growing rapidly. Manually summarizing large documents is infeasible and costly because it demands considerable time and effort. Consequently, various graph-based text summarization techniques have been designed to produce thorough, well-formed summaries of documents. These techniques suffer from data redundancy, loss of information, and poor readability. To overcome these problems, we propose a textual graph-based extractive text summarization technique, called TGETS, for extracting essential information from a single document. In the proposed approach, the nodes of the graph represent the sentences of the document, and an edge represents an association between two sentences. Summary generation is based on the sum of sentence weights and the average weight of the textual graph. The performance of the proposed approach is evaluated on the BBC news articles dataset using the ROUGE metrics \(R_1\) and \(R_2\). For summaries of 100 to 200 words, the proposed approach improves on the existing PageRank (PR) method by 19.88%, 38.76%, and 30.73% in precision, recall, and \(F_1\)-score, respectively, for \(R_1\). Similarly, for \(R_2\), it exceeds the PR method by 32%, 26.99%, and 29.01% in precision, recall, and \(F_1\)-score, respectively.
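
The graph construction and selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the TGETS implementation: the abstract does not specify how the association between two sentences is measured or how the sum and the average are combined, so the Jaccard word overlap used as the edge weight, the "keep sentences at or above the average" rule, and the function names `tokenize`, `build_graph`, and `summarize` are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_graph(sentences):
    """Return a symmetric matrix of edge weights between sentence nodes.

    Assumption: the association between two sentences is their Jaccard
    word overlap; the paper may use a different measure.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = tokens[i] | tokens[j]
        if union:
            w = len(tokens[i] & tokens[j]) / len(union)
            weights[i][j] = weights[j][i] = w
    return weights


def summarize(sentences):
    """Keep, in document order, the sentences whose summed edge weight
    is at least the average summed weight over all nodes (assumed rule)."""
    weights = build_graph(sentences)
    node_weight = [sum(row) for row in weights]
    if not node_weight:
        return []
    average = sum(node_weight) / len(node_weight)
    return [s for s, w in zip(sentences, node_weight) if w >= average]
```

For example, calling `summarize` on the three sentences "the cat sat on the mat", "the cat lay on the mat", and "stocks fell sharply today" keeps the first two, because the third shares no words with the others and its summed weight falls below the average. Note that when no sentences share any words, every node has weight zero and this simple rule keeps the whole document.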

Data availability

The dataset is available on demand.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, 211004, India
Avaneesh Kumar Yadav, Ranvijay, Rama Shankar Yadav & Ashish Kumar Maurya

Corresponding author

Correspondence to Avaneesh Kumar Yadav.

Ethics declarations

Conflict of interest

There is no potential for a conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Text document and reference summary of BBC news articles

Sentences	Sentences of the text document
\(S_1\)	Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes.
\(S_2\)	Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members.
\(S_3\)	Sally Grindley’s Spilled Water, about a Chinese girl sold as a servant, was top in vote of readers aged nine to 11.
\(S_4\)	Biscuit Bear by Mini Grey took the top award in the under-five category.
\(S_5\)	Winners were voted for by about 6,000 children from a shortlist picked by an adult panel.
\(S_6\)	The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award.
\(S_7\)	Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together.
\(S_8\)	As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members.
\(S_9\)	Julia Eccleshare, chair of the adult judging panel, said children’s literature had “never looked stronger” in the prize’s 20 years.
\(S_{10}\)	“This award counts because the final choice of winners is made by children, who are the toughest critics of all,” she said.
\(S_{11}\)	This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today.
\(S_{12}\)	Previous winners have included JK Rowling, Jacqueline Wilson and Dick King-Smith.

Sentences	Reference summary of text document
\(S_1\)	Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes.
\(S_2\)	Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members.
\(S_6\)	The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award.
\(S_8\)	As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members.
\(S_{11}\)	This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today.

Appendix B: Sample system-generated summary of the Appendix A text document (BBC news articles dataset)

Sentences	System generated summary
\(S_1\)	Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes.
\(S_2\)	Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members.
\(S_5\)	Winners were voted for by about 6,000 children from a shortlist picked by an adult panel.
\(S_7\)	Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together.
\(S_{11}\)	This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today.
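
A system-generated summary such as the one above is scored against its reference summary with ROUGE. The following is a minimal sketch of ROUGE-N precision, recall, and \(F_1\)-score based on clipped n-gram overlap. The tokenization (lowercased alphanumeric words, no stemming or stop-word removal) and the function names `ngrams` and `rouge_n` are assumptions; the exact ROUGE configuration behind the reported scores is not stated here, so this sketch is not expected to reproduce them.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams of a text (lowercased alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two summaries."""
    sys_counts = ngrams(system, n)
    ref_counts = ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is credited at most as often as it appears in both.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / sys_total if sys_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

To score whole summaries, the sentences of each summary can be joined with spaces and passed as `system` and `reference`; `n=1` corresponds to \(R_1\) and `n=2` to \(R_2\).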

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yadav, A.K., Ranvijay, Yadav, R.S. et al. Graph-based extractive text summarization based on single document. Multimed Tools Appl 83, 18987–19013 (2024). https://doi.org/10.1007/s11042-023-16199-8

Received: 21 September 2022
Revised: 14 June 2023
Accepted: 04 July 2023
Published: 25 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-16199-8
