Graph-based extractive text summarization based on single document

Multimedia Tools and Applications

Abstract

The volume of online and offline text from sources such as legal documents, medical documents, and news articles grows daily. Manual summarization of large documents is infeasible and costly because it takes much time and effort. As a consequence, various graph-based text summarization techniques have been designed to produce thorough, well-prepared summaries of documents. The problems that remain in these techniques are data redundancy, loss of information, and poor readability. To overcome these problems, we propose a textual graph-based extractive text summarization technique, called TGETS, for extracting essential information from a single document. In the proposed approach, the nodes of the graph represent the sentences of the document, and an edge represents an association between two sentences. Summary generation is based on the sum of sentence weights and the average weight of the textual graph. The performance of the proposed approach is evaluated on the BBC news articles dataset using the ROUGE metrics (\(R_1\) and \(R_2\)). For summaries of 100-200 words, the proposed approach outperforms the existing PageRank (PR) method on \(R_1\) by 19.88%, 38.76%, and 30.73% in precision, recall, and \(F_1\)-score, respectively. Similarly, for \(R_2\), it exceeds the PR method by 32%, 26.99%, and 29.01% in precision, recall, and \(F_1\)-score, respectively.
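The abstract describes the selection rule only at a high level (sentences as nodes, pairwise associations as edges, and a summary chosen from sentence weights and the graph's average weight). The sketch below illustrates one way such a rule could look. It is not the TGETS algorithm itself: the Jaccard word overlap used as the edge weight, the naive sentence splitter, and the "keep sentences at or above the average weight" threshold are all assumptions made for illustration, and the function names are hypothetical.

```python
import re
from itertools import combinations


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Lowercased set of alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    # Assumed edge weight: word overlap between two sentences.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(text):
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [tokens(s) for s in sentences]
    # Sentence weight = sum of the weights of all edges touching that node.
    weights = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        w = jaccard(bags[i], bags[j])
        weights[i] += w
        weights[j] += w
    # Assumed threshold: the average sentence weight over the whole graph.
    average = sum(weights) / len(weights)
    # Keep qualifying sentences in their original document order.
    return [s for s, w in zip(sentences, weights) if w >= average]
```

Calling `summarize` on a short text returns the sentences that are most strongly connected to the rest of the document, in document order. A different association measure (for example cosine similarity over TF-IDF vectors) could be substituted for `jaccard` without changing the rest of the sketch.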

Data availability

The dataset is available on demand.

Author information

Corresponding author

Correspondence to Avaneesh Kumar Yadav.

Ethics declarations

Conflict of interest

There is no potential for a conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Text document and reference summary of BBC news articles

| Sentences | Sentences of the text document |
|---|---|
| \(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
| \(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members. |
| \(S_3\) | Sally Grindley’s Spilled Water, about a Chinese girl sold as a servant, was top in vote of readers aged nine to 11. |
| \(S_4\) | Biscuit Bear by Mini Grey took the top award in the under-five category. |
| \(S_5\) | Winners were voted for by about 6,000 children from a shortlist picked by an adult panel. |
| \(S_6\) | The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award. |
| \(S_7\) | Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together. |
| \(S_8\) | As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members. |
| \(S_9\) | Julia Eccleshare, chair of the adult judging panel, said children’s literature had “never looked stronger” in the prize’s 20 years. |
| \(S_{10}\) | “This award counts because the final choice of winners is made by children, who are the toughest critics of all,” she said. |
| \(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |
| \(S_{12}\) | Previous winners have included JK Rowling, Jacqueline Wilson and Dick King-Smith. |

| Sentences | Reference summary of text document |
|---|---|
| \(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
| \(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members. |
| \(S_6\) | The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award. |
| \(S_8\) | As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members. |
| \(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |

Appendix B: Sample of system generated summary for BBC-news articles dataset from Appendix A text document

| Sentences | System generated summary |
|---|---|
| \(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
| \(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members. |
| \(S_5\) | Winners were voted for by about 6,000 children from a shortlist picked by an adult panel. |
| \(S_7\) | Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together. |
| \(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |
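A system summary like the one above is scored against the reference summary with ROUGE-N precision, recall, and \(F_1\). The following is a minimal sketch of the standard clipped n-gram overlap definition, with \(n = 1\) for \(R_1\) and \(n = 2\) for \(R_2\). The tokenizer is an assumption, and the paper's exact ROUGE settings (stemming, stop-word removal, and so on) are not stated in this excerpt, so scores from this sketch should not be expected to match the reported numbers.

```python
import re
from collections import Counter


def ngrams(text, n):
    # Lowercased alphanumeric tokens (assumed tokenizer), counted as n-gram tuples.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n(system, reference, n):
    sys_counts = ngrams(system, n)
    ref_counts = ngrams(reference, n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both texts.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / sys_total if sys_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

To score the example in the appendices, the sentences of the system summary and of the reference summary would each be joined into one string and passed as `system` and `reference`.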

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yadav, A.K., Ranvijay, Yadav, R.S. et al. Graph-based extractive text summarization based on single document. Multimed Tools Appl 83, 18987–19013 (2024). https://doi.org/10.1007/s11042-023-16199-8

  • DOI: https://doi.org/10.1007/s11042-023-16199-8
