Abstract
The volume of online and offline text data from sources such as legal documents, medical documents, and news articles is growing rapidly. Manually summarizing large documents is infeasible and costly because it demands considerable time and effort. Consequently, various graph-based text summarization techniques have been designed to produce thorough, well-formed summaries of documents. These techniques still suffer from data redundancy, loss of information, and poor readability. To overcome these problems, we propose a textual graph-based extractive text summarization technique, called TGETS, for extracting essential information from a single document. In the proposed approach, each node of the graph represents a sentence of the document, and each edge represents an association between two sentences. The summary is generated from the sum of the sentence weights and the average weight of the textual graph. The performance of the proposed approach is evaluated on the BBC news articles dataset using the ROUGE metric (\(R_1\) and \(R_2\)). For summaries of 100-200 words, the proposed approach improves on the existing PageRank (PR) method by 19.88%, 38.76%, and 30.73% in \(R_1\) precision, recall, and \(F_1\)-score, respectively. Similarly, for \(R_2\), the proposed approach exceeds the PR method by 32%, 26.99%, and 29.01% in precision, recall, and \(F_1\)-score, respectively.
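The graph construction and selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the TGETS implementation: the abstract does not define how edges are weighted or how the average is applied, so the sketch assumes that an edge weight is the number of words two sentences share, that a sentence's weight is the sum of its incident edge weights, and that a sentence is kept when its weight is at least the average sentence weight. The function names (`tokenize`, `build_graph`, `sentence_weights`, `summarize`) are hypothetical.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_graph(sentences):
    """Return {(i, j): weight} for every sentence pair sharing at least one word.

    ASSUMPTION: the edge weight is the count of shared words; the paper's
    actual association measure is not given in the abstract.
    """
    tokens = [tokenize(s) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        shared = len(tokens[i] & tokens[j])
        if shared > 0:
            edges[(i, j)] = shared
    return edges


def sentence_weights(n, edges):
    """Sum the weights of the edges incident to each of the n sentence nodes."""
    weights = [0] * n
    for (i, j), w in edges.items():
        weights[i] += w
        weights[j] += w
    return weights


def summarize(sentences):
    """Keep sentences whose weight is at least the average sentence weight.

    ASSUMPTION: this threshold rule is one plausible reading of "sum of
    sentence weight and average weight of the textual graph".
    """
    if not sentences:
        return []
    weights = sentence_weights(len(sentences), build_graph(sentences))
    average = sum(weights) / len(weights)
    # Original document order is preserved, which helps readability.
    return [s for s, w in zip(sentences, weights) if w >= average]
```

Under these assumptions, a sentence that shares no words with the rest of the document gets weight zero and is dropped whenever any other pair of sentences is connected.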
Data availability
The dataset is available on request.
Ethics declarations
Conflict of interest
There is no potential for a conflict of interest.
Appendices
Appendix A: Text document and reference summary of BBC news articles
Sentences | Sentences of the text document |
---|---|
\(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
\(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members. |
\(S_3\) | Sally Grindley’s Spilled Water, about a Chinese girl sold as a servant, was top in vote of readers aged nine to 11. |
\(S_4\) | Biscuit Bear by Mini Grey took the top award in the under-five category. |
\(S_5\) | Winners were voted for by about 6,000 children from a shortlist picked by an adult panel. |
\(S_6\) | The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award. |
\(S_7\) | Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together. |
\(S_8\) | As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members. |
\(S_9\) | Julia Eccleshare, chair of the adult judging panel, said children’s literature had "never looked stronger" in the prize’s 20 years. |
\(S_{10}\) | “This award counts because the final choice of winners is made by children, who are the toughest critics of all,” she said. |
\(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |
\(S_{12}\) | Previous winners have included JK Rowling, Jacqueline Wilson and Dick King-Smith. |
Sentences | Reference summary of text document |
---|---|
\(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
\(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six to eight year olds and won the award chosen by after-school club members. |
\(S_6\) | The prize, which is celebrating its \(20^{th}\) year, is billed as the UK’s biggest children’s book award. |
\(S_8\) | As well as the six to eight prizes, it won the 4-Children Special Award voted for by after-school club members. |
\(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |
Appendix B: Sample system-generated summary of the Appendix A text document (BBC news articles dataset)
Sentences | System generated summary |
---|---|
\(S_1\) | Young book fans have voted Fergus Crane, a story about a boy who is taken on an adventure by a flying horse, the winner of two Smarties Book Prizes. |
\(S_2\) | Paul Stewart and Chris Riddell’s book came top in the category for six- to eight-year-olds and won the award chosen by after-school club members. |
\(S_5\) | Winners were voted for by about 6,000 children from a shortlist picked by an adult panel. |
\(S_7\) | Fergus Crane includes text by Stewart and illustrations by Riddell, who also created The Edge Chronicles together. |
\(S_{11}\) | This year’s young judges chose the winners from an exceptionally strong and varied shortlist which showcases the very best in children’s books today. |
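To show how a system summary such as the one above could be scored against the reference summary in Appendix A, the following minimal Python sketch computes ROUGE-N precision, recall, and \(F_1\)-score from clipped n-gram overlap. It is an illustration only: the tokenization (lowercased alphanumeric tokens, no stemming or stop-word removal) is an assumption, and the function names `ngrams` and `rouge_n` are hypothetical rather than taken from the paper or from any ROUGE toolkit.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams of a text after lowercasing.

    ASSUMPTION: tokens are runs of letters and digits; the paper's
    preprocessing is not specified here.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two summaries."""
    sys_counts = ngrams(system, n)
    ref_counts = ngrams(reference, n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so repeated n-grams are not credited more often than they occur.
    overlap = sum((sys_counts & ref_counts).values())
    sys_total = sum(sys_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / sys_total if sys_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Calling `rouge_n(system_text, reference_text, n=1)` corresponds to \(R_1\) and `n=2` to \(R_2\), where each argument is the concatenated sentences of the respective summary. Because the system and reference summaries above share \(S_1\), \(S_2\), and \(S_{11}\), both scores would be nonzero for this document.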
About this article
Cite this article
Yadav, A.K., Ranvijay, Yadav, R.S. et al. Graph-based extractive text summarization based on single document. Multimed Tools Appl 83, 18987–19013 (2024). https://doi.org/10.1007/s11042-023-16199-8
DOI: https://doi.org/10.1007/s11042-023-16199-8