Frequent item-set mining and clustering based ranked biomedical text summarization

Gupta, Supriya; Sharaff, Aakanksha; Nagwani, Naresh Kumar

doi:10.1007/s11227-022-04578-1

Published: 04 July 2022

Volume 79, pages 139–159, (2023)

The Journal of Supercomputing

Abstract

The difficulty of deriving value from the vast scientific literature in a condensed form led us to look for an efficient theme-based summarization solution that preserves precise biomedical content. The study analyzes the impact of combining semantic biomedical concept extraction, frequent item-set mining and clustering techniques on information retention, objective functions and ROUGE values of the final summary. The proposed frequent item-set mining and clustering (FRI-CL) graph-based framework uses the UMLS Metathesaurus and BERT-based semantic embeddings to identify domain-relevant concepts. The extracted concepts are mined according to their frequency and their relationship with neighbors via an amended FP-Growth model. The framework uses S-DPMM clustering, a probabilistic mixture model that helps identify and group complex relevant patterns to increase coverage of important sub-themes. Sentences containing the frequent concepts are scored via PageRank to form an efficient and compelling summary. Experiments on 100 sample biomedical documents taken from PubMed archives are evaluated by ROUGE scores, coverage, readability, non-redundancy, memory utilization and information retention of the summary output. The FRI-CL summarization system shows a 10% improvement in ROUGE performance and is on par with the other baseline methods. On average, a 30–40% improvement in memory utilization is observed, with up to 50% information retention, when experiments are performed using S-DPMM clustering. The research indicates that the fusion of semantic mapping, clustering and frequent item-set mining of biomedical concepts enhances the overall correlated information covering all sub-themes.
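The mining and ranking stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' FRI-CL implementation: it assumes each sentence has already been mapped to a set of concepts (the UMLS and BERT step is skipped), it replaces the amended FP-Growth model with brute-force subset counting, it omits S-DPMM clustering entirely, and the function names `frequent_itemsets`, `pagerank` and `summarize` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count concept subsets up to max_size and keep those meeting min_support.

    Brute-force stand-in for FP-Growth (assumption: small inputs only).
    """
    counts = Counter()
    for concepts in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(concepts), size):
                counts[frozenset(itemset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration over a weighted sentence graph (adjacency matrix)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Rank flowing into sentence i from every connected sentence j.
            rank = sum(scores[j] * weights[j][i] / out_sums[j]
                       for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentence_concepts, min_support=2, top_k=2):
    """Return indices of the top_k sentences, in document order."""
    itemsets = frequent_itemsets(sentence_concepts, min_support)
    # Represent each sentence by the frequent item-sets it contains.
    covered = [{s for s in itemsets if s <= concepts}
               for concepts in sentence_concepts]
    n = len(sentence_concepts)
    # Edge weight = number of frequent item-sets two sentences share.
    weights = [[float(len(covered[i] & covered[j])) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy input: one concept set per sentence (illustrative data, not from the paper).
sentence_concepts = [
    {"diabetes", "insulin"},
    {"diabetes", "insulin", "glucose"},
    {"exercise"},
    {"diabetes", "glucose"},
]
best = summarize(sentence_concepts, min_support=2, top_k=1)
```

In this toy input the second sentence shares the most frequent item-sets with its neighbors, so it receives the highest PageRank score, while the sentence about exercise shares none and stays at the baseline score. A clustering step, as in the described framework, would sit between mining and ranking to spread the selected sentences across sub-themes.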

Data availability

Data can be provided on request.

Acknowledgements

No Acknowledgements

Funding

No funding is involved in this work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, India
Supriya Gupta, Aakanksha Sharaff & Naresh Kumar Nagwani

Contributions

There is no authorship contribution.

Corresponding author

Correspondence to Supriya Gupta.

Ethics declarations

Conflict of interest

Conflict of interest is not applicable in this work.

Ethical approval

No human participants were involved in this work.

Human and animal rights

No violation of Human and Animal Rights is involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gupta, S., Sharaff, A. & Nagwani, N.K. Frequent item-set mining and clustering based ranked biomedical text summarization. J Supercomput 79, 139–159 (2023). https://doi.org/10.1007/s11227-022-04578-1

Accepted: 30 April 2022
Published: 04 July 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11227-022-04578-1
