Frequent item-set mining and clustering based ranked biomedical text summarization

The Journal of Supercomputing

Abstract

The difficulty of deriving value from the vast scientific literature in a condensed form led us to look for an efficient theme-based summarization solution that preserves precise biomedical content. This study analyzes the impact of combining semantic biomedical concept extraction, frequent item-set mining and clustering techniques on information retention, objective functions and ROUGE values of the final summary. The proposed frequent item-set mining and clustering (FRI-CL) graph-based framework uses the UMLS Metathesaurus and BERT-based semantic embeddings to identify domain-relevant concepts. The selected concepts are mined according to their frequency and their relationship with neighboring concepts via an amended FP-Growth model. The framework uses S-DPMM clustering, a probabilistic mixture model that helps identify and group complex relevant patterns to increase coverage of important sub-themes. Sentences containing the frequent concepts are scored via PageRank to form an efficient and compelling summary. Experiments on 100 sample biomedical documents taken from PubMed archives are evaluated by computing ROUGE scores, coverage, readability, non-redundancy, memory utilization and information retention of the summary output. The results of the FRI-CL summarization system showed a 10% improvement in ROUGE performance and are on par with the other baseline methods. On average, a 30–40% improvement in memory utilization is observed, with up to 50% information retention, when experiments are performed using S-DPMM clustering. The research indicates that fusing semantic mapping, clustering and frequent item-set mining of biomedical concepts enhances coverage of correlated information across all sub-themes.
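The mining and ranking steps named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' FRI-CL implementation: concept extraction with UMLS and BERT and the S-DPMM clustering step are left out, brute-force counting stands in for the amended FP-Growth model, and Jaccard overlap is an assumed edge weight for the sentence graph. All function names and the toy concept sets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count concept combinations up to max_size and keep the frequent ones.

    Brute-force counting, used here only as a stand-in for FP-Growth.
    """
    counts = Counter()
    for concepts in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(concepts), size):
                counts[frozenset(combo)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional
            # to the weight of edge (j, i); isolated nodes pass on nothing.
            incoming = sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def rank_sentences(transactions, min_support=2):
    """Return sentence indices from most to least central, plus the scores."""
    frequent = frequent_itemsets(transactions, min_support)
    # Keep only concepts that occur in at least one frequent itemset.
    frequent_concepts = set().union(*frequent) if frequent else set()
    reduced = [set(t) & frequent_concepts for t in transactions]
    n = len(reduced)
    weights = [
        [jaccard(reduced[i], reduced[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy input: one set of (hypothetical) extracted concepts per sentence.
sentence_concepts = [
    {"diabetes", "insulin", "glucose"},
    {"insulin", "glucose"},
    {"diabetes", "insulin"},
    {"hypertension"},
]
order, scores = rank_sentences(sentence_concepts)
print(order)
```

In this toy input the first sentence shares frequent concepts with two others and ranks highest, while the sentence whose only concept is infrequent is isolated in the graph and ranks last. A summary would then be built from the top-ranked sentences.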

Data availability

Data can be provided on request.

Acknowledgements

No Acknowledgements

Funding

No funding is involved in this work.

Author information

Contributions

There is no authorship contribution.

Corresponding author

Correspondence to Supriya Gupta.

Ethics declarations

Conflict of interest

No conflict of interest applies to this work.

Ethical approval

No human participants were involved in this work.

Human and animal rights

This work involves no violation of human or animal rights.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gupta, S., Sharaff, A. & Nagwani, N.K. Frequent item-set mining and clustering based ranked biomedical text summarization. J Supercomput 79, 139–159 (2023). https://doi.org/10.1007/s11227-022-04578-1

  • DOI: https://doi.org/10.1007/s11227-022-04578-1
