pubmed.mineR: An R package with text-mining algorithms to analyse PubMed abstracts

Rani, Jyoti; Shah, Ab Rauf; Ramachandran, Srinivasan

doi:10.1007/s12038-015-9552-2

Published: 29 September 2015

Journal of Biosciences, volume 40, pages 671–682 (2015)

Abstract

The PubMed literature database is a valuable source of information for scientific research, holding more than 24 million biomedical citations. Mining such a voluminous literature is challenging. Several text-mining algorithms focused on data visualization have been developed in recent years, but they are limited in speed, are rigid, and are not available as open source. We have developed an R package, pubmed.mineR, that combines the advantages of existing algorithms, overcomes their limitations, offers user flexibility, and links with other packages in Bioconductor and the Comprehensive R Archive Network (CRAN) so that users can carry out multifaceted analyses. Three case studies, ‘Evolving role of diabetes educators’, ‘Cancer risk assessment’ and ‘Dynamic concepts on disease and comorbidity’, illustrate the use of pubmed.mineR. The package generally runs fast, with small elapsed times on regular workstations, even on large corpora and with compute-intensive functions. pubmed.mineR is available at http://cran.r-project.org/web/packages/pubmed.mineR .

Acknowledgements

The authors thank Smriti Sharma and Inna Mittal for using the package and providing valuable feedback on text-mining and for suggesting improvements to the algorithms.

This work was supported by the Council of Scientific and Industrial Research (grant BSC0122).

Author information

Authors and Affiliations

GN Ramachandran Knowledge Centre for Genome Informatics, CSIR–Institute of Genomics and Integrative Biology, New Delhi, 110 025, India
Jyoti Rani, Ab Rauf Shah & Srinivasan Ramachandran

Corresponding author

Correspondence to Srinivasan Ramachandran.

Additional information

Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/oct2015/supp/Rani.pdf

[Rani J, Shah AR and Ramachandran S 2015 pubmed.mineR: An R package with text-mining algorithms to analyse PubMed abstracts. J. Biosci.] DOI 10.1007/s12038-015-9552-2

Electronic supplementary material

ESM 1 (PDF 3.94 MB)

Glossary

Association: The ‘closeness’ of the relationship between a pair of terms.
Concept: A word or phrase referring to how something works. Examples: diabetes education, self-management, depigmentation, autoimmune.
Corpus: A collection of documents (plural: corpora).
Document summarization: Producing a short summary of a document that includes its most important parts, such as a brief introduction and conclusion.
Pre-processing: The process of preparing text for analysis by mathematical approaches or other search and display utilities. Examples: word tokenization, sentence tokenization.
Term: A word or phrase with an exact meaning. Examples: patient, vitiligo, diabetes educator.
Term-document matrix: A numerical matrix in which terms are in rows, documents are in columns, and each cell contains the frequency of occurrence of a term in a document.
Text classification: Classifying documents under defined terms.
Themes: Subjects, usually defined by terms and preferably non-overlapping.
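
The pre-processing and term-document matrix entries above can be illustrated with a small sketch. This is not the pubmed.mineR implementation, and it is written in Python rather than R purely for illustration. The function names (sentence_tokenize, word_tokenize, term_document_matrix) and the two-document toy corpus are assumptions made for this example, and the tokenizers are deliberately naive.

```python
import re
from collections import Counter


def sentence_tokenize(text):
    """Naive sentence tokenization: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_tokenize(text):
    """Naive word tokenization: lowercase, keep runs of letters only.

    Hyphenated words such as 'self-management' are split into two tokens.
    """
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (terms, matrix) with terms in rows and documents in columns.

    matrix[i][j] is the frequency of terms[i] in documents[j].
    """
    counts = [Counter(word_tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


# Hypothetical toy corpus of two "abstracts" (illustrative only).
corpus = [
    "Diabetes educators support the patient. Education improves self-management.",
    "Vitiligo causes depigmentation. The patient may have diabetes.",
]

terms, tdm = term_document_matrix(corpus)
for term, row in zip(terms, tdm):
    print(f"{term:15s} {row}")
```

Each printed row is one term followed by its counts in the first and second document, so a term that occurs in only one document has a zero in the other column. A real analysis would typically also remove stop words and handle multi-word terms such as "diabetes educator", which this sketch does not attempt.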

About this article

Cite this article

Rani, J., Shah, A.R. & Ramachandran, S. pubmed.mineR: An R package with text-mining algorithms to analyse PubMed abstracts. J Biosci 40, 671–682 (2015). https://doi.org/10.1007/s12038-015-9552-2

Issue Date: October 2015