Biomedical Literature Mining and Its Components

Raja, Kalpana

doi:10.1007/978-1-0716-2305-3_1

Part of the book series: Methods in Molecular Biology (MIMB, volume 2496)

Abstract

Published biomedical articles are the best source of knowledge for understanding the importance of biomedical entities, such as diseases and drugs, and their roles in different patient population groups. The volume of biomedical literature, both already available and newly published, is growing at an exponential rate with the use of large-scale experimental techniques. Manual extraction of this information is becoming extremely difficult because of the sheer number of articles. Text mining approaches have therefore attracted considerable interest within biomedicine, because they automatically extract such information from unstructured biomedical text into a more structured format. This chapter presents a text mining protocol that extracts patient population information and identifies disease and drug mentions in PubMed titles and abstracts, together with a simple information retrieval approach that returns a list of relevant documents for a user query. The protocol is useful for retrieving information on drugs for patients with a specific disease. It covers three major text mining tasks: information retrieval, information extraction, and knowledge discovery.
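The first two tasks named in the abstract can be illustrated with a minimal sketch. It is not the chapter's protocol: the three documents, the two small term dictionaries, and the helper names `tokenize`, `find_mentions`, and `rank` are all invented here for illustration, and the TF-IDF-style scoring is only one simple way to rank documents for a query. Knowledge discovery is not covered.

```python
import math
import re
from collections import Counter

# Toy documents standing in for PubMed titles/abstracts (invented for illustration).
DOCS = {
    "doc1": "Metformin improves glycemic control in elderly patients with type 2 diabetes.",
    "doc2": "Aspirin use in adults with hypertension and risk of bleeding.",
    "doc3": "Gene expression profiling of yeast under heat stress.",
}

# Tiny hand-made dictionaries (assumption); a real system would use a curated vocabulary.
DISEASES = {"type 2 diabetes", "hypertension"}
DRUGS = {"metformin", "aspirin"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_mentions(text, terms):
    """Information extraction: return the dictionary terms that occur in the text."""
    lowered = text.lower()
    return sorted(t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered))


def rank(query, docs):
    """Information retrieval: order documents by the summed TF-IDF weight of query terms."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: number of documents containing the term.
            df = sum(1 for c in counts.values() if term in c)
            if df:
                score += doc_counts[term] * math.log(1 + n_docs / df)
        if score > 0:
            scores[doc_id] = score  # documents with no matching term are dropped
    return sorted(scores, key=scores.get, reverse=True)


hits = rank("diabetes drugs", DOCS)
for doc_id in hits:
    print(doc_id, find_mentions(DOCS[doc_id], DRUGS), find_mentions(DOCS[doc_id], DISEASES))
```

Dictionary lookup misses synonyms, abbreviations, and spelling variants, so it should be read only as a placeholder for a proper named entity recognition step.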

Author information

Authors and Affiliations

Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
Kalpana Raja

Editor information

Editors and Affiliations

Morgridge Institute for Research, University of Wisconsin, Madison, WI, USA
Kalpana Raja

Cite this protocol

Raja, K. (2022). Biomedical Literature Mining and Its Components. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_1

DOI: https://doi.org/10.1007/978-1-0716-2305-3_1
Published: 18 June 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols
