Biomedical Literature Mining and Its Components

  • Protocol
Biomedical Text Mining

Part of the book series: Methods in Molecular Biology (MIMB, volume 2496)

Abstract

Published biomedical articles are a key source of knowledge about biomedical entities such as diseases and drugs and about their roles in different patient population groups. With the use of large-scale experimental techniques, the volume of available biomedical literature is growing at an exponential rate, and manual extraction of such information is becoming extremely difficult. Text mining approaches have therefore attracted much interest in biomedicine because they automatically extract this information from unstructured biomedical text into a more structured format. This chapter presents a text mining protocol that extracts patient population information and identifies disease and drug mentions in PubMed titles and abstracts, together with a simple information retrieval approach that returns a list of relevant documents for a user query. The protocol is useful for retrieving information on drugs for patients with a specific disease. It covers three major text mining tasks: information retrieval, information extraction, and knowledge discovery.
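
The abstract does not say which retrieval model the chapter uses, so the following is only a minimal sketch of one common way to rank documents for a user query: TF-IDF weighting with cosine similarity. The toy abstracts, the function names (tokenize, build_index, search), and the smoothed IDF formula are all assumptions made for illustration and are not taken from the protocol.

```python
import math
import re
from collections import Counter

# Toy stand-ins for PubMed abstracts (invented for illustration only).
ABSTRACTS = [
    "Metformin improves glycemic control in elderly patients with type 2 diabetes.",
    "Aspirin reduces the risk of stroke in patients with hypertension.",
    "Gene expression profiling of yeast under heat stress.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return one TF-IDF vector (dict of term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF (an assumed variant): common terms keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in term_freq.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Return up to top_k (document index, score) pairs, best match first."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    query_tf = Counter(t for t in tokenize(query) if t in idf)
    query_vec = {t: f * idf[t] for t, f in query_tf.items()}
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    scored = [(s, i) for s, i in scored if s > 0.0]  # drop documents sharing no query term
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, round(s, 3)) for s, i in scored[:top_k]]


for doc_id, score in search("metformin diabetes patients", ABSTRACTS):
    print(doc_id, score, ABSTRACTS[doc_id])
```

In this sketch, a query about a drug and a disease ranks the abstract that mentions both ahead of one that only shares the word "patients", and abstracts with no query term in common are left out. A real pipeline would rebuild the index only when the collection changes rather than on every query.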

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Raja, K. (2022). Biomedical Literature Mining and Its Components. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_1

  • DOI: https://doi.org/10.1007/978-1-0716-2305-3_1

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2304-6

  • Online ISBN: 978-1-0716-2305-3

  • eBook Packages: Springer Protocols
