Abstract
Published biomedical articles are the best source of knowledge for understanding the importance of biomedical entities such as diseases and drugs, and their roles in different patient population groups. The volume of biomedical literature, both already available and newly published, is increasing at an exponential rate with the use of large-scale experimental techniques. Manual extraction of this information is becoming extremely difficult because of the sheer volume of literature. As an alternative, text mining approaches have attracted much interest within biomedicine because they automatically extract such information from unstructured biomedical text into a more structured format. This chapter presents a text mining protocol that extracts patient population information and identifies disease and drug mentions in PubMed titles and abstracts, together with a simple information retrieval approach that returns a list of relevant documents for a user query. The protocol is useful for retrieving information on drugs for patients with a specific disease. It covers three major text mining tasks: information retrieval, information extraction, and knowledge discovery.
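The abstract does not say which ranking method the chapter uses for its "simple information retrieval approach", so the following is only a minimal sketch of one common choice, TF-IDF weighting with cosine similarity, written in plain Python. The function names (`tokenize`, `rank`) and the three example texts are hypothetical and are not taken from the chapter or from PubMed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Rank docs (id -> text) by TF-IDF cosine similarity to the query.

    Assumption: TF-IDF with cosine similarity stands in for the chapter's
    retrieval step, which the abstract does not specify.
    """
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

    def vector(toks):
        # Terms unseen in the collection are ignored.
        tf = Counter(t for t in toks if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vector(tokenize(query))
    scores = {d: cosine(q_vec, vector(toks)) for d, toks in tokens.items()}
    # Keep only documents that share at least one term with the query.
    hits = [(d, s) for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pair: -pair[1])


# Made-up example texts for illustration; not real PubMed records.
docs = {
    "doc1": "Metformin therapy in elderly patients with type 2 diabetes",
    "doc2": "Gene expression profiling in breast cancer cell lines",
    "doc3": "Insulin dosing for children with type 1 diabetes",
}
results = rank("metformin diabetes", docs)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With these toy texts, the document mentioning both query terms ranks first, the one mentioning only the disease ranks second, and the unrelated one is dropped. A real pipeline would run over retrieved PubMed titles and abstracts and would likely add stop-word removal and biomedical term normalization.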
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Raja, K. (2022). Biomedical Literature Mining and Its Components. In: Raja, K. (eds) Biomedical Text Mining. Methods in Molecular Biology, vol 2496. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2305-3_1
DOI: https://doi.org/10.1007/978-1-0716-2305-3_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2304-6
Online ISBN: 978-1-0716-2305-3
eBook Packages: Springer Protocols