Biomedical Literature Mining, pp 33-45
Mapping of Biomedical Text to Concepts of Lexicons, Terminologies, and Ontologies
Abstract
Concept mapping is a fundamental task in biomedical text mining in which textual mentions of concepts of interest are annotated with the specific entries of lexicons, terminologies, ontologies, or databases that represent these concepts. Despite a significant amount of research, only a limited number of practical, publicly available tools perform concept mapping of user-specified biomedical text as an independent task. This chapter first presents several tools that can automatically map biomedical text to concepts from a wide range of terminological resources, then those that map to more restricted sets of these resources. The presentation is intended to guide researchers without a background in biomedical concept mapping in selecting an appropriate tool based on usability, scalability, configurability, balance between precision and recall, and the desired set of terminological resources with which to annotate the text. Only with effective automatic concept-mapping tools will systems be able to analyze the biomedical literature and other large document sets at scale, as a fundamental part of more complex text-mining tasks such as information extraction and hypothesis evaluation and generation.
Key words
Concept mapping, Concept recognition, Concept normalization, Annotation, Terminologies, Vocabularies, Ontologies