Abstract
We present an approach to the automatic detection of names of proteins, genes, species, etc. in biomedical literature and to their grounding in widely accepted identifiers. The annotation is based on a large term list that contains the common expressions of the terms, a normalization step that matches the terms with their actual representation in the texts, and a disambiguation step that resolves the ambiguity of matched terms. We describe various characteristics of the terms found in existing term resources and of the terms that are used in biomedical texts. We evaluate our results against a corpus of manually annotated protein mentions and achieve a precision of 57% and a recall of 72%.
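The three stages named in the abstract (term list lookup, normalization, disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation: the term list, the placeholder identifiers, the normalization rule (lowercasing and dropping non-alphanumeric characters), and the disambiguation heuristic (prefer candidates whose organism is also mentioned in the text) are all assumptions chosen only to make the idea concrete.

```python
import re
from collections import defaultdict

# Hypothetical term list: (surface form, identifier, type, organism).
# All identifiers are made-up placeholders, not real database accessions.
TERMS = [
    ("IL-2", "PROT:human-il2", "protein", "human"),
    ("interleukin 2", "PROT:human-il2", "protein", "human"),
    ("IL2", "PROT:mouse-il2", "protein", "mouse"),
    ("human", "ORG:human", "species", "human"),
    ("mouse", "ORG:mouse", "species", "mouse"),
]


def normalize(s):
    """Assumed normalization: lowercase, drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", s.lower())


def build_index(terms):
    """Map each normalized surface form to the set of entries it may denote."""
    index = defaultdict(set)
    for surface, ident, etype, organism in terms:
        index[normalize(surface)].add((ident, etype, organism))
    return index


def find_candidates(text, index, max_len=3):
    """Greedy longest-match lookup of token n-grams in the normalized index."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            key = normalize(span)
            if key in index:
                matches.append((span, index[key]))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no term starts at this token
    return matches


def disambiguate(matches):
    """Assumed heuristic: keep candidates whose organism is mentioned in the text."""
    organisms = {
        org
        for _, cands in matches
        for _, etype, org in cands
        if etype == "species"
    }
    result = []
    for span, cands in matches:
        # Fall back to all candidates when no organism mention helps.
        kept = {c for c in cands if c[2] in organisms} or cands
        result.append((span, sorted(c[0] for c in kept)))
    return result


def annotate(text):
    """Run lookup, normalization, and disambiguation over one text."""
    return disambiguate(find_candidates(text, build_index(TERMS)))


if __name__ == "__main__":
    print(annotate("IL-2 secretion was measured in mouse T cells."))
```

In this sketch, "IL-2" and "IL2" collapse to the same normalized key, so a mention is initially ambiguous between two placeholder identifiers; the organism mention then narrows the choice. When the text names no organism, the mention stays ambiguous and all candidates are returned.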
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kaljurand, K., Rinaldi, F., Kappeler, T., Schneider, G. (2009). Using Existing Biomedical Resources to Detect and Ground Terms in Biomedical Literature. In: Combi, C., Shahar, Y., Abu-Hanna, A. (eds) Artificial Intelligence in Medicine. AIME 2009. Lecture Notes in Computer Science, vol 5651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02976-9_32
DOI: https://doi.org/10.1007/978-3-642-02976-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02975-2
Online ISBN: 978-3-642-02976-9
eBook Packages: Computer Science, Computer Science (R0)