Using Existing Biomedical Resources to Detect and Ground Terms in Biomedical Literature
We present an approach to the automatic detection of names of proteins, genes, species, etc. in biomedical literature and to their grounding to widely accepted identifiers. The annotation relies on three components: a large term list that contains the common expressions of the terms, a normalization step that matches the terms with their actual representation in the texts, and a disambiguation step that resolves the ambiguity of matched terms. We describe various characteristics of the terms found in existing term resources and of the terms that are used in biomedical texts. We evaluate our results against a corpus of manually annotated protein mentions and achieve a precision of 57% and a recall of 72%.
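The three stages named in the abstract (term list lookup, normalization, disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term list, the identifiers, the `normalize` rule, and the species-in-context disambiguation heuristic are all assumptions chosen for illustration. The `precision_recall` helper shows the standard set-based definition of the two evaluation measures reported above.

```python
import re
from collections import defaultdict

# Hypothetical toy term list: (surface form, identifier, species).
# The identifiers are invented for illustration only.
TERM_LIST = [
    ("IL-2", "PROT:0001", "human"),
    ("interleukin 2", "PROT:0001", "human"),
    ("IL-2", "PROT:0002", "mouse"),
    ("p53", "PROT:0003", "human"),
]

def normalize(text):
    # Assumed normalization: lowercase, keep only alphanumeric runs,
    # and join them with single spaces ("IL-2" -> "il 2").
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(term_list):
    # Map each normalized term to all (identifier, species) candidates.
    index = defaultdict(set)
    for term, identifier, species in term_list:
        index[normalize(term)].add((identifier, species))
    return index

def detect(text, index, max_len=4):
    # Greedy longest match over token sequences of up to max_len tokens.
    tokens = normalize(text).split()
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n])
            if key in index:
                matches.append((key, index[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches, tokens

def disambiguate(candidates, tokens):
    # Assumed heuristic: an unambiguous term keeps its identifier;
    # otherwise prefer the candidate whose species occurs in the text.
    if len(candidates) == 1:
        return next(iter(candidates))[0]
    in_context = sorted(i for i, species in candidates if species in tokens)
    return in_context[0] if len(in_context) == 1 else None

def annotate(text, index):
    matches, tokens = detect(text, index)
    return [(key, disambiguate(cands, tokens)) for key, cands in matches]

def precision_recall(predicted, gold):
    # Set-based precision and recall over annotation items.
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

index = build_index(TERM_LIST)
annotations = annotate("Mouse IL-2 binds p53.", index)
```

In this sketch an ambiguous match with no supporting context is grounded to `None` rather than guessed, which trades recall for precision; a real system would likely use richer evidence than a single species keyword.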
Keywords: Biomedical Text, Token Sequence, Term List, Disambiguation Method, Gene Normalization Task