Rule-Based Protein Term Identification with Help from Automatic Species Tagging
In biomedical articles, a single term often refers to several different protein entities. For example, an arbitrary occurrence of the term p53 might denote any of thousands of proteins across a number of species. A human annotator can resolve this ambiguity relatively easily by looking at the context and, if necessary, by searching an appropriate protein database. However, this ambiguity can cause serious problems for a text mining system, which does not understand human language and hence cannot identify the protein that a term refers to. In this paper, we present a term identification system that automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in text. Unlike other solutions described in the literature, which only work on gene/protein mentions of a specific model organism, our system is able to handle protein mentions across many species by integrating a machine-learning-based species tagger. We have compared the performance of our automatic system to that of human annotators, with very promising results.
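To make the idea of species-assisted term identification concrete, the following is a minimal sketch, assuming a simple pipeline of dictionary lookup followed by filtering on the species predicted for the mention. It is not the rule set used in the paper. The lexicon, the `PROT_000x` identifiers, and the `normalize`/`identify` helpers are hypothetical placeholders rather than real database entries or the authors' code, and the species tagger is represented only by the label it would output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    identifier: str  # hypothetical placeholder ID, not a real database accession
    species: str     # species the database entry belongs to


# Hypothetical toy lexicon: normalized term -> candidate database entries.
LEXICON: dict[str, list[Candidate]] = {
    "p53": [
        Candidate("PROT_0001", "Homo sapiens"),
        Candidate("PROT_0002", "Mus musculus"),
        Candidate("PROT_0003", "Rattus norvegicus"),
    ],
}


def normalize(term: str) -> str:
    """Case-fold and drop hyphens/spaces so simple variants match, e.g. 'P-53' -> 'p53'."""
    return term.lower().replace("-", "").replace(" ", "")


def identify(term: str, tagged_species: Optional[str]) -> Optional[str]:
    """Return one identifier if the rules resolve the mention, else None.

    tagged_species stands in for the output of a species tagger (assumed here).
    """
    candidates = LEXICON.get(normalize(term), [])
    # Species evidence is only needed when the term maps to several entries.
    if len(candidates) > 1 and tagged_species is not None:
        candidates = [c for c in candidates if c.species == tagged_species]
    # Assign an identifier only when exactly one candidate survives.
    if len(candidates) == 1:
        return candidates[0].identifier
    return None  # unknown term or still ambiguous: defer to other rules or a human
```

For instance, `identify("p53", "Mus musculus")` returns `"PROT_0002"`, while `identify("p53", None)` returns `None` because, without species evidence, the three candidates cannot be told apart. This is exactly the gap that a species tagger is meant to fill.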