Abstract
Named Entity Recognition (NER) from biomedical literature is crucial to biomedical knowledge base automation. In this paper, both empirical rule-based and statistical approaches to protein entity recognition are presented and investigated on a general corpus, GENIA 3.02p, and a new domain-specific corpus, SRC. Experimental results show that the rules derived from SRC are useful even though they are simpler and more general than those used by other rule-based approaches. In addition, a concise HMM-based model with a rich set of features is presented and shown to be robust and competitive when compared with other successful hybrid models. Finally, the resolution of coordination variants, which are common in entity recognition, is addressed. The presented resolver, which applies heuristic rules and a clustering strategy, is shown to be feasible.
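The abstract does not give the model's equations. As a rough illustration of what HMM-based tagging involves, the following is a minimal sketch of first-order Viterbi decoding over a hypothetical BIO tag set (B-PROT, I-PROT, O). The tag names, the toy probabilities, and the unknown-word floor `unk` are assumptions made for illustration only; this is a generic word-level HMM decoder, not the authors' feature-rich model, whose features are not described here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set for protein mentions (an assumption, not from the paper).
TAGS = ("B-PROT", "I-PROT", "O")


def _log(p: float) -> float:
    """Log-probability, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens: Sequence[str],
            start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            emit: Dict[Tuple[str, str], float],
            unk: float = 1e-6) -> List[str]:
    """Return the most probable tag sequence under a first-order HMM.

    start[t]        : P(first tag = t)
    trans[(s, t)]   : P(tag t | previous tag s); missing pairs count as 0
    emit[(t, word)] : P(word | tag t); unseen pairs fall back to `unk`
    """
    if not tokens:
        return []
    # score[t] = best log-probability of any tag path ending in t at the current token
    score = {t: _log(start.get(t, 0.0)) + _log(emit.get((t, tokens[0]), unk))
             for t in TAGS}
    back: List[Dict[str, str]] = []  # back[i][t] = best previous tag for t at token i+1
    for tok in tokens[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda s: score[s] + _log(trans.get((s, t), 0.0)))
            new_score[t] = (score[best_prev] + _log(trans.get((best_prev, t), 0.0))
                            + _log(emit.get((t, tok), unk)))
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers to the first token.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (hypothetical). I-PROT may only follow B-PROT or I-PROT.
START = {"B-PROT": 0.3, "O": 0.7}
TRANS = {
    ("O", "O"): 0.7, ("O", "B-PROT"): 0.3,
    ("B-PROT", "I-PROT"): 0.5, ("B-PROT", "O"): 0.5,
    ("I-PROT", "I-PROT"): 0.3, ("I-PROT", "O"): 0.7,
}
EMIT = {
    ("O", "the"): 0.3, ("O", "binds"): 0.2, ("O", "DNA"): 0.1,
    ("B-PROT", "p53"): 0.5, ("I-PROT", "protein"): 0.4,
}

if __name__ == "__main__":
    print(viterbi(["the", "p53", "protein", "binds", "DNA"], START, TRANS, EMIT))
    # → ['O', 'B-PROT', 'I-PROT', 'O', 'O']
```

Working in log space avoids numerical underflow on long sentences, and the zero transition probability from O to I-PROT keeps the decoder from producing an entity that starts with an inside tag.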
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, T., Shih, PK. (2005). Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_6
DOI: https://doi.org/10.1007/11428817_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science, Computer Science (R0)