Recognition of Chemical Names in Chinese Texts
Chemical name recognition is a critical task for search and mining in scientific publications and patents. However, most research on chemical name recognition has focused on English texts, e.g., MEDLINE abstracts. This paper is concerned with the recognition of chemical substance names in Chinese text, which we treat as a sequence tagging problem under the Conditional Random Field (CRF) framework. To achieve better performance, we empirically explore several relevant parameters, including the tagging unit, the intervals of feature values, and the feature sets. We show that performance varies significantly depending on which parameters are selected. Based on our experimental data, we give a feasibility analysis for further research.
Keywords: Chinese Character, Chinese Word, Name Entity Recognition, Word Segmentation, Chinese Text
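To make the sequence tagging formulation concrete, the following is a minimal sketch of how a Chinese sentence might be prepared for a CRF tagger when the tagging unit is the character. It is an illustration under stated assumptions, not the setup used in this paper: the B/I/O label scheme, the one-character context window, the example sentence, and the helper names `bio_tags` and `char_features` are all hypothetical.

```python
from typing import Dict, List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Label each character B (begins a name), I (inside a name) or O (outside).

    The B/I/O scheme is an assumption made for this sketch.
    Each span is (start, end) with an exclusive end index.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def char_features(text: str, i: int, window: int = 1) -> Dict[str, str]:
    """Context features for character i: the characters within +/- window.

    The window size is a hypothetical parameter, not a value from the paper.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"char[{offset}]"] = text[j] if 0 <= j < len(text) else "<PAD>"
    return feats


# Illustrative sentence: "add sodium chloride solution".
# The chemical name 氯化钠 (sodium chloride) occupies characters 2 to 4.
text = "加入氯化钠溶液"
spans = [(2, 5)]

tags = bio_tags(text, spans)
features = [char_features(text, i) for i in range(len(text))]

for ch, tag in zip(text, tags):
    print(ch, tag)
```

Each character then carries one feature dictionary and one label, which is the kind of input a linear-chain CRF toolkit typically expects. Switching the tagging unit to words would mean running a word segmenter first and applying the same labeling to word tokens instead of characters.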