Recognition of Chemical Names in Chinese Texts

  • Nan Li
  • Jiu-ming Ji
  • Rong-ting Zheng
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 124)


Chemical name recognition is a critical task for search and mining in scientific publications and patents. However, most research on chemical name recognition has focused on English texts, e.g., MEDLINE abstracts. This paper addresses the recognition of chemical substance names in Chinese text, which we treat as a sequence tagging problem under the Conditional Random Field (CRF) framework. To achieve better performance, we empirically explore several relevant parameters, including the tagging unit, the intervals of feature values, and the feature sets. We show that performance varies significantly depending on which parameters are selected. Based on our experimental data, we give a feasibility analysis for further research.
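To make the sequence tagging formulation concrete, the following is a minimal sketch, assuming the Chinese character as the tagging unit and a B/I/O label scheme. The example sentence, the tag set, and the helper names `bio_tags`, `char_features`, and `decode` are illustrative assumptions, not details taken from the paper; a CRF toolkit would consume per-character features and labels of roughly this shape.

```python
from typing import Dict, List, Tuple


def bio_tags(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Label each character B (begins a name), I (inside a name), or O (outside).

    `spans` holds (start, end) character offsets of chemical names, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def char_features(text: str, i: int) -> Dict[str, str]:
    """Hypothetical context-window features for the character at position i."""
    return {
        "char": text[i],
        "prev": text[i - 1] if i > 0 else "<s>",
        "next": text[i + 1] if i + 1 < len(text) else "</s>",
    }


def decode(text: str, tags: List[str]) -> List[str]:
    """Recover the tagged names from a B/I/O label sequence."""
    names, start = [], None
    # A trailing "O" sentinel closes a name that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != "I":
            names.append(text[start:i])
            start = None
        if tag == "B":
            start = i
    return names


# Illustrative sentence: "Add ethanol to the beaker"; the name 乙醇 spans offsets 1..3.
sentence = "将乙醇加入烧杯"
tags = bio_tags(sentence, [(1, 3)])
features = [char_features(sentence, i) for i in range(len(sentence))]
names = decode(sentence, tags)
```

Switching the tagging unit from characters to words would only change what each position in the sequence represents; the labeling and decoding logic stays the same.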


Chinese Character; Chinese Word; Named Entity Recognition; Word Segmentation; Chinese Text





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nan Li (1)
  • Jiu-ming Ji (1)
  • Rong-ting Zheng (1)
  1. Institute of Science and Technology Information, and East China University of Science and Technology Libraries, East China University of Science and Technology, Shanghai, China
