A Method of Annotating Disease Names in TCM Patents Based on Co-training

  • Na Deng
  • Xu Chen
  • Caiquan Xiong
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 96)


In the era of big data, annotated text data is a scarce resource. Annotated semantic information can serve as keywords in text analysis, mining, and intelligent retrieval, and as valuable training and test sets for machine learning. In the analysis, mining, and intelligent retrieval of Traditional Chinese Medicine (TCM) patents, disease names are an important annotation target, alongside Chinese herbal medicine names and medicine efficacy. Exploiting the characteristics of TCM patent texts and building on the co-training method in machine learning, this paper proposes a method for annotating disease names in TCM patent texts. Experiments show that the method is feasible and effective. It can also be extended to annotate other semantic information in TCM patents.
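The abstract does not spell out the algorithm, so the following is only a rough illustration of generic co-training, in which two classifiers trained on different views of the same examples take turns labeling unlabeled data for a shared pool. Everything in it is an assumption made for illustration and is not taken from the paper: the two views (tokens of the candidate phrase and tokens of its surrounding context), the simple token-count scorer, the English toy data, and the names `train`, `score`, and `co_train`.

```python
from collections import Counter


def train(labeled, view):
    """Count how often each token of the chosen view occurs with each label."""
    counts = {True: Counter(), False: Counter()}
    for views, label in labeled:
        counts[label].update(views[view])
    return counts


def score(model, tokens):
    """Positive means 'looks like a disease name'; the magnitude is a crude confidence."""
    return sum(model[True][t] - model[False][t] for t in tokens)


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Let each view's classifier move its most confident examples into the labeled pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        added = False
        for view in (0, 1):
            model = train(labeled, view)
            ranked = sorted(unlabeled,
                            key=lambda ex: abs(score(model, ex[view])),
                            reverse=True)
            for ex in ranked[:per_round]:
                s = score(model, ex[view])
                if s == 0:  # this view has no evidence, so leave the example unlabeled
                    continue
                labeled.append((ex, s > 0))
                unlabeled.remove(ex)
                added = True
        if not added:  # neither view could label anything new
            break
    return labeled, unlabeled


# Hypothetical toy data: each example is (candidate tokens, context tokens).
seed = [
    ((("chronic", "gastritis"), ("for", "treating")), True),
    ((("decoction", "method"), ("prepared", "by")), False),
]
pool = [
    (("chronic", "bronchitis"), ("for", "curing")),
    (("acute", "bronchitis"), ("for", "curing")),
    (("extraction", "method"), ("obtained", "by")),
]

result, remaining = co_train(seed, pool)
for (candidate, _context), is_disease in result:
    print(" ".join(candidate), is_disease)
```

In this toy run the candidate view has no evidence about "acute bronchitis" at first; the context view can label it only after the candidate view has added "chronic bronchitis" to the pool, which is the kind of mutual reinforcement co-training relies on.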



This work was supported by the National Key Research and Development Program of China under Grant 2017YFC1405403; the National Natural Science Foundation of China under Grant 61075059; the Philosophical and Social Sciences Research Project of Hubei Education Department under Grant 19Q054; the Green Industry Technology Leading Project (product development category) of Hubei University of Technology under Grant CPYF2017008; the Research Foundation for Advanced Talents of Hubei University of Technology under Grant BSQD12131; the Natural Science Foundation of Anhui Province under Grant 1708085MF161; and the Key Project of Natural Science Research of Universities in Anhui under Grant KJ2015A236.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science, Hubei University of Technology, Wuhan, China
  2. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan, China
