Approach for Semi-automatic Construction of Anti-infective Drug Ontology Based on Entity Linking

  • Ying Shen
  • Yang Deng
  • Kaiqi Yuan
  • Li Liu
  • Yong Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10699)


Ontologies can support the interpretation of natural language. Constructing an anti-infective drug ontology requires a methodical procedure for entity discovery and entity linking. Medical synonym resources are an important part of medical natural language processing (NLP), but they suffer from problems such as low precision and low recall. In this study, an NLP approach is adopted to generate candidate entities, and an open ontology is analyzed to extract semantic relations. Six word-vector features and word-level features are selected to perform the entity linking. Synonym extraction results are compared for single features and for different combinations of features. Experiments show that the selected features achieve a precision of 86.77%, a recall of 89.03%, and an F1 score of 87.89%. Finally, the paper presents the structure of the proposed ontology and its relevant statistics.
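
The reported F1 score can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of the two; the abstract does not state the formula, and the function name `f1_score` is purely illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 86.77
reported_recall = 89.03

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

Under this assumption the result rounds to 87.89%, which agrees with the figure given in the abstract.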


Data mining, Ontology construction, Entity discovery, Entity linking



This work was financially supported by the National Natural Science Foundation of China (No. 61602013) and the Shenzhen Key Fundamental Research Projects (Grant Nos. JCYJ20160330095313861 and JCYJ20151030154330711).



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Ying Shen (1)
  • Yang Deng (1)
  • Kaiqi Yuan (1)
  • Li Liu (2)
  • Yong Liu (3)
  1. School of Electronics and Computer Engineering (SECE), PKU Shenzhen Graduate School, Shenzhen, People’s Republic of China
  2. Institute of Education, Tsinghua University, Beijing, People’s Republic of China
  3. IER Business Development Center, Shenzhen, People’s Republic of China
