Approach for Semi-automatic Construction of Anti-infective Drug Ontology Based on Entity Linking
Ontologies can be used for the interpretation of natural language. Constructing an anti-infective drug ontology requires a methodological step for entity discovery and linking. Medical synonym resources have been an important part of medical natural language processing (NLP). However, problems such as low precision and low recall remain. In this study, an NLP approach is adopted to generate candidate entities, and open ontology is analyzed to extract semantic relations. Six-word vector features and word-level features are selected to perform the entity linking. Synonym extraction results are studied for single features and for different combinations of features. Experiments show that the selected features achieve a precision of 86.77%, a recall of 89.03% and an F1 score of 87.89%. Finally, the paper presents the structure of the proposed ontology and its relevant statistical data.
Keywords: Data mining, Ontology construction, Entity discovery, Entity linking
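The three scores reported in the abstract can be checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic only; the function name `f1_score` is an illustrative assumption and is not taken from the paper's implementation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.8677
recall = 0.8903

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 87.89%"
```

The result agrees with the reported F1 score of 87.89%, so the three figures are mutually consistent.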
This work was financially supported by the National Natural Science Foundation of China (No. 61602013) and the Shenzhen Key Fundamental Research Projects (Grant Nos. JCYJ20160330095313861 and JCYJ20151030154330711).