Chinese materia medica resource images screening method study
The Chinese materia medica resource survey provides an important basis for the development of the traditional Chinese medicine (TCM) industry. During the survey, millions of images of medicinal plants are collected. The collected dataset includes some images that are unqualified for image analysis, i.e. they cannot be used to build a medicinal plant classifier. Identifying these unqualified images manually is burdensome, so screening them automatically is an important task of the survey. Image recognition techniques have developed quickly in recent years, and outlier detection offers an unsupervised way to find unqualified images automatically. Much research has been done on the topic. The extracted features and the correlation metric both strongly affect the result of outlier image detection. To improve image screening performance, this paper proposes a novel outlier detection method. A convolutional neural network (CNN) is used to extract the complex features of Chinese materia medica resource images. Extended entropy is introduced into the calculation of information loss, which is used to measure the distance between images. Based on the extracted image features and this correlation metric, a novel clustering-based outlier detection method is proposed. The efficiency of the screening method is illustrated with a practical example.
Keywords: outlier detection; Chinese materia medica resource; feature extraction; deep learning; convolutional neural network; information loss
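The pipeline summarized in the abstract (feature vectors, an information-based distance, then clustering-based outlier screening) can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes the CNN features are already available as non-negative vectors, it uses the Jensen-Shannon divergence as a stand-in for the extended-entropy information loss (which the abstract does not define), and it uses a simple greedy leader clustering. The function names, `threshold`, and `min_cluster_size` are hypothetical.

```python
import numpy as np


def to_distribution(features, eps=1e-12):
    """Clip features to be non-negative and normalize each row to sum to 1."""
    f = np.clip(np.asarray(features, dtype=float), 0.0, None) + eps
    return f / f.sum(axis=1, keepdims=True)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two strictly positive distributions.

    Used here only as a stand-in for an information-loss distance.
    """
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return 0.5 * (kl_pm + kl_qm)


def screen_outliers(features, threshold=0.1, min_cluster_size=2):
    """Greedy leader clustering; members of tiny clusters are flagged.

    Each image joins the closest existing cluster if its divergence to that
    cluster's mean distribution is within `threshold`; otherwise it starts a
    new cluster. Clusters smaller than `min_cluster_size` count as outliers.
    """
    dists = to_distribution(features)
    centroids, members = [], []
    for idx, p in enumerate(dists):
        costs = [js_divergence(p, c) for c in centroids]
        if costs and min(costs) <= threshold:
            k = int(np.argmin(costs))
            members[k].append(idx)
            # Update the cluster representative to the mean of its members.
            centroids[k] = dists[members[k]].mean(axis=0)
        else:
            centroids.append(p.copy())
            members.append([idx])
    return sorted(i for m in members if len(m) < min_cluster_size for i in m)


# Toy stand-in for CNN features: five similar vectors and one dissimilar one.
toy_features = [
    [10, 1, 1, 1],
    [9, 1, 2, 1],
    [10, 2, 1, 1],
    [1, 1, 1, 10],
    [11, 1, 1, 1],
    [10, 1, 1, 2],
]
flagged = screen_outliers(toy_features)
print(flagged)
```

In this toy input the fourth vector (index 3) is the only one that ends up in a cluster by itself, so it is the one flagged. In practice the threshold and the minimum cluster size would need tuning on real survey images.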
This work is partially supported by the Shandong Science and Technology Development Plan (Grant Nos. 2016GGC01061 and 2016GGX101029) and the Natural Science Foundation of Shandong Province (Grant Nos. ZR2015JL023 and ZR2015FL025).