Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine
In traditional Chinese medicine (TCM) diagnosis, a patient may be associated with more than one syndrome tag, so computer-aided TCM diagnosis is a typical application of multi-label learning on high-dimensional data. A large number of symptoms commonly occur in TCM diagnosis, which complicates the modeling of diagnostic algorithms. Feature selection entails choosing the smallest subset of relevant symptoms while maximizing the generalization performance of the model. Little research currently exists on feature selection for multi-label data. In this paper, a hybrid optimization technique is introduced for symptom selection on multi-label data in TCM diagnosis, and models are built with four multi-label learning algorithms, including k-nearest neighbors. We compare the performance of the proposed algorithm with currently popular dimensionality reduction algorithms, namely MEFS (embedded feature selection for multi-label learning) and MDDM (multi-label dimensionality reduction via dependence maximization), on the UCI Yeast gene functional data set and on an inquiry diagnosis data set of coronary heart disease (CHD). Experimental results show that the proposed algorithm significantly improves performance. In particular, the improvement in the average precision of the classifier is up to 10.62% and 14.54%. The paper thus realizes syndrome inquiry modeling of CHD in TCM, providing an effective reference for the diagnosis of CHD and for the analysis of other multi-label data.
Keywords: multi-label learning, feature selection, high-dimensionality, inquiry of traditional Chinese medicine, coronary heart disease
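The average precision reported in the abstract is a ranking-based multi-label metric. The following is a minimal sketch, assuming the commonly used definition: for each relevant label of an instance, take the fraction of labels ranked at or above it that are also relevant, then average over relevant labels and over instances. The function name, argument names, and toy data are hypothetical and are not taken from the paper.

```python
def multilabel_average_precision(scores, labels):
    """Ranking-based average precision for multi-label predictions.

    scores: list of per-instance lists of real-valued label scores
    labels: list of per-instance lists of 0/1 ground-truth indicators
    Instances with no relevant label are skipped (an assumption of this sketch).
    """
    total, counted = 0.0, 0
    for s, y in zip(scores, labels):
        relevant = [j for j, v in enumerate(y) if v]
        if not relevant:
            continue
        # Rank 1 is the highest score; ties are broken by label index
        # because Python's sort is stable.
        order = sorted(range(len(s)), key=lambda j: -s[j])
        rank = {j: r + 1 for r, j in enumerate(order)}
        ap = 0.0
        for j in relevant:
            # Number of relevant labels ranked at or above label j.
            hits = sum(1 for k in relevant if rank[k] <= rank[j])
            ap += hits / rank[j]
        total += ap / len(relevant)
        counted += 1
    return total / counted if counted else 0.0


# Toy example: two patients, three candidate syndrome labels (made-up values).
toy_scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.5]]
toy_labels = [[1, 0, 1], [1, 0, 0]]
toy_result = multilabel_average_precision(toy_scores, toy_labels)
```

In the toy data, the first instance ranks both relevant labels on top (per-instance value 1.0), while the second ranks its single relevant label last of three (value 1/3), so the mean is 2/3. A higher value means relevant syndromes tend to be ranked above irrelevant ones.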