Science China Information Sciences, Volume 56, Issue 5, pp 1–13

Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine

  • Huan Shao
  • GuoZheng Li
  • GuoPing Liu
  • YiQin Wang
Research Paper


In traditional Chinese medicine (TCM) diagnosis, a patient may be associated with more than one syndrome tag, so computer-aided TCM diagnosis is a typical application of multi-label learning on high-dimensional data. A large number of symptoms commonly occur in TCM diagnosis, which complicates the modeling of diagnostic algorithms. Feature selection aims to choose the smallest subset of relevant symptoms while maximizing the generalization performance of the model. At present, little research has addressed feature selection for multi-label data. This paper introduces a hybrid optimization technique for symptom selection on multi-label TCM diagnosis data and builds diagnostic models with four multi-label learning algorithms, including k-nearest neighbors. We compare its performance with currently popular dimensionality reduction algorithms, such as MEFS (embedded feature selection for multi-label learning) and MDDM (multi-label dimensionality reduction via dependence maximization), on the UCI Yeast gene function dataset and an inquiry diagnosis dataset of coronary heart disease (CHD). Experimental results show that the proposed algorithm significantly improves performance; in particular, the average precision of the classifier improves by up to 10.62% and 14.54%. This paper realizes syndrome inquiry modeling of CHD in TCM, providing an effective reference for the diagnosis of CHD and for the analysis of other multi-label data.
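The headline gain above is reported as average precision. As a minimal sketch, assuming the standard ranking-based definition used in multi-label learning (for example in the ML-kNN literature), the metric can be computed as follows. For each instance, every relevant label contributes the fraction of relevant labels ranked at or above it, and these fractions are averaged over relevant labels and then over instances. The function name, the 0/1 label-matrix input format, and the tie-breaking rule (ties broken by label index) are assumptions of this sketch, not details taken from the paper.

```python
from typing import Sequence


def average_precision(scores: Sequence[Sequence[float]],
                      labels: Sequence[Sequence[int]]) -> float:
    """Multi-label average precision (ranking-based definition).

    scores[i][j] is the classifier's real-valued output for label j on
    instance i; labels[i][j] is 1 if label j is relevant to instance i,
    else 0. Instances with no relevant label are skipped, because the
    metric is undefined for them. Returns 0.0 if every instance is skipped.
    """
    total, counted = 0.0, 0
    for s, y in zip(scores, labels):
        relevant = [j for j, v in enumerate(y) if v]
        if not relevant:
            continue
        # Rank labels by descending score; rank 1 is the top-scored label.
        # sorted() is stable, so ties keep ascending label-index order.
        order = sorted(range(len(s)), key=lambda j: -s[j])
        rank = {j: r + 1 for r, j in enumerate(order)}
        prec = 0.0
        for j in relevant:
            # Relevant labels ranked at or above label j, over j's rank.
            hits = sum(1 for k in relevant if rank[k] <= rank[j])
            prec += hits / rank[j]
        total += prec / len(relevant)
        counted += 1
    return total / counted if counted else 0.0


# Toy example (hypothetical data): 2 instances, 3 syndrome labels.
toy_scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
toy_labels = [[1, 0, 1], [1, 1, 0]]
print(round(average_precision(toy_scores, toy_labels), 4))  # prints 0.9167
```

In the toy example the first instance is ranked perfectly (score 1.0), while in the second one relevant label sits at rank 3 behind an irrelevant one, giving (1 + 2/3)/2; the mean over both instances is 11/12. Higher is better, and 1.0 means every relevant label outranks every irrelevant one.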


Keywords: multi-label learning; feature selection; high-dimensionality; inquiry of traditional Chinese medicine; coronary heart disease




  1. Tian L, Yan Y J, Zhu J G. Data mining techniques and their application in TCM study (in Chinese). Chinese J Basic Med Trad Chin Med, 2005, 11: 710–712
  2. Tsoumakas G, Zhang M L, Zhou Z H. Learning from multi-label data. In: Tutorial at ECML/PKDD’09, Bled, Slovenia, 2009
  3. Zhang Y, Zhou Z H. Multi-label dimensionality reduction via dependence maximization. ACM Trans Knowl Discov Data, 2010, 4(3): Article No. 14
  4. Yu K, Yu S P, Tresp V. Multi-label informed latent semantic indexing. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2005. 258–265
  5. Ji S W, Ye J P. Linear dimensionality reduction for multi-label classification. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, CA, 2009. 1077–1082
  6. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res, 2003, 3: 1157–1182
  7. Ge L, Li G Z, You M Y. Embedded feature selection for multi-label learning (in Chinese). J Nanjing Univ (Nat Sci), 2009, 45: 671–676
  8. Moody J, Utans J. Principled architecture selection for neural networks: Application to corporate bond rating prediction. In: Moody J E, Hanson S J, Lippmann R P, eds. Neural Information Processing Systems 4. San Francisco, CA: Morgan Kaufmann Publishers, Inc, 1992. 683–690
  9. Zhang M L, Peña J M, Robles V, et al. Feature selection for multi-label naive Bayes classification. Inf Sci, 2009, 179: 3218–3229
  10. Li G C, Li C T, Huang L P, et al. An investigation into regularity of syndrome classification for chronic atrophic gastritis based on structural equation model (in Chinese). J Nanjing Univ Trad Chin Med, 2006, 22: 217–220
  11. Wang X W, Qu H B, Wang J. A quantitative diagnostic method based on data-mining approach in TCM (in Chinese). J Beijing Univ Trad Chin Med, 2005, 28: 4–7
  12. Liu G P, Li G Z, Wang Y L, et al. Modeling of inquiry diagnosis for coronary heart disease in traditional Chinese medicine by using multi-label learning. BMC Complem Altern Med, 2010, 10: 37
  13. Gheyas I A, Smith L S. Feature subset selection in large dimensionality domains. Patt Recogn, 2010, 43: 5–13
  14. Blickle T, Thiele L. A comparison of selection schemes used in evolutionary algorithms. Evolut Comput, 1996, 4: 361–394
  15. Motoki T. Calculating the expected loss of diversity of selection schemes. Evolut Comput, 2002, 10: 397–422
  16. Sokolov A, Whitley D. Unbiased tournament selection. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation. Washington DC: ACM, 2005. 1131–1138
  17. Zhang M L, Zhou Z H. ML-KNN: A lazy learning approach to multi-label learning. Patt Recogn, 2007, 40: 2038–2048
  18. Zhang M L, Zhou Z H. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Trans Knowl Data Eng, 2006, 18: 1338–1351
  19. Elisseeff A, Weston J. A kernel method for multi-labelled classification. Adv Neur Inf Process Syst, 2002, 14: 681–687
  20. Meiri R, Zahavi J. Using simulated annealing to optimize the feature selection problem in marketing applications. Europ J Oper Res, 2006, 171: 842–858
  21. Yang J, Honavar V. Feature subset selection using a genetic algorithm. IEEE Intell Syst Appl, 1998, 13: 44–49
  22. Pudil P, Novovičová J, Kittler J, et al. Floating search methods in feature selection. Patt Recogn Lett, 1994, 15: 1119–1125

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Huan Shao (1)
  • GuoZheng Li (2)
  • GuoPing Liu (3)
  • YiQin Wang (3)
  1. School of Computer Engineering and Science, Shanghai University, Shanghai, China
  2. Department of Control Science and Engineering, Key Laboratory of Ministry of Education for Service Computing and Embedded Systems, Tongji University, Shanghai, China
  3. Laboratory of Information Access and Synthesis of TCM Four Diagnosis, Shanghai University of Traditional Chinese Medicine, Shanghai, China
