Abstract
Data analytic research for Traditional Chinese Medicine (TCM) faces many challenges: heterogeneous clinical record sources, inconsistent symptom descriptions, large numbers of collected clinical symptoms, and clinical records labeled with more than one syndrome. Novel feature selection, multi-class, and multi-label techniques from machine learning have been proposed to meet these challenges. In this chapter, we introduce our work on discriminative symptom selection and multi-syndrome learning, which has improved on the performance of state-of-the-art methods.
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
You, M., Li, GZ. (2014). Medical Diagnosis by Using Machine Learning Techniques. In: Poon, J., Poon, S.K. (eds) Data Analytics for Traditional Chinese Medicine Research. Springer, Cham. https://doi.org/10.1007/978-3-319-03801-8_3
DOI: https://doi.org/10.1007/978-3-319-03801-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03800-1
Online ISBN: 978-3-319-03801-8
eBook Packages: Computer Science (R0)