Machine Learning Framework for Classification in Medicine and Biology
Systems modeling and quantitative analysis of large amounts of complex clinical and biological data may help to identify discriminatory patterns that can uncover health risks, detect early disease formation, monitor treatment and prognosis, and predict treatment outcome. In this talk, we describe a machine-learning framework for classification in medicine and biology. It consists of a pattern recognition module, a feature selection module, and a classification modeler and solver. The pattern recognition module performs automatic image analysis, genomic pattern recognition, and spectrum pattern extraction. The feature selection module consists of a combinatorial selection algorithm that extracts discriminatory patterns from a large set of pattern attributes. These modules are wrapped around the classification modeler and solver to form a machine learning framework. The classification modeler and solver consist of novel optimization-based predictive models that maximize the correct classification while constraining the inter-group misclassifications. The classification/predictive models 1) can classify any number of distinct groups; 2) allow incorporation of heterogeneous and continuous/time-dependent types of attributes as input; 3) utilize a high-dimensional data transformation that minimizes noise and errors in biological and clinical data; 4) incorporate a reserved-judgement region that provides a safeguard against over-training; and 5) have successive multi-stage classification capability. We will discuss successful applications of our model to developing rules for gene silencing in cancer cells, predicting the immunity of vaccines, identifying the cognitive status of individuals, and predicting metabolite concentrations in humans.
We acknowledge our clinical/biological collaborators: Dr. Vertino (Winship Cancer Institute, Emory), Drs. Pulendran and Ahmed (Emory Vaccine Center), Dr. Levey (Neurodegenerative Disease and Alzheimer’s Disease), and Dr. Jones (Clinical Biomarkers, Emory).
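The reserved-judgement idea mentioned in the abstract can be illustrated with a small sketch. This is not the optimization-based model described in the talk; it is a much simpler threshold rule, assuming one-dimensional Gaussian group densities, in which an observation is assigned to a group only when that group's posterior probability is high enough and is otherwise left unclassified. The function names, the group parameters, and the `min_confidence` threshold are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, groups):
    """Posterior probability of each group given x.

    `groups` is a list of (prior, mean, std) tuples (hypothetical values).
    """
    weighted = [prior * gaussian_pdf(x, mean, std) for prior, mean, std in groups]
    total = sum(weighted)
    return [w / total for w in weighted]


def classify(x, groups, min_confidence=0.9):
    """Return the index of the most probable group, or None to reserve judgement."""
    post = posteriors(x, groups)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= min_confidence else None


# Two hypothetical groups with equal priors (assumed for illustration only).
GROUPS = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]

if __name__ == "__main__":
    for x in (-1.0, 1.5, 4.0):
        # A result of None means the observation falls in the reserved-judgement region.
        print(x, classify(x, GROUPS))
```

In this toy setting, an observation midway between the two group means has equal posteriors and is therefore left unclassified, while observations close to either mean are assigned. The models in the talk instead determine the group regions by optimization, subject to limits on inter-group misclassification, so the sketch conveys only the notion of withholding a decision.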
Keywords: Support Vector Machine, Discriminant Analysis, Yellow Fever Vaccine, Discriminatory Pattern, Machine Learning Framework