Molecular classification using robust biochemical measurements provides a level of diagnostic precision that is unattainable using indirect phenotypic measurements. Multidimensional measurements of proteins, genes, or metabolites (analytes) can identify subtle differences in the pathophysiology of patients with asthma that physiological or clinical assessments cannot detect. We give an overview of methods for relating biochemical analyte measurements to discrete (categorical) clinical outcomes through predictive models, a process referred to as “supervised classification.” We consider problems inherent in wide (small n, large p) high-dimensional data, including the curse of dimensionality, collinearity, and lack of information content. We suggest methods for reducing the data to the most informative features. We describe different approaches to phenotypic modeling: logistic regression, classification and regression trees, random forests, and nonparametric regression spline modeling. We provide guidance on post hoc model evaluation and on assessing model performance using ROC curves and generalized additive models. Validated predictive models for outcome prediction are expected to have a significant impact on the clinical management of asthma.
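As a small illustration of the ROC-based evaluation mentioned above, the following sketch computes ROC curve points and the area under the curve (AUC) for a binary classifier. It is a minimal example under stated assumptions, not the chapter's own procedure: the function names `roc_curve_points` and `auc`, and the labels and scores, are hypothetical and chosen only for illustration.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest score down; tied scores share a threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: 1 = case, 0 = control; scores are model-predicted probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.92, 0.80, 0.75, 0.60, 0.45, 0.40, 0.35, 0.10]
points = roc_curve_points(labels, scores)
print(round(auc(points), 4))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls. With wide data, the scores should come from held-out samples or cross-validation, since an AUC computed on the training samples is typically optimistic.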