Analysis and Predictive Modeling of Asthma Phenotypes

  • Allan R. Brasier
  • Hyunsu Ju
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 795)


Molecular classification using robust biochemical measurements provides a level of diagnostic precision that is unattainable using indirect phenotypic measurements. Multidimensional measurements of proteins, genes, or metabolites (analytes) can identify subtle differences in the pathophysiology of patients with asthma that physiological or clinical assessments cannot detect. We give an overview of methods that relate biochemical analyte measurements to discrete (categorical) clinical outcomes in order to build predictive models, a process referred to as “supervised classification.” We consider problems inherent in wide (small n, large p) high-dimensional data, including the curse of dimensionality, collinearity, and lack of information content, and we suggest methods for reducing the data to the most informative features. We describe different approaches to phenotypic modeling: logistic regression, classification and regression trees, random forests, and nonparametric regression spline modeling. We also provide guidance on post hoc model evaluation, including the use of ROC curves and generalized additive models to assess model performance. The application of validated predictive models for outcome prediction will significantly impact the clinical management of asthma.
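As a small illustration of the ROC-based evaluation mentioned above, the following sketch computes the area under the ROC curve (AUC) for a two-class problem. It relies on the standard interpretation of the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration; they are not taken from the chapter, and the scores stand in for the predicted probabilities of any fitted classifier.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = phenotype present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to perfect separation of the two classes. The pairwise loop is quadratic in the number of cases, which is acceptable for the small sample sizes typical of wide data sets; a rank-based formula would be preferable for large n.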


Multivariate analysis; Supervised learning; False discovery rate; Feature reduction; Significance analysis of microarrays (SAM); Receiver operating characteristic (ROC) curve; Logistic regression; Random forest; Multivariate adaptive regression splines (MARS); Generalized additive models (GAMs)



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Texas Medical Branch, Galveston, USA
  2. University of Texas Medical Branch, Galveston, USA
