Something Borrowed, Something New: Precise Prediction of Outcomes from Diverse Genomic Profiles

  • J. Sunil Rao
  • Jie Fan
  • Erin Kobetz
  • Daniel Sussman


Precise prediction of outcomes for individuals from diverse genomic data is a problem of great interest as the focus on precision medicine grows. This typically requires estimating subgroup-specific models that may differ in their mean and/or variance structure. Thus, to predict outcomes accurately for new individuals, it is necessary to map each of them to a subgroup from which the prediction can be derived. The situation becomes more interesting when some predictors are common across subgroups and others are not. We describe a series of statistical methodologies, under two different scenarios, that provide this mapping and combine information that can be shared across subgroups with information that is subgroup-specific. We demonstrate that prediction errors can be markedly reduced compared with not borrowing strength at all. We then apply the approaches to predict colon cancer survival from DNA methylation profiles that vary by age group, and identify the significant methylation sites that are shared across age groups and those that are age-specific.


Funding and Acknowledgements

J.S.R. was partially funded by NIH grants R01-CA160593A1, R01-GM085205 and NSF grant DMS 1513266. E.K. was partially funded by NIH grant (put in details here). E.K. and D.S. were partially funded by Bankhead-Coley Team Science grant 2BT02 and ACS Institutional Research Grant 98-277-10. J.S.R., E.K. and D.S. were partially funded by NIH grant UL1-TR000460. The authors declare that they have no competing financial interests.



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • J. Sunil Rao (1)
  • Jie Fan (1)
  • Erin Kobetz (1)
  • Daniel Sussman (1)
  1. Division of Biostatistics, Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, USA
