Big and Complex Data Analysis pp 347-367
Identifying Gene–Environment Interactions Associated with Prognosis Using Penalized Quantile Regression
Abstract
In the omics era, it has been well recognized that for complex traits and outcomes, the interactions between genetic and environmental factors (i.e., the G×E interactions) have important implications beyond the main effects. Most existing interaction analyses have focused on continuous and categorical traits. Prognosis is of essential importance for complex diseases. However, because they are considerably more complex, prognosis outcomes have been less studied. In the existing interaction analysis on prognosis outcomes, the most common practice is to fit marginal (semi)parametric models (for example, Cox) using likelihood-based estimation and then identify important interactions based on significance level. Such an approach has limitations. First, data contamination is not uncommon. With likelihood-based estimation, even a single contaminated observation can result in severely biased estimation and misleading conclusions. Second, when the sample size is not large, the significance-based approach may not be reliable. To overcome these limitations, in this study we adopt quantile-based estimation, which is robust to data contamination. Two techniques are adopted to accommodate right censoring. For identifying important interactions, we adopt penalization as an alternative to significance level. An efficient computational algorithm is developed. Simulation shows that the proposed method can significantly outperform the alternative. We analyze a lung cancer prognosis study with gene expression measurements.
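As a rough illustration of the ingredients named above, the sketch below combines a quantile (check) loss, inverse probability weights for right censoring, and a minimax concave penalty into one objective. It is a minimal sketch under stated assumptions, not the chapter's estimator or algorithm: the function names, the Kaplan-Meier form of the weights, the default gamma, and the choice to penalize every coefficient are all hypothetical choices made for illustration.

```python
import numpy as np

def check_loss(u, tau=0.5):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty, elementwise (gamma=3.0 is an assumed default)."""
    b = np.abs(np.asarray(beta, dtype=float))
    return np.where(b <= gamma * lam,
                    lam * b - b ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def ipw_weights(time, event):
    """Weights event_i / G(T_i-), where G is a Kaplan-Meier estimate of the
    censoring survival function (assumed weighting scheme; event=1 means
    the outcome was observed, event=0 means it was right censored)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cens_times = np.unique(time[event == 0])
    weights = np.zeros_like(time)
    for i in range(len(time)):
        if event[i] == 0:
            continue  # censored observations receive zero weight
        G = 1.0
        for s in cens_times[cens_times < time[i]]:
            at_risk = np.sum(time >= s)
            n_cens = np.sum((time == s) & (event == 0))
            G *= 1.0 - n_cens / at_risk
        weights[i] = 1.0 / G
    return weights

def penalized_objective(beta, X, y, weights, lam, tau=0.5, gamma=3.0):
    """Weighted check loss plus MCP on all coefficients (illustrative only)."""
    resid = y - X @ beta
    loss = np.sum(weights * check_loss(resid, tau)) / len(y)
    return loss + np.sum(mcp_penalty(beta, lam, gamma))
```

In a G×E setting, the columns of `X` would hold the environmental factors, the gene effects, and their products, and `y` would typically be a (log) survival time; minimizing such an objective over `beta` requires a dedicated algorithm that this sketch does not attempt.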
Keywords
Quantile Regression; Inverse Probability; Prognosis Outcome; Data Contamination; Minimax Concave Penalty
Notes
Acknowledgements
We thank the organizers and participants of “The Fourth International Workshop on the Perspectives on High-dimensional Data Analysis.” The authors were supported by the China Postdoctoral Science Foundation (2014M550799), National Science Foundation of China (11401561), National Social Science Foundation of China (13CTJ001, 13&ZD148), National Institutes of Health (CA165923, CA191383, CA016359), and U.S. VA Cooperative Studies Program of the Department of Veterans Affairs, Office of Research and Development.