
Gene–environment interaction analysis under the Cox model

Annals of the Institute of Statistical Mathematics

Abstract

Gene–environment (G-E) interactions have been shown to be of essential importance for survival in cancer and many other complex diseases. G-E interaction analysis can be roughly classified as marginal or joint, depending on the number of G variables analyzed at a time. In this study, we focus on joint analysis, which can better reflect disease biology and is statistically more challenging. Many approaches have been developed for joint G-E interaction analysis of survival outcomes and have led to important findings. However, lacking rigorous statistical development, quite a few of these methods have weak theoretical grounding. To fill this knowledge gap, in this article, we consider joint G-E interaction analysis under the Cox model. Sparse group penalization is adopted to regularize estimation and to select important main effects and interactions. The “main effects, interactions” variable selection hierarchy, which has been strongly advocated in recent literature, is satisfied. Significantly advancing beyond some published studies, we rigorously establish consistency properties under high dimensionality. We develop an effective computational algorithm; simulation demonstrates the competitive performance of the proposed approach, and analysis of The Cancer Genome Atlas (TCGA) data on stomach adenocarcinoma (STAD) further demonstrates its practical utility.
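The preview does not reproduce the article's exact penalty or algorithm, so the following is only a generic sketch of sparse group penalization for a Cox model with G-E interactions, under assumptions of our own. Each gene G_j forms a group containing its main effect and its q interactions G_j×E_k; the group L2 norm is weighted by sqrt(q+1); an additional lasso penalty applies only to the interactions; environmental main effects are unpenalized; event times are assumed untied (Breslow partial likelihood); and estimation uses plain proximal gradient descent with a fixed step, a swapped-in optimizer rather than the authors' algorithm. All function names are hypothetical.

```python
import numpy as np


def build_ge_design(G, E):
    """Hypothetical column layout: [E | G_1, G_1*E | G_2, G_2*E | ...].

    Returns the design matrix and, for each gene j, the column indices of
    its group (main effect first, then its q interactions)."""
    n, p = G.shape
    q = E.shape[1]
    blocks, groups = [E], []
    col = q
    for j in range(p):
        blocks.append(np.column_stack([G[:, j], G[:, [j]] * E]))
        groups.append(np.arange(col, col + q + 1))
        col += q + 1
    return np.hstack(blocks), groups


def cox_loss_and_grad(beta, X, time, event):
    """Negative log partial likelihood divided by n, and its gradient.

    Assumes no tied event times (ties are not handled specially)."""
    n = len(time)
    order = np.argsort(-time)                # decreasing time: risk set of row i is rows 0..i
    Xs, ds = X[order], event[order]
    eta = Xs @ beta
    shift = eta.max()                        # guard exp() against overflow
    w = np.exp(eta - shift)
    s0 = np.cumsum(w)                        # sum of exp(eta) over each risk set (scaled)
    s1 = np.cumsum(w[:, None] * Xs, axis=0)  # weighted covariate sums over each risk set
    loss = -np.sum(ds * (eta - shift - np.log(s0))) / n
    grad = -np.sum(ds[:, None] * (Xs - s1 / s0[:, None]), axis=0) / n
    return loss, grad


def prox_sparse_group(v, groups, step, lam_group, lam_inter):
    """Prox of step * (lam_group * sum_j sqrt(|g_j|) * ||b_{g_j}||_2
    + lam_inter * sum of |interaction coefficients|).

    Columns outside every group (environment main effects) are left unpenalized."""
    out = v.copy()
    for g in groups:
        u = v[g].copy()
        inter = u[1:]
        # Lasso step on interactions only; the gene main effect is not soft-thresholded.
        u[1:] = np.sign(inter) * np.maximum(np.abs(inter) - step * lam_inter, 0.0)
        norm = np.linalg.norm(u)
        thr = step * lam_group * np.sqrt(len(g))
        # Group shrinkage: the whole group (main effect + interactions) is kept or dropped.
        out[g] = 0.0 if norm <= thr else (1.0 - thr / norm) * u
    return out


def fit_sparse_group_cox(X, time, event, groups, lam_group, lam_inter,
                         step=0.1, n_iter=500):
    """Proximal gradient descent with a fixed step (assumed small enough)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = cox_loss_and_grad(beta, X, time, event)
        beta = prox_sparse_group(beta - step * grad, groups, step,
                                 lam_group, lam_inter)
    return beta
```

In this sketch the hierarchy follows from the penalty structure: the group penalty can remove a gene together with all of its interactions, while the main effect receives no separate lasso shrinkage, so whenever an interaction is nonzero its group survived and the main effect is (almost surely) nonzero as well. The proximal step composes soft-thresholding with group shrinkage, as in the sparse-group lasso.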


Fig. 1
Fig. 2



Acknowledgements

The authors thank the Editor, Associate Editor, and two referees for their insightful comments, which have led to a significant improvement of this article. This study is partly supported by the National Bureau of Statistics of China (2022LZ34), the National Natural Science Foundation of China (11971404, 72071169, 71988101, 82204153), the National Social Science Foundation of China (21&ZD146), and NIH (CA204120, CA121974, and CA196530).

Author information


Correspondence to Qingzhao Zhang.


Supplementary Information

Electronic supplementary material accompanies this article:

Supplementary file 1 (PDF, 303 KB)

About this article


Cite this article

Fang, K., Li, J., Xu, Y. et al. Gene–environment interaction analysis under the Cox model. Ann Inst Stat Math 75, 931–948 (2023). https://doi.org/10.1007/s10463-023-00871-9


  • DOI: https://doi.org/10.1007/s10463-023-00871-9
