Robust ridge regression for estimating the effects of correlated gene expressions on phenotypic traits

Environmental and Ecological Statistics

Abstract

Statistical packages such as edgeR and DESeq are designed to detect genes that are relevant to phenotypic traits and diseases. A few studies have also modeled the relationships between gene expressions and traits. In the presence of multicollinearity and outliers, which are unavoidable in genetic data, the robust ridge regression estimator can be applied with the trait value as the response variable and the gene expressions as explanatory variables. In some simulation scenarios, the robust ridge estimator is resistant to outliers and less susceptible to multicollinearity than the ordinary least-squares (OLS) estimator. This study investigated the reliability of the robust ridge estimator in a scenario where the explanatory variables have tail dependence and negative binomial distributions, comparing its performance with that of OLS and using a vine copula to model the tail dependence among gene expressions. The robust ridge estimator and OLS were then both applied to an ecological dataset. First, statistical analysis was used to compare RNA sequencing data between two treatments, and 15 differentially expressed genes were selected. Next, the regression parameter estimates of robust ridge and OLS for the effects of the 15 contigs (explanatory variables) on trait values (response variables) were compared. Robust ridge regression detected fewer positive and negative slopes than OLS regression. These results indicate that robust ridge regression can be applied in RNA sequencing analysis to estimate the effects of trait-associated genes using real data, and that it is a promising tool for modeling the association between RNA expression and phenotypic traits.
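To make the idea of a robust ridge fit concrete, the following is a minimal Python sketch, not the estimator used in the article. It assumes a Huber-type M-estimator combined with an L2 penalty and fitted by iteratively reweighted least squares; the function name `huber_ridge`, the parameters `lam` and `c`, and the simulated data are all hypothetical. For simplicity the toy predictors are correlated Gaussian variables rather than tail-dependent negative binomial counts.

```python
import numpy as np


def huber_ridge(X, y, lam=1.0, c=1.345, n_iter=100, tol=1e-8):
    """Huber-type M-estimation with a ridge penalty, fitted by IRLS.

    Illustrative sketch only. Returns (intercept, slopes); the intercept
    is left unpenalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    D = np.column_stack([np.ones(n), X])   # design matrix with intercept column
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0                          # no penalty on the intercept
    coef = np.zeros(p + 1)
    for _ in range(n_iter):
        r = y - D @ coef
        # Robust residual scale: normalized median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        # Each iteration solves a weighted ridge problem.
        A = D.T @ (w[:, None] * D) + P
        new_coef = np.linalg.solve(A, D.T @ (w * y))
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef[0], coef[1:]


# Toy data (assumed for illustration): correlated predictors, outliers in y.
rng = np.random.default_rng(0)
n, p = 60, 3
shared = rng.normal(size=(n, 1))
X = 0.8 * shared + 0.6 * rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = 2.0 + X @ beta_true + 0.1 * rng.normal(size=n)
y[np.argsort(X[:, 0])[-5:]] += 50.0        # five gross outliers in the response

b0, b = huber_ridge(X, y, lam=1.0)
ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
print("robust ridge slopes:", np.round(b, 3))
print("OLS slopes:         ", np.round(ols[1:], 3))
```

Setting `c=np.inf` turns off the downweighting and leaves an ordinary ridge fit, and additionally setting `lam=0.0` recovers OLS, which gives a simple way to see how much each ingredient contributes on a given dataset.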

Acknowledgements

The authors sincerely thank the two anonymous referees for their invaluable suggestions that helped to improve this paper.

Author information

Corresponding author

Correspondence to Hirofumi Michimae.

Additional information

Handling Editor: Bryan F. J. Manly.

Cite this article

Michimae, H., Matsunami, M. & Emura, T. Robust ridge regression for estimating the effects of correlated gene expressions on phenotypic traits. Environ Ecol Stat 27, 41–72 (2020). https://doi.org/10.1007/s10651-019-00434-3

  • DOI: https://doi.org/10.1007/s10651-019-00434-3
