Statistical Methods in High Dimensions

Phenotypes and Genotypes

Part of the book series: Computational Biology (COBO, volume 18)

Abstract

This is the core chapter that introduces the theory related to the advanced statistical methods applied in the later chapters on QTL mapping and GWAS analysis. More basic statistical methods are included in the Appendix. Section 3.2 covers the use of classical procedures, like the Bonferroni correction, in multiple testing, as well as approaches based on permutation and resampling, which guarantee control of the familywise error rate (FWER). Afterwards, more modern techniques, like the Benjamini-Hochberg procedure to control the false discovery rate (FDR), are discussed and a somewhat advanced theoretical discussion on optimal multiple testing strategies in high dimensions follows. The second part of this chapter is concerned with model selection. Section 3.3 starts by introducing the basic concepts of likelihood and then recapitulates the development of Akaike’s information criterion (AIC) using information theoretic principles. This is then compared with the use of the Bayesian information criterion (BIC) in the context of Bayesian model selection. It is then pointed out why both AIC and BIC fail to work in a high-dimensional setting and different modifications of BIC designed to control either FWER or FDR are presented. The chapter ends by discussing various further approaches to model selection in high dimensions.
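To make the contrast between FWER and FDR control concrete, here is a minimal Python sketch of the two procedures the abstract names: the Bonferroni correction and the Benjamini-Hochberg step-up procedure. The function names and the example p-values are illustrative assumptions and are not taken from the chapter.

```python
from typing import List


def bonferroni_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that satisfies the condition
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = sum(bonferroni_reject(example_pvalues))
bh_hits = sum(benjamini_hochberg_reject(example_pvalues))
print(bonferroni_hits, bh_hits)
```

On these made-up values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows the usual pattern: the FDR-controlling procedure is never more conservative than Bonferroni at the same level.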

Corresponding author

Correspondence to Florian Frommlet.

Copyright information

© 2016 Springer-Verlag London

About this chapter

Cite this chapter

Frommlet, F., Bogdan, M., Ramsey, D. (2016). Statistical Methods in High Dimensions. In: Phenotypes and Genotypes. Computational Biology, vol 18. Springer, London. https://doi.org/10.1007/978-1-4471-5310-8_3

  • DOI: https://doi.org/10.1007/978-1-4471-5310-8_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5309-2

  • Online ISBN: 978-1-4471-5310-8

  • eBook Packages: Computer Science (R0)
