A Simulation Study on the Impact of Strong Dependence in High-Dimensional Multiple-Testing I: The Case without Effects

  • Antonio Carvajal-Rodríguez
  • Jacobo de Uña-Álvarez
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)

Abstract

When working with high-dimensional biological data, the so-called multiple hypothesis testing problem emerges: when many separate tests are performed, some will be significant by chance alone, producing false positives. Many statistical methods have been developed to deal with this problem. An important issue for multiple hypothesis testing in high-throughput experiments is the intrinsic inter-dependency of gene effects. Here we simulate data resembling the testing scenario of a well-known breast cancer microarray data set. The objective of the study is to assess the impact of high correlation within gene blocks on multiple-testing correction methods such as Sequential Bonferroni (SB), Benjamini and Hochberg FDR (BH) and Sequential Goodness of Fit (SGoF).
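The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes equicorrelated null z-scores within fixed-size blocks, and the number of tests, the block size and the correlation `rho` are hypothetical values chosen only for illustration. The function names are likewise invented here. Only SB (Holm's step-down procedure) and BH are sketched; SGoF is omitted.

```python
import math
import random


def block_null_pvalues(n_tests, block_size, rho, rng):
    """Two-sided p-values for null z-scores, equicorrelated (rho) within blocks."""
    pvals = []
    for start in range(0, n_tests, block_size):
        shared = rng.gauss(0.0, 1.0)  # factor common to every test in the block
        for _ in range(min(block_size, n_tests - start)):
            own = rng.gauss(0.0, 1.0)  # test-specific noise
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
            # P(|Z| > |z|) for a standard normal Z
            pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvals


def holm_rejections(pvals, alpha=0.05):
    """Number of rejections under Holm's sequential Bonferroni (step-down)."""
    m = len(pvals)
    count = 0
    for i, p in enumerate(sorted(pvals)):
        if p > alpha / (m - i):
            break  # step-down: stop at the first non-rejection
        count += 1
    return count


def bh_rejections(pvals, alpha=0.05):
    """Number of rejections under Benjamini-Hochberg (step-up)."""
    m = len(pvals)
    count = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * alpha / m:
            count = k  # step-up: keep the largest k that passes
    return count


# Hypothetical settings: no true effects, so every rejection is a false positive.
rng = random.Random(1)
pvals = block_null_pvalues(n_tests=3000, block_size=100, rho=0.9, rng=rng)
print("Holm false positives:", holm_rejections(pvals))
print("BH false positives:", bh_rejections(pvals))
```

Repeating the last four lines over many seeds, and over several values of `rho`, gives a rough picture of how within-block correlation changes the distribution of false positives for each method.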

Keywords

Multiple testing, microarrays, false discovery rate, FDR, SGoF

References

  1. Efron, B.: Correlation and Large-Scale Simultaneous Significance Testing. Journal of the American Statistical Association 102, 93–103 (2007)
  2. Farcomeni, A.: A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat. Methods Med. Res. 17, 347–388 (2008)
  3. Shi, J., Levinson, D.F., Whittemore, A.S.: Significance levels for studies with correlated test statistics. Biostatistics 9, 458–466 (2008)
  4. Storey, J.D., Taylor, J.E., Siegmund, D.: Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B-Statistical Methodology 66, 187–205 (2004)
  5. Storey, J.D., Tibshirani, R.: Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 100, 9440–9445 (2003)
  6. Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O.P., Wilfond, B., Borg, A., Trent, J., Raffeld, M., Yakhini, Z., Ben-Dor, A., Dougherty, E., Kononen, J., Bubendorf, L., Fehrle, W., Pittaluga, S., Gruvberger, S., Loman, N., Johannsson, O., Olsson, H., Sauter, G.: Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med. 344, 539–548 (2001)
  7. Storey, J.D., Day, J., Leek, J.: The optimal discovery procedure II: applications to comparative microarray experiments (2005), http://www.bepress.com/uwbiostat/paper260
  8. Holm, S.: A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65–70 (1979)
  9. Benjamini, Y., Hochberg, Y.: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289–300 (1995)
  10. Carvajal-Rodríguez, A., de Uña-Alvarez, J., Rolán-Alvarez, E.: A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests. BMC Bioinformatics 10, 209 (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Antonio Carvajal-Rodríguez (1)
  • Jacobo de Uña-Álvarez (2)
  1. Área de Genética, Facultad de Biología, Universidad de Vigo, Vigo, Spain
  2. Departamento de Estadística e Investigación Operativa, Facultad de Económicas, Universidad de Vigo, Vigo, Spain
