Estimating the number of true null hypotheses from a histogram of p values

  • Dan Nettleton
  • J. T. Gene Hwang
  • Rico A. Caldo
  • Roger P. Wise

Abstract

In an earlier article, an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation was proposed. That article presented an iterative algorithm that relies on a histogram of observed p values to obtain the estimator. We characterize the limit of that iterative algorithm and show that the estimator can be computed directly without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
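The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a direct (non-iterative) histogram-based computation could look. It assumes the estimate is the number of bins times the average count over the bins from the first bin whose count does not exceed the average of the counts in that bin and all bins to its right. The function name `estimate_m0_histogram` and the default of 20 bins are hypothetical choices for illustration, not details taken from the article.

```python
import numpy as np

def estimate_m0_histogram(p_values, n_bins=20):
    """Sketch of a direct histogram-based estimate of the number of true nulls.

    Assumed rule (not stated in the abstract): find the first bin whose count
    is no larger than the mean count of that bin and all bins to its right,
    then return n_bins times that tail mean.
    """
    p = np.asarray(p_values, dtype=float)
    # Equal-width histogram of the p values on [0, 1].
    counts, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    # tail_means[i] is the mean count over bins i, i+1, ..., n_bins-1.
    tail_sums = np.cumsum(counts[::-1])[::-1]
    tail_means = tail_sums / np.arange(n_bins, 0, -1)
    # The last bin always satisfies the condition, so a match always exists.
    first = int(np.argmax(counts <= tail_means))
    return float(n_bins * tail_means[first])

# Usage: 20 evenly spread "null-like" p values plus 10 very small ones.
null_like = (np.arange(20) + 0.5) / 20
mixed = np.concatenate([null_like, np.full(10, 0.01)])
print(estimate_m0_histogram(mixed))
```

Under this assumed rule the excess of small p values in the leftmost bin is attributed to false null hypotheses, while the flat remainder of the histogram determines the estimated number of true nulls.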

Key Words

False discovery rate; Microarray data; Multiple testing

Copyright information

© International Biometric Society 2006

Authors and Affiliations

  • Dan Nettleton (1)
  • J. T. Gene Hwang (2)
  • Rico A. Caldo (3)
  • Roger P. Wise (3, 4)

  1. Department of Statistics, Iowa State University, Ames
  2. Departments of Mathematics and Statistics, Cornell University, Ithaca
  3. Department of Plant Pathology and Center for Plant Responses to Environmental Stresses, Iowa State University, Ames
  4. USDA-ARS-Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames