Abstract
In an earlier article, an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation was proposed. That article presented an iterative algorithm that relies on a histogram of observed p values to obtain the estimator. We characterize the limit of that iterative algorithm and show that the estimator can be computed directly without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
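The abstract does not spell out the steps of the iterative algorithm or the form of its limit, so the following is only a minimal sketch under stated assumptions. It assumes the iteration repeatedly (a) spreads the current estimate of the number of true nulls evenly over the histogram bins, (b) sums the excess counts in the leftmost bins until a bin with no excess is reached, and (c) subtracts that excess from the total number of tests. It further assumes the limit equals the number of bins times the average count of the tail bins, where the tail starts at the first bin whose count is no greater than the average count from that bin onward. The function names, bin count, and toy p values are hypothetical and chosen for illustration.

```python
def bin_counts(pvalues, n_bins):
    """Count p values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        # A p value of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def iterative_estimate(pvalues, n_bins=20, tol=1e-9, max_iter=10000):
    """Assumed iterative scheme: update m0 until it stops changing."""
    counts = bin_counts(pvalues, n_bins)
    m = len(pvalues)
    m0 = float(m)  # start by treating every null hypothesis as true
    for _ in range(max_iter):
        expected = m0 / n_bins  # expected null count per bin
        excess = 0.0
        for c in counts:
            if c <= expected:
                break  # first bin with no excess ends the sum
            excess += c - expected
        new_m0 = m - excess
        if abs(new_m0 - m0) < tol:
            return new_m0
        m0 = new_m0
    return m0


def direct_estimate(pvalues, n_bins=20):
    """Assumed closed form: n_bins times the mean count of the tail bins."""
    counts = bin_counts(pvalues, n_bins)
    tail = sum(counts)  # total count in bins i, ..., n_bins - 1
    for i, c in enumerate(counts):
        tail_mean = tail / (n_bins - i)
        if c <= tail_mean:
            return n_bins * tail_mean
        tail -= c
    return float(n_bins * counts[-1])  # not reached: the last bin always qualifies


# Toy data: 20 p values with bin counts [10, 4, 3, 3] when n_bins = 4.
toy_pvalues = [0.1] * 10 + [0.3] * 4 + [0.6] * 3 + [0.9] * 3
print(direct_estimate(toy_pvalues, n_bins=4))     # 12.0
print(iterative_estimate(toy_pvalues, n_bins=4))  # approaches 12
```

On the toy data, the iteration moves through 15, 13.5, 12.75, and so on toward 12, while the direct computation returns 12 in a single pass over the bins. This illustrates why avoiding iteration is attractive, under the assumptions above.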
Cite this article
Nettleton, D., Hwang, J.T.G., Caldo, R.A. et al. Estimating the number of true null hypotheses from a histogram of p values. JABES 11, 337–356 (2006). https://doi.org/10.1198/108571106X129135