Abstract
We study asymptotic properties of Bayesian multiple testing procedures and provide sufficient conditions for strong consistency under a general dependence structure. We also consider a novel Bayesian multiple testing procedure, with associated error measures, that coherently accounts for the dependence structure present in the model. We advocate posterior versions of the false discovery rate (FDR) and the false non-discovery rate (FNR) as appropriate error rates and show that their asymptotic convergence rates are directly associated with the Kullback–Leibler divergence from the true model. The theory holds even when the class of postulated models is misspecified. We illustrate our results in a variable selection problem with autoregressive response variables and compare our procedure with some existing methods through simulation studies. The superior performance of the new procedure relative to the others indicates that proper exploitation of the dependence structure by multiple testing methods is indeed important. Moreover, we obtain encouraging results on a maize dataset, where we select influential marker variables.
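As a rough illustration of what a posterior error rate looks like in practice, the sketch below computes posterior FDR and FNR for a fixed set of decisions. It assumes one common formulation from the Bayesian multiple testing literature, in which each hypothesis has a marginal posterior probability of its alternative being true; the exact definitions used in the paper, and its dependence-aware procedure, are not reproduced here. The function name `posterior_fdr_fnr` and the example probabilities are hypothetical.

```python
def posterior_fdr_fnr(post_prob_alt, reject):
    """Posterior FDR and FNR for a fixed set of decisions (illustrative sketch).

    post_prob_alt[i]: assumed posterior probability that alternative i is true.
    reject[i]: True if hypothesis i is rejected.
    """
    # Posterior probability of a false discovery, for each rejected hypothesis.
    false_disc = [1.0 - v for v, d in zip(post_prob_alt, reject) if d]
    # Posterior probability of a missed signal, for each accepted hypothesis.
    false_nondisc = [v for v, d in zip(post_prob_alt, reject) if not d]
    # Dividing by max(count, 1) makes each rate 0 when its set is empty.
    fdr = sum(false_disc) / max(len(false_disc), 1)
    fnr = sum(false_nondisc) / max(len(false_nondisc), 1)
    return fdr, fnr


# Hypothetical posterior probabilities and a simple 0.5 threshold rule.
probs = [0.99, 0.95, 0.60, 0.10, 0.02]
decisions = [p > 0.5 for p in probs]
fdr, fnr = posterior_fdr_fnr(probs, decisions)
print(f"posterior FDR = {fdr:.4f}, posterior FNR = {fnr:.4f}")
```

The threshold rule here treats each hypothesis separately and so ignores any dependence between them, which is precisely the limitation the abstract argues a dependence-aware procedure should address.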
Acknowledgements
We sincerely thank the Editor, the Associate Editor, and the referees for their responsible handling of our paper and for providing valuable comments that led to significant improvements in its presentation and readability.
Cite this article
Chandra, N.K., Bhattacharya, S. Asymptotic theory of dependent Bayesian multiple testing procedures under possible model misspecification. Ann Inst Stat Math 73, 891–920 (2021). https://doi.org/10.1007/s10463-020-00770-3
DOI: https://doi.org/10.1007/s10463-020-00770-3