Selective inference for false discovery proportion in a hidden Markov model

Perrot-Dockès, Marie; Blanchard, Gilles; Neuvial, Pierre; Roquain, Etienne

doi:10.1007/s11749-023-00886-7

Original Paper
Published: 14 September 2023

TEST, volume 32, pages 1365–1391 (2023)

Marie Perrot-Dockès¹,³,⁴ (ORCID: orcid.org/0000-0001-6495-1006),
Gilles Blanchard²,
Pierre Neuvial³ &
Etienne Roquain⁴

Abstract

We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a hidden Markov model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (J R Stat Soc Ser B (Stat Methodol) 71:393–424, 2009). While previous work has concentrated on deriving specific procedures with a controlled false discovery rate under this model, following a recent trend in selective inference, we consider the problem of establishing confidence bounds on the false discovery proportion, for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds first when the HMM is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we propose a bootstrap-based methodology to take into account the effect of parameter estimation error. We show that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.
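The abstract does not spell out the construction, so the following is only an illustrative sketch of the known-model case, not the authors' procedure. It assumes a two-state HMM (0 = null, 1 = alternative) with unit-variance Gaussian emissions and fully known parameters. It draws hidden paths from the posterior given the data by forward filtering and backward sampling, then takes the empirical (1 - alpha)-quantile of the false discovery proportion of a selected set. Because the quantile is taken conditionally on the observations, the selected set may depend on the data. All function names, parameter values and the Gaussian emission choice are assumptions made for this example.

```python
import math
import random


def gauss_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x (assumed emission family)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def simulate_hmm(m, init, trans, means, rng):
    """Draw m hidden states (0 = null, 1 = alternative) and observations."""
    theta, x = [], []
    state = 0 if rng.random() < init[0] else 1
    for _ in range(m):
        theta.append(state)
        x.append(rng.gauss(means[state], 1.0))
        state = 0 if rng.random() < trans[state][0] else 1
    return theta, x


def forward_filter(x, init, trans, means):
    """Filtering probabilities P(theta_i = s | x_1, ..., x_i), s in {0, 1}."""
    filt, pred = [], list(init)
    for xi in x:
        w = [pred[s] * gauss_pdf(xi, means[s]) for s in (0, 1)]
        total = w[0] + w[1]
        f = [w[0] / total, w[1] / total]
        filt.append(f)
        # One-step prediction for the next hidden state.
        pred = [f[0] * trans[0][s] + f[1] * trans[1][s] for s in (0, 1)]
    return filt


def sample_states(filt, trans, rng):
    """Draw one hidden path from P(theta | x) by backward sampling."""
    m = len(filt)
    path = [0] * m
    path[-1] = 0 if rng.random() < filt[-1][0] else 1
    for i in range(m - 2, -1, -1):
        nxt = path[i + 1]
        w0 = filt[i][0] * trans[0][nxt]
        w1 = filt[i][1] * trans[1][nxt]
        path[i] = 0 if rng.random() < w0 / (w0 + w1) else 1
    return path


def fdp(selected, theta):
    """Proportion of true nulls among the selected indices (0 if none selected)."""
    if not selected:
        return 0.0
    return sum(1 for i in selected if theta[i] == 0) / len(selected)


def posterior_fdp_bound(x, selected, init, trans, means,
                        alpha=0.1, n_draws=2000, rng=None):
    """Monte Carlo (1 - alpha)-quantile of FDP(selected) under P(theta | x)."""
    rng = rng or random.Random(0)
    filt = forward_filter(x, init, trans, means)
    draws = sorted(fdp(selected, sample_states(filt, trans, rng))
                   for _ in range(n_draws))
    k = min(n_draws - 1, max(0, math.ceil((1 - alpha) * n_draws) - 1))
    return draws[k]


if __name__ == "__main__":
    rng = random.Random(42)
    init = [0.8, 0.2]                    # hypothetical initial distribution
    trans = [[0.95, 0.05], [0.2, 0.8]]   # hypothetical transition matrix
    means = [0.0, 2.5]                   # hypothetical null / alternative means
    theta, x = simulate_hmm(500, init, trans, means, rng)
    selected = [i for i, xi in enumerate(x) if xi > 1.5]  # data-driven selection
    print("true FDP:", fdp(selected, theta))
    print("90% bound:", posterior_fdp_bound(x, selected, init, trans, means))
```

The sketch covers only the known-parameter setting. When the parameters are estimated, the abstract indicates that a bootstrap step is used to account for estimation error, which is not reproduced here.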

References

Abraham K, Castillo I, Gassiat E (2021a) Multiple testing in nonparametric hidden Markov models: an empirical Bayes approach. arXiv:2101.03838
Abraham K, Castillo I, Roquain E (2021b) Empirical Bayes cumulative \(\ell \)-value multiple testing procedure for sparse sequences
Albertson DG, Collins C, McCormick F, Gray JW (2003) Chromosome aberrations in solid tumors. Nat Genet 34:369–376
Alexandrovich G, Holzmann H, Leister A (2016) Nonparametric identification and maximum likelihood estimation for hidden Markov models. Biometrika 103:423–434
Azriel D, Schwartzman A (2015) The empirical distribution of a large number of correlated normal variables. J Am Stat Assoc 110:1217–1228. https://doi.org/10.1080/01621459.2014.958156
Bachoc F, Blanchard G, Neuvial P (2018) On the post selection inference constant under restricted isometry properties. Electron J Stat 12:3736–3757. https://doi.org/10.1214/18-EJS1490
Bachoc F, Leeb H, Pötscher BM (2019) Valid confidence intervals for post-model-selection predictors. Ann Stat 47:1475–1504. https://doi.org/10.1214/18-AOS1721
Benjamini Y, Bogomolov M (2014) Selective inference on multiple families of hypotheses. J R Stat Soc Ser B (Stat Methodol) 76:297–318
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300
Benjamini Y, Yekutieli D (2005) False discovery rate-adjusted multiple confidence intervals for selected parameters. J Am Stat Assoc 100:71–81
Berk R, Brown L, Buja A, Zhang K, Zhao L (2013) Valid post-selection inference. Ann Stat 41:802–837. https://doi.org/10.1214/12-AOS1077
Blanchard G, Neuvial P, Roquain E (2020) Post hoc confidence bounds on false positives using reference families. Ann Stat 48:1281–1303. https://doi.org/10.1214/19-AOS1847
Cai TT, Jin J (2010) Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing. Ann Stat 38:100–145. https://doi.org/10.1214/09-AOS696
Cai TT, Sun W (2009) Simultaneous testing of grouped hypotheses: finding needles in multiple haystacks. J Am Stat Assoc 104:1467–1481. https://doi.org/10.1198/jasa.2009.tm08415
Cai TT, Sun W, Wang W (2019) Covariate-assisted ranking and screening for large-scale two-sample inference. J R Stat Soc Ser B (Stat Methodol) 81:187–234. https://doi.org/10.1111/rssb.12304
Cappé O, Moulines E, Rydén T (2006) Inference in hidden Markov models. Springer, Berlin
Castillo I, Roquain E (2020) On spike and slab empirical Bayes multiple testing. Ann Stat 48:2548–2574
Dawid AP (1994) Selection paradoxes of Bayesian inference. Lect Notes Monogr Ser 24:211–220
De Castro Y, Gassiat E, Le Corff S (2017) Consistent estimation of the filtering and marginal smoothing distributions in nonparametric hidden Markov models. IEEE Trans Inf Theory 63:4758–4777
Durand G, Blanchard G, Neuvial P, Roquain E (2020) Post hoc false positive control for structured hypotheses. Scand J Stat 47:1114–1148. https://doi.org/10.1111/sjos.12453
Efron B (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J Am Stat Assoc 99:96–104. https://doi.org/10.1198/016214504000000089
Efron B (2007) Doing thousands of hypothesis tests at the same time. Metron Int J Stat LXV:3–21
Efron B (2008) Microarrays, empirical Bayes and the two-groups model. Stat Sci 23:1–22. https://doi.org/10.1214/07-STS236
Efron B (2009) Empirical Bayes estimates for large-scale prediction problems. J Am Stat Assoc 104:1015–1028. https://doi.org/10.1198/jasa.2009.tm08523
Efron B (2011) Tweedie’s formula and selection bias. J Am Stat Assoc 106:1602–1614
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 96:1151–1160
Fan J, Han X (2017) Estimation of the false discovery proportion with unknown dependence. J R Stat Soc Ser B (Stat Methodol) 79:1143–1164
Fan J, Ke Y, Sun Q, Zhou W-X (2019) Farmtest: factor-adjusted robust multiple testing with approximate false discovery control. J Am Stat Assoc 1–29
Franke J, Kreiss J-P, Mammen E, Neumann MH (2002) Properties of the nonparametric autoregressive bootstrap. J Time Ser Anal 23:555–585
Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN (2004) Hidden Markov models approach to the analysis of array CGH data. J Multivar Anal 90:132–153
Friguet C, Kloareg M, Causeur D (2009) A factor model approach to multiple testing under dependence. J Am Stat Assoc 104:1406–1415
Gales M, Young S (2008) The application of hidden Markov models in speech recognition. Now Publishers Inc, Hanover
Gassiat É, Cleynen A, Robin S (2016) Inference in finite state space non parametric hidden Markov models and applications. Stat Comput 26:61–71
Genovese CR, Wasserman L (2006) Exceedance control of the false discovery proportion. J Am Stat Assoc 101:1408–1417
Goeman JJ, Solari A (2011) Multiple testing for exploratory research. Stat Sci 26:584–597. https://doi.org/10.1214/11-STS356
Hall P, DiCiccio TJ, Romano JP (1989) On smoothing and the bootstrap. Ann Stat 17:692–704
Heller R, Rosset S (2021) Optimal control of false discovery criteria in the two-group model. J R Stat Soc Ser B (Stat Methodol) 83:133–155
Heller R, Yekutieli D (2014) Replicability analysis for genome-wide association studies. Ann Appl Stat 8:481–498. https://doi.org/10.1214/13-AOAS697
Horowitz JL (2003) Bootstrap methods for Markov processes. Econometrica 71:1049–1082
Jin J, Cai TT (2007) Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Am Stat Assoc 102:495–506. https://doi.org/10.1198/016214507000000167
Katsevich E, Ramdas A (2020) Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings. Ann Stat 48:3465–3487. https://doi.org/10.1214/19-AOS1938
Kim C-J, Nelson CR et al (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications, vol 1. The MIT press, Cambridge
Koski T (2001) Hidden Markov models for bioinformatics, vol 2. Springer, Berlin
Lee JD, Sun DL, Sun Y, Taylor JE et al (2016) Exact post-selection inference, with application to the lasso. Ann Stat 44:907–927
Leek JT, Storey JD (2008) A general framework for multiple testing dependence. Proc Natl Acad Sci 105:18718–18723
Luo F (2019) A systematic evaluation of copy number alterations detection methods on real SNP array and deep sequencing data. BMC Bioinform 20:1–16
Nguyen VH, Matias C (2014) Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation. ESAIM PS 18:584–612. https://doi.org/10.1051/ps/2013041
Okamoto A, Sehouli J, Yanaihara N, Hirata Y, Braicu I, Kim B-G, Takakura S, Saito M, Yanagida S, Takenaka M et al (2015) Somatic copy number alterations associated with Japanese or endometriosis in ovarian clear cell adenocarcinoma. PLoS ONE 10:e0116977
Panigrahi S, Taylor J, Weinstein A (2020) Integrative methods for post-selection inference under convex constraints
Pierre-Jean M, Neuvial P (2017) acnr: annotated copy-number regions R package version 1.0.0
Pierre-Jean M, Rigaill G, Neuvial P (2015) Performance evaluation of DNA copy number segmentation methods. Brief Bioinform 16:600–615
Pierre-Jean M, Rigaill G, Neuvial P (2019) jointseg: Joint segmentation of multivariate (copy number) signals R package version 1.0.2
Rebafka T, Roquain E, Villers F (2019) Graph inference with clustering and false discovery rate control
Robin S, Bar-Hen A, Daudin J-J, Pierre L (2007) A semi-parametric approach for mixture models: application to local false discovery rate estimation. Comput Stat Data Anal 51:5483–5493
Roquain E, Verzelen N (2020) False discovery rate control with unknown null distribution: is it possible to mimic the oracle?
Scheffé H (1959) The analysis of variance. Chapman & Hall Ltd, London
Schwartzman A (2010) Comment: correlated \(z\)-values and the accuracy of large-scale statistical estimates. J Am Stat Assoc 105:1059–1063. https://doi.org/10.1198/jasa.2010.tm10237
Senn S (2008) A note concerning a selection “paradox” of Dawid’s. Am Stat 62:206–210
Shah SP, Cheung K-J Jr, Johnson NA, Alain G, Gascoyne RD, Horsman DE, Ng RT, Murphy KP (2009) Model-based clustering of array CGH data. Bioinformatics 25:i30–i38
Stephens M (2017) False discovery rates: a new deal. Biostatistics 18:275–294
Sun W, Cai TT (2007) Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc 102:901–912. https://doi.org/10.1198/016214507000000545
Sun W, Cai TT (2009) Large-scale multiple testing under dependence. J R Stat Soc Ser B (Stat Methodol) 71:393–424
Sun L, Stephens M (2018) Solving the empirical Bayes normal means problem with correlated noise
Sun Y, Zhang NR, Owen AB (2012) Multiple hypothesis testing adjusted for latent variables, with an application to the agemap gene expression data. Ann Appl Stat 6:1664–1688
Tibshirani RJ, Rinaldo A, Tibshirani R, Wasserman L (2018) Uniform asymptotic inference and the bootstrap after model selection. Ann Stat 46:1255–1287
Weinstein A, Ramdas A (2019) Online control of the false coverage rate and false sign rate
Yekutieli D (2012) Adjusted Bayesian inference for selected parameters. J R Stat Soc Ser B (Stat Methodol) 74:515–541
Zhang NR (2010) DNA copy number profiling in normal and tumor genomes. In: Feng J, Fu W, Sun F (eds) Frontiers in computational and systems biology. Springer, Berlin, pp 259–281. https://doi.org/10.1007/978-1-84996-196-7_14

Acknowledgements

The authors would like to thank an associate editor and the two referees, whose insightful comments led to considerable improvements to this paper. This work has been supported by ANR-16-CE40-0019 (SansSouci), ANR-17-CE40-0001 (BASICS), ANR-19-CHIA-0021-01 (BiSCottE), ANR-21-CE23-0035 (ASCAI), the UPSaclay Excellency Chair REC-2019-044, the DFG CRC 1294 - 318763901 “Data Assimilation”, and by the GDR ISIS through the “projets exploratoires” program (project TASTY).

Author information

Authors and Affiliations

CNRS MAP5 UMR 8145, Université de Paris, Paris, France
Marie Perrot-Dockès
Inria Laboratoire de mathématiques d’Orsay, CNRS, Université Paris-Saclay, 91405, Orsay, France
Gilles Blanchard
Institut de Mathématiques de Toulouse UMR 5219, CNRS UPS, Université de Toulouse, 31062, Toulouse Cedex 9, France
Marie Perrot-Dockès & Pierre Neuvial
Laboratoire de Probabilités, Statistique et Modélisation, CNRS, Sorbonne Université, Université de Paris, Paris, France
Marie Perrot-Dockès & Etienne Roquain

Corresponding author

Correspondence to Marie Perrot-Dockès.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3188 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Perrot-Dockès, M., Blanchard, G., Neuvial, P. et al. Selective inference for false discovery proportion in a hidden Markov model. TEST 32, 1365–1391 (2023). https://doi.org/10.1007/s11749-023-00886-7

Received: 07 July 2022
Accepted: 24 July 2023
Published: 14 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11749-023-00886-7

Mathematics Subject Classification

62J15
