Skip to main content
Log in

Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test

  • Theory & Methods
  • Published:
Psychometrika Aims and scope Submit manuscript

Abstract

When analyzing data, researchers make some choices that are either arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternative choices could have been made. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. Recently, the introduction of multiverse analysis provides researchers with a method to evaluate the stability of the results across reasonable choices that could be made when analyzing data. Multiverse analysis is confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Recently, specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and only allows researchers to infer whether at least one specification rejects the null hypothesis, but not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA) which is a flexible and general inferential approach that considers for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., preprocessing) and any generalized linear model; it allows testing the null hypothesis that a given predictor is not associated with the outcome, by combining information from all reasonable models of multiverse analysis, and provides strong control of the family-wise error rate allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled, and compute the statistical power of the test through a simulation study. Finally, we apply the PIMA procedure to the analysis of a real dataset on the self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  • Agresti, A. (2015). Foundations of linear and generalized linear models. Wiley.

    Google Scholar 

  • Begg, C. B., & Berlin, J. A. (1988). Publication bias: A problem in interpreting medical data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 151(3), 419–445.

    Article  Google Scholar 

  • Benjamini, Y. (2020). Selective inference: The silent killer of replicability. Harvard Data Science Review, 2(4). https://hdsr.mitpress.mit.edu/pub/l39rpgyc.

  • Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics, 24(4), 295–300.

    Article  Google Scholar 

  • Brodeur, A., Lé, M., Sangnier, M., & Zylberberg, Y. (2016). Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1), 1–32.

    Google Scholar 

  • Caserotti, M., Girardi, P., Rubaltelli, E., Tasso, A., Lotto, L., & Gavaruzzi, T. (2021). Associations of COVID-19 risk perception with vaccine hesitancy over time for Italian residents. Social Science & Medicine, 272, 113688.

    Article  Google Scholar 

  • De Santis, R., Goeman, J. J., Hemerik, J., & Finos, L. (2022). Inference in generalized linear models with robustness to misspecified variances.

  • Dragicevic, P., Jansen, Y., Sarma, A., Kay, M., & Chevalier, F. (2019). Increasing the transparency of research papers with explorable multiverse analyses. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1–15).

  • Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A.-W., Cronin, E., Decullier, E., Easterbrook, P. J., Von Elm, E., Gamble, C., et al. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE, 3(8), e3081.

    Article  PubMed  PubMed Central  Google Scholar 

  • Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891–904.

    Article  Google Scholar 

  • Finner, H. (1999). Stepwise multiple test procedures and control of directional errors. The Annals of Statistics, 27(1), 274–289.

    Article  Google Scholar 

  • Finos, L., Hemerik, J., & Goeman, J. J. (2023). jointest: Multivariate testing through joint sign-flip scores. R package version 1.2.0.

  • Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

    Google Scholar 

  • Flachaire, E. (1999). A better way to bootstrap pairs. Economics Letters, 64(3), 257–262.

    Article  Google Scholar 

  • Freedman, D. A. (1981). Bootstrapping regression models. The Annals of Statistics, 9(6), 1218–1228.

    Article  Google Scholar 

  • Frey, R., Richter, D., Schupp, J., Hertwig, R., & Mata, R. (2021). Identifying robust correlates of risk preference: A systematic approach using specification curve analysis. Journal of Personality and Social Psychology, 120(2), 538.

    Article  PubMed  Google Scholar 

  • Gelman, A., & Loken, E. (2014). The statistical crisis in science data-dependent analysis—a “garden of forking paths’’—explains why many statistically significant comparisons don’t hold up. American Scientist, 102(6), 460.

    Article  Google Scholar 

  • Genovese, C. R., & Wasserman, L. (2006). Exceedance control of the false discovery proportion. Journal of the American Statistical Association, 101(476), 1408–1417.

    Article  Google Scholar 

  • Goeman, J. J., Hemerik, J., & Solari, A. (2021). Only closed testing procedures are admissible for controlling false discovery proportions. Annals of Statistics, 49(2), 1218–1238.

    Article  Google Scholar 

  • Goeman, J. J., & Solari, A. (2011). Multiple testing for exploratory research. Statistical Science, 26(4), 584–597.

    Article  Google Scholar 

  • Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1.

    Article  Google Scholar 

  • Harder, J. A. (2020). The multiverse of methods: Extending the multiverse analysis to address data-collection decisions. Perspectives on Psychological Science, 15(5), 1158–1177.

    Article  PubMed  Google Scholar 

  • Hemerik, J., & Goeman, J. J. (2018). Exact testing with random permutations. TEST, 27, 811–825.

    Article  PubMed  Google Scholar 

  • Hemerik, J., Goeman, J. J., & Finos, L. (2020). Robust testing in generalized linear models by sign flipping score contributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(3), 841–864.

    Article  Google Scholar 

  • Klau, S., Hoffmann, S., Patel, C. J., Ioannidis, J. P., & Boulesteix, A.-L. (2020). Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework. International Journal of Epidemiology, 50(1), 266–278.

    Article  PubMed Central  Google Scholar 

  • Liptak, T. (1958). On the combination of independent tests. Magyar Tud. Akad. Mat. Kutató Int. Közl., 3, 1971–1977.

    Google Scholar 

  • Liu, Y., Kale, A., Althoff, T., & Heer, J. (2020). Boba: Authoring and visualizing multiverse analyses. IEEE Transactions on Visualization and Computer Graphics, 27(2), 1753–1763.

    Article  Google Scholar 

  • Marcus, R., Peritz, E., & Gabriel, K. R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63(3), 655–660.

    Article  Google Scholar 

  • Mirman, J. H., Murray, A. L., Mirman, D., & Adams, S. A. (2021). Advancing our understanding of cognitive development and motor vehicle crash risk: A multiverse representation analysis. Cortex, 138, 90–100.

    Article  PubMed  Google Scholar 

  • Modecki, K. L., Low-Choy, S., Uink, B. N., Vernon, L., Correia, H., & Andrews, K. (2020). Tuning into the real effect of smartphone use on parenting: A multiverse analysis. Journal of Child Psychology and Psychiatry, 61(8), 855–865.

    Article  PubMed  Google Scholar 

  • Nosek, B. A., & Lakens, D. (2014). A method to increase the credibility of published results. Social Psychology, 45(3), 137–141.

    Article  Google Scholar 

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science, vol. 349. American Association for the Advancement of Science.

  • Pesarin, F. (2001). Multivariate Permutation Tests: With Applications in Biostatistics. New York: Wiley.

    Google Scholar 

  • R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

  • Ramdas, A., Barber, R. F., Candès, E. J., & Tibshirani, R. J. (2023). Permutation tests using arbitrary permutation distributions. Sankhya A, 1–22.

  • Rijnhart, J. J., Twisk, J. W., Deeg, D. J., & Heymans, M. W. (2021). Assessing the robustness of mediation analysis results using multiverse analysis. Prevention Science, 1–11.

  • Shaffer, J. P. (1980). Control of directional errors with stagewise multiple test procedures. The Annals of Statistics, 8(6), 1342–1347.

    Article  Google Scholar 

  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.

    Article  PubMed  Google Scholar 

  • Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208–1214.

    Article  PubMed  Google Scholar 

  • Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.

    Article  PubMed  Google Scholar 

  • Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance-or vice versa. Journal of the American Statistical Association, 54(285), 30–34.

    Google Scholar 

  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.

    Book  Google Scholar 

  • Vesely, A., Goeman, J. J., & Finos, L. (2022). Resampling-based multisplit inference for high-dimensional regression. arXiv:2205.12563.

  • Vesely, A., Finos, L., & Goeman, J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 85(3), 664–683.

    Article  Google Scholar 

  • Wessel, I., Albers, C. J., Zandstra, A. R. E., & Heininga, V. E. (2020). A multiverse analysis of early attempts to replicate memory suppression with the think/no-think task. Memory, 28(7), 870–887.

    Article  PubMed  Google Scholar 

  • Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. New York: Wiley.

    Google Scholar 

Download references

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Paolo Girardi.

Ethics declarations

Data Availability

All R code and data associated with the real data application are available at https://osf.io/3ebw9/, while further analyses can be developed through the dedicated package Jointest (Finos et al., 2023) available at https://github.com/livioivil/jointest

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Girardi, P., Vesely, A., Lakens, D. et al. Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test. Psychometrika (2024). https://doi.org/10.1007/s11336-024-09973-6

Download citation

  • Received:

  • Published:

  • DOI: https://doi.org/10.1007/s11336-024-09973-6

Keywords

Navigation