Type S error rates for classical and Bayesian single and multiple comparison procedures

Computational Statistics

Summary

In classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using the Type 1 error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. We set up a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state “θ1 > θ2 with confidence,” “θ2 > θ1 with confidence,” or “no claim with confidence.” We focus on the Type S (for sign) error, which occurs when you claim “θ1 > θ2 with confidence” when θ2 > θ1 (or vice versa). We compute the Type S error rates for classical and Bayesian confidence statements and find that classical Type S error rates can be extremely high (up to 50%). Bayesian confidence statements are conservative, in the sense that claims based on 95% posterior intervals have Type S error rates between 0 and 2.5%. For multiple comparison situations, the conclusions are similar.
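
The three-valued claim and the Type S error rate described above can be illustrated with a small simulation. The summary does not state a model, so the sketch below assumes one for illustration only: the true difference θ = θ1 − θ2 is drawn from Normal(0, τ²), and the observed difference y is Normal(θ, σ²). The function name `type_s_rates` and its parameters are hypothetical, and this is not the authors' code. A sign is claimed when the 95% interval (classical confidence interval or Bayesian posterior interval) excludes zero, and the Type S error rate is the fraction of claims that have the wrong sign.

```python
import math
import random

Z95 = 1.959964  # two-sided 95% critical value of the standard normal


def type_s_rates(tau, sigma, n_sims=200_000, seed=0):
    """Monte Carlo Type S error rates, conditional on making a claim.

    Assumed model (chosen for illustration, not taken from the summary):
      theta ~ Normal(0, tau^2)            true difference theta1 - theta2
      y | theta ~ Normal(theta, sigma^2)  observed difference
    """
    rng = random.Random(seed)
    # Conjugate normal posterior: theta | y ~ Normal(shrink * y, shrink * sigma^2)
    shrink = tau**2 / (tau**2 + sigma**2)
    post_sd = math.sqrt(shrink) * sigma
    claims = {"classical": 0, "bayes": 0}
    errors = {"classical": 0, "bayes": 0}
    for _ in range(n_sims):
        theta = rng.gauss(0.0, tau)
        y = rng.gauss(theta, sigma)
        wrong_sign = (y > 0) != (theta > 0)
        # Classical: claim a sign when the 95% confidence interval excludes zero.
        if abs(y) > Z95 * sigma:
            claims["classical"] += 1
            errors["classical"] += wrong_sign
        # Bayesian: claim a sign when the 95% posterior interval excludes zero.
        if abs(shrink * y) > Z95 * post_sd:
            claims["bayes"] += 1
            errors["bayes"] += wrong_sign
    # Rate is NaN when a procedure never makes a claim.
    return {
        name: (errors[name] / claims[name] if claims[name] else float("nan"))
        for name in claims
    }
```

Calling `type_s_rates(tau=1.0, sigma=1.0)` returns a dictionary with one rate per procedure. Under this assumed model, shrinking τ relative to σ pushes the classical rate toward 50%, while the Bayesian rule makes fewer claims and keeps its rate below 2.5%, which matches the qualitative pattern the summary reports.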

Additional information

We thank David H. Krantz, the editor, and two referees for helpful comments. This work was supported in part by the U.S. National Science Foundation grant SBR-9708424 and Young Investigator Award DMS-9796129. The second author is a research assistant for the Fund of Scientific Research — Flanders.

About this article

Cite this article

Gelman, A., Tuerlinckx, F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics 15, 373–390 (2000). https://doi.org/10.1007/s001800000040

  • DOI: https://doi.org/10.1007/s001800000040
