
Reaching for the stars: attention to multiple testing problems and method recommendations using simulation for business research

Original Paper · Journal of Business Economics

Abstract

Multiple hypothesis tests decide on more than one hypothesis based on the same data set. Despite their clear relevance for business research, we find that multiple testing methods are not systematically applied in our field. This finding is as surprising as it is consequential for the significance of research results: the rate of false positive findings inflates with the number of individual tests employed. The focus of the current multiple testing literature on topics from medicine and psychology may deter business researchers from employing these methods. We therefore provide guidance by exposing the importance of multiple testing. Based on an application-oriented systematization, Monte Carlo simulations, and a decision-theoretic evaluation scheme, we highlight method recommendations for different definitions of conservatism regarding errors. As an implicit result, we find that the dependency structure in the data does not substantially influence the method outcomes. In addition, intuitive single-step margin-based approaches perform similarly to sophisticated tests for typical multiple testing problems in a conservative setting. More liberal methods show a stable relation across all simulation designs.
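Corrections of this kind are readily available in standard statistical software. The following minimal sketch uses made-up p values and assumes Python with statsmodels; the three methods shown (a conservative single-step Bonferroni margin, the step-wise Holm refinement, and the more liberal Benjamini–Hochberg false discovery rate control) are common examples, not necessarily the exact set evaluated in the paper.

```python
# Minimal sketch (not from the paper): adjusting the p values of several single
# tests with common corrections; the p values below are purely illustrative.
import numpy as np
from statsmodels.stats.multitest import multipletests

raw_p = np.array([0.001, 0.008, 0.020, 0.041, 0.049, 0.230, 0.470])

for method in ["bonferroni", "holm", "fdr_bh"]:   # conservative -> more liberal
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(f"{method:10s} adjusted: {np.round(adj_p, 3)}  rejected: {reject.sum()}")
```

With these illustrative numbers, five of the seven raw p values fall below 0.05, but only one survives the Bonferroni adjustment, two survive Holm, and three survive the Benjamini–Hochberg control, illustrating how the number of reported "stars" depends on the chosen method.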



Author information


Correspondence to Christina C. Bartenschlager.

Electronic supplementary material


Supplementary material 1 (DOCX 18 kb)

Appendices

Appendix 1

See Appendix Table 4.

Table 4 Study on the application of regression models and (number of) multiple tests in 2015 Management Science articles

Appendix 2

See Appendix Fig. 8.

Fig. 8 Number of hypotheses tests in 2015 Management Science publications

Appendix 3

See Appendix Fig. 9.

Fig. 9 Type I error inflation for independent individual tests at a 5% level
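The relationship depicted in Fig. 9 is standard: if m independent tests of true null hypotheses are each run at level \( \alpha \), the probability of at least one false positive grows quickly with m. A short worked example for \( \alpha = 0.05 \):

\[ \mathrm{FWER}(m) = 1 - (1 - \alpha)^m, \qquad \mathrm{FWER}(1) = 0.05, \quad \mathrm{FWER}(10) \approx 0.40, \quad \mathrm{FWER}(20) \approx 0.64. \]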

Appendix 4: Synopsis of the simulation test design per setting

  1. Generate a positive definite m × m covariance matrix.

     • Fix the proportion of true null hypotheses \( \frac{m_{0}}{m} = 0.1 \) and define the expectation vector.

       → Simulate 100 normally distributed 100 × m data sets.

       → Save the mean type I and type II error counts for each multiple testing method.

     • Fix the proportion of true null hypotheses \( \frac{m_{0}}{m} = 0.5 \) and define the expectation vector.

       → Simulate 100 normally distributed 100 × m data sets.

       → Save the mean type I and type II error counts for each multiple testing method.

     • Fix the proportion of true null hypotheses \( \frac{m_{0}}{m} = 0.9 \) and define the expectation vector.

       → Simulate 100 normally distributed 100 × m data sets.

       → Save the mean type I and type II error counts for each multiple testing method.

  2. Repeat step 1 100 times.

  3. Calculate the decision-theoretic results (a schematic code sketch of one setting follows this list).
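The following minimal sketch mirrors one setting of the design above; it is not the authors' implementation. It assumes Python with NumPy, SciPy, and statsmodels; the number of hypotheses (m = 10), the AR(1)-type covariance matrix, the effect size of 0.5 for false null hypotheses, and the three adjustment methods are illustrative choices. Only the inner loop over 100 data sets for \( \frac{m_{0}}{m} = 0.5 \) is shown; the outer 100 repetitions and the decision-theoretic evaluation are omitted.

```python
# Schematic Monte Carlo sketch of the Appendix 4 design (illustrative only):
# simulate correlated normal data, test each of m hypotheses H0: mu_j = 0 with
# one-sample t tests, adjust the p values, and count type I / type II errors.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
m, n, reps = 10, 100, 100      # hypotheses, observations per data set, data sets
pi0 = 0.5                      # proportion of true null hypotheses m0/m
m0 = int(pi0 * m)

# Positive definite m x m covariance matrix (AR(1)-type, purely illustrative)
cov = 0.3 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))

# Expectation vector: zero for true nulls, 0.5 for false nulls (illustrative)
mu = np.concatenate([np.zeros(m0), np.full(m - m0, 0.5)])
true_null = np.arange(m) < m0

methods = ["bonferroni", "holm", "fdr_bh"]
type1 = {meth: 0.0 for meth in methods}
type2 = {meth: 0.0 for meth in methods}

for _ in range(reps):
    data = rng.multivariate_normal(mu, cov, size=n)     # one 100 x m data set
    pvals = stats.ttest_1samp(data, 0.0).pvalue         # m single-test p values
    for meth in methods:
        reject, *_ = multipletests(pvals, alpha=0.05, method=meth)
        type1[meth] += np.sum(reject & true_null)       # false positives
        type2[meth] += np.sum(~reject & ~true_null)     # false negatives

for meth in methods:
    print(f"{meth:10s} mean type I: {type1[meth]/reps:.2f}"
          f"  mean type II: {type2[meth]/reps:.2f}")
```

Averaging these error counts over the 100 outer repetitions of step 2 would yield the inputs for the decision-theoretic comparison in step 3.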


Cite this article

Bartenschlager, C.C., Brunner, J.O. Reaching for the stars: attention to multiple testing problems and method recommendations using simulation for business research. J Bus Econ 89, 447–479 (2019). https://doi.org/10.1007/s11573-018-0919-3
