A 20-Year Review of Outcome Reporting Bias in Moderated Multiple Regression

Abstract

Moderated multiple regression (MMR) remains the most popular method for testing interactions in management and applied psychology. Recent discussions of MMR have centered on its typically small effect sizes and low statistical power (e.g., Murphy & Russell, Organizational Research Methods, 2016). Although many MMR tests are likely plagued by type II errors, they may also be particularly prone to outcome reporting bias (ORB), which inflates false positives (type I errors). We examined the state of MMR through a 20-year review of six leading journals. Based on 1218 MMR tests nested within 343 studies, we found that despite low statistical power, most MMR tests (54%) were reported as statistically significant. Further, although sample size has remained essentially unchanged over time (r = −.002), the proportion of statistically significant MMR tests has risen from 41% (1995–1999) to 49% (2000–2004), 60% (2005–2009), and 69% (2010–2014). This trend could indicate greater methodological and theoretical precision, but it also leaves open the possibility of ORB. In our review, we found evidence that increased rigor and theoretical precision both play an important role in MMR effect size magnitudes, but we also found evidence of ORB. Specifically, (a) smaller sample sizes are associated with larger effect sizes, (b) there is a substantial frequency spike in p values just below the .05 threshold, and (c) recalculated p values less than .05 always converged with authors’ conclusions of statistical significance, whereas recalculated p values between .05 and .10 converged with authors’ conclusions only about half (54%) of the time. These findings have important implications for the future application of MMR.
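
As context for point (c), a reported MMR p value can be recalculated directly from quantities that studies commonly report: either the interaction term’s t statistic with its residual degrees of freedom, or the change in R2 when the interaction is entered, together with the full-model R2. The sketch below is a minimal, hypothetical illustration of such a recalculation (function names and example values are ours, not the authors’ coding protocol):

```python
from scipy import stats

def p_from_t(t, df_resid):
    """Two-tailed p value for a reported interaction t statistic."""
    return 2 * stats.t.sf(abs(t), df_resid)

def p_from_delta_r2(delta_r2, r2_full, n, k_full, k_added=1):
    """p value for the R^2 change when k_added interaction term(s) are
    entered into a model with k_full total predictors and sample size n."""
    df_resid = n - k_full - 1
    f_stat = (delta_r2 / k_added) / ((1 - r2_full) / df_resid)
    return stats.f.sf(f_stat, k_added, df_resid)

# Hypothetical example: an interaction reported as t(146) = 1.99
print(round(p_from_t(1.99, 146), 3))  # about .048, just under the .05 threshold
```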



Notes

  1.

    We did conduct a series of multilevel analyses, and the results provided to the editor and reviewers are virtually identical to the meta-regression results presented below. Further, we tested the models with different weighting schemes (e.g., unweighted, weighted by sample size, weighted by inverse standard error of the semipartial correlation), various effect sizes (e.g., semipartial correlation, shrunken semipartial correlation, f2, shrunken f2) calculated in different ways (e.g., based on t statistics using Cohen and Cohen’s (1983) formulas, change in R2 alone), with and without outliers, and with a variety of subsamples in the data (e.g., randomly selected effect sizes from a study, averaged effect sizes). Our intention was not to “hack” the data, but to ensure that our results were robust. Across the more than 30 different analyses, our results are remarkably consistent not just in the overall conclusions, but also in the specific effect size directions and magnitudes for the focal variables. The full set of analyses is available from the first author. A brief computational sketch of these effect size conversions appears after these notes.

  2.

    We thank an anonymous reviewer for raising this concern and recommending the unweighted approach for the reported analyses.

  3.

    Again, we are thankful to an anonymous reviewer for this suggestion.
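
Note 1 refers to effect sizes recovered from reported t statistics using Cohen and Cohen’s (1983) formulas. As a minimal sketch of what such a conversion looks like (illustrative only; the function name and example values below are hypothetical, not the authors’ actual code), the squared semipartial correlation and Cohen’s f2 for an interaction term can be computed from the term’s t statistic, the residual degrees of freedom, and the full-model R2:

```python
import math

def interaction_effect_sizes(t, df_resid, r2_full):
    """Squared semipartial correlation (= delta R^2 for a single added term)
    and Cohen's f^2 for an interaction term, from its t statistic, the
    residual degrees of freedom, and the full-model R^2 (Cohen & Cohen, 1983)."""
    sr2 = (t ** 2) * (1 - r2_full) / df_resid  # squared semipartial correlation
    f2 = sr2 / (1 - r2_full)                   # f^2 for the added interaction term
    sr = math.copysign(math.sqrt(sr2), t)      # signed semipartial correlation
    return sr, sr2, f2

# Hypothetical example: t(146) = 1.99 for the interaction, full-model R^2 = .20
sr, sr2, f2 = interaction_effect_sizes(1.99, 146, 0.20)
print(round(sr, 3), round(sr2, 3), round(f2, 3))  # about .147, .022, .027
```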

References

  1. Aguinis, H., & Gottfredson, R. K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31, 776–786. https://doi.org/10.1002/job.686.


  2. Aguinis, H., & Stone-Romero, E. F. (1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192–206. https://doi.org/10.1037//0021-9010.82.1.192.


  3. Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage.


  4. Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. The Leadership Quarterly, 28(1), 5–21.

  5. Banks, G. C., Kepes, S., & McDaniel, M. A. (2015). Publication bias: Understanding the myths concerning threats to the advancement of science. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 36–64). New York, NY: Routledge.


  6. Banks, G. C., & McDaniel, M. A. (2011). The kryptonite of evidence-based I-O psychology. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 40–44. https://doi.org/10.1111/j.1754-9434.2010.01292.x.


  7. Banks, G. C., O’Boyle, E. H., Pollack, J. M., White, C. D., Batchelor, J. H., Whelpley, C. E., et al. (2016). Questions about questionable research practices in the field of management: A guest commentary. Journal of Management, 42, 5–20. https://doi.org/10.1177/0149206315619011.


  8. Banks, G. C., Rogelberg, S. G., Woznyj, H. M., Landis, R. S., & Rupp, D. E. (2016). Evidence on questionable research practices: The good, the bad, and the ugly. Journal of Business and Psychology, 31, 323–338. https://doi.org/10.1007/s10869-016-9456-7.


  9. Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173.


  10. Bennett, R. J., & Robinson, S. L. (2000). Development of a measure of workplace deviance. Journal of Applied Psychology, 85, 349–360. https://doi.org/10.1037/0021-9010.85.3.349.


  11. Bergh, D. D., Sharp, B. M., & Li, M. (2017). Tests for identifying “red flags” in empirical findings: Demonstration and recommendations for authors, reviewers, and editors. Academy of Management Learning and Education, 16, 110–124. https://doi.org/10.5465/amle.2015.0406.


  12. Biemann, T. (2013). What if we were Texas sharpshooters? Predictor reporting bias in regression analysis. Organizational Research Methods, 16, 335–363. https://doi.org/10.1177/1094428113485135.


  13. Bobko, P. (1986). A solution to some dilemmas when testing hypotheses about ordinal interactions. Journal of Applied Psychology, 71, 323–326. https://doi.org/10.1037/0021-9010.71.2.323.


  14. Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2016). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology, 69, 709–750. https://doi.org/10.1111/peps.12111.


  15. Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100, 431–449. https://doi.org/10.1037/a0038047.


  16. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.


  17. Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.


  18. Cortina, J. M. (1993). Interaction, nonlinearity, and multicollinearity: Implications for multiple regression. Journal of Management, 19, 915–922. https://doi.org/10.1016/0149-2063(93)90035-L.


  19. Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (in press). Degrees of freedom in SEM: Are we testing the models that we claim to test? Organizational Research Methods. https://doi.org/10.1177/1094428116676345.

  20. Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414–417. https://doi.org/10.1037/0033-2909.102.3.414.


  21. de Winter, J. C., & Dodou, D. (2015). A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ, 3, e733. https://doi.org/10.7717/peerj.733.


  22. Editors. (1909). The reporting of unsuccessful cases. The Boston Medical and Surgical Journal, 161, 263–264. https://doi.org/10.1056/NEJM190908191610809.

  23. Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13, 668–689. https://doi.org/10.1177/1094428110380467.


  24. Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170, 1934–1939. https://doi.org/10.1001/archinternmed.2010.406.


  25. Evans, M. G. (1985). A Monte Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior and Human Decision Processes, 36, 305–323. https://doi.org/10.1016/0749-5978(85)90002-0.


  26. Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. https://doi.org/10.1007/s11192-011-0494-7.


  27. Finkel, E. J., Eastwick, P. W., & Reis, H. T. (2015). Best research practices in psychology: Illustrating epistemological and pragmatic considerations with the case of relationship science. Journal of Personality and Social Psychology, 108, 275–297. https://doi.org/10.1037/pspi0000007.


  28. Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345, 1502–1505. https://doi.org/10.1126/science.1255484.


  29. Gerber, A. S., & Malhotra, N. (2008a). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3, 313–326. https://doi.org/10.1561/100.00008024.


  30. Gerber, A. S., & Malhotra, N. (2008b). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37, 3–30. https://doi.org/10.1177/0049124108318973.


  31. Grand, J. A., Rogelberg, S. G., Banks, G. C., Landis, R. S., Tonidandel, S. (in press). From outcome to process focus: Fostering a more robust psychological science through registered reports and results-blind reviewing. Perspectives on Psychological Science.

  32. Greco, L. M., O’Boyle, E. H., Cockburn, B. S., & Yuan, Z. (in press). A reliability generalization examination of organizational behavior constructs. Journal of Management Studies.

  33. Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20. https://doi.org/10.1037/h0076157.


  34. Hardwicke, T. E., Mathur, M., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., ... Tessler, M. H. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition.


  35. Hartgerink, C. H., van Aert, R. C., Nuijten, M. B., Wicherts, J. M., & Van Assen, M. A. (2016). Distributions of p-values smaller than .05 in psychology: What is going on? PeerJ, 4, e1935. https://doi.org/10.7717/peerj.1935.


  36. Hollenbeck, J. R., & Wright, P. M. (2016). Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43, 5–18. https://doi.org/10.1177/0149206316679487.


  37. Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology, 19, 640–648. https://doi.org/10.1097/EDE.0b013e31818131e7.


  38. Jaccard, J., Wan, C. K., & Turrisi, R. (1990). The detection and interpretation of interaction effects between continuous variables in multiple regression. Multivariate Behavioral Research, 25, 467–478. https://doi.org/10.1207/s15327906mbr2504_4.


  39. James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307–321. https://doi.org/10.1037/0021-9010.69.2.307.


  40. John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953.


  41. Journal Citation Reports® (2014). Social Science Edition. (Thomson Reuters, 2015). http://jcr.incites.thomsonreuters.com.

  42. Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15, 624–662. https://doi.org/10.1177/1094428112452760.


  43. Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217. https://doi.org/10.1207/s15327957pspr0203_4.


  44. Krawczyk, M. (2015). The search for significance: A few peculiarities in the distribution of P values in experimental psychology literature. PLoS One, 10(6), e0127872. https://doi.org/10.1371/journal.pone.0127872.


  45. Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS One, 9(9), e105825. https://doi.org/10.1371/journal.pone.0105825.


  46. LeBreton, J. M. (2016). Editorial. Organizational Research Methods, 19, 3–7. https://doi.org/10.1177/1094428115622097.


  47. LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16, 449–473. https://doi.org/10.1177/1094428113481065.


  48. Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303–2309. https://doi.org/10.1080/17470218.2013.863371.


  49. Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. The Quarterly Journal of Experimental Psychology, 65, 2271–2279. https://doi.org/10.1080/17470218.2012.711335.


  50. Matthes, J., Marquart, F., Naderer, B., Arendt, F., Schmuck, D., & Adam, K. (2015). Questionable research practices in experimental communication research: A systematic analysis from 1980 to 2013. Communication Methods and Measures, 9(4), 193–207. https://doi.org/10.1080/19312458.2015.1096334.


  51. Murphy, K. R., & Russell, C. J. (2016). Mend it or end it: Redirecting the search for interactions in the organizational sciences. Organizational Research Methods. https://doi.org/10.1177/1094428115625322.

  52. Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture: The TOP guidelines for journals. Science, 348, 1422–1425. https://doi.org/10.1126/science.aab2374.


  53. Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215.


  54. Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226. https://doi.org/10.3758/s13428-015-0664-2.


  55. O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43, 376–399. https://doi.org/10.1177/0149206314527133.


  56. Orlitzky, M. (2012). How can significance tests be deinstitutionalized? Organizational Research Methods, 15, 199–228. https://doi.org/10.1177/1094428111428356.


  57. Porter, T. M. (1992). Quantification and the accounting ideal in science. Social Studies of Science, 22, 633–652. https://doi.org/10.1177/030631292022004004.


  58. Robinson, S. L., & Bennett, R. J. (1995). A typology of deviant workplace behaviors: A multidimensional scaling study. Academy of Management Journal, 38, 555–572. https://doi.org/10.2307/256693.


  59. Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638.


  60. Russell, C. J., & Bobko, P. (1992). Moderated regression analysis and Likert scales: Too coarse for comfort. Journal of Applied Psychology, 77, 336–342. https://doi.org/10.1037//0021-9010.77.3.336.


  61. Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43, 1248–1264. https://doi.org/10.2307/1556348.


  62. Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.


  63. Schwab, A., & Starbuck, W. H. (2017). A call for openness in research reporting: How to turn covert practices into helpful tools. Academy of Management Learning and Education, 16, 125–141. https://doi.org/10.5465/amle.2016.0039.

  64. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632.


  65. Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144, 1146–1152. https://doi.org/10.1037/xge0000104.


  66. Song, F., Parekh, S., Hooper, L., Loke, Y. K., Ryder, J., Sutton, A. J., et al. (2010). Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment, 14, 1–220. https://doi.org/10.3310/hta14080.


  67. Spector, P. E., & Fox, S. (2005). The stressor-emotion model of counterproductive work behavior. In S. Fox & P. E. Spector (Eds.), Counterproductive work behavior: Investigations of actors and targets (pp. 151–174). Washington, DC: American Psychological Association.


  68. Starbuck, W. H. (2016). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61, 165–183. https://doi.org/10.1177/0001839216629644.

  69. Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—Or vice versa. Journal of the American Statistical Association, 54, 30–34. https://doi.org/10.1080/01621459.1959.10501497.


  70. Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business and Psychology, 26, 1–9. https://doi.org/10.1007/s10869-010-9204-3.


  71. Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review, 24, 759–780. https://doi.org/10.5465/AMR.1999.2553252.


  72. Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03.


  73. Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632–638. https://doi.org/10.1177/1745691612463078.


  74. Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828.



Author information


Corresponding author

Correspondence to Ernest O’Boyle.


About this article


Cite this article

O’Boyle, E., Banks, G.C., Carter, K. et al. A 20-Year Review of Outcome Reporting Bias in Moderated Multiple Regression. J Bus Psychol 34, 19–37 (2019). https://doi.org/10.1007/s10869-018-9539-8


Keywords

  • Outcome reporting bias
  • Publication bias
  • Questionable reporting practices
  • Moderated multiple regression
  • Meta-analysis