Abstract
Moderated multiple regression (MMR) remains the most popular method of testing interactions in management and applied psychology. Recent discussions of MMR have centered on their small effect sizes and typically being statistically underpowered (e.g., Murphy & Russell, Organizational Research Methods, 2016). Although many MMR tests are likely plagued by type II errors, they may also be particularly prone to outcome reporting bias (ORB) resulting in elevated false positives (type I errors). We tested the state of MMR through a 20-year review of six leading journals. Based on 1218 MMR tests nested within 343 studies, we found that despite low statistical power, most MMR tests (54%) were reported as statistically significant. Further, although sample size has remained relatively unchanged (r = − .002), statistically significant MMR tests have risen from 41% (1995–1999) to 49% (2000–2004), to 60% (2005–2009), and to 69% (2010–2014). This could indicate greater methodological and theoretical precision but leaves open the possibility of ORB. In our review, we found evidence that both increased rigor and theoretical precision play an important role in MMR effect size magnitudes, but also found evidence for ORB. Specifically, (a) smaller sample sizes are associated with larger effect sizes, (b) there is a substantial frequency spike in p values just below the .05 threshold, and (c) recalculated p values less than .05 always converged with authors’ conclusions of statistical significance but recalculated p values between .05 and .10 only converged with authors’ conclusions about half (54%) of the time. The findings of this research provide important implications for future application of MMR.
Similar content being viewed by others
Notes
We did conduct a series of multilevel analyses, and the results provided to the editor and reviewers are virtually identical to the meta-regression results presented below. Further, we tested the models with different weighting schemes (e.g., unweighted, weighted by sample size, weighted by inverse standard error of the semipartial correlation), various effect sizes (e.g., semipartial correlation, shrunken semipartial correlation, f2, shrunken f2) calculated in different ways (e.g., based on t statistics using Cohen and Cohen’s (1983) formulas, change in R2 alone), with and without outliers, and with a variety of subsamples in the data (e.g., randomly selected effect sizes from a study, averaged effect sizes). Our intention was not to “hack” the data, but to assure that our results were robust. Across the more than 30 different analyses, our results are remarkably consistent not just in the overall conclusions, but also in the specific effect size directions and magnitudes for the focal variables. The full set of analyses is available from the first author.
We thank an anonymous reviewer for raising this concern and recommending the unweighted approach for the reported analyses.
Again, we are thankful to an anonymous reviewer for this suggestion.
References
Aguinis, H., & Gottfredson, R. K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31, 776–786. https://doi.org/10.1002/job.686.
Aguinis, H., & Stone-Romero, E. F. (1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192–206. https://doi.org/10.1037//0021-9010.82.1.192.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage.
Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. The Leadership Quarterly, 28(1), 5–21.
Banks, G. C., Kepes, S., & McDaniel, M. A. (2015). Publication bias: Understanding the myths concerning threats to the advancement of science. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 36–64). New York, NY: Routledge.
Banks, G. C., & McDaniel, M. A. (2011). The kryptonite of evidence-based I-O psychology. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 40–44. https://doi.org/10.1111/j.1754-9434.2010.01292.x.
Banks, G. C., O’Boyle, E. H., Pollack, J. M., White, C. D., Batchelor, J. H., Whelpley, C. E., et al. (2016). Questions about questionable research practices in the field of management: A guest commentary. Journal of Management, 42, 5–20. https://doi.org/10.1177/0149206315619011.
Banks, G. C., Rogelberg, S. G., Woznyj, H. M., Landis, R. S., & Rupp, D. E. (2016). Evidence on questionable research practices: The good, the bad, and the ugly. Journal of Business and Psychology, 31, 323–338. https://doi.org/10.1007/s10869-01609456-7.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173.
Bennett, R. J., & Robinson, S. L. (2000). Development of a measure of workplace deviance. Journal of Applied Psychology, 85, 349–360. https://doi.org/10.1037/0021-9010.85.3.349.
Bergh, D. D., Sharp, B. M., & Li, M. (2017). Tests for identifying “red flags” in empirical findings: Demonstration and recommendations for authors, reviewers, and editors. Academy of Management Learning and Education, 16, 110–124. https://doi.org/10.5465/amle.2015.0406.
Biemann, T. (2013). What if we were Texas sharpshooters? Predictor reporting bias in regression analysis. Organizational Research Methods, 16, 335–363. https://doi.org/10.1177/1094428113485135.
Bobko, P. (1986). A solution to some dilemmas when testing hypotheses about ordinal interactions. Journal of Applied Psychology, 71, 323–326. https://doi.org/10.1037/0021-9010.71.2.323.
Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2016). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology, 69, 709–750. https://doi.org/10.1111/peps.12111.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100, 431–449. https://doi.org/10.1037/a0038047.
Cohen, J. E. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cortina, J. M. (1993). Interaction, nonlinearity, and multicollinearity: Implications for multiple regression. Journal of Management, 19, 915–922. https://doi.org/10.1016/0149-2063(93)90035-L.
Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (in press). Degrees of freedom in SEM: Are we testing the models that we claim to test? Organizational Research Methods. 1094428116676345.
Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414–417. https://doi.org/10.1037/0033-2909.102.3.414.
de Winter, J. C., & Dodou, D. (2015). A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ, 3, e733. https://doi.org/10.7717/peerj.733.
Editors. (1909). The reporting of unsuccessful cases. The Boston Medical and Surgical Journal, 161, 263–264. https://doi.org/10.1056/NEJM190908191610809.
Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13, 668–689. https://doi.org/10.1177/1094428110380467.
Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170, 1934–1939. https://doi.org/10.1001/archinternmed.2010.406.
Evans, M. G. (1985). A Monte Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior and Human Decision Processes, 36, 305–323. https://doi.org/10.1016/0749-5978(85)90002-0.
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. https://doi.org/10.1007/s11192-011-0494-7.
Finkel, E. J., Eastwick, P. W., & Reis, H. T. (2015). Best research practices in psychology: Illustrating epistemological and pragmatic considerations with the case of relationship science. Journal of Personality and Social Psychology, 108, 275–297. https://doi.org/10.1037/pspi0000007.
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345, 1502–1505. https://doi.org/10.1126/science.1255484.
Gerber, A. S., & Malhotra, N. (2008a). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3, 313–326. https://doi.org/10.1561/100.00008024.
Gerber, A. S., & Malhotra, N. (2008b). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37, 3–30. https://doi.org/10.1177/0049124108318973.
Grand, J. A., Rogelberg, S. G., Banks, G. C., Landis, R. S., Tonidandel, S. (in press). From outcome to process focus: Fostering a more robust psychological science through registered reports and results-blind reviewing. Perspectives on Psychological Science.
Greco, L. M., O’Boyle, E. H., Cockburn, B. S., & Yuan, Z. (in press). A reliability generalization examination of organizational behavior constructs. Journal of Management Studies.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20. https://doi.org/10.1037/h0076157.
Hardwicke, T. E., Mathur, M., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., ... Tessler, M. H. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition.
Hartgerink, C. H., van Aert, R. C., Nuijten, M. B., Wicherts, J. M., & Van Assen, M. A. (2016). Distributions of p-values smaller than .05 in psychology: What is going on? PeerJ, 4, e1935. https://doi.org/10.7717/peerj.1935.
Hollenbeck, J. R., & Wright, P. M. (2016). Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43, 5–18. https://doi.org/10.1177/0149206316679487.
Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology, 19, 640–648. https://doi.org/10.1097/EDE.0b013e31818131e7.
Jaccard, J., Wan, C. K., & Turrisi, R. (1990). The detection and interpretation of interaction effects between continuous variables in multiple regression. Multivariate Behavioral Research, 25, 467–478. https://doi.org/10.1207/s15327906mbr2504_4.
James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307–321. https://doi.org/10.1037/0021-9010.69.2.307.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953.
Journal Citation Reports® (2014). Social Science Edition. (Thompson Reuters, 2015). http://jcr.incites.thomsonreuters.com.
Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15, 624–662. https://doi.org/10.1177/1094428112452760.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217. https://doi.org/10.1207/s15327957pspr0203_4.
Krawczyk, M. (2015). The search for significance: A few peculiarities in the distribution of P values in experimental psychology literature. PLoS One, 10(6), e0127872. https://doi.org/10.1371/journal.pone.0127872.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS One, 9(9), e105825. https://doi.org/10.1371/journal.pone.0105825.
LeBreton, J. M. (2016). Editorial. Organizational Research Methods, 19, 3–7. https://doi.org/10.1177/1094428115622097.
LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16, 449–473. https://doi.org/10.1177/1094428113481065.
Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303–2309. https://doi.org/10.1080/17470218.2013.863371.
Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below. 05. The Quarterly Journal of Experimental Psychology and Aging, 65, 2271–2279. https://doi.org/10.1080/17470218.2012.711335.
Matthes, J., Marquart, F., Naderer, B., Arendt, F., Schmuck, D., & Adam, K. (2015). Questionable research practices in experimental communication research: A systematic analysis from 1980 to 2013. Communication Methods and Measures, 9(4), 193–207. https://doi.org/10.1080/19312458.2015.1096334.
Murphy, K. R., & Russell, C. J. (2016). Mend it or end it: Redirecting the search for interactions in the organizational sciences. Organizational Research Methods. 1094428115625322.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture: The TOP guidelines for journals. Science, 348, 1422–1425. https://doi.org/10.1126/science.aab2374.
Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215.
Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226. https://doi.org/10.3758/s13428-015-0664-2.
O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43, 376–399 doi: 0149206314527133.
Orlitzky, M. (2012). How can significance tests be deinstitutionalized? Organizational Research Methods, 15, 199–228. https://doi.org/10.1177/109442811428356.
Porter, T. M. (1992). Quantification and the accounting ideal in science. Social Studies of Science, 22, 633–652. https://doi.org/10.1177/030631292022004004.
Robinson, S. L., & Bennett, R. J. (1995). A typology of deviant workplace behaviors: A multidimensional scaling study. Academy of Management Journal, 38, 555–572. https://doi.org/10.2307/256693.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638.
Russell, C. J., & Bobko, P. (1992). Moderated regression analysis and Likert scales: Too coarse for comfort. Journal of Applied Psychology, 77, 336–342. https://doi.org/10.1037//0021-9010.77.3.336.
Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43, 1248–1264. https://doi.org/10.2307/1556348.
Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.
Schwab, A., & Starbuck, W. H. (in press). A call for openness in research reporting: How to turn covert practices into helpful tools. Academy of Management Learning and Education, 16, 125–141. https://doi.org/10.5465/amle.2016.0039.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632.
Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144, 1146–1152. https://doi.org/10.1037/xge0000104.
Song, F., Parekh, S., Hooper, L., Loke, Y. K., Ryder, J., Sutton, A. J., et al. (2010). Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment, 14, 1–220. https://doi.org/10.3310/hta14080.
Spector, P. E., & Fox, S. (2005). The stressor-emotion model of counterproductive work behavior. In S. Fox & P. E. Spector (Eds.), Counterproductive work behavior: Investigations of actors and targets (pp. 151–174). Washington, DC: American Psychological Association.
Starbuck, W. H. (in press). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61, 165–183. https://doi.org/10.1177/0001839216629644.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—Or vice versa. Journal of the American Statistical Association, 54, 30–34. https://doi.org/10.1080/01621459.1959.10501497.
Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business and Psychology, 26, 1–9. https://doi.org/10.1007/s10869-010-9204-3.
Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review, 24, 759–780. https://doi.org/10.5465/AMR.1999.2553252.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the Metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632–638. https://doi.org/10.1177/1745691612463078.
Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
O’Boyle, E., Banks, G.C., Carter, K. et al. A 20-Year Review of Outcome Reporting Bias in Moderated Multiple Regression. J Bus Psychol 34, 19–37 (2019). https://doi.org/10.1007/s10869-018-9539-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10869-018-9539-8