A 20-Year Review of Outcome Reporting Bias in Moderated Multiple Regression
Abstract
Moderated multiple regression (MMR) remains the most popular method of testing interactions in management and applied psychology. Recent discussions of MMR have centered on the small effect sizes of interaction terms and the low statistical power of typical tests (e.g., Murphy & Russell, Organizational Research Methods, 2016). Although many MMR tests are likely plagued by type II errors, they may also be particularly prone to outcome reporting bias (ORB), which inflates the rate of false positives (type I errors). We assessed the state of MMR through a 20-year review of six leading journals. Based on 1218 MMR tests nested within 343 studies, we found that despite low statistical power, most MMR tests (54%) were reported as statistically significant. Further, although sample size has remained essentially unchanged over time (r = −.002), the proportion of statistically significant MMR tests has risen from 41% (1995–1999) to 49% (2000–2004), to 60% (2005–2009), and to 69% (2010–2014). This trend could indicate greater methodological and theoretical precision, but it also leaves open the possibility of ORB. In our review, we found evidence that increased rigor and theoretical precision play an important role in MMR effect size magnitudes, but we also found evidence of ORB. Specifically, (a) smaller sample sizes were associated with larger effect sizes, (b) there was a substantial frequency spike in p values just below the .05 threshold, and (c) recalculated p values below .05 always converged with authors' conclusions of statistical significance, whereas recalculated p values between .05 and .10 converged with authors' conclusions only about half (54%) of the time. These findings carry important implications for the future application of MMR.
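To make the analytic unit concrete, the sketch below (in Python, on simulated data; the variable names, coefficients, and sample size are hypothetical and not drawn from the review) illustrates a standard MMR interaction test and shows how a p value can be recalculated from a reported t statistic and its degrees of freedom, in the spirit of the convergence checks described above.

```python
# Illustrative MMR interaction test on simulated data (all names and
# effect sizes here are hypothetical, for demonstration only).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(42)
n = 150                                  # a modest, typical sample size
x = rng.normal(size=n)                   # predictor
z = rng.normal(size=n)                   # moderator
y = 0.3 * x + 0.2 * z + 0.15 * x * z + rng.normal(size=n)

# Center first-order terms before forming the product term
# (per Aiken & West, 1991), then fit y ~ 1 + x + z + x*z.
xc, zc = x - x.mean(), z - z.mean()
X = sm.add_constant(np.column_stack([xc, zc, xc * zc]))
fit = sm.OLS(y, X).fit()

# The MMR test is the test of the product term's coefficient.
b_int, t_int = fit.params[3], fit.tvalues[3]

# Recalculate p from the t statistic and residual df -- analogous to
# checking whether a recomputed p value converges with an author's
# claim of statistical significance.
p_recalc = 2 * stats.t.sf(abs(t_int), fit.df_resid)
print(f"interaction: b = {b_int:.3f}, t = {t_int:.2f}, p = {p_recalc:.4f}")
```

Because the simulated interaction is small and the test underpowered at this sample size, whether the printed p value lands below .05 varies across random seeds; under selective reporting, only the runs that clear the threshold would surface, which is precisely the pattern the review probes.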
Keywords
Outcome reporting bias · Publication bias · Questionable reporting practices · Moderated multiple regression · Meta-analysis

References
- Aguinis, H., & Gottfredson, R. K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31, 776–786. https://doi.org/10.1002/job.686.
- Aguinis, H., & Stone-Romero, E. F. (1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192–206. https://doi.org/10.1037/0021-9010.82.1.192.
- Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage.
- Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. The Leadership Quarterly, 28(1), 5–21.
- Banks, G. C., Kepes, S., & McDaniel, M. A. (2015). Publication bias: Understanding the myths concerning threats to the advancement of science. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 36–64). New York, NY: Routledge.
- Banks, G. C., & McDaniel, M. A. (2011). The kryptonite of evidence-based I-O psychology. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 40–44. https://doi.org/10.1111/j.1754-9434.2010.01292.x.
- Banks, G. C., O’Boyle, E. H., Pollack, J. M., White, C. D., Batchelor, J. H., Whelpley, C. E., et al. (2016). Questions about questionable research practices in the field of management: A guest commentary. Journal of Management, 42, 5–20. https://doi.org/10.1177/0149206315619011.
- Banks, G. C., Rogelberg, S. G., Woznyj, H. M., Landis, R. S., & Rupp, D. E. (2016). Evidence on questionable research practices: The good, the bad, and the ugly. Journal of Business and Psychology, 31, 323–338. https://doi.org/10.1007/s10869-016-9456-7.
- Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173.
- Bennett, R. J., & Robinson, S. L. (2000). Development of a measure of workplace deviance. Journal of Applied Psychology, 85, 349–360. https://doi.org/10.1037/0021-9010.85.3.349.
- Bergh, D. D., Sharp, B. M., & Li, M. (2017). Tests for identifying “red flags” in empirical findings: Demonstration and recommendations for authors, reviewers, and editors. Academy of Management Learning and Education, 16, 110–124. https://doi.org/10.5465/amle.2015.0406.
- Biemann, T. (2013). What if we were Texas sharpshooters? Predictor reporting bias in regression analysis. Organizational Research Methods, 16, 335–363. https://doi.org/10.1177/1094428113485135.
- Bobko, P. (1986). A solution to some dilemmas when testing hypotheses about ordinal interactions. Journal of Applied Psychology, 71, 323–326. https://doi.org/10.1037/0021-9010.71.2.323.
- Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2016). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology, 69, 709–750. https://doi.org/10.1111/peps.12111.
- Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100, 431–449. https://doi.org/10.1037/a0038047.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
- Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
- Cortina, J. M. (1993). Interaction, nonlinearity, and multicollinearity: Implications for multiple regression. Journal of Management, 19, 915–922. https://doi.org/10.1016/0149-2063(93)90035-L.
- Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (in press). Degrees of freedom in SEM: Are we testing the models that we claim to test? Organizational Research Methods. Advance online publication. https://doi.org/10.1177/1094428116676345.
- Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414–417. https://doi.org/10.1037/0033-2909.102.3.414.
- de Winter, J. C., & Dodou, D. (2015). A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ, 3, e733. https://doi.org/10.7717/peerj.733.
- Editors. (1909). The reporting of unsuccessful cases. The Boston Medical and Surgical Journal, 161, 263–264. https://doi.org/10.1056/NEJM190908191610809.
- Edwards, J. R., & Berry, J. W. (2010). The presence of something or the absence of nothing: Increasing theoretical precision in management research. Organizational Research Methods, 13, 668–689. https://doi.org/10.1177/1094428110380467.
- Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170, 1934–1939. https://doi.org/10.1001/archinternmed.2010.406.
- Evans, M. G. (1985). A Monte Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior and Human Decision Processes, 36, 305–323. https://doi.org/10.1016/0749-5978(85)90002-0.
- Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. https://doi.org/10.1007/s11192-011-0494-7.
- Finkel, E. J., Eastwick, P. W., & Reis, H. T. (2015). Best research practices in psychology: Illustrating epistemological and pragmatic considerations with the case of relationship science. Journal of Personality and Social Psychology, 108, 275–297. https://doi.org/10.1037/pspi0000007.
- Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345, 1502–1505. https://doi.org/10.1126/science.1255484.
- Gerber, A. S., & Malhotra, N. (2008a). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3, 313–326. https://doi.org/10.1561/100.00008024.
- Gerber, A. S., & Malhotra, N. (2008b). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37, 3–30. https://doi.org/10.1177/0049124108318973.
- Grand, J. A., Rogelberg, S. G., Banks, G. C., Landis, R. S., & Tonidandel, S. (in press). From outcome to process focus: Fostering a more robust psychological science through registered reports and results-blind reviewing. Perspectives on Psychological Science.
- Greco, L. M., O’Boyle, E. H., Cockburn, B. S., & Yuan, Z. (in press). A reliability generalization examination of organizational behavior constructs. Journal of Management Studies.
- Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20. https://doi.org/10.1037/h0076157.
- Hardwicke, T. E., Mathur, M., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., ... Tessler, M. H. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5, 180448. https://doi.org/10.1098/rsos.180448.
- Hartgerink, C. H., van Aert, R. C., Nuijten, M. B., Wicherts, J. M., & Van Assen, M. A. (2016). Distributions of p-values smaller than .05 in psychology: What is going on? PeerJ, 4, e1935. https://doi.org/10.7717/peerj.1935.
- Hollenbeck, J. R., & Wright, P. M. (2016). Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43, 5–18. https://doi.org/10.1177/0149206316679487.
- Ioannidis, J. P. A. (2008). Why most discovered true associations are inflated. Epidemiology, 19, 640–648. https://doi.org/10.1097/EDE.0b013e31818131e7.
- Jaccard, J., Wan, C. K., & Turrisi, R. (1990). The detection and interpretation of interaction effects between continuous variables in multiple regression. Multivariate Behavioral Research, 25, 467–478. https://doi.org/10.1207/s15327906mbr2504_4.
- James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307–321. https://doi.org/10.1037/0021-9010.69.2.307.
- John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953.
- Journal Citation Reports® (2014). Social Science Edition. (Thomson Reuters, 2015). http://jcr.incites.thomsonreuters.com.
- Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15, 624–662. https://doi.org/10.1177/1094428112452760.
- Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217. https://doi.org/10.1207/s15327957pspr0203_4.
- Krawczyk, M. (2015). The search for significance: A few peculiarities in the distribution of P values in experimental psychology literature. PLoS One, 10(6), e0127872. https://doi.org/10.1371/journal.pone.0127872.
- Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS One, 9(9), e105825. https://doi.org/10.1371/journal.pone.0105825.
- LeBreton, J. M. (2016). Editorial. Organizational Research Methods, 19, 3–7. https://doi.org/10.1177/1094428115622097.
- LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16, 449–473. https://doi.org/10.1177/1094428113481065.
- Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303–2309. https://doi.org/10.1080/17470218.2013.863371.
- Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. The Quarterly Journal of Experimental Psychology, 65, 2271–2279. https://doi.org/10.1080/17470218.2012.711335.
- Matthes, J., Marquart, F., Naderer, B., Arendt, F., Schmuck, D., & Adam, K. (2015). Questionable research practices in experimental communication research: A systematic analysis from 1980 to 2013. Communication Methods and Measures, 9(4), 193–207. https://doi.org/10.1080/19312458.2015.1096334.
- Murphy, K. R., & Russell, C. J. (2016). Mend it or end it: Redirecting the search for interactions in the organizational sciences. Organizational Research Methods. Advance online publication. https://doi.org/10.1177/1094428115625322.
- Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., et al. (2015). Promoting an open research culture: The TOP guidelines for journals. Science, 348, 1422–1425. https://doi.org/10.1126/science.aab2374.
- Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215.
- Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48, 1205–1226. https://doi.org/10.3758/s13428-015-0664-2.
- O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43, 376–399. https://doi.org/10.1177/0149206314527133.
- Orlitzky, M. (2012). How can significance tests be deinstitutionalized? Organizational Research Methods, 15, 199–228. https://doi.org/10.1177/1094428111428356.
- Porter, T. M. (1992). Quantification and the accounting ideal in science. Social Studies of Science, 22, 633–652. https://doi.org/10.1177/030631292022004004.
- Robinson, S. L., & Bennett, R. J. (1995). A typology of deviant workplace behaviors: A multidimensional scaling study. Academy of Management Journal, 38, 555–572. https://doi.org/10.2307/256693.
- Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638.
- Russell, C. J., & Bobko, P. (1992). Moderated regression analysis and Likert scales: Too coarse for comfort. Journal of Applied Psychology, 77, 336–342. https://doi.org/10.1037/0021-9010.77.3.336.
- Scandura, T. A., & Williams, E. A. (2000). Research methodology in management: Current practices, trends, and implications for future research. Academy of Management Journal, 43, 1248–1264. https://doi.org/10.2307/1556348.
- Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.
- Schwab, A., & Starbuck, W. H. (2017). A call for openness in research reporting: How to turn covert practices into helpful tools. Academy of Management Learning and Education, 16, 125–141. https://doi.org/10.5465/amle.2016.0039.
- Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632.
- Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144, 1146–1152. https://doi.org/10.1037/xge0000104.
- Song, F., Parekh, S., Hooper, L., Loke, Y. K., Ryder, J., Sutton, A. J., et al. (2010). Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment, 14, 1–220. https://doi.org/10.3310/hta14080.
- Spector, P. E., & Fox, S. (2005). The stressor-emotion model of counterproductive work behavior. In S. Fox & P. E. Spector (Eds.), Counterproductive work behavior: Investigations of actors and targets (pp. 151–174). Washington, DC: American Psychological Association.
- Starbuck, W. H. (2016). 60th anniversary essay: How journals could improve research practices in social science. Administrative Science Quarterly, 61, 165–183. https://doi.org/10.1177/0001839216629644.
- Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—Or vice versa. Journal of the American Statistical Association, 54, 30–34. https://doi.org/10.1080/01621459.1959.10501497.
- Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business and Psychology, 26, 1–9. https://doi.org/10.1007/s10869-010-9204-3.
- Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review, 24, 759–780. https://doi.org/10.5465/AMR.1999.2553252.
- Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03.
- Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632–638. https://doi.org/10.1177/1745691612463078.
- Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS One, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828.