HARKing: How Badly Can Cherry-Picking and Question Trolling Produce Bias in Published Results?

Murphy, Kevin R.; Aguinis, Herman

doi:10.1007/s10869-017-9524-7

Original Paper
Published: 11 December 2017

Volume 34, pages 1–17 (2019)

Journal of Business and Psychology

Abstract

The practice of hypothesizing after results are known (HARKing) has been identified as a potential threat to the credibility of research results. We conducted simulations using input values based on comprehensive meta-analyses and reviews in applied psychology and management (e.g., strategic management studies) to determine the extent to which two forms of HARKing behaviors might plausibly bias study outcomes and to examine the determinants of the size of this effect. When HARKing involves cherry-picking, which consists of searching through data involving alternative measures or samples to find the results that offer the strongest possible support for a particular hypothesis or research question, HARKing has only a small effect on estimates of the population effect size. When HARKing involves question trolling, which consists of searching through data involving several different constructs, measures of those constructs, interventions, or relationships to find seemingly notable results worth writing about, HARKing produces substantial upward bias particularly when it is prevalent and there are many effects from which to choose. Results identify the precise circumstances under which different forms of HARKing behaviors are more or less likely to have a substantial impact on a study’s substantive conclusions and the field’s cumulative knowledge. We offer suggestions for authors, consumers of research, and reviewers and editors on how to understand, minimize, detect, and deter detrimental forms of HARKing in future research.

Notes

When ρ is very large, ceiling effects can limit the biases produced by HARKing. When ρ is equal to or very near 0, bias is limited because the largest effect is equally likely to be negative as it is to be positive. In addition, when ρ = 0, HARKing will produce a distribution of sample effects whose mean is not changed but whose standard deviation is inflated.
Although this method is rarely encountered in the research literature, several software packages (e.g., NCSS, JMP) include an even more aggressive option—i.e., one that evaluates all possible regression models, starting with models that include two variables and examining every possible combination of predictors until the full p-variable model is tested.
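
The ρ = 0 case in the first note can be checked with a small simulation. The sketch below is not the authors' code. It assumes that HARKing means reporting whichever of k sample correlations is largest in absolute value, and the sample size (n = 100), number of candidate effects (k = 5), replication count, and all function names are illustrative choices rather than values taken from the paper.

```r
set.seed(1)

n_reps <- 5000  # simulated studies per condition (assumed)
n <- 100        # observations per study (assumed)
k <- 5          # candidate effects a HARKing author can choose from (assumed)

# One sample correlation between two independent normal variables (rho = 0).
one_r <- function(n) cor(rnorm(n), rnorm(n))

# Assumed selection rule: keep the correlation with the largest magnitude.
pick_largest <- function(r) r[which.max(abs(r))]

# No HARKing: each study reports its single correlation.
honest <- replicate(n_reps, one_r(n))

# HARKing: each study computes k correlations and reports the most extreme one.
harked <- replicate(n_reps, pick_largest(replicate(k, one_r(n))))

print(round(c(mean_honest = mean(honest), sd_honest = sd(honest),
              mean_harked = mean(harked), sd_harked = sd(harked)), 3))
```

Under these assumptions both means should sit close to zero, because the selected effect is as likely to be negative as positive, while the standard deviation of the selected effects should be clearly larger than that of the unselected ones.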

Author information

Authors and Affiliations

Department of Personnel and Employment Relations, Kemmy Business School, University of Limerick, Plassey Road, Castletroy, Limerick, Ireland
Kevin R. Murphy
George Washington University, Washington, DC, USA
Herman Aguinis

Corresponding author

Correspondence to Kevin R. Murphy.

Appendix. R Codes Used in Simulation Studies

The codes below calculate the expected results if 100% of studies engage in either cherry-picking or question trolling. The final estimates of the values expected if some proportion of all studies involve either cherry-picking or question trolling are obtained by calculating the weighted average (weighted by estimated prevalence) of the values produced by the codes below and the expected value of R = 0.20 if there is no cherry-picking or question trolling.
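
The weighting step described above can be sketched in a few lines of R. This is an illustration, not the authors' code: the function name `blend_estimate` is hypothetical, and the 100%-HARKing value of 0.30 is an assumed input, not a result reported in the paper. Only the baseline of 0.20 comes from the text.

```r
# Hypothetical helper: prevalence-weighted average of the value expected when
# every study HARKs and the value expected when no study does (baseline).
blend_estimate <- function(harked_value, prevalence, baseline = 0.20) {
  stopifnot(all(prevalence >= 0), all(prevalence <= 1))
  prevalence * harked_value + (1 - prevalence) * baseline
}

# Illustrative inputs only: 0.30 stands in for the output of a 100%-HARKing
# simulation; it is an assumption, not a figure from the paper.
prevalence <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
blended <- blend_estimate(harked_value = 0.30, prevalence = prevalence)

print(data.frame(prevalence, blended))
```

At a prevalence of 0 the blended value equals the baseline, and at a prevalence of 1 it equals the 100%-HARKing value; intermediate prevalences interpolate linearly between the two.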

R code for cherry-picking

R code for question trolling

About this article

Cite this article

Murphy, K.R., Aguinis, H. HARKing: How Badly Can Cherry-Picking and Question Trolling Produce Bias in Published Results?. J Bus Psychol 34, 1–17 (2019). https://doi.org/10.1007/s10869-017-9524-7

Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10869-017-9524-7
