Journal of Business and Psychology, Volume 31, Issue 3, pp 323–338

Editorial: Evidence on Questionable Research Practices: The Good, the Bad, and the Ugly

  • George C. Banks
  • Steven G. Rogelberg
  • Haley M. Woznyj
  • Ronald S. Landis
  • Deborah E. Rupp
Editorial

Abstract

Purpose

Questionable research or reporting practices (QRPs) contribute to a growing concern regarding the credibility of research in the organizational sciences and related fields. Such practices include design, analytic, or reporting practices that may introduce biased evidence, which can have harmful implications for evidence-based practice, theory development, and perceptions of the rigor of science.

Design/Methodology/Approach

To assess the extent to which QRPs are actually a concern, we conducted a systematic review to consider the evidence on QRPs. Using a triangulation approach (e.g., by reviewing data from observations, sensitivity analyses, and surveys), we identified the good, the bad, and the ugly.

Findings

Of the 64 studies that fit our criteria, 6 appeared to find little to no evidence of engagement in QRPs and the other 58 found more severe evidence (91 %).

Implications

Drawing upon the findings, we provide recommendations for future research related to publication practices and academic training.

Originality/value

We report findings from studies that suggest that QRPs are not a problem, that QRPs are used at a suboptimal rate, and that QRPs present a threat to the viability of organizational science research.

Keywords

Questionable research practices · QRPs · Research methodology · Philosophy of science · Ethics · Research methods

Introduction

Concerns exist regarding the credibility of research in the social and natural sciences (Cortina 2015; Kepes and McDaniel 2013; Nosek et al. 2015; Schmidt and Hunter 2015). These concerns are linked, in part, to the use of questionable research or reporting practices (QRPs). QRPs have been defined as “design, analytic, or reporting practices that have been questioned because of the potential for the practice to be employed with the purpose of presenting biased evidence in favor of an assertion” (Banks et al. 2016, p. 3). Examples of commonly discussed QRPs include selectively reporting hypotheses with a preference for those that are statistically significant, “cherry picking” fit indices in structural equation modeling (SEM), and presenting post hoc hypotheses as if they were developed a priori (Banks and O’Boyle 2013; John et al. 2012). Other typical QRPs include incorrectly rounding a p value of 0.054 and reporting it as p < 0.05 rather than reporting its actual value, as well as adding or removing data and control variables in order to turn null results into statistically significant ones (Banks et al. 2016; John et al. 2012). These practices can occur with or without intent to deceive and often arise from normative assumptions about how research is conducted and reported. By their presence in the literature, QRPs may harm the development of theory, evidence-based practice, and perceptions of the rigor and relevance of science. Herein, we review the available evidence from the social sciences in order to draw conclusions about whether, given what we know to date, such concerns are warranted.

We review the evidence on design-, analysis-, and reporting-related QRPs in a systematic fashion, searching for evidence of the good, the bad, and the ugly. In other words, we looked for instances where QRPs seem not to be a problem (the good), instances where QRPs are used at a suboptimal rate but are perhaps not overly problematic (the bad), and, finally, evidence that QRPs represent a serious threat to the inferences made based on reported results (the ugly). We focus primarily on the organizational sciences and related social science fields such as education, political science, and accounting.

Following best practices for a systematic search (Kepes et al. 2013; Reed and Baxter 2009), we conducted a search in December 2015 using primarily Google Scholar and ProQuest Dissertations in order to identify both published and unpublished studies. We also searched for working papers at the National Bureau of Economic Research (http://www.nber.org/papers.html) and Social Science Research Network (http://www.ssrn.com/en/). First, databases were searched using combinations of the following keywords: (1) questionable research practices, (2) questionable reporting practices, (3) QRP, (4) HARKing, (5) p-hacking, (6) p-curve, (7) outcome-reporting bias, (8) underreporting, and (9) research ethics. Second, in addition to searching through the databases, we also conducted a citation search using references identified in Google Scholar. This involved backward- and forward-reference searches where we examined older studies cited by our identified studies and newer studies that cited our identified studies. Third, we submitted a call for published and unpublished studies over listservs, such as those sponsored by the Organizational Behavior, Human Resources, and Research Methods divisions of the Academy of Management.

We limited our search to the social sciences because of the recent criticism directed toward the social and organizational sciences (for reviews see Banks et al. 2016; Kepes and McDaniel 2013; Schmidt and Hunter 2015). Given the differences between the social and natural sciences regarding research methodologies (e.g., experimental designs, iterative research processes), there were concerns that the findings of one might not generalize to the other. Furthermore, because of our interest in actual levels of engagement in methodological design-, analytic-, and reporting-related QRPs, we excluded studies that focused (as a topic of study) on the treatment of human subjects, plagiarism, sample-level publication bias (i.e., entire samples are missing from the literature; see Kepes et al. 2012), simulations, replications, studies that only used hypothetical scenarios surrounding QRPs (e.g., vignettes) as opposed to considering actual behavior, or studies that focused on individual cases (e.g., retractions). In total, we identified 64 studies through this search. Despite our exhaustive search, we cannot rule out the possibility that a systematic difference may exist between studies that were available for identification and those that were not. Given the context, it may be that studies reporting a higher prevalence of engagement in QRPs were more likely to be identified by our search.

Review of Existing Evidence

Our review used a triangulation approach. Triangulation is characterized as the use of “multiple reference points to locate an object’s exact position” (Jick 1979, p. 602). All methodological approaches have limitations and are only as accurate as their underlying assumptions. Triangulation uses particular methods to compensate for the weaknesses of other designs (e.g., Harrison et al. 2014; Rogelberg and Laber 2002). Thus, this approach draws upon multiple study designs, settings, and samples to consider engagement in QRPs (Kepes et al. 2012; Sackett and Larson 1990). Hence, our approach was holistic and allowed the consideration of many types of QRPs.

In the current review, we consider four primary types of evidence. First, we begin with a review of evidence from behavioral observations. This methodology primarily focuses on investigating how unpublished, raw studies in the form of protocols, dissertations, and conference papers transform into published journal articles. Second, we consider evidence from sensitivity analyses. These studies consider the probability of certain results and statistics appearing in journal articles. Third, we review evidence from self-report survey research where people indicate their own engagement in QRPs. Finally, we examine observer reports through survey research where people indicate the extent to which they have observed or know of others who have engaged in QRPs. Within each methodological category, we highlight examples of the research findings.

We summarize our findings in the Appendix, which provides the author, year, field, study topic, sample type, and key findings from each article that we reviewed. To the extent possible, we draw text regarding the key findings directly from the abstracts of each study, with a focus on reporting results that highlight the extent to which QRPs are used. We encourage interested readers to refer to the primary studies for additional discussion of nuanced results that are beyond the scope of our review. In text, we highlight studies that represent the range of findings identified, rather than focusing only on the most exemplary cases.

Evidence from Behavioral Observations

A common technique used in behavioral observation studies of QRPs is to compare research protocols or early versions of a study (e.g., dissertations, conference papers) to the final published paper (O’Boyle et al. 2014; Pigott et al. 2013). The goal is to see whether unsupported results were just as likely to appear in the final version as supported results. Further, one can also examine whether behaviors such as the removal of data or the addition/removal of control variables were associated with turning a nonsignificant result into a significant one.
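The core comparison in these studies can be illustrated with a small, hedged sketch. The 2 × 2 counts and the use of a chi-square test of association below are our own illustrative assumptions, not the analysis of any particular study reviewed here; they simply show how one might test whether a recorded behavior (e.g., post hoc removal of data between an early version and the published article) is associated with an unsupported hypothesis becoming supported.

```python
from scipy.stats import chi2_contingency

# Rows: whether data were removed between the early and published versions.
# Columns: whether an initially unsupported hypothesis became supported.
# All counts are hypothetical, for illustration only.
table = [[31, 59],    # data removed:     31 became supported, 59 did not
         [18, 142]]   # data not removed: 18 became supported, 142 did not

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
print("expected counts under independence:")
print(expected.round(1))
```

Actual behavioral observation studies code many such behaviors across large numbers of papers and hypothesis tests rather than a single table, but the logic of the comparison is the same.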

An advantage of the behavioral observation approach is that one does not have to be concerned with the potential for biased reporting due to social desirability as is the case when self- and observer-report surveys are used. A second advantage of this approach is that it is not dependent on the ability of researchers to recall engagement in QRPs that may have occurred years ago. A third advantage is that the technique is not concerned with researchers’ perceptions of whether their behaviors are inappropriate or appropriate. Rather, the behavioral technique is focused on objectively describing how a paper changed over the course of its history.

That being said, this approach is not without limitations. For instance, many studies are not available as protocols or unpublished manuscripts. Hence, the representativeness of samples used in this sort of research can be questioned. A second limitation of the behavioral approach is that one cannot determine whether the motivation for engagement in QRPs was driven by authors, by reviewers and editors who may have pressured authors to use suboptimal research practices as a condition for publication, or by some combination of the two. Further, it is not known whether changes in the reported results may have been due to research practices improving as a result of editor/reviewer feedback and overall author development (which may occur when student dissertations become junior faculty publications).

A total of 19 studies were identified that fit our criteria (see “Appendix” section). From these 19 studies, results suggest that although researchers engaged in QRPs to varying extents, the influence of such practices appears to be severe. Of the 19 behavioral observation studies, 4 appeared to find little to no evidence of engagement in QRPs and the other 15 found more severe evidence. The most common forms of QRPs identified by the behavioral approach tend to be centered on an overabundance of significant findings (versus unsupported hypotheses) or on lax reporting practices with regard to methodological procedures, data cleaning, and/or data analysis. Here are a few examples to highlight the range of findings using this approach:
  • When investigating the potential for data fabrication among undergraduates, Allen et al. (2015) found some evidence of inappropriate behavior. The authors concluded that there was a potential that the behavior was driven by a poor understanding of appropriate research methods and analysis.

  • Bakker and Wicherts (2014) found no differences in median p-values when comparing studies that reported excluding outliers versus those that did not report dropping observations. Yet, this study did find that many studies do not report removing data despite the fact that reported statistics suggest that such removal did occur.

  • O’Boyle et al. (2014) illustrated that when dissertations became published articles, the ratio of supported to unsupported hypotheses more than doubled (0.82:1 vs 1.94:1).

  • After comparing conference papers and associated published journal articles, Banks et al. (2013) concluded that engagement in QRPs was infrequent relative to similar studies in the literature (e.g., O’Boyle et al. 2014; Mazzola and Deuling 2013; Pigott et al. 2013). However, when QRPs were used (e.g., data were removed; hypotheses that predicted a positive relationship were changed to a negative relationship), 34.5 % of unsupported hypotheses became supported, relative to just 13.2 % of supported hypotheses becoming unsupported.

  • When looking across time, Fanelli (2012) found that, from 1990 to 2007, the frequency of papers reporting positive (significant) findings increased by roughly 22 %.

Evidence from Sensitivity Analyses

Sensitivity analyses can be used to evaluate engagement in QRPs by calculating the probability that a set of results is possible (Francis et al. 2014). As with the behavioral approach, sensitivity analyses have strengths and limitations. For instance, sensitivity analyses do not require researchers to answer truthfully on questionnaires, nor do researchers need to rely on respondents’ memories of past behaviors. Sensitivity analyses are also not concerned with researchers’ rationalizations of such behaviors, but rather focus on statistical probability estimations. Unlike the behavioral approach, one advantage of sensitivity analyses is that they do not require protocols or early drafts of a study in order to investigate engagement in QRPs. However, this approach can be limited. For instance, sensitivity analyses lose quite a bit of accuracy when attempting to establish the probability that a certain result was found in any individual study. Rather, sensitivity analyses are more accurate when evaluating the probability of a set of results across hundreds of reported results.
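To make this logic concrete, the following minimal sketch works through an "excess success" style calculation in the spirit of Francis et al. (2014): estimate the statistical power of each test in a reported set and ask how probable a uniformly significant set of results would be. The effect size, sample sizes, alpha level, and the two-sample z-test power approximation are all illustrative assumptions, not values taken from any study reviewed here.

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.05
d = 0.35                    # assumed true standardized effect size (hypothetical)
ns = [28, 31, 25, 30, 27]   # hypothetical per-group sample sizes for five two-group studies

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for standardized effect d."""
    se = sqrt(2.0 / n_per_group)      # standard error of the mean difference, in d units
    z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value
    ncp = d / se                      # noncentrality (expected z)
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

powers = [power_two_sample(d, n, alpha) for n in ns]

p_all_significant = 1.0
for w in powers:
    p_all_significant *= w            # probability that every study in the set is significant

print("estimated powers:", [round(w, 2) for w in powers])
print(f"probability of an all-significant set of {len(ns)} studies: {p_all_significant:.4f}")
# A very small value suggests the reported set of results is improbably "successful".
```

A very small probability does not, by itself, prove that any single study involved QRPs; as noted above, such analyses are most informative when aggregated across many reported results.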

A total of 14 studies were identified that fit our criteria and used sensitivity analyses (see “Appendix” section). None of these studies appeared to find little to no evidence of engagement in QRPs; all 14 found more severe evidence. Considering the evidence from sensitivity analyses, it seems that p value manipulation is a widespread practice across the fields included in the current review. That is, a majority of the studies that employed sensitivity analyses suggest that researchers are incorrectly rounding p values or perhaps p-hacking to make their results seem “more significant” than they actually are. Below, we offer a few examples that highlight the range of findings:
  • In their research on p values, de Winter and Dodou (2015) reported that dramatic increases in significant results may have been the result of QRPs as well as improved methodological designs.

  • After reviewing more than 30,000 articles, Hartgerink et al. (2016) reported direct evidence of p values being rounded incorrectly.

  • Using a sample of over 250,000 p values reported in 20 years of research, Nuijten et al. (2015) found that
    • Half of all published psychology papers that use null hypothesis significance testing (NHST) contained at least one p value that was inconsistent with its test statistic and degrees of freedom (a minimal sketch of this kind of recomputation check appears after this list).

    • One in eight papers contained a grossly inconsistent p value that may have affected the conclusion drawn.

    • The average prevalence of inconsistent p values has been stable over the years, or has declined.

    • The prevalence of gross inconsistencies was higher in p values reported as significant than in p values reported as nonsignificant.

  • Leggett et al. (2013) found an overabundance of p values immediately below the critical 0.05 threshold relative to other ranges, despite the low probability of such a pattern arising by chance. Further, the prevalence of this practice appears to have increased over the past 40 years. Several other studies reported similarly unlikely concentrations of p values immediately below the 0.05 threshold (Gerber and Malhotra 2008a, b; Masicampo and Lalande 2012).

  • Despite low power, O’Boyle et al. (2015) found that most moderated multiple regression analyses identify statistically significant results. Further, while sample sizes have remained largely stable over time, the percent of significant results associated with tests of interactions has increased from approximately 42 % in the early 1990s to approximately 72 % in more recent research.
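As referenced in the Nuijten et al. (2015) bullet above, a minimal sketch of a statcheck-style consistency check is shown below. The reported t value, degrees of freedom, p value, and the rounding tolerance are hypothetical; the point is simply that a reported p value can be recomputed from its test statistic and degrees of freedom and compared against what the authors stated.

```python
from scipy.stats import t

def check_t_report(t_value, df, reported_p, tol=0.0005, two_sided=True):
    """Recompute the p value implied by a t statistic and df, and flag inconsistencies.
    Simplified relative to real tools, which also allow for rounding of the statistic."""
    recomputed = t.sf(abs(t_value), df)
    if two_sided:
        recomputed *= 2
    inconsistent = abs(recomputed - reported_p) > tol
    return recomputed, inconsistent

# Hypothetical report: "t(48) = 1.95, p = .03".
# The recomputed two-sided p is roughly .057, so the report is flagged as inconsistent;
# because the reported and recomputed values fall on opposite sides of .05, a tool such
# as statcheck would also label it "grossly" inconsistent.
recomputed, flag = check_t_report(t_value=1.95, df=48, reported_p=0.03)
print(f"recomputed p = {recomputed:.3f}, inconsistent = {flag}")
```

Checks of this kind underlie the inconsistency and gross-inconsistency rates summarized in the bullets above.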

Evidence from Self-Report Surveys

The use of self-report surveys to investigate QRPs has several methodological strengths and limitations. First, given the degree of autonomy and discretion researchers have, there is a great deal of opportunity to engage in suboptimal research practices. In many cases, it is unlikely that even coauthors would be aware if inappropriate techniques were being used to manipulate results. Hence, self-report surveys are a way to identify engagement in QRPs that might not otherwise be observed. Surveys may also be used to investigate the extent to which engagement in QRPs is attributable to authors’ own volition rather than to reviewer and editor requests in the review process. It may be the case that authors engage in inappropriate behavior in anticipation of mitigating reviewers’ and editors’ biases (Banks et al. 2016). Thus, surveys can help to sort out the motives behind engagement in QRPs and the external pressures potentially associated with such practices. Relatedly, surveys can assist in disentangling how “questionable” some research behaviors really are for individual researchers. For instance, dropping an outlier, either for theoretical or methodological reasons, can change the conclusions one draws from the results. If a researcher has sound logic for this practice and is transparent, that practice is less questionable than if a researcher manipulates an analysis for the express purpose of turning a nonsignificant result into a statistically significant one. Carefully worded surveys can shed light on these sorts of issues.

Yet, there are also limitations to self-report surveys. The most obvious is that, even under conditions of confidentiality, researchers may not respond truthfully to survey questions due to socially desirable responding (Berry et al. 2012). Further, researchers may not be honest with themselves: they may deny engaging in a practice, or they may rationalize their behavior and argue that it was justified even if they were not transparent in their reporting. Among those knowingly carrying out unethical practices, there is an incentive to underreport the use of QRPs so that such individuals might continue to keep such practices “under the radar.” Thus, as with any method, there are advantages and disadvantages to the self-report survey when studying QRPs. One of the more problematic concerns may be the underreporting of QRP engagement (which is in many ways similar to the underreporting of counterproductive work behaviors in organizations; Berry et al. 2012). Thus, what we observe may be low-end estimates of QRP prevalence.

A total of 17 studies were identified that fit our criteria and used self-report surveys (see “Appendix” section). Of these studies, 1 appeared to find little to no evidence of engagement in QRPs and the other 16 found more severe evidence. Many of the self-report studies tended to consider a range of QRPs. Overall, though most studies employing self-report methods suggest that researchers are engaging in QRPs, the extent of engagement seemed to vary by QRP type. Taken as a whole, however, our review of the survey research indicates that QRPs are being used at a problematic rate. Here are a few examples that represent the range of findings:
  • Bailey (2015) found a minimal association between researchers’ acceptance of QRPs and their number of publications. That is, the study asked whether researchers who believe that the use of QRPs is appropriate tend to be more successful in publishing. While other studies have found a correlation between engagement in QRPs and publishing one’s work in higher impact journals (Banks et al. 2013; O’Boyle et al. 2014), this study considered the issue more indirectly.

  • John et al. (2012) reported that 45–50 % of the researchers surveyed stated that they engaged in selectively reporting results, 22–23 % reported having incorrectly reported p values, and 38–43 % reported having excluded data after considering how the practice would impact the results.

  • Fiedler and Schwarz (2015) criticized past research and suggested that engagement in QRPs may be lower than has been implied and reported elsewhere. They argue that past research asked whether researchers had ever engaged in a practice, whereas Fiedler and Schwarz focused on how frequently researchers engage in such practices. Their results suggest that base rates are lower than what has been found in other studies.

  • Banks et al. (2016) found that about 11 % of researchers admitted to inappropriately reporting p values. Approximately 50 % of researchers said that they selectively reported results and presented post hoc findings as if they had been determined a priori. About a third of researchers surveyed reported engaging in post hoc exclusion of data and decisions to include/exclude control variables to turn nonsignificant results into significant ones. The reporting of QRPs was not found to vary by academic rank.

Evidence from Observer-Report Surveys

Similar to the previously discussed methodological approaches, there are strengths and limitations to using observer-report surveys to study engagement in QRPs. Many QRPs may occur that cannot be identified via behavioral observations or sensitivity analyses. As with self-report surveys, one advantage of observer reports is that they can unearth those QRPs that can only be studied by asking researchers what occurred behind the scenes of data collection, analysis, and reporting of results. Another advantage of using observer reports is that it reduces the potential for socially desirable responding (as compared to self-report surveys). Nonetheless, even observers in the form of coauthors or colleagues cannot observe and account for all analytic decisions made by other researchers. Thus, similar to self-reports, there is the potential for observer reports to provide underestimates of QRP frequency. While the observer-report approach is not perfect, it does provide complementary information to the other approaches described thus far.

A total of 14 studies were identified that fit our criteria and used observer-report surveys (see “Appendix” section). Of these studies, 1 appeared to find little to no evidence of engagement in QRPs and the other 13 found more severe evidence. Similar to the self-report surveys, the observer reports tended to investigate many QRPs within an individual study. Compared to the evidence from the self-report approach, observer reports paint an even grimmer picture of our scientific practices. The differences in results between the two survey approaches highlight the strengths and weaknesses of each method and illustrate the advantages of triangulation. Whereas people may be more reluctant to self-report their own behaviors, they are willing to report when they have witnessed others engaging in QRPs. Results suggest that a large number of researchers are engaging in QRPs, though, like the self-report evidence, the extent of engagement varies by type. Here are a few examples that represent the range of findings uncovered:
  • Bedeian et al. (2010) found that 79 % of researchers surveyed reported having observed others withholding methodological details or results. Ninety-two percent of respondents also reported having seen others present post hoc findings as those developed a priori and 78 % saw others selectively report findings.

  • In another study focused on doctoral students, Banks et al. (2016) found that 12 % of doctoral student respondents indicated observing inappropriate reporting of p values, 55 % had seen selective reporting of results, and 58 % had seen the practice of reporting post hoc findings as a priori.

  • In a meta-analysis of surveys asking about the behavior of colleagues, Fanelli (2009) found that 72 % of respondents reported observing a variety of QRPs, such as data manipulation.

Summary of the Good, the Bad, and the Ugly in QRP Research

We summarize our key findings in Table 1. In general, there were very few studies which identified little to no evidence for engagement in QRPs. It is not clear if this is because engagement in QRPs is ubiquitous, because of the designs of the QRP studies, or because we had limited access to studies that found little to no engagement in QRPs.
Table 1

Summary of key findings

 1. The vast majority of studies included in the current review identified evidence for the engagement in QRPs (91 % of studies)

 2. Engagement in QRPs occurs at a nonideal rate and the current rates are likely underestimates

 3. The extent to which QRPs are problematic varies by type and engagement frequency

 4. Observer reports of QRPs typically illustrated higher prevalence rates than self-reports of engagement in QRPs

 5. Nonsurvey-based methods, such as behavioral observations or sensitivity analyses, also provided consistent evidence of engagement in QRPs

 6. Some of the more common QRPs include HARKing and selectively reporting results with a preference for those findings that are statistically significant

 7. Editors and reviewers appear to play a role in the prevalence of QRPs

 8. Engagement in QRPs has not been shown to vary by academic rank

 9. The vast majority of QRP research has focused primarily on practices that affect p-values; more work is needed that investigates other types of QRPs

The extent to which a finding is “bad” relative to “ugly” may depend on the practice itself as well as the frequency with which it is used. For instance, estimates of data falsification from self-reports are roughly 1–2 % (Banks et al. 2016; John et al. 2012). However, when observer reports are used, this number may be as large as 7 % (Banks et al. 2016), 14 % (Fanelli 2009), or even 27 % (Bedeian et al. 2010). Other levels of engagement in QRPs may be considered “bad,” but less harmful, such as inappropriately rounding p values (Banks et al. 2016; John et al. 2012). Some QRPs, such as presenting a post hoc hypothesis as a priori, likely occur at more alarming rates (Banks et al. 2016; Bosco et al. 2015; John et al. 2012; Kerr and Harris 1998). Further, evidence of outcome-reporting bias seems to indicate that the practice is quite prevalent (John et al. 2012; Mazzola and Deuling 2013; O’Boyle et al. 2015; O’Boyle et al. 2014; Pigott et al. 2013) and that editors and reviewers play a role in its prevalence (Banks et al. 2016; LeBel et al. 2013). Additionally, although some studies found more mixed evidence regarding the clustering of p values immediately below the traditional 0.05 threshold (Hartgerink et al. 2016; Nuijten et al. 2015), more studies found evidence that such clustering is common (Leggett et al. 2013; Masicampo and Lalande 2012; Gerber and Malhotra 2008a, b).

When interpreting the good, the bad, and the ugly results from the current review, we want to note that there are many examples of sound research practice in our literature (e.g., Becker 2005; Locke 2007). Yet, engagement in QRPs is occurring at rates that far surpass what should be considered acceptable. Thus, some type of action is clearly needed to improve the state of our science. Below, we provide some recommendations for improving publication practices and academic training.

Recommendations for Publication Practices and Academic Training

We believe the QRP discussion engenders a debate similar to the one seen in discussions of climate change. For many years (indeed, decades), scientists reported findings that indicated significant changes to the Earth’s climate and were met with skepticism about whether the phenomenon was real, the degree to which climate change posed a significant problem, and whether human behavior was responsible. The current review is intended to provide a foundation upon which there can be agreement as to the extent that QRPs have been and are being practiced within the social and organizational sciences. Although the precise scope of the problem may be debated, there is sufficient evidence that we cannot deny the significant presence of engagement in QRPs—the data do indeed triangulate. The challenges that remain before us concern how we should best deal with QRPs.

While there are countless recommendations that can be made to address engagement in QRPs, we focus on those recommendations that we believe to be the most impactful. We do wish to note that we believe the challenge of QRPs is more of a “bad barrels” problem than a “bad apples” problem. That is, whereas there will always be individuals who violate responsible and ethical norms of research conduct, the majority of research to date suggests that our research systems inadvertently prime/reward the types of behaviors that derail our science (O’Boyle et al. 2014). Hence, our recommendations focus on addressing the issue of QRPs systematically. We summarize our recommendations in Table 2.
Table 2

Summary of key recommendations

Changes to how we review and reward

 1. Journals should be more explicit about what sorts of research practices are and are not acceptable, and hold authors accountable for following journal policy

 2. Journals should consider the implementation and evaluation of the effectiveness of Registered Reports or Hybrid Registered Reports

 3. Journals should be more accepting of null results

 4. Journals should seek to increase diversity of research that is published to include more exploratory, inductive research as well as research based on abduction

 5. Journals should be more welcoming of replication research

Changes to how we train students

 6. Doctoral programs should emphasize cultures that promote open science and collaboration while discouraging engagement in QRPs

 7. Research ethics training in research methods classes should expand beyond a sole focus on the protection of human subjects to also include discussions regarding QRPs

Conduct more research

 8. There is still more we need to understand about QRPs. To date there has been a focus primarily on practices that affect p-values. More work is needed that examines other types of QRPs, such as in structural equation modeling, specification of priors in Bayesian statistics, or misreporting data in qualitative research

Changes to How We Review and Reward

First, we recommend that journals be more explicit about what sorts of research practices are and are not acceptable, and that they hold authors accountable for following journal policy. Despite the evidence that exists regarding engagement in QRPs, a recent review highlighted the fact that many journals in applied psychology and management, for instance, do not have explicitly stated policies pertaining to the vast majority of the QRPs reviewed in the current study (Banks et al. 2016). This could be easily rectified through the adoption of simple policy statements and the requirement that submitting authors acknowledge (e.g., by checking boxes during the submission process) that they did not engage in each of several separately and explicitly described QRPs.

Second, we acknowledge that authors may engage in QRPs largely due to the pressures associated with publication. In particular, p-hacking, HARKing, selective reporting of results, and the like are all encouraged by publication practices that implicitly reward the finding of significant results that confirm study hypotheses. Publication models such as Registered Reports or Hybrid Registered Reports address such practices by having authors initially submit only “proposals” (e.g., https://cos.io/prereg/). That is, the review process is initially results blind. Manuscripts are evaluated on the basis of their theoretical and/or conceptual foundations and proposed methodologies. In the case of Registered Reports, in-principle acceptances may be offered to studies that are submitted prior to the submission of results and discussion sections.

The advantage of these types of submission models is that authors recognize that the quality of their research questions, hypotheses, and methodology will be evaluated independent of the research results. Thus, for example, if a researcher submitted a compelling and well-designed study as a (hybrid) registered report, which yielded null results, their chance of publishing the study should not be harmed. This approach should therefore serve to temper incentives for engaging in QRPs. Such submission models should also lead to more accurate/less biased reviewer ratings, given that reviewers have been shown to be more critical of research methodologies when null results are present (Emerson et al. 2010).

Several journals in management and applied psychology have begun to offer these sorts of review options for authors (for details see https://jbp.uncc.edu/). Nonprofit organizations, such as The Center for Open Science (https://cos.io/), have offered individual researchers the opportunity to preregister studies independent of journals and have even offered 1000 research teams $1000 each for successfully publishing preregistered research in order to promote the initiative (https://cos.io/prereg/). In general, journals should also be more accepting of studies with null results. Perhaps more special issues on null results, such as the effort by the Journal of Business and Psychology, are warranted (Landis et al. 2014).

As a third major approach to dealing with engagement in QRPs, journals might also seek to increase the diversity of research that is published. Rather than an almost exclusive emphasis on papers that conform to the hypothetico-deductive model, editors and reviewers could be more welcoming of papers built upon inductive reasoning (for a review see Spector et al. 2014). More specifically, some have lamented a potential overemphasis on a priori theorizing that leaves little room for interesting results to ultimately advance theory (Hambrick 2007). Locke (2007) stated that such a policy among journals “encourages—in fact demands, premature theorizing and often leads to making up hypotheses after the fact—which is contrary to the intent of the hypothetico-deductive model” (p. 867). Exploratory, inductive research has led to the development of many well-known theories, such as goal-setting theory (Locke and Latham 2002) and social cognitive theory (Bandura 2001). Consequently, journals should encourage inductive research as well as abductive approaches (Aliseda 2006). In general, journal editors could be more inclusive of different types of studies and correspondingly adapt their reviewer rating forms, examples, and exemplars, and reviewers could be trained to welcome broader types of research.

In the end, well-conducted impactful research, in the many forms it can come, should be what we value (and publish). We have to make sure that our publication practices ensure that this is the case. We believe that (1) innovations to the review process, (2) promotion of inductive and abductive research, and (3) emphasis on publishing high-quality null results are three of the most critical steps that journal editors can take. The preceding points aside, there are many other tangible changes that can be made to our publication practices. For instance, principles such as those comprising the Editor Ethics code of conduct (https://editorethics.uncc.edu/) encourage implementing practices to reduce engagement in QRPs among action editors, reviewers, and authors. Further, journals may consider policies that promote open science and sharing among researchers by following the Transparency and Openness Guidelines (Nosek et al. 2015).

Changes to How We Train Students

To this point, our recommendations have largely focused on editorial policies. This emphasis is because editors and reviewers (including promotion and tenure reviewers) act as critical gatekeepers, and so we believe that they have a great responsibility to promote positive change (Rupp 2011). In other words, it is our general contention that authors will ultimately align with whatever editors and reviewers reward. That being said, we believe that authors still have important responsibilities to engage in sound scientific practices, and codes of ethics exist to provide such guidance (see http://aom.org/About-AOM/Code-of-Ethics.aspx as well as http://www.apa.org/ethics/code/). At the same time, scholars must constantly engage in self-development to ensure their personal competence in making the right decisions and in effectively evaluating others’ research. Programs such as the Center for the Advancement of Research Methods and Analysis (CARMA) could serve to improve students’ (as well as their mentors’ and instructors’) understanding and use of statistics, such as p values and fit indices.

Conduct More Research

Finally, there is still more we need to understand about QRPs. We note that most QRP research to date has focused primarily on practices that affect p values and that more work is needed to investigate other types of QRPs, such as, for example, fit indices in SEM (Banks and O’Boyle 2013), the specification of priors in Bayesian statistics (Banks et al. 2016), or the misreporting of interview results in qualitative research. Research has indicated that engagement in QRPs occurs when implementing null hypothesis significance testing (NHST), but it is not clear to what extent engagement in QRPs is problematic for these other research approaches. We concur that research is sorely needed to evaluate the effectiveness of all the strategies for reducing QRPs recommended herein.

Conclusion

The current study involved a systematic search of the literature on questionable methodological design, analytic, and reporting practices. Of the 64 studies that fit our criteria, 6 appeared to find little to no evidence of engagement in QRPs and the other 58 found more severe evidence (91 %). Each of the studies reviewed had limitations associated with the methods it employed. However, our triangulation approach allows us to have greater confidence that the findings uncovered are robust. Based on this analysis, we conclude that it is unlikely that most researchers engage in QRPs every time a study is conducted. For instance, if a team of researchers designs a study and finds support for most of their hypotheses, it is doubtful that there is motivation or a need to engage in QRPs. Yet, if initial support is largely not found, then, given the time, money, and energy that went into conducting a study and the enormous pressure from the current incentive system to publish, it is likely that researchers begin to consciously or subconsciously tinker with their analyses, their processes, and their reporting in order to present the best possible story to reviewers—to win the publishing “game.” We hope that this review and our subsequent recommendations serve to advance a collegial dialogue on QRPs and to promote tangible and needed change.

References

  1. Aliseda, A. (2006). Abductive reasoning: Logical investigations into discovery and explanation. Dordrecht: Springer.
  2. Allen, P. J., Lourenco, A., & Roberts, L. D. (2015). Detecting duplication in students’ research data: A method and illustration. Ethics & Behavior. doi:10.1080/10508422.2015.1019070.
  3. Bailey, C. D. (2015). Psychopathy, academic accountants’ attitudes toward unethical research practices, and publication success. The Accounting Review, 90(4), 1307–1332.
  4. Bakker, M., & Wicherts, J. M. (2014). Outlier removal and the relation with reporting errors and quality of psychological research. PLoS One, 9(7), e103360.
  5. Bandura, A. (2001). Social cognitive theory: An agentic perspective. Annual Review of Psychology, 52, 1–26.
  6. Banks, G. C., et al. (2016). Questions about questionable research practices in the field of management: A guest commentary. Journal of Management, 42(1), 5–20.
  7. Banks, G. C., & O’Boyle, E. H. (2013). Why we need industrial-organizational psychology to fix industrial-organizational psychology. Industrial and Organizational Psychology, 6, 291–294.
  8. Banks, G. C., O’Boyle, E. H., White, C. D., & Batchelor, J. H. (2013). Tracking SMA papers to journal publication: An investigation into the phases of dissemination bias. Paper presented at the 2013 annual meeting of the Southern Management Association, New Orleans, LA.
  9. Becker, T. E. (2005). Potential problems in the statistical control of variables in organizational research: A qualitative analysis with recommendations. Organizational Research Methods, 8, 274–289.
  10. Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715–725. doi:10.5465/amle.2010.56659889.
  11. Berry, C. M., Carpenter, N. C., & Barratt, C. L. (2012). Do other reports of counterproductive work behavior provide an incremental contribution over self-reports? A meta-analytic comparison. Journal of Applied Psychology, 97, 613–636. doi:10.1037/a0026739.
  12. Bosco, F. A., Aguinis, H., Field, J. G., Pierce, C. A., & Dalton, D. R. (2015). HARKing’s threat to organizational research: Evidence from primary and meta-analytic sources. Personnel Psychology. doi:10.1111/peps.12111.
  13. Braun, M., & Roussos, A. J. (2012). Psychotherapy researchers: Reported misbehaviors and opinions. Journal of Empirical Research on Human Research Ethics, 7(5), 25–29.
  14. Cortina, J. M. (2015). A revolution with a solution. Opening plenary presented at the meeting of the Society for Industrial/Organizational Psychology, Philadelphia, PA.
  15. Davis, M. S., Riske-Morris, M., & Diaz, S. R. (2007). Causal factors implicated in research misconduct: Evidence from ORI case files. Science and Engineering Ethics, 13(4), 395–414.
  16. de Winter, J. C. F., & Dodou, D. (2015). A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ, 3, e733. doi:10.7717/peerj.733.
  17. Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S. (2010). Testing for the presence of positive-outcome bias in peer review: A randomized controlled trial. Archives of Internal Medicine, 170, 1934–1939. doi:10.1001/archinternmed.2010.406.
  18. Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One, 4(5), e5738.
  19. Fanelli, D. (2010). Do pressures to publish increase scientists’ bias? An empirical support from US States Data. PLoS One, 5(4), e10271.
  20. Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891–904. doi:10.1007/s11192-011-0494-7.
  21. Fiedler, K., & Schwarz, N. (2015). Questionable research practices revisited. Social Psychological and Personality Science, 7, 45–52.
  22. Field, J. G., Mihm, D., O’Boyle, E. H., Bosco, F. A., Uggerslev, K., & Steel, P. (2015). An examination of the funding-finding relation in the field of management. Academy of Management Proceedings. Paper presented at the Academy of Management Annual Meeting, Vancouver, Canada (p. 17463).
  23. Field et al. (2016). The extent of p-hacking in I/O psychology. Paper presented at the Society of Industrial/Organizational Psychology Annual Conference in Anaheim, CA.
  24. Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21(5), 1180–1187.
  25. Francis, G., Tanzman, J., & Matthews, W. J. (2014). Excess success for psychology articles in the journal Science. PLoS One, 9(12), e114255.
  26. Franco, A., Malhotra, N., & Simonovits, G. (2016). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological and Personality Science, 7(1), 8–12.
  27. Gerber, A., & Malhotra, N. (2008a). Do statistical reporting standards affect what is published? Publication bias in two leading political science journals. Quarterly Journal of Political Science, 3, 313–326. doi:10.1561/100.00008024.
  28. Gerber, A. S., & Malhotra, N. (2008b). Publication bias in empirical sociological research: Do arbitrary significance levels distort published results? Sociological Methods & Research, 37, 3–30. doi:10.1177/0049124108318973.
  29. Glick, J. L., & Shamoo, A. E. (1994). Results of a survey on research practices, completed by attendees at the third conference on research policies and quality assurance. Accountability in Research, 3(4), 275–280.
  30. Hambrick, D. C. (2007). The field of management’s devotion to theory: Too much of a good thing? Academy of Management Journal, 50, 1346–1352.
  31. Harrison, J. S., Banks, G. C., Pollack, J. M., O’Boyle, E. H., Jr., & Short, J. C. (2014). Publication bias in strategic management research. Journal of Management. doi:10.1177/0149206314535438.
  32. Hartgerink, C. H., van Aert, R. C., Nuijten, M. B., Wicherts, J. M., & van Assen, M. A. (2016). Distributions of p-values smaller than .05 in psychology: What is going on? PeerJ, 4, e1935.
  33. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3), e1002106.
  34. Jick, T. D. (1979). Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly, 24, 602–611. doi:10.2307/2392366.
  35. John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. doi:10.1177/0956797611430953.
  36. Jørgensen, M., Dybå, T., Liestøl, K., & Sjøberg, D. I. (2015). Incorrect results in software engineering experiments: How to improve research practices. Journal of Systems and Software. doi:10.1016/j.jss.2015.03.065.
  37. Kattenbraker, M. (2007). Health education research and publication: Ethical considerations and the response of health educators (Unpublished thesis). Southern Illinois University Carbondale, Carbondale, IL.
  38. Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15, 624–662. doi:10.1177/1094428112452760.
  39. Kepes, S., & McDaniel, M. A. (2013). How trustworthy is the scientific literature in I-O psychology? Industrial and Organizational Psychology: Perspectives on Science and Practice, 6, 252–268.
  40. Kepes, S., McDaniel, M. A., Brannick, M. T., & Banks, G. C. (2013). Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-Analytic Reporting Standards). Journal of Business and Psychology, 28, 123–143.
  41. Kerr, N. L., & Harris, S. E. (1998). HARKing: Hypothesizing after the results are known: Views from three disciplines. Unpublished manuscript, Michigan State University, East Lansing.
  42. Krawczyk, M. (2015). The search for significance: A few peculiarities in the distribution of p-values in experimental psychology literature. PLoS One, 10(6), e0127872.
  43. Landis, R. S., Lance, C. E., Pierce, C. A., & Rogelberg, S. G. (2014). When is nothing something? Editorial for the null results special issue of Journal of Business and Psychology. Journal of Business and Psychology, 29, 163–167. doi:10.1007/s10869-014-9347-8.
  44. LeBel, E. P., Borsboom, D., Giner-Sorolla, R., Hasselman, F., Peters, K. R., Ratliff, K. A., & Smith, C. T. (2013). PsychDisclosure.org: Grassroots support for reforming reporting standards in psychology. Perspectives on Psychological Science, 8(4), 424–432.
  45. Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66(12), 2303–2309.
  46. List, J. A., & Gallet, C. A. (2001). What experimental protocol influence disparities between actual and hypothetical stated values? Environmental and Resource Economics, 20(3), 241–254.
  47. Locke, E. A. (2007). The case for inductive theory building. Journal of Management, 33, 867–890.
  48. Locke, E. A., & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist, 57, 705–717.
  49. Martinson, B. C., Anderson, M. S., Crain, A. L., & De Vries, R. (2006). Scientists’ perceptions of organizational justice and self-reported misbehaviors. Journal of Empirical Research on Human Research Ethics, 1(1), 51–66.
  50. Martinson, B. C., Anderson, M. S., & De Vries, R. (2005). Scientists behaving badly. Nature, 435(7043), 737–738.
  51. Martinson, B. C., Crain, A. L., Anderson, M. S., & De Vries, R. (2009). Institutions’ expectations for researchers’ self-funding, federal grant holding and private industry involvement: Manifold drivers of self-interest and researcher behavior. Academic Medicine: Journal of the Association of American Medical Colleges, 84(11), 1491–1499.
  52. Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p-values just below .05. The Quarterly Journal of Experimental Psychology, 65(11), 2271–2279. doi:10.1080/17470218.2012.711335.
  53. Masters, E. A. (2012). Research misconduct in National Science Foundation funded research: A mixed-methods analysis of 2007–2011 research awards (Unpublished doctoral dissertation). Northcentral University, Prescott Valley, AZ.
  54. Matthes, J., Marquart, F., Naderer, B., Arendt, F., Schmuck, D., & Adam, K. (2015). Questionable research practices in experimental communication research: A systematic analysis from 1980 to 2013. Communication Methods and Measures, 9(4), 193–207.
  55. Mazzola, J. J., & Deuling, J. K. (2013). Forgetting what we learned as graduate students: HARKing and selective outcome reporting in I-O journal articles. Industrial and Organizational Psychology: Perspectives on Science and Practice, 6(3), 279–284.
  56. Meyer, M. J., & McMahon, D. (2004). An examination of ethical research conduct by experienced and novice accounting academics. Issues in Accounting Education, 19(4), 413–442.
  57. Nagel, M., Wicherts, J. M., & Bakker, M. Participant exclusion in psychological research: A study of its effects on research results. Unpublished manuscript.
  58. Necker, S. (2014). Scientific misbehavior in economics. Research Policy, 43(10), 1747–1759.
  59. Nosek, B. A., et al. (2015). Promoting an open research culture: Author guidelines for journals to promote transparency, openness, and reproducibility. Science, 348, 1422–1425. doi:10.1126/science.aab2374.
  60. Nuijten, M. B., Hartgerink, C. H., van Assen, M. A., Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods. doi:10.3758/s13428-015-0664-2.
  61. O’Boyle, E. H., Banks, G. C., & Gonzalez-Mule, E. (2014). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management. doi:10.1177/0149206314527133.
  62. O’Boyle, E. H., Banks, G. C., Carter, K., Walter, S., & Yuan, Z. (2015). A 20-year review of outcome reporting bias in moderated multiple regression. Paper presented at the annual meeting of the Academy of Management, Vancouver, British Columbia.
  63. Pigott, T. D., Valentine, J. C., Polanin, J. R., Williams, R. T., & Canada, D. D. (2013). Outcome-reporting bias in education research. Educational Researcher. doi:10.3102/0013189X13507104.
  64. Rajah-Kanagasabai, C. J., & Roberts, L. D. (2015). Predicting self-reported research misconduct and questionable research practices in university students using an augmented Theory of Planned Behavior. Frontiers in Psychology, 6, 1–11.
  65. Reed, J. G., & Baxter, P. M. (2009). Using reference databases. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 74–101). New York: Russell Sage Foundation.
  66. Riordan, C. A., & Marlin, N. A. (1987). Some good news about some bad practices. American Psychologist, 42(1), 104–106.
  67. Rogelberg, S. G., & Laber, M. (2002). Securing our collective future: Challenges facing those designing and doing research in Industrial and Organizational Psychology. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 479–485). London: Blackwell.
  68. Rupp, D. E. (2011). Research and publishing ethics: Editor and reviewer responsibilities. Management and Organizational Review, 7, 481–493.
  69. Sackett, P. R., & Larson, J. R. (1990). Research strategies and tactics in industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 419–489). Palo Alto, CA: Consulting Psychologists Press.
  70. Schimmack, U. (2014). Quantifying statistical research integrity: The Replicability-Index. Unpublished manuscript.
  71. Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Newbury Park, CA: Sage.
  72. Spector, P. E., Rogelberg, S. G., Ryan, A. M., Schmitt, N., & Zedeck, S. (2014). Moving the pendulum back to the middle: Reflections on and introduction to the inductive research special issue of Journal of Business and Psychology. Journal of Business and Psychology, 29, 499–502. doi:10.1007/s10869-014-9372-7.
  73. Swazey, J. P., Anderson, M. S., Lewis, K. S., & Louis, K. S. (1993). Ethical problems in academic research. American Scientist, 81(6), 542–553.
  74. Tangney, J. P. (1987). Fraud will out, or will it? New Scientist, 115, 62–63.
  75. Titus, S. L., Wells, J. A., & Rhoades, L. J. (2008). Repairing research integrity. Nature, 453(7198), 980–982.
  76. Trainor, B. P. (2015). Incomplete reporting: Addressing the problem of outcome-reporting bias in educational research (Unpublished doctoral dissertation). Loyola University, Chicago, IL.
  77. Vasilev, M. R. (2013). Negative results in European psychology journals. Europe’s Journal of Psychology, 9(4), 717–730.
  78. Veldkamp, C. L., Nuijten, M. B., Dominguez-Alvarez, L., van Assen, M. A., & Wicherts, J. M. (2014). Statistical reporting errors and collaboration on statistical analyses in psychological science. PLoS One, 9(12), e114876.
  79. Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290.
  80. Wilson, K., Schreier, A., Griffin, A., & Resnik, D. (2007). Research records and the resolution of misconduct allegations at research universities. Accountability in Research, 14(1), 57–71.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • George C. Banks (1)
  • Steven G. Rogelberg (1)
  • Haley M. Woznyj (1)
  • Ronald S. Landis (2)
  • Deborah E. Rupp (3, 4)

  1. Belk College of Business, University of North Carolina at Charlotte, Charlotte, USA
  2. Illinois Institute of Technology, Chicago, USA
  3. Purdue University, West Lafayette, USA
  4. University of the Western Cape, Bellville, South Africa
