Introduction

Concerns exist regarding the credibility of research in the social and natural sciences (Cortina 2015; Kepes and McDaniel 2013; Nosek et al. 2015; Schmidt and Hunter 2015). These concerns are linked, in part, to the use of questionable research or reporting practices (QRPs). QRPs have been defined as “design, analytic, or reporting practices that have been questioned because of the potential for the practice to be employed with the purpose of presenting biased evidence in favor of an assertion” (Banks et al. 2016, p. 3). Examples of commonly discussed QRPs include selectively reporting hypotheses with a preference for those that are statistically significant, “cherry picking” fit indices in structural equation modeling (SEM), and presenting post hoc hypotheses as if they were developed a priori (Banks and O’Boyle 2013; John et al. 2012). Other typical QRPs include reporting a p value of 0.054 as p < 0.05 rather than reporting its actual value, as well as adding and removing data and control variables in order to turn null results into statistically significant ones (Banks et al. 2016; John et al. 2012). These practices can occur with or without intent to deceive, and they are often sustained by normative assumptions about how research should be conducted and reported. By their presence in the literature, QRPs may harm the development of theory, evidence-based practice, and perceptions of the rigor and relevance of science. Herein, we review the available evidence from the social sciences in order to draw conclusions about whether, given what we know to date, such concerns are warranted.

We review the evidence on methodological design-, analytic-, and reporting-related QRPs in a systematic fashion, searching for evidence of the good, the bad, and the ugly. In other words, we looked for instances where QRPs do not appear to be a problem (the good), instances where QRPs are used at a suboptimal but perhaps not overly problematic rate (the bad), and, finally, evidence that QRPs represent a serious threat to the inferences made based on reported results (the ugly). We focus primarily on the organizational sciences and related social science fields such as education, political science, and accounting.

Following best practices for a systematic search (Kepes et al. 2013; Reed and Baxter 2009), we conducted a search in December 2015, primarily using Google Scholar and ProQuest Dissertations, in order to identify both published and unpublished studies. We also searched for working papers at the National Bureau of Economic Research (http://www.nber.org/papers.html) and the Social Science Research Network (http://www.ssrn.com/en/). First, databases were searched using combinations of the following keywords: (1) questionable research practices, (2) questionable reporting practices, (3) QRP, (4) HARKing, (5) p-hacking, (6) p-curve, (7) outcome-reporting bias, (8) underreporting, and (9) research ethics. Second, in addition to searching the databases, we conducted a citation search using references identified in Google Scholar. This involved backward- and forward-reference searches in which we examined older studies cited by our identified studies and newer studies that cited them. Third, we distributed a call for published and unpublished studies over listservs, such as those sponsored by the Organizational Behavior, Human Resources, and Research Methods divisions of the Academy of Management.

We limited our search to the social sciences because of the criticism recently directed toward the social and organizational sciences (for reviews see Banks et al. 2016; Kepes and McDaniel 2013; Schmidt and Hunter 2015). Given the differences between the social and natural sciences regarding research methodologies (e.g., experimental designs, iterative research processes), there were concerns that the findings of one might not generalize to the other. Furthermore, because of our interest in actual levels of engagement in methodological design-, analytic-, and reporting-related QRPs, we excluded studies that focused (as a topic of study) on the treatment of human subjects, plagiarism, sample-level publication bias (i.e., entire samples missing from the literature; see Kepes et al. 2012), simulations, replications, studies that only used hypothetical scenarios surrounding QRPs (e.g., vignettes) as opposed to actual behavior, or studies that focused on individual cases (e.g., retractions). In total, we identified 64 studies through this search. Despite our exhaustive search, we cannot rule out the possibility that a systematic difference exists between studies that were available for identification and those that were not. Given the context, it may be that studies reporting a higher prevalence of engagement in QRPs were more likely to be identified by our search.

Review of Existing Evidence

Our review used a triangulation approach. Triangulation is characterized as the use of “multiple reference points to locate an object’s exact position” (Jick 1979, p. 602). All methodological approaches have limitations and are only as accurate as their underlying assumptions. Triangulation uses particular methods to compensate for the weaknesses of other designs (e.g., Harrison et al. 2014; Rogelberg and Laber 2002). Thus, this approach draws upon multiple study designs, settings, and samples to consider engagement in QRPs (Kepes et al. 2012; Sackett and Larson 1990). Hence, our approach was holistic and allowed the consideration of many types of QRPs.

In the current review, we consider four primary types of evidence. First, we begin with a review of evidence from behavioral observations. This methodology primarily focuses on investigating how unpublished, raw studies in the form of protocols, dissertations, and conference papers transform into published journal articles. Second, we consider evidence from sensitivity analyses. These studies consider the probability of certain results and statistics appearing in journal articles. Third, we review evidence from self-report survey research where people indicate their own engagement in QRPs. Finally, we examine observer reports through survey research where people indicate the extent to which they have observed or know of others who have engaged in QRPs. Within each methodological category, we highlight examples of the research findings.

We summarize our findings in the Appendix, which provides the author, year, field, study topic, sample type, and key findings for each article that we reviewed. To the extent possible, we draw the text regarding key findings directly from the abstracts of each study, with a focus on reporting results that highlight the extent to which QRPs are used. We encourage interested readers to refer to the primary studies for additional discussion of nuanced results that are beyond the scope of our review. In text, we highlight studies that represent the range of findings identified rather than focusing only on the most notable exemplars.

Evidence from Behavioral Observations

A common technique used in behavioral observation studies of QRPs is to compare research protocols or early versions of a study (e.g., dissertations, conference papers) to the final paper that is published (O’Boyle et al. 2014; Pigott et al. 2013). The goal is to see whether unsupported results were just as likely to appear in the final version as supported results. Further, one can also examine whether behaviors such as removing data or adding/removing control variables were associated with turning a nonsignificant result into a significant one.
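
To make the logic of such comparisons concrete, the sketch below (in Python) tests whether the proportion of supported hypotheses differs between matched early and published versions of a set of papers. The counts are hypothetical and are not drawn from any of the studies reviewed here.

```python
from scipy import stats

# Hypothetical counts of hypotheses pooled across matched pairs of
# dissertations and the journal articles derived from them.
#                    supported  unsupported
dissertations     = [       45,          55]
journal_articles  = [       66,          34]

# Fisher's exact test on the 2x2 table of hypothesis support by version.
odds_ratio, p_value = stats.fisher_exact([dissertations, journal_articles])
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
# Absent new data or analyses, a markedly higher support rate in the published
# versions is consistent with selective reporting of supported hypotheses.
```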

An advantage of the behavioral observation approach is that one does not have to be concerned with the potential for biased reporting due to social desirability as is the case when self- and observer-report surveys are used. A second advantage of this approach is that it is not dependent on the ability of researchers to recall engagement in QRPs that may have occurred years ago. A third advantage is that the technique is not concerned with researchers’ perceptions of whether their behaviors are inappropriate or appropriate. Rather, the behavioral technique is focused on objectively describing how a paper changed over the course of its history.

That being said, this approach is not without limitations. For instance, many studies are not available as protocols or unpublished manuscripts. Hence, the representativeness of samples used in this sort of research can be questioned. A second limitation of the behavioral approach is that one cannot determine whether the motivation for engagement in QRPs was driven by authors, by reviewers and editors who may have pressured authors to use suboptimal research practices as a condition for publication, or by some combination of the two. Further, it is not known whether changes in the reported results may have been due to research practices improving as a result of editor/reviewer feedback and overall author development (as may occur when student dissertations become junior faculty publications).

A total of 19 studies were identified that fit our criteria (see “Appendix” section). From these 19 studies, results suggest that although researchers engaged in QRPs to a varying extent, the influence of such practices appears to be severe. Of the 19 behavioral observation studies, 4 appeared to find little to no evidence of engagement in QRPs and the other 15 found more severe evidence. The most common forms of QRPs identified by the behavioral approach tend to center on an overabundance of significant findings relative to unsupported hypotheses or on lax reporting practices with regard to methodological procedures, data cleaning, and/or data analysis. Here are a few examples to highlight the range of findings using this approach:

  • When investigating the potential for data fabrication among undergraduates, Allen et al. (2015) found some evidence of inappropriate behavior. The authors concluded that the behavior may have been driven by a poor understanding of appropriate research methods and analysis.

  • Bakker and Wicherts (2014) found no differences in median p values when comparing studies that reported excluding outliers with those that did not report dropping observations. Yet, this study did find that many studies fail to report removing data even though the reported statistics suggest that such removal occurred.

  • O’Boyle et al. (2014) illustrated that when dissertations became published articles, the ratio of supported to unsupported hypotheses more than doubled (0.82:1 vs 1.94:1).

  • After comparing conference papers and associated published journal articles, Banks et al. (2013) concluded that engagement in QRPs was infrequent relative to similar studies in the literature (e.g., O’Boyle et al. 2014; Mazzola and Deuling 2013; Pigott et al. 2013). However, when QRPs were used (e.g., data were removed; hypotheses that predicted a positive relationship were changed to a negative relationship), 34.5 % of unsupported hypotheses became supported, relative to just 13.2 % of supported hypotheses becoming unsupported.

  • When looking across time, Fanelli (2012) found that, from 1990 to 2007, there was a 22 % increase in significant findings in research studies.

Evidence from Sensitivity Analyses

Sensitivity analyses can be used to evaluate engagement in QRPs by calculating the probability that a reported set of results could have occurred (Francis et al. 2014). As with the behavioral approach, sensitivity analyses have strengths and limitations. For instance, sensitivity analyses do not require researchers to answer truthfully on questionnaires, nor do they rely on respondents’ memories of past behaviors. Sensitivity analyses are also not concerned with researchers’ rationalizations of such behaviors, but rather focus on statistical probability estimations. Unlike the behavioral approach, sensitivity analyses do not require protocols or early drafts of a study in order to investigate engagement in QRPs. However, the approach has limits. For instance, sensitivity analyses lose considerable accuracy when attempting to establish the probability that a certain result was found in any individual study. Rather, they are more accurate when evaluating the probability of a set of results across hundreds of reported results.
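
To illustrate the underlying logic, the sketch below (a minimal illustration of the general excess-significance idea, not the exact procedure of Francis et al. 2014; all values are hypothetical) estimates the power of each study in a set from a pooled effect size and then asks how probable it is that every study in the set would reach statistical significance.

```python
import numpy as np
from scipy import stats

def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test for a true effect of Cohen's d."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # P(|T| > t_crit) under the noncentral t distribution
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

# Hypothetical set of studies, all reported as significant, with a pooled d of 0.30.
per_group_ns = [25, 30, 28, 40, 22, 35]
powers = [two_sample_power(0.30, n) for n in per_group_ns]

# Probability that all studies would be significant given their estimated power.
p_all_significant = np.prod(powers)
print(f"per-study power: {np.round(powers, 2)}")
print(f"P(all {len(per_group_ns)} studies significant) = {p_all_significant:.5f}")
# A very small probability suggests that a uniformly significant set of results
# is improbable, hinting at selective reporting or other QRPs.
```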

A total of 14 studies were identified that fit our criteria and used sensitivity analyses (see “Appendix” section). None of these studies appeared to find little to no evidence of engagement in QRPs; all 14 found more severe evidence. Considering the evidence from sensitivity analyses, it seems that p value manipulation is a widespread practice in the fields included in the current review. That is, a majority of the studies that employed sensitivity analyses suggest that researchers are incorrectly rounding p values or perhaps p-hacking to make their results seem “more significant” than they actually are. Below, we offer a few examples that highlight the range of findings:

  • In their research on p values, de Winter and Dodou (2015) reported that dramatic increases in significant results may have been the result of both QRPs and improved methodological designs.

  • After reviewing more than 30,000 articles, Hartgerink et al. (2016) reported direct evidence of p values being rounded incorrectly (a simple version of such a consistency check is sketched after this list).

  • Using a sample of over 250,000 p values reported in 20 years of research, Nuijten et al. (2015) found that

    • Half of all published psychology papers that use null hypothesis significance testing (NHST) contained at least one p value that was inconsistent with its test statistic and degrees of freedom.

    • One in eight papers contained a grossly inconsistent p value that may have affected the conclusion drawn.

    • The average prevalence of inconsistent p values has been stable over the years, or has declined.

    • The prevalence of gross inconsistencies was higher in p values reported as significant than in p values reported as nonsignificant.

  • Leggett et al. (2013) found an overabundance of p values immediately below the critical 0.05 threshold relative to other ranges, a pattern that is improbable in the absence of selective reporting. Further, the prevalence of this practice seems to have increased over the past 40 years. Several other studies reported similarly improbable concentrations of p values immediately below the 0.05 threshold (Gerber and Malhotra 2008a, b; Masicampo and Lalande 2012); a caliper-style comparison illustrating this logic is also sketched after this list.

  • Despite low power, O’Boyle et al. (2015) found that most moderated multiple regression analyses identify statistically significant results. Further, while sample sizes have remained largely stable over time, the percentage of significant results associated with tests of interactions has increased from approximately 42 % in the early 1990s to approximately 72 % in more recent research.
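
The p value inconsistencies documented by Hartgerink et al. (2016) and Nuijten et al. (2015) can be detected by recomputing a p value from its reported test statistic and degrees of freedom. The sketch below shows the general idea with hypothetical reported values; it is a simplified illustration of this kind of consistency check, not a reimplementation of any particular tool used in these studies.

```python
from scipy import stats

# Hypothetical reported results: (t statistic, degrees of freedom, reported p value).
reported = [
    (2.10, 58, 0.04),  # consistent: recomputed p is about .04
    (1.90, 58, 0.03),  # grossly inconsistent: recomputed p is about .06, i.e., above .05
]

for t_value, df, p_reported in reported:
    p_recomputed = 2 * stats.t.sf(abs(t_value), df)        # two-tailed p from t and df
    inconsistent = abs(p_recomputed - p_reported) > 0.005  # beyond two-decimal rounding
    gross = (p_recomputed > 0.05) != (p_reported > 0.05)   # crosses the .05 boundary
    print(f"t({df}) = {t_value}: reported p = {p_reported}, recomputed p = {p_recomputed:.3f}, "
          f"inconsistent = {inconsistent}, grossly inconsistent = {gross}")
```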
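
Relatedly, the clustering of p values just below .05 noted by Leggett et al. (2013) and others can be probed with a caliper-style comparison in the spirit of Gerber and Malhotra (2008a, b): under unbiased reporting, a p value falling in a narrow band around the threshold should be roughly equally likely to land on either side of it. The counts below are hypothetical.

```python
from scipy import stats

# Hypothetical counts of reported p values in two narrow, equal-width windows
# on either side of the .05 threshold, e.g., (.045, .050] versus (.050, .055].
just_below = 78
just_above = 42

# One-sided exact binomial test for an excess of just-significant p values.
result = stats.binomtest(just_below, n=just_below + just_above, p=0.5,
                         alternative="greater")
share = just_below / (just_below + just_above)
print(f"share just below .05 = {share:.2f}, binomial p = {result.pvalue:.4f}")
# A pronounced excess just below the threshold is consistent with p-hacking or
# selective reporting around the conventional cutoff.
```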

Evidence from Self-Report Surveys

The use of self-report surveys to investigate QRPs has several methodological strengths and limitations. First, given the degree of autonomy and discretion researchers have, there is a great deal of opportunity to engage in suboptimal research practices. In many cases, it is unlikely that even coauthors would be aware if inappropriate techniques were being used to manipulate results. Hence, self-report surveys are a way to identify engagement in QRPs that might not otherwise be observed. Surveys may also be used to investigate the extent to which engagement in QRPs is attributable to authors’ own volition compared to reviewer and editor requests in the review process. It may be the case that authors engage in inappropriate behavior in an effort to preempt reviewers’ and editors’ biases (Banks et al. 2016). Thus, surveys can help to sort out the motives behind engagement in QRPs and the external pressures potentially associated with such practices. Relatedly, surveys can assist in disentangling how “questionable” some research behaviors really are for individual researchers. For instance, dropping an outlier, whether for theoretical or methodological reasons, can change the conclusions one draws from the results. If a researcher has sound logic for this practice and is transparent, that practice is less questionable than if a researcher manipulates an analysis for the express purpose of turning a nonsignificant result into a statistically significant one. Carefully worded surveys can inform these sorts of issues.

Yet, there are also limitations to self-report surveys. The most obvious is that, even under conditions of confidentiality, researchers may not respond truthfully to survey questions due to socially desirable responding (Berry et al. 2012). Further, researchers may not be honest with themselves: they may answer that they did not engage in a practice, or they might rationalize their behavior and argue that it was justified, even if they were not transparent in their reporting. Among those knowingly carrying out unethical practices, there is an incentive to underreport the use of QRPs so that such practices can continue to stay “under the radar.” Thus, as with any method, there are advantages and disadvantages to the self-report survey when studying QRPs. One of the more problematic concerns may be the underreporting of QRP engagement (which is in many ways similar to the underreporting of counterproductive work behaviors in organizations; Berry et al. 2012). As a result, what we observe may be low-end estimates of QRP prevalence.

A total of 17 studies were identified that fit our criteria and used self-report surveys (see “Appendix” section). Of these studies, 1 appeared to find little to no evidence of engagement in QRPs and the other 16 found more severe evidence. Many of the self-report studies tended to consider a range of QRPs. Overall, though most studies employing self-report methods suggest that researchers are engaging in QRPs, the extent of engagement seemed to vary by QRP type. Taken as a whole, however, our review of the survey research indicates that QRPs are being used at a problematic rate. Here are a few examples that represent the range of findings:

  • Bailey (2015) found a minimal association between researchers’ acceptance of QRPs and the number of publications they have. That is, if researchers believe that the use of QRPs is appropriate, are they more successful in publishing? While other studies have found a correlation between engagement in QRPs and publishing one’s work in higher impact journals (Banks et al. 2013; O’Boyle et al. 2014), this study considered the issue more indirectly.

  • John et al. (2012) reported that 45–50 % of the researchers surveyed stated that they engaged in selectively reporting results, 22–23 % reported having incorrectly reported p values, and 38–43 % reported having excluded data after considering how the practice would impact the results.

  • Fiedler and Schwarz (2015) criticized past research and suggested that engagement in QRPs may be lower than has been implied and reported elsewhere. They argue that past research asked whether researchers had ever engaged in a practice, whereas Fiedler and Schwarz focused on how frequently researchers engage in such practices. Their results suggest that base rates are lower than those found in other studies.

  • Banks et al. (2016) found that about 11 % of researchers admitted to inappropriately reporting p values. Approximately 50 % of researchers said that they selectively reported results and presented post hoc findings as if they had been determined a priori. About a third of researchers surveyed reported engaging in post hoc exclusion of data and decisions to include/exclude control variables to turn nonsignificant results into significant ones. The reporting of QRPs was not found to vary by academic rank.

Evidence from Observer-Report Surveys

Similar to the previously discussed methodological approaches, there are strengths and limitations to using observer-report surveys to study engagement in QRPs. Many QRPs may occur that cannot be identified via behavioral observations or sensitivity analyses. As with self-report surveys, one advantage of observer reports is that they can unearth those QRPs that can only be studied by asking researchers what occurred behind the scenes of data collection, analysis, and reporting of results. Another advantage of observer reports is that they reduce the potential for socially desirable responding (as compared to self-report surveys). Nonetheless, even observers in the form of coauthors or colleagues cannot observe and account for all analytic decisions made by other researchers. Thus, similar to self-reports, there is the potential for observer reports to provide underestimates of QRP frequency. While the observer-report approach is not perfect, it does provide complementary information to the other approaches described thus far.

A total of 14 studies were identified that fit our criteria and used observer-report surveys (see “Appendix” section). Of these studies, 1 appeared to find little to no evidence of engagement in QRPs and the other 13 found more severe evidence. Similar to the self-report surveys, the observer reports tended to investigate many QRPs within an individual study. Compared to the evidence from the self-report approach, observer reports paint an even grimmer picture of our scientific practices. The differences in results between the two survey approaches highlight the strengths and weaknesses of each method and illustrate the advantages of triangulation. Whereas people may be more reluctant to self-report their own behaviors, they are willing to report when they have witnessed others engaging in QRPs. Results suggest that a large number of researchers are engaging in QRPs, though, like the self-report evidence, the extent of engagement varies by type. Here are a few examples that represent the range of findings uncovered:

  • Bedeian et al. (2010) found that 79 % of researchers surveyed reported having observed others withholding methodological details or results. Ninety-two percent of respondents also reported having seen others present post hoc findings as if they had been developed a priori, and 78 % saw others selectively report findings.

  • In another study focused on doctoral students, Banks et al. (2016) found that 12 % of doctoral student respondents indicated observing inappropriate reporting of p values, 55 % had seen selective reporting of results, and 58 % had seen the practice of reporting post hoc findings as a priori.

  • In a meta-analysis of surveys asking about the behavior of colleagues, Fanelli (2009) found that 72 % of respondents reported observing a variety of QRPs, such as data manipulation.

Summary of the Good, the Bad, and the Ugly in QRP Research

We summarize our key findings in Table 1. In general, there were very few studies which identified little to no evidence for engagement in QRPs. It is not clear if this is because engagement in QRPs is ubiquitous, because of the designs of the QRP studies, or because we had limited access to studies that found little to no engagement in QRPs.

Table 1 Summary of key findings

The extent to which a finding is “bad” relative to “ugly” may depend on the practice itself as well as the frequency with which it is used. For instance, estimates of data falsification from self-reports are roughly 1–2 % (Banks et al. 2016; John et al. 2012). However, when observer reports are used, this number may be as large as 7 % (Banks et al. 2016), 14 % (Fanelli 2009), or even 27 % (Bedeian et al. 2010). Other levels of engagement in QRPs may be considered “bad,” but less harmful, such as inappropriately rounding p values (Banks et al. 2016; John et al. 2012). Some QRPs, such as presenting a post hoc hypothesis as a priori, likely occur at more alarming rates (Banks et al. 2016; Bosco et al. 2015; John et al. 2012; Kerr and Harris 1998). Further, evidence of outcome-reporting bias seems to indicate that the practice is quite prevalent (John et al. 2012; Mazzola and Deuling 2013; O’Boyle et al. 2015; O’Boyle et al. 2014; Pigott et al. 2013) and that editors and reviewers play a role in its prevalence (Banks et al. 2016; LeBel et al. 2013). Additionally, although some studies found more mixed evidence of p values clustering immediately below the traditional 0.05 threshold (Hartgerink et al. 2016; Nuijten et al. 2015), more found evidence that such clustering is common (Leggett et al. 2013; Masicampo and Lalande 2012; Gerber and Malhotra 2008a, b).

When interpreting the good, the bad, and the ugly results from the current review, we want to note that there are many examples of sound research practice in our literature (e.g., Becker 2005; Locke 2007). Yet, engagement in QRPs is occurring at rates that far surpass what should be considered acceptable. Thus, some type of action is clearly needed to improve the state of our science. Below, we provide some recommendations for improving publication practices and academic training.

Recommendations for Publication Practices and Academic Training

We believe the QRP discussion engenders a ‘debate’ similar to the one seen in discussions of climate change. For many years (indeed, decades), scientists reported findings that indicated significant changes to the Earth’s climate and were met with skepticism about whether the phenomenon was real, the degree to which climate change posed a significant problem, and whether human behavior was responsible. The current review is intended to provide a foundation upon which there can be agreement as to the extent that QRPs have been and are being practiced within the social and organizational sciences. Although the precise scope of the problem may be debated, there is sufficient evidence that we cannot deny the significant presence of engagement in QRPs—the data do indeed triangulate. The challenges that remain before us concern how we should best deal with QRPs.

While there are countless recommendations that can be made to address engagement in QRPs, we focus on those recommendations that we believe to be the most impactful. We do wish to note that we believe the challenge of QRPs is more of a “bad barrels” problem than a “bad apples” problem. That is, whereas there will always be individuals who violate responsible and ethical norms of research conduct, the majority of research to date suggests that our research systems inadvertently prime/reward the types of behaviors that derail our science (O’Boyle et al. 2014). Hence, our recommendations focus on addressing the issue of QRPs systematically. We summarize our recommendations in Table 2.

Table 2 Summary of key recommendations

Changes to How We Review and Reward

First, we recommend that journals be more explicit about what sorts of research practices are and are not acceptable, and that they hold authors accountable for following journal policy. Despite the evidence that exists regarding engagement in QRPs, a recent review highlighted the fact that many journals in applied psychology and management, for instance, do not have explicitly stated policies pertaining to the vast majority of the QRPs reviewed in the current study (Banks et al. 2016). This could be easily rectified through the adoption of simple policy statements and the requirement that submitting authors acknowledge (e.g., by checking boxes during the submission process) that they did not engage in any of a set of separately and explicitly described QRPs.

Second, we acknowledge that authors may engage in QRPs largely due to the pressures associated with publication. In particular, p-hacking, HARKing, selective reporting of results, and the like are all encouraged by publication practices that implicitly reward the finding of significant results that confirm study hypotheses. Publication models such as Registered Reports or Hybrid Registered Reports address such practices by having authors initially submit ‘proposals’ (e.g., https://cos.io/prereg/). That is, the review process is initially results blind. Manuscripts are evaluated on the basis of their theoretical and/or conceptual foundations and proposed methodologies. In the case of registered reports, in-principle acceptance may be offered to studies before the results and discussion sections are submitted.

The advantage of these types of submission models is that authors recognize that the quality of their research questions, hypotheses, and methodology will be evaluated independent of the research results. Thus, for example, if a researcher submitted a compelling and well-designed study as a (hybrid) registered report that yielded null results, the chances of publishing the study should not be harmed. This approach should therefore serve to temper incentives for engaging in QRPs. Such submission models should also lead to more accurate and less biased reviewer ratings, given that reviewers have been shown to be more critical of research methodologies when null results are present (Emerson et al. 2010).

Several journals in management and applied psychology have begun to offer these sorts of review options for authors (for details see https://jbp.uncc.edu/). Nonprofit organizations, such as The Center for Open Science (https://cos.io/), have offered individual researchers the opportunity to preregister studies independent of journals and have even offered 1000 research teams $1000 each for successfully publishing preregistered research in order to promote the initiative (https://cos.io/prereg/). In general, journals should also be more accepting of studies with null results. Perhaps more special issues on null results, such as the effort by the Journal of Business and Psychology, are warranted (Landis et al. 2014).

As a third major approach to dealing with engagement in QRPs, journals might also seek to increase the diversity of research that is published. Rather than an almost exclusive emphasis on papers that conform to the hypothetico-deductive model, editors and reviewers could be more welcoming of papers built upon inductive reasoning (for a review see Spector et al. 2014). More specifically, some have lamented a potential overemphasis on a priori theorizing that leaves little room for interesting results to ultimately advance theory (Hambrick 2007). Locke (2007) stated that such policies among journals “encourages—in fact demands, premature theorizing and often leads to making up hypotheses after the fact—which is contrary to the intent of the hypothetico-deductive model” (p. 867). Exploratory, inductive research has led to the development of many well-known theories, such as goal-setting theory (Locke and Latham 2002) and social cognitive theory (Bandura 2001). Consequently, journals should encourage inductive research as well as abductive approaches to research (Aliseda 2006). In general, journal editors could be more inclusive of different types of studies and adjust their reviewer rating forms, examples, and exemplars accordingly; furthermore, reviewers could be trained to welcome broader types of research.

In the end, well-conducted, impactful research, in the many forms it can take, should be what we value (and publish). We must ensure that our publication practices make this the case. We believe that (1) innovations to the review process, (2) promotion of inductive and abductive research, and (3) an emphasis on publishing high-quality null results are three of the most critical steps that journal editors can take. The preceding points aside, there are many other tangible changes that can be made to our publication practices. For instance, principles such as those comprising the Editor Ethics code of conduct (https://editorethics.uncc.edu/) encourage practices that reduce engagement in QRPs among action editors, reviewers, and authors. Further, journals may consider policies that promote open science and sharing among researchers by following the Transparency and Openness Promotion (TOP) Guidelines (Nosek et al. 2015).

Changes to How We Train Students

To this point, our recommendations have largely focused on editorial policies. This emphasis is because editors and reviewers (including promotion and tenure reviewers) act as critical gatekeepers, and so we believe that they have a great responsibility to promote positive change (Rupp 2011). In other words, it is our general contention that authors will ultimately align with whatever editors and reviewers reward. That being said, we believe that authors still have important responsibilities to engage in sound scientific practices, and codes of ethics exist to provide such guidance (see http://aom.org/About-AOM/Code-of-Ethics.aspx as well as http://www.apa.org/ethics/code/). At the same time, scholars must constantly engage in self-development to ensure their competence in making the right decisions and in effectively evaluating others’ research. Programs such as the Center for the Advancement of Research Methods and Analysis (CARMA) could serve to improve students’ (as well as their mentors’ and instructors’) understanding and use of statistics, such as p values and fit indices.

Conduct More Research

Finally, there is still more we need to understand about QRPs. Most QRP research to date has focused primarily on practices that affect p values; more work is needed that investigates other types of QRPs, such as fit indices in SEM (Banks and O’Boyle 2013), the specification of priors in Bayesian statistics (Banks et al. 2016), or the misreporting of interview results in qualitative research. Research has indicated that engagement in QRPs occurs when implementing null hypothesis significance testing (NHST), but it is not clear to what extent engagement in QRPs is problematic for these other research approaches. Research is also sorely needed to evaluate the effectiveness of all the strategies for reducing QRPs recommended herein.

Conclusion

The current study searched the literature on questionable methodological design, analytic, and reporting practices. Of the 64 studies that fit our criteria, 6 appeared to find little to no evidence of engagement in QRPs and the other 58 (91 %) found more severe evidence. Each of the studies reviewed had limitations associated with the methods it employed. However, our triangulation approach allows us to have greater confidence that the findings uncovered are robust. Based on this analysis, we conclude that it is unlikely that most researchers engage in QRPs every time a study is conducted. For instance, if a team of researchers designs a study and finds support for most of their hypotheses, there is likely little motivation or need to engage in QRPs. Yet, if initial support is largely not found, then given the time, money, and energy that went into conducting the study and the enormous pressure from the current incentive system to publish, researchers may begin to consciously or subconsciously tinker with their analyses, their processes, and their reporting in order to present the best possible story to reviewers—to win the publishing “game.” We hope that this review and our subsequent recommendations serve to advance a collegial dialogue on QRPs and promote tangible and needed change.