To the best of our knowledge, this is the first direct test of hypotheses about the causes of scientific misconduct to be conducted on an unbiased sample of papers containing flawed or fabricated data. Our sample included papers containing honest errors and intentional fabrications in unknown relative proportions. However, we correctly predicted that category 1 duplications would exhibit smaller or null effects, whilst most significant effects, if observed at all, would be observed in categories 2 and 3 (Fig. 1, see pre-registration at osf.io/w53yu). Support for this prediction retrospectively confirms that, as suggested in a previous description of this sample (Bik et al. 2016), category 1 duplications are most likely the result of unintentional errors or flawed methodologies, whilst categories 2 and 3 duplications are likely to contain a substantial proportion of intentional fabrications.
Results obtained on categories 2 and 3 papers, corroborated by multiple secondary analyses (see SI), supported some predictions of the hypotheses tested, but failed to support, or openly contradicted, other predictions:
- The pressure-to-publish hypothesis was, at most, partially supported. On the one hand, early-career researchers, and researchers working in countries where publications are rewarded with cash incentives, were at higher risk of image duplication, as predicted. The correlation with cash incentives should not be taken to imply that such incentives were directly involved in the problematic image duplications, but rather that they may reflect the value system of certain research communities, which might incentivize negligence and/or misconduct in research.
- On the other hand, however, countries with other publication incentive policies showed a null or even negative association with risk (Fig. 1b). In further refutation of predictions, an author's individual publication rate and impact were either unassociated or negatively associated with image duplication. Surprisingly, in secondary multivariable analyses we observed a positive association between the publication rate of first authors and the risk of duplication (Fig. S1). The latter finding might represent the first direct support for this prediction, or might be a false positive resulting from the multiple analyses conducted in this study. Future studies should aim to confirm this result by repeating our protocol on a targeted sample of studies with image duplications or other proxies of scientific misconduct, published in the United States, EU15, Canada, New Zealand or Australia.
- The social-control hypothesis was supported. In univariable analyses, predictions based on the socio-cultural academic environments of different countries were in large agreement with observations (Fig. 1c). In multivariable analyses that adjusted for country characteristics, we observed a consistent negative interaction between the number of authors and the number of countries per author in a paper, in agreement with predictions stemming from this hypothesis (Fig. 4; see the illustrative sketch after this list).
- The misconduct-policy hypothesis was partially supported. Countries with national and legally enforceable policies against scientific misconduct were significantly less likely to produce image duplications (Figs. 1e, 4a). However, other misconduct-policy categories were not associated with a reduced risk of image duplication and tended, if anything, to be associated with a higher risk. As already noted above with regard to the effect of publication incentive policies, this study cannot prove a cause-effect relationship. The negative association between the risk of image duplication and the presence of national misconduct policies may be explained by hypothesizing that scientists in countries with such policies follow higher standards of research methodology and integrity.
- The gender hypothesis was not supported. In none of the main and secondary analyses did we observe the predicted higher risk for males. Indeed, contrary to predictions, in secondary analyses we observed a significant positive association between female gender and the risk of image duplication (Fig. 4b). Given the multiplicity of analyses conducted in this study, however, this latter finding may be spurious and should be corroborated by future confirmatory studies.
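As an illustration of the interaction test mentioned in the social-control item above, the following minimal sketch fits a multivariable logistic regression with an interaction between team size and international dispersion. The variable names, effect sizes and data are hypothetical placeholders, not our actual model or dataset.

```python
# Minimal sketch (simulated data, hypothetical variable names) of testing a
# negative interaction between the number of authors and the number of
# countries per author in a multivariable logistic regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "n_authors": rng.integers(1, 15, n),               # authors per paper
    "countries_per_author": rng.uniform(0.1, 1.0, n),  # proxy for international dispersion
})
# Simulated outcome: duplication risk rises with team size, but the increase
# is dampened in internationally dispersed teams (negative interaction).
logit_p = (-2.0 + 0.15 * df["n_authors"]
           - 0.10 * df["n_authors"] * df["countries_per_author"])
df["duplication"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# The term n_authors:countries_per_author carries the interaction of interest;
# country-level covariates would be added to the formula in a real analysis.
model = smf.logit("duplication ~ n_authors * countries_per_author", data=df).fit()
print(model.summary())
```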
A previous, analogous analysis, conducted on retracted and corrected papers and co-authored by two of us (DF and RC), had led to largely similar conclusions (Fanelli et al. 2015). The likelihood of papers being corrected for unintentional errors was not associated with most parameters, similar to what this study observed for category 1 duplications. The likelihood of papers being retracted, by contrast, was significantly associated with misconduct policies, academic culture, as well as early-career status and average impact score of the first or last author, similar to what was found here for categories 2 and 3 duplications.
Unlike what we observed for image duplications, however, the individual publication rate was found in that previous analysis to be negatively associated with the risk of retraction and positively associated with the risk of correction (Fanelli et al. 2015). We hypothesize that at least two factors may explain this difference in results. First, the previous study included retractions for every possible error and form of misconduct, including plagiarism, whereas the present analysis is dedicated to a specific form of error or manipulation. Second, analyses of retractions are intrinsically biased and subject to many confounding factors, because retractions are the end result of a complex chain of events (e.g. a reader signals a possible problem to the journal, the journal editor contacts the author, the author's institution starts an investigation, etc.), each of which can be subject to noise and bias. Results of this study, therefore, may be less generalizable to other forms of research misconduct, but are likely to be more accurate than results obtained on retractions.
Results of this study are also in remarkable agreement with those of a recent assessment of the causes of bias in science, co-authored by two of us (Fanelli et al. 2017). That study tested similar hypotheses using identical independent variables on a completely different outcome (i.e. the likelihood of over-estimating results in meta-analyses) and using a completely different study design. This convergence of results is striking and strongly suggests that our quantitative analyses are detecting genuine underlying patterns that reflect a connection between research integrity and characteristics of the authors, team and country.
The present study has avoided many of the confounding factors that limit studies of retractions, but could not avoid other limitations. An overall limitation is that the kind of image duplication analyzed in this study is only one amongst many possible forms of data falsification and fabrication. In principle, this restriction limits broad generalizations. However, as discussed above, our results are in large agreement with previous analyses that encompassed all forms of bias and/or misconduct (Fanelli et al. 2017; Fanelli et al. 2015), which suggests that our findings are consistent with general patterns linked to these phenomena.
Two other possible limitations of our study design make our results conservative. Firstly, we cannot ascertain which of the duplications were actually due to scientific misconduct and which to honest error, systematic error or negligence. Secondly, our individual-level analyses focused on characteristics of the first and the last author, under the assumption that these authors are most likely to be responsible for any flaws in a publication, but we do not know who, amongst the co-authors of each paper, was actually responsible. Both of these limitations ought to increase confidence in our results, because they are likely to have reduced the magnitude of measurable effect sizes. As our univariable analyses confirmed, image duplications that are most likely due not to scientific misconduct but to unintentional error are also less likely to be associated with any tested factor (Fig. 1). Similarly, to whatever extent duplicated images were not produced by first or last authors, the characteristics of first and last authors should not be associated with the incidence of image duplication. Therefore, to whatever extent our study was affected by either of these limitations, they will simply have introduced random noise into our data, reducing the magnitude of any measurable effect and thus making our results more conservative.
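The attenuating effect of such misclassification can be illustrated with a minimal simulation. The effect sizes, sample size and noise fraction below are hypothetical placeholders, not estimates from our data; the point is only that diluting the outcome with cases unrelated to the tested factor pulls the estimated odds ratio towards the null.

```python
# Minimal sketch: random misclassification of cases attenuates the estimated
# effect toward the null, which is why the limitations above are conservative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20000
x = rng.binomial(1, 0.5, n)                      # e.g. a binary author characteristic
p = 1 / (1 + np.exp(-(-2.0 + 0.8 * x)))          # true odds ratio exp(0.8) ~ 2.2
y_true = rng.binomial(1, p)

def fitted_or(y):
    """Odds ratio for x from a simple logistic regression."""
    res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    return np.exp(res.params[1])

# Replace a fraction of labels with 'cases' unrelated to x, mimicking
# duplications caused by honest error rather than by the tested factor.
noise = rng.binomial(1, 0.5, n)
mix = rng.random(n) < 0.4                        # 40% of labels carry no signal
y_noisy = np.where(mix, noise, y_true)

print("odds ratio, clean labels:", round(fitted_or(y_true), 2))
print("odds ratio, noisy labels:", round(fitted_or(y_noisy), 2))  # shrinks toward 1
```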
Based on conventions adopted in biomedical research, first authors are usually the team members most responsible for producing the data. This would suggest that first authors are also more likely than last authors to be responsible for data duplications, and might therefore exhibit stronger effects in the predicted direction. Across our analyses, we found very little difference between effects estimated on first and last authors. However, country was likely to be a significant confounding factor. In 93.8% of papers in our sample, first and last author were from the same country, which suggests that the characteristics of first and last authors were more similar within papers than between papers. Intriguingly, in secondary multivariable analyses restricted to a relatively homogeneous subgroup of industrialized countries (USA, Canada, EU15, Australia and New Zealand), all effects for first authors reached statistical significance, with magnitudes significantly above those of last authors (Fig. 4b). This again suggests that, when significant confounding factors are taken into account, individual characteristics might indeed successfully predict the risk of scientific misconduct, and that first authors are the category most at risk. These results cannot be considered conclusive, owing to their exploratory nature. Future studies should be designed to confirm these observations, which might be of exceptional importance for understanding and predicting research misconduct.
The possible presence of random error and noise in our data might have reduced the statistical power of our analyses, for the same reasons discussed above. However, our statistical power was relatively high. Even when restricted to the smallest subset (e.g. category 3 duplications), our analyses had over 89% power to detect an effect of small magnitude. We can therefore conclude that, despite the limitations discussed above, all of our tests had sufficient power to detect small effects and high power to detect medium and large effects.
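For illustration, a power check of this kind can be approximated by simulation, as in the sketch below. The sample size, effect size and significance threshold are hypothetical placeholders, not the inputs of our actual power calculation.

```python
# Minimal sketch of a simulation-based power estimate for a logistic regression:
# the share of simulated analyses in which a small true effect is detected.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

def power_sim(n=1000, beta=0.3, n_sims=500, alpha=0.05):
    """Share of simulated logistic regressions whose x-coefficient is significant."""
    hits = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        p = 1 / (1 + np.exp(-(-1.5 + beta * x)))
        y = rng.binomial(1, p)
        res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        if res.pvalues[1] < alpha:
            hits += 1
    return hits / n_sims

# 'Small' effect (odds ratio exp(0.3) ~ 1.35) at an illustrative sample size.
print("estimated power:", power_sim())
```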
A further possible limitation of our analysis pertains to the accuracy with which we could measure individual-level parameters. Our ability to correctly classify the gender and to reconstruct the publication profile of each author was subject to standard disambiguation errors (Caron and Van Eck 2014), which may be higher for authors in certain subsets of countries. In particular, authors from South and East Asian countries have names that are difficult to classify, and often publish in local journals that are not indexed in the Web of Science and were therefore not captured by our algorithms. Any systematic bias or error in quantifying parameters for authors from these countries could significantly skew our results, because country-level factors were found in this study, as well as in previous studies on retractions, to have significant effects (Fanelli et al. 2015). However, we based all of our main conclusions on effects that were measured consistently in subsets of countries at lower risk of disambiguation error. Moreover, this limitation is only likely to affect the subset of tests that focused on author characteristics.
Indeed, one of the most important findings of this study is that significant individual-level effects might not be detectable unless country-level effects are removed or adjusted for, because the latter effects are prominent. We found strong evidence that problematic image duplications are overwhelmingly more likely to come from China, India and possibly other developing countries, consistent with the findings of a previous descriptive analysis of this data (Bik et al. 2016). Regardless of whether the image duplications that we have examined in this study were due to misconduct or unintentional error, our results suggest that particular efforts ought to be devoted to improving the quality and integrity of research in these countries.
In conclusion, we stress again that the present analysis of image duplications, like previous analyses of retractions, corrections and bias (Fanelli et al. 2015, 2017), cannot demonstrate causality. However, all these analyses consistently suggest that developing national misconduct policies and fostering an academic culture of mutual criticism and research integrity might be effective preventive measures to ensure the integrity of future research in all countries.