Verification of nucleotide sequence reagent identities in original publications in high impact factor cancer research journals

Pathmendra, Pranujan; Park, Yasunori; Enguita, Francisco J.; Byrne, Jennifer A.

doi:10.1007/s00210-023-02846-2

Research
Open access
Published: 09 January 2024

Volume 397, pages 5049–5066, (2024)

Naunyn-Schmiedeberg's Archives of Pharmacology

Pranujan Pathmendra¹, Yasunori Park¹, Francisco J. Enguita² & Jennifer A. Byrne¹,³

A Publisher Correction to this article was published on 16 January 2024

This article has been updated

Abstract

Human gene research studies that describe wrongly identified nucleotide sequence reagents have been mostly identified in journals of low to moderate impact factor, where unreliable findings could be considered to have limited influence on future research. This study examined whether papers describing wrongly identified nucleotide sequences are also published in high-impact-factor cancer research journals. We manually verified nucleotide sequence identities in original Molecular Cancer articles published in 2014, 2016, 2018, and 2020, including nucleotide sequence reagents that were claimed to target circRNAs. Using keywords identified in some 2018 and 2020 Molecular Cancer papers, we also verified nucleotide sequence identities in 2020 Oncogene papers that studied miRNA(s) and/or circRNA(s). Overall, 3.8% (251/6647) and 4.0% (47/1165) of the nucleotide sequences that were verified in Molecular Cancer and Oncogene papers, respectively, were found to be wrongly identified. Wrongly identified nucleotide sequences were distributed across 18% (91/500) of original Molecular Cancer papers, including 38% (31/82) of Molecular Cancer papers from 2020, and 40% (21/52) of selected Oncogene papers from 2020. Original papers with wrongly identified nucleotide sequences were therefore unexpectedly frequent in two high-impact-factor cancer research journals, highlighting the risks of employing journal impact factors or citations as proxies for research quality.

Introduction

Despite technological advances, growing research workforce capacity, and billion-dollar budgets devoted to biomedical research in first-world countries, biomedical research translation continues to fall short of the expectations generated by research investments (Bowen and Casadevall 2015). Inefficient research translation is fueled by the reproducibility crisis, where many pre-clinical research results cannot be independently reproduced (Mobley et al. 2013; Pusztai et al. 2013; Errington et al. 2021). The emphasis upon publication of positive findings has likely led to publication of false-positive results (Pusztai et al. 2013; Smaldino and McElreath 2016; Kaelin 2017). Where these results are not reproduced by other studies, these contradictory or discordant results may be less likely to be reported, leading to a growing problem of falsely positive research results in the biomedical literature (Smaldino and McElreath 2016; Kaelin 2017).

While most incorrect pre-clinical research is believed to derive from genuine research (Brown et al. 2018), some irreproducible research results may reflect data falsification and fabrication (Stroebe et al. 2012; Gopalakrishna et al. 2022). Over the past several years, the analysis of research fraud has shifted from focusing on research fraud perpetrated by individuals, to include research fraud that may be enabled by organizations known as paper mills (Byrne 2019; Byrne and Christopher 2020; COPE, STM 2022; Christopher 2021; Heck et al. 2021; Parker et al. 2022; Bricker-Anthony and Giangrande 2022; Frederickson and Herzog 2022). There is growing evidence suggesting that human genes could be targeted by paper mills for the production of preclinical research manuscripts (Byrne and Labbé 2017; Qi et al. 2017; Han and Li 2018; Byrne et al. 2019, 2021b, 2022; Labbé et al. 2019; Clark and Buckmaster 2021; Cooper and Han 2021; Seifert 2021; Park et al. 2022; Pérez-Neri et al. 2022; Wittau et al. 2023). The rapid production of many gene research manuscripts at minimal cost could provide limited time for quality control, which could result in errors such as wrongly identified nucleotide sequence reagents (Byrne and Labbé 2017; Byrne et al. 2019).

Wrongly identified RT-PCR primers and gene knockdown reagents could arise in different research contexts, as the identities of these reagents typically cannot be judged by eye (Byrne et al. 2019, 2021b) (Table 1). As the disclosure of short nucleotide sequences also enables their reuse in future studies, the semi-automated tool Seek & Blastn was created to verify the identities of published nucleotide sequence reagents that are claimed to target human genes and transcripts (Labbé et al. 2019). The application of Seek & Blastn has demonstrated the widespread occurrence of wrongly identified nucleotide sequence reagents in repetitive human gene research papers (Labbé et al. 2019; Byrne et al. 2021b; Park et al. 2022). Our most recent application of Seek & Blastn screened over 11,700 original human research papers and identified 712 papers that described wrongly identified nucleotide sequence(s), including papers that studied gene functions in the context of chemosensitivity or -resistance (Park et al. 2022). Seek & Blastn screening of original papers in the journals Gene and Oncology Reports revealed that yearly proportions of original papers with wrongly identified sequence(s) ranged from 0.5 to 4.2% and 8.3 to 12.6%, respectively (Park et al. 2022).

Table 1 Potential causes of wrongly identified nucleotide sequence reagents, possible predisposing factors, and how errors can be detected

Most human gene research papers with wrongly identified nucleotide sequences have been identified in journals of low to moderate impact factor (IF) (Byrne and Labbé 2017; Labbé et al. 2019; Byrne et al. 2021b; Park et al. 2022). This finding is likely to at least partly reflect the skewed distribution of journal IFs (Romanovsky 2019; Siler and Larivière 2022), where high IF cancer research journals defined by an IF ≥ 7.0 (Kempf et al. 2018) correspond to ~ 20% of cancer research journals. While recognizing the limited utility of journal IF as a measure of research quality (Siler and Larivière 2022), the perceived significance of human gene research papers with wrongly identified sequences could be discounted through their publication in lower IF journals. Our team has also described examples of human gene research papers with wrongly identified nucleotide sequences that were published in high IF journals (Labbé et al. 2019; Park et al. 2022). It is currently unclear whether low numbers of human gene research papers with wrongly identified nucleotide sequences in high IF journals simply reflect low numbers of high IF journals (Romanovsky 2019; Siler and Larivière 2022), and/or that few papers with wrongly identified nucleotide sequences have been published by high IF journals.

We have therefore undertaken a literature screening approach to examine the frequency of human gene research papers with wrongly identified nucleotide sequence reagents in two high IF cancer research journals, as judged by 2019 journal IF (https://clarivate.com/). We chose to examine Molecular Cancer, an online, open-access journal published by BMC (Springer Nature), as Seek & Blastn screening of keyword-driven literature corpora had previously identified Molecular Cancer papers with wrongly identified nucleotide sequences that were published in 2014 (Park et al. 2022). Although Molecular Cancer was not a high IF journal in 2014 (IF = 4.3), Molecular Cancer has experienced a marked rise in journal IF, reaching IFs of 15.3 in 2019, 27.4 in 2020, and 41.4 in 2021 (Fig. 1). As a result, Molecular Cancer was the 3rd-ranked molecular biology and biochemistry journal in 2020 and 2021, following only Nature Medicine and Cell. We also verified nucleotide sequence reagent identities in a selected corpus of 2020 Oncogene papers. Oncogene is published by Springer Nature under a hybrid open-access/subscription publication model. Unlike Molecular Cancer, Oncogene has shown a relatively stable journal IF ranging from 6.6 to 9.9 during 2014–2021 (Fig. 1).

As most Molecular Cancer papers described nucleotide sequence reagents in supplementary files and not in the publication text, these papers proved to be unsuitable for Seek & Blastn screening (Labbé et al. 2019). We therefore manually verified the identities of all nucleotide sequence reagents that were claimed to target unmodified (wild-type) human gene targets in original Molecular Cancer papers published in 2014, 2016, 2018, and 2020. These publication years were chosen so that proportions of Molecular Cancer papers could be compared with those previously identified in Gene and Oncology Reports in 2014, 2016, and 2018 (Park et al. 2022). As some Molecular Cancer papers described nucleotide sequence reagents that were claimed to target human circular RNA (circRNA) transcripts, we developed protocols to verify the identities of circRNA targeting reagents. Using keywords identified in some Molecular Cancer papers (miRNA, miR, circular RNA, or circRNA), we undertook keyword-driven searches of all original 2020 Oncogene papers. We manually verified the identities of all nucleotide sequence reagents that were claimed to target unmodified human gene targets in all 2020 Oncogene papers that referred to microRNAs and/or circRNAs.

As we will describe, these analyses identified unexpectedly high proportions of human gene research papers with wrongly identified nucleotide sequences in two high IF cancer research journals. Our results therefore indicate that human gene research publications that describe wrongly identified nucleotide sequences may be unexpectedly frequent in some high IF cancer research journals.

Methods

Identification of literature corpora

Molecular Cancer papers were retrieved via the Web of Science using the search criteria: PY = “2014, 2016, 2018, 2020,” SO = “MOLECULAR CANCER,” AND DT = “Article.” Article titles were used as search queries on the Molecular Cancer website to obtain pdfs and supplementary files. Based on features of some Molecular Cancer papers with wrongly identified nucleotide sequence(s), selected Oncogene papers were retrieved via the Web of Science using the search criteria: PY = “2020,” SO = “ONCOGENE,” DT = “Article,” and keywords = [(“Circular RNA*.mp.” OR “circRNA*.mp.”) OR (“microRNA*.mp.” OR “miR*.mp.”)]. Oncogene article titles were used as search queries to obtain article pdfs and supplementary files through the University of Sydney library.

Visual inspection of articles

Each article was subjected to visual screening and considered eligible for analysis if the study described the sequence of at least one nucleotide sequence reagent that was claimed to target an unmodified (wild-type) human transcript or genomic region. Publications including supplementary files were visually inspected to determine the claimed genetic and/or experimental identity of each nucleotide sequence. If the claimed target or experimental use of any sequence was not evident, or if a sequence was claimed to target a species other than human, the sequence was excluded from further analysis. We included papers with post-publication notices such as retractions and published corrections, except where post-publication corrections had corrected all wrongly identified nucleotide sequences at the time of publication screening. Eligible papers were identified by their PMIDs. Nucleotide sequences and their claimed identities were manually extracted from text and/or supplementary files using copy/paste functions, or transcribed from figures, and recorded in Microsoft Excel.

Manual verification of nucleotide sequence reagent identities

Nucleotide sequence reagents that were claimed to target human protein-coding genes and microRNAs were analyzed as described (Byrne et al. 2021a; Park et al. 2022). GeneCards (Stelzer et al. 2016) and GenBank (Sayers et al. 2019) were used to clarify synonymous human gene identifiers. For nucleotide sequence reagents that were claimed to target long non-coding RNAs (lncRNAs), the claimed identifier was searched on lncBASE (Karagkouni et al. 2020) and GeneCards (Stelzer et al. 2016) to identify the genomic coordinates of the claimed lncRNA. Claimed targeting reagent sequences were queried using BLAT against the GRCh38/hg38 assembly (Lee et al. 2022) and Blastn (Altschul et al. 1990) as described (Park et al. 2022).

Nucleotide sequence reagents that were claimed to target genomic sequences including gene promoters were queried using BLAT against the GRCh38/hg38 assembly (Lee et al. 2022) as described (Park et al. 2022). Claimed gene promoter targeting reagents were accepted as targeting if these reagents mapped within 100-kb upstream of the claimed target gene and if reagents did not include coding gene exons. Where the claimed reagent identity did not match the verified identity, sequences were queried using BLAT against earlier human genome assemblies (Lee et al. 2022).
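The promoter acceptance rule described above can be expressed as a simple interval check. The following is a minimal sketch and not the authors' procedure, which relied on manual inspection of BLAT output. The `Interval` type, the function name, and the use of a single transcription start site (TSS) coordinate as the reference point for "upstream" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence

PROMOTER_WINDOW_BP = 100_000  # "within 100-kb upstream" threshold stated in the text


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based inclusive genomic coordinate
    end: int    # 1-based inclusive genomic coordinate


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share any base."""
    return a.start <= b.end and b.start <= a.end


def is_promoter_targeting(hit: Interval, tss: int, strand: str,
                          coding_exons: Sequence[Interval]) -> bool:
    """Accept a BLAT hit as promoter-targeting (assumed interpretation).

    Assumes the hit lies on the same chromosome as the claimed gene and
    that 'upstream' is measured from one TSS coordinate (hypothetical choice).
    """
    # Reject reagents that include coding exon sequence.
    if any(overlaps(hit, exon) for exon in coding_exons):
        return False
    if strand == "+":
        # Upstream of a plus-strand gene means smaller coordinates.
        return tss - PROMOTER_WINDOW_BP <= hit.start and hit.end < tss
    if strand == "-":
        # Upstream of a minus-strand gene means larger coordinates.
        return tss < hit.start and hit.end <= tss + PROMOTER_WINDOW_BP
    raise ValueError("strand must be '+' or '-'")
```

Treating a hit that overlaps the TSS itself as non-promoter is a simplification made here; a real workflow might reasonably accept such hits.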

Manual verification of claimed circular RNA targeting reagents

Verification of RT-PCR primers claimed to target circRNAs

circRNAs are alternatively spliced transcripts where gene exons are joined through back-splicing to create circular transcripts (Dudekula et al. 2016; Zhong et al. 2018; Nielsen et al. 2022). RT-PCR amplification of circRNAs requires two sets of RT-PCR primers (Dudekula et al. 2016; Zhong et al. 2018, 2019; Nielsen et al. 2022). Divergent RT-PCR primers are used to amplify the claimed circRNA by facing towards and amplifying across the circRNA back-splice junction (BSJ) (Dudekula et al. 2016; Zhong et al. 2018, 2019; Nielsen et al. 2022). Divergent RT-PCR primers should therefore not amplify linear transcripts from the host or any other human gene. In contrast, convergent RT-PCR primers are employed to amplify linear transcripts, typically from the claimed host gene (Dudekula et al. 2016; Zhong et al. 2018, 2019; Nielsen et al. 2022).

For claimed divergent RT-PCR primers, forward and reverse primers were first queried on circPRIMER (Zhong et al. 2018) using standard settings (Fig. 2). RT-PCR primers were accepted as correctly targeting if circPRIMER aligned both RT-PCR primer sequences to the claimed circRNA(s), such that RT-PCR primers faced towards and were predicted to amplify the back splice junction (BSJ) (Fig. 2). If circPRIMER analyses produced no output, we then checked whether the claimed circRNA was indexed by a publicly available circRNA database such as circBASE (Glažar et al. 2014) or circATLAS (Wu et al. 2020) through the disclosure of a specific circRNA identifier, or if the circRNA sequence and/or its genomic sequence coordinates were disclosed by the authors. If the claimed circRNA could not be identified, the claimed divergent RT-PCR primers were classified as non-verifiable (Patop and Kadener 2018). If the claimed circRNA could be identified but the BSJ could not be identified or predicted, claimed divergent RT-PCR reagents were also classified as non-verifiable.

If the claimed BSJ sequence was either disclosed or the associated genomic coordinates could be predicted, divergent RT-PCR primers were then queried either using the BLAT function of circBASE (Glažar et al. 2014), manually mapped to the claimed circRNA sequence, and/or queried using BLAT against the GRCh38/hg38 genomic assembly (Lee et al. 2022). Claimed divergent RT-PCR primers were classified as wrongly identified if they did not amplify the (predicted) BSJ (Fig. 2). Wrongly identified RT-PCR primers were subjected to further analyses to classify these reagents according to nucleotide sequence error categories (see below), as described (Park et al. 2022). Claimed convergent RT-PCR primers were verified as previously described for RT-PCR primers targeting linear transcripts (Labbé et al. 2019; Byrne et al. 2021a, 2021b; Park et al. 2022).
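The distinction between divergent and convergent primers can be illustrated with a toy exact-match model. This sketch is not the circPRIMER algorithm or the manual workflow used in the study. It assumes that the circRNA is supplied as a linear sense sequence whose last base is joined to its first base at the BSJ, that primers match perfectly, and that each primer has a single relevant binding site. The function names and return labels are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_primer_pair(circ_seq: str, forward: str, reverse: str) -> str:
    """Classify a primer pair against a circular template (toy model).

    Returns 'divergent' if the predicted amplicon crosses the BSJ,
    'convergent' if it lies within the linear sequence, and
    'no_amplicon' if no product is predicted.
    """
    n = len(circ_seq)
    # Repeating the sequence lets primer sites and amplicons run across
    # the junction that lies between index n - 1 and index n.
    template = circ_seq * 3
    fwd_start = template.find(forward)
    if fwd_start == -1 or fwd_start >= n:
        return "no_amplicon"
    # The reverse primer anneals to the sense strand at its reverse complement.
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start)
    if rev_start == -1:
        return "no_amplicon"
    amplicon_end = rev_start + len(rev_site)
    if amplicon_end - fwd_start > n:
        # Product would be longer than one full turn of the circle.
        return "no_amplicon"
    return "divergent" if amplicon_end > n else "convergent"
```

Under this model, a pair labelled `convergent` would also be expected to amplify the corresponding linear sequence, which is why claimed divergent primers of that kind would not discriminate circular from linear transcripts.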

Verification of single-nucleotide sequence reagents claimed to target circRNAs

Single reagents such as si/shRNAs and other oligonucleotides acquire circRNA specificity by targeting specific BSJ sequences (Dudekula et al. 2016; Nielsen et al. 2022). We first determined whether the claimed circRNA was indexed in a publicly available circRNA database, as described above, and whether the BSJ sequence could be identified (Fig. 3). If the claimed circRNA or the BSJ sequence could not be identified, reagents were classified as non-verifiable (Fig. 3).

Verifiable single reagents were manually aligned against the claimed circRNA BSJ sequence (Fig. 3). Single reagents were classified as correctly targeting if they showed 100% identity to 5–16 nucleotides on each side of the BSJ (Dudekula et al. 2016). If a claimed circRNA targeting reagent showed 100% identity to 17 or more consecutive nucleotides of any human linear transcript, including transcripts from the claimed host gene, the reagent was classified as wrongly identified, as such reagents would not be predicted to discriminate between circular and linear transcripts.
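The two thresholds in this paragraph (5 to 16 matching nucleotides on each side of the BSJ, and 17 or more consecutive nucleotides matching a linear transcript) can be sketched as follows. This is an illustrative approximation of a manual alignment step, assuming that the reagent is supplied as its sense-strand target sequence and that the sequences flanking the BSJ are known. All function names and return labels are hypothetical, and how a reagent that fails both checks should be categorized is left open.

```python
from typing import Iterable

MIN_FLANK = 5           # minimum identical nucleotides on each side of the BSJ
MAX_FLANK = 16          # maximum identical nucleotides on each side of the BSJ
LINEAR_MATCH_LIMIT = 17  # consecutive identity to a linear transcript


def spans_bsj(reagent: str, upstream: str, downstream: str) -> bool:
    """True if the reagent equals k upstream + m downstream bases, 5 <= k, m <= 16.

    `upstream` ends at the BSJ and `downstream` starts at the BSJ.
    """
    for k in range(MIN_FLANK, MAX_FLANK + 1):
        m = len(reagent) - k
        if not MIN_FLANK <= m <= MAX_FLANK:
            continue
        if k > len(upstream) or m > len(downstream):
            continue
        if reagent == upstream[-k:] + downstream[:m]:
            return True
    return False


def longest_shared_run(reagent: str, transcript: str) -> int:
    """Length of the longest stretch of the reagent found exactly in the transcript."""
    best = 0
    for i in range(len(reagent)):
        # Only try lengths that would beat the current best.
        for j in range(i + best + 1, len(reagent) + 1):
            if reagent[i:j] in transcript:
                best = j - i
            else:
                break
    return best


def classify_single_reagent(reagent: str, upstream: str, downstream: str,
                            linear_transcripts: Iterable[str]) -> str:
    """Apply the linear-identity check first, then the BSJ check."""
    if any(longest_shared_run(reagent, t) >= LINEAR_MATCH_LIMIT
           for t in linear_transcripts):
        return "wrongly_identified"
    if spans_bsj(reagent, upstream, downstream):
        return "correctly_targeting"
    return "does_not_span_bsj"
```

Checking linear identity before the BSJ match reflects the stated concern that a reagent matching 17 or more consecutive linear nucleotides would not be predicted to discriminate between circular and linear transcripts.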

Classification of wrongly identified reagents according to error categories

Wrongly identified nucleotide sequence reagents were classified according to previously described error categories, namely (i) claimed targeting reagents that were predicted to target another human gene or genomic sequence, (ii) claimed targeting reagents that were predicted to be non-targeting in human, and (iii) claimed non-targeting reagents that were predicted to target a human gene or transcript (Labbé et al. 2019; Byrne et al. 2021b; Park et al. 2022). Claimed circRNA targeting reagents (divergent RT-PCR primers, si/shRNAs, molecular probes) that were predicted to (also) target linear transcripts (including from the claimed host gene) were classified as targeting a different gene/transcript from that claimed (category (i) above).
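The three error categories can be summarized as a small decision function. This is a sketch under simplifying assumptions and not the authors' implementation: each reagent is reduced to its claimed target (or `None` for a claimed non-targeting control) and the set of human targets predicted by sequence searches. The names `Verdict`, `categorize`, and `circ_reagent_hits_linear` are hypothetical.

```python
from enum import Enum
from typing import Optional, Set


class Verdict(Enum):
    CORRECT = "correctly identified"
    OTHER_TARGET = "(i) targets another human gene or genomic sequence"
    NON_TARGETING = "(ii) non-targeting in human"
    UNEXPECTED_TARGET = "(iii) claimed non-targeting but targets a human gene"


def categorize(claimed_target: Optional[str],
               predicted_targets: Set[str],
               circ_reagent_hits_linear: bool = False) -> Verdict:
    """Map a verification result onto the error categories listed above."""
    if claimed_target is None:
        # Claimed non-targeting control reagent.
        return Verdict.UNEXPECTED_TARGET if predicted_targets else Verdict.CORRECT
    if circ_reagent_hits_linear:
        # circRNA reagents predicted to (also) target linear transcripts
        # fall under category (i).
        return Verdict.OTHER_TARGET
    if not predicted_targets:
        return Verdict.NON_TARGETING
    if claimed_target in predicted_targets:
        return Verdict.CORRECT
    return Verdict.OTHER_TARGET
```

For example, `categorize("TP53", {"GAPDH"})` returns `Verdict.OTHER_TARGET`; the gene symbols here are used purely for illustration.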

Summary of how nucleotide sequence reagent identities were manually verified

This study was conducted in the context of a student project (by PP), and hence all nucleotide sequence identities were verified by PP as described above. YP supported nucleotide sequence reagent identity verification in the early project stage, to ensure methodological consistency (Park et al. 2022). PP and JAB met regularly to discuss identity verification results for individual nucleotide sequences. JAB visually inspected the summary results for all nucleotide sequences that were predicted to be wrongly identified and recommended individual results for rechecking by PP and/or JAB. PP and JAB consulted with FJE for advice on targeting parameters and workflows for claimed divergent RT-PCR primers and single-nucleotide sequence reagents that were claimed to specifically target circRNAs (Figs. 2 and 3). JAB manually verified alignments between single circRNA reagents and claimed BSJ sequences for all single circRNA reagents that were predicted to not target the claimed BSJ. PP then rechecked the identities of all wrongly identified nucleotide sequences prior to reporting.

Additional publication analyses

For each eligible article, we recorded the number and proportion of wrongly identified nucleotide sequence reagents. We also recorded the numbers and identities of non-verifiable circRNA reagents, noting that we did not categorize non-verifiable reagents as wrongly identified. Publications were flagged if they included at least one wrongly identified nucleotide sequence reagent. Papers that described non-verifiable circRNA targeting reagent(s) but no wrongly identified nucleotide sequences were reported separately. Three proportions were calculated for journals and publication years using MS Excel: papers with wrongly identified sequence(s)/papers analyzed, papers with wrongly identified sequence(s)/papers screened, and wrongly identified nucleotide sequences/nucleotide sequences analyzed.

Publication titles were visually inspected to identify human gene or transcript identifiers, human cancer types, and drug identifiers which were confirmed through Google searches. Human genes were categorized as either protein-coding or ncRNAs according to GeneCards (Stelzer et al. 2016). The country of origin and institutional affiliation were identified as described (Park et al. 2022). Where there was no numeric majority, the first author’s affiliation was used to decide the country of origin and/or institutional affiliation. PubPeer notifications (Barbour and Stell 2020) were identified on 16 January 2023. Reported numbers of post-publication notices are those identified through PubMed and Google Scholar searches conducted on 17 January 2023. Citations according to Google Scholar were collected on 22 January 2023.

Statistical analyses

Fisher’s exact tests conducted on GraphPad PRISM compared proportions of Molecular Cancer papers according to publication year, and countries and institutions of origin. The Shapiro-Wilk test was used to test for normality. The Mann-Whitney test was conducted to compare median numbers of wrongly identified sequences per Molecular Cancer article according to publication year, where reported p values have not been corrected for multiple comparisons. For all Molecular Cancer papers with wrongly identified nucleotide sequence(s), Spearman’s rank correlation coefficient was calculated between the numbers of wrongly identified sequences and numbers of analyzed nucleotide sequences per article. Graphs were produced on GraphPad PRISM 9.2.

Results

Molecular Cancer corpus

In total, 500 original Molecular Cancer papers were published in 2014, 2016, 2018, and 2020 (Table 2), where numbers of original papers ranged from 59 papers in 2016, to 249 papers in 2014 (Fig. 4A). Most (334/500, 67%) original Molecular Cancer papers were included for analysis as they described human research and included at least one nucleotide sequence that was claimed to target a non-modified human gene or genomic sequence (Fig. 4A, Table 2). The proportions of Molecular Cancer papers that met the study inclusion criteria ranged from 29/59 (49%) in 2016 to 74/82 (90%) in 2020 (Fig. 4A).

Table 2 Molecular Cancer and Oncogene corpora that were screened for wrongly identified nucleotide sequence reagents

The 334 Molecular Cancer papers included 6647 nucleotide sequences, with a median of 13 nucleotide sequences/paper (range 1–153) (Table 2). The numbers of nucleotide sequence reagents per paper progressively increased from 2014 to 2020 (Fig. 4B). For example, the median number of nucleotide sequences per paper increased from 8 sequences/paper in 2014, to 32 sequences/paper in 2020 (Mann-Whitney test, p < 0.0001, n = 231) (Fig. 4B).

Whereas no 2014 or 2016 Molecular Cancer papers described nucleotide sequences that were claimed to target human circular RNAs (circRNAs), 39 Molecular Cancer papers in 2018 and 2020 described circRNA targeting reagents. As we had not previously verified the identities of circRNA targeting reagents, new protocols were developed to recognize the particular targeting requirements of some circRNA reagents (Figs. 2 and 3, see the “Methods” section).

Molecular Cancer papers with wrongly identified nucleotide sequence(s)

Of the 6647 nucleotide sequences whose identities were manually verified, 251 (3.8%) nucleotide sequences were predicted to be wrongly identified (Table 2, Fig. 5A, Table S1). Similar proportions of incorrect sequences represented targeting reagents that were either verified to target a different human gene or genomic sequence (135/251, 54%), or predicted to be non-targeting in human (114/251, 45%) (Table 2, Fig. 5B). In contrast, very few (2/251, 0.8%) wrongly identified sequences represented claimed non-targeting si/shRNA reagents that were instead predicted to target a human gene (Table 2, Fig. 5B).

The 251 wrongly identified nucleotide sequences were distributed across 91/334 (27%) screened Molecular Cancer papers (Fig. 5C) and 91/500 (18%) original Molecular Cancer papers (Table 2, Fig. 5D, Table S2). These 91 papers included 3 Molecular Cancer papers from 2014 that had been previously reported to describe wrongly identified nucleotide sequence(s) (Labbé et al. 2019; Park et al. 2022). Proportions of papers with wrongly identified nucleotide sequence(s) ranged from 6/59 (10%) in 2016 to 31/82 (38%) in 2020 (Fig. 5D). The median number of wrongly identified sequences/paper was 2 (range 1–14) (Table 2, Fig. 6). The numbers of wrongly identified and analyzed sequences per paper were not significantly correlated (Spearman’s rho = 0.1893, 95% CI = −0.02346 to 0.3857, p = 0.0723, n = 91).
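Because the same numerator (91 papers) is reported against two different denominators, it may help to see the arithmetic written out. The snippet below simply recomputes the Molecular Cancer proportions from the counts stated above; the helper name `percent` is a hypothetical convenience, and the study itself performed these calculations in MS Excel.

```python
def percent(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts reported for Molecular Cancer in the text above.
wrong_sequences, verified_sequences = 251, 6647
flagged_papers, screened_papers, original_papers = 91, 334, 500

sequence_level = percent(wrong_sequences, verified_sequences)  # per verified sequence
per_screened = percent(flagged_papers, screened_papers)        # per eligible, screened paper
per_original = percent(flagged_papers, original_papers)        # per original paper published

print(sequence_level, per_screened, per_original)
```

The proportion per screened paper is necessarily the larger of the two paper-level figures, since papers without eligible nucleotide sequences are excluded from its denominator.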

The 91 Molecular Cancer papers with wrongly identified sequence(s) described experiments in human cancer models corresponding to 26 cancer types, most frequently gastric, colorectal, or non-small-cell lung cancer (Table S2). Almost all (84/91, 92%) papers analyzed a single cancer type. One quarter (23/91) of papers with wrongly identified sequence(s) either referred to a specific drug or to chemosensitivity or -resistance in their title (Table S2).

Molecular Cancer papers with wrongly identified sequence(s) described a median of 2 genes or transcripts in their titles (range 0–7) (Table S2). Most publication titles (78/91, 86%) mentioned at least one protein-coding gene, and approximately half (48/91, 53%) mentioned non-coding RNA(s) (ncRNAs), which were typically miR(s) (31/48, 65%) or circRNA(s) (15/48, 31%). Whereas most 2014 titles mentioned only protein-coding gene(s) (22/31, 71%), most 2020 titles combined protein-coding gene(s) and ncRNA(s) (22/31, 71%), which were again typically miR(s) (12/22, 55%). Fifteen papers with wrongly identified sequence(s) that referred to circRNA(s) in their titles were published in 2018 and 2020, where titles typically combined circRNA(s) with protein-coding gene(s) and/or miR(s) (13/15, 87%) (Table S2).

Wrongly identified or non-verifiable reagents for the analysis of human circRNAs

Nine Molecular Cancer papers described 20 wrongly identified reagents that were claimed to target circRNAs (Table 3, Table S1). These claimed circRNA targeting reagents were predicted to either target different human transcripts from those claimed (17/20, 85%) or to be non-targeting in human (3/20, 15%) (Table 3). Wrongly identified circRNA targeting sequences included claimed divergent RT-PCR primers that were predicted to amplify linear transcripts, and single reagents that showed significant identity to linear transcripts (see the “Methods” section, Table 3, Table S1). The identities of a further 29 circRNA targeting reagents could not be verified (Table 3), either because the claimed circRNA sequence could not be identified in external databases, or in the case of single reagents, because the BSJ sequence was not provided or identifiable elsewhere (see Methods, Tables S3-S5). Non-verifiable circRNA targeting reagents were identified in 3 Molecular Cancer papers that described wrongly identified nucleotide sequence(s) (Tables S3, S5). An additional 6 Molecular Cancer papers included non-verifiable circRNA targeting reagents, where all other nucleotide sequences appeared to be correctly identified (Tables S4, S5).

Table 3 Wrongly identified and non-verifiable nucleotide sequence reagents that were claimed to target human circRNAs in Molecular Cancer and Oncogene papers

Targeted Oncogene corpus

To investigate whether original papers with wrongly identified or non-verifiable nucleotide sequences can be identified in other high IF cancer research journals, we verified nucleotide sequence reagent identities in a subset of original Oncogene papers. As described in the Methods, we employed keyword-driven searches of Oncogene papers published in 2020, using keywords identified in some Molecular Cancer papers (miRNA, miR, circular RNA, or circRNA). This search strategy identified a corpus of 52 Oncogene papers that commonly described the analysis of one or more miRs and/or circRNAs (Table 2). Most (42/52, 81%) selected Oncogene papers described human research and at least one nucleotide sequence that was claimed to target a non-modified human gene or genomic sequence. These 42 papers described a median number of 20 sequences/paper (range 2–115) (Table 2).

Oncogene papers with wrongly identified nucleotide sequence(s)

The 42 Oncogene papers included 1165 nucleotide sequences, of which 47 (4.0%) sequences were predicted to be wrongly identified (Table 2, Table S1). These 47 wrongly identified sequences were distributed across 21/52 (40%) corpus papers and 21/42 (50%) screened papers (Table S2). These 21 Oncogene papers described a median of 2 wrongly identified sequences/paper (range 1–5) (Table 2). Oncogene papers with wrongly identified sequence(s) described experiments in human cancer models that corresponded to 14 different cancer types, most frequently breast cancer and hepatocellular carcinoma (Table S2) and referred to a median of 3 genes or transcripts in their titles (range 0–4), where most titles referred to miR(s) (13/21, 62%) (Table S2). Two Oncogene papers referred to chemical compounds in their titles (Table S2).

Wrongly identified sequences in 2020 Oncogene papers represented targeting reagents that were verified to target a different human gene or genomic sequence from that claimed (24/47, 51%), or claimed targeting reagents that were predicted to be non-targeting in human (23/47, 49%) (Table 2). Six wrongly identified sequences were claimed to target human circRNAs, which were either predicted to be non-targeting in human or to target linear transcript(s) from the claimed host gene (Table 3). A further 8 circRNA targeting sequences were not verifiable, either because the relevant BSJ sequence was not provided or because the claimed circRNA sequence could not be identified (Table 3, Tables S3, S5).

Countries of origin and institutional affiliations of Molecular Cancer and Oncogene papers with wrongly identified nucleotide sequence(s)

Molecular Cancer and Oncogene papers with wrongly identified sequence(s) were authored by teams from 12 and 5 different countries, respectively (Table 4, Table S2). Most Molecular Cancer (67/91, 74%) and Oncogene papers (17/21, 81%) were authored by teams from China, followed by authors from the USA in the case of Molecular Cancer (7/91, 8%) (Table 4). When papers with wrongly identified sequence(s) were analyzed according to both country and institution of origin (Park et al. 2022), most Molecular Cancer and Oncogene papers from China were affiliated with hospitals, compared with minorities of papers from other countries (Table 4). Significantly more Molecular Cancer papers from China were authored by hospital-affiliated teams (57/67 (85%)), compared with papers from other countries (6/24 (25%)) (Fisher’s exact test, p < 0.0001, n = 91) (Table 4).
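The comparison of hospital affiliation between papers from China and from other countries corresponds to a 2 × 2 contingency table (57 and 10 versus 6 and 18). As a rough cross-check, a two-sided Fisher's exact test can be written with the Python standard library alone. This is a stand-in sketch and not the GraphPad PRISM procedure used in the study; the function name is hypothetical, and the two-sided p value is taken here as the sum of all table probabilities no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Hospital-affiliated vs other papers: China (57, 10), other countries (6, 18).
p_value = fisher_exact_two_sided(57, 10, 6, 18)
print(p_value < 0.0001)
```

Other software may define the two-sided p value slightly differently, so small numerical differences from a given statistics package would not be surprising.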

Table 4 Molecular Cancer and Oncogene papers with wrongly identified nucleotide sequence reagent(s) according to country of origin and institutional affiliation type

Citations and post-publication commentary/corrections of Molecular Cancer and Oncogene papers with wrongly identified nucleotide sequence(s)

The 91 Molecular Cancer papers with wrongly identified nucleotide sequence(s) have been collectively cited 7932 times according to Google Scholar (Table S2). Some 33 Molecular Cancer papers have been cited at least 100 times, and 27 others have been cited at least 50 times (Fig. 7). Highly cited papers include 22 papers published in 2020 (Fig. 7). The 21 Oncogene papers from 2020 have been cited 878 times according to Google Scholar (Table S2), where one paper has been cited 168 times, and 5 other papers have been cited at least 50 times (Fig. 7).

Ten Molecular Cancer papers and 4 Oncogene papers with wrongly identified nucleotide sequence(s), and one Molecular Cancer paper with non-verifiable circRNA targeting reagents, have associated published corrections, mostly in response to concerns about image integrity (Table 5). Two Molecular Cancer papers were corrected for wrongly identified sequences (Table S6), where one paper had been previously identified by our team (Park et al. 2022). In the other published correction, one nucleotide sequence remained wrongly identified in the correction notice (Table S6). Four Molecular Cancer papers have been retracted in response to image integrity and ethics concerns (Table 5). Just under one third (26/91, 29%) of Molecular Cancer papers and 5/21 (24%) Oncogene papers have been flagged on PubPeer, mostly for image integrity concerns (Table 5). Four Molecular Cancer papers have been flagged on PubPeer for wrongly identified nucleotide sequences, including one paper from a previous study (Labbé et al. 2019) (Table 5).

Table 5 Post-publication notices and PubPeer commentary for Molecular Cancer and Oncogene papers

Discussion

Verifying the identities of nucleotide sequences published in Molecular Cancer has shown that 10–38% of all original Molecular Cancer papers published in 2014, 2016, 2018, and 2020 described wrongly identified nucleotide sequence(s). These proportions also rose from 2014 to 2020, when the journal IF increased from 4.3 to 27.4 (Fig. 1). We identified similar papers in the journal Oncogene, where 40% of papers published in 2020 that studied miRNAs and/or circRNAs were found to describe wrongly identified nucleotide sequence(s). Many of these Molecular Cancer and Oncogene papers have been highly cited, including publications from 2020. These results support and extend previous findings demonstrating that human gene research papers with wrongly identified nucleotide sequences can be identified in high IF journals (Labbé et al. 2019; Park et al. 2022).

The analysis of Molecular Cancer and Oncogene papers that examined circRNAs in human cancer also identified incorrect circRNA targeting reagents, where some errors reflected the particular requirements of circRNA targeting reagents (Dudekula et al. 2016; Zhong et al. 2018; Nielsen et al. 2022). As also reported by Zhong et al. (2019), we identified claimed divergent RT-PCR primers that did not appear to discriminate between circular and linear transcripts, as well as single reagents that did not appear to be specific for the claimed circRNA target. The identities of other circRNA targeting reagents could not be verified because the claimed circRNA sequence or the BSJ sequence was not provided and/or could not be identified elsewhere. These results add to previous descriptions of cancer research papers in which claimed circRNAs could not be independently verified (Patop and Kadener 2018).

Study limitations

Before discussing our results further, it is important to recognize our study’s limitations, as well as study design factors that may have identified higher proportions of papers with wrongly identified nucleotide sequence reagent(s) than those previously reported (Park et al. 2022) (Table 6). We recognize that the present study has examined original papers from only two journals, due to the challenges of manually verifying nucleotide sequence identities in papers that frequently described 50–100 sequences per paper. In previous studies, we employed the semi-automated Seek & Blastn tool (Labbé et al. 2019), which screens publications for short nucleotide sequences and then verifies their claimed identities using blastn (Altschul et al. 1990). Screening original papers with Seek & Blastn and then manually verifying the results found that up to 4.2% and 12.6% of 2014–2018 papers in the journals Gene and Oncology Reports, respectively, described wrongly identified nucleotide sequence(s) (Park et al. 2022). In the present study, every Molecular Cancer and Oncogene paper was analyzed manually, which may have reduced false-negative results associated with Seek & Blastn screening (Labbé et al. 2019; Park et al. 2022) (Table 6). At the same time, manual verification of nucleotide sequence identities does not preclude the possibility of human errors leading to false-positive results, particularly where thousands of individual nucleotide sequences are analyzed (Table 6).
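The screening step described above, in which publication text is searched for short nucleotide sequences before their identities are checked with blastn, can be illustrated with a minimal Python sketch. This is not the Seek & Blastn implementation. The function name `extract_candidate_sequences`, the 15-nucleotide minimum length, and the example sentence with its two sequences are all hypothetical placeholders chosen for illustration.

```python
import re

def extract_candidate_sequences(text, min_len=15):
    """Return nucleotide-like strings of at least min_len letters found in text."""
    # Runs of A/C/G/T/U that are not embedded in a longer alphabetic word
    pattern = re.compile(rf"(?<![A-Za-z])[ACGTUacgtu]{{{min_len},}}(?![A-Za-z])")
    # Upper-case each hit and map RNA 'U' to DNA 'T' so all hits can be queried alike
    return [m.group(0).upper().replace("U", "T") for m in pattern.finditer(text)]

# Made-up methods sentence; the sequences are arbitrary strings, not real reagents
methods_text = (
    "The forward primer 5'-AGCTGACCTGAAGCTCATCG-3' and the siRNA "
    "5'-GCAUCAAGGUGAACUUCAAGA-3' were used in ACTG cells."
)
candidates = extract_candidate_sequences(methods_text)
print(candidates)
```

Each extracted candidate would then need to be compared against its claimed target, for example by a blastn search, which is where both automated and manual verification can introduce errors.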

Table 6 Strengths and weaknesses of manual validation of nucleotide sequence reagent identities

The numbers of nucleotide sequences per Molecular Cancer paper also rose significantly from 2014 to 2020 (Fig. 4B). It seems possible that as the numbers of nucleotide sequence reagents per paper increase, more papers could describe wrongly identified sequences. However, we noted that the median numbers of wrongly identified sequences per Molecular Cancer paper were largely stable across 2014–2020, and no significant correlation was measured between wrongly identified and overall nucleotide sequence numbers. Median numbers of wrongly identified sequences in Molecular Cancer and Oncogene papers were also similar to those noted for papers in lower IF journals (Park et al. 2022). This suggests that the rising proportions of erroneous Molecular Cancer papers from 2014 to 2020 do not simply reflect the publication of increasingly complex papers during this time.

Possible explanations for wrongly identified nucleotide sequences

Wrongly identified nucleotide sequences can clearly occur in the context of genuine research (Park et al. 2022), particularly where papers describe many individual reagents (Table 1). At the same time, many nucleotide sequence identity errors in Molecular Cancer and Oncogene papers seem inconsistent with errors that might be made by expert authors, such as claimed human gene targeting sequences with no identifiable human target, where some sequences were instead predicted to target orthologous genes in species other than human. As we have previously described, research experts seem unlikely to select human gene targeting reagents that do not target any human gene (Park et al. 2022). Most researchers will also be aware that nucleotide sequence reagents that are identical to gene sequences in rodents, plants, or fungi will be unlikely to effectively target the orthologous human gene (Park et al. 2022). We were also surprised to discover numerous claimed circRNA targeting siRNAs that did not appear to target the claimed BSJ, despite the BSJ sequence being provided by the authors.
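The requirement that a circRNA-specific siRNA should target the BSJ can be expressed as a simple check: the siRNA target sequence should occur in the provided BSJ sequence and should extend across the junction on both sides. The Python sketch below illustrates this idea under stated assumptions. The function name `spans_junction`, the minimum flank of 5 nucleotides on each side, and the example exon sequences are hypothetical, and the input is assumed to be the sense (target) sequence rather than the antisense strand.

```python
def spans_junction(target, bsj_sequence, junction_index, min_flank=5):
    """Return True if target occurs in bsj_sequence and covers the junction.

    junction_index is the number of bases upstream of the back-splice junction.
    min_flank is an assumed minimum number of matched bases on each side of it.
    """
    target = target.upper().replace("U", "T")
    bsj = bsj_sequence.upper().replace("U", "T")
    start = bsj.find(target)
    while start != -1:
        end = start + len(target)
        # The match must start far enough upstream and end far enough downstream
        if start <= junction_index - min_flank and end >= junction_index + min_flank:
            return True
        start = bsj.find(target, start + 1)
    return False

# Made-up sequences: end of the last circularized exon joined to the start of the first
upstream = "GATTACAGGCTTAACCGTAG"
downstream = "CTGAAGTCCATGGCAATCGA"
bsj = upstream + downstream
junction = len(upstream)

print(spans_junction(bsj[10:31], bsj, junction))  # target crossing the junction
print(spans_junction(bsj[0:19], bsj, junction))   # target within the upstream exon only
```

A target that lies entirely within one exon would also be present in linear transcripts from the host gene, which is why such a reagent would not be expected to be specific for the circRNA.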

We recognize that as an external research team, we cannot draw firm conclusions about the significance of the nucleotide sequence errors that we have described, or the contexts in which these errors occurred. Nonetheless, numerous papers in Molecular Cancer and Oncogene with wrongly identified nucleotide sequences could support other journals’ concerns that paper mills may be successfully targeting some high IF journals (Heck et al. 2021; Bricker-Anthony and Giangrande 2022; Frederickson and Herzog 2022). Given the prestige associated with publishing in high IF journals, some paper mills and clients could value or require publications in high IF journals, a demand that may become more acute as lower IF journals are recognized as possible paper mill targets (Zhang et al. 2022b). As the price per paper mill manuscript may be partly dictated by journal IF (Abalkina 2023), publishing in high IF journals could allow paper mills to charge higher manuscript fees, which could allow paper mills to produce more sophisticated manuscripts that more closely resemble genuine papers. Developments in artificial intelligence, in terms of both text (Floridi and Chiriatti 2020; Grimaldi and Ehrler 2023) and image generation (Wang et al. 2022; Gu et al. 2022), could add to paper mill capacity to produce sophisticated manuscripts that could meet the expectations of some high IF journals.

Impact of wrongly identified reagents in high IF journals

Due to limitations in available time and human cognition, academics and researchers have consistently described reading between ~150 and 400 research publications per year (Tenopir et al. 2009, 2015, 2019). As these numbers of papers are greatly exceeded by the quantity of available literature, many researchers use heuristics to help decide which papers they should read (Tenopir et al. 2016; Nicholas et al. 2019; Morales et al. 2021; Teplitskiy et al. 2022). Survey results consistently report that academics and researchers prioritize reading papers in high IF journals and/or with high citation numbers (Tenopir et al. 2016; Nicholas et al. 2019; Teplitskiy et al. 2022), where early career researchers may place more emphasis on journal IF and citations as proxies for research quality (Tenopir et al. 2016; Nicholas et al. 2019).

The repeated demonstration of researcher preferences for papers in high IF journals (Tenopir et al. 2016; Nicholas et al. 2019; Teplitskiy et al. 2022) means that publications in high IF cancer journals that describe wrongly identified nucleotide sequence reagents could impact future research. Highly cited papers in high IF journals are likely to be prioritized for reading (Tenopir et al. 2016; Nicholas et al. 2019; Teplitskiy et al. 2022), where a proportion of these papers could be used in future research. Researchers may also be more motivated to reproduce results published in high IF journals, as reflected by the design of the Reproducibility Project: Cancer Biology, which attempted to reproduce cancer research studies published in high IF journals (Errington et al. 2021). Gene research papers in high IF cancer journals could therefore encourage more researchers to attempt new research, and potentially waste time and resources through the experimental use of wrongly identified reagents (Park et al. 2022; Byrne et al. 2022). In cases where papers with wrongly identified reagents describe significant associations between gene expression and drug sensitivity or resistance, they could also stimulate potentially futile research in adjacent research fields such as pharmacology.

Due to the direct relationship between citation numbers and journal IF, citations to papers with wrongly identified nucleotide sequences could also be generating a positive feed-forward loop within the human gene literature. Highly cited gene research papers can boost journal IF, which could then bring these papers to the attention of more researchers who use journal IF and citation numbers as proxies for research quality (Tenopir et al. 2016; Nicholas et al. 2019). Awareness that ncRNA papers can attract high citation numbers (Fire and Guestrin 2019) could also encourage a range of journals to consider manuscripts that describe ncRNA research. The confluence between citation potential of ncRNA publications (Fire and Guestrin 2019) and the possible value of these gene topics to paper mills (Byrne and Christopher 2020; Cooper and Han 2021; Park et al. 2022; Pérez-Neri et al. 2022; Byrne et al. 2022; Wittau et al. 2023) could lead to the unintended acceptance of problematic human gene research manuscripts by high IF journals, which could then bring these publications to the attention of more researchers.

Suggested next steps

The identification of papers with wrongly identified nucleotide sequence reagents in high IF cancer research journals should encourage the analysis of recent papers in other high IF journals, including journals that publish gene research of relevance to pharmacology. Problematic papers in high IF journals could demonstrate the leading edge of paper mill capability and could help to predict the types of manuscripts that could be received by a broader range of journals in future (Byrne et al. 2022). The possibility of paper mills harnessing new and rapidly developing capacities for automated text generation (Grimaldi and Ehrler 2023) highlights the urgent need for more critical analyses of papers in high IF journals.

The field of circRNA research is also growing rapidly, where the majority of circRNA papers have been published by authors from few countries (Wu et al. 2021; Zhang et al. 2022a). In light of our results, we speculate that laboratory research involving circRNAs may be vulnerable to exploitation by paper mills. Incomplete and non-overlapping circRNA databases that can include poorly or incompletely annotated circRNA sequences (Costa and Enguita 2020; Dodbele et al. 2021; Vromman et al. 2021), combined with multiple circRNA nomenclature systems (Costa and Enguita 2020; Dodbele et al. 2021; Vromman et al. 2021; Nielsen et al. 2022), can collectively underpin superficial published descriptions of individual circRNAs, and render poor-quality circRNA research more challenging to detect. Individual circRNAs can also be linked with many different protein-coding genes and ncRNAs (Kristensen et al. 2018; Dodbele et al. 2021), which could enable the creation of large numbers of manuscripts that combine different circRNAs, ncRNAs, protein-coding genes, and/or drug treatments across different diseases such as human cancer types. The rapid growth in the numbers of circRNA papers (Dodbele et al. 2021; Wu et al. 2021; Zhang et al. 2022a) could also limit the availability of expert peer reviewers with in-depth knowledge of critical factors in circRNA research.

Our analyses show that some human circRNA papers in high IF journals are setting poor standards for methods and results reporting, particularly for readers who may be unfamiliar with the requirements of circRNA targeting reagents. Some descriptions of circRNA research in Molecular Cancer and Oncogene indicate the need for better reporting of circRNAs and their targeting reagents (Table 7), as also recognized by others (Kristensen et al. 2018; Patop and Kadener 2018; Costa and Enguita 2020; Dodbele et al. 2021; Vromman et al. 2021; Nielsen et al. 2022). The poor reporting practices that we and others have identified (Table 7) indicate the need for specific guidance around circRNA (reagent) reporting, and for such guidance to be more strictly enforced. Journals and publishers can take further steps to promote full disclosure and accurate reporting of nucleotide sequence reagents (Table 8), where high IF journals are well placed to show leadership on best practices.

Table 7 Recommendations for improved reporting of circRNA sequences and circRNA targeting reagents in research publications

Table 8 Recommended actions to improve the reporting of nucleotide sequence reagents

Summary and conclusions

Despite well-recognized limitations in the use of journal IF to predict research quality (Ioannidis and Thombs 2019; Siler and Larivière 2022), high IF journals are valued and relied upon by many biomedical researchers. Our results indicate that contrary to reasonable expectations, gene research papers with wrongly identified nucleotide sequence reagents may be frequent in some high IF cancer journals. This highlights the need for biomedical researchers to exercise caution when interpreting published gene research, including research published in high IF journals. Publications must not be exempt from critical analysis simply because they have been published in a high IF journal and/or achieved seemingly impressive numbers of citations. These findings also support recommendations that trainee and researcher education programs actively discuss features of trustworthy publications (Byrne et al. 2022).

Misplaced beliefs that paper mills are only a problem for lower IF journals risk exacerbating the vulnerability of high IF journals towards paper mills. Given their established brands, reputations, and available resources, we hope that high IF journals and their publishers will be responsive to reports of gene research papers with verifiable reagent errors and will lead efforts in recognizing and responding to threats posed by research paper mills.

Data availability

All data generated or analyzed during this study are included in this published article and its Supplementary Information files. All information extracted from or about analyzed publications, as well as Google Scholar citation data and PubPeer notifications, is available within the public domain.

Change history

16 January 2024
A Correction to this paper has been published: https://doi.org/10.1007/s00210-024-02953-8

References

Abalkina A (2023) Publication and collaboration anomalies in academic papers originating from a paper mill: evidence from a Russia-based paper mill. Learn Publ 36:689–702
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Barbour B, Stell BM (2020) PubPeer: Scientific assessment without metrics. In: Biagioli M, Lippman A (eds) Gaming the metrics: Misconduct and manipulation in academic research. MIT Press, Cambridge, pp 149–155
Bowen A, Casadevall A (2015) Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc Natl Acad Sci USA 112:11335–11340
Bricker-Anthony C, Giangrande PH (2022) On integrity. Mol Ther Nucleic Acids 30:595
Brown AW, Kaiser KA, Allison DB (2018) Issues with data and analyses: errors, underlying themes, and potential solutions. Proc Natl Acad Sci USA 115:2563–2570
Bustin S, Nolan T (2017) Talking the talk, but not walking the walk: RT-qPCR as a paradigm for the lack of reproducibility in molecular research. Eur J Clin Invest 47:756–774
Byrne J (2019) We need to talk about systematic fraud. Nature 566:9
Byrne JA, Labbé C (2017) Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines. Scientometrics 110:1471–1493
Byrne JA, Grima N, Capes-Davis A, Labbé C (2019) The possibility of systematic research fraud targeting under-studied human genes: causes, consequences and potential solutions. Biomarker Insights 14:1–12
Byrne JA, Christopher J (2020) Digital magic, or the dark arts of the 21st century-how can journals and peer reviewers detect manuscripts and publications from paper mills? FEBS Lett 594:583–589
Byrne JA, Park Y, Capes-Davis A, Favier B, Cabanac G, Labbé C (2021a) Seek & Blastn Standard Operating Procedure V.1. https://www.protocols.io/view/seek-amp-blastn-standard-operating-procedure-bjhpkj5n
Byrne JA, Park Y, West RA, Capes-Davis A, Cabanac G, Labbé C (2021b) The thin ret(raction) line: biomedical journal responses to reports of incorrect non-targeting nucleotide sequence reagents in human gene knockdown publications. Scientometrics 126:3513–3534
Byrne JA, Park Y, Richardson RAK, Pathmendra P, Sun M, Stoeger T (2022) Protection of the human gene research literature from contract cheating organizations known as research paper mills. Nucleic Acids Res 50:12058–12070
Chiarella P, Carbonari D, Iavicoli S (2015) Utility of checklist to describe experimental methods for investigating molecular biomarkers. Biomarkers Med 9:989–995
Christopher J (2021) The raw truth about paper mills. FEBS Lett 595:1751–1757
Clark AJL, Buckmaster S (2021) Fake science for sale? How endocrine connections is tackling paper mills. Endocr Connect 10:E3–E4
Cooper CDO, Han W (2021) A new chapter for a better Bioscience Reports. Biosci Rep 41:BSR20211016
COPE, STM (2022) Paper Mills - research report from COPE & STM - English. https://doi.org/10.24318/jtbG8IHL
Costa MC, Enguita FJ (2020) Towards a universal nomenclature standardization for circular RNAs. Non-Coding RNA Investig 4:2
Dodbele S, Mutlu N, Wilusz JE (2021) Best practices to ensure robust investigation of circular RNAs: pitfalls and tips. EMBO Rep 22:e52072
Dudekula DB, Panda AC, Grammatikakis I, De S, Abdelmohsen K, Gorospe M (2016) CircInteractome: a web tool for exploring circular RNAs and their interacting proteins and microRNAs. RNA Biol 13:34–42
Errington TM, Denis A, Perfito N, Iorns E, Nosek BA (2021) Challenges for assessing replicability in preclinical cancer biology. Elife 10:e67995
Fire M, Guestrin C (2019) Over-optimization of academic publishing metrics: observing Goodhart’s Law in action. Gigascience 8:giz053
Floridi L, Chiriatti M (2020) GPT-3: its nature, scope, limits, and consequences. Minds Mach 30:681–694
Frederickson RM, Herzog RW (2022) Addressing the big business of fake science. Mol Ther 30:2390
Glažar P, Papavasileiou P, Rajewsky N (2014) circBase: a database for circular RNAs. RNA 20:1666–1670
Gopalakrishna G, Ter Riet G, Vink G, Stoop I, Wicherts JM, Bouter LM (2022) Prevalence of questionable research practices, research misconduct and their potential explanatory factors: a survey among academic researchers in The Netherlands. PLoS ONE 17:e0263023
Goudey B, Gear N, Verspoo K, Zobel J (2022) Propagation, detection and correction of errors using the sequence database network. Brief Bioinformatics 23:bbac416
Grimaldi G, Ehrler B (2023) AI et al.: Machines are about to change scientific publishing forever. ACS Energy Lett 8:878–880
Gu J, Wang X, Li C, Zhao J, Fu W, Liang G, Qiu J (2022) AI-enabled image fraud in scientific publications. Patterns 3:100511
Han J, Li Z (2018) How metrics-based academic evaluation could systematically induce academic misconduct: a case study. East Asian Sci Tech Soc 12:165–179
Heck S, Bianchini F, Souren NY, Wilhelm C, Ohl Y, Plass C (2021) Fake data, paper mills, and their authors: the International Journal of Cancer reacts to this threat to scientific integrity. Int J Cancer 149:492–493
Ioannidis JPA, Thombs BD (2019) A user’s guide to inflated and manipulated impact factors. Eur J Clin Invest 49:e13151
Kaelin WG Jr (2017) Common pitfalls in preclinical cancer target validation. Nat Rev Cancer 17:425–440
Karagkouni D, Paraskevopoulou MD, Tastsoglou S, Skoufos G, Karavangeli A, Pierros V, Zacharopoulou E, Hatzigeorgiou AG (2020) DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts. Nucleic Acids Res 48:D101–D110
Kempf E, de Beyer JA, Cook J, Holmes J, Mohammed S, Nguyên TL, Simera I, Trivella M, Altman DG, Hopewell S, Moons KG (2018) Overinterpretation and misreporting of prognostic factor studies in oncology: a systematic review. Br J Cancer 119:1288–1296
Kristensen LS, Hansen TB, Venø MT, Kjems J (2018) Circular RNAs in cancer: opportunities and challenges in the field. Oncogene 37:555–565
Labbé C, Grima N, Gautier T, Favier B, Byrne JA (2019) Semi-automated fact-checking of nucleotide sequence reagents in biomedical research publications: The Seek & Blastn tool. PLoS ONE 14:e0213266
Lee BT, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee CM, Muthuraman P (2022) The UCSC Genome Browser database: 2022 update. Nucleic Acids Res 50:D1115–D1122
Mobley A, Linder SK, Braeuer R, Ellis LM, Zwelling L (2013) A survey on data reproducibility in cancer research provides insights into our limited ability to translate findings from the laboratory to the clinic. PLoS ONE 8:e63221
Morales E, McKiernan EC, Niles MT, Schimanski L, Alperin JP (2021) How faculty define quality, prestige, and impact of academic journals. PLoS ONE 16:e0257340
Nicholas D, Watkinson A, Boukacem-Zeghmouri C, Rodríguez-Bravo B, Xu J, Abrizah A, Świgoń M, Clark D, Herman E (2019) So, are early career researchers the harbingers of change? Learn Publ 32:237–247
Nielsen AF, Bindereif A, Bozzoni I, Hanan M, Hansen TB, Irimia M, Kadener S, Kristensen LS, Legnini I, Morlando M, Jarlstad Olesen MT (2022) Best practice standards for circular RNA research. Nat Methods 19:1208–1220
Park Y, West RA, Pathmendra P, Favier B, Stoeger T, Capes-Davis A, Cabanac G, Labbé C, Byrne JA (2022) Identification of human gene research articles with wrongly identified nucleotide sequences. Life Sci Alliance 5:e202101203
Parker L, Boughton S, Lawrence R, Bero L (2022) Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. J Clin Epidemiol 151:1–17
Patop IL, Kadener S (2018) circRNAs in cancer. Curr Op Genet Dev 48:121–127
Pérez-Neri I, Pineda C, Sandoval H (2022) Threats to scholarly research integrity arising from paper mills: a rapid scoping review. Clin Rheumatol 41:2241–2248
Pusztai L, Hatzis C, Andre F (2013) Reproducibility of research and preclinical validation: problems and solutions. Nat Rev Clin Oncol 10:720–724
Qi X, Deng H, Guo X (2017) Characteristics of retractions related to faked peer reviews: an overview. Postgrad Med J 93:499–503
Romanovsky M (2019) Distribution of scientific journals impact factor. arXiv 1904.05320 (preprint)
Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I (2019) GenBank. Nucleic Acids Res 47:D94–D99
Seifert R (2021) How Naunyn-Schmiedeberg’s Archives of Pharmacology deals with fraudulent papers from paper mills. Naunyn Schmiedeberg’s Arch Pharmacol 394:431–436
Siler K, Larivière V (2022) Who games metrics and rankings? Institutional niches and journal impact factor inflation. Res Policy 51:S0048733322001317
Smaldino PE, McElreath R (2016) The natural selection of bad science. R Soc Open Sci 3:160384
Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, Stein TI, Nudel R, Lieder I, Mazor Y, Kaplan S (2016) The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr Protocols Bioinf 54:1.30.1–1.30.33
Stroebe W, Postmes T, Spears R (2012) Scientific misconduct and the myth of self-correction in science. Perspect Psychol Sci 7:670–688
Tenopir C, King DW, Spencer J, Wu L (2009) Variations in article seeking and reading patterns of academics: what makes a difference? Lib Inform Sci Res 31:139–148
Tenopir C, King DW, Christian L, Volentine R (2015) Scholarly article seeking, reading, and use: a continuing evolution from print to electronic in the sciences and social sciences. Learn Publ 28:93–105
Tenopir C, Levine K, Allard S, Christian L, Volentine R, Boehm R, Nichols F, Nicholas D, Jamali HR, Herman E, Watkinson A (2016) Trustworthiness and authority of scholarly information in a digital age: results of an international questionnaire. J Ass Inf Sci Tech 67:2344–2361
Tenopir C, Christian L, Kaufman J (2019) Seeking, reading, and use of scholarly articles: an international study of perceptions and behavior of researchers. Publications 7:18
Teplitskiy M, Duede E, Menietti M, Lakhani KR (2022) How status of research papers affects the way they are read and cited. Res Policy 51:104484
Vromman M, Vandesompele J, Volders PJ (2021) Closing the circle: current state and perspectives of circular RNA databases. Brief Bioinform 22:288–297
Wang L, Zhou L, Yang W, Yu R (2022) Deepfakes: a new threat to image fabrication in scientific publications? Patterns 3:100509
Wittau J, Celik S, Kacprowski T, Deserno T, Seifert R (2023) Fake paper identification in the pool of withdrawn and rejected manuscripts submitted to Naunyn-Schmiedeberg’s Archives of Pharmacology. Naunyn-Schmiedeberg’s Arch Pharmacol, advance online publication
Wu W, Ji P, Zhao F (2020) CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol 21:101
Wu R, Guo F, Wang C, Qian B, Shen F, Huang F, Xu W (2021) Bibliometric analysis of global circular RNA research trends from 2007 to 2018. Cell J 23:238–246
Zhang C, Kang Y, Kong F, Yang Q, Chang D (2022a) Hotspots and development frontiers of circRNA based on bibliometric analysis. Non-Coding RNA Res 7:77–88
Zhang L, Wei Y, Sivertsen G, Huang Y (2022b) The motivations and criteria behind China’s list of questionable journals. Learn Publ 35:467–480
Zhong S, Wang J, Zhang Q, Xu H, Feng J (2018) CircPrimer: a software for annotating circRNAs and determining the specificity of circRNA primers. BMC Bioinform 19:292
Zhong S, Zhou S, Yang S, Yu X, Xu H, Wang J, Zhang Q, Lv M, Feng J (2019) Identification of internal control genes for circular RNAs. Biotechnol Lett 41:1111–1119

Acknowledgements

We thank Dr. Thomas Stoeger and Mr Reese Richardson (Northwestern University, USA) for critical reading and discussions, and Prof Lenka Munoz (University of Sydney, Australia), Prof Cyril Labbé (Univ. Grenoble Alpes, France), and Prof Guillaume Cabanac (Univ. Toulouse, France) for discussions.

Funding

JAB gratefully acknowledges funding from the National Health and Medical Research Council of Australia (NHMRC) Ideas grant ID APP1184263, and from the Faculty of Medicine and Health at the University of Sydney. PP is supported by a Research Training Program scholarship at the University of Sydney.

Author information

Authors and Affiliations

School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, 2050, Australia
Pranujan Pathmendra, Yasunori Park & Jennifer A. Byrne
Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028, Lisbon, Portugal
Francisco J. Enguita
NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, NSW, 2050, Australia
Jennifer A. Byrne

Authors

Pranujan Pathmendra
Yasunori Park
Francisco J. Enguita
View author publications
You can also search for this author in PubMed Google Scholar
Jennifer A. Byrne
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization: JAB; Methodology: PP, FJE, YP, JAB; Formal analysis: PP, YP, JAB; Writing - original draft preparation: PP, JAB; Writing - review and editing: JAB, PP, FJE, YP; Funding acquisition: JAB, PP; Supervision: JAB. All authors reviewed the manuscript. The authors declare that all data were generated in-house and that no paper mill was used.

Corresponding author

Correspondence to Jennifer A. Byrne.

Ethics declarations

Ethical approval

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised. The Fig. 1 image is now corrected.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 61 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pathmendra, P., Park, Y., Enguita, F.J. et al. Verification of nucleotide sequence reagent identities in original publications in high impact factor cancer research journals. Naunyn-Schmiedeberg's Arch Pharmacol 397, 5049–5066 (2024). https://doi.org/10.1007/s00210-023-02846-2

Received: 27 October 2023
Accepted: 09 November 2023
Published: 09 January 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s00210-023-02846-2
