Introduction

In contemporary science policies, much emphasis is put on the evaluation of the producers of biomedical knowledge. The most recent trend in science evaluation, the incorporation of seemingly objective tools to measure publication output or citation impact, gave rise to the idea that the former represents productivity while the latter indicates quality, professionalism or excellence in biomedical research (de Rijcke et al., 2016; Jappe, 2020; Petersohn et al., 2020). Biomedical researchers and other scientists have reacted to such incentives not only by dividing their work into more publishable units, but also by incorporating such evaluative categories deeply into their epistemic practices (Müller & de Rijcke, 2017). As a result, a growing number of publications, often lacking novelty or adding little, has led leading experts to proclaim a crisis in biomedical research. Besides the increasing efforts required of practitioners and researchers to cope with the growing stock of information, experts criticized the decreasing quality of biomedical research, whose findings are less reliable, less credible or just “false” (Ioannidis, 2005, 696).

Ironically, the biomedical research community in particular has pursued a century-long effort to improve the credibility of its research outputs. Facing the outputs of the “clinical trial industry” that emerged from the 1960s FDA regulation (Meldrum, 2000, 755), experts demanded more systematic assessments of medical knowledge in order to improve the health and lives of patients (Chalmers et al., 2002). As a result, medical disciplines increasingly employed meta-analyses and systematic reviews to combine multiple studies into an overall, reliable result that can be considered ‘evidence’ for applicatory contexts and the development of medical treatment guidelines (McKibbon, 1998; Moreira, 2005). In addition, systematic reviews not only became a more common genre in academic periodicals, but also the main focus of multinational organizations that follow the ideal of evidence-based practice, such as the Cochrane Collaboration or the Campbell Collaboration (Simons & Schniedermann, 2021).

Method experts constantly refine the recipes for systematic reviews in order to cope with newly discovered varieties of bias and to preserve the epistemic credibility of the genre. In a more recent trend, for example, demands for more transparent reporting of reviews were voiced and manifested in new standards, most notably the “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA; Moher et al., 2009). Because PRISMA has been implemented in many editorial policies and is highly cited by systematic reviews, its developers continually monitor actual compliance to further improve the guideline. In their recent evaluation of PRISMA, for example, Matthew Page and David Moher listed 57 different studies that analyze the extent to which systematic reviews comply with the guideline’s rules (Page & Moher, 2017). They conclude that, although there is room for improvement, the guideline has already positively affected the reporting quality of systematic reviews and meta-analyses.

Systematic reviews and their PRISMA-improved counterparts are often placed at the top of the hierarchy of biomedical evidence (Goldenberg, 2009). As such, the genre sits at the intersection of what is perceived as high epistemic quality and what is actually measured by the wide palette of bibliometric indicators. For example, because review articles generally accumulate higher citation counts than reports of primary research, an observation long known to bibliometricians, they are suspected of being used strategically by some academic journals and individual researchers (Blümel & Schniedermann, 2020). Some bibliometric assessments found that systematic reviews and meta-analyses inherit this characteristic and are likewise used strategically by impact-sensitive actors (Barrios et al., 2013; Colebunders et al., 2014; Ioannidis, 2016; Patsopoulos et al., 2005; Royle et al., 2013). But allegations of strategic behavior in academia are hard to prove solely on the basis of bibliometric measurements. Nevertheless, bibliometric assessments can point towards potential misuses and inform more profound investigations (Krell, 2014; Wyatt et al., 2017).

Several studies have elaborated the bibliographic characteristics of systematic reviews, often with a focus on specific domains. Usually, medical researchers and method developers investigate the “epidemiology of [..] systematic reviews” by focusing on characteristics such as language, inclusion criteria or whether a protocol was pre-registered or published (Page et al., 2016). Other observations concern the high growth rates and publication counts of systematic reviews and the resulting problems (Ioannidis, 2016; Page et al., 2016; Bastian, 2010). Further analyses focus on national affiliations, numbers of authors, funding sources or citation impact. For example, a study by Alabousi et al. found a positive correlation between the Journal Impact Factor and systematic reviews published in the field of medical imaging (Alabousi et al., 2019).

Studies that take methodological standards such as PRISMA into account often focus on specific sub-fields, or use bibliometric indicators that are inappropriate for the assessment of individual articles. For example, one study found a positive correlation between the reporting quality of systematic reviews and absolute citation rates in radiology (Pol et al., 2015). A study by Nascimento et al. found a positive correlation between the endorsement of PRISMA and the Journal Impact Factor for systematic reviews about low back pain (Nascimento et al., 2020). Similarly, Mackinnon et al. found a weak correlation between reporting quality and the 5-year Journal Impact Factor in dementia biomarker studies (Mackinnon et al., 2018). Molléri et al. conclude that the rigor of software engineering studies is related to normalized citations, but argue that this may come at the cost of relevance (Molléri et al., 2018).

In order to provide a broader overview of the characteristics and dissemination of systematic reviews in biomedicine, this study focuses especially on the role and impact of (reporting) standards such as PRISMA on systematic reviewing. Although standardization is a common phenomenon in medical research and practice, the willingness and speed with which new standards are applied differ by medical subfield (cf. Timmermans & Epstein, 2010). For this reason, this study highlights the fields, nations and topics that typically deal with systematic reviews and sheds some light on their dynamics in effectively implementing PRISMA. By using different methods to identify systematic reviews and to separate them from those that use the PRISMA guideline, this study extends existing research by treating PRISMA-based systematic reviews as standardized and highly appraised forms of scientific output. The analysis of impact measures therefore not only provides scores irrespective of sub-discipline and publication year, but also contrasts them with the standardized, more rigorous forms of systematic reviews.

Methodology

Corpus construction

To compare the bibliometric characteristics of systematic reviews and PRISMA-based systematic reviews, corpora were constructed from a combination of bibliographic data from PubMed and Web of Science. Document types in Web of Science lack accuracy and the necessary detail, especially in the case of review articles (Donner, 2017). In contrast, PubMed contains several document types related to secondary research and introduced the “Systematic Review” type in 2019 because of the genre’s importance for medical decision making (NLM, 2018).

In a first step, publication types and MeSH keywords were retrieved for 31 million PubMed items via the Entrez API during March and April 2020 and stored in a PostgreSQL database. In contrast to Web of Science, each of these records is linked to 1.73 document types on average, with up to 10 assignments. The reason for this is PubMed’s much richer classification system of 80 different types, which also incorporates methodological information such as ‘Randomized Controlled Trial’ and funding information such as ‘Research Support, Non-U.S. Gov't’ (Knecht & Marcetich, 2005).
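The retrieval step can be sketched with Biopython’s Entrez wrapper; the example below fetches publication types and MeSH descriptors for a single, illustrative PMID, whereas the full harvest of 31 million records would additionally require batching, rate limiting and an API key:

```python
from Bio import Entrez  # Biopython

Entrez.email = "name@example.org"  # NCBI requires a contact address

def fetch_types(pmids):
    """Yield (PMID, publication types, MeSH descriptors) for a batch of IDs."""
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids), retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    for rec in records["PubmedArticle"]:
        citation = rec["MedlineCitation"]
        types = [str(t) for t in citation["Article"]["PublicationTypeList"]]
        mesh = [str(h["DescriptorName"])
                for h in citation.get("MeshHeadingList", [])]
        yield str(citation["PMID"]), types, mesh

# Illustrative call with the PMID of the PRISMA statement (Moher et al., 2009)
for pmid, types, mesh in fetch_types(["19621072"]):
    print(pmid, types, mesh)
```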

After matching the retrieved records with the inhouse version of Web of Science at the German Kompetenzzentrum Bibliometrie and restricting the publication years to 1991 through 2016 in order to account for changing indexing policies at MEDLINE (NLM, 2017), the resulting set contained 10 million records.

In comparison with the Web of Science classification, consisting of “Article”, “Review”, “Editorial”, “Letter”, and “News”, precision and recall were calculated (Baeza-Yates & Ribeiro-Neto, 2011). While both are sufficiently high for articles (precision = 0.94, recall = 0.95), they are rather low for reviews (0.85/0.52): 85% of items labeled “Review” in Web of Science carry the same type in PubMed (precision), but only 52% of items labeled “Review” in PubMed are also labeled “Review” in Web of Science (recall), leaving 48% false negatives. These results confirm earlier analyses (Donner, 2017) and show the huge differences between the document type classification systems of Web of Science and PubMed (see also Harzing, 2013).
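A minimal sketch of this calculation, treating PubMed’s assignments as the reference standard and using hypothetical dictionaries that map record IDs to document type sets:

```python
def precision_recall(wos_types, pubmed_types, label="Review"):
    """wos_types, pubmed_types: dicts mapping record IDs to sets of types."""
    predicted = {i for i, t in wos_types.items() if label in t}    # WoS says Review
    relevant = {i for i, t in pubmed_types.items() if label in t}  # PubMed says Review
    tp = len(predicted & relevant)                                 # both agree
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Toy data: record 3 is a review only in PubMed (a false negative for WoS)
wos = {1: {"Review"}, 2: {"Article"}, 3: {"Article"}}
pubmed = {1: {"Review"}, 2: {"Article"}, 3: {"Review", "Systematic Review"}}
print(precision_recall(wos, pubmed))  # (1.0, 0.5)
```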

To build the guideline-based systematic review set, all items that cite PRISMA or one of its descendants were separated. PRISMA documents were identified via DOI and title search; the relevant DOIs were downloaded from the PRISMA website (www.prisma-statement.org) as well as the EQUATOR network website (www.equator-network.org).
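The separation step can be sketched as follows, assuming a pre-computed mapping from each record to the DOIs in its reference list; only the PRISMA statement’s DOI is certain here, the remaining entries stand in for the harvested lists:

```python
PRISMA_DOIS = {
    "10.1371/journal.pmed.1000097",  # the 2009 PRISMA statement
    # ... further DOIs harvested from prisma-statement.org and
    # equator-network.org would be added here
}

def split_by_prisma(review_ids, cited_dois):
    """Partition records by whether their reference list contains a PRISMA DOI.

    cited_dois: dict mapping a record ID to the set of DOIs it cites."""
    prisma_based, other = [], []
    for rid in review_ids:
        refs = {d.lower() for d in cited_dois.get(rid, set())}
        (prisma_based if refs & PRISMA_DOIS else other).append(rid)
    return prisma_based, other
```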

Statistical analysis

For the comparison of systematic reviews and PRISMA-based systematic reviews, several variables were calculated. For the annual and compound growth rates, basic publication outputs were used. All variables other than document types are based on metadata from the Web of Science database. As such, field variations are based on extended subject categories. National variations are based on the fractional assignment of items to the authors’ affiliations, taking all affiliations into account. The country list was restricted to countries that published at least 500 reviews, or that have at least 10% systematic reviews or 2% PRISMA-based systematic reviews over the whole timespan. In addition, countries that belong to the Commonwealth were marked separately (cf. Encyclopaedia Britannica, 2021) due to their stronger commitment to evidence-based medicine (Groneberg et al., 2019).
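A minimal sketch of the growth rate calculations, assuming a simple mapping from publication years to output counts (the series endpoints below are hypothetical):

```python
def annual_growth(counts, y0, y1):
    """Compound annual growth rate between years y0 and y1."""
    return (counts[y1] / counts[y0]) ** (1 / (y1 - y0)) - 1

def overall_growth(counts, y0, y1):
    """Total growth over the whole span, e.g. 0.39 for 39%."""
    return counts[y1] / counts[y0] - 1

counts = {2009: 100, 2016: 1000}                    # hypothetical outputs
print(f"{annual_growth(counts, 2009, 2016):.0%}")   # ~39% per year
print(f"{overall_growth(counts, 2009, 2016):.0%}")  # 900% overall
```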

For the impact assessment, all publications in the set were compared across the major document types. To employ a 3-year citation window, the citation analysis is restricted to publications from the years 2009 to 2015. For comparison, the mean citation impact of each group was calculated in terms of absolute citations, mean normalized citation scores (MNCS) and mean cumulative percentile ranks (CPIN). Both synthetic indicators, MNCS and CPIN, are normalized by Web of Science’s subject categories.
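For illustration, the MNCS can be sketched as follows; the field baselines (expected citation rates per subject category and publication year) are hypothetical here:

```python
from statistics import mean

def mncs(papers, baselines):
    """papers: (field, year, citations) triples; baselines: dict mapping
    (field, year) to the mean citation rate of that field and year."""
    return mean(c / baselines[(f, y)] for f, y, c in papers)

baselines = {("Surgery", 2012): 5.0}                  # hypothetical expectation
papers = [("Surgery", 2012, 8), ("Surgery", 2012, 2)]
print(mncs(papers, baselines))                        # (1.6 + 0.4) / 2 = 1.0
```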

In contrast to traditional mean normalized citation scores, percentile measures are more robust against high-impact outliers, which skew the citation distributions of all fields and are a common issue in bibliometric data (Bornmann, 2013). At the same time, they reduce the data’s level of measurement to an ordinal ranking scale, which precludes any conclusion about the magnitude of differences. However, it does not preclude conclusions about mean differences between the document types.

Generally, percentile ranks reflect which citation impact group a publication belongs to. Groups can be delineated at will; a common choice, for example, is the top 10% class, meaning that these papers are cited more often than 90% of the rest of their field and publication year (Rousseau, 2012). But citation scores are discrete values, which complicates pre-defined rank classes because the percentages refer to numbers of publications while the class borders are defined by citation values. For example, if ten publications are cited (0,0,0,0,1,2,2,3,3,3) times, the top ten percent would already comprise the three top-cited papers, i.e. 30% of the whole set. In order to preserve pre-defined percentile rank classes, fractional assignments have been introduced, which, in turn, give up the binary nature of the ranks (Waltman & Schreiber, 2012).

For an optimal percentile measure that can be further aggregated and remains interpretable, Lutz Bornmann and Richard Williams suggested excluding cumulative percentiles, CPEX (2), and including cumulative percentiles, CPIN (1) (Bornmann & Williams, 2020). These are calculated for each combination of field classification and publication year. For each of these sets, let I be the set of possible citation scores within a three-year citation window, with \(j, k \in I\), \(c_{j}\) the number of papers with j citations, and t the total number of papers.

$${\text{CPIN}}_{j} = \mathop \sum \limits_{k = 0}^{j} \frac{{c_{k} }}{t}*100$$
(1)
$${\text{CPEX}}_{j} = {\text{CPIN}}_{j} - \frac{{c_{j} }}{t}*100$$
(2)

After CPIN or CPEX have been calculated, each focal paper can be assigned such values based on its field classification, publication year and citation window. Although both represent cumulative percentages, their interpretation differs. CPEX represents the share of other publications that are cited less than the focal one within the same field and publication year; its lowest value is always zero because no publication is cited less than zero times. CPIN represents the share of publications that are cited equally often or less than the focal one; its highest value is always 100 because no publication is cited more than the most cited ones.
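As a minimal illustration of Eqs. (1) and (2), the following sketch computes both measures for the ten-paper example distribution given above, assuming all papers belong to a single field and publication year:

```python
from collections import Counter

citations = [0, 0, 0, 0, 1, 2, 2, 3, 3, 3]  # example distribution from above
c = Counter(citations)   # c[j]: number of papers with exactly j citations
t = len(citations)       # total number of papers

def cpin(j):
    """Including cumulative percentile: share of papers cited <= j times."""
    return sum(c[k] for k in range(j + 1)) / t * 100

def cpex(j):
    """Excluding cumulative percentile: share of papers cited < j times."""
    return cpin(j) - c[j] / t * 100

for j in sorted(c):
    print(j, cpin(j), cpex(j))
# j=0: CPIN 40, CPEX 0; j=1: 50/40; j=2: 70/50; j=3: 100/70
```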

Results and discussion

The growth and dissemination of systematic reviews

As displayed in Table 1, the analyzed dataset consists of 9.9 million total items. While most of the items are primary articles (8.1 million), a substantial part of the scientific literature consists of secondary research, or review articles: mostly non-systematic reviews (950 k), followed by systematic reviews (95 k) and PRISMA-based systematic reviews (11 k).

Table 1 Number of publications and growth rates for each document type

Comparing the timespans before (2002–2009) and after (2009–2016) the publication of PRISMA, the overall growth of publications slowed from 52 to 39%. Besides the decline of genres such as “News” items, whose annual output shrinks by 2.3%, the growth rate of primary research articles in particular slowed from 5.9% per year before to 4.8% after the publication of PRISMA.

In contrast, the set shows a growing output of research syntheses. Besides a slight increase in the annual growth rate of reviews (from 4.5 to 4.7%), systematic reviews and PRISMA-based systematic reviews in particular show increasing outputs. While systematic reviews grew by over a fifth annually (22%) before 2009 and by over 14% since then, PRISMA-based systematic reviews almost doubled every year (82%) since the publication of the guideline, amounting to an overall growth of 2675% until 2016. In addition, the overall growth of systematic reviews that do not comply with PRISMA halved since its inception (from 300 to 148%).

Systematic reviews and PRISMA-based systematic reviews play important intellectual roles in various medical fields and communities. Figure 1 shows the Web of Science subject categories that are commonly assigned to systematic reviews or PRISMA-based systematic reviews. “Surgery”, with rather moderate rates of systematic reviews (18%) and PRISMA-based reviews (2.4%), accounts for the highest total number of PRISMA-based reviews (756 assignments).

Fig. 1

Scatterplot of the rates of systematic reviews and PRISMA-standardized systematic reviews among all secondary research of Web of Science subject categories which published at least 500 reviews and 50 systematic reviews from 2009 to 2015

Whenever “Business & Economics” was assigned to a publication in PubMed, it was a systematic review in 42% of the cases, the highest value in the dataset. High rates are also found in “Information Science & Library Science” (38%) and “Medical Informatics” (37%). On the other side, only 0.5% of all assignments of “Chemistry” go to systematic reviews, the lowest value; others are “Biophysics” (1.1%) and “Cell Biology” (1.5%).

While “Evolutionary Biology” has a rather low rate of systematic reviews (7.6%), it was never assigned to PRISMA-based systematic reviews at all, the lowest value in this category. At the other end, “Surgery”, a field in which the guideline was published, leads this variable with 756 assignments, despite a rate of only 2.4% among all its research syntheses, ahead of even broader categories like “General & Internal Medicine” (688 assignments, 27% rate) or “Science & Technology—Other Topics” (681 assignments, 23% rate). Not surprisingly, fields in which the guidelines have been published show higher rates in both types of assignments.

How the field assignments of PRISMA-based systematic reviews changed over time is shown in Fig. 2. While fields such as “Surgery” raised their share from 2.3% in 2010 to 7.3% in 2015, fields like “Obstetrics & Gynecology” decreased from 4.5% of all assignments in 2010 to 2.1% in 2015. “General & Internal Medicine” lost its top position of 2010, decreasing from 14 to 6.6% in 2015, but still ranks second. Although the ranks change over the years, Fig. 3 shows that the differences in proportions and counts between the fields are much smaller than what can be observed for systematic reviews in general.

Fig. 2

Ranked cumulative proportions of field assignments and cumulative counts to PRISMA-based systematic reviews from 2010 to 2015. Restricted to fields that have at least a 2% rate of overall assignments

Fig. 3

Overall number of research syntheses, proportions of systematic reviews and PRISMA-based systematic reviews by country, and national association with the Commonwealth. Based on fractional assignment of authors’ affiliations. Labels are restricted to countries that published at least 500 reviews, or have at least 10% systematic reviews or 2% PRISMA-based reviews, respectively

The data illustrates some popular narratives about the dissemination and occurrence of systematic reviews and their standardized counterparts. Generally, the differences show that although the guideline developers attempted to make PRISMA applicable to a broad range of fields and academic cultures, the actual levels of implementation and compliance differ substantially. Accepting a new standard means that established practices have to change. Although standardization and the change of standards are rather regular phenomena in biomedicine (Whitley, 2000), such attempts are sometimes met with criticism and reluctance (Solomon, 2015; Timmermans & Berg, 2003; cf. Hunt, 1999). For example, communities with a strong tradition of qualitative research turned out to be rather critical of systematic reviewing (Cohen & Crabtree, 2008).

One reason for strong objections is the more procedural and prescriptive way in which systematic reviews or meta-analyses create medical evidence (Stegenga, 2011). Beside the objection that this imposes a rather narrow perspective on scientific methodology and neglects the plurality of medical research (Berkwits, 1998), a proper execution of systematic review methods requires substantial skills in information retrieval, data cleaning and statistics (Moreira, 2007). Not surprisingly, the data shows that when disciplines such as “Information Science & Library Science” or “Medical Informatics” were assigned to secondary articles, these were often systematic reviews or PRISMA-based systematic reviews.

Beside objections against systematic reviews in some fields, the genre has developed a substantial footprint in fields committed to evidence-based medicine. For example, assignments of “Public, Environmental & Occupational Health” and “Health Care Sciences & Services” regularly go to systematic reviews (30% and 33%, respectively) or PRISMA-based systematic reviews (3.5% and 4.2%, respectively, accumulating totals of 499 and 323 assignments). These values correspond to the idea that systematic reviews are important for medical decision making, the development of clinical guidelines and public health assessments, contexts in which systematic reviews have proved fruitful in the past (Moreira, 2005; Hunt, 1999). Similarly, systematic reviews are meant to reach final conclusions on certain topics by aggregating all available evidence. While this function is crucial for disciplines such as surgery, where a multitude of treatments have to be appraised (Maheshwari & Maheshwari, 2012), scientific disputes cannot always be settled this way (Vrieze, 2018).

Systematic reviews by countries

Although the biomedical sciences form an international community, national variations in the production of research syntheses are observable, as Fig. 3 suggests. While top producers like the United States (195,164 items), the United Kingdom (50,033 items), Germany (33,166 items), and China (29,940 items) publish the majority of research syntheses, they differ in their commitment to systematic reviews and PRISMA-based systematic reviews. While in the U.S. and Germany less than a tenth of published reviews are systematic reviews (6.8% and 7%), the genre accounts for greater portions of reviews in the U.K. (18%) and especially China (39%). All four countries differ in their commitment to PRISMA-based systematic reviews in a similar manner: the U.S. and Germany have the lowest rates (both < 1%), while higher rates are found in the U.K. (2.1%) and China (4.0%). Notably, authors from Egypt published only 587 review articles, of which only 11% were systematic reviews but 6.6% were PRISMA-based systematic reviews, the highest rate in the dataset.

The annual ranks of national assignments to PRISMA-based systematic reviews are displayed in Fig. 4. In the first year after the inception of the guideline, the leading producers were the United Kingdom (23%), the United States (21%) and Canada (12%). While the U.S. took first position in 2013 and accounted for 20% of all PRISMA-based systematic reviews in 2015, China showed even higher growth rates: starting at rank seven with 4.8%, its portion grew steadily year over year, reaching second rank in 2015 with over 1000 items and an 18% overall share.

Fig. 4

Ranked cumulative proportions and cumulative counts of country assignments to PRISMA-based systematic reviews from 2010 to 2015

Groneberg et al. have shown that strong commitments towards evidence-based medicine can be associated with the Commonwealth (Groneberg et al., 2019). Similarly, the analysis of document types shows that such countries, especially the U.K., Canada, Australia and New Zealand, produce higher rates of systematic reviews and their PRISMA-based counterparts than other leading science nations (Figs. 3 and 4). Other countries with high rates of these genres, most notably the Netherlands and Denmark, are also among the top ten adopters of evidence-based medicine (ibid.). Although the “evidence-based” movement promotes and comes with a variety of knowledge generation practices, systematic reviewing remains an important cornerstone of EBM, as other studies have indicated (Blümel & Schniedermann, 2020; Ojasoo et al., 2001). In that sense, the data provided here corresponds with other studies that noted the link between evidence-based paradigms and the emergence of systematic reviews (Timmermans & Berg, 2003; Straßheim & Kettunen, 2014).

Besides being the fourth biggest producer of reviews in absolute numbers, China shows substantially higher rates of systematic reviews (39%) and PRISMA-based systematic reviews (4.0%) than all other countries. Although the exact causes may be complex and difficult to identify, two potential explanations can be proposed. First, with China being a rising science leader, the majority of Chinese scientists are latecomers to the international community (see Liang et al., 2020). As such, this group may be biased towards a more recent education in the conduct and reporting of biomedical studies. At the same time, Chinese science policy has heavily incentivized the production of scientific literature, which recently made China the global leader in scientific publishing (Stephen & Stahlschmidt, 2021). Second, due to its recent attempts at healthcare reform, China turned more explicitly towards evidence-based medicine and its commitment to systematic clinical research. Together with its growing overall output, China thus outpaced even other countries with research-based medical systems (Wang & Jin, 2019).

Citation impact of systematic reviews and PRISMA-based systematic reviews

Differences in the citation impact of the analyzed document types are displayed in Fig. 5. All three citation indicators yield the same ranking, with the lowest values for articles and the highest for PRISMA-standardized systematic reviews. The latter receive a median of 14 absolute citations (mean = 22), 1.57 normalized citations (mean = 2.67) and reach the 87th percentile rank (mean = 81%), meaning that 87% of the other documents in the same field receive equally many or fewer citations within three years after publication. In comparison, other systematic reviews receive a median of 12 absolute citations (mean = 19.5), 1.41 normalized citations (mean = 2.36) and reach the 83rd percentile rank (mean = 76.2%). Reviews and articles rank lower accordingly. Mean differences are highly significant under Kruskal–Wallis tests (p < 0.001) for all three indicators as well as under all pairwise Wilcoxon rank-sum tests (p < 0.001).

Fig. 5

Box plots with median and mean (dots) values showing citation impact ranks by document type for absolute citations (ABS), mean-normalized citations (MNCS) and cumulative percentile ranks (CPIN), based on publications published from 2009 to 2015 with a 3-year citation window. The y-axes for ABS and MNCS are stripped of extreme outliers for the purpose of better visualization
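The significance tests reported above can be sketched with SciPy as follows; the citation arrays are hypothetical placeholders for the per-document-type scores:

```python
from itertools import combinations
from scipy.stats import kruskal, ranksums

groups = {                              # hypothetical citation counts
    "article": [0, 1, 2, 5, 8],
    "review": [1, 3, 4, 9, 12],
    "systematic review": [2, 5, 9, 14, 20],
    "PRISMA-based": [4, 8, 12, 18, 25],
}

h, p = kruskal(*groups.values())        # global test across all four groups
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

for (a, xs), (b, ys) in combinations(groups.items(), 2):
    stat, p = ranksums(xs, ys)          # pairwise Wilcoxon rank-sum tests
    print(f"{a} vs {b}: p={p:.4f}")
```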

In Fig. 6, the development of the annual mean CPINs shows the dynamics of the impact ranks and corresponding errors for three different citation windows (see Wang, 2013). While 85% of all publications were cited equally often or less within a 3-year window than PRISMA-standardized systematic reviews published in 2009, this value decreased to 80% in 2013 and 79.7% in 2015. The development is similar for the 5-year citation window, falling from 85% in 2009 to 80% in 2013 and 79.9% in 2015. In comparison, systematic reviews fell from 80% in 2009 to 74% in 2015 for all three citation windows. Note that since citation data up to 2018 is used, the 10-year window basically represents all available citation data. Citation windows seem to play only a minor role: compared with the 10-year window, the 3-year window yields higher values for articles and reviews and slightly lower values for the standardized review formats. For items published in 2009, the shorter span reveals CPIN differences of +1.84% for articles, +2.62% for reviews, −0.03% for systematic reviews and −0.4% for PRISMA-standardized systematic reviews.

Fig. 6

Average annual cumulative percentile ranks (CPIN) for each document type and corresponding errors, grouped by three different citation windows. As 2018 is the maximum year of citation data, the 5- and 10-year windows are short of data from 2014 and 2009 onwards, respectively

The rankings are the same for all three citation windows. The structurally higher impact of systematic reviews and PRISMA-based systematic reviews is constantly decreasing, since both genres grow faster in publication counts than ordinary articles or reviews. In addition, the results show larger error values for PRISMA-based systematic reviews, since this group has the fewest items (6 items in 2009).

The differences in the changes of CPIN values indicate that the document types have different citation patterns, with articles and reviews being cited more consistently in the long run. At first glance, phenomena like delayed recognition (Ke et al., 2015) or the “kiss of death” (Lachance et al., 2014) may influence the long-term citation impact of primary research and syntheses differently. However, the different intellectual function of the systematic review, whether standardized or not, in comparison to what scholars now call the “narrative review” (Ferrari, 2015) may explain why the former has a shorter citation lifespan. Since systematic reviews are usually designed to achieve consensus on a very particular research question, they draw much more on recent research and become outdated by new findings. Ideally, they are updated by the same or other authors whenever new trial results or review methods emerge (Elliott et al., 2017). This limits their potential citation lifespan, since medical experts are supposed to rely on the most recent findings. In contrast, common reviews are usually of broader scope and may serve a wider array of intellectual functions, such as providing introductory or educational material, defining a field’s current missions or trends, or even facilitating field formation (Grant & Booth, 2009).

The citation analysis provided here shows that PRISMA-standardized systematic reviews have a higher citation impact than systematic reviews, reviews or primary articles. In the following, two lines of discussion are proposed.

First, research that complies with reporting standards may be credited with higher citation impact because it is of higher quality (1). While the conception that ‘citation indicates quality’ still persists in biomedicine and elsewhere (for example, Abramo & D’Angelo, 2011; Durieux & Gevenois, 2010), authors from other domains at least uphold the idea that citations correlate with specific epistemic values such as rigor or relevance (Molléri et al., 2018). Such a rather traditional conception of impact was common during the early phase of evaluative bibliometrics but is only found occasionally today. Speaking of a Kuhnian revolution in bibliometrics, Bornmann and Haunschild argue that in contemporary evaluative bibliometrics, citation impact should be considered only one aspect of a multidimensional concept of research quality, especially since there are many different citation behaviors (Bornmann & Haunschild, 2017). For example, authors cite publications because they value the latter’s solidity and plausibility, originality and novelty, scientific value such as topical relevance, or societal value, complex dimensions that sometimes even conflict with each other (Aksnes et al., 2019).

Based on the interpretation that greater quality leads to higher citation impact, there is an important limitation to this study. In the analysis, all systematic reviews that cite the guideline, and thereby comply with its first requirement, are assigned equally to the standardized systematic review set. But meta-studies in biomedicine have shown that the level and quality of guideline compliance vary considerably (Pussegoda et al., 2017). Therefore, to uphold a rather mechanistic relation between quality and citation impact, further research would have to normalize the observed citations by the level of guideline compliance.

Second, beside the interpretation provided above, the results of this study may feed into another interpretation of standards and scientific excellence: whatever makes research high impact also makes it more open towards standardization and more likely to adopt new methodological standards (2). Since biomedicine can be understood as what Richard Whitley has called a “professional adhocracy” (Whitley, 2000, 187), standards emerge to establish new research practices or tools and to enable formal communication. Authoritative organizations like Cochrane and leading experts in the development of standards form networks in which they negotiate the content and domain of a new standard (Solomon, 2015). By finding consensus, those networks redefine the borders of what counts as a properly reported systematic review, which excludes those reviews that do not comply with the standard (for example, Yuan & Hunt, 2009; see also Gieryn, 1999). In this setup, high impact researchers also serve as first movers in adopting the standard. After the leading scientists have defined these standards, other researchers follow in order to belong to those who publish the ‘good’ systematic reviews. Similar dynamics have been observed for academic communities consenting on theories or methods (cf. Luetge, 2004; Zollman, 2007), and for the role of reviews in the formation of new disciplines (Blümel, 2020).

With respect to these two interpretations, the contribution of this analysis to bibliometric research is twofold. Generally, the results reveal some basic characteristics and dynamics of a specific standardization in biomedical research. Together with further research questions about the dissemination and application of PRISMA, for example the characteristics of its developers or adopters, this analysis represents an informative use of bibliometric data in studying the social aspects of biomedicine and of science in general (Gläser & Laudel, 2001; Wyatt et al., 2017).

But identifying standardized research can also contribute to the evaluation of scientific outputs and the development of institutional quality frameworks. As mentioned above, reporting guidelines such as PRISMA have strengthened conceptions of transparency as a fundamental aspect of what has to be considered good research. Not surprisingly, reporting guidelines play an important role in the Hong Kong principles for assessing researchers (Moher et al., 2020). Bibliometric data offers the potential to interpret citations in a more diverse manner, for example by examining the role of method papers, software packages or standards (cf. Li et al., 2017). Similarly, bibliometric analyses of reporting guidelines can enable research evaluations in this respect. In addition, more fine-grained discrimination of document types can ensure that published items are evaluated according to their intellectual contribution.

Conclusion

This study provided a comparative analysis of different document types in biomedical research. In order to understand the relation between methodological quality and citation impact, indexing data from PubMed was used to differentiate between ordinary systematic reviews and those that comply with the PRISMA reporting standard. Besides providing a general overview of the growth and dissemination of systematic reviews and their standardized counterparts, their citation impact was compared using different indicators.

The results show that although the growth rates of biomedical publications overall are decreasing, systematic reviews and PRISMA-based systematic reviews still show strong annual growth. In terms of subject categories, both types of systematic reviewing occur especially in fields related to epidemiology and public health. Although the number of assigned subject categories grew, a great portion of PRISMA-based systematic reviews is assigned to fields in which the original guideline was published. While the top producers of scientific literature also dominate the numbers of published systematic reviews, countries with a strong focus on evidence-based medicine in particular achieve higher portions of systematic reviews and PRISMA-based reviews. In addition, China is the fourth biggest producer of systematic reviews and shows substantially higher rates than other leading nations.

Ranking the citation impact of the different document types revealed that PRISMA-based systematic reviews dominate irrespective of indicator and citation window. It was discussed that this dominance could support the idea that methodological quality leads to higher citation impact and explain why this conception persists although it is dismissed in contemporary science studies. Alternatively, the results may show that whatever makes authors achieve high citation impact also leads them to willingly apply new methodological standards. Irrespective of which interpretation one favors, bibliometric research could benefit from a more nuanced differentiation of document types in order to evaluate them with respect to their intellectual roles.