Introduction

In Germany, clinical researchers are incentivized to publish in high-ranking journals. These journals are ranked by their journal impact factor (JIF), which is calculated by dividing the number of citations a journal receives in a given year to articles it published in the previous 2 years by the number of articles it published in those 2 years. On the one hand, the JIF has been praised for its objectivity as a scientific performance indicator as opposed to peer review networks. On the other hand, the deficiencies of the JIF are well documented, including its vulnerability to various forms of “gaming” and the fact that research from lesser-known or more specialized fields is less likely to attract high citation counts (Alberts, 2013; PLoS Medicine, 2006). When evaluated by the JIF, authors are encouraged to pursue safe topics that a large number of others can reference rather than unique or specialist work (Alberts, 2013). While the JIF measures impact through journal citations and provides evidence of scientific impact through the spread of citations in the scientific community, medical guidelines might provide insight into which research affects patient care (Eriksson et al., 2020) and inherently have societal impact through policy (Tousoulis & Stefanadis, 2014). However, most performance or evaluation measures do not acknowledge contributions to medical guidelines through either cited references or guideline authorship (Herrmann-Lingen et al., 2014).
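To make the JIF arithmetic concrete, the following is a minimal sketch of the standard 2-year calculation. The function name and the journal figures are hypothetical and serve only as an illustration; they are not taken from our data.

```python
def journal_impact_factor(citations_in_year: int, items_prev_two_years: int) -> float:
    """Two-year JIF for year Y.

    citations_in_year: citations received in year Y to items the journal
        published in years Y-1 and Y-2.
    items_prev_two_years: number of citable items the journal published
        in years Y-1 and Y-2.
    """
    if items_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_in_year / items_prev_two_years


# Hypothetical journal: 1,200 citations in 2017 to its 2015-2016 items,
# of which there were 400.
example_jif = journal_impact_factor(1200, 400)
```

Any citation an article receives more than 2 years after its publication does not enter this ratio.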

The Association of the Scientific Medical Societies (AWMF) serves as the foremost publisher of medical guidelines in Germany. The AWMF is a professional, scientific network consisting of 180 member societies and three associated societies from the whole range of medical specialties and health-related areas. This network advises the government of the Federal Republic of Germany and the governments of the German federal states on all topics of scientific medicine and medical research and classification. It represents Germany in the Council for International Organizations of Medical Sciences (CIOMS). In 1995, the Advisory for Concerted Action in Health Care requested that the AWMF and its scientific members create a quality-controlled collection of medical recommendations. Hereafter, these recommendations are called the AWMF medical guidelines. These guidelines are an accumulation of scientific research intended to provide relevant diagnostic, preventative, and treatment information (Kryl et al., 2012) and to help bridge the gap between research and practice (Burgers et al., 2002).

Ovseiko et al. (2012) suggest that citation in the guidelines demonstrates utility. It is difficult to argue against incentivizing utilizable research, because it benefits researchers, the public, and policy, and provides health gains (Hanney et al., 2003). While contributions to the medical guidelines might be considered a worthy measure of societal impact to be incentivized, evidence-based policy change is challenging and research-intensive (Ovseiko et al., 2012). Considering the challenges involved in policy change, this article explores to what extent a current performance indicator (the JIF) already captures the impact of clinical guideline contributions by investigating whether and to what extent the journal impact factor of a reference is relevant in guideline development.

We hypothesize that (a) the 2–5 year JIF window does not reflect the “knowledge cycle” from article production to inclusion in the AWMF medical guidelines (Grant et al., 2000); (b) articles cited as justification for specific guideline recommendations are typically also cited more often by other publications than articles only mentioned in the guideline background text; and (c) the number of times the guidelines cite a journal is not correlated with that journal’s JIF. Addressing these issues should provide clarity on whether or not current evaluation solutions are sufficient to reflect contributions to the AWMF medical guidelines as a measure of impact.

Methods

We identified high-quality medical guidelines from the AWMF website. Following the identification of the appropriate guidelines for this study, we explored the type of cited references (books, articles, or reviews) through a reference analysis. In an additional reference analysis, we drew a sample of the guidelines and reviewed whether the cited literature only provided background information or was used to support recommendations. Through a publication date analysis, we investigated how long after publication articles were considered relevant. We were mainly interested in how many references had been published during the 2 years preceding guideline publication (“2-year JIF window”). Lastly, we investigated the relevance of the JIF for selecting references for inclusion in the guidelines by performing two sets of correlation tests between the number of guideline citations to a journal and that journal’s JIF: one across the guidelines as a whole and one within topic-related guidelines.

Medical guidelines

We identified the S2e and S3 medical guidelines from 2017 and 2018 as high-quality guidelines relevant to our study (see Online Appendix A). These guidelines are categorized as either “Evidence-based” (S2e) or “Evidence- and consensus-based” (S3; see Table 1). They were extracted in January 2019 from the AWMF website (https://www.awmf.org/leitlinien/leitlinien-suche.html). The medical societies in charge and the guidelines they coordinated were documented based on the subject code of each guideline. The number of guidelines by medical society in charge can be found in Online Appendix B.

Table 1 Description of the two evidence levels of the included guidelines

The guidelines are available as PDF documents and have neither metadata nor separate, automatically extractable reference lists. The references were therefore extracted, using the ‘PDF-XChange Editor’ (Tracker Software Products (Canada) Ltd., Version 7), into an Excel sheet and then imported into ‘Citavi’ (Swiss Academic Software GmbH, Version 6.3), a reference manager. The reference manager matched the imported references with online databases (mainly PubMed and Crossref) and filled in relevant bibliographic data (authors, title, journal, year published, affiliations, etc.) when possible. References that were not matched in Citavi were reviewed by hand and completed when possible. The references were organized by the guidelines in which they were cited.

Reference and publication date analyses

The type of each reference was identified using the reference manager’s search fields. The Review, Guideline, and Book categories were determined by searching in the ‘title’ field for ‘review,’ ‘guideline,’ or ‘Leitlinie’ and in the ‘reference type’ field for ‘book’ and ‘book, edited’. All other references were included in the category Original Article. The references of the guidelines were sorted by publication year and counted. These citations, organized by publication year, were displayed with the 2-year window preceding guideline publication explicitly highlighted, in order to compare the relevance of newer (from an impact factor perspective) and older references in the guidelines. This analysis was performed both jointly for references from guidelines published in 2017 and 2018 (Table 2) and separately for each year (Online Appendix C).
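The categorization described above can be sketched as a simple rule-based classifier. This is an illustration only: the function name is hypothetical, the matching is a plain case-insensitive substring search, and the precedence applied when a reference matches several searches (book first, then guideline, then review) is an assumption, since the searches were run in the reference manager rather than in code.

```python
BOOK_TYPES = {"book", "book, edited"}


def classify_reference(title: str, reference_type: str) -> str:
    """Assign one category to a reference from its title and reference type.

    Assumption: a reference matching more than one search is assigned to
    the first matching category in the order Book, Guideline, Review.
    """
    title_lower = title.lower()
    if reference_type.strip().lower() in BOOK_TYPES:
        return "Book"
    if "guideline" in title_lower or "leitlinie" in title_lower:
        return "Guideline"
    if "review" in title_lower:
        return "Review"
    # Everything that matches none of the searches is an original article.
    return "Original Article"
```

A title search of this kind will miss reviews and guidelines whose titles do not contain the search terms, so the resulting category counts are best read as approximations.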

Table 2 Type of references broken down by time frame

Background vs. recommendation

A sample of the guidelines, characterized by author affiliation to at least one of three pilot faculties in Germany, was taken. References of the guidelines authored by members of at least one of these affiliations were qualitatively reviewed for whether they directly supported clinical recommendations or whether they were only cited in the background text, providing context to the recommendations.

Top journal citations: JIF correlations and representation in the guidelines

The references from the top 100 cited journals in the medical guidelines were counted. The journals, sorted by citation count, can be found in the appendix with their respective 2017 (or last recorded) JIFs, citation and JIF rankings, and their relative and cumulative citations (Online Appendix D). We assessed to what extent the number of citations correlates with the journal impact factor in both the top 50 and top 100 cited journals. Lastly, the relative and cumulative citations were calculated for each of the journals in the top 100 journal data set.
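As an illustration of the relative and cumulative citation calculation, the following minimal sketch ranks journals by how often they are cited and reports each journal's share of all citations. The function name and the toy input are hypothetical; the input is assumed to be one journal name per extracted reference.

```python
from collections import Counter
from itertools import accumulate


def citation_shares(journal_per_reference, top_n):
    """Return (journal, count, relative share, cumulative share) for the
    top_n most cited journals. Shares are fractions of ALL references,
    not only of those in the top_n journals."""
    counts = Counter(journal_per_reference)
    total = sum(counts.values())
    ranked = counts.most_common(top_n)
    relative = [count / total for _, count in ranked]
    cumulative = list(accumulate(relative))
    return [
        (journal, count, rel, cum)
        for (journal, count), rel, cum in zip(ranked, relative, cumulative)
    ]


# Toy data: ten references spread over three journals.
toy_references = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
toy_shares = citation_shares(toy_references, top_n=2)
```

The cumulative share of the last row shows how much of the guidelines' citation volume the selected top journals account for.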

Top journal citations: JIF correlations within guideline categories

We randomly selected three guidelines from each of the five medical societies that have developed the most guidelines (see Online Appendix B). For each guideline, the cited journals were sorted by citation count, and when possible, the journal impact factor of the guideline’s publication year for each cited journal was recorded. For journals with no impact factor, we used a value of 0.2 as recommended by the AWMF for purposes of research fund allocation. For further analysis, we computed Kendall’s Tau correlations between citation counts and the respective journal impact factors. Correlations were calculated not only for the individual guidelines in the specific category but also for the entire guideline subject.
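The correlation step can be sketched in a few lines of plain Python. This is an illustrative sketch, not the software used in the study: it assumes the tie-corrected variant of Kendall's Tau (Tau-b), does not compute p-values, and uses hypothetical function names and toy values. Only the fallback value of 0.2 for journals without a JIF is taken from the description above.

```python
from itertools import combinations
from math import sqrt

FALLBACK_JIF = 0.2  # value used for journals with no impact factor


def fill_missing_jif(jifs):
    """Replace missing impact factors (None) with the fallback value."""
    return [FALLBACK_JIF if jif is None else jif for jif in jifs]


def kendall_tau_b(x, y):
    """Kendall's Tau-b between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both terms
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("Tau is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Toy guideline: citation counts per journal and their JIFs (one missing).
toy_citations = [30, 12, 9, 4]
toy_jifs = fill_missing_jif([5.1, None, 2.3, 1.0])
toy_tau = kendall_tau_b(toy_citations, toy_jifs)
```

Because Kendall's Tau depends only on rank order, the fallback value matters only through where it places journals without a JIF relative to the others, here below every journal with a JIF above 0.2.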

Results

Medical guidelines

Seventy medical practice guidelines that met the criteria were found (see Online Appendix A): 31 published in 2017 and 39 published in 2018. Of the 70 guidelines, 62 are categorized as level S3 and 8 as level S2e. Four of the 70 guidelines were only available as a ‘Konsultationsfassung’ (consultation version), a publicly available version of the guideline that is largely finished but still open for feedback by experts.

Reference and publication date analyses

A total of 33,473 references were extracted from the 70 guidelines. Of these, 31,894 could be specified with at least basic information, including title, author(s), and year of publication. Figure 1 shows the temporal distribution of all identified references from the 70 guidelines. The references date from 1775 to 2018. In order to establish a 2-year JIF comparison, references published in the 2 years preceding their respective 2017 or 2018 guidelines were counted. For guidelines published in 2017, the proportion of references published in the two previous years (2015 and 2016) was 7.9%. For guidelines published in 2018, the proportion of references published in the two previous years was 10.1%, while > 50% of references had been published before 2010 (see Online Appendix C). In summary, the majority of the references in the guidelines fall outside the JIF window.
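The 2-year window count reported above can be expressed as a short calculation. The sketch below is illustrative: the function name and the toy publication years are hypothetical, and the window is taken to be the two calendar years before the guideline's publication year (for example, 2015 and 2016 for a 2017 guideline), so references from the guideline's own publication year fall outside it.

```python
def share_in_window(reference_years, guideline_year, window=2):
    """Fraction of references published in the `window` calendar years
    preceding the guideline's publication year."""
    if not reference_years:
        raise ValueError("at least one reference year is required")
    in_window = sum(
        1
        for year in reference_years
        if guideline_year - window <= year < guideline_year
    )
    return in_window / len(reference_years)


# Toy reference list for a hypothetical guideline published in 2017.
toy_years = [1998, 2005, 2009, 2012, 2015, 2016, 2016, 2017]
toy_share = share_in_window(toy_years, guideline_year=2017)
```

Setting `window=5` gives the corresponding share for a 5-year window.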

Fig. 1 Guideline reference publication year and reference type

A total of 26,577 references included additional publication data for analysis. In the guidelines, 86.2% of references were original articles, 9.49% of references with sufficient publication data were reviews, 3.73% of references were books, and 3.39% were guidelines. Table 2 outlines the reference types broken down by time frame.

Background vs. recommendation

Of the 401 references authored by members of our pilot faculties sampled from the guidelines, 270 provide context as background references only and 131 provide a clear rationale for clinical recommendations (see Fig. 2).

Fig. 2 Pilot faculty background and recommendations in the medical guidelines

Results indicate that articles providing solely background information (M = 8.71, SD = 10.8) are cited more often within the guidelines than articles supporting clinical recommendations (M = 4.23, SD = 5.52), t(60) = 2.0522, p = 0.044. However, citations by other published articles were more frequent for references supporting clinical recommendations (13.0 citations per year from publication on average) than for those providing background information only (7.8 citations per year from publication on average).

Top journals cited by the guidelines

The figures below show how the citations and the 2017 journal impact factors of journals are represented in the S2e and S3 guidelines, with and without the top 3 outliers (NEJM, Lancet, JAMA, each showing JIF values > 40). Figure 3 shows the citation numbers of the top 100 journals and their journal impact factors with a trend line; Fig. 4 shows the same data without the three outliers. See Online Appendix D for a list of the top 100 journals. The slope of the trend line indicates a positive correlation between the number of citations and the journal impact factor.

Fig. 3 Top 100 citations and their 2017 journal impact factors

Fig. 4 Top 100 citations and their 2017 journal impact factors without outliers

However, the correlation was only of moderate size (Tau = 0.35) and further decreased to Tau = 0.31 when excluding the top 3 journals with extraordinarily high JIFs. The effect of the outliers was even more pronounced when only the top 50 journals were considered, where the number of citations showed only small correlations with the JIF (Tau = 0.22 overall and Tau = 0.10 without outliers). Interestingly, the two journals most cited by the guidelines (Journal of Clinical Oncology, Cochrane Database of Systematic Reviews) show much lower JIFs than the typical “high impact” journals.

Guideline representation

Lastly, we explored relative and cumulative citations. The top 50 journals in the guidelines represent 44.02% of guideline citations, and the top 100 journals in the guidelines represent 57.95% of guideline citations. All journals after the 11th citation-ranked journal each represent less than 1% of the total citations in the guidelines. Journals after the 100th citation-ranked journal each represent less than 0.192% of the total citations in the guidelines. The relative and cumulative citations for the most cited 100 journals can be found in Online Appendix D.

Guideline categories

Kendall’s Tau correlations between citation count and JIF were calculated for three guidelines from each of the subjects 021 (Gastroenterology, Digestive and Metabolic Diseases), 032 (Cancer), 043 (Urology), 053 (General and Family Medicine), and 083 (Dentistry, Oral, and Maxillofacial Medicine). These correlations and the years of guideline publication can be found in Table 3. Figure 5a and b show the most cited journals in sample guidelines from Urology (code 043) and General Practice (code 053). Similar to Figs. 3 and 4, the figures show scatterplots of the most cited journals and their respective journal impact factors, now broken down to the level of individual guidelines. As can be seen, most correlations for individual guidelines or subject areas were also low and positive, and most were not significant. Some of the correlations were even negative, though also not significantly so. Only one of the guidelines (053-024) showed a significantly positive correlation of moderate size (Tau = 0.497; p = 0.001).

Table 3 Kendall’s Tau correlations of the 021, 032, 043, 053, and 083 guidelines

Fig. 5 a Journal citations and JIF of Guideline 043-044, b Journal citations and JIF of Guideline 053-024

Discussion

Our study analyzed citations by German high-quality medical guidelines. Because these guidelines have not yet been studied from a bibliometric perspective, we found relevant new information on the temporal distribution of guideline references, citation frequencies for references in background text vs. clinical recommendations, and correlations between journal citation numbers by the guidelines and the respective journals’ impact factors.

The Web of Science shows decreasing citation trends 2–3 years after publication (Eriksson et al., 2020). We found, however, that even though individual recent publication years are cited more often in the AWMF clinical guidelines than individual older publication years, less than 10% of references were from the 2 years preceding guideline publication, and more than 50% of guideline references were more than 8 years old. These findings are in agreement with Tousoulis et al. (2014), who also report that older publications are still frequently cited by medical guidelines, which is a testament to their continued relevance beyond the JIF window.

In the development of the AWMF clinical guidelines, journal articles are preferred over books, reviews, and other guidelines. The guidelines also cite about twice as many background references as references supporting direct recommendations, although the latter are cited more often externally. Also, when looking at the guidelines as a whole, there was at most a moderate correlation (Tau ≤ 0.35) between guideline citation numbers for particular journals and the JIF. More importantly, correlations are very low when looking at most guidelines from a guideline-specific or subject-specific perspective. Furthermore, references in the guidelines maintain usefulness regardless of their impact factor. For example, the Cochrane Database of Systematic Reviews had a moderate 2017 journal impact factor of 6.754. However, it is cited 811 times in the guidelines, making it the second most-cited source journal. At the other end of the spectrum, CA-A Cancer Journal for Clinicians boasts a JIF of 244.585 (2017) but is cited only 18 times in the guidelines and did not even make it into the top 100 journals. This suggests that authors of the guidelines scan through hundreds of different sources, independent of JIF, to meet their needs.

Strengths and limitations

The strength of our analyses lies in the full coverage of German high-quality guidelines from two successive years, yielding a good representation of guidelines across the whole field of medicine and of the references cited by them.

One issue in bibliometric studies is the accuracy and completeness of the retrievable data, especially when it comes to dated research. While we were able to retrieve an extraordinary number of references, their completeness and accuracy are not perfect. Some references were missing data, and the Citavi reference manager may have introduced some errors through its automated recognition of sources. Determining to what extent the downloaded metadata of the references are accurate would require a manual check of over 33,000 references. Such a manual check might also provide some insight into what types of data are missing.

Moreover, the AWMF medical guidelines currently lack metadata or labeled information. The references of the guidelines are not marked with successor-predecessor indicators and each guideline would need to be reviewed by hand to determine which references continue to be relevant, which references newly contribute to the guideline topic, or which references are simply carried over from previous versions.

We agree with Eriksson et al. (2020) that the AWMF medical guidelines, like other clinical practice guidelines, would benefit from well-managed and well-labeled references in a digital database. Not only would this provide transparency and help readers of the guidelines quickly review changes, but it would also help interested parties analyze the guidelines and provide a richer source for impact evaluation.

Conclusion

Whether or not it fully captures the scope of research impact, the journal impact factor plays a role when evaluating research or researchers. Any discussion of research evaluation should therefore start by investigating to what extent the JIF can be used to measure or evaluate other forms of impact. With that said, our first findings suggest that the temporal distribution of guideline references exceeds the 2 or even 5 year “JIF window.” The relevance of older and newer articles could be reviewed with comprehensive labeling, review, or justification of the use of older citations (e.g., this citation continues to be relevant because the treatment continues to be used widely in clinics across Germany). This inquiry in particular could provide insights into not only how the guidelines are updated but also how long research continues to ‘impact’ patient care.

Also of interest is the difference in which research is cited more often. As it stands, research supporting clinical recommendations is cited more often externally and is therefore more likely to be rewarded. It is conceivable that integrating a medical guideline performance indicator could, to some extent, balance the incentives for producing quality research whether it provides clear clinical recommendations or not.

Altogether, our study found that the journal impact factor plays no relevant role in the medical guidelines, either with regard to publication dates or as a criterion for the selection of references. In order to capture the impact of guideline references, a new indicator should be considered and developed that reflects not only the “knowledge cycle” (Grant et al., 2000) from article to guideline publication but also the societal impact. While medical guidelines innately demonstrate societal value, further study might show whether the references cited by the AWMF guidelines, like those cited by many other clinical guidelines, have higher citation rates than other articles in their respective journals (Thelwall et al., 2017), which would favor evaluating the citations of the references themselves rather than the JIF of their journals.