Introduction

Reference Publication Year Spectroscopy (RPYS) is a method developed for detecting the historical roots and landmark papers of a research field (Liao et al., 2019; Thor et al., 2018b, 2021). RPYS visualizes how frequently cited references occur in the reference lists of papers published in the field (Marx et al., 2014). Historical roots and landmark papers are those cited publications that occur very frequently in these reference lists compared to other cited publications. RPYS has also been shown to reveal historical aspects of the research and publication culture of various fields of study. Wray and Bornmann (2015), for example, employed RPYS in a study of philosophy of science. They found that some of the cited publications with many occurrences are objects of study in philosophy of science rather than publications that are influential because of their scholarly contribution to philosophy of science itself.

In this study, we employ RPYS to better understand the historical research and publication culture of immunology. Our study focuses on the period from 1980 to 2021 (publication years of the citing publications from the field). This period includes at least two major medical events involving immunology, namely the AIDS-HIV epidemic and the COVID-19 pandemic. It also saw major advances in the understanding of innate immunity and T cell biology.

Our analyses focus on 20 cited publications that occur very frequently in the reference lists of immunology papers published in the period under study. We found that these publications not only report important discoveries in immunology but also move the field forward by introducing new methodologies. In fact, 35% of the publications present methodological developments. These developments, though, frequently have applications well beyond immunology. Consequently, they are not aptly characterized as contributions to immunology narrowly conceived. We also found that another 35% of the cited publications with many occurrences concern research on T cells, which play a crucial role in the immune system. The remaining 30% report other conceptual, theoretical, or empirical discoveries in immunology.

Methods

In contrast to times cited analysis, which is prospectively oriented, cited references analysis is historically oriented: it asks on whose shoulders research in a certain field (here: immunology) stands. The starting point of a cited references analysis is the set of citing publications from the field in question. The analysis counts how frequently each cited reference occurs across the reference lists of these citing publications. The identification of cited publications with many occurrences (possible landmark papers of the field) can be supported by spectrograms that show the distribution of the number of cited references over their publication years. Peaks in these distributions point to cited references with many occurrences. The use of such spectrograms in cited references analysis is called RPYS.
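The counting step described above can be sketched in Python as follows. This is a minimal illustration, not the CRExplorer implementation: it assumes that variants of cited references have already been merged, so that each cited publication is represented by one hypothetical identifier paired with its reference publication year (RPY). The function names and toy data are hypothetical.

```python
from collections import Counter

def occurrences_per_reference(citing_papers):
    """Count in how many reference lists each cited reference occurs.

    citing_papers: list of reference lists; each cited reference is a
    (reference_id, reference_publication_year) tuple (hypothetical format).
    """
    counts = Counter()
    for reference_list in citing_papers:
        # A reference listed twice in the same paper is counted once.
        counts.update(set(reference_list))
    return counts

def spectrogram(citing_papers):
    """Sum the occurrences of all cited references per reference publication year."""
    per_year = Counter()
    for (_ref_id, rpy), n in occurrences_per_reference(citing_papers).items():
        per_year[rpy] += n
    return dict(sorted(per_year.items()))

# Toy data (three citing papers), not taken from the study.
papers = [
    [("ref_a", 1951), ("ref_b", 1970)],
    [("ref_a", 1951)],
    [("ref_b", 1970), ("ref_c", 1981)],
]
print(spectrogram(papers))  # {1951: 2, 1970: 2, 1981: 1}
```

The per-year sums are the grey bars of a spectrogram; the per-reference counts are what later identifies the individual publications behind a peak.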

We analyzed the RPYS spectrogram for relevant peaks in the deviation from the five-year median. If the deviation for reference publication year t is large and positive, considerably more cited references (from the immunology field) fall in year t than in the surrounding years t-2, t-1, t+1, and t+2. RPYS rests on the premise that important publications for researchers in a field, in this case immunology, are frequently cited in their own publications (Bornmann & Marx, 2014). High peaks, that is, large positive deviations, therefore hint at important publications in the field; peaks in early reference publication years in particular point to the historical roots, landmark papers, or citation classics of immunology.
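The five-year median deviation can be written as \(N(t) - \operatorname{median}\{N(t-2), \dots, N(t+2)\}\), where \(N(t)\) is the number of cited references in reference publication year t. The sketch below assumes this commonly used definition, with year t itself included in the window. It also assumes, as a hypothetical simplification, that years outside the observed range count as zero; edge handling in CRExplorer or BibPlots may differ.

```python
from statistics import median

def median_deviation(counts_by_year):
    """Deviation of each year's count N(t) from the median of N(t-2), ..., N(t+2).

    Years missing from counts_by_year (including those beyond either end of
    the observed range) are treated as having zero cited references.
    """
    first, last = min(counts_by_year), max(counts_by_year)
    deviations = {}
    for t in range(first, last + 1):
        window = [counts_by_year.get(y, 0) for y in range(t - 2, t + 3)]
        deviations[t] = counts_by_year.get(t, 0) - median(window)
    return deviations

# Toy counts with a pronounced peak in 2002 (not study data).
counts = {2000: 10, 2001: 12, 2002: 50, 2003: 11, 2004: 9}
print(median_deviation(counts))
# {2000: 0, 2001: 1, 2002: 39, 2003: 0, 2004: 0}
```

Using the median rather than the mean keeps a single very high year from inflating the baseline of its neighbours, so the 2002 peak stands out clearly.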

Tukey’s fences (Tukey, 1977) were used to support the identification of the most important peaks in the spectrogram: important peaks are flagged on the basis of the interquartile range of the median deviations (Thor et al., 2018a). The RPYS spectrogram was plotted in R (R Core Team, 2021) with the R package ‘BibPlots’ (Haunschild, 2021). In addition to the static plot in this paper, we produced an interactive version of the spectrogram with the R package ‘dygraphs’ (Vanderkam et al., 2018).
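A minimal sketch of peak flagging with Tukey’s fences: a year is flagged when its median deviation lies above the upper fence \(Q_3 + k \cdot \mathrm{IQR}\), where \(\mathrm{IQR} = Q_3 - Q_1\). The multiplier k = 1.5 (Tukey’s conventional choice; k = 3 marks "far out" values) and the quartile method are assumptions; the exact settings used by Thor et al. (2018a) and BibPlots may differ.

```python
from statistics import quantiles

def tukey_peaks(deviations, k=1.5):
    """Return the years whose median deviation exceeds the upper Tukey fence.

    deviations: mapping year -> median deviation.
    k: fence multiplier (assumed 1.5 here).
    """
    # Quartiles via the "inclusive" method, which interpolates linearly
    # between the sorted data points.
    q1, _, q3 = quantiles(deviations.values(), n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return sorted(year for year, d in deviations.items() if d > upper_fence)

# Toy median deviations (not study data): only 1994 stands out.
deviations = {1990: 0, 1991: 2, 1992: -1, 1993: 1,
              1994: 30, 1995: 0, 1996: 3, 1997: -2}
print(tukey_peaks(deviations))  # [1994]
```

Here Q1 = -0.25 and Q3 = 2.25, so the upper fence is 6.0; lowering k widens the set of flagged years.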

Dataset

This study is based on papers assigned to the Web of Science (WoS, Clarivate) subject category “Immunology” that were published between 1980 and 2021. The metadata of the papers were extracted from the custom database of the Kompetenznetzwerk Bibliometrie (see https://bibliometrie.info/). The extracted files were imported into CRExplorer 2.0 (see www.crexplorer.net) for further processing (Thor et al., 2016a, 2016b). The export contains metadata of 1,125,330 citing papers, including 6,270,528 cited references. We restricted the publication years of the cited references to the period between 1800 and 2022, as we did not expect important contributions to immunology to have been published before 1800. The clustering and merging functionalities of CRExplorer were used to clean the cited references dataset of variants (i.e., differently written references to the same cited publication); volume and page numbers were used for disambiguation. Furthermore, we removed all cited references that were referenced fewer than 100 times (i.e., cited publications with fewer than 100 occurrences) to focus on important publications with considerable citation impact. The final dataset used for RPYS contains metadata of 658,232 citing papers from 1980 to 2021 and 27,331 cited references from 1900 to 2021.
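The two restrictions described above (reference publication years from 1800 to 2022, at least 100 occurrences) amount to a simple filter on the disambiguated occurrence counts. The sketch below assumes hypothetical (reference_id, RPY) keys for illustration; CRExplorer’s clustering and merging of variants is not reproduced here.

```python
def filter_references(occurrences, min_occurrences=100,
                      first_year=1800, last_year=2022):
    """Keep cited references with an RPY in [first_year, last_year] that
    occur in at least min_occurrences reference lists."""
    return {
        (ref_id, rpy): n
        for (ref_id, rpy), n in occurrences.items()
        if first_year <= rpy <= last_year and n >= min_occurrences
    }

# Toy occurrence counts (not study data).
occurrences = {
    ("ref_a", 1951): 5000,  # kept
    ("ref_b", 1750): 300,   # RPY before 1800: removed
    ("ref_c", 1999): 99,    # fewer than 100 occurrences: removed
    ("ref_d", 2020): 100,   # exactly 100 occurrences: kept
}
print(filter_references(occurrences))
# {('ref_a', 1951): 5000, ('ref_d', 2020): 100}
```

Filtering after disambiguation matters: a publication split across several spelling variants could otherwise fall below the threshold even though its merged count exceeds it.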

Results

The Reference Publication Year Spectroscopy (RPYS)

Figure 1 shows the results of the RPYS analysis based on the citing publications in immunology. The figure presents the number of cited references per reference publication year (grey bars) and the deviation of this number from the median of the five-year window spanning two years on either side (blue line). The most important peaks identified by Tukey’s fences are marked by a star and labeled with the publication year of the cited references belonging to the peak.

Fig. 1

Number of cited references (grey bars) and median deviations of cited references (blue line). Peaks with positive values in the blue line indicate reference publication years with a substantially greater number of cited references than the surrounding years. The most important peaks identified by Tukey’s fences are marked by stars and labeled with the publication years of the cited references belonging to the peaks. An interactive version of the figure can be viewed at: https://s.gwdg.de/vX3Fr7. The years before 1930 are omitted in the static figure for better readability; the interactive online figure includes them.

Table 1 lists the publications responsible for the peaks in Fig. 1 that were identified by Tukey’s fences. The peaks are located at the following reference publication years: 1951, 1970, 1981, 1989, 1995, 1998, 2001, 2002, 2007, and 2020. Besides the bibliographic information, the table shows the number of occurrences of each cited publication, that is, how often it has been cited by papers in immunology. The selection of the cited references underlying the highest peaks in Fig. 1 is partly arbitrary, as it is based on visual inspection of the results in CRExplorer; there is no established statistical method for this selection. More or fewer cited references could therefore be selected as landmark papers of immunology.

Table 1 Cited references with the largest number of occurrences (N_CR) in reference publication years (RPYs) with the highest peaks in Fig. 1 identified by Tukey’s fences

Seven of the 20 publications (35%) in Table 1 that are mainly responsible for citation peaks are methodological papers. These papers are heterogeneous, reflecting the variety of methodological developments in immunology over this period. Sociologists of science have emphasized the important role that developments in instrumentation have played in the creation of scientific fields (see, for example, Law, 1976; Mulkay & Edge, 1976). An interesting history could thus be written about methodological developments in immunology since 1950 and their impact on the theoretical content of the field. Let us briefly consider the nature of these highly influential methodological papers.

The paper by Lowry et al. (1951) is a typical methodological paper, reporting an experimental procedure. As the authors explain: “procedures are described for measuring protein in solution or after precipitation with acids or other agents” (Lowry et al., 1951, p. 265). This procedure has broad application beyond immunology; it is used in practically every research discipline in which proteins are analyzed, including cell biology, infection biology, and protein chemistry. Similarly, the paper by Laemmli (1970) reports a new experimental method, specifically “a method of gel electrophoresis” (Laemmli, 1970, p. 680). This method allowed scientists to separate proteins by size and hence study individual proteins in more detail.

In contrast to these papers, which report laboratory techniques, Taswell (1981) reports on new methods of analysis. Taswell makes clear that “the method presented … provides a simple and rapid procedure for the valid determination of immunocompetent cell frequencies” (Taswell, 1981, p. 1614).

Burnette (1981) also reports a new experimental method, Western blotting, a technique for detecting proteins. Sambrook et al. (1989) is likewise a contribution to laboratory methods, but it is a laboratory manual that provides information on numerous methods; it was influential in the early days of molecular cloning. Livak and Schmittgen (2001) present a method of analysis, the \(2^{-\Delta\Delta C_T}\) method, for determining relative gene expression from “real-time quantitative PCR” data (Livak & Schmittgen, 2001, p. 402). Finally, Tamura et al. (2007) present new software for molecular evolutionary genetics analysis.

This brief survey of the methodological contributions responsible for citation peaks in immunology offers a glimpse of the factors that have shaped the field during the period covered by our study. In addition to developments in laboratory techniques, there have been significant advances in techniques of analysis, including the development of specialized software.

As many papers as the methodological contributions discussed above, seven in total, advance our understanding of T cells. T cells play an important role in immune responses and are the topic of three papers from 1981 that are responsible for a citation peak: Kappler et al. (1981), Uchiyama et al. (1981), and Robb et al. (1981). Mosmann and Coffman (1989) and Sakaguchi et al. (1995) are also concerned with T cells. These papers are all concerned with how T cells “learn” tolerance against self and with the identification of different T cell subsets with different functions. Two papers from 2007 are likewise concerned with T cells: Coombes et al. (2007) and Bendelac et al. (2007). Coombes et al. (2007) provide mechanistic information about how immunoregulatory T cells (Tregs) work, and Bendelac et al. (2007) is a review article that summarizes data on a then-novel immune cell type.

The remaining six papers in Table 1 that are associated with citation peaks cover a variety of topics, but all constitute contributions to empirical or theoretical knowledge in immunology rather than methodological advances. Przepiorka et al. (1995) is concerned with transplants; rather than a typical research article, this paper is a conference proceedings publication. Banchereau and Steinman (1998) is concerned with dendritic cells. Poltorak et al. (1998) is concerned with defective signaling and uses mouse models. This paper is important because it confirms Charles Janeway’s (1989) postulate that cells are endowed with a set of germline-encoded receptors that sense microbial molecules and induce inflammation and activation of T cells. Palella et al. (1998) deal with antiretroviral therapies in the context of AIDS-HIV research. Janeway and Medzhitov (2002) is concerned with Toll-like receptors (TLRs), following the prediction of their existence by Janeway (1989) and their experimental confirmation between 1996 and 1998 (see Lemaitre et al., 1996; Medzhitov et al., 1997; Poltorak et al., 1998). Huang et al. (2020) focus on COVID-19. Both Palella et al. (1998) and Huang et al. (2020) respond to widespread health crises, AIDS-HIV in the first case and COVID-19 in the second.

Part of the motivation for our study was a hunch that Janeway (1989) was a significant paper in immunology. With 961 occurrences (N_CR), the paper was cited frequently, but it is not at the top of the cited references list for 1989. A times cited analysis in Clarivate’s Web of Science database reveals that Janeway (1989) received only about half of its citations from immunology. About another thousand citations originate from other fields related to biomedical research, for example “Biochemistry Molecular Biology”, “Cell Biology”, “Medicine Experimental”, and “Microbiology”. We did find, however, that another paper by Janeway, Janeway and Medzhitov (2002), was responsible for one of the peaks in the spectrogram. Whereas Janeway (1989) proposed the pattern recognition theory, Janeway and Medzhitov (2002) review the field after the discovery of receptors and mechanisms that strongly supported this theory.

Scientific authors and size of research teams

The results of the RPYS analysis shed some light on the epistemic culture of immunology. Citation peaks in our dataset are frequently caused by developments in methodology, but conceptual, theoretical, and empirical discoveries account for 65% of the publications responsible for citation peaks in our period of study, 1980–2021 (citing years) and 1800–2022 (cited years). Of the 13 publications that are empirical or theoretical contributions to immunology rather than methodological contributions, four are review articles. The remaining nine are research articles.

We now briefly examine the size of the research teams responsible for the 20 contributions that are the focus of our analyses. Knorr-Cetina (1999) found that the disciplines she studied, high-energy physics and molecular biology, had very different authorship and collaboration practices. As we will see, and not surprisingly, research teams publishing in immunology are more similar to those in molecular biology than to those in high-energy physics, where massive research teams are the norm.

Team sizes for the 20 publications range from a single scientist to 29 scientists. Three of the papers (15%) are single-authored. This is somewhat surprising, as only about 1% of the papers published in the journal Science are single-authored (see Wray & Andersen, 2018). Including the single-authored papers, the mean team size for these 20 publications is 5.2 scientists and the median is 3. These publications were thus generally not authored by large teams. This result is consistent with Wu et al. (2019), who found that small teams are better able to produce disruptive results than large teams. Many factors affect team size in immunology; for instance, larger teams can gather data more effectively and efficiently. Moreover, since some of the highly cited publications are reviews, the number of authors on these papers does not necessarily reflect the size of the teams behind the experimental articles that form the basis of the topics discussed in the reviews.

As Fig. 2 shows, the size of the teams producing these publications increases as we approach the present, following a linear trend, although the fit is modest (R² = 0.2572). The most recent single-authored publications in this group date from 1981. It thus appears that opportunities for an individual scientist to produce highly influential research in immunology have decreased. But a team of only three scientists was responsible for one of the articles published in 2007, which suggests that small teams are still able to make highly impactful contributions to the field.

Fig. 2

Number of authors (team size) by year of publication. Each dot represents one of the 20 selected publications. R² = 0.2572
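The text does not state how the linear trend line and its R² in Fig. 2 were computed; a common choice is an ordinary least-squares fit, sketched below. The toy data are hypothetical and are not the team sizes of the 20 publications. An R² of about 0.26 means that publication year accounts for roughly a quarter of the variance in team size, so the upward trend explains only part of the variation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x and its R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Toy data: hypothetical time index vs. hypothetical number of authors.
time_index = [1, 2, 3]
authors = [1, 3, 2]
print(linear_fit(time_index, authors))  # (1.0, 0.5, 0.25)
```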

The two outliers in Fig. 2 are the first paper, authored by four scientists in 1951, and the last paper, authored by 29 scientists in 2020. This last paper is an example of a paper that benefited from a relatively large team. It was the first paper describing the immune response of patients with critical COVID-19 disease, and the large team was able to generate a lot of data in a relatively short period of time (on the benefits of collaborative research, see Wray, 2006). Not counting that paper, the largest research team responsible for a highly cited paper in our dataset had 13 members; this paper was concerned with defective signaling.

Disruption index

We calculated the disruption index (DI) for 16 of the 20 papers on our list (see Funk & Owen-Smith, 2017; Wu et al., 2019). The DI reflects the extent to which a paper disrupts the direction of research. In this study, we applied a variant of the original DI proposed by Bornmann et al. (2020), which is named DI5 in the literature overview on DI research by Leibel and Bornmann (2023). Studies have shown that this variant is better able to identify disruptive research than the original index. The DI contrasts papers that cite the focal paper (FP) but not its cited references with papers that cite both the FP and the publications cited by the FP. More precisely:

The disruptiveness of a FP [focal paper] is determined by the extent to which the cited references of the papers citing the FP overlap with the cited references of the FP. Disruptiveness is indicated if many citing papers do not refer to the FP’s cited references. Then, it seems that the FP presents something new which is independent of the context of the FP. (Bornmann et al., 2020)

The formula for calculating the DI includes three counts of citing papers (\({N}_{F}\), \({N}_{B}^{5}\), and \({N}_{R}\)):

$${DI}_{5}=\frac{{N}_{F}-{N}_{B}^{5}}{{N}_{F}+{N}_{B}^{5}+{N}_{R}}$$

\({N}_{F}\) is the number of citing papers that cite the FP and none of its cited references. \({N}_{B}^{5}\) is the number of citing papers that cite the FP and at least five of its cited references. \({N}_{R}\) is the number of papers, published after the FP, that cite at least one of the FP’s cited references without citing the FP itself. The resulting DI values range from −1 (consolidating) through 0 (neutral) to 1 (disruptive).
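The three counts and the DI can be sketched as follows, taking the definitions above literally. Under these definitions, a paper that cites the FP together with one to four of its cited references enters neither \({N}_{F}\) nor \({N}_{B}^{5}\). The input format (sets of hypothetical reference identifiers for papers published after the FP) is an assumption, and the sketch ignores database coverage.

```python
def disruption_index(focal_id, focal_refs, later_papers, min_shared=5):
    """DI_l = (N_F - N_B^l) / (N_F + N_B^l + N_R), with l = min_shared.

    focal_id: identifier of the focal paper (FP).
    focal_refs: set of identifiers of the FP's cited references.
    later_papers: iterable of reference-identifier sets, one per paper
        published after the FP.
    Returns None if no paper enters any of the three counts.
    """
    n_f = n_b = n_r = 0
    for refs in later_papers:
        shared = len(refs & focal_refs)
        if focal_id in refs:
            if shared == 0:
                n_f += 1          # cites the FP but none of its references
            elif shared >= min_shared:
                n_b += 1          # cites the FP and at least l of its references
        elif shared > 0:
            n_r += 1              # cites FP's references but not the FP
    total = n_f + n_b + n_r
    return (n_f - n_b) / total if total else None

# Toy example (hypothetical identifiers).
fp_refs = {"r1", "r2", "r3", "r4", "r5", "r6"}
later = [
    {"FP"},                                # N_F
    {"FP", "x"},                           # N_F
    {"FP", "r1", "r2", "r3", "r4", "r5"},  # N_B^5
    {"FP", "r1"},                          # neither N_F nor N_B^5
    {"r1", "y"},                           # N_R
    {"z"},                                 # unrelated
]
print(disruption_index("FP", fp_refs, later))  # 0.25
```

With \(N_F = 2\), \(N_B^5 = 1\), and \(N_R = 1\), the DI is \((2 - 1)/(2 + 1 + 1) = 0.25\).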

Although six papers in Table 2 have a high DI (> 0.3), we consider only two of these DIs trustworthy. A necessary condition for calculating the DI is that most of the FP’s cited references are covered by the literature database used. Otherwise, one cannot check whether the FP’s cited references are also cited by the citing papers, and missing coverage may bias the DI. Since three of the FPs in our dataset have low shares of cited references covered in the WoS (less than 20% of all cited references), we refrained from interpreting the disruptiveness of these papers. Two papers have high DIs that we consider reliable, because their coverage shares are 85% and 100%. They are highlighted in bold in Table 2; one of them has a very high DI of 0.94, which is close to the maximum possible value (see the penultimate column of Table 2).
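The coverage criterion can be expressed as the share of the FP’s cited references that are indexed in the database. The text does not state a numeric coverage cut-off (only that shares below 20% were excluded and shares of 85% and 100% were accepted), so the min_share threshold below is a hypothetical parameter; the 0.3 DI cut-off follows the text.

```python
def coverage_share(focal_refs, indexed_refs):
    """Share of the focal paper's cited references that the database covers."""
    if not focal_refs:
        return 0.0
    return len(focal_refs & indexed_refs) / len(focal_refs)

def reliably_disruptive(di, share, min_share, di_cutoff=0.3):
    """Interpret a high DI only if enough of the FP's references are covered.

    min_share is a hypothetical, user-chosen threshold.
    """
    return di > di_cutoff and share >= min_share

# Toy example: four of five cited references are indexed.
share = coverage_share({"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "x"})
print(share)                                            # 0.8
print(reliably_disruptive(0.6, share, min_share=0.8))   # True
print(reliably_disruptive(0.6, 0.15, min_share=0.8))    # False
```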

Table 2 Disruption index (DI) scores for papers in Table 1

The two papers with high and reliable DI scores are Livak and Schmittgen (2001) and Palella et al. (1998). The paper by Livak and Schmittgen, which has the highest DI score, presents an influential method of analysis: a method for the relative gene expression analysis of data from real-time quantitative PCR. PCR is used widely in the biological sciences. The paper by Palella et al. examined the effectiveness of antiretroviral therapies and reports that “the recent declines in morbidity and mortality due to AIDS are attributable to the use of more intensive antiretroviral therapies” (Palella et al., 1998, p. 853). This article demonstrated that suppression of HIV replication restored immunological activity and the ability to control the opportunistic infections characteristic of AIDS.

Discussion

Scientific fields differ in many respects. Indeed, as Knorr-Cetina (1999) has suggested, different research fields constitute different epistemic cultures. What emerges from this study of the research literature in immunology is that two broad types of contributions become highly referenced in immunology. Some papers are contributions to methodology; Lowry et al. (1951), Laemmli (1970), and Taswell (1981) are examples. Some of the methods reported in these papers have broad application outside immunology. Consequently, though they are genuine contributions to immunology, they are aptly characterized as instrumental. This is not to diminish their significance, but rather to note that these papers contain no conceptual or theoretical innovations that advanced our knowledge of immunology. Other papers are explicit contributions to the conceptual and theoretical development of immunology; Kappler et al. (1981) and Uchiyama et al. (1981) are examples. Both contribute to our understanding of the role and function of T cells in an organism’s immunological responses.

Part of the motivation for our study was a hunch that Janeway (1989) was a significant paper in immunology. Our analyses show that this paper is not among the most referenced publications from 1989. We did find, however, that Janeway and Medzhitov (2002) was mainly responsible for one of the peaks in the spectrogram. This latter paper is, to a large degree, a follow-up to the prediction of TLRs and their subsequent empirical confirmation. So our initial hunch was not far off the mark.

This study suggests the need for, and provides the foundations of, a more comprehensive study of the research literature in immunology. Such a study would enable us to uncover other trends in the development of scientific knowledge in this field, and in scientific fields more generally.