Sequence analysis of annually normalized citation counts: an empirical analysis based on the characteristic scores and scales (CSS) method

In bibliometrics, only a few publications have focused on the citation histories of publications, in which the citations received in each citing year are assessed. In this study, therefore, annual categories of field- and time-normalized citation scores (based on the characteristic scores and scales method: 0 = poorly cited, 1 = fairly cited, 2 = remarkably cited, and 3 = outstandingly cited) are used to study the citation histories of papers. As our dataset, we used all articles published in 2000 and their annual citation scores until 2015. We generated annual sequences of citation scores (e.g., {01233233221}) and compared the sequences of annual citation scores of six broader fields (natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities). In agreement with previous studies, our results demonstrate that sequences with poorly cited (0) and fairly cited (1) elements dominate the publication set; sequences with remarkably cited (2) and outstandingly cited (3) periods are rare. The highest percentages of constantly poorly cited papers can be found in the social sciences; the lowest percentages are in the agricultural sciences and humanities.
The largest group of papers with remarkably cited (2) and/or outstandingly cited (3) periods shows an increasing impact over the citing years with the following orders of sequences: {0123} (6.01%), followed by {123} (1.62%). Only 0.11% of the papers (n = 909) are constantly at the outstandingly cited level.


Introduction
Bibliometrics is the backbone of scientometrics; most studies in scientometrics are based on publication and citation data (Vinkler 2016). Bibliometrics applies statistical methods to analyze counts of publications and citations (University of Waterloo Working Group on Bibliometrics 2016). Since the introduction of citation analysis (Garfield 1955), citations have been regarded as the basic unit of impact, reflecting the "votes" of citing authors for publications (Bornmann and Marx 2014; Jha et al. 2016). "The act of citing another person's research provides the necessary linkages between people, ideas, journals and institutions to constitute an empirical field or network that can be analysed quantitatively" (Mingers and Leydesdorff 2015, p. 1). Many publications in bibliometrics have focused on analyzing the distributions of citations. For example, Albarrán and Ruiz-Castillo (2011) investigated 3.7 million articles published in 22 scientific fields. They found that "citation distributions are highly skewed: About 70% of all articles receive citations below the mean, and articles with a remarkable or outstanding number of citations represent about 9% of the total" (p. 48). According to the results of Ponomarev et al. (2012), "a typical citation pattern has an initial period of slow citation growth lasting from 5 to 20 months… After this initial slow growth phase, the citation rates accelerate until they reach saturation plateaus, after which they decrease".
However, studies that analyze citation distributions in more detail are scarce in the literature. In this study, therefore, annual categories of normalized citation scores ("poorly cited", "fairly cited", "remarkably cited", and "outstandingly cited"; Glänzel and Schubert 1988) are used to study the citation histories of papers. As our dataset, we use all articles published in 2000 and their annual citation scores until 2015. We compare the sequences of annual citation scores in six broader fields (natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities).

Literature overview
An early study focusing on the number of citations as a function of time was published by Vlachy (1985). The aging of information in papers (measured by synchronous or diachronous methods) has been studied by Glänzel and Schubert (1995) as well as Glänzel (1997, 2004). Schubert and Glänzel (1986) introduced the so-called "response time", which reveals how quickly publications receive citation impact (see also Bornmann and Daniel 2010). They found that response times differ between fields.
Only a few studies have focused on the citation histories of publications, in which the citations for every year are assessed (i.e., whether they are lower or higher than the citations that other publications received in the same year). Most of these studies have dealt with specific distributions of citations. Sleeping beauties are a good example: papers that generate little or no citation impact over a long period (e.g., 10 years) before they start to generate considerable impact. According to Mir and Ausloos (2016), the phenomenon of sleeping beauties is also labeled as resisted discoveries, premature discoveries, delayed recognition, or information awakening. Overviews of studies on sleeping beauties can be found in Teixeira et al. (2016) and Min et al. (2016).
Recently, two studies have investigated the citation histories of papers in more detail. Baumgartner and Leydesdorff (2014) explored the citation curves (1) of six journals in different fields and (2) of one entire field (virology) over 16 years. Basically, they found two typical curves: "sticky knowledge claims" continue to be cited more than 10 years after publication, whereas "transient knowledge claims" show a decay pattern after reaching an early peak. The other study, by Colavizza and Franceschet (2016), investigated the Physical Review archive, covering 120 years of physics. They found the following three types of citation curves: "(1) Marathoners: publications which start fast or slow, reach a moderate peak and keep improving the ratio of received citations, or at least keep being relevant over prolonged amounts of time by manifesting a slow decline or a plateau. Marathoners in effect tend to age slowly, or not at all, and are also more numerous and varied than sprinters. (2) Sprinters: publications with fast, even extremely fast and high peak, and equally rapid ageing. These publications are immediately relevant for their community, and rapidly forgotten thereafter, and are fewer in number in the APS dataset.
(3) Middle-of-the-roads: publications with a citation history close to the global average citation history, that is, a fast but moderately peaking curve with a gradual decay over time" (p. 1043).

Field normalization of citation impact
This study uses standard impact scores in bibliometrics, namely field- and time-normalized citation impact scores (in a dynamical variant) (Vinkler 2010). These dynamically normalized impact counts (DNIC) are defined as

$$\mathrm{DNIC}_{ij} = \frac{C_{ij}}{E_{f(i)j}}, \qquad E_{fj} = \frac{1}{N_{fj}} \sum_{i:\, f(i) = f} C_{ij},$$

where i = 1, 2, … are publications, j = 1, 2, … are citing years, and f = 1, 2, … are fields.
Here, field delineations based on disciplinary OECD minor codes are used. The OECD field definitions can be found at http://www.oecd.org/science/inno/38235147.pdf. We used the two-digit level of the scheme. $C_{ij}$ denotes the citations received by publication i in year j, and $E_{fj}$ denotes the mean (received) citations of all publications in field f and year j (i.e., $E_{fj}$ is the expected value). $N_{fj}$ is the number of cited publications in field f and year j (only publications with non-zero citations are counted), and f = f(i) denotes the field of publication i. The indicator follows the standard approach in bibliometrics with both field- and time-normalized citations (Waltman 2016). It differs from the standard approach in that the calculation is based on annual citations rather than on the citations accumulated between the publication year and a fixed later time point.
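The DNIC definition can be sketched in Python as follows. This is a minimal illustration, not the code used for this study: the function name `dnic` and the input layout (a dict of annual citation counts keyed by hypothetical `(paper, year)` pairs, plus a dict assigning each paper to exactly one field) are assumptions. Papers with several OECD minor codes would simply appear as separate records.

```python
from collections import defaultdict


def dnic(citations, field_of):
    """Compute DNIC_ij = C_ij / E_{f(i)j} for every (paper, year) pair.

    citations: dict mapping (paper, year) -> C_ij, the annual citation count
    field_of:  dict mapping paper -> its field f(i)
    E_fj is the sum of C_ij in field f and year j divided by N_fj, the number
    of papers in that field and year with C_ij > 0.
    """
    totals = defaultdict(float)  # sum of C_ij per (field, year)
    n_cited = defaultdict(int)   # N_fj: papers with non-zero citations
    for (paper, year), c in citations.items():
        if c > 0:
            key = (field_of[paper], year)
            totals[key] += c
            n_cited[key] += 1

    scores = {}
    for (paper, year), c in citations.items():
        if c == 0:
            scores[(paper, year)] = 0.0  # uncited: DNIC_ij = 0
        else:
            key = (field_of[paper], year)
            expected = totals[key] / n_cited[key]  # E_fj
            scores[(paper, year)] = c / expected
    return scores
```

For example, if two of three papers in a field are cited 2 and 6 times in 2000, then $E_{f,2000} = 4$, their DNIC scores are 0.5 and 1.5, and the uncited paper scores 0.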
If $C_{ij} = 0$, then $\mathrm{DNIC}_{ij} = 0$. If $\mathrm{DNIC}_{ij} > 1$, the citation impact of the publication is higher than the average in the corresponding OECD disciplinary category and (cited as well as citing) publication years. If $\mathrm{DNIC}_{ij} < 1$, the impact is lower than the average.

Classifying publications using the CSS method
Glänzel and Schubert (1988) introduced the characteristic scores and scales (CSS) method for grouping ranked observations into rank-specific categories (see also Glänzel 2007, 2010, 2011). Consider a set of n papers. The observed citations $X_i$ received by paper i are ranked in descending order, $X^*_1 \ge X^*_2 \ge \dots \ge X^*_n$, where $X^*_1$ and $X^*_n$ denote the citations of the most and least frequently cited papers, respectively. Set the initial values $b_0 = 0$ and $v_0 = n$, where n is the number of papers. $b_1$ is defined as the mean citations; $v_1$ is defined by the comparison $X^*_{v_1} \ge b_1$ and $X^*_{v_1+1} < b_1$. This comparison is repeated, yielding

$$b_k = \frac{1}{v_{k-1}} \sum_{i=1}^{v_{k-1}} X^*_i, \qquad X^*_{v_k} \ge b_k \quad\text{and}\quad X^*_{v_k+1} < b_k, \qquad k = 1, 2, \dots$$

Thus, we obtain the series $b_0 \le b_1 \le \dots$ and $v_0 \ge v_1 \ge \dots$. The kth class is defined by the pair of threshold values $[b_{k-1}, b_k)$; the number of papers belonging to this class amounts to $v_{k-1} - v_k$.
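The iterative CSS procedure described in this section can be sketched as below. It is an illustrative reimplementation under our own assumptions (plain Python lists, three characteristic scores, which yield four classes); the function name `css_thresholds` is hypothetical.

```python
def css_thresholds(values, levels=3):
    """Characteristic scores b_0..b_levels and counts v_0..v_levels (CSS).

    values: citation counts (or DNIC scores) of n papers.
    b_k is the mean of the v_{k-1} highest values, i.e. of all values
    >= b_{k-1}; v_k is the number of values >= b_k.
    """
    ranked = sorted(values, reverse=True)  # X*_1 >= X*_2 >= ... >= X*_n
    b = [0.0]                              # b_0 = 0
    v = [len(ranked)]                      # v_0 = n
    for _ in range(levels):
        top = ranked[:v[-1]]               # the v_{k-1} highest values
        if not top:                        # empty input: nothing to rank
            break
        b_k = sum(top) / len(top)
        v_k = sum(1 for x in ranked if x >= b_k)
        b.append(b_k)
        v.append(v_k)
    return b, v
```

For the values `[0, 1, 1, 2, 2, 3, 4, 6, 9, 12, 20]` the sketch gives $b = (0, 60/11, 11.75, 16)$ and $v = (11, 4, 2, 1)$, so the class sizes $v_{k-1} - v_k$ are 7, 2, and 1, with 1 paper above $b_3$.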
The CSS method can be used to classify the papers within certain fields into four impact classes: "poorly cited", "fairly cited", "remarkably cited", and "outstandingly cited". Then, for example, the share of outstandingly cited papers can be determined for a set that includes papers from different fields (e.g., all papers published by a university). However, the method can be used to classify not only single papers but also aggregates of papers. For example, Bornmann and Glänzel (2017) propose using the CSS method to classify the universities in a specific ranking (e.g., the Leiden Ranking) into performance classes (e.g., based on the number of highly cited papers). The universities can then be separated into low and high performers.
In this study, we use the CSS method to classify the papers into four citation impact classes based on $\mathrm{DNIC}_{ij}$. Thus, we do not use the citation counts of single papers but the annual field- and time-normalized scores for the classification. Consider the set $\{\mathrm{DNIC}_{ij}\}$ of n papers published in various disciplines. We used the OECD major codes to compare the results of six broad disciplines: natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and the humanities. The broad disciplines are aggregates of OECD minor codes.
In each discipline and across disciplines, the $\mathrm{DNIC}_{ij}$ scores (of paper i in a given year j) are ranked in descending order, $(\mathrm{DNIC}^*_1 \ge \mathrm{DNIC}^*_2 \ge \dots \ge \mathrm{DNIC}^*_n)_j$. Analogously to the citation counts above, the comparison between DNIC and b is defined by

$$b_k = \frac{1}{v_{k-1}} \sum_{i=1}^{v_{k-1}} \mathrm{DNIC}^*_i, \qquad \mathrm{DNIC}^*_{v_k} \ge b_k \quad\text{and}\quad \mathrm{DNIC}^*_{v_k+1} < b_k.$$

Then, the pair of threshold values $[b_{k-1}, b_k)$ forms the impact class. Using the CSS method, the annual categorization of papers into citation impact classes is therefore based on the annual DNIC scores, so that each paper is assigned a class k = 1, 2, … in every citing year. Since the values k = 2 and k = 3 are usually used to identify highly cited papers (Glänzel 2011), we set k ≥ 2 as "fairly cited" papers, k ≥ 3 as "remarkably cited" papers, and k ≥ 4 as "outstandingly cited" papers in the long run.
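To show how the per-year classification turns into score sequences, here is a minimal sketch. It assumes that the DNIC scores of one discipline (or of the pooled set) are given as a dict mapping each paper to its list of annual scores; the names `css_score` and `annual_css_sequences` are hypothetical. We also assume that the score stored in a sequence is the number of characteristic scores $b_1, b_2, b_3$ a paper reaches in that year, so class k corresponds to score k − 1 (0 = poorly cited, …, 3 = outstandingly cited).

```python
def css_score(x, thresholds):
    """CSS score 0-3 = number of characteristic scores b_1..b_3 reached by x."""
    return sum(1 for b in thresholds if x >= b)


def annual_css_sequences(dnic_by_paper, levels=3):
    """Turn annual DNIC scores into one CSS score sequence per paper.

    dnic_by_paper: dict paper -> list of annual DNIC scores, one per citing
    year (all lists of equal length, e.g. 16). The characteristic scores are
    recomputed separately for every citing year j.
    Returns dict paper -> string such as '0112'.
    """
    papers = list(dnic_by_paper)
    n_years = len(dnic_by_paper[papers[0]])
    seqs = {p: [] for p in papers}
    for j in range(n_years):
        pool = sorted((dnic_by_paper[p][j] for p in papers), reverse=True)
        thresholds = []
        for _ in range(levels):
            if not pool:
                break
            b_k = sum(pool) / len(pool)  # mean of all scores >= b_{k-1}
            if b_k <= 0:                 # assumption: in an all-uncited year
                break                    # every paper stays poorly cited
            thresholds.append(b_k)
            pool = [x for x in pool if x >= b_k]
        for p in papers:
            seqs[p].append(str(css_score(dnic_by_paper[p][j], thresholds)))
    return {p: "".join(s) for p, s in seqs.items()}
```

Because the thresholds are recomputed for each year, a paper's score reflects its standing relative to all papers in the same citing year, not its absolute citation count.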

Sequence analysis of annual CSS scores
In a yearly time series j = 1, 2, …, m, the annual CSS scores of each publication form a sequence across 16 years (starting in 2000). In other words, we have a sequence of 16 scores for every publication, with values between 0 = poorly cited and 3 = outstandingly cited. Two examples of sequences are shown in Fig. 1: sequence {a} is {01233233221} and sequence {b} is {01001000100}. Sequence {a} indicates a publication that is highly cited most of the time, and {b} a constantly little cited or non-cited publication. The statistical analyses of the data in the current study are based on the strategy proposed by Brzinsky-Fay et al. (2006) for the analysis of sequence data. Sequence data are analyzed in many research fields, e.g., DNA sequences in biology and life courses in the social sciences. "A sequence is defined as an ordered list of elements, where an element can be a certain status (e.g., employment or marital status), a physical object (e.g., base pair of DNA, protein, or enzyme), or an event (e.g., a dance step or bird call). The positions of the elements are fixed and ordered by elapsed time or by another more or less natural order" (Brzinsky-Fay et al. 2006, p. 435).

Dataset used
The bibliometric data used in this study is from an in-house database developed and maintained by the Max Planck Digital Library (MPDL, Munich) and derived from the Science Citation Index Expanded (SCI-E), Social Sciences Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) prepared by Clarivate Analytics, formerly the IP & Science business of Thomson Reuters. The study is based on 790,698 articles published in 2000 and the corresponding citations across 16 citing years (with 2000 as the first citing year). Since many papers have been assigned to more than one OECD minor code, 161,302 papers appear between two and six times in the dataset (435,634 papers have no duplicates). We decided to let the papers appear multiple times in the dataset, since the papers might have different citation distributions in the disciplines. Table 1 shows the number of annual CSS categories in the dataset. Since we included 790,698 articles with 16 annual citation scores each in the study, the study is based on 12,651,168 annual CSS categories.

Descriptive statistics
The sequence analyses described in the "Sequence analysis" section are based on several transformations of the original raw data from the MPDL in-house database. To reveal the relations between the raw data and the transformed (field- and time-normalized) data, Table 2 shows annual citations, annual normalized citation scores (DNIC), and sequences of CSS scores for some example papers. Table 2 illustrates the spectrum of different citation impact histories in the dataset. Group (1) in the table consists of papers with increasing citation impact over the citing years. The citation impact of the papers in group (2) is more or less stable over the years. Decreasing and fluctuating histories are shown in groups (3) and (4), respectively. The WoS accession numbers listed can be used to inspect the papers and their citations in WoS in more detail.
The CSS method was initially proposed by Glänzel and Schubert (1988). Since then, the method has been used in various contexts to classify single papers or aggregates of papers as "poorly cited", "fairly cited", "remarkably cited", and "outstandingly cited" (Albarrán and Ruiz-Castillo 2011; Bornmann and Glänzel 2017; Glänzel 2007, 2010, 2011; Li et al. 2013). Although the studies were based on different bibliometric datasets, the distributions seem to follow (more or less) a general pattern of percentages: 70% (poorly cited), 21% (fairly cited), 7% (remarkably cited), and 2% (outstandingly cited). Similar distribution patterns are reported by Chi and Glänzel (2016) for usage counts. Table 3 presents the distributions of "poorly cited", "fairly cited", "remarkably cited", and "outstandingly cited" papers in the six disciplines considered in our study. The statistics in the table refer to CSS scores across 16 citing years (beginning in 2000). For example, the mean percentage of poorly cited papers in the natural sciences is 70.57% across 16 citing years; the lowest percentage is 66.21% and the highest is 77.49%, a range of 11.28 percentage points. The comparison of the percentages in Table 3 with the general distribution pattern (70–21–7–2%) reveals that the natural sciences, engineering and technology, medical and health sciences, and agricultural sciences are more similar to the general pattern than the social sciences and the humanities are. The largest variability of the percentages over the years, however, can be observed for the agricultural sciences (see the ranges in Table 3).
Similar field-specific differences in distributions of CSS scores are also reported by Glänzel (2011) and Albarrán and Ruiz-Castillo (2011).

Scientometrics (2017) 113:1665–1680

Sequence analysis
Table 4 shows the most frequent sequences of CSS scores in the dataset and their prevalence in natural sciences, engineering and technology, medical and health sciences, agricultural sciences, social sciences, and humanities. We made a cut at 0.5%, which means that only sequences with a share of at least 0.5% in the dataset of all publications are listed in the table. In order to compare disciplinary differences between the same set of sequences, the selected 17 sequences from the total set are listed for all disciplines (although other sequences might meet the threshold of 0.5% in single disciplines).

Table 4 note: The analysis is based on four categories: poorly cited (0), fairly cited (1), remarkably cited (2), and outstandingly cited (3). However, the most frequent sequences consist of only poorly cited (0) and fairly cited (1) elements.
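The frequency count behind Table 4 can be sketched as follows: sequences are counted over all publications, those reaching the 0.5% cut are kept, and the shares of that fixed selection are then computed within each discipline. This is an illustrative sketch under stated assumptions (sequences stored as strings such as `'0000110'`, disciplines as dict keys); the function names are hypothetical.

```python
from collections import Counter


def frequent_sequences(sequences, min_share=0.005):
    """(sequence, percentage) pairs with a share of at least min_share,
    most frequent first. sequences: iterable of strings such as '0000110'."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return [(seq, 100.0 * c / total)
            for seq, c in counts.most_common()
            if c / total >= min_share]


def shares_by_field(sequences_by_field, selected):
    """Percentage of each selected sequence within each field, so that the
    same set of sequences is reported for every discipline."""
    out = {}
    for field, seqs in sequences_by_field.items():
        counts = Counter(seqs)
        n = len(seqs)
        out[field] = {s: 100.0 * counts[s] / n for s in selected}
    return out
```

Fixing the selection on the full dataset, rather than selecting per discipline, keeps the rows comparable across disciplines, at the cost of omitting sequences that pass the cut only in a single discipline.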
In accordance with the prevalence of skewed citation distributions in the sciences and the dominance of non-cited and little cited papers, the list of sequences in Table 4 contains only two CSS scores: 0 = poorly cited and 1 = fairly cited. In other words, in the set of all papers (and also in most of the disciplines), sequences with 2 = remarkably cited and 3 = outstandingly cited are rare (less than 0.5%). Figure 2 shows the sequences in the dataset as sequence index plots. Whereas Table 4 focuses on the most frequent sequences, all sequences are included in Fig. 2. The plots show a horizontal line for each sequence, distinguishing the CSS scores by color (Brzinsky-Fay et al. 2006). Like Table 4, Fig. 2 demonstrates that the group of sequences with constantly poorly cited elements is the biggest group, at the top of the plots. Below this group, we can observe sequences that are commonly labeled as sleeping beauties: a relatively small set of papers that are poorly cited initially and remarkably or outstandingly cited in later years. Another group of papers (sequences) is also clearly visible in Fig. 2. These papers are poorly cited most of the time, with a short interruption by a fairly cited period (mostly 1 year). In all disciplines, such an interruption is more likely in early years than in later years. This is especially visible for the agricultural sciences and social sciences, where a large red bar is visible in the second year after publication (see the corresponding higher percentages for these disciplines in Table 4). At the bottom of all plots, the small set of constantly outstandingly cited papers is visible.
With regard to the differences between the disciplines, Table 4 shows that the social sciences have the highest percentage of constantly poorly cited papers (29.59%). The lowest percentages are in the agricultural sciences (18.58%) and humanities (19.59%). Thus, there is a large difference between the social sciences and the humanities (although they are frequently treated together in bibliometrics). However, both disciplines show similar results if we look at the horizontal "Total" line in Table 4. Both disciplines have the highest percentages, which means that their sequences are more highly concentrated than those in other disciplines. This might be partly an effect of the lower number of sequences. However, the agricultural sciences also have a relatively low number of sequences, but their concentration of sequences is considerably lower than in the social sciences and the humanities.
To obtain a better overview of the sequences in the dataset, we performed two further analyses that condense the sequences still further. The first condensation, shown in Table 5, treats sequences as identical if they consist of the same elements. That means the sequence {2112} is treated the same as {1222}, because both sequences consist of the CSS scores 1 and 2 only. Unlike the results in Table 4, the results in Table 5 refer to the complete dataset and are not restricted to the most frequent sequences. The results in Table 5 confirm the results in Table 4 and Fig. 2. About a quarter of the sequences, {0}, belong to constantly poorly cited papers. However, the largest group of sequences, {01}, includes poorly cited and fairly cited periods (46.85%). This group of papers is especially dominant in the humanities, with 64.35%. There is a third large group of sequences (19.43%) in Table 5, {012}, which includes poorly cited, fairly cited, and remarkably cited periods. This group contains about 20% of the papers in all disciplines except one: in the humanities, only 11.82% of the papers have these three elements.
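The first condensation can be sketched by reducing each sequence to its sorted set of distinct elements before counting. This is a minimal sketch assuming string-encoded sequences; the function names are hypothetical.

```python
from collections import Counter


def element_set(seq):
    """Condense a CSS sequence to its sorted distinct elements: '2112' -> '12'."""
    return "".join(sorted(set(seq)))


def element_set_shares(sequences):
    """Percentage of papers per condensed element set (complete dataset, no cut)."""
    counts = Counter(element_set(s) for s in sequences)
    total = sum(counts.values())
    return {k: 100.0 * c / total for k, c in counts.items()}
```

Under this condensation, the example sequence {01233233221} falls into the group {0123}; any information about the order of the periods is discarded.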
The results in Table 5 allow a closer look at the sequences that include outstandingly cited periods (3). The largest group of papers with such a period is {0123} (6.01%), followed by {123} (1.62%). Only 0.11% of the papers (n = 909) are constantly at the outstandingly cited level over a period of 16 years. Most of these papers have been published in the natural sciences (n = 417) and medical and health sciences (n = 383). There is only one such paper in the humanities and 6 such papers in the agricultural sciences. Constant performers at the level of fairly cited (1) or remarkably cited (2) are very rare in the dataset: in total, only 37 papers are constantly fairly cited and 3 papers constantly remarkably cited.

The second condensation, shown in Table 6, treats all sequences with the same order of CSS scores as identical. That means the sequence {2112} is treated the same as {211112}, because the CSS scores appear in the same order in both sequences (first 2, then 1, and then 2 again). As in Table 4, the sequences shown in Table 6 are restricted to those with at least 0.5% of the papers in the dataset. Again, the results in Table 6 reveal that about a quarter of the papers are constantly poorly cited (with a considerably higher percentage in the social sciences). 13.9% of the papers have the sequence order {010}: citation impact first increases (from 0 to 1) and then decreases (from 1 to 0). This order is followed by {10} (8.66%) and {1010} (5.51%). Remarkably cited and outstandingly cited periods do not play any role in Table 6, because they occur too rarely to reach the 0.5% threshold.

Table 6: Most frequent sequences with elements in the same order in the dataset (at least 0.5%) and their prevalence in six disciplines. The analysis is based on four categories: poorly cited (0), fairly cited (1), remarkably cited (2), and outstandingly cited (3). However, the most frequent sequences consist of only poorly cited (0) and fairly cited (1) elements.
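The second condensation amounts to collapsing runs of identical scores, which `itertools.groupby` does directly. The sketch below assumes string-encoded sequences and reuses the 0.5% cut of Table 6; the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def element_order(seq):
    """Collapse runs of identical scores: '211112' -> '212', '0011100' -> '010'."""
    return "".join(key for key, _ in groupby(seq))


def element_order_shares(sequences, min_share=0.005):
    """Percentage of papers per condensed order, keeping orders >= min_share."""
    counts = Counter(element_order(s) for s in sequences)
    total = sum(counts.values())
    return {k: 100.0 * c / total
            for k, c in counts.most_common()
            if c / total >= min_share}
```

Unlike the element-set condensation, this one keeps the order of the periods but discards their durations, so a one-year and a ten-year fairly cited spell both map to the same order.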

Discussion
In recent years, there has been a trend in bibliometrics to no longer reduce citation impact to times-cited information but to analyze it more specifically. For example, the citation context is considered in bibliometric analyses to obtain more specific information on the impact of publications and on how cited publications are perceived (Small et al. 2017). Carroll (2016) takes into account "the frequency with which the paper is cited within citing publications … adding depth and value to the citation metric" (p. 1329). The results of Hu et al. (2015) show that, when a paper is cited multiple times within a citing paper, the successive citations are more intentional and reasonable than the first-time citation. The "Literature overview" section in this paper presents further studies that take a closer look at citations by investigating the citation histories of papers.
In this study, we used a method for the analysis of citation distributions that (to the best of our knowledge) has never been used before in bibliometrics. Based on annually normalized citation scores, we generated annual sequences of CSS scores (e.g., {01233233221}), which we analyzed using the strategy proposed by Brzinsky-Fay et al. (2006). This strategy allows the identification of very frequent and less frequent sequences over the complete publication set and disciplinary sets. In agreement with previous studies, our results demonstrate that sequences with poorly cited (0) and fairly cited (1) elements dominate the publication set; sequences with remarkably cited (2) and outstandingly cited (3) periods are rare. The highest percentages of constantly poorly cited papers can be found in the social sciences; the lowest percentages are in the agricultural sciences and humanities. The largest group of papers with remarkably cited (2) and/or outstandingly cited (3) periods shows an increasing impact over the citing years with the following orders of sequences: {0123} (6.01%), followed by {123} (1.62%). Only 0.11% of the papers (n = 909) are constantly at the outstandingly cited level. These might be the few papers that significantly drive scientific progress (Rodríguez-Navarro 2016).
This study was a first attempt to apply sequence analysis to bibliometric data. We think that this statistical approach can lead to interesting insights into citation histories. The approach can be extended beyond the analyses in our study. For example, future research could focus on comparing sequences and measuring the differences between two sequences. According to Brzinsky-Fay et al. (2006), the so-called Levenshtein distance has been used for such comparisons in various fields, such as plagiarism detection and the analysis of DNA sequences. The Levenshtein distance quantifies the distance between two sequences as the minimum number of single-element insertions, deletions, and substitutions needed to transform one sequence into the other. Another topic for future research could be possible explanations of differences between sequences. Distance measures between sequences could be included as dependent variables in regression models, which are then explained by various characteristics of the publications (e.g., their subject category, country of origin, or the reputations of their authors).
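For readers who want to try such comparisons, a minimal sketch of the Levenshtein distance for CSS sequences follows. It is the textbook dynamic-programming algorithm with unit costs for insertions, deletions, and substitutions, not a procedure used in this study. The unit costs are an assumption (sequence analyses may weight substitutions differently, e.g., by how far apart two CSS scores are), and the function name is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions and
    substitutions turning sequence a into sequence b (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ''
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]
```

For example, `levenshtein("0110", "0100")` returns 1, since a single substitution turns one sequence into the other. Only two rows of the distance table are kept in memory, which keeps pairwise comparisons of many 16-element sequences cheap.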