Modeling the obsolescence of research literature in disciplinary journals through the age of their cited references

Dorta-González, Pablo; Gómez-Déniz, Emilio

doi:10.1007/s11192-022-04359-w

Open access
Published: 19 April 2022

Volume 127, pages 2901–2931, (2022)

Scientometrics

Abstract

There are different citation habits in the research fields that influence the obsolescence of the research literature. We analyze the distinctive obsolescence of research literature in disciplinary journals in eight scientific subfields based on the distribution of cited references, as a synchronic approach. We use both negative binomial (NB) and Poisson distributions to capture this obsolescence. The corpus examined was published in 2019 and covers 22,559 papers citing 872,442 references. Moreover, three measures to analyze the tail of the distribution are proposed: (i) cited reference survival rate, (ii) cited reference mortality rate, and (iii) cited reference percentile. These measures are interesting because the tail of the distribution captures the behavior of the citations at the time when the document starts to become obsolete, in the sense that it is little cited (used). As the main conclusion, the differences observed in obsolescence are so important, even between disciplinary journals in the same subfield, that some measure for the tail of the citation distribution, such as those proposed in this paper, is needed to analyze appropriately the long-term impact of a journal.

Introduction

The use of research publications decreases with time and age of literature. In library and information science, this phenomenon is known as literature obsolescence. This refers to a decrease in its frequency of use or citation, but not to its value as a source of knowledge.

The obsolescence of the research literature can be measured through citation analysis. This methodology supposes a relationship between the cited document and the citing one. Authors increasingly cite current documents to the detriment of older ones, which can be considered obsolete because they stop being cited (used).

Studies on modelling literature obsolescence go back to the 1960s. De Solla Price (1965) postulated that the use of literature declines with time according to a negative exponential distribution, although other authors suggested a lognormal distribution as the most suitable to measure literature obsolescence (Egghe & Ravichandra Rao, 1992; Gupta, 1998).

There are two methods to study obsolescence: synchronic (retrospective) and diachronic (prospective). Synchronic analysis is based on references made, while diachronic analysis is based on citations received. In synchronic obsolescence analysis the half-life is the statistical median considering the years of the references in reverse chronological order. Diachronic obsolescence requires setting a period of time in the past to look into the future. Both methods can also be conducted retrospectively (Egghe & Rousseau, 2000). However, some authors argue that synchronic and diachronic studies produce similar results (Stinson & Lancaster, 1987), suggesting a preference for the synchronic method.
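As a minimal sketch of the synchronic half-life described above, the following Python snippet takes the publication years of the references cited by a set of documents and returns the median reference age. The function name `synchronic_half_life`, the citing year, and the list of years are illustrative assumptions, not data from this study.

```python
from statistics import median

def synchronic_half_life(reference_years, citing_year):
    """Median age (in years) of the references cited in documents published in citing_year."""
    ages = [citing_year - year for year in reference_years]
    return median(ages)

# Hypothetical reference list of a single paper published in 2019 (illustrative values only).
years = [2018, 2017, 2017, 2015, 2012, 2010, 2004, 1998, 1987]
half_life = synchronic_half_life(years, 2019)
print(half_life)
```

Half of the references in this toy list are at most `half_life` years old, which is the reading of the median given in the text.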

Although it is possible to estimate the half-life of the literature and even the time when it is no longer cited, it is not possible to know the reasons for this. The main factor pointed out in the literature to justify the different obsolescence patterns is the diffusion process of knowledge in the different fields. Two main types of obsolescence occur, both related to the rate at which diffusion occurs. In the first type, there is initially a high number of citations until a modal value is reached, followed by a very sharp drop, as occurs, for example, in the fields of Medicine and Engineering. The second type is due to a slower diffusion process, associated with a slower rate of decline. Examples of this behavior are the basic sciences, social sciences, and humanities (Bouabid & Larivière, 2013; Cano & Lind, 1991).

The objective of this work is to measure the obsolescence of research literature in disciplinary journals. We present data and results on the obsolescence of eighty disciplinary journals over a long time period.

We analyze the distinctive obsolescence of research literature in disciplinary journals in eight scientific subfields based on the distribution of cited references, as a synchronic approach. We use both negative binomial (NB) and Poisson distributions to capture this obsolescence. The corpus examined was published in 2019 and covers 22,559 papers citing 872,442 references. Moreover, three measures to analyze the tail of the distribution are proposed: (i) survival rate, (ii) mortality rate, and (iii) percentile.

These measures of the distribution tail collect the behavior of the citations at the time when the document starts to get obsolete in the sense that it is little cited (used). The theoretical framework is described in the following section.

Theoretical framework in obsolescence of the research literature

Since Gross & Gross (1927) introduced the concept of obsolescence, that is, the phenomenon that research publications are decreasingly used over time, obsolescence and its law have become an important topic in bibliometrics and scientometrics.

In library and information sciences, the term half-life first appeared in the work by Burton & Kebler (1960). These authors postulated that literature becomes obsolete and half-life means the time during which one-half of the currently active literature was published.

The use of literature declines with time according to a negative exponential distribution (De Solla Price, 1965). In this respect, Ewing (1966) carried out a diachronic study of research articles in chemistry and found that the number of citations decreased as the year of publication was closer to the current year. This author pointed to the growth of the literature as a factor that influences the measurement of the obsolescence ratio.

Disambiguating the term half-life, Line (1970) asserts that literature half-life should be composed of the obsolescence rate and the literature growth rate. Line and Sandison (1974) defined obsolescence as the decrease over time of the validity or utility of information. However, Brookes (1970) stated that the theoretical problem of measuring the obsolescence rate when literature is growing is more complex than the method proposed by Line.

Citation density decreases exponentially with age (Gupta, 1990). However, literature obsolescence can also be influenced by unknown factors. Thus, Egghe and Ravichandra Rao (1992) demonstrated that the obsolescence factor defined by Brookes (1970) is not actually a constant, but a statistical function of time. This is because citation data is not exponentially distributed as Brookes stated. In general, citation distribution presents an initial growth followed by an exponential decline. These authors stated that lognormal distribution is the model that best describes both the initial growth of citations and the subsequent decline.

Regarding the combination of obsolescence and growth, both phenomena can be studied with the same mathematical function (Egghe & Ravichandra Rao, 1992). In the synchronic case, obsolescence increases with literature growth; in the diachronic case, the effect is the opposite. In this respect, Van Raan (2000) argues that in the initial moments of any discipline far fewer published documents exist than in later years. Thus, the distribution of citations by year always combines the phenomena of aging and literature growth.

Burrell (2002) concluded that the lognormal distribution model shows more success describing the synchronic citations of documents, confirming previous studies such as that of Egghe and Ravichandra Rao (1992). These authors also observed that lognormal distribution describes and fits very well to the age of the first references (Egghe & Ravichandra Rao, 2002).

Some studies compare obsolescence patterns across disciplines (Larivière & Archambault, 2008; Song et al., 2015; Zhang & Glänzel, 2017a, b). The main factor pointed out in the literature to justify the different obsolescence patterns is the diffusion process of knowledge (Bouabid & Larivière, 2013; Cano & Lind, 1991; Small & Crane, 1979). Some authors developed a mathematical model to describe this pattern (Barnett et al., 1989).

An overview of literature aging is offered by Glänzel (2004), presenting the different aspects that can be analyzed by both methods, synchronic and diachronic. More recently, studies and discussions on this matter have continued on aspects such as the mathematical model (Bouabid, 2011; Wallace et al., 2009), obsolescence measures (Bouabid & Larivière, 2013; Zhang & Glänzel, 2017a, b), and its influencing factors (Wang et al., 2019).

Empirical data

The Scopus database was used as data source for the empirical application. We considered the following eight subject categories (subfields) in this database, three of them from science, two from social science, one from health science, one from engineering, and one from humanities: Cell Biology, Economics & Econometrics, Electrical & Electronic Engineering, General Chemistry, General Medicine, General Physics & Astronomy, History, and Library & Information Sciences.

We decided a priori to take eight subfields. This number was set so that both figures and tables could be displayed in the paper. The subfields were selected based on the previous experience of the authors and trying to cover disciplines as diverse as possible.

For each subfield, ten disciplinary journals were randomly selected from the top 10% of most cited journals in the subfield, as measured by the Scopus CiteScore. Note that there is great variability in the number of journals per subfield (size of the subject categories). Some of them are very large, with more than 300 journals. For this reason, only ten journals were randomly selected from the top 10% in relation to the CiteScore.

Then, for each journal, all documents published in 2019 and catalogued in this database as research articles were collected. Finally, for each research article, the years of publication from all cited references under the age of 150 were downloaded.
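The data preparation step above can be sketched as follows: each cited reference is converted to an age relative to the citing year, ages of 150 years or more are discarded, and the remaining ages are tallied. The function name `age_counts` and the input years are hypothetical; the actual extraction from Scopus is not reproduced here.

```python
from collections import Counter

def age_counts(reference_years, citing_year=2019, max_age=150):
    """Tally cited references by age, keeping only ages in the range 0 <= age < max_age."""
    ages = (citing_year - year for year in reference_years)
    return Counter(age for age in ages if 0 <= age < max_age)

# Hypothetical publication years of cited references (illustrative values only).
counts = age_counts([2019, 2018, 2018, 1990, 1869, 1850])
print(sorted(counts.items()))
```

The references from 1869 and 1850 are dropped in this toy example because their ages (150 and 169 years) are not under 150.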

All the journals used here, with their abbreviated titles and corresponding subject categories, are shown in Table 4 (see the “Appendix”).

Methodology

Most studies on the aging of cited references assume that the age of the cited references follows a continuous distribution. The exponential distribution is usually the first assumption, as the simplest case. In this case (see, for instance, Egghe & Ravichandra Rao, 2002), the number of citations received at time t is taken to be $c(t)=\theta \exp (-\theta t)$, $t\ge 0$, $\theta >0$, for which the obsolescence (aging) factor is $a=c(t+1)/c(t)=\exp (-\theta )$, which is independent of t.

Here, we do not use the concept of obsolescence distribution function and instead consider that $X_1,X_2,\dots ,X_t$ are independent and identically distributed random variables with values in ${{\mathcal {X}}}\subseteq {{\mathbb {N}}}$. They represent the number of cited references in a journal or a collection of journals in a subject category in the last t years. That is, for a given sample of documents, $X_i$ represents the number of cited references with an age of i years. Following Burrell (2002), we assume that $X\equiv X_i$ initially follows a Poisson distribution with mean $\theta \in \Theta$, $\theta >0$. That is,
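The constant aging factor of the exponential model can be checked numerically with a short sketch. The value of `theta` is an arbitrary illustration, not an estimate from this study, and the helper names are assumptions.

```python
import math

def citations_at(t, theta):
    """Exponential citation curve c(t) = theta * exp(-theta * t)."""
    return theta * math.exp(-theta * t)

def aging_factor(t, theta):
    """Ratio c(t + 1) / c(t); under the exponential model it does not depend on t."""
    return citations_at(t + 1, theta) / citations_at(t, theta)

theta = 0.1  # illustrative decay rate only
factors = [aging_factor(t, theta) for t in range(0, 50, 10)]
half_life = math.log(2) / theta  # median of an exponential distribution with rate theta
print(factors, half_life)
```

Every entry of `factors` equals `exp(-theta)` up to rounding, and the corresponding half-life is `ln(2) / theta`.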

$$\begin{aligned} \Pr (X =x|\theta )=\frac{\theta ^{x}}{x!}\exp (-\theta ),\quad x=0,1,\dots \end{aligned} \tag{1}$$

In practice, the citation of papers empirically seems to decrease over the years, and this behaviour can be modelled by the Poisson distribution. Nevertheless, the Poisson distribution shows equi-dispersion (i.e. the variance is equal to the mean), which makes it inappropriate for the random variable X, since empirical studies have shown that X presents over-dispersion (i.e. the variance is greater than the mean). See Table 1, where the index of dispersion ($\text{ID}=var(X)/E(X)$) is shown together with other descriptive statistics of the empirical data used here.
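A minimal sketch of the index of dispersion follows. The sample is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which estimator was used for Table 1.

```python
from statistics import mean, pvariance

def index_of_dispersion(sample):
    """ID = var(X) / E(X); values above 1 indicate over-dispersion relative to Poisson."""
    return pvariance(sample) / mean(sample)

# Hypothetical sample of observed values (illustrative only, not data from Table 1).
sample = [1, 2, 2, 3, 4, 5, 7, 9, 15, 32]
dispersion = index_of_dispersion(sample)
print(dispersion)
```

A Poisson variable would give a value close to 1, so a value well above 1, as in this toy sample, is the kind of evidence that motivates a mixed Poisson model.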

Table 1 Some descriptive statistics of the subject categories empirical data

In addition to the above, there is evidence that over-dispersion is related to the heterogeneity of the population of subject categories. In this case, the parameter $\theta$ can be considered a random variable that takes different values between different journals in the subject category, reflecting uncertainty about this parameter and varying from individual to individual according to a probability density function. Here, we assume that this parameter follows a gamma distribution with shape parameter $\alpha >0$ and rate parameter $\beta >0$, i.e.

$$\begin{aligned} \pi (\theta ) = \frac{\beta ^{\alpha }}{\Gamma (\alpha )} \theta ^{\alpha -1}\exp (-\beta \theta ),\quad \theta >0. \end{aligned} \tag{2}$$

In practice, other mixing distributions besides the gamma, such as the inverse Gaussian distribution, could be considered here. Thus, using (1) and (2), the unconditional distribution $\Pr (X=x)=\int _{\Theta }\Pr (X=x|\theta )\pi (\theta )\,d\theta$ is a negative binomial distribution, $X_t\sim NB(\alpha ,1/(\beta +1))$. In this case, the obsolescence factor is given by $a(t)=\frac{\beta }{\beta +1}\left( 1+\frac{\alpha -1}{t+1}\right)$. It is well known that for mixed Poisson distributions the variance is always greater than the mean.
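The properties stated above can be illustrated with a short sketch. It reads the notation NB(α, 1/(β+1)) as a negative binomial with success probability p = 1/(β+1), which is an assumption about the parametrization; the mapping between β and p depends on whether β in the gamma density is taken as a rate or a scale, so `p_success` is kept as an explicit argument. The parameter values are arbitrary.

```python
import math

def nb_pmf(x, alpha, p_success):
    """Pr(X = x) = Gamma(alpha + x) / (Gamma(alpha) x!) * p^alpha * (1 - p)^x."""
    log_prob = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
                + alpha * math.log(p_success) + x * math.log(1.0 - p_success))
    return math.exp(log_prob)

def aging_factor(t, alpha, beta):
    """a(t) = beta / (beta + 1) * (1 + (alpha - 1) / (t + 1)), as written in the text."""
    return beta / (beta + 1.0) * (1.0 + (alpha - 1.0) / (t + 1.0))

alpha, beta = 1.5, 10.0          # illustrative values, not estimates from this study
p_success = 1.0 / (beta + 1.0)   # assumed reading of NB(alpha, 1 / (beta + 1))

# Moments by direct summation over a truncated support (the tail beyond 5000 is negligible here).
probs = [nb_pmf(x, alpha, p_success) for x in range(5000)]
mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(mean, variance)
```

Under this reading, the ratio `nb_pmf(t + 1, ...) / nb_pmf(t, ...)` coincides with `aging_factor(t, ...)`, and the variance exceeds the mean, in line with the over-dispersion argument.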

Cited reference survival rate and cited reference mortality rate

The survival function describes the proportion of cited documents beyond a given value x, thus the probability that the document will survive beyond x. Survival is understood here in the sense that the document continues to be cited, if it has been until reaching the number of citations x. For the negative binomial distribution a closed-form expression for the survival function is written in terms of the regularized generalized incomplete beta function, $I_z(a,b,c)$ (see Johnson et al., 2005, p.45) and is given by,

$$\begin{aligned} R(x)=\Pr (X\ge x)=I_{\beta /(\beta +1)}(1,\alpha ,x). \end{aligned}$$

The mortality function (also known as hazard function) is defined as $\uplambda (x)=\Pr (X=x)/R(x)$. In our setting, it represents the probability, per unit of x, that the document ceases to be cited just after reaching the number of citations x. Distributions with decreasing mortality rates have heavy tails. Distributions with increasing mortality rates have light tails.
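As a hedged sketch, the survival and mortality functions can also be computed by direct summation of the negative binomial probabilities instead of the incomplete beta expression. The parametrization by a success probability `p_success` and all parameter values are assumptions for illustration.

```python
import math

def nb_pmf(x, alpha, p_success):
    """Pr(X = x) for a negative binomial with shape alpha and success probability p_success."""
    log_prob = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
                + alpha * math.log(p_success) + x * math.log(1.0 - p_success))
    return math.exp(log_prob)

def survival(x, alpha, p_success):
    """R(x) = Pr(X >= x), computed as 1 minus the probability of the values below x.

    This loses precision very deep in the tail, where R(x) approaches rounding error.
    """
    return max(0.0, 1.0 - sum(nb_pmf(k, alpha, p_success) for k in range(x)))

def mortality(x, alpha, p_success):
    """Hazard rate Pr(X = x) / R(x)."""
    return nb_pmf(x, alpha, p_success) / survival(x, alpha, p_success)

# Illustrative parameters only: alpha = 1 is the geometric case with a constant hazard.
print(survival(20, 1.0, 0.1), mortality(20, 1.0, 0.1))
print(mortality(10, 2.0, 0.1), mortality(30, 2.0, 0.1))
```

With `alpha = 1` the mortality rate is constant and equal to `p_success`; with `alpha = 2` it increases with x, which is the light-tail pattern described above.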

Cited reference percentile: VaR and TVaR

$\text{ VaR}_p(X)$ is the value $x_p$ such that $\Pr (X>x_p)=p$, $0<p<1$, i.e. the number of citations that is exceeded with probability p. Therefore, it is the $100(1-p)$th percentile of the random variable X.

Value at Risk (VaR) has become the standard risk measure used to assess risk exposure in financial and actuarial issues. From the bibliometric point of view, we can draw an analogy by considering the risk that a publication becomes obsolete. A researcher with high expectations regarding publications will prefer a disciplinary journal that becomes obsolete as late as possible. The survival function allows us to calculate the probability that the number of citations is greater than a specific value, say $x_p$. On the other hand, the VaR gives the number of citations for which this probability is precisely p.

Consider a decision maker, for example, a researcher choosing a disciplinary journal in which to publish a paper. For a fixed probability p, the researcher should opt for the journal that maximizes the value $x_p$. That is, between two disciplinary journals, say A and B, that produce the values at risk $x_p^A$ and $x_p^B$ for the same value of p, respectively, the researcher should opt for the one with the highest value. This option will guarantee less obsolescence of the corresponding disciplinary journal. The choice of p here is crucial and will depend on the consideration of each researcher.
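The comparison between two journals can be sketched as follows. Because X is discrete, the sketch takes the VaR as the smallest integer x with Pr(X > x) ≤ p, which is one common convention and an assumption here. The journals "A" and "B", their parameters, and the success-probability parametrization are hypothetical.

```python
import math

def nb_pmf(x, alpha, p_success):
    """Pr(X = x) for a negative binomial with shape alpha and success probability p_success."""
    log_prob = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
                + alpha * math.log(p_success) + x * math.log(1.0 - p_success))
    return math.exp(log_prob)

def value_at_risk(tail_prob, alpha, p_success, max_x=1_000_000):
    """Smallest integer x such that Pr(X > x) <= tail_prob."""
    if not 0.0 < tail_prob < 1.0:
        raise ValueError("tail_prob must lie strictly between 0 and 1")
    cdf = 0.0
    for x in range(max_x):
        cdf += nb_pmf(x, alpha, p_success)
        if 1.0 - cdf <= tail_prob:
            return x
    raise RuntimeError("no threshold found below max_x")

# Hypothetical journals with illustrative parameters (alpha, p_success).
journals = {"A": (1.0, 0.1), "B": (1.0, 0.2)}
var_5 = {name: value_at_risk(0.05, a, p) for name, (a, p) in journals.items()}
preferred = max(var_5, key=var_5.get)  # the journal with the larger x_p ages more slowly
print(var_5, preferred)
```

In this toy case journal A has the larger threshold at p = 0.05, so it would be the preferred choice under the rule described above.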

The measure VaR is merely a cutoff point and does not describe the tail behavior beyond the VaR threshold. The tail-value-at-risk (TVaR) is a measure that is in many ways superior to VaR because it reflects the shape of the tail beyond the VaR threshold. The tail-value-at-risk at level p, denoted by TVaR$_p(X)$, is the expected number of citations given that the random variable X exceeds the VaR threshold $x_p$. That is,

$$\begin{aligned} \text{ TVaR}_p(X)= E(X|X>x_p)=x_p+\frac{\sum _{x=x_p}^{\infty }(x-x_p)\Pr (X=x)}{R(x_p)}. \end{aligned}$$

Some computations (see Klugman et al., 2008, Chap. 6) provide the TVaR for the NB distribution, which is given by,

$$\begin{aligned} \text{ TVaR}_p(X) = \frac{(x_p+1)\,_2F_1(1,x_p+1+\alpha ,1+x_p,\beta /(\beta +1))}{ _2F_1(1,x_p+1+\alpha ,2+x_p,\beta /(\beta +1))}. \end{aligned}$$
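As a numerical cross-check that avoids the hypergeometric function, the conditional expectation E(X | X > x_p) can be approximated by direct summation. This is a sketch under stated assumptions: the strict inequality X > x_p is used for the conditioning event, the negative binomial is parametrized by a success probability, and the stopping rule and parameter values are illustrative choices.

```python
import math

def nb_pmf(x, alpha, p_success):
    """Pr(X = x) for a negative binomial with shape alpha and success probability p_success."""
    log_prob = (math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
                + alpha * math.log(p_success) + x * math.log(1.0 - p_success))
    return math.exp(log_prob)

def tail_value_at_risk(x_p, alpha, p_success, tol=1e-16, max_terms=1_000_000):
    """Approximate E(X | X > x_p) by summing x * Pr(X = x) over x > x_p."""
    dist_mean = alpha * (1.0 - p_success) / p_success
    numerator = 0.0
    denominator = 0.0
    for x in range(x_p + 1, x_p + 1 + max_terms):
        prob = nb_pmf(x, alpha, p_success)
        numerator += x * prob
        denominator += prob
        # Stop once past the mean and the new term is negligible relative to the running sum.
        if x > dist_mean and x * prob < tol * numerator:
            break
    return numerator / denominator

# Illustrative geometric case (alpha = 1): memorylessness gives x_p + 1 + (1 - p) / p.
tvar = tail_value_at_risk(28, 1.0, 0.1)
print(tvar)
```

For the geometric case the result can be verified by hand, which makes it a convenient sanity check before applying the function to other parameter values.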

Numerical experiments

Parameter estimates for the subject categories and the disciplinary journals using the Poisson and negative binomial distributions are given in Tables 5 and 6 (“Appendix”), respectively. We have used Akaike’s information criterion (AIC) for model selection; a lower value of this measure is preferable. The negative binomial distribution fits the data well, as Fig. 1 confirms.
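The model comparison can be sketched with AIC = 2k − 2 ln L, where k is the number of parameters. The snippet below is not the estimation procedure used for Tables 5 and 6: it fits the Poisson mean by the sample mean and approximates the negative binomial maximum likelihood estimate with a coarse grid search over α, holding the mean at the sample mean. The sample and all function names are hypothetical, and the sample is assumed to have a positive mean.

```python
import math

def poisson_loglik(sample, theta):
    """Log-likelihood of a Poisson model with mean theta."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in sample)

def nb_loglik(sample, alpha, p_success):
    """Log-likelihood of a negative binomial with shape alpha and success probability p_success."""
    return sum(math.lgamma(alpha + x) - math.lgamma(alpha) - math.lgamma(x + 1)
               + alpha * math.log(p_success) + x * math.log(1.0 - p_success)
               for x in sample)

def aic(loglik, n_params):
    """Akaike's information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

def compare_models(sample):
    """Return (AIC Poisson, AIC NB, grid estimate of alpha)."""
    m = sum(sample) / len(sample)
    aic_poisson = aic(poisson_loglik(sample, m), 1)
    # Grid over alpha from 0.01 to 1000 on a log scale; p is chosen so the NB mean equals m.
    grid = (10 ** (k / 100.0) for k in range(-200, 301))
    best_loglik, best_alpha = max((nb_loglik(sample, a, a / (a + m)), a) for a in grid)
    return aic_poisson, aic(best_loglik, 2), best_alpha

# Hypothetical over-dispersed sample (illustrative only).
sample = [1, 2, 2, 3, 4, 5, 7, 9, 15, 32]
aic_poisson, aic_nb, alpha_hat = compare_models(sample)
print(aic_poisson, aic_nb, alpha_hat)
```

For an over-dispersed sample like this one, the negative binomial attains the lower AIC despite its extra parameter, which mirrors the qualitative pattern reported in the text.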

The improvement in the fit of the NB model compared to the Poisson model is remarkable in all subject categories and journals analyzed. In the case of the subject categories (see Table 5 in the “Appendix”), the reduction in the AIC for the NB model compared to the Poisson model is greater than 37% for all categories, even reaching 67% in the case of History. Furthermore, the variability in the parameters within each model reflects the differences observed both between the different subfields (Table 5) and between the journals within the same subfield (see Table 6 in the “Appendix”).

The field effect can be seen better in Fig. 1. The upper graph of this Figure shows the Poisson probability function and the negative binomial probability function in the lower part. For that, we have used the estimated parameters that appear in Table 5. It is observed that the latter better captures the pattern followed by the field that appears in the empirical data. Furthermore, the NB distribution approaches zero much more slowly, thus showing a much heavier right tail than the Poisson distribution. Within each model, the differences between the distributions are considerable. According to the NB model, which offers a better fit to the empirical data, the differences are clearly observed both in the peak and in the tail of the distribution. The subfield with the heaviest tail, and therefore with the least obsolescence, is History. On the contrary, the subfields with the lightest tails, and therefore where the obsolescence is greater, correspond to Electrical & Electronic Engineering, General Chemistry, General Medicine, and Cell Biology.

However, comparing the citation distributions of two different fields is sometimes not straightforward. This is because the curves often intersect each other on one or more occasions. It would therefore be convenient to use some measures for the tail of the citation distribution, which is the part of the curve that best defines the obsolescence of the literature in a field. Since the accuracy obtained with the NB distribution is much better than that achieved with the Poisson distribution, from now on we offer numerical results only for the NB distribution. The accuracy obtained with the NB distribution is confirmed by Figs. 2 and 3. The empirical application of the obsolescence measures proposed in this paper is shown in Tables 2 and 3, and in Tables 7, 8, and 9 (see “Appendix”).

Cited reference survival rate

As previously stated, the tail of the distribution allows obsolescence in a field to be measured. As can be seen in Table 2, and in Tables 7 and 8 in the “Appendix”, there are very important differences between subfields in relation to the tail of the distribution. In the case of the subject categories (Table 2), the one with the highest survival rate in the cited references is History. More than 41% of the cited references in the journals of History are over twenty years old, and about 10% are over fifty. The second highest survival rate is achieved in General Physics & Astronomy. More than 25% of the cited references in the journals of General Physics & Astronomy are over twenty years old, and about 2.6% are over fifty.

The subject categories with the lowest survival rates are Electrical & Electronic Engineering and General Medicine. Only 8% of the cited references in Electrical & Electronic Engineering are over twenty years, while 0.1% are over fifty. In the case of General Medicine, only 9% of the cited references in the journals of this category are over twenty years, and 0.1% are over fifty.

Therefore, the obsolescence in Electrical & Electronic Engineering is of the order of five times higher than the obsolescence in History over a medium time period (twenty years), but of the order of eighty times higher over a long time period (fifty years). Note that this proportion increases exponentially over time. Thus, for example, after a hundred years this proportion is 7000 to 1, that is, the survival rate for Electrical & Electronic Engineering is approximately 7000 times lower than that of History.

Although two subfields very far apart in their citation habits (Electrical & Electronic Engineering and History) have been compared, important differences can also be observed between subject categories of the same branch of knowledge. This is the case, for example, in science between General Physics & Astronomy and General Chemistry. In General Physics & Astronomy, over 25% of the citations are directed to publications over twenty years old. This percentage is reduced to 11% in General Chemistry, which is less than half that in General Physics & Astronomy. On the other hand, these proportions again increase exponentially over time. Thus, for example, after fifty years, the relationship is approximately 7 to 1, that is, the survival rate in General Chemistry is 7 times lower than in General Physics & Astronomy. However, at a hundred years this proportion is approximately 50 to 1, with a survival rate for General Chemistry around fifty times lower than for General Physics & Astronomy.

These differences between subfields, although with lower proportions, can also be observed between other subfields and even between journals of the same subject category (see Tables 7 and 8 in the “Appendix”). Although the subject category is the smallest disaggregation set in relation to the field in the Scopus database, it is common for the same subject category to include specialties that differ widely in their referencing habits. As an example, we can analyze the tail of the distribution for two different specialties within the Library & Information Sciences category. The journal Scientometrics clearly shows less obsolescence than Journal of Information Science. While the survival rate after twenty years for the first journal is around 1.7 times that of the second one, the survival rate after sixty years is around 22 times higher for Scientometrics, and around 300 times higher after a hundred years.

Something similar happens with journals in other subject categories. In General Medicine, for example, the obsolescence in New England Journal of Medicine is much lower than that of PLoS Medicine. While the survival rate after twenty years for the first journal is around 1.5 times that of the second one, the survival rate after sixty years is around 5.5 times higher for the first journal, and around 32 times higher after a hundred years.

Cited reference mortality rate

As previously evidenced, the survival rate decreases rapidly as the age of the cited reference increases. However, the mortality rate (hazard rate) takes on more stable values. It can be observed in Table 2 that all the analyzed subject categories present increasing mortality rates, which indicates that their distributions have light tails. The higher the mortality rates, the lighter the tails; the lower the mortality rates, the less light the tails. Thus, History has the lowest mortality rates (around 0.04). Although the mortality rate increases slightly over time, the range of variation is quite narrow (between 0.0455 and 0.0476). General Physics & Astronomy also has a low mortality rate (around 0.07). In contrast, the highest mortality rates (greater than 0.12) are reached in Cell Biology, General Medicine, and Electrical & Electronic Engineering.

Again there are important differences between subject categories. For example, as extreme cases, the mortality rates in Electrical & Electronic Engineering are about three times those in History. Within the same branch, the differences are smaller, although they remain remarkable. Thus, the mortality rates in General Chemistry are approximately 50% higher than those in General Physics & Astronomy.

If we again compare the journals Scientometrics and Journal of Information Science (Table 8 in the “Appendix”), the mortality rates of the cited references in the second journal are approximately 50% higher than those in Scientometrics. In General Medicine, the mortality rates of the cited references in PLoS Medicine are between 25% and 29% higher than those in New England Journal of Medicine.

Cited reference percentile: VaR and TVaR

A lower obsolescence, that is, a slower aging, is associated with higher values for a given percentile (see Tables 3 and 9 in the “Appendix”). Of the subject categories analyzed, the one that presents the least obsolescence according to the percentiles, that is, the slowest aging of the literature, is History. According to the TVaR, 5% of the cited literature in History is more than 65 years old, while 1% is over 98. At a distance is General Physics & Astronomy, where 5% of the cited literature is over 42 years old, and 1% is over 63.

On the contrary, a higher obsolescence, that is, a faster aging, is associated with lower values for a given percentile. Of the subject categories analyzed, those that present the greatest obsolescence, that is, the fastest aging of the literature, are Electrical & Electronic Engineering, General Medicine, Cell Biology, and General Chemistry. Thus, 5% of the cited literature in Electrical & Electronic Engineering is over 24 years old, while 1% is over 36.

At the level of disciplinary journals (Table 9 in the “Appendix”), the differences within the same subject category are in general smaller than those between journals from different categories. As an example of two different disciplines within the same subject category, we can analyze the case of Library & Information Sciences. The journal Scientometrics clearly shows less obsolescence than Journal of Information Science. Again according to the TVaR, 5% of the literature cited in Scientometrics is over 31 years old, while 1% is over 46. However, in the Journal of Information Science, 5% of the literature is over 24 years old, and 1% is over 34.

Let us now suppose that a researcher with a career in Economics has to choose between two journals, Applied Economics and Economics Letters, which are very similar in the content they disseminate (the second is more practical). With a probability $p=0.01$, that is, with the distribution of the number of citations leaving an area of 1% to the right, the researcher should send the article to the first journal, which has a VaR of 60 instead of the VaR of 49 of the second (see Table 9). If the researcher considers that the value of p that measures journal obsolescence should be 9%, then there is not much difference between the two VaR values (32 and 29), and the researcher could be indifferent in this case.

Furthermore, the following comparison can also be made. While in Scientometrics 9% of cited references are more than 26 years old, in Journal of Information Science the percentage at this same age is only 4%. Similarly, while in Scientometrics 6% of cited references are more than 30 years old, in the Journal of Information Science the percentage at this same age is only 2%.

In General Medicine, 5% of the literature cited in New England Journal of Medicine is over 26 years old, and 1% is over 38. However, in PLoS Medicine, 5% of the literature is over 22 years old, and 1% is over 32. Moreover, while in New England Journal of Medicine 9% of cited references are more than 21 years old, in PLoS Medicine the percentage at this same age is only 6%.

In Economics & Econometrics, 5% of the literature cited in American Economic Review is over 43 years old, and 1% is over 65. However, in Applied Economics, 5% of the literature is over 35 years old, and 1% is over 50. Furthermore, while in American Economic Review 6% of cited references are more than 40 years old, in Applied Economics the percentage at this same age is only 3%.

Therefore, a detailed analysis of Table 9 (see the “Appendix”) would make it possible, although it is not the purpose of this work, to group journals with similar obsolescence through the analysis of the tails of their distributions. Similarly, it would also allow subject categories to be disaggregated into different disciplines by analyzing obsolescence through the behavior of the tails in the distributions.

Table 2 Survival rate (top) and mortality rate (bottom) for the subject categories

Table 3 Percentile VaR (above) and TVaR (below) for subject categories

Conclusions

The literature framework establishes that a document is obsolete when it is no longer cited, i.e., when it is no longer used by an academic community as a source of information to argue, justify or contradict the statements or findings reported by other authors.

The results in this study support that the subfield and even the discipline (specialty) are influencing obsolescence. Thus, there is a field effect in the phenomenon by which publications are less and less cited over time, known as literature obsolescence.

We used all the cited references in 22,559 research articles published in 2019 from eight different subfields (subject categories in Scopus): Cell Biology, Economics & Econometrics, Electrical & Electronic Engineering, General Chemistry, General Medicine, General Physics & Astronomy, History, and Library & Information Sciences.

The distribution of synchronically accumulated citations produced an initial growth followed by an exponential decrease. We concluded that the negative binomial is preferable to the Poisson distribution for the datasets considered in all the cases.

However, comparing the citation distributions of two different disciplinary journals is not straightforward, because the curves often intersect one or more times. For this reason, three measures to analyze the tail of the distribution were proposed: survival rate, mortality rate and percentile.
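The first two measures can be sketched empirically as follows. This is a minimal Python illustration that assumes the survival rate at age x is the share of cited references strictly older than x, P(X > x), and the mortality rate is the discrete hazard P(X = x) / P(X ≥ x); the function names and the toy data are hypothetical.

```python
def survival_rate(ages, x):
    """Share of cited references strictly older than x years, P(X > x)."""
    return sum(1 for a in ages if a > x) / len(ages)

def mortality_rate(ages, x):
    """Discrete hazard P(X = x) / P(X >= x); None if no reference reaches age x."""
    at_risk = sum(1 for a in ages if a >= x)
    if at_risk == 0:
        return None
    return sum(1 for a in ages if a == x) / at_risk

# Hypothetical reference ages (in years) for one journal
ages = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55]
print(survival_rate(ages, 20))   # 3 of the 10 references are older than 20 years
print(mortality_rate(ages, 2))   # 2 of the 9 references aged 2 or more are exactly 2
```

Unlike the full curves, these quantities are single numbers at a chosen age, so two journals can be compared directly even when their distributions cross.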

There are very important differences between subfields, and even between disciplines, in the tail of the distribution. The highest survival rate is observed in History: more than 41% of the cited references are over twenty years old, and about 10% are over fifty. Next, at some distance but still with a high survival rate, comes General Physics & Astronomy, where more than 25% of the cited references are over twenty years old and close to 2.6% are over fifty. In contrast, the lowest survival rates are observed in Electrical & Electronic Engineering and General Medicine. Only 8% of the cited references in Electrical & Electronic Engineering and 9% in General Medicine are over twenty years old, while 0.1% are over fifty. Therefore, obsolescence in Electrical & Electronic Engineering is on the order of five times higher than in History over the medium term (twenty years), but on the order of eighty times higher over the long term (fifty years). Note that this proportion increases exponentially over time.

Important differences are also observed between subfields of the same branch of knowledge, for example General Physics & Astronomy and General Chemistry. About 25% of the citations in General Physics & Astronomy go to publications over twenty years old, while this percentage falls to 11% in General Chemistry (less than half). Furthermore, this proportion again increases exponentially over time. Thus, after fifty (a hundred) years, the survival rate in General Chemistry is seven (fifty) times lower than in General Physics & Astronomy.

Differences are also observed at journal level. While the survival rate at twenty years in the journal Scientometrics is around 1.7 times that in Journal of Information Science, after sixty years it is around 22 times higher for Scientometrics. Something similar happens with journals in other subfields.

The mortality rate takes on more stable values than the survival rate. All the subfields present increasing mortality rates, which indicates that their distributions have light tails. History has the lowest mortality rates (around 0.04) followed by General Physics & Astronomy (around 0.07). The highest mortality rates (greater than 0.12) are reached in Cell Biology, General Medicine, and Electrical & Electronic Engineering. Again, there are important differences both between subfields and between disciplines. The mortality rates in Electrical & Electronic Engineering are about three times those in History, and in General Chemistry are about 50% higher than in General Physics & Astronomy. At disciplinary journal level, the mortality rates in Journal of Information Science are about 50% higher than those in Scientometrics, for example.
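A rough way to see why differences in survival rates between fields grow exponentially with age is to treat each mortality rate as constant, which is a deliberate simplification because the rates reported here increase with age. Under that assumption the survival rate is geometric, and the ratio between two fields is a power of (1 − r₁)/(1 − r₂). The Python sketch below uses the approximate rates quoted above for History (0.04) and Electrical & Electronic Engineering (0.12); the function names are hypothetical.

```python
def constant_rate_survival(rate, age):
    """P(X > age) when the mortality rate P(X = x) / P(X >= x) is the same
    at every age (simplifying assumption): (1 - rate) ** (age + 1)."""
    return (1.0 - rate) ** (age + 1)

def survival_ratio(rate_low, rate_high, age):
    """How many times larger the survival rate is in the low-mortality field."""
    return (constant_rate_survival(rate_low, age)
            / constant_rate_survival(rate_high, age))

for age in (20, 50):
    print(age, round(survival_ratio(0.04, 0.12, age), 1))
```

With these inputs the ratio is roughly six at twenty years and roughly eighty-five at fifty, the same order of magnitude as the observed gap between the two subfields, although the constant-rate model is only an approximation.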

This is in fact an expected result if we consider how knowledge diffuses in the different fields. Two main types of obsolescence can be distinguished, both related to the rate at which diffusion occurs. In the first type, citations rise quickly to a modal value and then drop very sharply, as occurs, for example, in Medicine and Engineering. The second type results from a slower diffusion process and is associated with a slower rate of decline. Examples of this behavior are the basic sciences, social sciences, and humanities.

Finally, higher obsolescence, that is, faster aging, is associated with lower values for a given percentile. The highest obsolescence is again observed in Electrical & Electronic Engineering, General Medicine, Cell Biology, and General Chemistry. Thus, according to the TVaR, 5% of the cited literature in Electrical & Electronic Engineering is over 24 years old, while 1% is over 36. In contrast, 5% of the cited literature in History is more than 65 years old, while 1% is over 98. At the disciplinary journal level, for example, 6% of the cited references in Scientometrics are more than 30 years old, whereas in Journal of Information Science only 2% reach that age.
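The percentile measures can be illustrated with an empirical sketch in Python. It assumes that the VaR at level q is the smallest age x with P(X ≤ x) ≥ q and that the TVaR is the mean age of the references older than that threshold; definitions of TVaR for discrete data vary, so this is one possible convention rather than the computation used for the tables. The function names and the toy data are hypothetical.

```python
import math

def var_percentile(ages, q):
    """Smallest age x such that the empirical P(X <= x) is at least q."""
    ordered = sorted(ages)
    # Number of observations that must lie at or below x; the small epsilon
    # guards against floating-point noise in q * n.
    k = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[k - 1]

def tvar(ages, q):
    """Mean age of the references strictly older than the VaR at level q."""
    threshold = var_percentile(ages, q)
    tail = [a for a in ages if a > threshold]
    return sum(tail) / len(tail) if tail else float(threshold)

# Hypothetical reference ages (in years) for one journal
ages = [1, 2, 2, 3, 5, 8, 13, 21, 34, 55]
print(var_percentile(ages, 0.8))  # 21: 80% of the references are at most 21 years old
print(tvar(ages, 0.8))            # 44.5: mean age of the two references older than 21
```

A journal with faster aging yields smaller values of both quantities at the same level q.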

As has been evidenced, the difference between subfields can also be observed between disciplinary journals of the same subfield. Although the subject category is the smallest disaggregation set in relation to the field in the Scopus database, it is common for the same subject category to include very different disciplines. The differences can be very noticeable in the use that an academic community makes of bibliographical references to argue, justify or contradict the statements or findings reported by other authors.

In this respect, as a practical application and a future line of research, a detailed analysis of the measures proposed in this paper would make it possible to group journals with similar obsolescence by comparing the tails of their distributions. Similarly, it would allow subject categories to be disaggregated into different disciplines by analyzing obsolescence through the tails of the distributions.

The journal impact factor focuses on measuring the average number of citations per document over a short period of time (generally between two and five years). Such a measure does not capture the citations received over long time periods. However, as has been shown in this paper, some journals with low obsolescence accumulate a high percentage of citations after many years. Therefore, the short-term impact factor should be accompanied by some measure of the tail of the citation distribution, such as those presented in this paper, to give a more accurate idea of the real impact of a journal.

As a final consideration, diachronic analysis is much less common in the literature because it takes time for citations to accumulate. However, our approach can also be applied, in the same way, to diachronic analysis. Diachronic analysis is based on citations received, whereas synchronic analysis is based on references made. Therefore, diachronic obsolescence requires fixing a period in the past and looking forward from it. In the diachronic version of our approach, the years of the citations must be considered in natural chronological order, instead of the reverse chronological order of the synchronic analysis. Nevertheless, some authors argue that synchronic and diachronic studies produce similar results (Stinson & Lancaster, 1987), suggesting a preference for the synchronic method.
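The difference between the two viewpoints can be made concrete with a small Python sketch. It assumes citation links are stored as (citing year, cited year) pairs; the data and the function names are hypothetical. The synchronic count fixes the citing year and looks backward, while the diachronic count fixes the publication year of the cited documents and looks forward.

```python
from collections import Counter

# Hypothetical citation links: (year of the citing paper, year of the cited paper)
links = [(2019, 2015), (2019, 2010), (2019, 2019),
         (2012, 2010), (2015, 2010), (2019, 1999)]

def synchronic_ages(links, citing_year):
    """Age distribution of the references made by papers published in citing_year."""
    return Counter(citing - cited for citing, cited in links if citing == citing_year)

def diachronic_ages(links, cited_year):
    """Age distribution of the citations received by papers published in cited_year."""
    return Counter(citing - cited for citing, cited in links if cited == cited_year)

print(sorted(synchronic_ages(links, 2019).items()))  # ages of references made in 2019
print(sorted(diachronic_ages(links, 2010).items()))  # ages at which 2010 papers were cited
```

Either age distribution can then be passed to the same tail measures, which is why the approach carries over to the diachronic setting.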

References

Barnett, G. A., Fink, E. L., & Debus, M. B. (1989). A mathematical model of academic citation age. Communication Research, 16(4), 510–531.
Bouabid, H. (2011). Revisiting citation aging: a model for citation distribution and life-cycle prediction. Scientometrics, 88(1), 199–211.
Bouabid, H., & Larivière, V. (2013). The lengthening of papers’ life expectancy: A diachronous analysis. Scientometrics, 97(3), 695–717.
Brookes, B. C. (1970). Obsolescence of special library periodicals: Sampling errors and utility contours. Journal of the American Society for Information Science, 21(5), 320–329.
Burrell, Q. L. (2002). The nth-citation distribution and obsolescence. Scientometrics, 53(3), 309–323.
Burton, R. E., & Kebler, R. W. (1960). The half-life of some scientific and technical literatures. American Documentation, 11(1), 18–22.
Cano, V., & Lind, N. C. (1991). Citation life cycles of ten citation classics. Scientometrics, 22(2), 297–312.
De Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Egghe, L., & Ravichandra Rao, I. K. (1992). Citation age data and the obsolescence function: Fits and explanations. Information Processing and Management, 28(2), 201–217.
Egghe, L., & Ravichandra Rao, I. K. (2002). Theory and experimentation on the most-recent-reference distribution. Scientometrics, 53(3), 371–387.
Egghe, L., & Rousseau, R. (2000). The influence of publication delays on the observed aging distribution of scientific literature. Journal of the American Society for Information Science and Technology, 51(2), 158–165.
Ewing, G. J. (1966). Citation of Articles from Volume 58 of the Journal of Physical Chemistry. Journal of Chemical Documentation, 6(4), 247–250.
Glänzel, W. (2004). Towards a model for diachronous and synchronous citation analyses. Scientometrics, 60(3), 511–522.
Gross, P. L., & Gross, E. M. (1927). College libraries and chemical education. Science, 66(1713), 385–389.
Gupta, B. M. (1998). Growth and obsolescence of literature in theoretical population genetics. Scientometrics, 42(3), 335–347.
Gupta, U. (1990). Obsolescence of physics literature: Exponential decrease of the density of citations to Physical Review articles with age. Journal of the American Society for Information Science, 41(4), 282–287.
Johnson, N., Kemp, A., & Kotz, S. (2005). Univariate discrete distributions. Wiley.
Klugman, S., Panjer, H., & Willmot, G. (2008). Loss models: From data to decisions (3rd ed.). Wiley.
Larivière, V., & Archambault, É. (2008). Long-term variations in the aging of scientific literature: From exponential growth to steady-state science (1900–2004). Journal of the American Society for Information Science and Technology, 59(2), 288–296.
Line, M. B. (1970). The half-life of periodical literature: Apparent and real obsolescence. Journal of Documentation, 26(1), 46–54.
Line, M. B., & Sandison, A. (1974). Obsolescence and changes in the use of literature with time. Journal of Documentation, 30(3), 283–350.
Small, H. G., & Crane, D. (1979). Specialties and disciplines in science and social science: An examination of their structure using citation indexes. Scientometrics, 1(5–6), 445–461.
Song, Y., Ma, F., & Yang, S. (2015). Comparative study on the obsolescence of humanities and social sciences in China: under the new situation of web. Scientometrics, 102, 365–388.
Stinson, E. R., & Lancaster, F. W. (1987). Synchronous versus diachronous methods in the measurement of obsolescence by citation studies. Journal of Information Science, 13(2), 65–74.
Van Raan, A. F. (2000). On growth, ageing, and fractal differentiation of science. Scientometrics, 47(2), 347–362.
Wallace, M. L., Larivière, V., & Gingras, Y. (2009). Modeling a century of citation distributions. Journal of Informetrics, 3(4), 296–303.
Wang, M., Zhang, J., Chen, G., & Chai, K. H. (2019). Examining the influence of open access on journals’ citation obsolescence by modeling the actual citation process. Scientometrics, 119(3), 1621–1641.
Zhang, L., & Glänzel, W. (2017a). A citation-based cross-disciplinary study on literature aging: Part I-the synchronous approach. Scientometrics, 111(3), 1573–1589.
Zhang, L., & Glänzel, W. (2017b). A citation-based cross-disciplinary study on literature aging: Part II-diachronous aspects. Scientometrics, 111(3), 1559–1572.


Acknowledgements

We thank the two anonymous reviewers for their valuable comments and suggestions, which have greatly helped us improve the original manuscript.

Funding

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Emilio Gómez-Déniz was partially funded by Grant ECO2017–85577–P (Ministerio de Economía, Industria y Competitividad. Agencia Estatal de Investigación).

Author information

Authors and Affiliations

Department of Quantitative Methods and TIDES Institute, University of Las Palmas de Gran Canaria, Campus de Tafira s/n, 35017, Las Palmas de Gran Canaria, Spain
Pablo Dorta-González & Emilio Gómez-Déniz

Corresponding author

Correspondence to Pablo Dorta-González.

Appendix

See Tables 4, 5, 6, 7, 8, and 9.

Table 4 Journals and subject categories in the empirical application

Table 5 Estimates of the parameters and AIC values for the different subject categories studied

Table 6 Estimates of the parameters and AIC values for the different journals between subject categories studied

Table 7 Survival rate (top) and mortality rate (bottom) for the journals considered in CB, E&E, E&EE, and GC

Table 8 Survival rate (top) and mortality rate (bottom) for journals in GM, GP&A, H, and L&IS

Table 9 Percentile VaR (above) and TVaR (below) for all the journals considered

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Dorta-González, P., Gómez-Déniz, E. Modeling the obsolescence of research literature in disciplinary journals through the age of their cited references. Scientometrics 127, 2901–2931 (2022). https://doi.org/10.1007/s11192-022-04359-w


Received: 15 March 2021
Accepted: 19 March 2022
Published: 19 April 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11192-022-04359-w
