An analysis of bibliometric indicators to JCR according to Benford’s law
- Cite this article as:
- Alves, A.D., Yanasse, H.H. & Soma, N.Y. Scientometrics (2016) 107: 1489. doi:10.1007/s11192-016-1908-3
Journal Citation Reports (JCR) is the main source of bibliometric indicators known by the scientific community. This paper presents the results of a study of the distributions of the first and second significant digits, according to Benford’s law (BL), of the number of articles, citations, impact factors, half-life and immediacy index bibliometric indicators in journals indexed in the JCR Science and Social Sciences Editions from 2007 to 2014. We also analyzed the data by country of origin and by journal category, and we verified that the second digit has a better adherence to BL. The use of the second digit is important since it provides a sounder, more complete and more consistent analysis of the bibliometric indicators.
Keywords: Journal Citation Reports; JCR; Bibliometric indicators; Benford’s law
Bibliometric indicators have become increasingly of interest because they can help provide some measurement of the visibility of scientific publications. They can support the evaluation of research success and of impact in the scientific community, as well as the optimization of research policy. They can also help researchers select the journals to which to submit their manuscripts (Durieux and Gevenois 2010). Funding agencies have also been looking at bibliometric indicators, as they can offer quantitative measures of the results of the investment made in science (Cabezas-Clavijo et al. 2013).
As far as we know, the main source of bibliometric indicators known by the scientific community is the Journal Citation Reports (JCR). According to Thomson Reuters (2016), the owner of JCR, “Journal Citation Reports offers a systematic, objective means to critically evaluate the world’s leading journals, with quantifiable, statistical information based on citation data. By compiling articles’ cited references, JCR helps to measure research influence and impact at the journal and category levels, and shows the relationship between citing and cited journals”. JCR contains several bibliometric indicators that can reveal information about the performance of each journal, and it is published annually in Science and Social Sciences editions.
The bibliometric indicators listed in the JCR have been used in many studies in recent years (Vanclay 2012; Sangwal 2013; Campanario 2014, 2015). Given the interest in bibliometric indicators and the information that can be extracted from the JCR database, in this paper we investigate whether the main bibliometric indicators listed in the JCR comply with Benford’s law (BL).
BL is the empirical observation that in many data sets the significant digits are not uniformly distributed, as one might expect, but instead tend to follow a very particular logarithmic distribution (Berger and Hill 2015). One application of BL has been to help researchers and professionals identify possible anomalies in data sets, such as the financial data of a religious community (Clippe and Ausloos 2012), the quality of occupational hygiene (De Vocht and Kromhout 2013), aggregated income taxes of Italian municipalities (Mir et al. 2014), birth time series (Ausloos et al. 2015) and natural climatic processes (Joannes-Boyau et al. 2015). BL has also been used to distinguish different chaotic processes from stochastic processes (Li et al. 2015).
To the best of our knowledge, Campanario and Coslado (2011) were the first to apply BL to bibliometric indicators listed in the JCR. They noted that the number of articles published, the citations received by journals and the impact factors of journals indexed in the JCR Science Edition from 1998 to 2007 do not always comply with BL. They identified the first significant digit of each of these indicators for each year separately and compared the resulting distributions with the values predicted by BL.
In Alves et al. (2014), we extended the work of Campanario and Coslado (2011) by analyzing the distribution of the first significant digit of the number of articles published in journals indexed in the JCR Science and Social Sciences Editions from 2007 to 2011. We also investigated compliance with BL when the number of articles published is grouped by country of origin and by journal category.
Other studies that analyze bibliometric indicators using BL (first digit) are those of Egghe and Guns (2012) and Hürlimann (2015). The former introduced a generalization of BL based on the same data used by Campanario and Coslado (2011), while the latter suggested a truncated Erlang distribution and used part of the data given in Campanario and Coslado (2011) and Alves et al. (2014) to illustrate his new approach to some bibliometric indicators.
In this paper, we study the distributions of the first and second significant digits, according to BL, of several bibliometric indicators in journals indexed in the JCR Science and Social Sciences Editions from 2007 to 2014. The indicators considered are: number of articles, citations, impact factors, half-life and immediacy index. We also investigate the compliance of these indicators with BL when they are grouped by country of origin and by journal category.
BL, also known as the first digit law, is a logarithmic distribution function used to predict the first significant digit in numerical data. It asserts that the leading significant digit is not equally likely to be any one of the nine possible digits, but it is 1 more than 30 % of the time, and it is 9 less than 5 % of the time, with the probability of occurrence decreasing logarithmically in value as the digit increases from 1 to 9.
This was first observed by Newcomb (1881), who noted that the first pages of his book of logarithmic tables were more worn than the later pages, which indicated that tables of logarithms were not used in a uniform way. However, the law came to be referred to as BL only after 1938, when Benford published a paper (Benford 1938) analyzing diverse data sets.
Expected proportions of BL for the first and second digits
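The expected proportions named in this caption can be recomputed from the standard logarithmic form of BL. The following is a minimal sketch in Python, not the authors' code; the function names are illustrative assumptions. It uses the well-known expressions log10(1 + 1/d) for the first digit d and, for the second digit d, the sum over first digits k = 1, ..., 9 of log10(1 + 1/(10k + d)).

```python
import math

def benford_first_digit(d: int) -> float:
    """Expected proportion of first significant digit d (1..9) under BL."""
    return math.log10(1 + 1 / d)

def benford_second_digit(d: int) -> float:
    """Expected proportion of second significant digit d (0..9) under BL.

    Obtained by summing the joint two-digit probability over all
    possible first digits k = 1..9.
    """
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

# Illustrative names (assumptions, not from the paper).
first = {d: benford_first_digit(d) for d in range(1, 10)}
second = {d: benford_second_digit(d) for d in range(10)}

for d in range(10):
    p1 = f"{first[d]:.4f}" if d in first else "  -   "
    print(f"digit {d}: first = {p1}, second = {second[d]:.4f}")
```

The first-digit proportions fall from about 0.301 for digit 1 to about 0.046 for digit 9, in line with the description above, while the second-digit proportions are much flatter.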
Materials and methods
We obtained the data for this study from the JCR (Science and Social Sciences Editions) database available on the Web, covering the period from 2007 to 2014. Initially, we collected the following data for each journal: title, ISSN, year, edition, country of origin and the journal’s category. It is worth noting that a journal can belong to one, two, or more JCR categories. The following bibliometric indicators for all journals indexed in the JCR in both editions were also collected: number of articles, total of citations and self-citations, 2-year impact factor (IF) with self-citations (2Y-IF) and without self-citations (2Y-IFWSC), 5-year IF with self-citations (5Y-IF), cited and citing half-life, and immediacy index.
Results and discussion
Of the 288 tests performed, 191 (66.32 %) do not comply with BL. The result for each edition of the JCR is almost equal: 95 (65.97 %) for the Science Edition and 96 (66.67 %) for the Social Sciences Edition. Considering the first digit (1BL), 112 tests (77.78 %) do not comply with BL, while for the second digit (2BL) 79 (54.86 %) do not comply, which is a better result.
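The percentages in this paragraph can be checked with a few lines of arithmetic. In the sketch below, the denominator of 144 tests per edition and per digit is an inference from the reported percentages (for example, 95 / 0.6597 is about 144), not a figure stated explicitly in the text.

```python
TOTAL_TESTS = 288
# Assumption: the 288 tests split evenly, 144 per edition and 144 per digit.
TESTS_PER_GROUP = TOTAL_TESTS // 2

def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)

non_compliant = {
    "all tests": (191, TOTAL_TESTS),
    "Science Edition": (95, TESTS_PER_GROUP),
    "Social Sciences Edition": (96, TESTS_PER_GROUP),
    "first digit (1BL)": (112, TESTS_PER_GROUP),
    "second digit (2BL)": (79, TESTS_PER_GROUP),
}

for label, (part, whole) in non_compliant.items():
    print(f"{label}: {part}/{whole} = {pct(part, whole)} %")
```

Both splits add up to the overall count: 95 + 96 = 191 by edition and 112 + 79 = 191 by digit.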
The average percentage of journals considered varied from one indicator to another. For instance, for the number of articles in the Science Edition, the journals considered represent on average 97.82 % of the total number of journals for 1BL and 96.40 % for 2BL. For the Social Sciences Edition, the average percentage is very similar: 97.86 % for 1BL and 95.50 % for 2BL. The percentages are similar for the citations, self-citations, 2Y-IF, 2Y-IFWSC, 5Y-IF and immediacy index. However, for the cited and citing half-life the percentages are smaller than 77 % in the Science Edition and smaller than 63 % in the Social Sciences Edition.
Number of articles
Campanario and Coslado (2011) noted that the number of articles published in journals indexed in the JCR Science Edition does not comply with 1BL from 1998 to 2007. Alves et al. (2014) noted the same for journals indexed in the JCR Science and Social Sciences Editions from 2007 to 2011. The same occurred from 2007 to 2014; for 2BL, however, the Chi square values in some years are smaller than the critical value. For instance, only 2 years (2007 and 2011) do not comply with 2BL in the JCR Science Edition.
The number of articles published in a journal varies little from one year to the next, since the number of issues of a journal seldom changes from year to year. This may explain the non-compliance of the first digit and the slight improvement in the second digit for the number of articles published.
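The comparison of a Chi square value with a critical value, as used above, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: it assumes a 5 % significance level (critical values 15.507 for 8 degrees of freedom and 16.919 for 9), and the function names and digit counts are hypothetical.

```python
import math

# Assumption: 5 % significance level. Degrees of freedom are the number of
# digit classes minus one: 8 for the first digit, 9 for the second digit.
CRITICAL_5PCT = {8: 15.507, 9: 16.919}

def benford_expected(position: int) -> dict:
    """Expected BL proportions for the first (1) or second (2) digit."""
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }

def chi_square(observed: dict, expected_props: dict) -> float:
    """Pearson Chi square statistic of observed digit counts against BL."""
    n = sum(observed.values())
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in expected_props.items()
    )

def complies(observed: dict, position: int):
    """Return (complies_with_BL, statistic) for the given digit position."""
    props = benford_expected(position)
    stat = chi_square(observed, props)
    return stat < CRITICAL_5PCT[len(props) - 1], stat

# Hypothetical counts for illustration only: 100 journals per first digit.
uniform_counts = {d: 100 for d in range(1, 10)}
ok, stat = complies(uniform_counts, position=1)
print(f"uniform first digits: statistic = {stat:.1f}, complies = {ok}")
```

A uniform spread of first digits is rejected by a wide margin, whereas counts proportional to the BL proportions give a statistic close to zero.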
The number of citations indicates the total number of times that each journal was cited by all journals included in the database within the current JCR year. For this indicator we observed that almost all data comply with 1BL, and the result is better still for 2BL, for which the sole case of non-compliance is the year 2012. Considering the self-citations, all years comply with 2BL. We also considered the number of citations without self-citations, and for 2BL all years comply. For 1BL, non-compliance was observed in the years from 2009 to 2011 in the Science Edition. In 2009 and 2010, the Chi square values are smaller than those of the indicator with self-citations; nevertheless, they do not obey BL.
Considering the country of origin and the journal’s category for the citations and self-citations, both 1BL and 2BL have a very high average acceptance percentage, around 95 %, for the two JCR editions. In all cases, the percentage is higher for 2BL than for 1BL, except for a small difference in the journal’s category in the Social Sciences Edition for self-citations.
The IF identifies the frequency with which an average article from a journal receives citations from other articles in a particular period. For the 2Y-IF the last 2 years are considered, and for the 5Y-IF the last 5 years. We noticed that almost none of the IF values follow 1BL, the exception being the year 2007 in the Social Sciences Edition for 2Y-IFWSC. Additionally, the 5Y-IF does not follow 1BL, but it follows 2BL, except in 2008 in the Social Sciences Edition. This is interesting since neither 2Y-IF nor 2Y-IFWSC follows BL in almost all years. It provides support for the preferential use of the 5Y-IF over the 2Y-IF or 2Y-IFWSC.
Half-life and immediacy
The cited half-life gives the number of years back from the current year that accounts for 50 % of the total number of citations to a journal. The citing half-life identifies the number of years from the current year that accounts for 50 % of the cited references from articles published by a journal. The immediacy index measures how frequently, on average, an article from a journal is cited within the same year of publication, and it is a useful metric for evaluating journals that publish cutting-edge research.
We noted that not all years of the cited and citing half-life indicators comply with 1BL and 2BL for the two JCR editions. We expect that the great majority of the half-life values reported as “>10.0” lie between 10 and 20 years. For the sake of calculation, we therefore assumed “1” to be the first digit of all the half-life values “>10.0”. The new Chi square values obtained are slightly smaller, but the conclusions for the first digit remain the same.
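The treatment of the “>10.0” half-life values can be illustrated with a small digit-extraction helper. This is a sketch under assumptions, not the authors' code: the function name is hypothetical, values are taken as text in plain decimal notation, and a value with only one significant digit is given no second digit. Only the first digit of “>10.0” is assumed to be 1, as described above; its second digit is left undefined.

```python
def significant_digits(value: str):
    """Return (first, second) significant digits of a reported value.

    Assumptions: plain decimal notation such as "0.532" or "1234";
    ">10.0" gets first digit 1 and no second digit; missing digits
    are returned as None.
    """
    text = value.strip()
    if text == ">10.0":
        return 1, None
    # Keep only the digits, then drop leading zeros (e.g. "0.532" -> "532").
    digits = "".join(ch for ch in text if ch.isdigit()).lstrip("0")
    if not digits:
        return None, None  # the value is zero or has no digits
    first = int(digits[0])
    second = int(digits[1]) if len(digits) > 1 else None
    return first, second

# Hypothetical example values, for illustration only.
for sample in ["0.532", "1234", "7.4", ">10.0", "0"]:
    print(sample, "->", significant_digits(sample))
```

Because trailing zeros are kept as reported, a value written as "7.0" yields second digit 0, whereas "7" yields no second digit.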
For the immediacy index, the same pattern occurred with 2BL, but for 1BL the Chi square values are in accordance with BL for some years.
In this paper, we analyzed the number of articles, citations, impact factors, half-life and immediacy index bibliometric indicators in journals indexed in the JCR Science and Social Sciences Editions from 2007 to 2014 according to 1BL and 2BL. We also analyzed the data by country of origin and by journal category.
In Alves et al. (2014) we verified that the majority of countries and of journal categories comply with 1BL. In this study, we verified that the second digit has a better adherence to BL, since the average percentage of compliance is around 95 % for almost all bibliometric indicators. The non-compliance with 1BL of the number of articles was observed for the period 1998–2007 in Campanario and Coslado (2011), for the period 2007–2011 in Alves et al. (2014), and again in the current study for the period 2007–2014. Here we observed a slight improvement in the compliance with 2BL. This may be explained by the observation that the number of articles published in a journal varies little from one year to the next.
An interesting result from the data analyzed concerns the 5Y-IF, which has a better compliance with BL than 2Y-IF and 2Y-IFWSC. This result gives support for the preferential use of the 5Y-IF over 2Y-IF or 2Y-IFWSC. The cited and citing half-life do not comply with 1BL and 2BL in all the years considered. This non-compliance with BL may be explained by the fact that we expect these indicators to be quite stable over the years. The result for the immediacy index for some of the years was better for 1BL than for 2BL. This also occurred when considering the journal’s category for the Social Sciences Edition. Finally, the result for citations and self-citations is very good for both 1BL and 2BL.
This study indicates that considering the country of origin and the journal’s category is relevant for both 1BL and 2BL. The use of the second digit is important since it provides a sounder, more complete and more consistent analysis of the bibliometric indicators.
Further studies can be carried out using the generalized law of Benford (Egghe and Guns 2012).
The authors acknowledge the financial support of CAPES and CNPq. We thank the anonymous reviewers for suggestions that improved the presentation of the paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.