Introduction

Benford’s Law (Benford 1938), also known as the first digit law or law of the leading digits, is a logarithmic probability distribution function for the first significant digits, which can be written as

$$ P\left( d \right) = \log_{10} \left( {1 + \frac{1}{d}} \right),\,d = { 1},{ 2}, \, \ldots ,{ 9} $$
(1)

where P is the probability and d is the first significant digit in question. The first significant digit of a number is its leftmost non-zero digit, e.g., 7 for 725 and 2 for 0.0239. According to Eq. 1, the probability that a given digit occurs as the first significant digit in a data set decreases logarithmically as the value of the digit increases from 1 to 9. The expected proportions for the first digits are shown in Table 1.
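As an illustration, the expected proportions follow directly from Eq. 1. The short Python sketch below evaluates them; the function name `benford_proportions` is our own choice for this example and is not part of the original study.

```python
import math


def benford_proportions():
    """Return {d: P(d)} for d = 1..9, with P(d) = log10(1 + 1/d) as in Eq. 1."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Print the expected proportion of each first significant digit.
for digit, proportion in benford_proportions().items():
    print(f"d = {digit}: P(d) = {proportion:.3f}")
```

The nine proportions sum to 1, since the product of the terms (1 + 1/d) for d = 1, ..., 9 telescopes to 10.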

Table 1 The expected proportions of Benford’s Law for the first digits

This was first observed in 1881 by American astronomer and mathematician Simon Newcomb (Newcomb 1881), who noted that the first pages in his book of logarithmic tables were more worn than later pages, which indicated that the tables of logarithms were not used in a uniform way. From this he inferred that fellow scientists using the logarithm tables were looking up numbers starting with 1 more often than numbers starting with 2, numbers with first digit 2 more often than 3, and so on.

This law was rediscovered in 1938 by American electrical engineer and physicist Frank Benford and is now known as “Benford’s Law”. Benford analyzed 20 lists of large data sets with a total of 20,229 observations and 10 lists of smaller data sets with a total of 2,968 observations. These lists included the surface areas of rivers, the sizes of populations, physical constants, molecular weights, entries from a mathematical handbook, numbers contained in an issue of a magazine, death rates, etc. He found that the digit 1 tends to occur with a probability of ~30 %, much greater than the 11.1 % expected if all nine digits were equally likely (i.e., one digit out of 9).

Benford’s Law has been extensively applied to a wide variety of natural and man-made data sets, such as numerical data on the country-wise adherent distribution of major world religions (Mir 2012), financial data of religious community (Clippe and Ausloos 2012), fraud detection in scientific publications (Hein et al. 2012), detecting electoral fraud (Beber and Scacco 2012), time series analysis of seismic clusters (Sottili et al. 2012), experimental values of β-decay half-lives (Ni et al. 2009) and hydrological data (Nigrini and Miller 2007).

Campanario and Coslado (2011) were the first to apply Benford’s Law to scientometric data. Their sample comprised the number of articles published, the citations received and the impact factors of journals indexed in the Science Citation Index from 1998 to 2007, taken from the JCR® database available to Spanish universities on the Web. They identified the first significant digit of each of these variables for each year separately and compared its frequency with the frequency predicted by Benford’s Law. The citation data followed Benford’s Law in all years studied. However, the data on the number of articles did not comply with Benford’s Law in any of the years considered, and the same was true of the impact factors in almost all years studied.

Recently, Egghe and Guns (2012) applied a generalization of Benford’s Law, related to the general law of Zipf with exponent β > 0, to the data of Campanario and Coslado (2011). They used nonlinear least squares to determine the optimal β and showed that this generalized law fits the data better than the classical Benford’s Law.

The present paper extends the work of Campanario and Coslado (2011). We analyze the compliance with Benford’s Law of the number of articles published by journals indexed in the JCR® Science and Social Sciences Editions from 2007 to 2011. We also investigate this compliance when the journals are grouped by country of origin and by category. In addition, we make a comparison with the Scopus data.

Materials and methods

In this study we used data available in the JCR® database on the web from 2007 to 2011, with separate editions for Science and Social Sciences. All journals indexed in the JCR® with at least 1 article published were included. We also took into consideration each journal’s country of origin and category.

First, we identified the first significant digit of the number of articles published in each journal indexed in the JCR®, separately for each year and edition, calculated the frequency of each digit, and compared it with the frequency predicted by Benford’s Law.
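The digit-extraction step can be sketched as follows. This is only an illustrative sketch: the function names and the sample list `article_counts` are hypothetical and do not come from the JCR® data.

```python
from collections import Counter
from decimal import Decimal


def first_significant_digit(value):
    """Return the leftmost non-zero digit, e.g. 7 for 725 and 2 for 0.0239."""
    # Decimal exposes the digits of the number without sign or exponent.
    for digit in Decimal(str(value)).as_tuple().digits:
        if digit != 0:
            return digit
    raise ValueError("zero has no first significant digit")


def digit_frequencies(values):
    """Count how often each digit 1..9 occurs as first significant digit."""
    tally = Counter(first_significant_digit(v) for v in values)
    return {d: tally.get(d, 0) for d in range(1, 10)}


# Hypothetical numbers of articles published by eight journals in one year.
article_counts = [725, 12, 130, 48, 19, 230, 7, 1045]
print(digit_frequencies(article_counts))
```

For article counts, which are positive integers, taking the first character of the decimal string would suffice; the `Decimal` route is used here only so that values such as 0.0239 are handled as well.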

Then, we carried out the χ² test:

$$ \chi^{2} \left( {n - 1} \right) = \sum\limits_{d = 1}^{n} \frac{{\left( {N_{\text{o}} \left( d \right) - N_{\text{e}} \left( d \right)} \right)^{2} }}{{N_{\text{e}} \left( d \right)}},$$
(2)

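to test the null hypothesis H₀ that the observed distribution of the first significant digit, in each case we consider, is the same as the distribution expected from Benford’s Law. Here N_o(d) and N_e(d) denote the observed and expected numbers of values whose first significant digit is d.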

For n = 9 we have n − 1 = 8 degrees of freedom, and the critical value is χ²(8) = 15.507 at the 95 % confidence level. If the calculated value of χ² is less than this critical value, we do not reject H₀ and conclude that the data are in compliance with Benford’s Law; otherwise, we reject H₀.
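A minimal sketch of this test is given below, assuming the observed digit counts are already available as a dictionary. The function name and the sample counts are hypothetical; the critical value 15.507 is the one quoted in the text.

```python
import math

# Critical value of chi-square with 8 degrees of freedom at the 5 % level,
# as quoted in the text.
CRITICAL_VALUE_8_DOF = 15.507


def benford_chi_square(observed):
    """Chi-square statistic of Eq. 2 for a dict {digit: observed count}."""
    total = sum(observed.values())
    chi2 = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # N_e(d) = N * P(d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2


# Hypothetical observed counts for 1000 journals (not data from the study).
observed = {1: 300, 2: 180, 3: 120, 4: 100, 5: 80, 6: 65, 7: 60, 8: 50, 9: 45}
chi2 = benford_chi_square(observed)
print(f"chi2 = {chi2:.3f}, reject H0: {chi2 >= CRITICAL_VALUE_8_DOF}")
```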

Alternatively we can test each of the nine proportions separately. The Z-statistic is the test to verify whether the observed proportion for a digit differs significantly from the expected value based on Benford’s Law (Nigrini 2012). The Z-statistic formula takes into account the absolute magnitude (the numeric distance) of the difference between the observed and the expected values, the cardinality of the data set, and the expected proportion value and is given by the following equation:

$$ Z = \frac{{\left| {P_{\text{o}} - P_{\text{e}} } \right| - \left( \frac{1}{2N} \right)}}{{\sqrt {\frac{{P_{\text{e}} \left( {1 - P_{\text{e}} } \right)}}{N}} }}, $$
(3)

where P_o denotes the observed proportion, P_e the expected proportion, and N the total number of observations. The term 1/(2N) in the numerator is a continuity correction and is applied only when it is smaller than the other term in the numerator. For a significance level of 5 %, the cutoff level is 1.96. A Z-statistic that exceeds 1.96 indicates that the difference between the observed and the expected proportions is significant at the 0.05 level; that is, a difference at least this large would arise by chance less than 5 % of the time if the data followed Benford’s Law.
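The Z-statistic of Eq. 3, including the conditional continuity correction, can be sketched as follows. The function name and the example counts are hypothetical and serve only to illustrate the formula.

```python
import math


def benford_z(observed_count, total, digit):
    """Z-statistic of Eq. 3 for one first digit."""
    p_e = math.log10(1 + 1 / digit)   # expected proportion (Eq. 1)
    p_o = observed_count / total      # observed proportion
    diff = abs(p_o - p_e)
    correction = 1 / (2 * total)
    # The continuity correction is applied only when it is smaller
    # than the absolute difference, as described in the text.
    if correction < diff:
        diff -= correction
    return diff / math.sqrt(p_e * (1 - p_e) / total)


# Hypothetical example: digit 1 observed 400 times among 1000 journals.
z = benford_z(400, 1000, 1)
print(f"Z = {z:.2f}, significant at the 0.05 level: {z > 1.96}")
```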

Data available in the Scopus database were also used. Using this database, we tested the number of articles published in journals of some countries and categories of the JCR®. Similarly, all journals indexed in Scopus with at least 1 article published were considered. Furthermore, only journals present in both databases were considered.

Using the binomial distribution (Ni et al. 2009), the expected root-mean-square error, ∆[N(d)]:

$$ \Delta N\left( d \right) = \sqrt {NP\left( d \right)\left( {1 - P\left( d \right)} \right)} , $$
(4)

was also calculated, where N is the total number of points considered and P(d) is the prediction of Benford’s Law.
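For illustration, Eq. 4 can be evaluated as in the sketch below. The function name is hypothetical and the value N = 1000 is only an example, not a figure from the study.

```python
import math


def benford_rms_error(total, digit):
    """Expected root-mean-square error of Eq. 4 for one first digit."""
    p = math.log10(1 + 1 / digit)  # Benford prediction P(d), Eq. 1
    return math.sqrt(total * p * (1 - p))


# Example with a hypothetical sample of N = 1000 journals.
for d in range(1, 10):
    print(f"d = {d}: expected count = {1000 * math.log10(1 + 1 / d):.1f} "
          f"+/- {benford_rms_error(1000, d):.1f}")
```

Because the error grows as the square root of N while the expected count grows linearly with N, the relative uncertainty shrinks as more journals are considered.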

Results and discussions

Campanario and Coslado (2011) noted that the number of articles published, the citations received and the impact factors of journals indexed in the JCR® Science Edition from 1998 to 2007 are not always in compliance with Benford’s Law. A summary of their analysis is presented in Table 2.

Table 2 χ² values for the number of articles published, citations received and impact factors of journals indexed in the JCR® Science Edition from 1998 to 2007 (from Campanario and Coslado (2011))

Observe that the χ² values for the number of articles are greater than the critical value (15.507) in all years; that is, none of them is in compliance with Benford’s Law.

We extended their analysis to the following years. We analyzed the number of articles published in journals indexed in the JCR® Science Edition from 2007 to 2011, and the results are shown in Table 3. Although they had already calculated the χ² value for 2007, we calculated it again to verify the compatibility of our results with theirs. We observed a small difference, probably because the updated JCR® database let us consider a larger number of journals.

Table 3 The frequency of occurrence of the figure d as the first significant digit, obtained from the number of articles published by journals indexed in the JCR® Science Edition from 2007 to 2011

The χ² values in all years are significantly greater than the critical value. Furthermore, we observe that the Z values for digit 1 are greater than the cutoff level (1.96) in all years. The same holds for digit 5, except in 2007.

Campanario and Coslado (2011) took into consideration only journals of the Science Edition, but we extended the calculation to the JCR® Social Sciences Edition. The result is presented in Table 4 and, as can be seen, it is even worse. No year is in compliance with Benford’s Law, and the Z values are greater than the cutoff level for almost all digits. They mentioned in their paper that they had no explanation for these differences.

Table 4 The frequency of occurrence of the figure d as the first significant digit, obtained from the number of articles published by journals indexed in the JCR® Social Sciences Edition from 2007 to 2011

Mir (2012) observed that the data of three major Christian denominations follow Benford’s Law. However, when Christianity is considered as a single religious group, the distribution of the significant digits of the adherent data deviates from the predictions of Benford’s Law. Inspired by this observation we analyzed the journals according to their country of origin and to their JCR® category.

Table 5 presents the total number of countries that are in compliance (YES) or not (NO) with Benford’s Law considering the χ² values for the number of articles published in journals indexed in the JCR® Science Edition from 2007 to 2011, highlighting the three countries with the highest χ² values that are not in compliance with Benford’s Law and their respective numbers of journals and articles considered in each year.

Table 5 Total number of countries that are in compliance (YES) or not (NO) with Benford’s Law considering the χ² values for the number of articles published in journals indexed in the JCR® Science Edition from 2007 to 2011

The great majority of the countries are in compliance with Benford’s Law. “Poland” and “Turkey” are the countries that appear most often in the list of countries that are not in compliance with Benford’s Law. In the case of “Turkey” it is interesting to note that the number of journals indexed in the JCR® greatly increased from one year to another. Furthermore, one can see that the χ² values decrease as the number of journals and articles increases. It is worth observing that the number of journals indexed in the JCR® is very small for some countries and is not sufficient for using the χ² test for the adherence of the data to Benford’s Law. According to Nigrini (2012), the rule for the χ² test of the first non-zero significant digit is that the expected number of observations in each cell should be at least 5; hence, the number of observations should be at least 100 (100 times 0.0458, the expected proportion of digit 9, gives 4.58, which is close enough to 5).
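The arithmetic behind this sample-size rule can be checked with a short sketch. The function name is hypothetical; the threshold of 5 expected observations per cell is the rule quoted from Nigrini (2012) in the text.

```python
import math


def min_sample_size(min_expected=5.0):
    """Smallest N for which every digit has an expected count >= min_expected."""
    # Digit 9 has the smallest Benford proportion, so it sets the limit.
    p_smallest = math.log10(1 + 1 / 9)
    return math.ceil(min_expected / p_smallest)


print(f"Expected count of digit 9 for N = 100: {100 * math.log10(1 + 1 / 9):.2f}")
print(f"Smallest N with at least 5 expected per cell: {min_sample_size()}")
```

Applied strictly, the rule gives N = 110; the round figure of 100 used in the text corresponds to an expected count of about 4.58 for digit 9, which is treated there as close enough to 5.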

The result is very similar for the journals indexed in the JCR® Social Sciences Edition. Only a few countries are not in compliance with Benford’s Law, as shown in Table 6. Nevertheless, the χ² values are much smaller than the values obtained when the journals were considered as a single group. It is interesting to observe that “United States” and “England” fail to comply with Benford’s Law in every year.

Table 6 Total number of countries that are in compliance (YES) or not (NO) with Benford’s Law considering the χ² values for the number of articles published in journals indexed in the JCR® Social Sciences Edition from 2007 to 2011

Another analysis took into consideration the journal’s category in the JCR® Science Edition from 2007 to 2011. The result is presented in Table 7. The percentage of categories that are in compliance with Benford’s Law is larger than the percentage of countries that are in compliance in every year except 2009. “Mathematics” and “Nursing” are the categories that appear most often in the list of categories that are not in compliance with Benford’s Law.

Table 7 Total number of journal categories that are in compliance (YES) or not (NO) with Benford’s Law considering the χ² values for the number of articles published in journals indexed in the JCR® Science Edition from 2007 to 2011

For the journals indexed in the JCR® Social Sciences Edition, the result by category is considerably worse than the result by country of origin, as shown in Table 8. In some years the number of categories in compliance with Benford’s Law was lower than the number of categories not in compliance. “Sociology” is a category that fails to comply with Benford’s Law in every year.

Table 8 Total number of journal categories that are in compliance (YES) or not (NO) with Benford’s Law considering the χ² values for the number of articles published in journals indexed in the JCR® Social Sciences Edition from 2007 to 2011

It is interesting to observe that the χ² values observed for the journals indexed in the JCR® Social Sciences Edition are always greater than those presented for journals indexed in the JCR® Science Edition.

Next, we compare the numbers of articles published as reported in the JCR® and the Scopus databases. We limited the comparison to journals of some countries and of some categories. To make the comparison, we considered only journals that were present in both databases.

The analysis showed that there are some cases where the Scopus data are in compliance with Benford’s Law but the JCR® Editions data are not. The opposite was also observed, that is, cases where the JCR® Editions data are in compliance with Benford’s Law but the Scopus data are not. In Table 9 we summarize these findings with 8 examples.

Table 9 Comparison of the number of articles published in indexed journals in the JCR® and the Scopus databases and their compliance with Benford’s Law

The examples presented in Table 9 were chosen so that the total number of journals is more than 100. In each example, the number of journals indexed in both databases is presented. Beside this value, we present the number of journals indexed in the JCR® database in parentheses. The columns “Min” and “Max” indicate the minimum and maximum number of articles published in journals indexed in the JCR® and the Scopus databases, respectively, according to the country of origin or category considered. The χ² values are also presented, and the values that are not in compliance with Benford’s Law are highlighted. The digits (d) with significant differences according to the Z-statistic test are also presented. We observed that there are two examples in compliance with Benford’s Law according to the χ² test but with one digit showing a significant difference according to its Z value.

Conclusions

In this paper we applied Benford’s Law to the data of the JCR® Science and Social Sciences Editions and of Scopus, taking into consideration the number of articles published by journals indexed in the two databases. The data of these journals were analyzed by country of origin and by journal category. In the country-of-origin analyses, the majority of countries are in compliance with Benford’s Law. In the case of journal category, the majority also follows Benford’s Law, except in the two most recent years (2010 and 2011) for journals indexed in the JCR® Social Sciences Edition.

The nonconformity with Benford’s Law identified in this work could indicate incomplete data, data errors, inconsistencies or anomalies, and/or conformity to a large exponential power law in the JCR® and/or Scopus data, in view of the significant differences observed. Regarding incomplete data, for instance, Karamourzov (2012) observed that only a small fraction (<8 %) of the journals of Russia was indexed by JCR® in 2010, and Michels and Schmoch (2012) noted the steady increase in recent years in the number of publications indexed in Web of Science and in Scopus. These possible causes were already mentioned in previous works where nonconformities were observed (see, for instance, Nigrini (2012)).

We believe that the main contribution of this study is to draw attention to these differences and, perhaps, to provide an exploratory instrument for identifying where data anomalies may be occurring, regardless of which database is correct.