Introduction

Nowadays, there is no lack of rankings of entities related to research, such as scientific journals and universities. For the latter, there are many ways to rank them. The Academic Ranking of World Universities (better known as the Shanghai ranking), the QS World University Rankings and the THE World University Rankings are the most well-known university rankings. These rankings differ in the indicators used and in the weighting of the indicators, and their results have been shown to be only moderately correlated (Olcay and Bulu 2017; Mussard and James 2018). The CWTS Leiden Ranking, which uses an advanced bibliometric methodology and does not rely on reputation measurement, avoids a composite indicator of university performance, and this constitutes a fundamental difference between this ranking and the three more well-known ones. University rankings have been the subject of various criticisms, though. Regarding the Academic Ranking of World Universities, Billaut et al. (2010) concluded that all criteria used by the designers of the ranking are only loosely connected with what they intended to capture. High volatility from year to year has been observed for the THE World University Rankings (Bookstein et al. 2010), as well as high volatility in the ranking of universities after the 30th position for the Academic Ranking of World Universities and the THE World University Rankings (Saisana et al. 2011).

One main factor in the reputation of universities is the quality of the journals in which their researchers publish their work. Hence, there is also great interest in ranking academic journals.

One can distinguish between expert-based and citation-based approaches to the ranking of journals. In some countries, for instance Norway, the allocation of government funding to universities partly depends on researcher-driven quality classifications of journals (and of publishers) (Sivertsen 2010). Publications in journals assigned to the highest level yield a higher amount of funding. In this case, then, an expert-based approach to the ranking of journals is used. By contrast, in Sweden (the country where the authors of this paper are active), some universities allocate funding to their units (such as departments) on the basis of a field-normalized counterpart to the Journal Impact Factor (JIF), the latter a citation-based indicator that has long been widely used to evaluate academic journals (Garfield 2006). For each considered journal and publication year, a field-normalized indicator value is calculated, which gives rise to an (implicit) ranking of the journals. Then, for a given university unit, an average over the scores of the journals in which the unit's publications appeared is obtained. In these Swedish cases, clearly, citation-based approaches to the ranking of journals are used. Research shows that there are differences between the expert-based and the citation-based approaches to journal rankings, but the two approaches have been shown to be moderately or highly correlated (e.g. Serenko and Dohan 2011; Ahlgren and Waltman 2014; Haddawy et al. 2016; Saarela et al. 2016). However, Rigby et al. (2015), dealing with agricultural and environmental economics, did not find a significant correlation between journal scores based on expert assessments and the impact factors of the journals.

It is possible to combine expert-based and citation-based approaches to journal rankings, although we have found little earlier research on such hybrid approaches. Chen and Chen (2011) proposed a methodology for journal ranking that integrates experts' assessments with scores obtained by Google's PageRank (Brin and Page 1998), whereas Tsai et al. (2014) combined the two approaches by means of the Borda count (Reilly 2002).

There has been a considerable amount of research on the development of citation-based journal ranking approaches. One of the simplest approaches is to sum the number of citations from journals in the same field, as in, for example, Bush et al. (1974). This method has the obvious drawback that journals with a larger number of papers will receive more citations and, hence, will be ranked higher. The abovementioned JIF divides the number of citations received by a journal by the number of papers published in the journal, so that a citations-per-paper indicator is obtained. The JIF can be attributed to Gross and Gross (1927), but Garfield (1955) also made important contributions (see the historical accounts of Archambault and Larivière (2009) and Garfield (2006)). Pinski and Narin (1976), in a seminal paper, proposed the first iterative algorithm for weighting citations, the invariant method. The invariant method is based on the idea that citations from highly cited journals are more valuable than citations from less cited journals. Several other approaches to journal ranking, all based on citation weighting, were later developed (e.g. Bergstrom 2007; Bollen et al. 2006; Giuffrida et al. 2019; Guerrero-Bote and Moya-Anegón 2012; Kalaitzidakis et al. 2011; Liebowitz and Palmer 1984). With regard to these examples, Liebowitz and Palmer (1984) used their proposed measures to rank economics journals. However, their measures favor journals that publish reviews, which is related to the fact that reviews are on average more frequently cited than non-review articles. Examples of rankings of economics journals using the invariant method include the rankings in Kodrzycki and Pingkang (2006) and Ritzberger (2008). For a somewhat more extensive account of the history of citation weighting, see Giuffrida et al. (2019).

It has also been shown that various citation-based approaches to journal ranking are correlated with each other. For instance, Harzing and van der Wal (2009) and Garcia-Romero et al. (2016) found rank correlations in the range 0.67–0.91, whereas Bornmann et al. (2018) reported rank correlations in the range 0.89–0.99 for the journal indicators provided by Google Scholar. At first sight, this seems to imply that the choice of citation-based ranking approach is not of any great importance. However, even when the overall correlation is high, there can be substantial differences in rank for individual journals. Liebowitz and Palmer (1984) observed substantial differences for individual journals when their citation-based indicator was adjusted for journal size (in terms of the total number of characters published by a journal): Economic Journal dropped from rank 7 to rank 23, Oxford Economic Papers from rank 12 to rank 42, and Review of Economics and Statistics from rank 5 to rank 16. Garcia-Romero et al. (2016) observed that Journal of Economic Literature would rank 2 using the Free Disposal Hull method (Deprins et al. 1984) but 13 using the KMS method (Kalaitzidakis et al. 2011). It was further observed that Journal of Economic Growth would rank 6 and 26, and Economics and Politics 11 and 45, using the same two methods.

To the best of our knowledge, only a few papers have considered ranking uncertainty in economics. Lubrano et al. (2003) considered ranking uncertainty when ranking economics departments. Oswald (2007) noted that it is better to publish the best (in terms of citation counts) paper in a medium-ranked journal than to publish the four worst papers in an elite journal. Wall (2009) noticed that the distributions of citations overlap between journals, implying a high degree of uncertainty in the ranking of journals. Halkos and Tzeremes (2011) used data envelopment analysis (DEA) to evaluate economics journals. They derived efficiency scores, and bootstrap confidence intervals of these scores, to measure the uncertainty. Stern (2013), who dealt with the ranking of economics journals, took uncertainty into account for the (five-year) JIF in terms of confidence intervals. Significance testing was applied as well, and it was displayed which journals were significantly better than others in terms of the JIF. Hudson (2013) predicted the Keele journal classification by regression methods and measured the probability of correct classification. The robustness of rankings of economics journals based on the JIF was studied by Seiler and Wohlrabe (2014), who considered the skewness of the distribution of citations to the articles in a journal. Horrace and Parmeter (2017) revisited the Stern (2013) study referred to above, but performed multivariate inference to control the overall error rate of the testing procedure. Confidence intervals around values of the PPtop 10% indicator, i.e. the share of papers that belong to the 10% most frequently cited papers in a certain subfield and time period, were used by Bornmann and Wohlrabe (2019). The PPtop 10% results were used for assigning economics journals to four impact classes.

We also mention the study by Bradshaw and Brook (2016), even though journals in economics were not taken into account in that study. These authors ranked journals by mean rank with respect to five indicators. Uncertainty in the mean rank of the considered journals was dealt with by means of resampled uncertainty intervals around these ranks.

In this contribution, we consider uncertainty in a ranking of economics journals. The choice of economics as the field of study is motivated by the fact that economics is a field in which the ranking of journals, departments and even individuals is common. We use the invariant method of Pinski and Narin (1976) to rank the journals since, theoretically, it is the most appealing method (see the next section). We propose an uncertainty measure based on a bootstrap approach: the average absolute change in rank, which we regard as a reasonable uncertainty measure for rankings. We further calculate, based on the bootstrap method, 95% confidence intervals for the observed values of the invariant measure. We show that the confidence intervals add information beyond the ranks alone. They make it possible to identify groups of journals with practically equal performance, as well as journals that depart substantially from their neighboring journals. Basically, if two confidence intervals overlap, this indicates that the corresponding two journals are not significantly different. As far as we know, the current paper is the first to address the uncertainty of journal rankings based on the invariant method.

The remainder of the paper is structured as follows. The next section treats data and methods; it describes the invariant method as well as the bootstrap approach used. The results are given in the third section, whereas the results are discussed, and conclusions put forward, in the last section of the paper.

Data and methods

There are many methods when it comes to ranking journals, each with its advantages and disadvantages. In this section, we discuss the invariant method of Pinski and Narin (1976). The name stems from the fact that the method is invariant to citation (reference) intensities, but the method also has some other interesting properties. Palacios-Huerta and Volij (2004) proved, using an axiomatic approach, that the invariant method satisfies four important properties: invariance to reference intensity (the average number of references per paper in a journal), weak homogeneity, weak consistency and invariance to splitting of journals. Invariance to reference intensity means that each journal has one vote and that the cited journals share this vote in proportion to the citations they receive. Weak homogeneity says that the relative ranking between two journals with the same number of publications and the same reference intensity should be directly related to the ratio of how they cite each other. Weak consistency implies that if we consider a subset of journals and redistribute the citations of the left-out journals to the journals we keep, according to how the left-out journals cite, then the relative ranking should be the same as for the full set of journals. Invariance to splitting of journals means that if we split a journal into more than one journal with the same reference and citation profiles, then the relative ranking of the other journals remains the same. Palacios-Huerta and Volij (2004) prove that the invariant method is the only method that satisfies all four properties. A related method is the impact-adjusted measure of Liebowitz and Palmer (1984), which, as shown by Palacios-Huerta and Volij (2004), satisfies all properties except invariance to reference intensity.
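To make the computation concrete, the following is a minimal sketch (not the authors' code) of how invariant-method scores could be obtained from a journal-by-journal citation matrix. It assumes the eigenvector formulation in which each citing journal's within-set references are normalized to sum to one; the function and variable names are illustrative, and the per-article adjustment of the full Pinski–Narin formulation is omitted for brevity.

```python
import numpy as np

def invariant_scores(cites: np.ndarray, tol: float = 1e-12, max_iter: int = 10_000) -> np.ndarray:
    """cites[i, j] = citations to journal i from journal j (within the journal set)."""
    col_sums = cites.sum(axis=0)                       # journal j's within-set references
    P = cites / col_sums                               # P[i, j]: share of j's references going to i
    w = np.full(cites.shape[0], 1.0 / cites.shape[0])  # start from uniform scores
    for _ in range(max_iter):                          # power iteration towards the dominant eigenvector of P
        w_new = P @ w                                  # each journal's "vote" is shared proportionally
        w_new /= w_new.sum()                           # keep the scores on a fixed scale
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w

# Toy example with three hypothetical journals; a higher score means a higher rank.
cites = np.array([[0.0, 5.0, 3.0],
                  [4.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
scores = invariant_scores(cites)
ranks = (-scores).argsort().argsort() + 1              # rank 1 = highest invariant score
```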

The journals are ranked according to the invariant measure. The observed rank of journal \(i\) is denoted by \(r_{i}\). To evaluate the uncertainty, we used a bootstrap approach suggested by Andersson et al. (1998) in the context of ranking hospitals according to short-term mortality risk ratios for inpatients treated for myocardial infarction. Let \(c_{ij}\) denote the number of citations to journal \(i\) from journal \(j\), and let \(c_{j} = \sum_{i} c_{ij}\) be the number of journal \(j\)'s cited references that point to a journal included in our study. We took the number of references for journal \(j\) as given and randomized them according to probabilities defined as follows:

$$p_{ij} = \frac{c_{ij}}{c_{j}}$$
(1)

In other words, we took the number of references in journal \(j\) and randomly distributed them over the journals according to the proportions of citations they received from journal \(j\). We then calculated values of the invariant measure for each journal and ranked the journals according to these values. This procedure was repeated \(B\) times, which yielded \(B\) bootstrap rankings of the journals. Let \(r_{ib}\) be the new rank of journal \(i\) in replicate \(b\), and \(d_{ib} = \left| r_{ib} - r_{i} \right|\) the absolute deviation of the new rank from the observed rank of journal \(i\) in replicate \(b\). As a measure of variability, we calculated the average absolute change in rank for journal \(i\), \(\overline{d}_{i}\), as follows:

$$\overline{d}_{i} = \frac{1}{B}\sum_{b = 1}^{B} d_{ib}$$
(2)

We used 1,000 as the value of \(B\). \(\overline{d}_{i}\) can be interpreted as the expected random change in the ranking of the \(i{\text{th}}\) journal.

The average absolute change in rank over all journals, say \(C\), can be interpreted as the expected random change in the ordering of the journals. \(C\) can be defined as follows:

$$C = \frac{1}{K}\sum_{i = 1}^{K} \overline{d}_{i}$$
(3)

where \(K\) is the number of considered journals.

The rankings are based on the invariant measure, and the variability of the measure itself is also informative. Hence, we calculated bootstrap-based 95% confidence intervals as a complement to the average absolute change in rank. We normalized the invariant measure such that the journal with the highest observed value of the measure has a value of 1, whereas the journal with the lowest observed value has a value of 0. A confidence interval was then formed for each journal using the bootstrapped values of the invariant measure. The additional information gained is that one journal can have a high average absolute change in rank because there are other journals with similar values of the invariant measure, while another journal may have the same variability in the invariant measure but does not change rank, since there are no other journals with similar values of the invariant measure.
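As a sketch of the full procedure (again not the authors' code), the bootstrap described above can be implemented as follows, reusing the hypothetical invariant_scores function from the previous sketch: each journal's within-set reference total is kept fixed and redistributed over the cited journals with a multinomial draw using the probabilities in Eq. (1), after which \(\overline{d}_{i}\), \(C\) and percentile confidence intervals for the normalized measure are computed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the sketch is reproducible

def bootstrap_rank_uncertainty(cites: np.ndarray, B: int = 1000):
    """cites[i, j] = citations to journal i from journal j; assumes every column sum is positive."""
    K = cites.shape[0]
    obs_scores = invariant_scores(cites)                 # from the previous sketch
    obs_ranks = (-obs_scores).argsort().argsort() + 1    # observed ranks r_i
    c_j = cites.sum(axis=0).astype(int)                  # reference totals, kept fixed
    P = cites / cites.sum(axis=0)                        # p_ij = c_ij / c_j, Eq. (1)

    abs_change = np.zeros((B, K))
    boot_scores = np.zeros((B, K))
    for b in range(B):
        # Redistribute each journal's references according to p_ij (one multinomial per column).
        cites_star = np.column_stack([rng.multinomial(c_j[j], P[:, j]) for j in range(K)])
        s = invariant_scores(cites_star.astype(float))
        boot_scores[b] = s
        r = (-s).argsort().argsort() + 1
        abs_change[b] = np.abs(r - obs_ranks)            # d_ib = |r_ib - r_i|

    d_bar = abs_change.mean(axis=0)                      # Eq. (2): per-journal average change in rank
    C_overall = d_bar.mean()                             # Eq. (3): average over all K journals
    # Normalize on the observed scale (max -> 1, min -> 0) and take percentile intervals.
    lo, hi = obs_scores.min(), obs_scores.max()
    norm_boot = (boot_scores - lo) / (hi - lo)
    ci = np.percentile(norm_boot, [2.5, 97.5], axis=0).T # one (lower, upper) pair per journal
    return d_bar, C_overall, ci
```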

The data was obtained from Journal Citation Reports (JCR), provided by Clarivate Analytics, and the publication period in the study is 1998–2013. We used all journals that belonged to the Web of Science subject category Economics in year 2013, 332 journals in total. Further, we only considered citations within the set of the 332 journals.
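For completeness, the following hypothetical sketch shows how the within-set citation matrix used by the sketches above could be assembled; the file names and the column names (citing, cited, citations) are made up for illustration and do not correspond to JCR's actual export format.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the 332 Economics journals and citing -> cited citation counts.
journals = pd.read_csv("economics_journals_2013.csv")["journal"].tolist()
pairs = pd.read_csv("citations_1998_2013.csv")  # assumed columns: citing, cited, citations

index = {name: k for k, name in enumerate(journals)}
K = len(journals)
cites = np.zeros((K, K))
for row in pairs.itertuples(index=False):
    # Keep only citations within the 332-journal set: cites[i, j] = citations to i from j.
    if row.citing in index and row.cited in index:
        cites[index[row.cited], index[row.citing]] += row.citations

d_bar, C_overall, ci = bootstrap_rank_uncertainty(cites, B=1000)
```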

Results

The results are displayed in a number of tables and figures. To start with, we note that the value of \(C\), i.e. the average absolute change in rank over all journals, is 3.83, with a standard deviation of 3.61. This implies that, on average, the ranks are well estimated, but that there is large individual variation in how well the ranks are estimated. Table 1 shows the 60 highest ranked journals. The journal names shown in the table, and in Table 2, are abbreviations of the full names. The average absolute change in rank (\(\overline{d}_{i}\)) and the 95% confidence intervals are also indicated. The highest ranked journals offer few surprises, with Journal of Political Economy, Quarterly Journal of Economics and Econometrica at the top. Journal of Political Economy is a clear winner and is ranked first in all bootstrap replicates. Quarterly Journal of Economics is second in about 91% of the replicates and Econometrica in about 9%, yielding only a small degree of uncertainty in the ranking between them. This uncertainty is confirmed by the overlapping 95% confidence intervals of the invariant measure for the two journals. For the remaining journals in the table, the \(\overline{d}_{i}\) values indicate some uncertainty, but this uncertainty is too small to yield any substantial variability in rank. To exemplify the latter, the journal in Table 1 with the highest average absolute change in rank is American Economic Journal: Macroeconomics, which has rank 48 and a \(\overline{d}\) value of about 2.3. This implies that a rank in the close neighborhood of 48 can be considered fairly certain.

Table 1 Rank, average absolute change in rank (\(\overline{d}\)) and the 95% confidence interval for the normalized invariant measure for the 60 highest ranked journals
Table 2 Rank, average absolute change in rank (\(\overline{d}\)) and the 95% confidence interval for the normalized invariant measure for the 60 journals with the largest average absolute change in rank

In Table 2, the journals with the largest \(\overline{d}\) values are shown. The 95% confidence intervals are all small and very close to 0, meaning that they are rather uninformative. Baltic Journal of Economics, with rank 249, is the journal with the highest \(\overline{d}\) value (about 30). In second place, with an average rank change of nearly 26 positions, is Journal of Consumer Affairs (rank 145). Then there is a rather big step down to the third journal, Asian Economic Policy Review (rank 217), with a \(\overline{d}\) value of 15.5. Below that, there is a fair number of journals with \(\overline{d}\) values in the interval 10–15. The journal with the 60th highest \(\overline{d}\) value is Spatial Economic Analysis (rank 233; \(\overline{d}\) is about 5.7).

In Figs. 1–3, the results for the average absolute change in rank are displayed, ordered by rank. Figure 1 shows the 100 highest ranked journals. On the y-axis we have the rank ± \(\overline{d}\), i.e. the rank plus or minus the average absolute change in rank. We see that the variability for roughly the 60 highest ranked journals is negligible, while among the top 100 there are a few journals with higher variability, notably Quantitative Economics (ranked 68) with a \(\overline{d}\) value of almost 11. This is also seen in its 95% confidence interval, which is considerably larger than those of the surrounding journals (Table 2; Fig. 4). Compared to the highest ranked journals, the mid-ranked journals (positions 101–200) have higher variability in rank (Fig. 2). As noted above, Journal of Consumer Affairs at rank 145 has a \(\overline{d}\) value of about 26. For the lowest ranked journals, the average absolute change in rank seems to be lower and to decrease towards the bottom of the ranking (Fig. 3). An exception is the most volatile journal of all, Baltic Journal of Economics. The general picture is that the ranks of the highest and the lowest ranked journals are well estimated when using the invariant method, while for the mid-ranked journals there is considerably more uncertainty.

Fig. 1 The 100 highest ranked journals and their average absolute (bootstrapped) change in rank

Fig. 2 The 100 mid-ranked journals (positions 101–200) and their average absolute (bootstrapped) change in rank

Fig. 3 The 132 lowest ranked journals (positions 201–332) and their average absolute (bootstrapped) change in rank

In Figs. 4–6, the values of the invariant measure and the corresponding bootstrapped 95% confidence intervals are shown. Figure 4 shows that the 10 highest ranked journals are considerably better than the rest and that there are notable differences within these 10 journals as well. The highest ranked journal, Journal of Political Economy, is normalized to have a value of one, while the second and third journals (Quarterly Journal of Economics and Econometrica) have values of about 0.8. The fourth and fifth journals have values around 0.55, while the sixth has a value of about 0.48 (see also Table 1). The confidence intervals are rather large, but as there are large differences in the observed values of the invariant measure, there is very little overlap between intervals. The latter implies high certainty in rank. In Fig. 4, we have zoomed in on the confidence intervals for the journals ranked 10–25. It can be seen that there are three groups within these journals. The first group consists of the journals with ranks 10–13, the second of the journals with ranks 14–19 and the third of the journals with ranks 20–23. The journals at ranks 24 and 25 seem to belong with the journals of higher rank. There are thus subgroups of journals within which the journals perform more or less equally. Notice that this information cannot be gained from the corresponding \(\overline{d}\) values in Table 1.

Fig. 4 The 100 highest ranked journals and bootstrapped 95% confidence intervals for the invariant measure, with zoom for journals at ranks 10–25

For the mid-ranked journals (Fig. 5), the confidence intervals are wide in comparison to the values of the invariant measure, and there are no clear groups of journals. The same applies to a majority of the lowest ranked journals (Fig. 6): wide confidence intervals for journals up to about rank 290 and no clear groups of journals. For the journals beyond rank 290, the confidence intervals are narrower, implying that their ranks are better estimated. We can be quite certain that the lowest ranked journals actually belong to the group of low-ranked journals.

Fig. 5 The 100 mid-ranked journals and bootstrapped 95% confidence intervals for the invariant measure

Fig. 6 The 132 lowest ranked journals and bootstrapped 95% confidence intervals for the invariant measure

Discussion and conclusions

Publishing in highly ranked scientific journals is not only important for the academic career of the individual researcher; it is also important for academic institutions, as a better reputation makes it easier to attract good researchers. Moreover, it is not unusual that publication data are used to allocate financial resources to departments, where publications in higher ranked journals yield more resources. University rankings, briefly treated at the beginning of this paper, are most often partly based on publication data, and it is possible for higher ranked universities to charge higher tuition fees than lower ranked ones.

Overall, there are many reasons for an interest in the ranking of scientific journals. The focus of this paper is the ranking of economics journals. Economics is a field in which the ranking of journals, departments and even individuals is common. Our main contribution is to propose an uncertainty measure, the average absolute change in rank, which is based on a bootstrap approach. The bootstrap is a robust approach that relies on fewer assumptions than competing asymptotic approaches. We also consider the distribution of the underlying ranking measure, here the invariant measure of Pinski and Narin (1976), to gain additional useful information. We show that the ranks of the highest, as well as the lowest, ranked journals are well estimated, while there is a high degree of uncertainty regarding the ranks of many mid-ranked journals. Here, the uncertainty measure clearly shows that the highest and lowest ranked journals cannot be considered outliers due to randomness, since their ranks are well estimated. The distribution of the underlying measure is useful for identifying groups of journals that are more or less of the same quality (from the point of view of the invariant measure). The journal with the highest observed value of the invariant measure, Journal of Political Economy, has the best performance and constitutes a singleton, whereas Quarterly Journal of Economics and Econometrica form the next group (there is a slight overlap between the two with respect to confidence intervals). The journals ranked between about 190 and 230 form another group within which there are no major quality differences, as the confidence intervals overlap (and the uncertainty of the ranks is high).

In this work, we have used the invariant method as the ranking method. Regarding future research, it would be interesting to study whether ranking uncertainty depends on the ranking method. It might be the case that there are ranking methods that are better than the invariant method from this perspective, i.e. methods associated with less ranking uncertainty than the invariant method.