Normalisation of citation impact in economics

This study is intended to facilitate fair research evaluations in economics. Field- and time-normalisation of citation impact is the standard method in bibliometrics. Since citation rates for journal papers differ substantially across publication years and Journal of Economic Literature (JEL) classification codes, citation rates should be normalised when comparing papers across different time periods and economic subfields. Without normalisation, these two factors, which are independent of research quality, might lead to misleading results in citation analyses. We apply the two most important normalised indicators in bibliometrics to economics: (1) the mean normalised citation score (MNCS) compares the citation impact of a focal paper with the mean impact of similar papers published in the same economic subfield and publication year. (2) PP top 50% is the share of papers that belong to the 50% most cited papers in a certain subfield and time period. Since the MNCS is based on arithmetic averages despite skewed citation distributions, we recommend using PP top 50% for fair comparisons of entities in economics. In this study, we apply the normalisation methods to 294 journals (including normalised scores for 192,524 papers). We use the PP top 50% results to assign the journals to four citation impact classes; thirty journals are identified as outstandingly cited. Two journals, the Quarterly Journal of Economics and the Journal of Economic Literature, perform statistically significantly better than all other journals. Thus, only two journals can be clearly separated from the rest in economics.


Introduction
Research evaluation is the backbone of economic research; common standards in research and high-quality work cannot be achieved without such evaluations (Bornmann, 2011; Moed & Halevi, 2015). It is a sign of post-academic science, with its focus on accountability, that quantitative methods of research evaluation complement qualitative assessments of research (i.e. peer review). Today, the most important quantitative method is bibliometrics, with its measurements of research output and citation impact. Whereas in the early 1960s only a small group of specialists was interested in bibliometrics (e.g. Eugene Garfield, the inventor of Clarivate Analytics' Journal Impact Factor, JIF), research activities in this area have substantially increased over the past two decades.
Today, various bibliometric studies are being conducted based on data from individual researchers, scientific journals, universities, research organizations, and countries (Gevers, 2014). According to the Panel for Review of Best Practices in Assessment of Research et al. (2012), bibliometrics is the most important part of the field of scientometrics and is "accepted by the general scientific community" (p. 34).
Since citation impact is seen as a proxy of research quality (it measures one part of quality, namely impact; other parts are accuracy and importance, see Martin & Irvine, 1983) and impact measurements are increasingly used as a basis for funding or tenure decisions in science, citation impact indicators are the focus of bibliometric studies. In these studies, it is often necessary to analyse citation impact across papers published in different fields and years. However, comparing raw citation counts across fields and publication years leads to biased results (Council of Canadian Academies, 2012). Since the average citation rates for papers published in different fields and years differ significantly, independently of the quality of the papers (Kreiman & Maunsell, 2011; Opthof, 2011), it is standard in bibliometrics to normalize citations. According to Abramo, Cicero, and D'Angelo (2011) and Waltman and van Eck (2013b), field-specific differences in citation patterns arise for the following reasons: (i) different numbers of journals are indexed for the fields in bibliometric databases (Marx & Bornmann, 2015); (ii) citation and authorship practices, as well as cultures, differ among fields; (iii) production functions differ across fields (McAllister, Narin, & Corrigan, 1983); and (iv) the numbers of researchers vary strongly by field (Kostoff, 2002). The law of constant ratios (Podlubny, 2005) claims that the ratio of the numbers of citations in any two fields remains close to constant.
It is the aim of normalized bibliometric indicators "to correct as much as possible for the effect of variables that one does not want to influence the outcomes of a citation analysis" (Waltman, 2016, p. 375). In principle, normalized indicators compare the citation impact of a focal paper with a citation impact baseline defined by papers published in the same field and publication year. The recommendation to use normalized bibliometric indicators instead of bare citation counts is one of the ten guiding principles for research metrics listed in the Leiden Manifesto (Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015; Wilsdon et al., 2015).
This study is intended to introduce the approach of citation normalization in economics, which corresponds to the current state of the art in bibliometrics. Section 3 presents two normalized citation indicators (see also Appendix B): the mean normalized citation score (MNCS), which was the standard approach in bibliometrics for many years, and the currently preferred alternative, PP top 50%. The MNCS normalizes the citation count of a paper with respect to a certain economic subfield. PP top 50% further corrects for the skewness of subfields' citation distributions; the metric is based on percentiles and determines whether a paper belongs to the 50% most frequently cited papers in a subfield. The subfield definition used in this study relies on the Journal of Economic Literature (JEL) classification system, which is well established in economics; most of the papers published in economics journals have JEL codes attached.
In section 2 we describe our dataset and provide several descriptive statistics. We extracted all papers published between 1991 and 2013 from the Web of Science (WoS, Clarivate Analytics) economics subject category and matched these papers with the corresponding JEL codes listed in EconLit. Using citation data from WoS, we find that citation rates differ substantially across economic subfields. As in many other disciplines, citation impact analyses can significantly inspire or hamper the career paths of researchers in economics, as well as their salaries and reputation (Ellison, 2013; Gibson, Anderson, & Tressler, 2014).
In a literature overview, Hamermesh (2015) demonstrates that citations are related to the salaries earned by economists. Fair research evaluations in economics should therefore consider subfield-specific differences in citation rates, because these differences are not related to research quality.
In section 4 we introduce a new economics journal ranking based on normalized citation scores, which we calculated for 192,524 papers published in 294 journals (see also Appendix A). Although several top journals are positioned similarly to other established journal rankings in economics, we find large differences for many journals. In section 6, we discuss our results and give some directions for future research. The subfield-normalization approach can be applied to entities other than journals, such as researchers, research groups, institutions, and countries.

The Journal of Economic Literature (JEL) Codes
A key issue in the calculation of normalized citation scores is the definition of the fields and subfields that are used to compile the reference sets (Wilsdon et al., 2015): "In comparative studies, inappropriate reference standards obtained from questionable subject assignment might result in misleading conclusions" (Glänzel & Schubert, 2003, p. 357). The most common approach in bibliometrics is to use the subject categories defined by Clarivate Analytics for WoS or by Elsevier for Scopus. These subject categories are sets of journals publishing papers in similar research areas, such as biochemistry, condensed matter physics, and economics. Together they form a multidisciplinary classification system covering a broad range of research areas (Wang & Waltman, 2016).
However, this approach has been criticised in recent years because it is stretched to its limits by multi-disciplinary journals (e.g. Nature and Science) and by field-specific journals with a broad scope (e.g. Physical Review Letters and The Lancet). "These journals do not fit neatly into a field classification system" (Waltman & van Eck, 2013a, p. 700), because they cannot be assigned to a single field or they publish research from a broad set of subfields (Haddow & Noyons, 2013).
Not only fields, but also subfields have different patterns of productivity and thus different numbers of citations (Crespo, Herranz, Li, & Ruiz-Castillo, 2014; National Research Council, 2010). Thus, an obvious alternative for field-specific bibliometrics is to use a mono-disciplinary classification system (Waltman, 2016). Such systems have the advantage that they are specially designed to represent the subfield patterns of a single field (Boyack, 2004) and that they are assigned at the paper level (and not the journal level). The assignment of subfields at the paper level protects these systems from problems with multi-disciplinary journals. In recent years, various bibliometric studies have used mono-disciplinary systems: Chemical Abstracts (CA) sections in chemistry and related areas (Bornmann, Schier, Marx, & Daniel, 2011), MeSH (Medical Subject Headings) terms in biomedicine (Bornmann, Mutz, Neuhaus, & Daniel, 2008; Leydesdorff & Opthof, 2013; Strotmann & Zhao, 2010), PACS (Physics and Astronomy Classification Scheme) codes in physics and related areas (Radicchi & Castellano, 2011), and MathSciNet's MSC (Mathematics Subject Classification) system in mathematics (Smolinsky & Lercher, 2012).

Publication and citation data
WoS is the most important bibliographic database in bibliometrics; most studies in this area are based on its publication and citation data. We downloaded the meta-data and the corresponding citations of all papers in the subject category "economics" that were published between 1991 and 2013. We used 1991 as the first year, since the JEL codes were established in their current form in 1991. We obtained data for 224,867 papers with the document type "article" or "review", which were published in 386 journals. By excluding other document types (e.g. editorial material, notes, and comments), we focus in this study on substantial and citable items.
We made four adjustments to this dataset: (1) We excluded publications from the Papers and Proceedings issues of the American Economic Review and the European Economic Review. These papers are usually very short because of the journals' space considerations (usually five to six pages), and they often present only material that was left out of full-length papers published elsewhere.
(2) We kept only those papers published in journals that were still listed in WoS in 2013 and had been listed for at least four years. Thus, we excluded papers from journals that had been delisted from WoS, reclassified, or discontinued.
(3) The journals in which the papers have appeared had to be listed in EconLit, since the JEL codes were obtained from the Econlit database. If we were not able to match a paper via EconLit (because the publishing journal was not listed), we used JEL codes data from RePEc (see Zimmermann, 2013). For these papers we applied a similar matching procedure as described by Angrist, Azoulay, Ellison, Hill, and Lu (2017).
(4) Papers without JEL codes or with only the JEL codes "Y" and "Z" were excluded from the study, since these two codes are not related to specific content.
After these four adjustments, 192,524 papers remained, which appeared in 294 journals.
The citations of these papers refer to the time period between publication and the end of 2016.
Thus, the citation counts of the papers are based on different citation windows (ranging between 4 and 26 years). The longer the citation window, the better the "true" impact of a paper can be determined (Research Evaluation and Policy Project, 2005; Wang, 2013). Glänzel (2008) and Glänzel, Thijs, Schubert, and Debackere (2009) recommend using a citation window of at least three years. Johnston, Piatti, and Torgler (2013) show for papers published in the American Economic Review that the mean citation rate peaks in the fourth year after publication. Since the citations in our in-house database are counted until the end of 2016, we included no publication years later than 2013 in this study. The results in Table 9 demonstrate that almost all journals publish papers with zero citations.

Descriptive statistics and differences in citation rates
With an average of 145 citations per paper, the Quarterly Journal of Economics reached the highest citation rate. Arellano and Bond (1991) is the most frequently cited paper in our set (with 4,627 citations). The results in Table 3 also reveal that the average citation rates decline over time in most cases, as the citation window becomes smaller.
The dependency of average citations in economics on time and subfield, which is independent of research quality, necessitates the consideration of subfield and publication year in bibliometric studies. Without consideration of these differences, research evaluations are likely to be biased and to disadvantage economists who have only recently started publishing in the field or who work in subfields with systematically low average citations (e.g. in "History of Economic Thought, Methodology, and Heterodox Approaches", B).

Standard approaches in bibliometrics to normalize citation impact

Economics has already been the subject of bibliometric studies that consider field-specific differences (e.g. Ruiz-Castillo, 2012). Palacios-Huerta and Volij (2004) generalized an idea for citation normalization that goes back to Liebowitz and Palmer (1984), in which citations are weighted with respect to the citing journal. However, this approach does not correspond to the current standards in bibliometrics and has not become established in economics. Angrist et al. (2017) constructed their own classification scheme featuring ten subfields in the spirit of Ellison (2002). The classification builds upon JEL codes, keywords, and abstracts. Using about 135,000 papers published in 80 journals, the authors construct time-varying importance weights for journals that account for the subfield in which a paper was published. However, this approach also normalizes on the citing side, similar to Palacios-Huerta and Volij (2004). Combes and Linnemer (2010) calculated normalized journal rankings for all EconLit journals.
Although they considered JEL codes for the normalization procedure, they calculated the normalization at the journal level, and not at the paper level. Linnemer and Visser (2016) document the most cited papers from the so-called top-5 economics journals (Card & DellaVigna, 2013), also accounting for time and JEL codes. With their focus on the top-5 journals, however, they considered only a small sample of journals and did not calculate indicators.

Mean Normalized Citation Score (MNCS)
The definition and use of normalized indicators in bibliometrics started in the mid-1980s with the papers by Schubert and Braun (1986) and Vinkler (1986). Here, normalized citation scores (NCSs) result from dividing the citation count of a focal paper by the average citation count of comparable papers in the same field or subfield. The denominator is the expected number of citations and is derived from the reference set of the focal paper (Mingers & Leydesdorff, 2015; Waltman, 2016). Impact scores larger than 1 indicate papers cited above average in the field or subfield, and scores below 1 denote papers with below-average impact.
Several variants of this basic approach have been introduced since the mid-1980s (Vinkler, 2010), and different names have been used for the metrics, e.g. relative citation rate, relative subfield citedness, and field-weighted citation score. More recently, the metric has mostly been used in bibliometrics under the label "MNCS". Here, the NCSs of all papers in a publication set (of a researcher, institution, or country) are added up and divided by the number of papers in the set, which results in the mean NCS (MNCS). Since citation counts depend on the length of time between the publication year of the cited papers and the time point of the impact analysis (see Table 3), the NCS is calculated separately for single publication years.
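To make the computation concrete, the following minimal Python sketch derives expected citation rates per subfield and publication year and then computes the NCS of each paper and the MNCS of the set. All paper records (identifiers, subfields, years, citation counts) are invented for illustration; this is not the production code behind our results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical paper records: (paper_id, subfield, publication year,
# citation count); all values are invented for illustration.
papers = [
    ("p1", "C", 1995, 40), ("p2", "C", 1995, 2),
    ("p3", "B", 1995, 6),  ("p4", "B", 1995, 0),
    ("p5", "C", 1996, 10), ("p6", "C", 1996, 4),
]

# Expected citations: mean citation count of the reference set, i.e. of
# all papers in the same subfield and publication year.
groups = defaultdict(list)
for pid, field, year, cites in papers:
    groups[(field, year)].append(cites)
expected = {key: mean(vals) for key, vals in groups.items()}

# NCS per paper and the MNCS of the whole publication set.
ncs = {pid: cites / expected[(field, year)]
       for pid, field, year, cites in papers}
mncs = mean(ncs.values())  # equals 1.0 over the full set by construction
```

Grouping by (subfield, year) implements the point made above that the NCS has to be calculated separately for single publication years.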
van Raan (2005) published the following rules of thumb for interpreting the MNCS: "This indicator enables us to observe immediately whether the performance of a research group or institute is significantly far below (indicator value <0.5), below (indicator value 0.5-0.8), about (0.8-1.2), above (1.2-1.5), or far above (>1.5) the international impact standard of the field" (p. 7). Thus, an entity (e.g. a journal or researcher) has published excellent research if the MNCS exceeds 1.5. In our dataset, 17.4% of the papers belong to the excellent category, while 4.7% are classified as above average; 11.8% and 43.5% of the papers are in the far-below and below categories, respectively.
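These thresholds can be expressed as a simple lookup; a sketch follows, where the handling of scores exactly on the cut-points is our assumption, since the quote leaves it open:

```python
def van_raan_class(mncs: float) -> str:
    """Map an MNCS value to van Raan's (2005) rule-of-thumb categories."""
    if mncs < 0.5:
        return "far below international standard"
    if mncs < 0.8:
        return "below"
    if mncs < 1.2:
        return "about average"
    if mncs <= 1.5:
        return "above"
    return "far above (excellent)"
```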
The MNCS has two important properties, which are required of established normalized indicators (Moed, 2015; Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011): (1) The MNCS value of 1 has a specific statistical meaning: it represents average performance, so that below-average and above-average performance can be easily identified. (2) If a paper of an entity (e.g. journal or researcher) receives an additional citation, the MNCS increases in each case.
A detailed explanation of how the MNCS is calculated in this study can be found in Appendix B.

PP top 50%: a percentile-based indicator as the better alternative to the MNCS
Although the MNCS has been frequently used as an indicator in bibliometrics, it has an important disadvantage: it uses the arithmetic average as a measure of central tendency, although distributions of citation counts are skewed (Seglen, 1992). As a rule, field-specific paper sets contain many lowly cited or non-cited papers and only a few highly cited papers (Bornmann & Leydesdorff, 2017). Therefore, percentile-based indicators, which are robust against outliers, have become popular in bibliometrics. According to Hicks et al. (2015) in the Leiden Manifesto, "the most robust normalization method is based on percentiles: each paper is weighted on the basis of the percentile to which it belongs in the citation distribution of its field (the top 1%, 10% or 20%, for example)" (p. 430). The recommendation to use percentile-based indicators can also be found in the Metric Tide (Wilsdon et al., 2015).
Against the backdrop of these developments in bibliometrics, and the resulting recommendations in the Leiden Manifesto and the Metric Tide, we use the PP top 50% indicator in this study as the better alternative to the MNCS. The indicator is calculated on the basis of the citation distribution in a specific subfield, whereby the papers are sorted in decreasing order of citations. Papers belonging to the 50% most frequently cited papers are assigned the score 1 and all others the score 0 in a binary variable. The binary variables for all subfields can then be used to calculate the P top 50% and PP top 50% indicators. P top 50% is the absolute number of papers published by an entity (e.g. journal or institution) that belong to the 50% most frequently cited papers, and PP top 50% is the relative number, i.e. P top 50% divided by the total number of papers in the set. Thus, it is the percentage of an entity's papers that are cited above average in the corresponding subfields.
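A minimal sketch of this binary scoring for a single subfield and publication year follows; the citation counts and the journal assignment are invented, and ties around the median are ignored here (Appendix B describes the fractional treatment actually used):

```python
def top50_flags(citations):
    """1 if a paper is among the 50% most cited papers of its reference
    set (one subfield and publication year), else 0; ties around the
    median are ignored in this sketch."""
    order = sorted(range(len(citations)), key=lambda i: citations[i],
                   reverse=True)
    flags = [0] * len(citations)
    for rank, i in enumerate(order):
        if rank < len(citations) // 2:
            flags[i] = 1
    return flags

# Hypothetical subfield set; the first three papers belong to journal X.
field_cites = [30, 12, 2, 7, 5, 25, 1, 9, 0, 4]
flags = top50_flags(field_cites)
journal_flags = flags[:3]                          # journal X's papers
pp_top50 = 100 * sum(journal_flags) / len(journal_flags)  # in per cent
```

PP top 50% for an entity is thus simply the mean of the binary scores of its papers, expressed as a percentage.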
The detailed explanation of how the PP top 50% indicator is calculated in this study can be found in Appendix B.

Comparison of citation counts, normalized citation scores (NCSs) and P top 50%
The normalization of citations only makes sense in economics if it leads to meaningful differences between normalized scores and citation counts. However, one cannot expect complete independence, because both metrics measure impact based on the same data source. Since Linnemer and Visser (2016) based their analyses on a different, significantly smaller set of journals than ours, differences are to be expected.
The impact scores in Table 4 reveal that the most frequently cited papers in the subfields have very different citation counts, ranging between n=344 in "General Economics and Teaching" (A) and n=4,627 in "Mathematical and Quantitative Methods" (C).
Correspondingly, similar NCSs in the subfields reflect different citation counts. The list of papers also demonstrates that papers are assigned to more than one economic subfield. The paper by Acemoglu, Johnson, and Robinson (2001) is the most cited paper in four subfields.
Since many other papers in the dataset are also assigned to more than one subfield, we adopted a fractional counting approach for citation impact. The detailed explanation of how fractional counting has been implemented in the normalization can be found in Appendix B. Table 4 provides initial indications that normalization is necessary in economics.
However, the analysis could not include P top 50%, because this indicator is primarily a binary variable. To reveal the extent of agreement and disagreement between all metrics (citation counts, NCS, and P top 50%), we group the papers according to the Characteristics Scores and Scales (CSS) method proposed by Glänzel, Debackere, and Thijs (2016). For each metric (citation counts and NCS), CSS scores are obtained by truncating the publication set at its metric mean and recalculating the mean of the truncated part of the set, until the procedure is stopped or no new scores are generated. We defined four classes, which we labeled "poorly cited", "fairly cited", "remarkably cited", and "outstandingly cited" (Bornmann & Glänzel, 2017). Whereas poorly cited papers fall below the average impact of all papers in the set, the other classes lie above this average and further differentiate the high-impact area. Table 5 (left side) shows how the papers in our set are classified according to CSS with respect to citations and NCS. 84% of the papers are positioned on the diagonal (printed in bold), i.e. these papers are classified equally. The Kappa coefficient, a more robust measure than the share of agreement, since it takes into account the possibility of agreement occurring by chance, highlights that the agreement is not perfect (which would be the case with Kappa=1).
According to the guidelines by Landis and Koch (1977), the agreement between citations and NCS is only moderate.
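The CSS grouping that underlies this comparison can be sketched in a few lines; the following minimal Python version of the iterative mean-truncation uses invented NCS values, and the stopping rule after three thresholds (yielding the four classes named above) follows the description in the text:

```python
def css_thresholds(scores, k=3):
    """Characteristics Scores and Scales: repeatedly truncate the set at
    its mean and recompute; the successive means are the class boundaries
    (three thresholds yield four classes)."""
    thresholds, subset = [], list(scores)
    for _ in range(k):
        if not subset:
            break
        m = sum(subset) / len(subset)
        thresholds.append(m)
        subset = [s for s in subset if s >= m]
    return thresholds

def css_class(score, thresholds):
    """Assign a paper to one of the four CSS classes."""
    labels = ["poorly cited", "fairly cited",
              "remarkably cited", "outstandingly cited"]
    return labels[sum(score >= t for t in thresholds)]

# Invented NCS values for a small publication set.
ncs_scores = [0.1, 0.4, 0.9, 1.0, 2.5, 3.0, 8.0, 12.0]
classes = [css_class(s, css_thresholds(ncs_scores)) for s in ncs_scores]
```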
The results in Table 5 show that 16% of the papers in the set have different classifications based on citations and NCS. For example, 13,843 papers are cited below average according to citations (classified as poorly cited), but above average according to the NCS (classified as fairly cited). Two papers clearly stand out by being classified as poorly cited with respect to citations, but outstandingly cited with respect to the NCS: Lawson (2013), with 15 citations and an NCS of 7.8, and Wilson and Gowdy (2013), with 13 citations and an NCS of 6.8. There are also numerous papers in the set that are downgraded in impact measurement by normalized citations: 7,226 papers are cited above average (fairly cited) according to citations, but score below average according to the NCS (poorly cited). 546 papers are outstandingly cited if citations are used, but only remarkably cited on the basis of the NCS, i.e. if the subfield is considered in impact measurement. Table 5 (right side) also includes the comparison of citations and P top 50%. Several papers in this study are fractionally assigned to the 50% most frequently cited papers in the corresponding subfields and publication years (see the explanation in Appendix B). Since P top 50% is thus not a completely binary variable (with the values 0 or 1), we categorized the papers in our set into two groups, P top 50% <=0.5 and P top 50% >0.5, for the statistical analysis. Nearly all of the papers classified as poorly cited on the basis of citations are also cited below average on the basis of P top 50%; both indicators are thus more or less in agreement in this area. The results also show that many papers that are cited above average according to P top 50% are classified differently by citations. On the one hand, these results are an indication that the indicator is able to level the skewness of citations in the above-average area. On the other hand, 50,448 (26%) papers are classified as poorly cited on the basis of citations, but are cited above average on the basis of P top 50%.
Taken together, the results in Table 5 demonstrate that normalization leads to similar results as citations for many papers; however, there is also a high level of disagreement, which may bias the results of impact analyses in economics that are based on citations.
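The share of agreement and the chance-corrected Kappa used in this comparison can be computed with standard tooling; a sketch with invented class labels (0 = poorly, 1 = fairly, 2 = remarkably, 3 = outstandingly cited), using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Invented CSS classes of the same papers under two metrics.
by_citations = [0, 0, 1, 2, 1, 0, 3, 1, 0, 2]
by_ncs       = [0, 1, 1, 2, 0, 0, 3, 2, 0, 2]

# Raw share of agreement versus chance-corrected Kappa.
agreement = sum(a == b for a, b in zip(by_citations, by_ncs)) / len(by_ncs)
kappa = cohen_kappa_score(by_citations, by_ncs)
print(f"share of agreement: {agreement:.0%}, kappa: {kappa:.2f}")
```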

New field- and time-normalized journal ranking
The first economics journal ranking was published by Coats (1971), who used readings by members of the American Economic Association as the ranking criterion. With the emerging dominance of bibliometrics in research evaluation in recent decades, citations have become the most important source for ranking journals, in economics and beyond. The most popular current rankings in economics, besides surveys among economists, are the relative rankings based on the approach of Liebowitz and Palmer (1984).
Bornmann, Butz, and Wohlrabe (in press) provide a comprehensive overview of existing journal rankings in economics.
Since funding decisions and the offer of professorships in economics are mainly based on publications in reputable journals, journal rankings should not be biased by the different citation rates in economics subfields. Based on the NCS and the P top 50% for each paper in our set, we therefore calculated journal rankings by averaging the normalized paper impact across years. Figure 1 shows the rank distribution of the journals on the basis of the MNCS and the corresponding confidence intervals (CIs); two journals, the Quarterly Journal of Economics and the Journal of Economic Literature, perform statistically significantly better than all other journals. The alternative PP top 50% journal ranking is based on the premise that the impact results for scientific entities (here: journals) should not be biased by a few outliers, i.e. the few very highly cited papers. Figure 2 shows the rank distribution of the journals on the basis of PP top 50% and the corresponding CIs. In contrast to the MNCS, we do not find any group of journals that is statistically significantly different from the others. Furthermore, the shape of the curve is less convex, and the curve slopes down almost linearly.
These results highlight that the PP top 50% journal ranking is less affected by outliers and reflects the majority of papers published in the journals more accurately than the MNCS ranking. The CIs for the journals in Figure 2 demonstrate that the accuracy of impact measurement is lowest for journals in the middle rank positions (the CIs are comparably wide) and highest for journals with the highest or lowest rank positions (the CIs are comparably small). We therefore used another (robust) method to classify the journals into impact groups and to separate an outstandingly cited group. In section 4.1 we applied the CSS method to assign the papers in our set to four impact classes. Since the method can also be used with aggregated scores (Bornmann & Glänzel, 2017), we assigned the journals in our set to four impact classes based on PP top 50%. Table 9 in Appendix A shows all journals (n=294) with their assignments to the four groups: 145 journals are poorly cited, 79 journals are fairly cited, 40 journals are remarkably cited, and 30 journals are outstandingly cited. Table 6 shows the 30 economics journals in the outstandingly cited group. Additionally, three further journals are considered in the table; their CIs include the threshold that separates the outstandingly cited journal group from the remarkably cited journals. Thus, one cannot exclude the possibility that these journals also belong to the outstandingly cited group. The two top journals in Table 6 are the Quarterly Journal of Economics and the Journal of Economic Literature. In order to investigate the stability of the journals in the outstandingly cited group, we assigned each economics journal in our set to the four citation impact classes on an annual basis (following the CSS approach). Seven out of the 33 journals in Table 6 are classified as outstandingly cited in every year; the other journals in Table 6 are either classified as outstandingly or remarkably cited over the years.

Comparisons with other journal rankings
How is the PP top 50% journal ranking related to the results of other rankings in economics? The simplest way of ranking journals is by their mean citation rate. The JIF is one of the most popular journal metrics; it is based on the citations received in one year by the papers a journal published in the two previous years, divided by the number of these papers (Garfield, 2006). For the comparison with PP top 50%, we use the mean citation rate for each journal. Since the citation window is not restricted to certain years in the calculation of PP top 50%, we consider all citations from the publication year until the end of 2016 in the calculation of the mean citation rate.
The RePEc website (see www.repec.org) has become an essential source for various rankings in economics. Based on a large and still expanding bibliometric database, RePEc publishes numerous rankings for journals, authors, economics departments and institutions.
RePEc covers more journals than WoS, as well as additional working papers, chapters, and books (further details can be found in Zimmermann, 2013). For the comparison with the PP top 50% journal ranking, we consider two popular journal metrics from RePEc: the simple and the recursive Impact Factor (IF). The simple IF is the ratio of all citations to a specific journal to the number of its papers listed in RePEc. The recursive IF also takes the prestige of the citing journal into account (Liebowitz & Palmer, 1984). Whereas the simple and recursive IFs are based on citations from the RePEc database, the citations for calculating the mean citation rates (see above) are from WoS. According to the guidelines of Landis and Koch (1977), the results reveal that there is considerable agreement, but also disagreement, between the rankings. Further results are shown in Table 8 (see the parts with the second and third robustness checks). If the top-cited papers are excluded, all journals besides two are equally classified; the Kappa coefficient is correspondingly close to 1. The exclusion of lowly cited papers leads to more journals being assigned to different classes; however, the Kappa coefficient is still very high at 0.86. According to the guidelines of Landis and Koch (1977), this agreement is almost perfect. The results in Table 8 also show that 20 journals are downgraded by one class if lowly cited papers are excluded. These journals suffer from the fact that the median is then higher than in the complete set of papers: in the calculation of PP top 50% with the complete set, many of their papers only marginally passed the median.

Discussion
Field- and time-normalization of citation impact is the standard method in bibliometrics (Hicks et al., 2015), and it should be applied in citation impact analyses across different time periods and subfields in economics. The most important reason is that there are different publication and citation cultures, which lead to subfield- and time-specific citation rates; for example, the mean citation rate in "General Economics and Teaching" decreases over the publication years as the citation window shortens (see Table 3). According to Li and Ruiz-Castillo (2014), the percentile rank indicator is robust to extreme observations. In this study, we used the PP top 50% indicator to identify those papers belonging to the above-average half in a certain subfield and time period. Besides focusing on the above-average half, it is also possible to focus on the 10% or 20% most frequently cited papers (PP top 10% or PP top 20%). As the results of Waltman et al. (2012) show, however, focusing on another percentile rank can be expected to lead to similar results. Besides percentiles, the use of log-transformed citations instead of raw citations in the MNCS formula has also been proposed as an alternative (Thelwall, 2017). However, this alternative has not yet reached the status of a standard in bibliometrics.
In this study, we calculated normalized scores for each paper. The normalization leads to similar impact assignments for many papers; however, there is also a high level of disagreement, which may lead to biased results of citation-based impact analyses in economics. There are several cases in the data that demonstrate unreasonable advantages or disadvantages for papers if impact is measured by citation counts without consideration of subfield- and time-specific baselines. For example, we can expect that papers published in "History of Economic Thought, Methodology, and Heterodox Approaches" and papers published recently are systematically disadvantaged in research evaluations across different subfields and times (because of their low mean citation rates). By contrast, papers from "Financial Economics" and papers published several years ago are systematically advantaged, since more citations can be expected. Thus, we attach importance to the consideration of normalization in economic impact studies, which is strongly recommended by experts in bibliometrics (Hicks et al., 2015).
In this study, we introduce a new journal ranking, which is based on the state of the art in bibliometrics (Hicks et al., 2015). The ideal way of assessing entities in science, such as journals, is to combine quantitative (metrics) and qualitative (peer review) assessments in order to overcome the disadvantages of each approach. For example, the most reputable journals that are used for calculating the Nature Index (NI, see https://www.natureindex.com) are identified by two expert panels (Bornmann & Haunschild, 2017; Haunschild & Bornmann, 2015). The NI counts the publications in these most reputable journals; the index is used by the Nature Publishing Group (NPG) to rank institutions and countries. To apply this ideal method of research evaluation in economics, peer review and metrics should be combined to produce a list of top journals: a panel of economists could use the list from our study, with about 30 outstandingly cited journals, and rate them according to their importance in economics. Ferrara and Bonaccorsi (2016) offer advice on how a journal ranking can be produced by using expert panels.
In this study we produced a comprehensive dataset with normalized scores at the paper level. We used the dataset to identify the most frequently cited papers and journals, but it can also be used for various other entities in economics: the most frequently cited researchers, research groups, institutions, and countries can be determined in a subfield- and time-normalized manner. At the level of single researchers, we recommend that the normalized scores be used instead of the popular h index proposed by Hirsch (2005). Like citation counts, the h index is not time- and subfield-normalized; it also depends on the academic age of the researcher. Thus, Bornmann and Marx (2014a) recommended calculating the sum of P top 50% for a researcher and dividing it by the number of his or her academic years. This results in a subfield-, time-, and age-normalized impact score, as the sketch below illustrates. In future studies, we will apply citation impact normalization to different entities in economics. It would be helpful for these studies if normalized impact scores were regularly included in RePEc, although producing these scores is a sophisticated task.
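As an illustration of this researcher-level score, a minimal sketch (the fractional P top 50% scores of the papers and the academic age are invented values):

```python
# Fractional P_top50% scores of a researcher's papers (invented values)
# and academic age in years since the first publication.
p_top50_scores = [1, 0, 0.33, 1, 0, 1, 0.54]
academic_years = 7

# Subfield-, time-, and age-normalized impact score
# (Bornmann & Marx, 2014a): about 0.55 top-half papers per year here.
normalized_impact = sum(p_top50_scores) / academic_years
```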
Appendix A

Table 9. Descriptive statistics for the journals included in this study and journal rankings based on the mean normalized citation scores (MNCS) and the share of the 50% most frequently cited papers (PP top 50%).

Appendix B

Calculation of the Mean Normalized Citation Score (MNCS)
For the calculation of the MNCS, each paper's citations in a paper set (of a journal, researcher, institution, or country) are divided by the mean citation impact of a corresponding reference set, and the resulting NCSs are averaged to the MNCS:

MNCS = (1/n) * Σ_i (c_i / e_i),

where c_i is the citation count of a focal paper i, e_i is the corresponding expected number of citations in the economic subfield (JEL code), and n is the number of papers in the set. The MNCS is defined similarly to the item-oriented field-normalized citation score average indicator (Lundberg, 2007; Rehn, Kronman, & Wadskog, 2007). Since citation counts depend on the length of time between the publication year of the cited papers and the time point of the impact analysis (see Table 3), the MNCS is calculated separately for single publication years.
It is a nice property of the MNCS that it leads to an average value of 1 across all papers in a field. However, this only holds in a paper set (with papers from one year) if each paper is assigned to exactly one field.
However, many field classification systems (e.g. JEL codes) assign papers to more than one field. Table 10 shows a simple example that illustrates the problem with the multi-assignment of papers. Paper number 5 is assigned to two fields. The obvious solution for the calculation of the NCS would be to average the two ratios for this paper: ((9/10.67)+(9/8.5))/2=0.95. However, this solution leads to an average value greater than 1 (1.01) across the papers in Table 10. In order to solve this problem, Waltman et al. (2011) propose the following two calculations, which ensure a mean value of 1 (see Table 11): (1) The expected number of citations for field X is calculated as follows: (20+3+(9*0.5))/(1+1+0.5)=11. Thus, the citations of paper 5 are fractionally counted; the calculation for field Y is correspondingly (8+(9*0.5))/(1+0.5)=8.33.
(2) The NCS for paper 5 also considers its fractional assignment to the two fields and is calculated as follows: (9/11*0.5)+(9/8.33*0.5). Both calculations lead to the desired property of the indicator that it results in a mean value of 1 across all papers in a field, although papers might be assigned to more than one field.
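The two calculations can be reproduced in a few lines of Python; the paper-to-field assignments below are inferred from the arithmetic quoted above (papers with 20 and 3 citations in field X, one paper with 8 citations in field Y, and the multi-assigned paper 5 with 9 citations in both):

```python
from collections import defaultdict

# Citations and field assignments inferred from the worked example;
# a paper in k fields is weighted 1/k per field.
papers = [
    (20, ["X"]), (3, ["X"]), (8, ["Y"]), (9, ["X", "Y"]),
]

# Fractionally counted expected citations per field.
cite_sum, weight_sum = defaultdict(float), defaultdict(float)
for cites, fields in papers:
    w = 1 / len(fields)
    for f in fields:
        cite_sum[f] += cites * w
        weight_sum[f] += w
expected = {f: cite_sum[f] / weight_sum[f] for f in cite_sum}
# expected == {"X": 11.0, "Y": 8.33...}, as in the text

# NCS per paper: field-wise ratios weighted by the field fractions.
ncs = [sum((cites / expected[f]) * (1 / len(fields)) for f in fields)
       for cites, fields in papers]
assert abs(sum(ncs) / len(ncs) - 1.0) < 1e-9  # mean NCS equals 1
```

The closing assertion verifies the desired property: with fractional counting, the mean NCS across all papers equals exactly 1.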

Calculation of the percentile-based indicator: PP top 50%

Table 12 uses an example dataset to demonstrate how the PP top 50% indicator is calculated. Basically, the indicator is generated on the basis of the citation distribution in a field (here: field A), whereby the papers are sorted in decreasing order of citations. Papers belonging to the 50% most frequently cited papers are assigned the score 1 and the others the score 0 in a binary variable. The binary variable can then be used to calculate the P top 50% or PP top 50% indicators. P top 50% is the absolute number of papers published in field A belonging to the 50% most frequently cited papers (here: 5) and PP top 50% the relative number, whereby P top 50% is divided by the total number of papers (5/10*100=50). If a journal (here: journal X) had published 5 papers from field A (and no further papers in other fields), P top 50% = 3 and PP top 50% = 60% (3/5). The PP top 50% indicator is affected by two problems; the solution for the first problem is outlined in Table 13. Citation distributions are characterized by ties, i.e. papers having the same number of citations. Ties lead to problems in identifying the 50% most frequently cited papers if they concern papers around the 50% threshold of a citation distribution. In Table 13, the 7 papers with 20 citations can be clearly assigned to the 50% most frequently cited papers and the 5 papers with 0 citations to the rest. However, this is not possible for the 6 papers with 10 citations; they cannot be clearly assigned to one of the two groups. Waltman and Schreiber (2013) propose a solution for this problem, which leads to exactly 50% most frequently cited papers in a field despite the existence of papers with the same number of citations (around the threshold). We explain their solution using the example data in Table 13.

Each of the 18 papers in field B represents 1/18=5.56% of the field-specific citation distribution. Hence, together the 7 papers with 20 citations represent 7*5.56%=38.92% of the citation distribution, the 6 papers with 10 citations represent 6*5.56%=33.36%, and the 5 papers with 0 citations represent 5*5.56%=27.8%. We would like to identify the 50% most frequently cited papers, whereby the assignment of the 6 papers with 10 citations is still unclear. Waltman and Schreiber (2013) fractionally assign these papers to the 50% most frequently cited papers, so that we end up with exactly 50% most frequently cited papers.
The 7 papers with 20 citations cover 38.92% of the 50% most frequently cited papers.
The rest (50%-38.92%=11.08%) needs to be covered by the 6 papers with 10 citations. In order to reach this goal, the segment of the citation distribution covered by the papers with 10 citations must be split into two parts: one part covering 11.08% of the distribution, the other covering the remaining 33.36%-11.08%=22.28%. This other part (22.28%) belongs to the bottom 50% of the citation distribution. Splitting the segment of the distribution covered by the papers with 10 citations is done by assigning each of the 6 papers to the 50% most frequently cited papers with a fraction of 11.08%/33.36%=0.33. The value 11.08% represents the share of the papers with 10 citations that belong to the 50% most frequently cited papers; 33.36% is the percentage of papers in the field with 10 citations.
In this way, we obtain exactly 50% most frequently cited papers, since ((0.33*6)+7)/18 equals 50%: there are 6 papers in the field with 10 citations, which are fractionally assigned to the 50% most frequently cited papers, and 7 papers with 20 citations that clearly belong to this group.
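A minimal sketch of this splitting rule (Waltman & Schreiber, 2013), applied to the field B example above; the function is a generic implementation of the fractional assignment described in the text:

```python
def top50_fractions(citations):
    """Fractional assignment to the top 50%: groups of equally cited
    papers clearly above the boundary get 1, groups below get 0, and
    the tied group straddling the 50% boundary is split fractionally."""
    n = len(citations)
    fractions = {}
    covered = 0.0  # share of the distribution already assigned to the top
    for value in sorted(set(citations), reverse=True):
        group = [i for i, c in enumerate(citations) if c == value]
        share = len(group) / n
        if covered + share <= 0.5:      # whole group fits in the top half
            frac = 1.0
        elif covered < 0.5:             # group straddles the 50% boundary
            frac = (0.5 - covered) / share
        else:                           # group entirely in the bottom half
            frac = 0.0
        for i in group:
            fractions[i] = frac
        covered += share
    return [fractions[i] for i in range(n)]

# Worked example from the text: 7 papers with 20 citations, 6 with 10,
# 5 with 0; each 10-citation paper receives the fraction 0.33.
cites = [20] * 7 + [10] * 6 + [0] * 5
fracs = top50_fractions(cites)
assert abs(sum(fracs) / len(cites) - 0.5) < 1e-9  # exactly 50% in the top
```

With fractionally weighted papers (the multi-field case discussed next), the same splitting rule applies, with paper weights replacing the simple counts.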
Table 14 shows an example that reveals the second problem with the PP top 50% indicator: papers are assigned not only to one, but to two or more fields. The example in Table 14 consists of 26 papers, whereby 1 paper (see the grey shaded lines in the table) belongs to two fields. In such cases, the papers in multiple fields are fractionally counted for the calculation of PP top 50%, following the approach of Waltman et al. (2011).
We explain the approach using the example in Table 14. Since 1 paper in the table belongs to two fields (B and C), it is weighted by 0.5 instead of 1 (the other papers in the sets which belong to one field each are weighted with 1). This leads to 15.5 papers in field B and 10.5 papers in field C.
In field B, the papers with 20 citations represent 29.03% of the citation distribution (4.5/15.5), the papers with 10 citations 38.71% (6/15.5), and the papers with 0 citations 32.26% (5/15.5). Thus, the papers with 20 citations cover 29.03% of the 50% most frequently cited papers. The rest, 20.97% (50%-29.03%), should be covered by the 6 papers with 10 citations. Splitting the segment of the distribution covered by the papers with 10 citations is done by assigning each of the 6 papers to the 50% most frequently cited papers with a fraction of 20.97%/38.71%=0.54. Thus, we obtain exactly 50% most frequently cited papers, since ((0.54*6)+4.5)/15.5 equals 50%. In field C, with a total of 10.5 papers, we have 3 papers with 50 citations (28.57% of the citation distribution), 4.5 papers with 20 citations (42.86% of the distribution), and 3 papers with 10 citations (28.57%). 21.43% of the citation distribution (50%-28.57%) should be covered by the papers with 20 citations: 21.43%/42.86%=0.5. We obtain the value of 50% with ((0.5*4.5)+3)/10.5. In Table 15, the data from Table 14 are used to transfer the calculations for the two fields (B and C) to a small-world example in which only two journals (Y and Z) exist, publishing all the papers in fields B and C. Journal Y has published 16 papers and