Introduction

The U.K. has enjoyed an extended period during which its international comparative research performance, as indexed by annual average Category Normalized Citation Impact (CNCI), rose to place it above even the U.S. among the G7 group of large economies. In government reports, this was seen as a reflection of successful research policies, particularly those for cyclical university research assessment. In this paper we show that an equally plausible explanation of rising CNCI, for the U.K. and for other nations, is that it was an inevitable side-effect of rising international research collaboration.

Many countries have adopted processes for the assessment and evaluation of the public sector research base, with and without financial rewards, as a policy lever for encouraging more impactful research. An early example of such a policy was the U.K.’s Research Assessment Exercise (RAE). This was initiated experimentally in 1986 and developed into a fully structured cyclical process from 1992 onwards, with formal subject panels reviewing research portfolios from all publicly funded universities. The outcome of each panel assessment was a grade, which not only identified relative performance but also acted as a weighting factor in core funding to the institution.

The U.K. experiment appeared to be successful and was emulated by other countries. Not only did the RAE enable greater selectivity in the distribution of research funds but it also appeared to be associated with a change in the trajectory of U.K. research impact. Impact, as indexed by citations to journal articles, had fallen during the 1980s by comparison with global averages but began to rise from the mid-1990s onwards (Adams, 1998, 2002) and the U.K. has recently had the highest relative impact among the large research economies.

Research assessment brings public policy to bear on research ‘performance’ (Gläser & Laudel, 2016) and the implied link between them should prove testable. Adams (2002; Adams & Smith, 2002) drew attention to the rise in the U.K.’s relative citation impact in the 1990s although Elton (2002) indicated deleterious consequences. Geuna and Martin (2003) examined such exercises across 12 countries and concluded that policy benefits did indeed outweigh costs, at least initially, but that cyclical evaluation produced diminishing returns. Barker (2007) suggested that the U.K. RAE process led to greater coordination of research around traditional disciplinary concerns and inhibited applied research. Moed (2008) saw detailed changes in researcher response, shifting from quantity to quality of output as assessment criteria were modified.

These studies provide no conclusive answer, however, to the question of whether assessment as a policy contributes to improvements in national research quality, at least as indicated by scientometric analysis. A key challenge is that there is no possibility of a controlled trial in which some countries (or disciplines) are subject to assessment while others are not, all other policy and funding factors being constant. Thus, where research performance appears to have improved, the tendency is towards uncritical positivism.

In the absence of experiment, a comparative approach offers a route forwards. The testable hypothesis is that the inception and process of formal research assessment cycles subsequently lead to relative improvement in national research performance. If true, then we should predict: first, a degree of within-country synchrony, in that a nation’s research performance indicators should rise at or soon after the introduction of the policy; and, second, a degree of between-country diversity, in that this change should be seen in countries with this policy and not elsewhere and not at the same time.

Change within country A might be rapid in some parts, where researchers are well informed and agile, though possibly lagging elsewhere. However, between countries A and B there is no reason to expect A’s response to policy to affect B’s research base. Hence, we can look for ‘reasonably’ close policy/response timing within-country coupled with asynchrony across countries.

Why might international synchrony in research parameters nonetheless arise? Wagner and Leydesdorff (2005) suggested that research collaboration has enabled a global research network to evolve, with self-organizing properties transcending national research policy. Easier communication and travel in the 1990s started a tide of international engagement, boosted by the further proliferation of the Web after 2000 and the consequent expansion of international collaboration. The 1980s publication output of the U.K. was 90% domestic, with no international co-authors; by 2020, more than 70% of U.K. papers had a co-author from another country. Similar networks grew across Europe, while international collaboration for the U.S. rose from 5 to 40% of research papers (Adams, 2013).

The network hypothesis was endorsed by Adams and Szomszor (2022) in an analysis of research output, bilateral and multilateral collaboration, subject diversity and citation impact over four decades from the 1980s to the present. Data collated from Clarivate Web of Science™ for the G7 and BRICK groups of countries and 26 other nations revealed that change in national indicators could be associated with changes in international research collaboration and, specifically, that collaboration mode shifted from bilateral to multilateral at more or less the same time across countries.

This analysis did not explain these similarities in patterns and timing across countries, nor why this occurred despite strategic diversity. For example, the U.K. instituted the RAE in the 1990s while Hong Kong, New Zealand and Australia did not do so until the 2000s. Canada has an intensive analytical impact assessment in education and health but not more generally. Other leading research economies—such as Germany and the U.S.—have no regular national assessment. There is even more diversity than this in national approaches to institutional and project funding for research, employment practices, links between research and teaching, and policies on collaborative research and international programs. Yet, despite such policy divergence, the observed patterns of change are strikingly similar.

In this paper, we review the degree of synchrony between policy and CNCI change for three exemplar countries, then move to review change across a larger set of countries. Finally, we deconstruct citation impact and analyse its components for an explanation of the links between impact and collaboration.

Method

This study uses the annual average Category Normalized Citation Impact (CNCI) as an indicator of ‘research performance’. Citation counts are a conventional indicator used by scientometric analysts. A series of independent investigations confirms the generic positive relationship between peer judgments of research quality and average citation indicators (Adams, 2007; Thelwall et al., 2023; Waltman & Van Eck, 2015).

The cumulative citation count for a research paper rises over time after publication: for papers published in the approximately 20,000 journals indexed in Clarivate’s Web of Science™ this means that a 2013 publication has an all-fields average of 29 citations whilst a 2020 publication averages only 14. The rate of increase is dependent on the research field (i.e., journal subject category): for example, a 2013 Biochemistry paper averages 38 citations while an average Agriculture paper of the same year has accumulated only 24. Citation counts are typically higher for reviews than articles. For these reasons, raw counts are moderated to account for inherent variations. CNCI therefore divides the actual citation count for each paper by the relevant world average for subject, publication year and document type (Szomszor et al., 2021).
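In symbols, restating the definition of Szomszor et al. (2021) in our own notation, the indicator for paper i is

```latex
\mathrm{CNCI}_i = \frac{c_i}{e(f_i,\, y_i,\, d_i)}
```

where c_i is the citation count of paper i and e(f, y, d) is the world average citation count for papers in field f, publication year y and document type d. An aggregate CNCI, such as a national annual average, is then the mean of CNCI_i over the papers in that set, so a value of 1.0 corresponds to world average.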

Reference lists have tended to become longer in recent years (Nicolaisen & Frandsen, 2021), though some journals (e.g., Nature) have explicitly limited this. Rising raw citation counts could be interpreted as having a ‘grade inflation’ effect and possibly other disruptive outcomes. Nonetheless, this will not affect the value of a normalized citation indicator, since the normalization explicitly rescales every citation count against the relevant global average. The ways in which these additional references are distributed to earlier literature as citations may, however, affect the spread of high and low cited outliers.

Whole counting was used throughout: the citation count of each paper was normalized against the world average for its relevant document type, journal category and publication year. We recognise that there are sound arguments in favour of fractional partitioning of output and impact for some types of analysis (Potter et al., 2020; Waltman & Van Eck, 2015). In the context of this study, such methods would obscure the raw, source data on which we focus. Note that although some papers have authors from many countries, and thus count once in each national total, they also count only once in the combined global total and do not skew the normalization.

Analysis of national synchrony

We analyzed the timing of the initial and subsequent cycles of university research assessment in each of three countries and compared this timetable with the respective changes in average national citations per paper.

Three countries (the U.K., New Zealand and Australia) have at different times instituted cyclical research assessment modelled on the U.K.’s RAE in their similarly structured university systems. New Zealand introduced the Performance Based Research Fund (PBRF) in 2003 and Australia introduced Excellence in Research for Australia (ERA) in 2010. Each later introduced socio-economic impact elements, as the U.K. did in the Research Excellence Framework (REF) in 2014. There is thus a time-spread of inception.

Analysis of international synchrony

To explore national indicators of research performance in the context of the international network, we downloaded the Clarivate Web of Science™ publication dataset for a large set of countries. We examined in detail the deconstructed data for a subset of 38 of those countries to explore changes in the domestic and international components, and then the changing global association between international collaboration and overall impact.

The dataset repeats the analysis of Adams and Szomszor (2022) with an updated data download that enabled deconstruction of national papers into domestic (no international co-author), internationally bilateral (one or more authors from the country of interest and one collaborating country) and internationally multilateral components.

The analysis covers the period from 1981 to 2018, thus including years prior to the widespread adoption of the Internet. It refers solely to research articles and reviews, which are deemed to be substantive and original academic papers and for which the available publication data are comprehensive and consistently indexed.

The publication data were sourced for 38 relatively prolific research economies covering a broad geographical spread. The selected countries accounted for 35,666,890 (92%) of the 38.8 million papers indexed globally in the database over the period, rising from 87% in 1981 to 95% recently (Table 1).

Table 1 Countries covered in the analysis

Assignment to country was based on author addresses. Whole counting of papers was used, where each paper is assigned once to each country given in an author affiliation. Each country’s indexed papers were counted in total and then deconstructed by collaboration mode (domestic, bilateral or multilateral), and each mode was counted and analysed separately.
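As an illustration of this counting scheme, the following minimal Python sketch (ours: the paper records and field names are invented, not the Web of Science schema) classifies papers by collaboration mode and applies whole counting:

```python
from collections import Counter, defaultdict

# Each paper is reduced to the set of countries in its author affiliations.
# These records are invented for illustration only.
papers = [
    {"id": "p1", "countries": {"GB"}},              # domestic
    {"id": "p2", "countries": {"GB", "FR"}},        # bilateral
    {"id": "p3", "countries": {"GB", "FR", "DE"}},  # multilateral
]

def collaboration_mode(countries):
    """Domestic: one country; bilateral: two; multilateral: three or more."""
    if len(countries) == 1:
        return "domestic"
    return "bilateral" if len(countries) == 2 else "multilateral"

# Whole counting: each paper is assigned once to every country in its
# affiliations, so a collaborative paper sits in several national tallies
# but appears only once in the deduplicated world total.
national = defaultdict(Counter)
world_total = 0
for paper in papers:
    world_total += 1
    mode = collaboration_mode(paper["countries"])
    for country in paper["countries"]:
        national[country][mode] += 1

print(dict(national["GB"]))  # {'domestic': 1, 'bilateral': 1, 'multilateral': 1}
print(sum(sum(modes.values()) for modes in national.values()))  # 6 national assignments
print(world_total)  # but only 3 papers in the world total
```

The final two prints show why the note above holds: collaborative papers appear in several national tallies but only once in the single global total used for normalization.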

Deconstruction of national average CNCI

It has long been known that internationally co-authored papers have a higher average citation count than comparable domestic papers (Narin et al., 1991). There is thus an interaction between collaboration and indexed citation impact. For this reason, in this study, the percentage (or share) of national papers that were internationally collaborative was indexed, and citation impact was analyzed for three sub-sets for each country: nationally as a whole; for domestic papers only; and for internationally collaborative papers.

For each country, the percentage of national papers that had international co-authors was plotted alongside the three values of CNCI (national average, domestic with no international co-authors, internationally collaborative). Since it is evident from prior work that international collaboration does influence citation impact, we then analyzed the correlation between annual average national CNCI and the annual percentage of papers that had an international co-author for all countries in our dataset.
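A minimal sketch of this deconstruction, assuming a paper-level table with precomputed per-paper CNCI values (the column names are ours, for illustration only):

```python
import pandas as pd

# Hypothetical paper-level records: one row per (paper, country) assignment,
# with the paper's precomputed CNCI and an international co-author flag.
df = pd.DataFrame({
    "country":       ["GB", "GB", "GB", "NZ", "NZ"],
    "year":          [2001, 2001, 2001, 2001, 2001],
    "international": [False, True, True, True, False],
    "cnci":          [1.05, 1.60, 1.70, 1.40, 0.90],
})

# Percentage of national papers with an international co-author.
intl_share = 100 * df.groupby(["country", "year"])["international"].mean()

# The three CNCI series: national average, domestic only, international only.
cnci_all      = df.groupby(["country", "year"])["cnci"].mean()
cnci_domestic = df[~df["international"]].groupby(["country", "year"])["cnci"].mean()
cnci_collab   = df[df["international"]].groupby(["country", "year"])["cnci"].mean()

print(pd.concat({"share_pct": intl_share, "all": cnci_all,
                 "domestic": cnci_domestic, "collab": cnci_collab}, axis=1))
```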

Results

National synchrony

The influence of cyclical research exercises and of international collaboration on citation impact should be visible if the time series of collaboration and impact is plotted against a timeline of policy events.

The time series of national average CNCI plotted against the dates of the first and subsequent research assessment cycles in the U.K., New Zealand and Australia fails to reveal any general concordance between the national inception of an assessment policy and the rise of average national CNCI in each country. (Fig. 1).

Fig. 1

Annual values of Category Normalized Citation Impact and the percentage of papers that have an international co-author, for (a) Australia, (b) New Zealand and (c) the U.K. The timeline in each figure also marks the dates of significant research policy events: for Australia, the Dawkins reforms of the late 1980s (orange) and cycles of Excellence in Research for Australia; for New Zealand, cycles of the Performance Based Research Fund; and for the U.K., cycles of the Research Assessment Exercise

For Australia, a fall in national CNCI in the 1980s appears to be reversed after the Dawkins reforms to that country’s higher education structure and research funding (Dawkins, 1988: chapter 9). Impact rises more steeply in the 2000s, prior to the first cycle of ERA in 2010. It parallels the rise in international collaboration and the change in gradient can be associated with a growth of multilateral rather than bilateral publications.

For the U.K., there is similarly a decline in CNCI in the 1980s followed by an upswing (attributed to the introduction of research assessment), but the first two assessment cycles, in 1986 and 1989, saw continuing decline, and it was only in 2005 that U.K. CNCI regained the level from which it had fallen twenty years earlier. The introduction of the revised REF after 2008 is associated with a relative CNCI plateau.

For New Zealand, CNCI rises rather later than for the other countries despite the country’s similar profile of international collaboration. It would not be unreasonable in this instance to associate the upward inflexion in the curve with the inception of the PBRF in the mid-2000s. New Zealand is thus the only one of these three countries where an association between policy and response can reasonably be claimed, whereas such a claim would not be sustainable for the other two.

International synchrony

A summary analysis of the trajectory of average CNCI for the G7 and BRICK nations reveals a rising profile for every nation except the U.S., irrespective of their policy on research assessment (Fig. 2).

Fig. 2

Average national Category Normalized Citation Impact for the G7 major economies and the BRICK group of emerging research economies. This has risen over the last forty years and CNCI for the G7 has converged on a value well above world average

This graph reveals a core problem for long-term global analysis as an informant of public policy. Every country except the U.S. ‘gets better’ between 1981 and 2021, and most big research economies pass the benchmark of world average (= 1.0). If all the G7 except Japan have a CNCI of 1.3 or higher, and China (now the most productive economy in terms of indexed research publications) has a CNCI rising through 1.1, then the conundrum is plain: what part of global research remains below world average?

To explain this apparent conflict, in which the world average is exceeded by the averages of national totals that together make up more than half of global output, we need to refer to the historical observation that domestic papers are cited less frequently (on average) than internationally collaborative papers (Narin et al., 1991). The ‘pool’ for each country has (i) its own domestic output (typically around 30–40% of total papers) plus (ii) its shared international papers. These shared papers are counted again in each partner’s tally. The world ‘pool’ has only one deduplicated set of the collaborative papers (a single set of all (ii)) plus all the less well-cited domestic pools (the sum of all (i)). Thus, invisibly in the arithmetic, the world average is depressed and the national headline CNCI is boosted.
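A toy calculation (ours, with invented citation counts) makes this arithmetic visible:

```python
# Two countries, A and B, each with one domestic paper, plus one shared
# collaborative paper that is cited more than either domestic paper,
# in line with the observation of Narin et al. (1991). Figures invented.
domestic_a = 4   # citations to A's domestic paper
domestic_b = 6   # citations to B's domestic paper
shared_ab = 20   # citations to the shared A-B collaborative paper

# Whole counting: the shared paper sits in BOTH national pools...
avg_a = (domestic_a + shared_ab) / 2  # 12.0
avg_b = (domestic_b + shared_ab) / 2  # 13.0

# ...but only once in the deduplicated world pool.
world_avg = (domestic_a + domestic_b + shared_ab) / 3  # 10.0

# Both national averages exceed the world average.
print(avg_a, avg_b, world_avg)  # 12.0 13.0 10.0
```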

From a bibliometric perspective, a technical solution to the problem captured in Fig. 2 would be to apply fractional attribution of both papers and citations (Waltman & Van Eck, 2015). In this study, however, such a move would be problematic from the perspective of examining policy, because it obscures interactions between results for sub-sets of papers, thus hiding emergent management implications. CNCI can be analysed in collaborative components (Potter et al., 2020) and we refer to this in the Discussion.

Deconstruction of national average CNCI

When international collaboration increases, the national mix of domestic and international papers changes. At a deconstructed level, what is the effect within a single nation’s publication portfolio? For the U.K., an example noted earlier, international collaboration increased as a share of total national publication output while the absolute volume of domestic output remained static. In consequence, a rise in the average U.K. CNCI should have been highly predictable.

It is also the case that bilateral papers are cited less frequently (on average) than multilateral papers (Potter et al., 2020). For example, when a U.K. researcher co-authors with a colleague in France, the typical CNCI is higher than the U.K. average. However, while it is clearly higher across all U.K.-France collaborations, including those involving third countries, it is not much higher for papers that are solely bilateral (Adams & Gurney, 2018). Since multilateral collaboration became universally more common from the mid-2000s (Adams & Szomszor, 2022), this would have provided a further ‘uplift’ in average CNCI.

The deconstruction and separate tracking of domestic and collaborative components produces a novel picture. Using the U.K. again as an example, the average CNCI of each part of the national portfolio hardly changes at all: U.K. domestic CNCI drifts down slightly (from around 1.2 to around 1.05, annual average 1981–2020); average CNCI from collaborative papers falls in the 1980s (from 1.7 in 1981–85 to 1.5 in 2000–04) but is then restored and boosted to 1.64 (roughly where it started) with the expansion of multilateral papers. The ‘decline’ in U.K. impact during the 1980s, which research assessment was believed to have addressed, may have been a consequence of bilateral partnering, then the predominant mode, with countries that added little to a high existing national average. (Fig. 3).
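The arithmetic behind this picture is a simple mixture identity. Writing s for the internationally collaborative share of national output,

```latex
\overline{\mathrm{CNCI}}_{\mathrm{national}} = (1-s)\,\overline{\mathrm{CNCI}}_{\mathrm{domestic}} + s\,\overline{\mathrm{CNCI}}_{\mathrm{international}}
```

With the U.K. component values reported here and in the Discussion, and shares rounded, this gives roughly 0.9 × 1.18 + 0.1 × 1.70 ≈ 1.23 for 1981–85 and 0.3 × 1.06 + 0.7 × 1.64 ≈ 1.47 for 2017–21, close to the reported national averages of 1.25 and 1.42. The headline rise thus follows from the change in s alone, with both component averages roughly static.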

Fig. 3

U.K. net average CNCI appears to have risen since the 1990s but the specific averages for its domestic papers and its internationally collaborative papers have changed very little over forty years. The percentage share of U.K. research papers that have an international co-author has risen over time, which suggests that the overall U.K. average CNCI is driven by the rising share of international, and thus more highly cited, papers

By analyzing the components in the mix separately we are able to interpret the source of the headline change in average CNCI. The U.K. national average CNCI is in reality an artefact: a product of evolving international collaboration, not a benefit derived from policies that have barely maintained domestic CNCI at par.

When this analysis is repeated for other countries, it becomes apparent that while there have been some marginal improvements in the separate domestic and collaborative CNCI components, the principal shift in overall CNCI is linked to rising collaboration: the national average moves away from tracking domestic CNCI and towards tracking collaborative CNCI.

A clear link between research performance and national policies is difficult to discern. For example, Germany has no national assessment system but has undergone major institutional restructuring: there is little change in average domestic CNCI and only slight improvement in collaborative CNCI. The Netherlands has institutional research assessment, but no national cycle: while collaborative CNCI increased after 2001, its domestic CNCI is declining. Australia instituted ERA in 2010: like the U.K., collaborative CNCI fell until 2000 and then rose marginally whilst domestic CNCI drifted lower. The U.S. has no institutional assessment: domestic CNCI has fallen markedly and collaborative CNCI is gradually declining. South Korea has no consistent assessment cycle: its domestic CNCI is static but its collaborative CNCI has risen as international collaboration has increased, perhaps driven by the highest ratio of gross expenditure on R&D (GERD) to GDP in the G20. (Fig. 4).

Fig. 4

For six countries: the average annual CNCI for all papers; CNCI for purely domestic papers and for papers with an international co-author; and the annual percentage of papers that have an international co-author

China is the only country where both domestic and collaborative CNCI are rising: it has a four-yearly discipline assessment system that covers both teaching and centralized research performance monitoring. It may be the case that exceptional investment in and expansion of its research base is raising its profile, domestic standards and international recognition.

Citation impact and subject diversity were tracked in each of 33 countries for which we have detailed data (this was not done for five smaller countries: see also Adams et al., 2020). Collaboration expanded across the 40 years, and annual average national CNCI was compared with the percentage of papers that were internationally collaborative in each year. This showed that initially, in the 1980s, CNCI had no dependence on the balance of domestic and collaborative research. Median CNCI in this set of countries rose as collaboration increased globally across the next four decades. As it did so, an increasingly significant correlation between national CNCI and national collaboration levels emerged, suggesting a growing dependency of national citation impact upon international collaboration (Fig. 5).
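A sketch of this year-by-year test, using Pearson's r for illustration (our choice of statistic, with invented national data points):

```python
import numpy as np
from scipy.stats import pearsonr

# Invented snapshots: one (collaboration %, national CNCI) pair per country,
# for an early and a late year, illustrating the emerging correlation.
share_1985 = np.array([8, 10, 12, 9, 15, 11])
cnci_1985 = np.array([1.10, 0.90, 1.20, 1.00, 1.10, 0.95])

share_2018 = np.array([40, 55, 62, 48, 70, 58])
cnci_2018 = np.array([1.10, 1.25, 1.35, 1.18, 1.45, 1.30])

for year, share, cnci in [(1985, share_1985, cnci_1985),
                          (2018, share_2018, cnci_2018)]:
    r, p = pearsonr(share, cnci)  # cross-country correlation in that year
    print(f"{year}: r = {r:.2f}, p = {p:.4f}")
```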

Fig. 5

Median value of average national CNCI among 33 countries rises over time, during a period of rising international collaboration. As the percentage of papers that are internationally collaborative rises, the correlation between citation impact and collaboration increases and becomes statistically significant

Discussion

We suggested that the link between research assessment policy, research base response and research performance would be reflected in two ways: a degree of within-country synchrony, in that a nation’s research performance indicators should rise at or soon after the introduction of the policy; and a degree of between-country diversity, in that this change should be seen in countries with this policy and not elsewhere and not at the same time.

The analyses in this paper support neither of these propositions. Citation impact changed at much the same time across countries irrespective of assessment (and other) policies. That change was associated with increasing international collaboration. Collaborative papers have higher citation impact and it is evident that this lifted the national averages. Consequently, citation impact is now globally and significantly correlated with collaboration.

We found only weak evidence that research assessment systems influence national research performance. New Zealand appears to respond when assessment is introduced but change in Australia and the U.K. is equally, or more, synchronous with changes in collaboration. (Fig. 1).

The citation impact of all countries, except the U.S., has risen over the last four decades, the major G7 economies now converge on a similar value, and the BRICK economies are also rising towards and above world average (Fig. 2). The sum of country activity with a CNCI index above world average makes up the larger part of the world total. This apparent anomaly, where most components exceed a collective average, is explained by the lower citation counts for unique, purely domestic papers and the higher citation counts for shared, internationally collaborative papers.

When publication and citation data are deconstructed, it becomes apparent that the average CNCI of the U.K.’s purely domestic papers (1.18 in 1981–85 and 1.06 in 2017–21) and of its internationally collaborative papers (1.70 earlier and 1.64 latterly) did not change markedly over the period from 1981 to 2020. The apparent rise in the U.K. average national CNCI (from 1.25 in 1981–85, dipping to 1.15 in 1988 and then rising to 1.42 in 2017–21) was derived primarily from the changing balance of domestic and collaborative papers as international collaboration grew from 10 to 70% of annual output while domestic output stagnated. (Fig. 3).

Similar patterns of CNCI change at gross national level accompanied by stasis at deconstructed domestic and internationally collaborative levels can be seen in other countries, with the exception of China where citation impact has improved substantially across components. National gains in gross CNCI emerge almost entirely from increased international collaboration and shared papers. (Fig. 4).

As global collaboration has risen, so national CNCI values have become more closely dependent on that level of collaboration. There was little association in the 1980s but the correlation across countries is now highly significant. (Fig. 5).

These data, particularly the relatively synchronous convergence irrespective of differences in the timing or direction of national research policies, support Wagner and Leydesdorff’s (2005) concept of a global research network. The network exhibits self-organizing properties, probably building on a consensus view of ‘good practice’ in science that emerged from the concept of an ‘invisible college’ in research (Crane, 1972).

While it is certainly true that many domestic science policies favoured, and some explicitly promoted, collaboration (seen as a route to gain access both to resources and to a greater knowledge pool), cross-national growth was at similar rates irrespective of local policy and incentive. Adams and Szomszor (2022) show that collaboration shifted from bilateral to multilateral at about the same time across countries. For the G7, there was a common trajectory of rising bilateral co-authorship until around 2000, when it flattened for most countries at around 25–30% of total output. U.S. bilateral collaboration started from a lower point and continued to grow until it too reached this band.

Multilateral collaboration was initially scarce. It increased throughout the period at a similar rate for the G7 countries, so the most collaborative in 1981 remain so in 2018. An inflection to a slightly slower growth rate after 2000 is evident but, unlike bilateral collaboration, the curves never flatten. The general pattern is repeated for all but China, which has the least multilateral collaboration and a greater proportion of bilateral relationships.

Too many research activity indicators are gross averages for a distribution that is both skewed and composed of units with multiple and differing attributes. We refer the reader to the Collab-CNCI approach of Potter et al. (2020, in press), which compares domestic papers with domestic and collaborative with collaborative, to explore the effects we have described.

We also note arguments that the global expansion of reference lists may have pushed up citation rates and hence could influence outcomes such as those described here. This might offer an alternative explanation, but it is addressed by the methodology we used: the global average in each year is the normalizing factor and hence brings individual paper and aggregate national averages back to a common benchmark.

What are the implications for national research policy? A failure to understand what happens when international collaboration grows leads to errors not only in the substance of reporting but also in the management decisions and policy developments based on those reports. Analysis and plans based only on coarse averages of a country’s performance are evidently flawed.

There appears to be a frequent failure among policy-makers and research managers to appreciate that change in both the national research system and the global context needs to be understood and interpreted in detailed profiles and not only at a headline ‘average’ level. The research volume, diversity and impact of the G7 and, increasingly, other countries are becoming pervasively bound to and dependent on one another.