Introduction

The measurement and analysis of research collaboration by using co-authorship links in scientific papers have been practiced in bibliometric research for over 40 years. Aggregate statistics can be derived showing the development of collaboration patterns among institutions and countries within the global network. At these levels, indicators of research collaboration have proved useful for national policy analysis and organizational strategic development.

In Fuchs et al. (2021), we recently reviewed the literature on collaboration indicators and found reason to modify an often-used indicator that was first introduced in the article entitled “Understanding patterns of international scientific collaboration” by Luukkonen et al. (1992). In this article, the authors define the relative importance of country Y for country X as:

$$\frac{C_{X,Y}/C_{X}}{C_{Y}/\left(T - C_{X}\right)} = \frac{C_{X,Y}/C_{Y}}{C_{X}/\left(T - C_{X}\right)}$$
(1)

A collaboration is counted each time author addresses in two different countries appear in the same publication. \(C_{X,Y}\) denotes the number of collaborations between countries X and Y; \(C_{X}\) is the total number of collaborations country X has with all other countries; \(C_{Y}\) is the total number of collaborations country Y has with all other countries; and T is the total number of pairwise country collaborations in the set of publications under study.

As shown in Rousseau (2021), this formula may lead to inconsistent results. In Fuchs et al. (2021), we presented a modification that leads to a more logical solution; see also Fuchs and Rousseau (2021) for a practical approach. There, a new collaboration indicator, the Relative Intensity of Collaboration (RIC), was introduced as:

$${\text{RIC}}\left( X,Y \right) := \frac{C_{X,Y}/C_{X}}{\left(C_{Y} - C_{X,Y}\right)/\left(T - C_{X}\right)} = \frac{C_{X,Y}\left(T - C_{X}\right)}{C_{X}\left(C_{Y} - C_{X,Y}\right)}$$
(2)

We refer to this formula as the relative intensity of collaboration of actor X with actor Y (in that order), with actor X as the focus. RIC(X,Y) ≥ 0, and there is no theoretical upper limit. RIC is both an asymmetric indicator (the role of country X differs from that of country Y) and a relative one. The main reason for preferring RIC over the 1992 indicator (formula (1)) is that, ceteris paribus, RIC increases when the collaboration between X and Y increases, which is not always the case for formula (1).
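To illustrate this monotonicity argument, the sketch below evaluates formulas (1) and (2) with exact fractions before and after one extra X-Y collaboration. It assumes that such a collaboration raises \(C_{X,Y}\), \(C_{X}\), \(C_{Y}\), and T by one each. The counts are hypothetical, chosen so that formula (1) decreases while RIC increases, and the function names are ours.

```python
from fractions import Fraction


def ric(c_xy, c_x, c_y, t):
    """RIC(X,Y), formula (2). Returns None when c_y == c_xy (no finite value)."""
    if c_y == c_xy:
        return None
    return Fraction(c_xy * (t - c_x), c_x * (c_y - c_xy))


def index_1992(c_xy, c_x, c_y, t):
    """Formula (1): (C_XY / C_X) / (C_Y / (T - C_X))."""
    return Fraction(c_xy, c_x) / Fraction(c_y, t - c_x)


def add_xy_collaboration(c_xy, c_x, c_y, t):
    """Assumption: one extra X-Y link raises C_XY, C_X, C_Y and T by one each."""
    return c_xy + 1, c_x + 1, c_y + 1, t + 1


# Hypothetical counts (C_XY, C_X, C_Y, T), for illustration only.
before = (1, 2, 2, 12)
after = add_xy_collaboration(*before)

print("RIC:", ric(*before), "->", ric(*after))                       # RIC: 5 -> 20/3
print("Formula (1):", index_1992(*before), "->", index_1992(*after))  # Formula (1): 5/2 -> 20/9
```

Under this assumption, T − C_X and C_Y − C_{X,Y} do not change, while C_{X,Y}/C_X cannot decrease, so RIC cannot decrease either.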

To demonstrate the properties of RIC with real-world data, we used a matrix of collaboration frequencies among the 20 countries contributing most to journals indexed by the Web of Science in the years 2000–2020. Figure 1 provides an example in which RIC is used to measure the relative intensity over time of China’s collaboration with five major collaborating countries within the matrix of 20 countries.

Fig. 1 The relative intensity of collaboration (RIC) over time from the perspective of China in relation to five major collaborating countries. China is X in RIC(X,Y)

The patterns seen in Fig. 1 are related to the growth of China within the global network. Ten years ago, China surpassed the United Kingdom as the USA’s largest collaboration partner in science. The USA now has twice as many collaborations with China as with the UK. However, the relative intensity of collaboration between the USA and China has been declining since it peaked in 2016. China is seeking other partners, among them the UK. This recent development is only visible with an indicator of the relative intensity of collaboration. The decreased relative intensity of collaboration between China and the USA possibly reflects COVID-19 travel restrictions and the deteriorating relations between the two countries since 2018 (Tang et al., 2021; Zweig, 2021).

The policy interest in the patterns observed in Fig. 1 is obvious. The observations lead to other questions that may also be of policy interest: Is China developing a more balanced global collaboration profile as the intensity in relation to the USA decreases? And what happened to the balance of the global collaboration profile of the USA as China became the major collaboration partner?

This article introduces a new indicator to measure the balance in collaboration profiles (BIC). As a first step towards calculating the BIC, we prefer to apply the RIC to a matrix of collaboration frequencies within a network, but similar probabilistic affinity indices used to study collaboration (Chinchilla-Rodríguez et al., 2021) may provide the basis as well. Next, a weighted Lorenz curve is used to measure the distance from a balanced relation. This procedure results in a weighted Gini index between 0 and 1. As the Gini index is widely used to measure inequality in the distribution of income across a population, with 0 referring to perfect equality, we suggest turning it around and using the Gini evenness measure (one minus the Gini index) to express perfect balance (at 100%). This leads to the following definition.

Definition

The balance of collaboration, denoted as BIC(L), of a country L with respect to a group of other countries, is equal to the evenness Gini index of the weighted Lorenz curve of collaboration, expressed as a percentage.

Below, we explain how such a weighted Lorenz curve of collaboration can be constructed and how the corresponding Gini evenness measure is calculated. The BIC can be used to compare actors in a network and to follow how their collaboration profiles develop over time. We demonstrate the construction of the BIC using the RIC values for China’s relations with 19 other countries, based on the same data as in Fig. 1. The properties of the indicator are discussed step by step. We end the analysis with an example of BIC scores for China, the UK, and the USA based on developments within the same network during the period 2000–2020.

Methods

Introducing a weighted Lorenz curve of collaboration

Consider a set V of publications, representing, e.g., a scientific field, originating from different countries. We assume that each country in the set has at least one internationally collaborated publication. We focus on country L and count, for each other country in the set, the number of publications in which country L collaborated with that country. This is the number of collaboration links of each other country with country L. Although we prefer fractional counting [there are several options here; see, e.g., (Rousseau & Zhang, 2021)], at this point it does not matter whether one uses whole counting or fractional counting. For simplicity, we formulate our theory in terms of whole counting. Recall that whole counting reflects participation, while fractional counting reflects the actual contribution to collaborative work.

If there are N other countries in the set V, this leads to a finite array S = (s1, …, sN). If whole counting is used, sj denotes the number of collaboration links between country L and country Lj, j = 1, …, N. Some components of S may be zero; our approach allows this.

Next, for each country except L, we count the number of articles resulting from international collaboration (articles with more than one country in the byline) with countries other than L; that is, for these countries we exclude collaborations with country L. Again, this can be done with whole counting or fractionally (using the fractional contribution of the country concerned to each collaborated article). This leads to a finite array R = (r1, r2, …, rN). The indices must of course correspond: index j in arrays S and R must refer to the same country. Because we assumed that each country has at least one internationally collaborated publication, the r- and s-values of a country cannot both be zero, although one of them can be. The s-value is zero if the country does not collaborate with country L; the r-value is zero if the country collaborates only with country L.

Now the arrays S and R are normalized, leading to the arrays A and W with coordinates:

$$a_{j} = \frac{{s_{j} }}{{\mathop \sum \nolimits_{m = 1}^{N} s_{m} }}\;\;{\text{and}}\;\;\;w_{j} = \frac{{r_{j} }}{{\mathop \sum \nolimits_{m = 1}^{N} r_{m} }}$$
(3)

We note that \(\mathop \sum \nolimits_{m = 1}^{N} s_{m}\) is equal to the total number of collaboration links between country L and all other countries. The sum \(\mathop \sum \nolimits_{m = 1}^{N} r_{m}\) covers all collaboration links among the countries other than L, but each such link is counted twice, because a collaboration between countries Li and Lj counts for country Li and also for country Lj. Consequently, \(T = \mathop \sum \nolimits_{m = 1}^{N} s_{m} + \left( {\mathop \sum \nolimits_{m = 1}^{N} r_{m} } \right)/2\) is equal to the total number of collaboration links in set V.
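A minimal Python sketch of this counting under whole counting follows. It assumes, as in the Introduction, that every unordered pair of distinct countries co-occurring in an article is one collaboration link, so that sum(R) counts each link among the other countries twice. The function names and the six articles are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def collaboration_arrays(articles, focus):
    """Return (others, S, R) for the focus country L under whole counting.

    articles: iterable of collections, each holding the countries in one article's addresses.
    Assumption: every unordered pair of distinct countries in an article is one link.
    """
    s = Counter()  # s[c]: links between the focus country and country c
    r = Counter()  # r[c]: links between country c and countries other than the focus
    countries = set()
    for article in articles:
        members = sorted(set(article))
        if len(members) < 2:  # not internationally collaborated: no links
            continue
        countries.update(members)
        for c1, c2 in combinations(members, 2):
            if focus in (c1, c2):
                s[c2 if c1 == focus else c1] += 1
            else:  # a link among the other countries counts for both partners
                r[c1] += 1
                r[c2] += 1
    others = sorted(countries - {focus})
    return others, [s[c] for c in others], [r[c] for c in others]


def normalize(values):
    """Formula (3): divide each entry by the array total."""
    total = sum(values)
    return [Fraction(v, total) for v in values]


# Hypothetical articles; "L" is the focus country.
articles = [{"L", "P"}, {"L", "P", "Q"}, {"P", "Q", "Z"}, {"Q", "Z"}, {"L", "Z"}, {"Z"}]
others, S, R = collaboration_arrays(articles, "L")
A, W = normalize(S), normalize(R)
T = sum(S) + Fraction(sum(R), 2)  # each link among the other countries appears twice in R

print(others, S, R)  # ['P', 'Q', 'Z'] [2, 1, 1] [3, 4, 3]
print(T)             # 9
```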

Now we want to compare these two arrays. Does country L collaborate with other countries in proportion to their share of the collaborations in field V, or does L have a few preferred partners?

We re-arrange the country values such that for the new arrangement

$$\frac{{a_{1} }}{{w_{1} }} \ge \frac{{a_{2} }}{{w_{2} }} \ge \cdots \ge \frac{{a_{N} }}{{w_{N} }}$$
(4)

where we have assumed that all components of W are different from zero. Now we can construct the corresponding weighted Lorenz curve. This is the broken line connecting the origin (0,0) to the points with components

$$\left( {\mathop \sum \limits_{j = 1}^{i} w_{j} ,\mathop \sum \limits_{j = 1}^{i} a_{j} } \right)_{i = 1, \ldots ,N}$$
(5)

We easily see from its definition that if the normalized weight values are equal (to 1/N) then we obtain the standard Lorenz curve (Lorenz, 1905; Rousseau et al., 2018, p. 88).

Note that the order of the ratios aj/wj already provides information about the preferred partners of country L. If aj = wj, country L collaborates with country Lj in the same proportion as country Lj collaborates with the other countries in this dataset. If aj > wj, country L has a preference for collaborating with country Lj; if aj < wj, the opposite is the case. The ratios aj/wj are the slopes of the line segments of the weighted Lorenz curve. As these slopes decrease (see Eq. (4)), the curve is concave. The weighted Lorenz curve coincides with the diagonal of the unit square only if all aj/wj values are equal to 1. Two simple examples of the construction of a weighted Lorenz curve are shown in Fig. 2.

Fig. 2 Weighted Lorenz curves. (A) Standard discrete case. (B) Case with zero weights

The usual theory of the weighted Lorenz curve does not cover the case in which some W-values are zero. If w1 = 0, then a1/w1 = ∞ (recall that a1 and w1 cannot both be zero). Yet there is no real objection to a weighted Lorenz curve that begins with one or more vertical parts. Such a curve lies farther from the diagonal than a similar one that does not begin with a vertical part, which corresponds to our aim of comparing the A and W arrays.
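The construction in formulas (4) and (5), including the zero-weight case just described, can be sketched in Python as follows. Countries with wj = 0 are treated as having an infinite ratio and ranked first, which produces the vertical start. The function name weighted_lorenz is ours; the inputs are the arrays A and W of the section “A fictitious illustrative example”.

```python
from fractions import Fraction


def weighted_lorenz(A, W):
    """Vertices of the weighted Lorenz curve, following formulas (4) and (5).

    Countries are ranked by decreasing a_j / w_j. A zero weight (w_j = 0, a_j > 0)
    is treated as an infinite ratio: it is ranked first and gives a vertical part at x = 0.
    """
    pairs = sorted(zip(A, W), key=lambda p: (p[1] != 0, -(p[0] / p[1]) if p[1] else 0))
    points = [(Fraction(0), Fraction(0))]
    x = y = Fraction(0)
    for a, w in pairs:
        x, y = x + w, y + a  # cumulative weight (x) and cumulative share (y)
        points.append((x, y))
    return points


# Normalized arrays of the fictitious illustrative example (partners L2, ..., L6).
A = [Fraction(n, 7) for n in (2, 2, 1, 1, 1)]
W = [Fraction(n, 14) for n in (3, 5, 2, 4, 0)]
for x, y in weighted_lorenz(A, W):
    print(x, y)
```

The printed vertices are those listed in the fictitious example (with 10/14 printed as 5/7).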

A partial order among countries

We have explained the weighted Lorenz construction for a given country L. This construction can be repeated for all other countries. If we denote the weighted Lorenz curve of country Lj by Lw(Lj), we can introduce a partial order between countries, denoted ≺, as follows: Lj ≺ Lm if Lw(Lj) ≤ Lw(Lm), that is, if the curve of Lj nowhere lies above that of Lm, with equality only if the two curves coincide. The relation ≺ is only a partial order because curves may intersect, as classical Lorenz curves can.
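Because weighted Lorenz curves are piecewise linear, one way to test the partial order introduced above is to compare the heights of two curves at the union of their x-breakpoints: between two consecutive breakpoints the difference of the curves is linear, so it cannot change sign there. The sketch below follows that idea and uses the top of any vertical part at x = 0. The function names and the three pairs of arrays are hypothetical.

```python
from fractions import Fraction


def weighted_lorenz(A, W):
    """Vertices of the weighted Lorenz curve, zero weights (infinite ratios) first."""
    pairs = sorted(zip(A, W), key=lambda p: (p[1] != 0, -(p[0] / p[1]) if p[1] else 0))
    points = [(Fraction(0), Fraction(0))]
    x = y = Fraction(0)
    for a, w in pairs:
        x, y = x + w, y + a
        points.append((x, y))
    return points


def height(points, x):
    """Height of the piecewise-linear curve at x; at x = 0, the top of any vertical part."""
    if x == 0:
        return max(py for px, py in points if px == 0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 < x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x must lie in [0, 1]")


def compare(curve1, curve2):
    """Return -1 if curve1 lies on or below curve2 and they differ, 1 for the
    reverse, 0 if the curves coincide, and None if they cross (incomparable)."""
    xs = sorted({px for px, _ in curve1} | {px for px, _ in curve2})
    diffs = [height(curve1, x) - height(curve2, x) for x in xs]
    if all(d == 0 for d in diffs):
        return 0
    if all(d <= 0 for d in diffs):
        return -1
    if all(d >= 0 for d in diffs):
        return 1
    return None


# Hypothetical arrays A and W for three focus countries with two partners each.
half = Fraction(1, 2)
diagonal = weighted_lorenz([half, half], [half, half])
P = weighted_lorenz([Fraction(7, 10), Fraction(3, 10)], [half, half])
Q = weighted_lorenz([Fraction(2, 5), Fraction(3, 5)], [Fraction(1, 5), Fraction(4, 5)])
print(compare(diagonal, P), compare(Q, diagonal), compare(P, Q))  # -1 1 None
```

The curves P and Q cross between x = 1/5 and x = 1/2, so neither precedes the other.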

The relation with RIC(X,Y)

Let country X in Eq. (2), taken from Fuchs et al. (2021), now be the target country L, and let country Y be Lj (the jth country). Then \(C_{X,Y}\) is now denoted as sj; \(C_{X}\) is now denoted as \(\mathop \sum \nolimits_{m = 1}^{N} s_{m}\); \(C_{Y} - C_{X,Y}\), the number of collaborations country Y has with all countries other than X, is here denoted as rj; and \(T - C_{X}\), the total number of pairwise country collaborations in the studied publications that do not involve country X, is now \(\left( {\mathop \sum \nolimits_{m = 1}^{N} r_{m} } \right)/2\). Consequently, \(w_{j} = \frac{{r_{j} }}{{\mathop \sum \nolimits_{m = 1}^{N} r_{m} }} = \frac{{C_{Y} - C_{X,Y} }}{{2\left( {T - C_{X} } \right)}}\) and \(a_{j} = \frac{{s_{j} }}{{\mathop \sum \nolimits_{m = 1}^{N} s_{m} }} = \frac{{C_{X,Y} }}{{C_{X} }}\). Hence the slopes of the line segments of the weighted Lorenz curve, namely aj/wj, are equal to twice the RIC(X,Y) values. When wj is equal to zero, there is no finite RIC value; this is the case in which country Lj has an exclusive relation with country L.
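The correspondence can be checked numerically. The sketch below uses the whole-counting arrays S and R of the fictitious illustrative example and takes C_X = sum(S), C_{X,Y} = sj, C_Y = sj + rj, and T = sum(S) + sum(R)/2; these mappings are our reading of the derivation above, and slope_vs_ric is a hypothetical name.

```python
from fractions import Fraction


def slope_vs_ric(S, R):
    """For each partner j, return (a_j / w_j, RIC(L, L_j)), or None when w_j = 0.

    Assumed mapping (whole counting): C_X = sum(S), C_{X,Y} = s_j,
    C_Y = s_j + r_j and T = sum(S) + sum(R) / 2.
    """
    c_x = sum(S)
    t = c_x + Fraction(sum(R), 2)
    result = []
    for s_j, r_j in zip(S, R):
        if r_j == 0:  # exclusive relation with L: infinite slope, no finite RIC
            result.append(None)
            continue
        slope = Fraction(s_j, sum(S)) / Fraction(r_j, sum(R))  # a_j / w_j
        c_xy, c_y = s_j, s_j + r_j
        ric = Fraction(c_xy * (t - c_x), c_x * (c_y - c_xy))     # formula (2)
        result.append((slope, ric))
    return result


# Arrays of the fictitious illustrative example (focus L1, partners L2, ..., L6).
for j, item in enumerate(slope_vs_ric([2, 2, 1, 1, 1], [3, 5, 2, 4, 0]), start=2):
    print(f"L{j}:", item)
```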

Measures for global collaboration patterns

Any measure respecting the partial order of weighted Lorenz curves can now be used as a collaboration measure. This means that if m denotes such a measure and Lj ≺ Ln, then m(Lj) ≤ m(Ln), with equality only if the curves of Lj and Ln coincide. The weighted Gini index is probably the easiest such measure to understand. Its interpretation is the same as that of the (unweighted) Gini index, namely twice the area between the (weighted) Lorenz curve and the diagonal. It can be calculated as:

$$G_{w} \left( L \right) = \frac{1}{2}\mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} \left| {w_{i} a_{j} - w_{j} a_{i} } \right|$$
(6)

or equivalently, when data are ranked as in Eq. (4), and bj = 1−\(\mathop \sum \nolimits_{m = 1}^{j} a_{m}\) for j = 1, …, N, with b0 = 1:

$$G_{w} \left( L \right) = 1 - \left( {\mathop \sum \limits_{i = 1}^{N} b_{i - 1} w_{i} + \mathop \sum \limits_{j = 1}^{N - 1} b_{j} w_{j} } \right)$$
(7)

Formula (6) can be found in Theil (1967, p. 121), while a proof of formula (7) is given in the “Appendix”. This measure has been used, e.g., in studies of the localization of industry under the name of ‘locational Gini coefficients’ (Krugman, 1991; Zitt et al., 1999).
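For illustration, formulas (6) and (7) can both be evaluated with exact fractions; they should agree whenever the data are ranked as in Eq. (4), with zero weights first. This is a sketch, not code from the cited sources; gini_theil and gini_ranked are hypothetical names, and the inputs are the arrays A and W of the fictitious illustrative example.

```python
from fractions import Fraction
from itertools import accumulate


def gini_theil(A, W):
    """Formula (6): half the double sum of |w_i a_j - w_j a_i|; no ranking needed."""
    return sum(abs(wi * aj - wj * ai) for ai, wi in zip(A, W) for aj, wj in zip(A, W)) / 2


def gini_ranked(A, W):
    """Formula (7): data ranked as in Eq. (4), with zero weights first."""
    pairs = sorted(zip(A, W), key=lambda p: (p[1] != 0, -(p[0] / p[1]) if p[1] else 0))
    a = [ai for ai, _ in pairs]
    w = [wi for _, wi in pairs]
    n = len(pairs)
    b = [1 - c for c in accumulate(a, initial=0)]  # b[0] = 1, b[j] = 1 - (a_1 + ... + a_j)
    first = sum(b[i - 1] * w[i - 1] for i in range(1, n + 1))  # sum_{i=1}^{N} b_{i-1} w_i
    second = sum(b[j] * w[j - 1] for j in range(1, n))         # sum_{j=1}^{N-1} b_j w_j
    return 1 - (first + second)


A = [Fraction(n, 7) for n in (2, 2, 1, 1, 1)]
W = [Fraction(n, 14) for n in (3, 5, 2, 4, 0)]
print(gini_theil(A, W), gini_ranked(A, W))  # 15/49 15/49
```

Formula (8) gives the same value, because the terms with wj = 0 vanish.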

If the first k weights are equal to zero, then this formula becomes (see “Appendix”):

$$G_{w} \left( L \right) = 1 - \mathop \sum \limits_{j = k + 1}^{N} \left( {b_{j} + b_{j - 1} } \right)w_{j}$$
(8)

still with bj = 1−\(\mathop \sum \nolimits_{m = 1}^{j} a_{m}\) for j = 1, …, N, and b0 = 1.

Formulae (6), (7), and (8) are (classical) weighted Gini indices of inequality. As stated before, we need the weighted Gini evenness coefficient, which is equal to 1 − Gw(L). Hence, for country L (multiplied by 100 to obtain the percentage in the Definition):

$$BIC\left( L \right) = \mathop \sum \limits_{j = k + 1}^{N} \left( {b_{j} + b_{j - 1} } \right)w_{j}$$
(9)

The higher the weighted Lorenz curve, the more country L’s collaboration pattern differs from that of the other countries, and the lower its BIC.

A simple fictitious case, illustrating the different steps in the whole counting case (the same idea can be applied to fractional counting) is provided in “A fictitious illustrative example” section. A real-world example follows in “An example based on real data” section.

A fictitious illustrative example

We consider the collaboration data in Table 1, which refers to six articles and six countries (hence N = 5 countries besides the focus country L1). The cells of the table give the number of authors from the country in the first column contributing to the article in a given column.

Table 1 Collaboration table (6 articles, 6 countries; we focus on country L1)

We will only consider internationally collaborated articles in this example (so A6 is not considered). We will only illustrate the case of whole counting. Hence, we only use the data in Table 2.

Table 2 Data for the case of collaborated articles using whole counting

Figure 3 shows the corresponding weighted network.

Fig. 3 Weighted network corresponding to the data in Table 2

The array S = (2, 2, 1, 1, 1) gives the number of articles in which country Lj, j = 2, 3, 4, 5, 6, collaborates with country L1 (with or without other countries). Normalizing leads to A = (2/7, 2/7, 1/7, 1/7, 1/7).

The array R is here (3, 5, 2, 4, 0), giving the number of collaboration links of countries Lj, j = 2, 3, 4, 5, 6, with countries other than L1. The corresponding normalized array is W = (3/14, 5/14, 2/14, 4/14, 0/14).

Next, the components of A and W are re-arranged so that aj/wj decreases. This gives the order L6, L2, L4, L3, L5, with corresponding aj/wj values +∞ > 4/3 > 1 > 4/5 > 1/2. Recall that, where finite, these slopes are equal to twice the RIC(L1, Lj) values.

We can now draw the weighted Lorenz curve. It connects the points (0,0), (0, 1/7), (3/14, 3/7), (5/14, 4/7), (10/14, 6/7), and (1,1) and, like every weighted Lorenz curve, it is concave. The result for this example is shown in Fig. 2.

The value of the weighted Gini evenness index is 34/49. This means that the BIC is equal to 69.4%.

We note that using S = (200, 200, 100, 100, 100) and R = (300, 500, 200, 400, 0) would lead to the same weighted Lorenz curve, since the normalized arrays A and W are unchanged, and hence also to the same BIC value (Fig. 4).
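The whole computation for this example, from S and R to the BIC of formula (9), fits in a few lines. The sketch assumes whole counting and the ranking rule of Eq. (4) with zero weights first; bic is a hypothetical name. It also checks that multiplying S and R by 100 leaves the result unchanged.

```python
from fractions import Fraction
from itertools import accumulate


def bic(S, R):
    """Balance in collaboration of the focus country, formula (9), as a fraction.

    S[j]: collaborations of the focus country with partner j.
    R[j]: collaborations of partner j with countries other than the focus.
    """
    A = [Fraction(s, sum(S)) for s in S]
    W = [Fraction(r, sum(R)) for r in R]
    # Rank by decreasing a_j / w_j; zero weights (infinite ratios) come first.
    pairs = sorted(zip(A, W), key=lambda p: (p[1] != 0, -(p[0] / p[1]) if p[1] else 0))
    b = [1 - c for c in accumulate((a for a, _ in pairs), initial=0)]  # b_0 = 1, ..., b_N = 0
    # Zero weights contribute nothing, so summing over all j equals summing from j = k + 1.
    return sum((b[j] + b[j - 1]) * w for j, (_, w) in enumerate(pairs, start=1))


value = bic([2, 2, 1, 1, 1], [3, 5, 2, 4, 0])
print(value, f"{float(value):.1%}")                                      # 34/49 69.4%
print(bic([200, 200, 100, 100, 100], [300, 500, 200, 400, 0]) == value)  # True
```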

Fig. 4 Weighted Lorenz curve for country L with arrays S = (2, 2, 1, 1, 1) and R = (3, 5, 2, 4, 0)

An example based on real data

To provide examples based on real-world data, we used Web of Science as represented in Clarivate’s InCites in 2021 to register co-occurrences in the addresses of scientific articles (including reviews) of the 20 countries listed in Table 4 below. A co-occurrence is counted only once even if a country is listed more than once in an article. The 20 countries were selected by the size of their scientific output and their size within the global network of international collaboration. Together, the 20 countries contributed to 87.6% of the world's scientific output in publications in 2020. Our measurement will not change much by adding more countries. However, the aim of collaboration analysis is often to compare countries of the same size or within the same region. We suggest adding smaller countries to a group of large countries rather than replacing large countries with smaller countries if the intention is to provide an unbiased picture of how smaller countries collaborate within the global collaboration network.

To demonstrate how the BIC is constructed using real-world data, we consider the collaboration pattern of China with the other 19 countries/regions in 2018. To create the weighted Lorenz curve, the other countries need to be ranked by descending RIC, as shown in Table 3. The basis for the curve is the two columns Array A and Array W, which are also the basis for calculating the RIC. Taking China in relation to the USA as an example, the RIC value of 1.122 is calculated as the observed relative intensity (0.375) divided by the expected relative intensity (0.334). The first of these values is used in Array A, while half of the latter is used in Array W. After the columns are made cumulative, Array A provides the y-axis and Array W the x-axis of the weighted Lorenz curve; Array W thus provides the weighting of the A values. The result is shown in Fig. 5. The value of the weighted Gini evenness index is 0.637, so the BIC is equal to 63.7%.
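Using only the two rounded values quoted above, the slope of the China-USA segment of the weighted Lorenz curve is twice the RIC, as derived in the section on the relation with RIC(X,Y):

$$\frac{a_{j}}{w_{j}} = \frac{0.375}{0.334/2} = 2 \cdot \frac{0.375}{0.334} \approx 2 \cdot \text{RIC}\left(\text{China},\text{USA}\right) = 2 \times 1.122 = 2.244$$

With these rounded inputs 0.375/0.334 ≈ 1.123; the small difference from the reported 1.122 presumably comes from rounding of the quoted values.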

Table 3 A basis for the Lorenz curve
Fig. 5 Weighted Lorenz curve for China in 2018

Two examples of the use of the BIC

Before concluding, we use the same dataset to provide two examples of how the BIC can be used in policy-relevant analysis. Figure 6 shows the development of the BIC in the period 2000–2020 for the same six countries that are represented in Fig. 1 above. The UK stands out with a high degree of balance in international scientific collaboration, while China stands out with a low degree. The general trend is towards increased balance in international scientific relations, but for China and the USA this holds only in recent years. Compared with Fig. 1, it seems that the increasingly intense collaboration between the two countries until 2016 came at the cost of balance, and that the reduced bilateral intensity in the most recent years has allowed more balance in the two countries’ relations with other countries.

Fig. 6 Example of BIC values: the balance in the collaboration patterns of China and its five major collaborating countries in the years 2000–2020

The BIC is probably most relevant to use in combination with RIC or similar measures of collaboration intensity. An example is given in Table 4. Here, we have ranked the 20 countries in descending order according to the BIC values in 2020. In the other columns, we identify the countries or groups of countries that exhibit the highest and lowest RIC values from the perspective of each country in the same year. The relative intensities could also have been visualized in a two-dimensional map, but here, the BIC adds information about the balance.

Table 4 The 20 countries ranked according to BIC in 2020, with additional information about the bilateral relations with the highest and lowest collaboration intensity according to RIC

Conclusion

The BIC is a new indicator specifically constructed to measure balance in collaboration. It is based on the Gini index of a weighted Lorenz curve. We have explained the relation between this weighted Lorenz curve and the previously introduced indicator of relative intensity of collaboration (RIC). As the weighted Lorenz curve and the associated Gini index are not often used in bibliometric work, we have explained their construction and calculation in detail.

Our examples using a dataset with collaboration frequencies among 20 large countries demonstrate that the new indicator of balance in collaboration patterns can be a useful extension of indicators measuring the relative intensity of collaboration in bilateral relations. For any country, the balance can be studied over time and compared to other countries.