Introduction

The reliance on teamwork in scientific research has increased over the last decades (Zeng et al., 2017; Fortunato et al., 2018). The fraction of scientific papers written by teams of researchers and the number of authors in a scientific paper have increased over the last century on average (Guimerá et al., 2005; Wuchty et al., 2007). Various factors affect outcomes of scientific teamwork, including the team size (i.e., the number of authors of a paper) (Wuchty et al., 2007; Wu et al., 2019), internationality (i.e., the number of countries involved in a paper) (Hsiehchen et al., 2015; Coccia and Wang, 2016), ethnic diversity (i.e., the number of ethnicities involved in a paper) (AlShebli et al., 2018), interdisciplinarity (i.e., the number of disciplines of authors involved in a paper) (Van Noorden, 2015; Lariviére et al., 2015), and team freshness (i.e., fraction of authors who have not collaborated with others before) (Zeng et al., 2021). In addition, quantitative approaches to scientific collaboration networks have contributed to the understanding of patterns of collaborations among researchers (Newman, 2001; Zeng et al., 2017) and their relations to research productivity (e.g., the number of published papers) or impact (e.g., the number of citations received by published papers) of researchers (Hou et al., 2008; Ding et al., 2009; Yan and Ding, 2009; Yan et al., 2010; Abbasi et al., 2011, 2012; Uddin et al., 2013; Ebadi and Schiffauerova, 2015; Wang, 2016; Guan et al., 2017).

A universal trend in modern scientific teamwork is that researchers from different institutions collaborate with each other (Adams et al., 2005; Cummings and Kiesler, 2005; Jones et al., 2008). Such teams tend to produce papers with higher citation impacts than those written by teams confined to a single institution (Jones et al., 2008). Patterns of co-authorships among researchers from different institutions have been characterized through analyses of collaboration networks among institutions (Melin and Persson, 1996; Ye et al., 2012; Chen et al., 2020). Grant collaboration involving multiple institutions is also a growing trend (National Science Foundation, 2012; Nagarajan et al., 2013; Ma et al., 2015). Ma et al. analyzed a British collaboration network among institutions in which edges represent partnerships between two institutions in funded research projects (Ma et al., 2015). They found that universities with many edges tend to be densely connected to each other, forming a rich club. Analyses of such grant collaboration networks may inform the government and other stakeholders on how to allocate research funding to institutions (Szell and Sinatra, 2015).

In the present study, we represent collaborations among institutions on research grants as bipartite networks to investigate grant collaborations among two or more institutions. Note that Ma et al. investigated a dyadic collaboration network of research grants in which collaborations among three or more institutions were represented by dyadic collaborations (Ma et al., 2015). Such a projection onto dyadic networks, called the one-mode projection, is a major method for analyzing networks involving higher-order interactions among nodes (Newman, 2001; Opsahl et al., 2008; Zeng et al., 2017). However, evidence suggests limitations of describing such higher-order data using only pairwise interactions (Battiston et al., 2020; Torres et al., 2021). In fact, despite the coordination cost that collaborating institutions incur, it is not uncommon for more than two institutions to participate in a funded research project (Adams et al., 2005; Cummings and Kiesler, 2005, 2007). Grants with large monetary amounts often require or at least encourage inter-institutional collaboration and are sometimes a main reason for collaboration among institutions (Bozeman and Corley, 2004). Grant teams with large numbers of investigators tend to be more productive (Cook et al., 2015), and researchers who collaborate with such large and productive teams tend to receive grants in the future (Ebadi and Schiffauerova, 2015b). These factors may also lead to an increase in the number of collaborating institutions. Thus motivated, we investigate networks of higher-order grant collaborations among institutions.

The relationships between research funding and research productivity or impact have been investigated for individual grants (Lauer, 2016), investigators (Defazio et al., 2009; Jacob and Lefgren, 2011; Beaudry and Allaoui, 2012; Fortin and Currie, 2013; Ebadi and Schiffauerova, 2016), institutions (McAllister and Narin, 1983; Boyack and Börner, 2003; Payne and Siow, 2003; Rosenbloom et al., 2015; Ma et al., 2015), and geographical regions (Zucker et al., 2007). Understanding such relationships is expected to assist the government and other stakeholders in developing strategies for allocating research funds to different units to enhance research performance. Evidence supports positive correlations between the monetary amount of research funding received by an institution and its research productivity or impact (McAllister and Narin, 1983; Boyack and Börner, 2003; Payne and Siow, 2003; Rosenbloom et al., 2015; Ma et al., 2015). On the other hand, the per-dollar productivity or impact of an institution that receives a large amount of research funding tends to diminish (Zhi and Meng, 2016; Yin et al., 2018; Wahls, 2019; Aagaard et al., 2020). Given this, in the present study we ask the following question: do institutions participating in many collaborative grants gain advantages in their per-dollar research impact when they densely collaborate with each other (i.e., they form a rich club) in research grants? We examine this question using a bipartite-network representation of collaborative grants among institutions, which allows us to investigate relationships among rich clubs, research impact, and the collaboration size. A preprint of this study is available (Nakajima et al., 2022a).

Methods

Construction of data sets

Collaborative grants

We use publicly available data on the grants administered by the National Science Foundation (NSF) (Footnote 1). We focus on the collaborative grants in each of which multiple institutions participate and each institution is responsible for a separate award. Therefore, each collaborative grant is composed of a set of linked awards, each of which is separately administered by a single institution. For this type of collaborative grant, research proposals submitted by collaborating institutions must have the same project title beginning with ‘Collaborative Research:’ (e.g., see the latest guide posted by the NSF (Footnote 2); we confirmed that this rule has been applied at least since 1999 (Footnote 3)). Therefore, we first collected the data of the awards with the project title beginning with ‘Collaborative Research:’ and the start date between January 1, 2000 and December 31, 2020. Second, we identified the set of institutions that received at least one such award. Third, we used the Wikipedia APIs (Footnote 4) to categorize each institution into one of 48 types; see Table S1 for the complete list of institution types. Fourth, we obtained the data of the awards received by the institutions whose type name includes ‘university’, ‘college’, or ‘school’ (see Table S1 for the list of institution types that we focused on). Among these institutions, there are 14,081 collaborative grants, each of which contains at least two awards (i.e., institutions). Fifth, for each collaborative grant, we identified the set of participating institutions, the 7-digit award number (i.e., ID) assigned to each participating institution, and the monetary amount distributed to each participating institution.

To quantify the research outputs produced under the collaborative grants, we use the Web of Science Core Collection database (Footnote 5). There are 1,082,349 papers that were published between January 1, 2000 and December 31, 2020 and include at least one of the words ‘National Science Foundation’ and ‘NSF’ in the acknowledgment section. The fraction of papers with acknowledgment data in this data set has increased since 2008 because the Web of Science started recording the funding acknowledgment data in August 2008 (Footnote 6). For each of these papers, we extracted the 7-digit award numbers mentioned in the acknowledgment section, the number of times cited by other papers in the database, the research disciplines assigned to the paper, which are available in the data set, the publication year, and the document type. We retained the 1,066,324 papers whose document types are either ‘Article’, ‘Review’, ‘Letter’, ‘Editorial Material’, ‘Meeting Abstract’ or ‘Proceedings Paper’, as suggested before (Waltman, 2016). Then, for each award comprising a collaborative grant, we identified the papers that mentioned its award number in the acknowledgment section. We removed the collaborative grants with fewer than five published papers in the database because such collaborative grants often have extreme impact values due to the small number of the associated papers. Then, we were left with 7,026 collaborative grants, each of which is associated with at least five of the 101,283 published papers. These collaborative grants have been awarded to 570 institutions in total.
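
The matching of papers to awards described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the regular expression, the function names, and the input format (a mapping from a paper ID to its acknowledgment text) are all assumptions introduced here.

```python
import re

# Assumption: an award number is any run of exactly seven digits that is not
# part of a longer digit string.
AWARD_RE = re.compile(r"(?<!\d)\d{7}(?!\d)")


def extract_award_numbers(ack_text):
    """Return the distinct 7-digit tokens found in an acknowledgment string."""
    return set(AWARD_RE.findall(ack_text))


def papers_per_award(acknowledgments):
    """Map each award number to the set of paper IDs that mention it.

    `acknowledgments` is a hypothetical dict: paper ID -> acknowledgment text.
    """
    index = {}
    for paper_id, text in acknowledgments.items():
        for award in extract_award_numbers(text):
            index.setdefault(award, set()).add(paper_id)
    return index


def awards_with_enough_papers(index, minimum=5):
    """Keep the awards associated with at least `minimum` papers."""
    return {award for award, papers in index.items() if len(papers) >= minimum}
```

Note that the study applies the five-paper threshold to whole collaborative grants (sets of linked awards); the last helper applies it per award only to keep the sketch short.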

Fig. 1

An example of three collaborative grants and the corresponding bipartite network of institutions and collaborative grants

Single-institution grants

For comparison, we also analyzed the grants that were composed of just one award given to one institution. To prepare such data, we first identified the awards of which the project title did not begin with ‘Collaborative Research:’ and the start date was between January 1, 2000 and December 31, 2020. There are 148,795 awards that meet these criteria and have been received by any of the 570 institutions that have participated in at least one collaborative grant. Second, for each of these awards, we identified the institution that received the award, the 7-digit award number (i.e., ID) assigned to the institution, the monetary amount of the award, and the first and last names of a principal investigator (PI) and co-PIs. Third, for each award, we identified the papers that mentioned its award number in the acknowledgment section. We removed the awards associated with less than five published papers in the Web of Science database. Then, we were left with 41,510 awards. According to the NSF’s guide, these awards belong to one of the following three types of grant: (i) single-institution grant without co-PI, (ii) single-institution grant in which all the co-PIs are from the same institution as the PI’s, and (iii) collaborative grant in which at least one co-PI from a different institution from the PI’s participates and the PI’s institution is responsible for the award.

We focus on the awards of types (i) and (ii) because they are genuine single-institution grants. We found 24,866 awards of type (i) among the 41,510 awards. It is not straightforward to classify the remaining 16,644 awards into types (ii) and (iii) because the affiliations of the co-PIs are not available in our data set. Therefore, we attempted to identify the awards of type (ii) as follows. First, for each co-PI in a given award, we obtain the set of candidate affiliations of the co-PI as the set of the affiliations of the authors who have the same first name initial and the same full last name as the co-PI in any of the papers associated with the award. Second, we regard an award as being of type (ii) if and only if the set of candidate affiliations of every co-PI in the award includes the institution that has received the award; otherwise, we regard the award as being of type (iii). We obtained 7,854 awards of type (ii) among the 16,644 awards with co-PIs.
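
The two-step heuristic above can be written compactly. The following is a sketch under assumed data structures (the tuple layouts and function names are hypothetical and not taken from the study); it is meant only for awards that have at least one co-PI.

```python
def name_key(first_name, last_name):
    """First-name initial plus full last name, compared case-insensitively."""
    return (first_name.strip()[:1].lower(), last_name.strip().lower())


def is_type_ii(award_institution, co_pis, paper_authors):
    """Return True if every co-PI has the awarded institution among the
    candidate affiliations found in the papers associated with the award.

    co_pis:        list of (first_name, last_name) tuples (hypothetical layout)
    paper_authors: list of (first_name, last_name, affiliation) tuples pooled
                   over all papers associated with the award
    """
    for first, last in co_pis:
        key = name_key(first, last)
        candidates = {
            affiliation
            for f, l, affiliation in paper_authors
            if name_key(f, l) == key
        }
        if award_institution not in candidates:
            return False  # at least one co-PI cannot be placed at the institution
    return True
```

A co-PI who never appears as an author yields an empty candidate set, so the award is then treated as type (iii) under this rule.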

In summary, we obtained 24,866 + 7,854 = 32,720 single-institution grants, each of which is associated with at least five of the 363,116 published papers. These grants have been awarded to 441 institutions in total.

Bipartite network of institutions and collaborative grants

From the data on the collaborative grants, we construct a bipartite network that consists of a set of institutions \(V = \{v_1, \ldots , v_N\}\), where N is the number of institutions, a set of collaborative grants \(U = \{u_1, \ldots , u_M\}\), where M is the number of collaborative grants, and a set of edges E. An edge \((v_i, u_j)\) exists between institution \(v_i\) and collaborative grant \(u_j\) if and only if \(v_i\) received an award in the collaborative grant \(u_j\). A unique 7-digit award number and a unique monetary amount are associated with each edge \((v_i, u_j) \in E\). We denote by \(k_i\) the degree of \(v_i\), i.e., the number of awards that institution \(v_i\) received from collaborative grants. We denote by \(s_j\) the degree of \(u_j\), i.e., the number of collaborating institutions in collaborative grant \(u_j\). We show in Fig. 1 a hypothetical bipartite network of four institutions and three collaborative grants. In this example, we have \(V = \{v_1, v_2, v_3, v_4\}\), \(U = \{u_1, u_2, u_3\}\), \(E = \{(v_1, u_1), (v_1, u_2), (v_2, u_1), (v_2, u_2), (v_2, u_3), (v_3, u_3), (v_4, u_2), (v_4, u_3)\}\), \(k_1=2,\ k_2=3,\ k_3=1,\ k_4=2,\ s_1=2,\ s_2=3,\) and \(s_3=3\).
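
As a check on the notation, the degrees in the hypothetical example can be recomputed directly from the edge set E. The edge-list representation and the variable names below are illustrative choices, not part of the original analysis.

```python
from collections import Counter

# Edge set E of the hypothetical example: (institution, collaborative grant).
edges = [
    ("v1", "u1"), ("v1", "u2"),
    ("v2", "u1"), ("v2", "u2"), ("v2", "u3"),
    ("v3", "u3"),
    ("v4", "u2"), ("v4", "u3"),
]

# k_i: number of awards that institution v_i received from collaborative grants.
k = dict(Counter(v for v, _ in edges))

# s_j: number of collaborating institutions in collaborative grant u_j.
s = dict(Counter(u for _, u in edges))
```

Both degree sequences sum to the number of edges, which is eight here.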

Detection of rich clubs

A rich club of a dyadic network is defined as a subnetwork in which the nodes with the highest degrees (i.e., the nodes with the largest numbers of connected edges) are densely inter-connected to each other (Zhou and Mondragon, 2004; Colizza et al., 2006). There are a few studies on rich clubs in bipartite networks. Opsahl et al. investigated rich clubs in a bipartite network of academic authors and papers (Opsahl et al., 2008). They constructed a weighted unipartite network in which the weight of each edge between two authors is equal to the number of coauthored papers, which corresponds to the one-mode projection of the bipartite network to a unipartite network, and then applied a method to detect weighted rich clubs for dyadic networks. The same method was applied to detect a rich club in a bipartite brain network (Crossley et al., 2013), a bipartite transportation network (Feng et al., 2016), and a bipartite technological network (Cinelli, 2019). In the present work, we investigate rich clubs in higher-order networks of collaborative grants among institutions, which one-mode projection does not characterize. Specifically, we develop and apply a method to detect rich clubs in bipartite networks without using the one-mode projection.

We define a rich club of a given bipartite network composed of institutions and collaborative grants as a set of institutions with the largest degrees that densely collaborate with each other. To detect a rich club, we first calculate the rich-club coefficient, denoted by \(\phi (k)\), for the original bipartite network for a given degree k. By extending the definition for dyadic networks (Zhou and Mondragon, 2004; Colizza et al., 2006), we define \(\phi (k)\) as the number of collaborative grants that are exclusively composed of the institutions with a degree larger than k divided by the maximum possible number of collaborative grants that are exclusively composed of some of these institutions. Formally, we define

$$\begin{aligned} \phi (k) = \frac{|U_{>k} |}{\sum _{i=2}^{N_{>k}}{\left( {\begin{array}{c}N_{>k}\\ i\end{array}}\right) }}, \end{aligned}$$
(1)

where \(U_{>k}\) is the set of collaborative grants that are exclusively composed of the institutions with a degree larger than k, and \(N_{>k}\) is the number of institutions with a degree larger than k. To examine the presence of a rich club, we need to compare \(\phi (k)\) with values for a reference model (Colizza et al., 2006). Therefore, we define the normalized rich-club coefficient, denoted by \(\rho (k)\), as

$$\begin{aligned} \rho (k) = \frac{\phi (k)}{\phi _{\text {rand}}(k)}, \end{aligned}$$
(2)

where \(\phi _{\text {rand}}(k)\) is the rich-club coefficient for the reference model of bipartite network. If \(\rho (k)\) is sufficiently larger than 1, we say that the institutions with a degree larger than k form a rich club. For dyadic networks, a standard choice of the reference model is the configuration model, which randomizes the edges of the original network while preserving the degree of each node (Colizza et al., 2006). Here we use a counterpart of the configuration model for bipartite networks in which we randomize the edges of the original bipartite network while preserving the degree of each institution and each collaborative grant (Newman et al., 2001; Nakajima et al., 2022b). We compute \(\phi _{\text {rand}}(k)\) as the rich-club coefficient averaged over 10,000 randomized bipartite networks.
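
Equations (1) and (2) can be illustrated with a short sketch. The randomization below is a plain stub-matching shuffle that preserves the degree of each institution and each collaborative grant; it is an assumption made for illustration and may differ from the reference model used in the study. In particular, it can assign the same institution to a grant twice, in which case the duplicate collapses when grant membership is read as a set. The function names, the default number of samples, and the convention \(\phi (k)=0\) when fewer than two institutions have a degree larger than k are likewise assumptions.

```python
import random
from collections import Counter

# Hypothetical example network of four institutions and three grants.
EXAMPLE_EDGES = [
    ("v1", "u1"), ("v1", "u2"), ("v2", "u1"), ("v2", "u2"),
    ("v2", "u3"), ("v3", "u3"), ("v4", "u2"), ("v4", "u3"),
]


def phi(edges, k):
    """Rich-club coefficient of Eq. (1) for a list of (institution, grant) edges."""
    degree = Counter(v for v, _ in edges)
    rich = {v for v, d in degree.items() if d > k}
    n = len(rich)
    if n < 2:
        return 0.0  # assumed convention: the sum in the denominator is empty
    members = {}
    for v, u in edges:
        members.setdefault(u, set()).add(v)
    # Grants composed exclusively of institutions with degree larger than k.
    exclusive = sum(1 for vs in members.values() if vs <= rich)
    # sum_{i=2}^{n} C(n, i) = 2^n - n - 1
    return exclusive / (2**n - n - 1)


def randomize(edges, rng):
    """Shuffle the grant-side stubs; both degree sequences are preserved."""
    institutions = [v for v, _ in edges]
    grants = [u for _, u in edges]
    rng.shuffle(grants)
    return list(zip(institutions, grants))


def rho(edges, k, n_samples=1000, seed=0):
    """Normalized rich-club coefficient of Eq. (2), averaged over random networks."""
    rng = random.Random(seed)
    phi_rand = sum(phi(randomize(edges, rng), k) for _ in range(n_samples)) / n_samples
    return phi(edges, k) / phi_rand if phi_rand > 0 else float("nan")
```

Because the randomization preserves the degree of each institution, \(N_{>k}\) is the same in the original and randomized networks, so the denominators of Eq. (1) cancel in Eq. (2) and \(\rho (k)\) reduces to a ratio of grant counts.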

Measuring research impact for awards, institutions, and grants

Each award in collaborative grants is associated with a monetary amount and a set of journal and conference papers supported by the award, with which we calculate the per-dollar research impact (Lauer, 2016) as follows. First, to compare the citation count across different publication years and research disciplines, we normalize the number of citations received by each of the 101,283 papers, which are associated with at least one collaborative grant (Radicchi et al., 2008; Waltman, 2016). To this end, we denote by c the number of citations that a given paper z has received. We define \(c_{0}\) as the number of citations that a paper that was published in the same year as z and belongs to a research discipline assigned to z has received on average. Specifically, we set \(c_{0} = (\sum _{d \in D(z)} \bar{c}_{d, y(z)}) / |D(z) |\), where D(z) is the set of the research disciplines assigned to z, \(|D(z) |\) is the number of research disciplines to which z belongs, y(z) is the publication year of z, and \(\bar{c}_{d, y(z)}\) is the average number of citations received by the papers published in discipline d and year y(z). Each paper is assigned to at least one of the 42 research disciplines (Huang et al., 2020) (see Supplementary Sect. S2 for details). We define the normalized number of citations received by z as \(c / c_{0}\). Then, we define the per-dollar impact of the award given to institution \(v_i\) in collaborative grant \(u_j\), denoted by \(x_{ij}\), as the sum of \(c/c_0\) over all the papers associated with the award, which we then divide by the monetary amount of the award.
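
The normalization and the per-dollar impact defined above amount to a few lines of arithmetic. The sketch below assumes hypothetical inputs: each paper is a dict with `citations`, `disciplines`, and `year` keys, and `mean_cites` maps a (discipline, year) pair to the average citation count \(\bar{c}_{d, y}\). None of these names come from the original study.

```python
def expected_citations(disciplines, year, mean_cites):
    """c_0: average of the discipline-year means over the paper's disciplines."""
    return sum(mean_cites[(d, year)] for d in disciplines) / len(disciplines)


def normalized_citations(paper, mean_cites):
    """c / c_0 for a single paper."""
    c0 = expected_citations(paper["disciplines"], paper["year"], mean_cites)
    return paper["citations"] / c0


def per_dollar_impact(papers, amount, mean_cites):
    """x_ij: sum of c / c_0 over the award's papers, divided by the award amount."""
    return sum(normalized_citations(p, mean_cites) for p in papers) / amount
```

For instance, a paper with 40 citations assigned to two disciplines whose same-year averages are 10 and 30 has \(c_0 = 20\) and a normalized citation count of 2.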

We measure the impact of collaborative funded research for a given subset of institutions, denoted by \(V'\ (V' \subseteq V)\), as follows. We first calculate the average per-dollar impact of the awards in collaborative grants that the institutions in \(V'\) have received, denoted by \(\bar{x}_{\text {inst}}(V')\). Then, we define the normalized impact for the set of institutions \(V'\) as \(\bar{x}_{\text {inst}}(V')/\bar{x}\), where \(\bar{x}\) is the average per-dollar impact of all the awards in collaborative grants. For example, when we consider the set of institutions \(V'=\{v_1, v_3\}\) in a bipartite network shown in Fig. 1b, we obtain \(\bar{x}_{\text {inst}}(V') = (x_{11} + x_{12} + x_{33})/3\). Note that \(\bar{x} = (x_{11} + x_{12} + x_{21} + x_{22} + x_{23} + x_{33} + x_{42} + x_{43})/8\). If the normalized impact is larger than 1, the impact of \(V'\) is higher than the average impact of all the institutions.

We measure the impact of a given subset of collaborative grants, denoted by \(U'\ (U' \subseteq U)\), as follows. We first calculate the average per-dollar impact of the awards in \(U'\), denoted by \(\bar{x}_{\text {grant}}(U')\). We are interested in whether institutional collaborations yield higher impact than the average impact of the participating institutions. Therefore, we define the normalized impact of \(U'\) as \(\bar{x}_{\text {grant}}(U') / \bar{x}_{\text {inst}}(V'(U'))\), where \(V'(U')\) is the set of institutions participating in at least one collaborative grant in \(U'\). Note that \(\bar{x}_{\text {inst}}(V'(U'))\) is the average per-dollar impact of the awards that the institutions in \(V'(U')\) have received. For example, let us consider the set of collaborative grants \(U'=\{u_1, u_2\}\) in a bipartite network shown in Fig. 1b. One obtains \(\bar{x}_{\text {grant}}(U') = (x_{11} + x_{21} + x_{12} + x_{22} + x_{42})/5\). Because the set of institutions \(V'(U')\) is \(\{v_1, v_2, v_4\}\), one obtains \(\bar{x}_{\text {inst}}(V'(U')) = (x_{11} + x_{12} + x_{21} + x_{22} + x_{23} + x_{42} + x_{43})/7\). If the normalized impact is larger than 1, the impact of the collaborative grants in \(U'\) is higher than the average impact of the institutions participating in a collaborative grant in \(U'\).
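
The two normalized impact measures can be sketched as follows, with the per-dollar impact of each award stored in a dict keyed by (institution, grant). The numerical values are arbitrary placeholders chosen for illustration, and the function names are hypothetical.

```python
# Placeholder per-dollar impacts x_ij for the example network (arbitrary values).
x = {
    ("v1", "u1"): 1.0, ("v1", "u2"): 2.0,
    ("v2", "u1"): 3.0, ("v2", "u2"): 4.0, ("v2", "u3"): 5.0,
    ("v3", "u3"): 6.0,
    ("v4", "u2"): 7.0, ("v4", "u3"): 8.0,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def inst_impact(x, institutions):
    """Average per-dollar impact of all awards received by the given institutions."""
    return mean(val for (v, _), val in x.items() if v in institutions)


def normalized_inst_impact(x, institutions):
    """Impact of a set of institutions relative to the average over all awards."""
    return inst_impact(x, institutions) / mean(x.values())


def normalized_grant_impact(x, grants):
    """Impact of a set of grants relative to that of their participating institutions."""
    grant_avg = mean(val for (_, u), val in x.items() if u in grants)
    participants = {v for (v, u) in x if u in grants}
    return grant_avg / inst_impact(x, participants)
```

With these placeholder values, the set \(\{v_1, v_3\}\) averages over three awards and the set \(\{u_1, u_2\}\) over five awards, matching the worked examples in the text.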

To quantify the impact of single-institution grants, we adapt the above procedure for collaborative grants to the case of single-institution grants as follows. First, we construct a bipartite network composed of institutions and single-institution grants. Second, we normalize the number of citations received by each of the 363,116 papers that are associated with at least one single-institution grant by the publication year and research discipline. Then, we directly apply the definitions of impact in the case of bipartite networks of institutions and collaborative grants to the bipartite networks of institutions and single-institution grants.

Results

Fig. 2

Rich-club phenomena in networks of grant collaboration. a Normalized rich-club coefficient \(\rho (k)\) as a function of the number of awards that the institution received from collaborative grants. We measured \(\rho (k)\) for the entire network (labeled “All collaborations”), the subnetwork only composed of collaboration between \(s=2\) institutions, that with \(s=3\), \(s=4\), and \(s=5\). In this figure, Figs. 3b, 4a–e, and 5, we omit data points for a given value of k if there are less than five instances contributing to the data point. b Rank correlation matrix between the different networks, where the rank is in terms of the number of awards in collaborative grants that the institution has received. We used the top 50 institutions in the entire network to calculate the rank correlation. c PCA result for the 50 institutions with the largest numbers of awards in the entire network. The number indicates the institution’s rank in the entire network. See the Supplementary Materials for the names of the 50 institutions

Higher-order rich clubs in collaborative grants

We explore the possibility of higher-order rich clubs in collaborative grants. We are also interested in how a rich-club phenomenon depends on the number of institutions in a collaborative grant. Therefore, we calculate the normalized rich-club coefficients for the entire bipartite network and the bipartite subnetwork induced by the collaborative grants of degree (i.e., the number of collaborating institutions) s. We consider \(s \in \{2,3,4,5\}\) because collaborative grants with \(s \ge 6\) are rare; there are fewer than 100 grants for each \(s\ge 6\).

Figure 2a shows the normalized rich-club coefficients for the different bipartite networks. Figure 2a indicates that the entire bipartite network shows a rich-club phenomenon (i.e., normalized rich-club coefficient \(> 1.10\), although this criterion is arbitrary) for the threshold of the number of awards from collaborative grants, k, approximately \(100 \le k \le 200\). (The P-value is less than 0.005 for \(1 \le k \le 193\) according to the Bonferroni-corrected permutation test; see Supplementary Sect. S3.) The normalized rich-club coefficient reaches the maximum value of approximately 1.21 at \(k = 144\). The figure also indicates that, although the bipartite subnetwork with \(s=2\) has rich clubs that are statistically significant (see Supplementary Sect. S3), the normalized rich-club coefficient values are modest, with the largest value being 1.13. In contrast, the bipartite subnetwork only composed of collaborations among \(s= 3\) institutions, the subnetwork restricted to \(s=4\), and that restricted to \(s=5\) show relatively strong and persistent rich clubs across a range of k. Therefore, the institutions that receive the largest numbers of awards from triadic, quartic, or quintic collaborative grants tend to collaborate more densely with each other than the institutions with the largest numbers of awards from dyadic collaborative grants. Note that the normalized rich-club coefficient for the entire bipartite network (circles in Fig. 2a) is mostly determined by that for the subnetwork induced by the dyadic collaborative grants (crosses in Fig. 2a). This is because dyadic collaborative grants are dominant in number; they account for approximately 67% of all the collaborative grants.

We next compare the rich clubs in the different subnetworks. We focus on the 50 institutions with the largest numbers of awards in the entire bipartite network of collaborative grants. For these institutions, we calculate the Spearman’s rank correlation coefficient in terms of the number of awards between each pair of the five bipartite networks (i.e., the entire network, \(s=2\) subnetwork, \(s=3\) subnetwork, \(s=4\) subnetwork, and \(s=5\) subnetwork). We show the rank correlation for all pairs of networks in Fig. 2b. We find that the entire network is the most strongly correlated with the \(s=2\) subnetwork. This result is expected because the collaborations between \(s=2\) institutions are by far the largest contributor to the entire network. Figure 2b also indicates that the correlation is larger when s is closer between two subnetworks.

This result led us to hypothesize that some institutions are good at securing collaborative grants involving fewer institutions, while other institutions are the opposite. To test this hypothesis, we classify the same 50 institutions using a principal component analysis (PCA). To run the PCA, we encode each institution into a four-dimensional vector composed of the normalized number of awards in collaborative grants with \(s=2\), \(s=3\), \(s=4\), and \(s=5\). Specifically, we scale each entry of the vector to have mean 0 and standard deviation 1. Then, we run the PCA on the normalized vectors using the scikit-learn library (Pedregosa et al., 2011).
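
The standardization and PCA steps can be sketched as follows. The study used scikit-learn; the sketch below instead uses a NumPy eigendecomposition of the covariance matrix as a stand-in, and the use of the population standard deviation for scaling is an assumption. The input is a hypothetical array with one row per institution and one column per collaboration size (\(s=2\), 3, 4, and 5).

```python
import numpy as np


def pca(counts):
    """Standardize each column, then diagonalize the covariance matrix.

    Returns the explained-variance ratios, the principal axes (one per row),
    and the projected coordinates of each institution.
    """
    X = np.asarray(counts, dtype=float)
    # Scale each column to mean 0 and standard deviation 1 (population std assumed).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; sort them in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return ratio, eigvecs.T, scores
```

The sign of each principal axis is arbitrary, so the axes returned here may be flipped relative to those from another implementation.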

We show the PCA result in Fig. 2c. Each data point is labeled with the institution’s rank in terms of the number of awards in collaborative grants that the institution has received; see Table S2 for the names of the 50 institutions. The first two principal components, denoted by PC1 and PC2, explain 74.7% and 13.1% of the variance of the data, respectively. Therefore, we conclude that the two-dimensional representation of the institutions shown in Fig. 2c, where the two axes correspond to PC1 and PC2, is sufficient. The eigenvector corresponding to PC1 is (0.53, 0.54, 0.49, 0.44), which indicates that the number of awards from collaborative grants of any size of collaboration approximately equally contributes to PC1. As expected, institutions with a higher rank (i.e., data points labeled with a smaller number in Fig. 2c) tend to have a higher PC1 value. The eigenvector corresponding to PC2 is \((-0.25, -0.28, -0.22, 0.89)\). Therefore, PC2 separates the 50 institutions into those frequent in collaborations with smaller numbers of institutions (i.e., \(2 \le s\le 4\)) and those frequent in collaborative grants with \(s=5\). For example, the University of California, Berkeley ranks 11th, 11th, 3rd, and 1st in the \(s=2\), \(s=3\), \(s=4\), and \(s=5\) subnetworks, respectively; University of Washington ranks 6th, 2nd, 9th, and 2nd in the same four subnetworks; University of Colorado at Boulder ranks 8th, 7th, 4th, and 4th; University of California, Los Angeles ranks 24th, 29th, 22nd, and 7th; University of California, Santa Barbara ranks 22nd, 38th, 42nd, and 8th; Rice University ranks 45th, 44th, 82nd, and 6th. The latter three universities have a much higher rank in the subnetwork with \(s=5\) than in the entire network. The behavior of institutions with a low PC2 value is the opposite. For example, University of Illinois at Urbana-Champaign ranks 1st, 1st, 8th, and 10th in the \(s=2\), \(s=3\), \(s=4\), and \(s=5\) subnetworks, respectively; University of Michigan, Ann Arbor ranks 3rd, 3rd, 5th, and 17th in the same four subnetworks; Massachusetts Institute of Technology ranks 5th, 9th, 12th, and 28th; Duke University ranks 18th, 18th, 34th, and 55th; Virginia Polytechnic Institute and State University ranks 32nd, 19th, 14th, and 53rd.

Research impact of the institutions with the largest numbers of collaborative grants

Fig. 3

Research impact of award-rich institutions. We analyze the single-institution grants, all the collaborative grants, and the collaborative grants with different values of s. a Rank plot of the institutions in terms of the number of awards. b Normalized impact of the institutions with more than k awards from grants. We denote by \(V_{>k}\) the set of those institutions

We now investigate research impact of the institutions with the largest numbers of awards from collaborative grants. Note that these institutions form putative rich clubs. For comparison, we also analyze the research impact of the institutions with the largest numbers of awards from single-institution grants. Here we analyze the data separately for all the collaborative grants, the collaborative grants comprising \(s \in \{2,3,4,5\}\) institutions, and single-institution grants.

First, we show the rank plot of the number of awards received by the institution, k, in Fig. 3a. The figure indicates that k is skewed toward the top-ranked institutions. For example, the top 20% of institutions obtained approximately 82% of the awards in collaborative grants and approximately 79% of the awards in single-institution grants. This result is consistent with the concentration of research funding in top-ranked institutions observed in the NSF (Xie, 2014), the National Institutes of Health grants in the US (Wahls, 2019; Lauer and Roychowdhury, 2021), and the Engineering and Physical Sciences Research Council grants in the UK (Ma et al., 2015). We also found that the top-ranked institutions dominate the distribution of awards less strongly in the case of collaboration with a larger number of institutions (i.e., larger s). For example, the top 20% of institutions account for approximately 79% of the awards in single-institution grants (i.e., \(s=1\)), 76% for \(s=2\), 70% for \(s = 3\), 60% for \(s = 4\), and 53% for \(s = 5\). To quantify this trend further, we calculated the coefficient of variation for the distribution of the number of awards, which is equal to 1.75, 1.67, 1.49, 1.17, and 0.95 for \(s=1\), \(s=2\), \(s=3\), \(s=4\), and \(s=5\), respectively; the Gini coefficient is 0.74, 0.72, 0.66, 0.56, and 0.46 for \(s=1\), \(s=2\), \(s=3\), \(s=4\), and \(s=5\), respectively.
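
The three concentration measures used above (the share held by the top 20% of institutions, the coefficient of variation, and the Gini coefficient) can be computed as in the following sketch. The source does not state which estimator conventions were used; the population standard deviation, the rank-based Gini formula, and the rounding of the top-20% cutoff are assumptions.

```python
from statistics import mean, pstdev


def top_share(awards, fraction=0.2):
    """Share of all awards held by the top `fraction` of institutions."""
    ranked = sorted(awards, reverse=True)
    top = max(1, round(fraction * len(ranked)))
    return sum(ranked[:top]) / sum(ranked)


def coefficient_of_variation(awards):
    """Standard deviation divided by the mean (population standard deviation)."""
    return pstdev(awards) / mean(awards)


def gini(awards):
    """Gini coefficient from the ascending rank-weighted sum.

    G = 2 * sum_i(i * x_i) / (n * sum_i x_i) - (n + 1) / n, with i = 1..n.
    """
    xs = sorted(awards)
    n = len(xs)
    weighted = sum(i * value for i, value in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n
```

All three measures increase as awards concentrate in fewer institutions, which is why they decrease together as s grows in the values reported above.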

Second, we show the normalized impact of the institutions as a function of k in Fig. 3b. We find that the institutions with approximately 100 or more awards from collaborative grants tend to be less impactful in the per-dollar sense than those with fewer awards. Similarly, the institutions with approximately 100 or more awards from single-institution grants tend to be less impactful than those with fewer awards. This result of the diminishing per-dollar productivity or impact at the institution level is consistent with the previous results (Zhi and Meng, 2016; Yin et al., 2018; Wahls, 2019; Aagaard et al., 2020). Figure 3b also indicates that similar diminishing research impact is present for collaborative grants of different collaboration sizes, \(s \in \{2,3,4,5\}\).

Research impact of the collaborative grants within rich clubs

Fig. 4

Advantage of collaborations among the award-rich institutions. We plot the normalized impact of the collaborative grants in each of which the fraction of the institutions receiving more than k awards from collaborative grants is at least p. We denote by \(V_{>k, \ge p}\) the set of the institutions participating in at least one collaborative grant in \(U_{>k, \ge p}\). a Entire network. b Subnetwork with \(s=2\). c Subnetwork with \(s=3\). d Subnetwork with \(s=4\). e Subnetwork with \(s=5\)

Fig. 5

Overlay of the rich-club coefficient and research impact of the collaborative grants. Each panel shows the normalized rich-club coefficient and the normalized impact as a function of the number of awards k that the institution has received from collaborative grants. (a) Entire network. (b) Subnetwork with \(s=2\). (c) Subnetwork with \(s=3\). (d) Subnetwork with \(s=4\). (e) Subnetwork with \(s=5\).

Given the results shown in Fig. 3, rich clubs may be detrimental to research impact because a rich club is a set of high-degree nodes, i.e., institutions with many awards. However, Fig. 3 does not imply that collaborative grants among rich-club institutions are not impactful, because it does not isolate collaborations among rich-club institutions. Therefore, we now investigate possible associations between the rich clubs in collaborative grant networks and research impact. We first examine the impact of the collaborative grants within rich clubs, i.e., grants that are exclusively composed of the institutions with the largest numbers of awards. We denote by \(U_{>k, \ge p}\) the set of collaborative grants in which the fraction of the institutions with more than k awards from collaborative grants is at least p. We compare the impact of the collaborative grants in \(U_{>k, \ge p}\) for different values of p.
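To make the definition of \(U_{>k, \ge p}\) concrete, the selection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the dictionary-based data layout, and the toy grant records are hypothetical, and the number of awards of an institution is taken to be the number of collaborative grants in which it participates.

```python
def grants_in_U(grants, k, p):
    """Return the ids of collaborative grants in which the fraction of
    institutions with more than k collaborative awards is at least p.

    grants: dict mapping grant id -> list of institution ids
            (collaborative grants only; hypothetical layout).
    """
    # Number of collaborative awards per institution.
    awards = {}
    for members in grants.values():
        for inst in set(members):
            awards[inst] = awards.get(inst, 0) + 1

    selected = set()
    for gid, members in grants.items():
        insts = set(members)
        n_rich = sum(1 for inst in insts if awards[inst] > k)
        if n_rich / len(insts) >= p:
            selected.add(gid)
    return selected


def institutions_in_V(grants, selected):
    """Institutions participating in at least one of the selected grants."""
    return {inst for gid in selected for inst in grants[gid]}


# Toy example with made-up grants: A and C have 3 awards, B has 2, D has 1.
toy = {"g1": ["A", "B"], "g2": ["A", "B", "C"], "g3": ["A", "C"], "g4": ["C", "D"]}
only_rich = grants_in_U(toy, k=2, p=1.0)   # grants made up solely of A and C
at_least_half = grants_in_U(toy, k=2, p=0.5)
```

Setting p = 1 keeps only the grants whose participants all have more than k awards; lowering p progressively admits grants that mix award-rich and other institutions.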

We show in Fig. 4 the normalized impact of the collaborative grants in \(U_{>k, \ge p}\) for different values of k and p for the entire network and the subnetwork of each collaboration size \(s \in \{2, 3, 4, 5\}\). For the entire network, Fig. 4a indicates that the collaborative grants in \(U_{>k, \ge p}\) with \(p=1\) and large k tend to be more impactful than the expectation for the participating institutions. The maximum value of the normalized impact is approximately 1.15 at \(k=159\). The figure also indicates that the collaborative grants in \(U_{>k, \ge p}\) with \(p=1\) for a given value of k tend to have a higher normalized impact than those in \(U_{>k, \ge p}\) with \(0< p < 1\). For example, at \(k = 159\), the normalized impact is 1.15, 1.10, 1.00, 0.97, and 0.98 for \(p=1\), \(p=0.8\), \(p=0.6\), \(p=0.4\), and \(p=0.2\), respectively. Figure 4b–e indicate that the normalized impact for \(U_{>k, \ge p}\) with \(p=1\) tends to be larger than 1 at large k values in the subnetwork with \(s \in \{2,3,4,5\}\). This result is qualitatively the same as that for the entire collaboration network shown in Fig. 4a. Figure 4b–e also indicate that the normalized impact for \(U_{>k, \ge p}\) with \(p=1\) tends to be larger than that for \(U_{>k, \ge p}\) with \(0< p < 1\) in each subnetwork with \(s \in \{2,3,4,5\}\). By definition, the normalized impact of the single-institution grants is exactly equal to 1 for any k. Altogether, these results indicate that collaborations among the institutions with the largest numbers of collaborative grants tend to be impactful, not because such institutions tend to be strong in research but because they collaborate.

To further investigate the association between rich clubs and research impact, we investigate relationships between the normalized rich-club coefficient, \(\rho (k)\), and the normalized impact of the collaborative grants that are exclusively composed of the institutions in the rich club. We denote by \(U_{>k}\) the set of collaborative grants that are exclusively composed of the institutions with more than k awards from collaborative grants. Note that \(U_{>k}\) is equivalent to \(U_{>k, \ge p}\) with \(p=1\). If \(\rho (k)\) is sufficiently larger than 1, then \(U_{>k}\) is the set of collaborative grants contained in the rich club. Therefore, if rich clubs are associated with high research impact, the normalized impact of \(U_{>k}\) should be larger than 1 for the k values at which \(\rho (k)\) is sufficiently larger than 1.

We show in Fig. 5 the plots of \(\rho (k)\) and the normalized impact of \(U_{>k}\) against k, separately for the entire network and the subnetworks with \(s \in \{2,3,4,5\}\). The figure indicates that the normalized impact of \(U_{>k}\) tends to be larger than 1 if \(\rho (k)\) is larger than 1 in the entire network (Fig. 5a). For example, \(\rho (k)\) is largest at \(k=144\). The institutions with more than 144 awards collaborate with each other approximately 21% more densely than in a randomized network (i.e., \(\rho (144) \approx 1.21\)). The impact of the collaborative grants in \(U_{>144}\) is approximately 14% higher than expected from the average impact of the institutions participating in a collaborative grant in \(U_{>144}\). However, at \(k=299\), the rich club is absent (i.e., \(\rho (299) \approx 0.67\)), and the impact of the collaborative grants in \(U_{>299}\) is 30% lower than the expectation for the participating institutions. The Pearson correlation coefficient between \(\rho (k)\) and the normalized impact, where we regarded the pair of these two quantities at each value of k as a data point, is equal to \(r = 0.85\) (\(P < 0.001\)). We also found a significant positive correlation between these two quantities for the subnetwork with \(s=2\) (\(r = 0.89,\ P < 0.001\); see Fig. 5b), \(s=4\) (\(r = 0.61,\ P < 0.005\); see Fig. 5d), and \(s=5\) (\(r = 0.98,\ P < 0.001\); see Fig. 5e). For the subnetwork with \(s=3\), while we found a negative correlation (\(r = -0.81,\ P < 0.001\); see Fig. 5c), the normalized impact tends to be larger than 1 if \(\rho (k)\) is larger than 1 for approximately \(1 \le k \le 45\).
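The correlation analysis above treats each value of k as one data point, pairing \(\rho (k)\) with the normalized impact of \(U_{>k}\). A minimal sketch of that pairing and of the Pearson coefficient is given below. The function names and the dictionary layout keyed by k are hypothetical, and the sketch does not compute the P-value; obtaining it would require an additional significance test (for example, one provided by a statistics library), which is not reproduced here.

```python
from math import sqrt


def paired_series(rho_by_k, impact_by_k):
    """Pair rho(k) with the normalized impact at every k present in both maps.

    Both arguments are dicts keyed by k (hypothetical layout)."""
    ks = sorted(set(rho_by_k) & set(impact_by_k))
    return [rho_by_k[k] for k in ks], [impact_by_k[k] for k in ks]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Given the two per-k series for a network, `pearson_r(*paired_series(rho_by_k, impact_by_k))` would return the coefficient r for that network; the same call can be repeated for each subnetwork with a given s.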

Discussion

We investigated higher-order rich-club phenomena in networks of collaborative research grants. To this end, we developed a method to detect rich clubs in bipartite networks. We observed rich clubs in both the entire bipartite network and the subnetworks induced by the collaborative grants with a given number of collaborating institutions, s, where \(s\in \{2, 3, 4, 5\}\). The subnetworks with \(s =\) 3, 4, and 5 had stronger rich clubs than that with \(s=2\). Regarding the performance of rich clubs, we found that the collaborative grants within rich clubs tend to have a higher per-dollar impact than the average impact expected for the institutions participating in the collaboration. We emphasize that the higher impact of rich clubs is a genuine effect of collaboration because the impact of the single-institution grants is normalized to 1. These results support our hypothesis that collaborations among institutions in rich clubs are impactful.

Our results extend the findings on the rich clubs in grant collaboration networks shown in a previous study (Ma et al., 2015) in the following two aspects. First, we found that some collaboration-rich institutions tend to densely collaborate with each other in research grants involving fewer institutions, whereas other collaboration-rich institutions tend to do so in research grants involving more institutions. One factor underlying this phenomenon may be the strategies of individual institutions regarding interdisciplinary research projects. Evidence suggests that interdisciplinary research projects are less likely to attract funding in the short term (Bromham et al., 2016), whereas they positively contribute to long-term funding performance (Sun et al., 2021). This tendency may affect the funding strategies of individual researchers and institutions, which in turn may affect the distribution of the size of collaboration, in terms of the number of institutions, for the institution to which the researchers belong. Note that Ma et al. employed the one-mode projection, and therefore the impact of the size of collaboration was not a question that they focused on in their study. Second, the benefits of rich clubs to the per-dollar research impact seem to come from collaborations among the institutions that belong to the rich clubs. Ma et al. indicated that the rich clubs attract a large number or monetary amount of awards and tend to produce a large number of high-quality papers (Ma et al., 2015). In contrast, our results indicate that collaborations among the institutions in rich clubs are impactful in terms of the per-dollar research impact, whereas the institutions with many collaborations are not themselves particularly impactful.

The generality of rich clubs in grant collaboration networks deserves further investigation. For example, the presence of rich-club phenomena and their association with research impact may be stronger in some research disciplines than in others. Our results do not guarantee the association between rich clubs and research impact across different disciplines. In fact, the strength of the correlation between productivity and institutional collaborations in writing papers substantially depends on research disciplines (Abramo et al., 2009). Rich clubs and their relevance to research impact may also depend on funding agencies. The National Institutes of Health financially encourages multiple investigators with expertise in different health profession fields to work together in research projects (Little et al., 2017), which may lead to rich-club phenomena in networks in which the node is a department or institution. Moreover, higher-order rich-club phenomena in grant collaboration networks may depend on the definition of the node. In fact, Ma et al. reported that a British collaboration network among investigators in which an edge represents two investigators’ co-funded research projects does not have rich clubs (Ma et al., 2015).

We did not address causality between rich clubs and research impact. Furthermore, the higher impact of the collaborative grants within the rich clubs may be associated with various properties of the member institutions other than the density of their collaborations, including the internationality of the faculty (Mamiseishvili and Rosser, 2009), departmental and institutional size (Dundar and Lewis, 1998), grant type (Jacob and Lefgren, 2011), and funding support from industries (Gulbrandsen and Smeby, 2005), which may affect research impact. Additionally, there are other forms of dense mesoscopic structure of grant collaboration networks, the most famous of which is probably community structure. Such other forms of dense mesoscopic structure may also affect research impact. Examples of collaborations that may form such mesoscopic or community structures include teams composed of private universities that may be subsidized by their financial resources (Adams et al., 2005), collaborations among investigators from different departmental affiliations (Nagarajan et al., 2013), and collaborations between universities and industries (Ankrah and AL-Tabbaa, 2015). Moreover, many co-authorship networks among authors also show structures including community structure and rich clubs (Girvan and Newman, 2002; Opsahl et al., 2008; Zeng et al., 2017). The present method is also applicable to the investigation of higher-order rich-club phenomena in co-authorship networks. Further exploring the associations and causality between mesoscopic structure of networks involving higher-order interaction and research impact for various types of scientific collaborations warrants future work.