Introduction

Collaborations have been recognised to play a pivotal role, alongside competition, in the innovation process of a country (de Solla Price and Beaver 1966). The primary benefit of collaborations is the sharing of knowledge, skills and techniques among partners, and this inflow of knowledge contributes to the “accumulation of knowledge” process that leads to economic development and growth (Katz and Martin 1997; Luukkonen et al. 1992). Not surprisingly, governments tend to encourage research collaborations among authors, including them in the funding conditions, in an attempt to increase the country’s scientific output (Lee and Bozeman 2005). The literature has shown that collaborations, and especially international collaborations, are positively correlated with the number of publications (Lee and Bozeman 2005; McFadyen and Cannella 2004) and with the publications’ impact (He et al. 2009; Wuchty et al. 2007).

However, having more collaborations does not necessarily translate into higher scientific output, as not every collaboration is ideal (Lee and Bozeman 2005). While collaborations bring benefits to researchers, building a collaboration network also entails significant costs (Jeong et al. 2011; Katz and Martin 1997), hence authors must make trade-offs to maximise their utility. Authors in scientifically developed regions do not seek the same collaborations as those in scientifically weak regions. According to the centre-periphery hypothesis, researchers in less developed areas are willing to collaborate with those in more advanced areas in order to gain access to resources, knowledge and expertise, while authors in advanced regions seek complementarities (Acosta et al. 2011). Empirical studies testing this hypothesis found supporting evidence (Schubert and Sooryamoorthy 2010), although some scepticism remains (Acosta et al. 2011; Wagner and Leydesdorff 2005). Given that countries tend to follow different types of collaborations (Meyer and Persson 1998; Ozcan and Islam 2014), it is crucial to understand which types of collaborations are more beneficial for the countries/regions under study.

To further explain the mechanism of collaborations, a few studies emphasize the need to differentiate the sub-unit level of advancement (Moed 2016; Duque et al. 2005; Moed and Halevi 2014; UNESCO 2014). The pattern and impact of collaborations are associated with a nation’s scientific development stage. However, it is important to point out that a generalised estimation assuming that a nation presents a homogeneous level of development can lead to inaccurate suggestions. This is particularly the case for China, where large internal disparities persist (Fan et al. 2011; Wang et al. 2013b; Wang et al. 2019) but are often overlooked in scientific collaboration studies. A region’s development stage, associated with its absorptive capacity, has been shown to be closely connected with its innovation output (Wang et al. 2019). However, the relationship between regional development stage and scientific collaborative performance has hardly been investigated.

Since the opening-up reform, China has experienced an expansion in collaborations, especially in the international context. In the past decades, China established collaborations with more than 150 countries and the number of co-published articles in China increased faster than the average (Niu and Qiu 2014; Zhou and Glänzel 2010). While more and more cooperation agreements with different countries have been established, the Chinese government has been actively promoting international collaborations between Chinese and foreign researchers, aiming to enhance the internationalization of China’s scientific research activities (Andreosso-O’Callaghan 1999; Bound et al. 2013; Wang et al. 2017b). Yet, little is known about what types of collaboration are more beneficial for Chinese regions at various development stages.

Considering that the pattern of international collaborations also varies greatly across disciplines (Luukkonen et al. 1992; Coccia and Wang 2016), our study focuses on collaborations in one field, namely nanotechnology. As a promising, rapidly developing high-tech sector (Zheng et al. 2014), nanotechnology has great potential for the future development of a wide range of areas. This is also a field in which the Chinese government has provided extensive research funding (Bai 2005) and in which remarkable development has been observed in different regions (Wang et al. 2019).

This study contributes to the existing literature by further exploring the correlation between collaborations and scientific output, in the context of nanotechnology-specific knowledge production, among the Chinese regions. This paper aims to answer the following questions:

  1. Does domestic collaboration positively affect the scientific research output in Chinese regions?

  2. Does international collaboration positively affect the scientific research output in Chinese regions?

  3. Does a centrally located position positively affect a region’s scientific research output?

  4. Does regional development stage matter in benefiting from domestic/international collaborations? If so, what types of collaborations are more suitable for low-capability regions and what types of collaborations are more beneficial for high-capability regions?

The structure of the paper is as follows. The next section will present a theoretical and empirical background to the study. Section three will present the data used and will discuss the methodology, including the variables and model specification adopted. Section four will discuss the results from the econometric analysis. Finally, the last section will present conclusions, limitations and some policy advice.

Background

Domestic/international collaborations and research output

Most of the literature holds a positive view of the contribution of collaborations to scientific output. Half a century ago, de Solla Price and Beaver (1966) found a positive correlation between the number of collaborations an author has and his or her total number of publications, suggesting that the most productive researchers are also the ones collaborating the most. This view has been supported by recent empirical studies, which found a positive association of collaborations with scientific output (i.e. the number of publications) and with their social impact (i.e. the number of citations) (Adams et al. 2005; Glänzel 2001; McFadyen and Cannella 2004; McFadyen et al. 2009).

The strength of collaborations in particular has been considered an important element. Having strong partnerships entails higher trust and reciprocity, which reduces the costs and risks of new collaborations and in turn positively affects the scientific output (Gonzalez-Brambila et al. 2013; Guan et al. 2015a).

The literature distinguishes between domestic and international collaborations. Existing studies argue that international collaborations are preferable to domestic ones as they seem to produce higher impact (Jeong et al. 2014; Leydesdorff et al. 2014; Tang and Shapira 2012; Wagner et al. 2018). Ebersberger et al. (2014) found that, while a high level of regional technological specialization and technological variety were negatively correlated with domestic extra-regional collaborations, technological variety was positively associated with the propensity to collaborate with foreign partners. Such findings suggest a strong preference of researchers for international collaborations over domestic networks. Chinese policies in recent years have been in line with this view, actively promoting transnational collaborations through participation in different international research projects, such as Framework Programme 7, Horizon 2020 and World-Class 2.0 (European Commission 2007, 2018; Zhao 2018).

Given the unbalanced development across China, whether different regions have the capability to manage and benefit from such collaborations remains a question. As mentioned in the introduction, an important gap in the existing literature is that most studies focus on country-level analyses, while only a few have differentiated the sub-unit level of advancement when exploring ‘who’ benefits from those collaborations (Duque et al. 2005). To fill this gap, this study investigates the impact of different types of collaborations on the scientific output of Chinese regions at different development stages.

Diversified collaborations and research output

Scholars have also studied how diversified collaborations impact research performance. Some studies suggest that differences between the collaborators negatively impact on their performance, as they could hamper the knowledge flow and exchange (Hoskisson et al. 2002). Nevertheless, a large body of literature has shown that collaborative diversity potentially contributes to the innovation process, allowing partners to access a diverse range of skills, expertise and knowledge (Gonzalez-Brambila et al. 2013; Guan et al. 2015a).

To examine the diversity of collaborations, a large body of the literature has applied social network analysis to study a partner’s position embedded in the collaboration networks (Abbasi et al. 2012). Partners involved in a social network share workload, information, skills and expertise, equipment, etc. (Lee and Bozeman 2005; Li et al. 2013). The configuration of linkages among partners, i.e. who you reach and how (Nahapiet and Ghoshal 1998), is a main determinant in the process of knowledge creation (McFadyen and Cannella 2004). Researchers attempt to answer the question of how partners should change their interactions to acquire a more desirable and advantageous position within a network (Li et al. 2013). Given that the most influential partner(s) are often the ones located in central places, researchers have developed several centrality measures to determine the importance of a partner within a network (Abbasi et al. 2011). The main measures used include degree centrality, closeness centrality, betweenness centrality and eigenvector centrality, which measure the following respectively: the number of collaborators each node has; the distance of a node to other nodes in the network; the number of times a node lies ‘between’ other pairs of nodes; the value of centrality of the nodes connected to a node (Abbasi et al. 2011).

Actors embedded in a large and cohesive network (with a high degree of centrality) are believed to hold an advantageous position, which allows them to access new, richer and diverse information (Gonzalez-Brambila et al. 2013; Phelps et al. 2012). Having more ties in the network (i.e. being in a centralized position) can promote the generation of novel and useful ideas (McFadyen and Cannella 2004; McFadyen et al. 2009).

A large stream of the literature has supported the importance of holding a bridging position within the network (high betweenness centrality), as it allows the actors to benefit from access to new, non-redundant knowledge and resources (Burt 1992; Guan et al. 2015b; Li et al. 2013; Gonzalez-Brambila et al. 2013). Gonzalez-Brambila et al. (2013) found that brokerage clearly dominated cohesion in its effect on both citations and publication output, while Li et al. (2013) found betweenness centrality to have the largest and most significant impact on citations out of the six indicators analysed. Guan et al. (2015b) built a multilevel collaboration network to measure the effect of network position in the process of innovation production; using betweenness centrality as a measure of structural centrality, they found a positive and significant effect of the latter on innovation output. In line with the literature, we chose to use betweenness centrality as a measure of diversified collaboration with an important position.

The analysis contributes to the existing literature in different ways. First, most studies in the literature focused on the impact of network characteristics on knowledge transfer and flow, or on publications’ impact (McFadyen et al. 2009), while only a few studies examined the impact on knowledge creation. This work contributes to the literature by estimating the relationship between network position and knowledge creation (measured by publications) in the field of nanotechnology. In addition, this work provides evidence of the impact that a broad collaboration structure has on regional knowledge creation in China, using a dataset spanning 17 years.

The case of nanoscience development among Chinese regions

Nanotechnology is a promising, rapidly developing high-tech sector (Zheng et al. 2014). Nanoscience results from the cooperation of multidisciplinary fields, i.e. chemistry, physics, biotechnology, engineering and material science, towards the study of atoms and molecules (Schummer 2004). This field has great potential for the future development of a wide range of areas including energy, healthcare, the pharmaceutical industry, the food industry and climate change (Sozer and Kokini 2009; Zheng et al. 2014). Not surprisingly, nanotechnology is progressing rapidly, and international collaborations have been found to play a significant role in its development (Zheng et al. 2014). Over the past decades, China invested significantly in nanotechnology, and it rapidly became one of the leading nations in terms of its share of the world’s publications on nanotechnology (Tang and Shapira 2011b; Zhou and Leydesdorff 2006). In 2004, China’s world share of nano-related publications (8.34%) was higher than the country’s average world share of publications (6.52%) (Zhou and Leydesdorff 2006). The development of nanotechnology in China reflects the government’s desire to move a developing country towards the global technology-economic frontier (Wang et al. 2019) and to gain a leading role within the international context. Therefore, studying the extent to which collaborations affect China’s regional development in a sector of high relevance to the government can provide guidance for more effective investments.

Along with the rapid growth in scientific and technological output related to nanotechnology, an unbalanced regional development is also observed in China. Wang et al. (2012) studied nanotechnology collaborations between China and the US and found that such collaborations were asymmetrical, with a small group of Chinese scientists working with a large number of US scientists. Tang and Shapira (2011a) found that a small number of “elite” universities in China were collaborating with a wide range of universities in the US, while the majority of Chinese universities had few (or no) partnerships. Using patent network data, Ozcan and Islam (2014) found that the highly centralized networks in China were dominated by a few large players.

A possible explanation can be found in the role played by technological proximity, which refers to the shared knowledge base of different collaborators (Cunningham and Werker 2012). According to Cunningham and Werker (2012), actors must be different enough to benefit from each other’s knowledge, but sufficiently similar to understand each other. Therefore, not all regions might benefit from the same type of collaborations, simply because they might be too similar to, or too different from, each other. Less developed regions in China may be too different from leading foreign countries to maximise the benefits from such collaborations. While developed Chinese regions might have the necessary capacity and resources to optimise the benefits coming from foreign partners, less-developed regions lack the capability necessary to benefit from those collaborations, due to barriers and costs related to financial resources, administration, etc. (Katz and Martin 1997). Moreover, international collaborations involve significantly higher costs than shorter-distance collaborations (Wagner 2006). Thus, regions face trade-offs between costs and benefits when deciding the best strategy to implement (Jeong et al. 2014). In this study we follow the approach of Duque et al. (2005) and Lee and Bozeman (2005), and investigate “who” benefits from “what”.

Data collection and methodology

Data collection

For the bibliometric analysis, this paper uses a panel dataset, from 1999 to 2015, containing records of co-publication collaborations on nanotechnology among 30 Chinese mainland regions (Footnote 1) and between the mainland regions and 27 non-mainland/foreign regions (Footnote 2). The dataset contains 419,910 publications in the nanotechnology field, obtained from the Clarivate Analytics Web of Science (WoS) Science Citation Index Expanded. The dataset was constructed using a lexical query search strategy developed by the Georgia Institute of Technology (see Porter et al. 2008; Wang et al. 2013a). The query used to search for publications on nanotechnology is reported in the “Appendix”. The analysis is based on 280,543 meso- and macro-level collaboration connections, 190,126 of which are among the Chinese mainland provinces, while 90,417 are between the mainland provinces and the 27 non-mainland/foreign regions. The data for the control variables were obtained and re-elaborated from the China National Intellectual Property Administration (CNIPA) (Footnote 3), the China Statistical Yearbook (various issues) and the China Statistical Yearbook on Science and Technology (various issues). See Fig. 1 for the steps of data collection and processing.

Fig. 1 Steps of data collection and processing

As mentioned above, unbalanced regional development is one of the main issues characterising Chinese regions; hence, this paper separates the estimations by regional level in order to obtain more meaningful and less biased results. As there is no standard classification for Chinese regions, various attempts have been made, including the twofold (i.e. coastal and inland) and threefold (coastal, middle and western) geographical groups (Hao and Wei 2010; Wang and Szirmai 2013). To reflect scientific research ability, we contend that it is more appropriate to base the regional division on the volume of scientific knowledge. Therefore, following the methodology used by Wang et al. (2019), this work uses the total number of scientific publications as the criterion to divide the 30 Chinese regions under study into three groups (Footnote 4), respectively “advanced”, “medium” and “lagging” regions, based on their knowledge capability level, as measured by the number of total annual regional publications. These three regional groups are also referred to as high-, medium- and low-capability regions.

Variables

Dependent variable

The dependent variable in this analysis is the total number of regional nanotechnology-related publications in each year, which is used to represent the level of nanotech-knowledge generated in the specific region and specific year.

Collaboration variables

Depth of domestic collaborations

Based on the matrix of scientific collaborations between Chinese regions, we calculate the intensity of such collaborations using the Jaccard index, which measures the strength of bilateral relationships between regions (Luukkonen et al. 1993; Wang et al. 2017b). The index, which takes its name from Paul Jaccard, was applied to bibliometric data by Small (1973); given two sets of papers for regions X and Y, it is defined as the size of their intersection (the number of co-authored papers) divided by the size of their union (Leydesdorff 2008; Small 1973). Formally, the domestic collaboration intensity can be written as:

$$CI_{xy} = \frac{{Coll_{xy} }}{{Pub_{x} + Pub_{y} - Coll_{xy} }}$$
(1)

where CI is the intensity of collaboration (Jaccard) index, \(Coll_{xy}\) is the total number of collaborations between regions X and Y; \(Pub_{x}\) is the total number of scientific publications in region X and \(Pub_{y}\) is the total number of scientific publications in region Y. The index can take values between 0 and 1, where 0 indicates that there are no collaborations between the two regions, and value 1 means that the total number of publications equals the number of collaborations; i.e. 100% of the publications in X and Y are the result of the collaborations between the two regions (Boschma et al. 2014). To obtain more meaningful results, the \(CI\) variable for collaboration intensity with other regions is not calculated out of the total sample, implying that there is not one single variable for “domestic collaboration intensity”. Instead, three variables are created, each measuring the intensity of collaborations with advanced regions, with medium regions, and with lagging regions respectively.
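As a minimal sketch of Eq. 1, the function below computes the collaboration intensity for one pair of regions. The function name and the publication counts are hypothetical and chosen only for illustration; the handling of the zero-denominator case is an assumption, since the text does not discuss it.

```python
def collaboration_intensity(coll_xy, pub_x, pub_y):
    """Jaccard-style collaboration intensity CI_xy as written in Eq. 1."""
    union = pub_x + pub_y - coll_xy
    if union <= 0:
        # Assumption: with no publications in either region, report zero intensity.
        return 0.0
    return coll_xy / union


# Hypothetical counts, not taken from the paper's dataset.
ci_example = collaboration_intensity(coll_xy=50, pub_x=400, pub_y=250)
ci_none = collaboration_intensity(coll_xy=0, pub_x=400, pub_y=250)     # no joint papers
ci_full = collaboration_intensity(coll_xy=100, pub_x=100, pub_y=100)   # all papers are joint
```

The two boundary cases reproduce the interpretation given above: no co-authored papers gives 0, and complete overlap gives 1.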

Depth of international collaborations

An index analogous to the one capturing the depth of domestic collaborations (see Eq. 1) could also be constructed for international collaborations. However, the correlation between the two types of collaboration intensities would affect the regression quality. Hence we build the external–internal (EI) index to examine the depth of international collaborations. This index measures the dominance of external ties in a network over the internal ones (Krackhardt and Stern 1988). The EI index, proposed by Krackhardt and Stern (1988), focuses on the relatively stronger role of “friendships” between organisations’ subunits compared with those within units, and is obtained by the following formula:

$$E{-}I\;{\text{index}} = \frac{{E_{i} - I_{i} }}{{E_{i} + I_{i} }}$$
(2)

where \(E_{i}\) is the number of external ties and \(I_{i}\) is the number of internal ties. The index ranges from −1 to +1, and it indicates the extent to which the network ties cut across the group boundaries: a value of −1 indicates that all the network ties are within the group (with strong domestic collaborations), while +1 means that all the network ties run across groups (with strong international collaborations). The value 0 is a special case in which the number of ties within groups equals the number across groups (Gonzalez-Brambila et al. 2013; Krackhardt and Stern 1988; Tortoriello and Krackhardt 2010).

We calculate the EI index, measuring the dominance of external over internal network ties. In every region, external ties are all the connections that involve collaborations between this region and foreign partner(s), while internal ties are the collaborations between the region and other domestic partners. Regions might be constrained in the population size and amount of resources they can dedicate to building network collaborations (Krackhardt and Stern 1988), resulting in a trade-off between the amount of internal and external relationships they can maintain. Hence the EI index adopted in this analysis allows us to consider the impact of external collaborations relative to the internal ones.
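The calculation in Eq. 2 can be sketched as follows for a single region, with external ties counted as collaborations with foreign partners and internal ties as collaborations with domestic partners. The function name and tie counts are hypothetical; raising an error for a region without any ties is an assumption, as the text does not say how such a case is treated.

```python
def ei_index(external_ties, internal_ties):
    """E-I index of Eq. 2: (E - I) / (E + I), ranging from -1 to +1."""
    total = external_ties + internal_ties
    if total == 0:
        # Assumption: the index is left undefined when a region has no ties at all.
        raise ValueError("E-I index is undefined for a region with no ties")
    return (external_ties - internal_ties) / total


# Hypothetical tie counts for illustration.
only_domestic = ei_index(external_ties=0, internal_ties=10)
only_foreign = ei_index(external_ties=10, internal_ties=0)
balanced = ei_index(external_ties=5, internal_ties=5)
mostly_domestic = ei_index(external_ties=30, internal_ties=70)
```

A negative value, as in the last case, indicates that domestic ties dominate the region's collaboration portfolio.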

Breadth of collaborations (with importantly embedded positions)

Besides the strength of domestic and international collaborations mentioned above, it is also important to understand how broadly and importantly a player is embedded in the collaboration network. To this end, as mentioned in “Diversified collaborations and research output” section, several centrality measures (including degree centrality, closeness centrality, betweenness centrality and eigenvector centrality) were used. These centrality measures are highly correlated with each other and share a great deal of similarity (Guan et al. 2015b; Valente et al. 2008). Moreover, existing literature has shown that, in capturing the advantageous position in a network, betweenness centrality is more relevant than other centrality measures (see for example Gonzalez-Brambila et al. 2013; Guan et al. 2015b; Li et al. 2013). Hence, only betweenness centrality will be used for the estimations in the regressions.

The betweenness centrality measure, first proposed by Freeman (1977), is defined as the extent to which a node lies on the paths between others. In other words, it is a measure of the fraction of shortest (or ‘geodesic’) paths between pairs of nodes in the network that pass through the given node (Newman 2005). It is calculated based on the formula below:

$$C_{\text{B}} (v) = \mathop \sum \limits_{s \ne v \ne t \in V} \frac{{\sigma_{\text{st}} (v)}}{{\sigma_{\text{st}} }}$$
(3)

where the betweenness centrality \((C_{\text{B}} )\) of a node v is equal to the sum of the number of shortest paths (σ) from any node s to a node t, which the node v lies on, divided by the number of paths from s to t, with v, s and t belonging to the set of vertices/nodes \({V}\) (Brandes 2001). Moreover, the betweenness centrality measure is normalised to obtain a value between 0 and 1, where 0 is the lowest betweenness, and 1 is the highest (Freeman 1977).
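To make Eq. 3 concrete, the sketch below computes betweenness centrality for a small unweighted, undirected network using the accumulation scheme described by Brandes (2001). It is an illustration under stated assumptions rather than the procedure used in the paper: the collaboration network is treated as unweighted, the normalisation divides by (n − 1)(n − 2)/2 (the number of node pairs that exclude the node itself), and the toy networks and node names are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency, normalized=True):
    """Betweenness centrality (Eq. 3) for an unweighted, undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    nodes = list(adjacency)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:               # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    for v in nodes:
        score[v] /= 2.0                       # each unordered pair was visited twice
        if normalized and n > 2:
            score[v] /= (n - 1) * (n - 2) / 2.0
    return score


# Hypothetical star network: "hub" links three otherwise unconnected regions.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
star_centrality = betweenness_centrality(star)

# Hypothetical chain a - b - c: only "b" lies between another pair.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
chain_centrality = betweenness_centrality(chain)
```

In the star network the hub sits on every shortest path between the other regions and therefore reaches the maximum normalised value, while the peripheral nodes score zero.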

Control variables

Productivity of industrial innovations—granted patents to R&D ratio

Researchers have extensively made use of patenting counts as indicators for the current state of a technology level (Archibugi and Pianta 1996; Youtie et al. 2008). Therefore, the model includes the regional yearly number of industry patents, divided by the total regional level of R&D expenditure, as a proxy for the region’s productivity of industrial innovations.

Human capital ratio—share of university personnel

Evidence suggests that the level of research personnel has a significant impact on technological output (Zhang 2017). Therefore, the model controls for the regional level of university personnel over the regional population, lagged by 1 year. There are two main reasons for observing the personnel level at time t − 1. First, the lagged value of the explanatory variable reflects the fact that knowledge production in a specific year is the result of research carried out in the past (most likely in the previous year) (Fu 2008). Second, using a 1-year lag for the independent variable helps to mitigate possible endogeneity bias for the variables of interest (Zhang 2017).

Openness level—FDI to GDP ratio

The literature has widely confirmed the significant role of foreign investment in the development of scientific knowledge and output, and there is ample evidence also for the specific case of China (Fu 2008; Kuo and Yang 2008; Zhang 2017). The annual regional FDI is normalized to make it comparable, by dividing it by the regional level of yearly GDP. As with the previous variable, in order to capture the lagged contribution of this factor and to mitigate possible endogeneity bias in the model, this indicator is lagged by 1 year.

Newly added human capital—graduates

The total number of regional graduates, i.e. the number of individuals obtaining a Masters or PhD in a region in a given year, captures the effect of newly added human capital in the process of knowledge production (Footnote 5). In the analysis, the total number of regional graduates is transformed by taking the natural logarithm. Moreover, similarly to the university personnel and FDI variables, a 1-year lag is applied to account for possible delayed impacts of newly graduated students on scientific productivity (time for finding a job, time to work on a publication, etc.), as well as to mitigate endogeneity bias.
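The construction of the three lagged controls described so far (share of university personnel, FDI to GDP ratio and log of graduates) can be sketched as below. The function, the field names and all figures are hypothetical and serve only to show that the values attached to year t are built from year t − 1 data.

```python
import math


def lagged_controls(region_series, year):
    """Return the three lagged controls for `year`, built from year - 1 values.

    `region_series` maps a year to a dict of raw regional figures.
    All field names are hypothetical, not taken from the paper's dataset.
    """
    prev = region_series[year - 1]
    return {
        "uniempl_share": prev["univ_personnel"] / prev["population"],
        "fdi_gdp": prev["fdi"] / prev["gdp"],
        "ln_grad": math.log(prev["graduates"]),
    }


# Made-up figures for one region, for illustration only.
series = {
    2009: {"univ_personnel": 50_000, "population": 40_000_000,
           "fdi": 120.0, "gdp": 3_000.0, "graduates": 20_000},
    2010: {"univ_personnel": 52_000, "population": 40_500_000,
           "fdi": 150.0, "gdp": 3_400.0, "graduates": 22_000},
}
controls_2010 = lagged_controls(series, 2010)
```

Because each value depends on the preceding year, the first year of a panel has no lagged controls and would drop out of an estimation built this way.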

General economic development level—regional GDP

Given the unbalance in development across the Chinese provinces, it is important to control for regional GDP, which considers the different stages of economic advancement of the regions (Zhang 2017). Taking the natural logarithm of GDP is necessary to make the results comparable, given the highly skewed distribution of gross domestic product among the Chinese provinces. Similar to human capital and openness variables, 1 year lag is also taken for this indicator.

Model

As discussed in the previous section, the dependent variable in this analysis is the number of nanotechnology publications, which is a non-negative integer. Therefore, count data models, such as Poisson or Negative Binomial (NB) models, are preferred to traditionally used models such as Ordinary Least Squares (OLS) (Gonzalez-Brambila et al. 2013; Guan and Liu 2016; Guan et al. 2015b; Wang et al. 2019).

The Poisson model is a maximum likelihood estimation model, used in the literature for non-negative count data that follow a Poisson distribution (Wooldridge 2002). The Poisson model is a restricted version of the Negative Binomial, and it imposes the strong assumption of equidispersion; i.e. it requires the mean of the dependent variable to be equal to its variance (Gardner et al. 1995; Guan et al. 2015b). However, the dataset in our analysis suffers from over-dispersion, as shown by the standard deviation of nanotechnology publications (\(\sigma = 1332.21\)) substantially exceeding its mean (\(\mu = 823.4\)), so that the variance is far larger than the mean. The distribution is strongly right-skewed, with many regions producing a small number of publications and a few regions producing a very large number. Since ignoring over-dispersion could lead to underestimated standard errors and p values, the Negative Binomial model is the most appropriate choice, as it allows us to correct for over-dispersion in the dependent variable (Guan and Liu 2016).
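A quick way to see the size of the over-dispersion is to compare the variance with the mean using the two summary statistics reported above. The sketch below uses a hypothetical helper name; it looks only at the unconditional moments, whereas the Poisson restriction strictly concerns the variance conditional on the regressors, so it should be read as a rough diagnostic.

```python
def dispersion_ratio(mean, std_dev):
    """Variance-to-mean ratio; a Poisson variable would give a value close to 1."""
    return std_dev ** 2 / mean


# Summary statistics of nanotechnology publications as reported in the text.
ratio = dispersion_ratio(mean=823.4, std_dev=1332.21)
is_overdispersed = ratio > 1.0
```

With these figures the variance exceeds the mean by roughly three orders of magnitude.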

The Negative Binomial model was proposed by Hausman et al. (1984) to estimate the relationship between patents and R&D expenditure between 1968 and 1974, using a non-negative integer as the dependent variable and accounting for yearly random and fixed effects. Such a model addresses the limitations of the Poisson model’s assumptions by adding an alpha (\(\alpha\)) parameter to the regression, which accounts for the unobserved heterogeneity among the observations (Cameron and Trivedi 2005). The negative binomial regression model in our analysis is as follows:

$$P\left( Y = y_{it} \mid x_{it} \right) = \frac{\varGamma \left( y_{it} + 1/\alpha \right)}{\varGamma \left( 1/\alpha \right)\varGamma \left( 1 + y_{it} \right)}\left( \frac{1}{1 + \alpha \mu_{it}} \right)^{1/\alpha} \left( \frac{\alpha \mu_{it}}{1 + \alpha \mu_{it}} \right)^{y_{it}}$$
(4)

where \(y_{it}\) is the number of nano-publications in region \(i\) at year \(t\), \(\varGamma\) is the gamma function, \(\mu_{it}\) is the conditional mean, \(\alpha\) is the degree of dispersion. The conditional mean \(\mu_{it}\) can be expressed as:

$$\begin{aligned} \mu_{it} & = \exp (\beta_{0} + \beta_{1} * CI_{it}^{high} + \beta_{2} * CI_{it}^{medium} + \beta_{3} * CI_{it}^{low} + \beta_{4} * (E - I)_{it} \\ & \quad + \beta_{5} * BETWEENNESS_{it} + \beta_{6} * (TOTPAT/R\& D)_{it} + \beta_{7} * UNIEMPL_{it - 1}^{share} \\ & \quad + \beta_{8} * (FDI/GDP)_{it - 1} + \beta_{9} * lnGRAD_{it - 1} + \beta_{10} * lnGDP_{it} + \delta_{it}) \\ \end{aligned}$$
(5)

where \(CI_{it}^{high}\), \(CI_{it}^{medium}\), \(CI_{it}^{low}\) are the collaboration intensities with advanced, medium and lagging regions respectively, for region \(i\) in year \(t\). \((E - I)_{it}\) is the internationalization index, while \(BETWEENNESS_{it}\) is the normalized betweenness centrality. \((TOTPAT/R\& D)_{it}\), \(UNIEMPL_{it - 1}^{share}\), \((FDI/GDP)_{it - 1}\), \(lnGRAD_{it - 1}\) and \(lnGDP_{it}\) are the control variables: the ratio of total granted patents to R&D expenditure, the share of university personnel, the FDI to GDP ratio, the log of the number of graduates and the log of regional GDP. Apart from the ratio of total granted patents to R&D expenditure \((TOTPAT/R\& D)_{it}\) and regional GDP (\(lnGDP_{it}\)), which are measured for year \(t\), the other control variables are taken at year \(t - 1\), for the reasons explained above.
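The following sketch evaluates Eqs. 4 and 5 numerically, mainly to show how the dispersion parameter and the conditional mean enter the probability. It is not the estimation routine used in the paper: the function names, coefficients and regressor values are hypothetical, and the \(\delta_{it}\) term of Eq. 5 is left out because the text does not define it.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log of Eq. 4: negative binomial probability of count y with mean mu and dispersion alpha."""
    inv_alpha = 1.0 / alpha
    return (
        math.lgamma(y + inv_alpha)
        - math.lgamma(inv_alpha)
        - math.lgamma(1.0 + y)
        + inv_alpha * math.log(1.0 / (1.0 + alpha * mu))
        + y * math.log(alpha * mu / (1.0 + alpha * mu))
    )


def conditional_mean(intercept, coefficients, regressors):
    """Eq. 5 without the delta term: mu = exp(beta_0 + sum_k beta_k * x_k)."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, regressors))
    return math.exp(linear)


# Hypothetical coefficients and regressor values, chosen only for illustration.
alpha = 0.5
mu = conditional_mean(1.0, [0.5, -0.2], [2.0, 1.0])

# Sanity checks on the distribution: probabilities sum to one and the mean equals mu.
probabilities = [math.exp(nb_log_pmf(y, mu, alpha)) for y in range(2000)]
total_probability = sum(probabilities)
implied_mean = sum(y * p for y, p in enumerate(probabilities))
implied_variance = sum((y - mu) ** 2 * p for y, p in enumerate(probabilities))
```

Under this parameterisation the implied variance is \(\mu + \alpha\mu^{2}\), so a larger \(\alpha\) allows the variance to exceed the mean, which is the over-dispersion the Poisson model rules out.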

Although it is widely used in the literature, we must acknowledge two main limitations of the Hausman et al. (1984) model. First, Allison and Waterman (2002) argued that the fixed-effect negative binomial model by Hausman et al. (1984) is not a “true” fixed-effect model: they demonstrated that the model does not control for all the covariates that remain stable over time. Given this issue with the fixed-effect specification of the NB model, we adopted the population-averaged (PA) Negative Binomial model (Cameron and Trivedi 2005; Wang et al. 2019). The PA specification considers the average of the random effects, instead of using the observation-specific random effects, and it provides estimates for the average of the observations (Cameron and Trivedi 2005). To check the robustness of the NB model, the analysis compares its estimates with those from a fixed-effect (FE) Poisson regression. The FE Poisson is a more advanced version of the standard Poisson model, and it allows estimating correlations in longitudinal datasets. Although the Poisson model’s fit for the sample under analysis is not ideal and produces biased estimators, the FE Poisson has strong robustness properties for the estimation of parameters (Wooldridge 2002), thus allowing us to compare estimates between models.

The second limitation is that the results of the Negative Binomial model might suffer from relatively lower precision, due to the larger standard errors estimated by the model, thus reducing the accuracy of the conclusions (Gonzalez-Brambila et al. 2013). However, testing the strength of the estimations through a cross-model comparison also makes it possible to check the consistency in the significance of the estimations. To provide an additional robustness check for the estimated model, the results from the NB and FE Poisson models are also compared to estimations from an OLS model, where the dependent variable is the natural logarithm of the nanotechnology publications. Although the magnitudes of the coefficients in the OLS estimations are not directly comparable to those from the other two models, given the different dependent variable adopted, this approach is used in the literature to provide information about the general direction and magnitude of the regressors’ estimated coefficients (Guan et al. 2015b).

Results

Descriptive statistics

Table 1 summarises the mean, standard deviation, minimum and maximum values for the variables used in the analysis, for the whole dataset. While the mean of nano-publications was 823.35 for the period 1999–2015, the standard deviation (\(\sigma = 1332.21\)) was higher than the mean, suggesting a highly skewed distribution across the regions, as already mentioned in the methodology discussion (“Data collection and methodology” section). Table 1 also shows that the standard deviation is high for most of the variables used in the analysis, and in a few cases it exceeds the mean, suggesting over-dispersion, as in the case of ‘nano-publications’, ‘betweenness centrality’ and the ‘FDI/GDP’ ratio.
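To make the over-dispersion argument concrete, the short sketch below takes the pooled mean and standard deviation of nano-publications quoted above and computes the variance-to-mean ratio (which equals 1 under a Poisson distribution) together with a crude method-of-moments value of \(\alpha\) implied by the variance function \(\mu + \alpha \mu^{2}\). This is a back-of-the-envelope check on unconditional moments, assuming the negative binomial variance function; it is not the dispersion parameter estimated by the regression model, and the function name is ours.

```python
def moment_dispersion(mean, std):
    """Method-of-moments alpha solving std**2 = mean + alpha * mean**2."""
    variance = std ** 2
    return (variance - mean) / mean ** 2

# Pooled values for nano-publications quoted in the text (Table 1).
mean_pubs, std_pubs = 823.35, 1332.21

variance_to_mean = std_pubs ** 2 / mean_pubs
alpha_hat = moment_dispersion(mean_pubs, std_pubs)

print(f"variance-to-mean ratio: {variance_to_mean:.1f}")
print(f"implied alpha (unconditional moments): {alpha_hat:.3f}")
```

A ratio far above 1 is what motivates preferring a negative binomial specification over a plain Poisson model for these counts.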

Table 1 Descriptive statistics for the full sample

To better describe the situation across regions, Table 2 presents the descriptive statistics, divided by the three groups. The table shows that the mean \((\mu )\) of nano-publications was 110.34 for lagging regions, while it was 6–16 times higher for medium and advanced regions (\(\mu = 617.45\) and 1742.27, respectively).

Table 2 Descriptive statistics by regional groups

Regarding the explanatory variables of the model, the summary statistics highlight a significant difference in the intensity of collaborations, both nationally and internationally, among groups. For example, the average collaboration intensity with advanced, medium and lagging regions is generally two to three times larger for medium regions than for lagging ones; for advanced regions, it is two to five times larger than for lagging ones. A similar trend applies to betweenness centrality, with lagging regions having a mean significantly lower than medium and advanced regions.

Regarding the control variables, Table 2 shows that the share of research personnel is remarkably unbalanced across regions, with lagging regions having on average less than half of the personnel in advanced regions. Moreover, the mean of the ratio of FDI to regional GDP in lagging and medium regions is significantly lower than that in advanced ones.

It is important to note that Table 2 presents a particularly high standard deviation for many of the variables, hinting that even after dividing the regions into sub-groups, these are still not perfectly homogeneous. Although the group division and the estimates that will be presented are considered robust, care is needed in the interpretation and generalisation of results.

Table 3 shows the correlation matrix for the explanatory variables of the model. While most pairs of variables have a correlation lower than 0.6, the table shows a few pairs that are strongly correlated. These particularly concern the first variable (collaboration intensity with leading regions), which is correlated with \(CI_{it}^{medium}\) (\(\rho = 0.703\)), university personnel (\(\rho = 0.634\)), regional graduates (\(\rho = 0.734\)), and GDP (\(\rho = 0.606\)). Also, betweenness centrality is correlated with university personnel (\(\rho = 0.673\)), while regional graduates are also correlated with GDP (\(\rho = 0.850\)).

Table 3 Correlation matrix

Since the negative binomial model does not provide statistics to control for possible multicollinearity that might arise from such high correlations (McFadyen et al. 2009), it is necessary to run robustness checks for the correlated variables. First, since the highest correlation concerns two of the control variables, Table 7 in the “Appendix” reports the results from checks for multicollinearity between the control variables by regressing them together (Model 1), as well as taking out only log GDP (Model 2) or regional graduates (Model 3). No significant changes in the coefficient signs or significance levels are revealed for the variables. Moreover, the Wald test at the bottom of the table indicates that models (2) and (3) are less significant than (1), suggesting that all the control variables should be included. Models (4), (5) and (6) in Table 7 control for the correlation between \(CI_{it}^{high}\) and \(CI_{it}^{medium}\), showing no significant change in the coefficients. The same applies to the regressions in columns (7), (8) and (9), which control for the correlation of \(CI_{it}^{high}\) with university personnel, graduates and GDP. Finally, columns (10) and (11) show no significant effects for the correlation between betweenness and university personnel. It follows that there is no important multicollinearity in our model, hence conclusions can be drawn.
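As a rough complement to the regression-based checks described above, the pairwise correlations quoted from Table 3 can be translated into two-variable variance inflation factors, \(1/(1 - \rho^{2})\). The sketch below does this for the six pairs mentioned in the text. It is only an illustration and not a procedure used in the paper: a full VIF regresses each variable on all the other regressors and can therefore only be equal to or larger than the pairwise value. The variable labels are shorthand for the names in Eq. (5).

```python
def pairwise_vif(rho):
    """VIF when one regressor is explained by a single other regressor."""
    return 1.0 / (1.0 - rho ** 2)

# Correlations quoted in the text for Table 3.
quoted = {
    ("CI_high", "CI_medium"): 0.703,
    ("CI_high", "UNIEMPL"): 0.634,
    ("CI_high", "lnGRAD"): 0.734,
    ("CI_high", "lnGDP"): 0.606,
    ("BETWEENNESS", "UNIEMPL"): 0.673,
    ("lnGRAD", "lnGDP"): 0.850,
}

vifs = {pair: pairwise_vif(rho) for pair, rho in quoted.items()}

# Print the pairs from the most to the least inflated.
for pair, value in sorted(vifs.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

Even the most correlated pair (regional graduates and GDP) gives a pairwise value well below the threshold of 10 that is often used as a rule of thumb, which is in line with the conclusion drawn from Table 7.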

Econometric analysis

The impact of explanatory variables

After controlling for biases due to multicollinearity, it is possible to analyse the regression output. Table 4 presents the results from the negative binomial model for the full sample, with different combinations of independent variables. Table 5, instead, shows the results for the main regressions in Table 4, divided by the three regional groups (advanced, medium and lagging regions, respectively). This was calculated using Stata’s split coefficient command, to gain a better understanding of the impact of the explanatory variables on knowledge creation for different regions. The models include all the control variables, as well as dummy variables for the year fixed-effects, which have been omitted from the regression tables. In Tables 4 and 5, column (1) provides estimates just for the control variables. Columns (2) to (5) present the estimates of the collaboration intensities with advanced, medium and lagging regions, first individually, then combined. Column (6) shows the estimate for the EI index, and column (7) estimates the impact of betweenness centrality alone. Column (8) shows the estimates for the full model.

Table 4 Results of negative binomial regressions for the full dataset
Table 5 Results of negative binomial regressions by regional groups

Starting from the collaboration indicators, in Table 4 it is possible to see how collaborating with advanced regions has a strong positive effect on publications, which is consistent across specifications. As predicted, Chinese regions benefit significantly from collaborating with regions that have higher knowledge stock, resources and experience. This result is stable across groups, as shown in Table 5, and interestingly lagging regions have a higher relative impact from such collaborations as compared to the other groups.

With regard to collaboration intensity with medium regions, Table 4 shows a strong positive effect on publications; however, in the tripartite analysis, a positive significant effect is found only for medium regions, namely within the same regional group. The coefficients are not significant for the other regional groups (i.e. lagging and advanced groups). This could reflect the fact that many medium regions tend to cooperate among themselves, given their similar background and level of knowledge capabilities. In fact, sharing a similar level of knowledge, expertise and understanding between the parties is a crucial element in building a fruitful collaboration (Li et al. 2013). This within-group collaboration effect could also explain the significant coefficient in Table 5, Col. 4, presenting the collaboration impact of lagging regions with other lagging regions (\(b_{(lagging)} = 26.38\)). However, the coefficient of collaboration with lagging regions is no longer significant in the whole model (Table 4, Col. 8), suggesting that the negative effect received by more developed regions completely offsets the positive effect received by less developed regions. Advanced regions present a significantly negative coefficient from the collaboration with lagging regions (\(b_{(lagging)} = - 92.16\)). Since more advanced regions have higher knowledge stock and resources, they do not benefit from collaborations with less developed regions, although they still bear the costs of such collaborations, and that is where the negative impact might derive from. From another perspective, the negative effect might be associated with collaboration patterns (e.g. who initiated the research, who leads the research, what is required by the funding sponsor, etc.). Due to the lack of matching data between authors and their affiliations in our earlier data, this study does not examine roles of collaboration. This, however, would be of interest to test in future research.
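Because the conditional mean in Eq. (5) has an exponential form, a coefficient \(b\) can be read as a multiplicative effect: changing a regressor by \(\Delta\) multiplies the expected number of publications by \(\exp (b\Delta )\), holding the other regressors fixed. The sketch below applies this to the two coefficients quoted above. The size of the change (\(\Delta = 0.01\)) is a hypothetical value picked for illustration, since the measurement scale of the collaboration intensity is not restated in this section, so the resulting percentages should not be read as findings of the paper.

```python
import math

def rate_ratio(beta, delta):
    """Multiplicative change in the expected count for a change delta in a regressor."""
    return math.exp(beta * delta)

# Hypothetical change in collaboration intensity with lagging regions.
delta = 0.01

# Coefficients quoted in the text (Table 5): lagging and advanced regions.
gain_lagging = rate_ratio(26.38, delta)
loss_advanced = rate_ratio(-92.16, delta)

print(f"lagging regions: expected publications x {gain_lagging:.3f}")
print(f"advanced regions: expected publications x {loss_advanced:.3f}")
```

A ratio above 1 corresponds to a positive coefficient and a ratio below 1 to a negative one, which is why the two regional groups pull the pooled estimate in opposite directions.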

Moving on to the internationalisation index (or EI index), Table 4 shows a positive and statistically significant coefficient in the full model specification. However, from the tripartite analysis (Table 5) it appears that the EI index has a positive impact only for advanced regions. Medium and lagging regions do not show any statistically significant coefficient for the internationalisation index, suggesting that there might be significant costs associated with opening up to international collaborations, such as financial, administrative and cultural costs (Wagner 2006). In line with previous arguments from the literature, the regression results suggest that such costs might offset the benefits for less developed regions. Scientifically advanced areas are characterised by higher human capital, financial resources and technological capabilities, which ensure a high absorptive capacity (Fu 2008), thus allowing them to benefit more from a broad network structure. In contrast, lagging regions in China do not seem to have reached that stage yet. Considering the discrepancy in regional development in China, this finding deserves the attention of the Chinese government when implementing collaboration policies. Not all Chinese regions, at least at the current stage, can leverage the knowledge flows generated via collaboration.

Finally, the index of collaboration breadth (proxied by betweenness centrality) shows a strong positive effect in the pooled estimation in Table 4, both in model (7) only including betweenness, and in the full model (8). This finding is in line with the literature, which suggests that actors in a more central position have access to diverse and non-redundant information, which is ultimately reflected by a higher number of nanotechnology-related publications. However, Table 5 highlights that the positive effect from betweenness is not homogeneously distributed across regions; the positive coefficient found in the pooled estimation is hiding a strongly significant impact for advanced and medium regions only. The coefficient for lagging regions is not statistically significant, indicating that those regions do not benefit from filling the structural holes in the network. Perhaps, the low level of absorptive capacity in lagging regions causes the costs from maintaining central positions in the network to exceed the benefits, thus having a reversal effect on their performance.
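For readers less familiar with the measure, the sketch below computes normalized betweenness centrality on a toy undirected network using Brandes' algorithm. The four-node star network and its node labels are hypothetical, and the code only illustrates the general definition (the share of shortest paths between other nodes that pass through a given node); the construction of the collaboration network and the software used in this study are not reproduced here.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness for an undirected, unweighted graph (Brandes)."""
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to each node.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    # The loop counts ordered pairs, so divide by (n - 1)(n - 2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: score[v] * scale for v in nodes}

# Hypothetical star network: one hub linked to three otherwise unconnected nodes.
toy_network = {
    "hub": ["r1", "r2", "r3"],
    "r1": ["hub"],
    "r2": ["hub"],
    "r3": ["hub"],
}
centrality = betweenness(toy_network)
print(centrality)
```

In the star network the hub lies on every shortest path between the other nodes, so it bridges all the structural holes, while the peripheral nodes lie on none.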

Together with the regression results regarding the collaboration depth variables explained above, this suggests that the Chinese government should take regional capability into consideration when implementing collaboration policies. High capability regions can benefit from broad collaboration networks and from leveraging knowledge from diversified partners, while low capability regions seem to benefit more from concentrated collaborations with advanced domestic partners.

The impact of control variables

In general, the ratio of total granted patents to R&D expenditures has a negative, statistically non-significant coefficient in almost all the regression models. This holds for both the pooled analysis and the estimations by regional groups. A possible explanation for the lack of significance is that the patent data are industry-related, thus a change in the parameter does not directly affect academic publications.

With regard to university personnel, the coefficient is not statistically significant in the pooled estimation; however, from Table 5 we can see that such insignificance is due to the negative sign for the advanced regions’ group, which offsets the effect of lagging and medium regions. This may suggest that leading regions already have a high level of university personnel, and more personnel input cannot be used effectively, which indicates diminishing returns in the knowledge production process in such regions. However, in the lagging and medium regions, there might still be increasing returns to scale, hence one unit of personnel input will produce more than one unit of scientific output. Therefore, an increase in personnel inputs is more valuable in lagging and medium regions than in advanced regions.

The openness variable (FDI/GDP) is also not significant in the pooled analysis; however, in Table 5 we can see that while the coefficient is not significant for lagging and medium regions, it is positive and significant for advanced regions. This result corresponds with the effect of the aforementioned international collaboration depth: advanced regions have higher scientific capability and can transform the interaction with foreign countries into a positive gain in knowledge production. This finding is in line with that of Wang et al. (2017a), i.e. whether or not a region benefits from FDI depends on that region’s local capacity. Finally, the variables of newly added human capital (regional graduates) and general economic/development level (regional GDP) are both significant in the pooled analysis, and such significance is further explored in Table 5. The variable of regional graduates is positive and significant in lagging and medium regions. This, similar to the personnel variable explained earlier, shows that human capital input is more needed in lagging and medium regions than in advanced regions.

Robustness checks

The robustness of the specified model was discussed, and the model was shown to be sound and to provide consistent estimates after running separate regressions testing correlated variables one at a time (see Table 7 in the “Appendix”). Some additional analyses have been conducted to confirm the reliability of the estimations obtained from the negative binomial model. Firstly, the model was re-estimated using a Poisson fixed-effect specification, with robust standard errors. As explained above, although our data do not follow a Poisson distribution and are characterised by over-dispersion, the Poisson FE model has been indicated by the literature to have some robustness properties (Wooldridge 2002), and it should provide some comparable estimates. Secondly, the regression was re-estimated using ordinary least squares (OLS), after taking the natural logarithm of the dependent variable: the log of nanotechnology publications. Although the model uses a different dependent variable, making it hard to compare the estimated coefficients, it is used in the literature to test the robustness and consistency of the NB results (see for example Guan et al. 2015b). Table 6 reports the results from the two control regressions, as well as the results from the original NB model.

Table 6 Estimations for robustness checks—cross models comparison

Collaboration intensity with advanced regions is highly significant and positive in all three models. Collaboration intensity with medium regions, instead, is significant only in the NB and OLS models; it is not significant in the Poisson model. The over-dispersion problem probably caused the estimations to be inaccurate; however, the coefficient of \(CI_{it}^{medium}\) has a positive sign in all three models, suggesting a similar direction of the impact. Collaboration intensity with lagging regions maintains a negative direction in all three estimation models, although it does not show a significant coefficient for the NB and OLS models. Similarly, the internationalization index maintains a positive estimated coefficient in all three models, although it is only significant for models (1) and (2). Finally, betweenness centrality is positive and significant in all three models. Regarding the control variables, Table 6 also shows consistency across different estimation models.

Although the estimated coefficients are not identical for the three models, we can conclude that overall the NB model used is robust. In fact, despite the considerable issues and limitations arising from the application of the OLS and Poisson FE models to the dataset under analysis, the robustness checks showed that the estimates from the NB model are consistent, and we found a good matching rate with the control models.

Discussion and conclusions

Collaboration patterns are believed to be connected with a country’s scientific development stages (Moed 2016; Moed and Halevi 2014; UNESCO 2014). In this context, it is crucial to understand what types of collaboration are more beneficial for what types of country/region. Employing the scientific articles published by Chinese researchers in the nanotechnology field, this paper examined the impact of collaboration depth (with domestic and foreign partners) and collaboration breadth on the regional scientific output.

Results from our analysis suggest that benefits from collaborations depend on the development stages of collaborative partners. The strong positive effect of collaborations with more advanced domestic regions has been confirmed for all the subgroups; lagging regions have been found to benefit the most from collaborations with scientifically advanced regions. However, the contribution of collaborations with medium regions and lagging regions is very limited, existing only for the within-group collaborations. These results prove that the primary benefit from collaborations is the sharing of skills and knowledge between the parties (Katz and Martin 1997). If there is not much knowledge for advanced regions to learn from the lagging regions, such collaborations are not beneficial to the former.

Regarding the strength of international collaborations, our study finds a significant positive effect in advanced regions; this is in line with the literature supporting international collaborations as a means to achieve diverse knowledge and information, which can foster the national innovation capacity (see for example Glänzel 2001; Guan et al. 2015b; Leydesdorff et al. 2014). However, this positive and significant effect does not hold for less developed regions. This seems to be because less developed Chinese regions and foreign countries are characterised by very different levels of knowledge and capability. Therefore, the costs of collaborating internationally exceed the benefits for less developed regions, and they hamper the effectiveness of such collaborations, resulting in little/no knowledge creation (Katz and Martin 1997; Wagner 2006).

This confirms that, in order to benefit from scientific collaboration, it is important that the two actors involved share a similar level of knowledge and understanding (Gonzalez-Brambila et al. 2013). Through collaboration, partners can fill the knowledge gap between them, and create new knowledge. However, if the gap between collaborators is too wide in terms of capability, skills and competences (as in the case of collaborations between lagging regions and foreign countries), such cooperation does not lead to effective knowledge production (McFadyen et al. 2009).

Being centrally located in a verified collaboration network is found to bring significant and positive effect to regional scientific output. Again, however, such effect is conditional on a region’s scientific capability. This study suggests that only medium and advanced regions can benefit from being embedded in a broad collaboration network. Such added value is not observable for lagging regions, suggesting that their low absorptive capacity level prevents them from exploiting the knowledge inflow deriving from a diversified network structure.

The above findings have important theoretical implications. First, our results complement the scientific development model discussed by Moed (2016), Duque et al. (2005), Moed and Halevi (2014), and UNESCO (2014). In accordance with their findings that collaboration patterns reflect development stages, our study emphasises the differences in the collaboration effect received by different regions at different development stages. This result also provides empirical support to the issue raised by Lee and Bozeman (2005), who argue that not all collaborations are ideal. Although several studies have found a positive impact deriving from an increase in collaborations, very few differentiated the effect of such collaborations by considering the level of scientific capacity of the actors involved in the collaboration process (Duque et al. 2005). This study suggests that generalising impact evaluations could result in limited or inaccurate interpretations, especially for countries with high internal development imbalances. In the case of Chinese regional collaborations, different partnerships (between different regional groups) lead to different effects. Second, findings from this analysis provide evidence that international collaborations might not be beneficial a priori. Costs and barriers increase when a region or country moves to the international context, hence not all the actors might benefit from such collaborations, and trade-offs based on the capabilities of the actors involved should be considered (Jeong et al. 2014).

Our results also present important policy implications. In recent decades, the Chinese government has made significant investments in nanotechnology to incentivise research and development, as well as collaborations within and outside the national borders. The interest in engaging in international partnerships, indicated by the large investments in collaboration programmes such as the Framework Programme 7, the Horizon 2020 and the World-Class 2.0 (European Commission 2007, 2018; Zhao 2018), was aimed at improving the Chinese regions’ knowledge capability and promoting future development. While providing some support to these policies’ objectives, as shown by the positive significant correlation between collaboration variables and nano-publications, our results also call attention to the regional disparities that have been on the rise in the past decades. When such differences are taken into account, the policy scenario changes substantially.

In scientifically weaker regions, in order to increase knowledge production and catch up with more advanced regions, local authorities should be aware that domestic collaborations, in particular with more advanced regions, are more efficient than international collaborations, at least at the current stage.

Considering the positive impact of the collaboration breadth index in advanced and medium regions, our study verifies that regions with relatively high research capacity can handle (and benefit from) knowledge generated by more diversified partners. However, low-capability regions seem to benefit more from concentrated collaborations with a narrower set of partners. If one wants to promote more collaborations between different organizations in China, as suggested by Ozcan and Islam (2014), our study shows that matching development stages or capability levels should be taken into consideration in order to really benefit from such collaborations.

Finally, we must acknowledge the limitations of the above approach in measuring knowledge creation. First, co-authored papers are only partial indicators of collaborations, as remarked by Katz and Martin (1997). There are many different types of collaborations that take place without ultimately resulting in publications. Likewise, there are cases of co-authored papers resulting from indirect and marginal forms of collaboration (Katz and Martin 1997), thus it is important to be cautious in interpreting the results. Nevertheless, the literature conveys that collaborations on publications are one of the best documented and most quantifiable measurements of scientific collaborations (Abramo et al. 2011; Glänzel and Schubert 2005). Second, the breadth of collaborations is proxied only by betweenness centrality in the analysis. Previous research found that including different centrality measures in the same model might cause high multicollinearity (see Guan et al. 2015b), hence the decision to include only one measure. However, not considering all the relational measures in a network might lead to inaccurate estimations, as more interactions happen between actors, which are not accounted for (Gonzalez-Brambila et al. 2013). Therefore, attention must be paid in the interpretation of results. Additional research exploring the impact of other network structure measures on knowledge creation might overcome the limitations in interpretation of this study that are due to possible omitted variable bias. Third, this study focuses on collaborations at regional and national levels, without distinguishing subtopics of nanoscience. Considering the fact that nanoscience has been widely applied in various fields (Wang et al. 2013a), further research exploring collaborations in different scientific nano domains is encouraged to bring support to our results, and to provide further evidence of the correlation between collaborations and knowledge creation.