1 Introduction

Scientific and technical knowledge is generated mostly by specialized actors (universities, research centres and firms) which, for a number of reasons,Footnote 1 tend to co-locate in specific locations. This can result in the development of high-tech clusters, innovative industry agglomerations, ‘hot spots’ and centres of excellence typical of technologically advanced regions (Swann et al. 1998; Bresnahan et al. 2001; Maggioni 2002; Braunerhjelm and Feldman 2006). Knowledge flows easily within these geographical locations (and between neighbouring ones) as a result of mobility of inventors and highly qualified workers, interactions between producers and sub-suppliers of specialized inputs, and knowledge spillovers more generally. However, scientific and technical knowledge can also flow across different areas, and several breakthrough technologies have been developed based on the joint efforts of scientists and technicians working in different geographical locations.

The present paper draws on two literature streams. The first deals with the identification and study of innovation networks (Jaffe et al. 1993; Audretsch and Feldman 1996; Maurseth and Verspagen 2002; Cowan and Jonard 2003; Paci and Usai 2000, 2009; Breschi and Lissoni 2004, 2009; Maggioni et al. 2007; Maggioni and Uberti 2007, 2009, 2011; Hoekman et al. 2009; Picci 2010; Cassi and Plunket 2012; Maggioni et al. 2011, 2013); the second exploits spatial econometric techniques to account for the existence of not directly measurable (or unmeasured) spillover effects associated with the creation of new knowledge (Audretsch and Feldman 1996; Acs et al. 2002; Fischer and Varga 2003; Bottazzi and Peri 2003; Greunz 2003; Bode 2004; Moreno et al. 2005; LeSage and Pace 2009; Autant-Bernard and LeSage 2010; Usai 2011; Varga et al. 2010). We build on previous works (Acs et al. 2002; Cowan and Jonard 2004; Maggioni et al. 2007) which assume that knowledge can be diffused and exchanged through unintentional diffusion patterns based on spatial contiguity or intentional relations based on a-spatial networks.

According to the first pattern, knowledge flows from poles of excellence and its positive effects extend to other agents (i.e. firms, universities and research centres) located in neighbouring areas. Hence, relevant regions present both ‘attraction potential’ and ‘diffusive capacity’ (Hägerstrand 1965, 1967; Acs et al. 2002). Each innovative region extends its influence over neighbouring territories through a trickling down process of unintended spatial spillovers. In this view, space matters most, and knowledge flows follow almost purely geographical patterns.

In a-spatial networks, knowledge is exchanged mainly through voluntary ‘barter’ and increases via learning by interacting within specialized networks, intentionally established between crucial nodes (Cowan and Jonard 2004). The technological and scientific knowledge developed within a region is diffused and exchanged through a set of a-spatial networks (often resulting from formal and contractual agreements between institutions) that connect the region with other regions, irrespective of their geographical location. Thus, in this second case, relational networks matter more, and knowledge spreads following intentional patterns which may not follow a geographical pattern.

Maggioni et al. (2007) test whether formal relationships based on a-spatial networks between geographically distant regions prevail over diffusion patterns based on spatial contiguity, using spatial econometric techniques to measure the effects of different ‘spatial’ weight matrices which refer to both geographical and relational ‘proximity’. Their analysis suffers from two main limitations. The first relates to the possibly inaccurate identification of inter-regional scientific relationships through the use of data on joint research networks financed under the European Commission’s Fifth Framework Programme (hereafter FP5); FP5 data only record research network membership and (in most cases) amounts of funds, but do not trace the knowledge flows within networks. The second relates to the possible misspecification of the econometric model resulting from alternate use of ‘geographical’ and ‘relational’ weight matrices. If the data generation process (i.e. the influence of other regions’ innovation activity on each region’s innovative performance) has both a geographical and a relational component, then any attempt to measure either of these components without taking account of the other could lead to biased and inefficient econometric estimates. In the present paper, as well as using a larger sample of countries and regions,Footnote 2 we take account of both geographical and relational proximity effects in the same econometric specification, through a series of tests aimed at identifying the actual organizational structure of the knowledge flows connecting European regions that were activated and financed by FP5.

There are several methodological issues related to endogeneity of the relational weight matrix that need to be dealt with. However, we believe that our exploratory analysis highlights a relevant phenomenon and could initiate an interesting research agenda.

The paper is organized as follows: Sect. 2 discusses a number of empirical issues related to the use of geographical and relational weight matrices for spatial econometric analysis of patent data; Sect. 3 presents the research questions, and Sects. 4 and 5 present the estimated models and the results. Section 6 concludes the paper.

2 Knowledge flow networks: their structure and layout

In standard empirical papers in the ‘geography of innovation’ literature, the first main section usually presents the econometric model used to investigate the determinants of knowledge creation and diffusion within and across regions, which usually is based on different augmented versions of the knowledge production function. For the present study, we need first to devise a method that allows us to peer inside the ‘black box’ of joint research networks in order to investigate how scientific and technological networks are formed and diffuse across European regions.

While we are convinced that innovative processes and dynamics are the outcome of individual agents’ decisions, we consider the regional level appropriate for empirical observation of the innovation process since it allows consideration of inter-agent spillovers which are overlooked if the analysis is performed at the individual agent (or institutional) level.

Regional innovation performance, proxied by patenting intensity, is determined by region-specific innovative inputs combined in a knowledge production function and influenced by the innovative performance of ‘neighbouring regions’ (defined as geographically and relationally proximate). Econometric analysis of spatial autocorrelation phenomena is well diffused in the innovation literature, and use of alternative measures of technological, institutional, social and organizational neighbours has been discussed in depth (Torre and Gilly 2000; Boschma 2005; Cantner and Meder 2007; Boschma and Frenken 2009; Ponds et al. 2007, 2010; Marrocu et al. 2013a). However, the innovative and explorative contribution of this paper consists of identifying an estimation method which considers the research object rather than the research network to compute an alternative measure of proximity. As discussed below, the variables of interest are network structures, reflecting how a FP5 contract is organized internally; link directions, identifying the recipient and the sender of knowledge; and link weights, measuring the amount of knowledge flowing in the network.

2.1 From ‘space versus networks’ to ‘space and networks’

Maggioni et al. (2007 p. 472) conduct two distinct spatial econometric exercises to ‘verify whether or not hierarchical relationships, based on a-spatial networks between geographically distant excellence centres, prevail over diffusive patterns, based on spatial contiguity’. The first is based on a geographical weight matrix, \(W^{g}\); the second is based on a relational weight matrix, \(W^{r}\). Since comparing the size of the coefficients in two regressions based on different weight matrices is questionable, the analysis was complemented by a third exercise based on a spatial weight matrix, \(W^{{r}-{g}}\), obtained as the difference between \(W^{r}\) and \(W^{g}\). In other words, we subtracted the index of geographical contiguity from the index of relational contiguity, so that the residual proximity definition included only ‘pure relational connections established between geographically non-contiguous regions’ (ibid. p.488). The results confirmed the existence of a pure relational component of the autocorrelation phenomenon which determines the innovative performance of a region, together with the already known geographical component. However, this does not adequately tackle the estimation problem. If the innovative performance of a region (which may be partly explained by an internal knowledge production function) is influenced by both its geographical and relational neighbour regions, then any estimation based on a model specifying one definition of contiguity at a time (either relational or geographical) will produce biased estimates due to omitted variables.
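The construction of \(W^{{r}-{g}}\) can be illustrated with a minimal sketch. It assumes that both contiguity indices are binary (0/1) and that negative differences are set to zero, so that only links between non-contiguous regions survive; neither detail is stated in the text, and the function name `pure_relational` and the three-region example are purely hypothetical.

```python
def pure_relational(w_rel, w_geo):
    """Keep relational links only where regions are NOT geographically contiguous.

    Assumption: both inputs are binary square matrices (lists of lists);
    this is equivalent to max(w_rel - w_geo, 0) element by element.
    """
    n = len(w_rel)
    return [
        [1 if w_rel[i][j] == 1 and w_geo[i][j] == 0 else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: regions 0 and 1 share a border; region 0 also
# collaborates with the distant region 2.
w_geo = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
w_rel = [[0, 1, 1],
         [1, 0, 0],
         [1, 0, 0]]
w_rg = pure_relational(w_rel, w_geo)
```

In this toy case only the tie between regions 0 and 2 remains, which is the kind of ‘pure relational’ connection the quoted passage refers to.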

In the present paper, we estimate a SAR model with two different weight matrices to detect the existence of ‘spatial’ autocorrelation arising from both geographical and relational behaviours and dynamics (following Doreian 1989 and Lacombe 2004). Following Paci et al. (2014), we apply specific econometric techniques in order to test different non-nested model specifications.

We hypothesize that the innovative performance of a region is determined primarily by a region-specific knowledge production function and is influenced also by both geographically contiguous and relationally proximate regions. Thus, any estimate that does not account for these three factors will misspecify the data, leading to biased and inconsistent estimates.

2.2 From membership to knowledge flows

As mentioned above, data on joint research networks funded by the EU under FP5, which are available at the CORDIS website,Footnote 3 record the names and locations of participating institutions, their status (coordinator or participant) and, in most cases, the amount of funds granted.

FP5 was a 5-year programme that started in 1998 and concluded in 2002,Footnote 4 aimed at integrating different research areas and developing a critical mass of European science and technology resources. The total number of contracts financed by FP5 is 16,085 involving total funding of about €12,000 million. Using this database, we select only contracts with a network structure (mainly joint research projects); our analysis is based on 6,755 institution networks (42 % of total FP5 contracts) with an average membership of 7 (1 coordinator plus 6 participants). The geographical scope of the analysis is limited to the 171 NUTS2 regionsFootnote 5 in the EU 15 countries. Since we are interested in the structure of knowledge flows within these collaborative research networks, alternative and specific hypotheses about how knowledge flows effectively within networks are defined and tested.

In order to define the structure of a research network, we use the simple taxonomy proposed by Maggioni and Uberti (2011), which considers two dimensions of knowledge flows (direction of links and structure of the network) and their combinations (Fig. 1).

Fig. 1 Network structures of collaborative research contracts. Source: based on Maggioni and Uberti (2011)

Figure 1 illustrates the case of a very small and simple research network composed of one coordinator and four participants. According to this taxonomy, knowledge can flow in several ways within a network, resulting in four different relational structures.Footnote 6 First, links (i.e. knowledge flows) can be reciprocal and the underlying network structure will be hierarchical if there are mutual, egalitarian, but exclusive ties between the coordinator and each participant (Fig. 1, panel A). In this case, the network structure is a star, with a very high centralization value, where the symmetry of relations guarantees mutual exchange of knowledge that is filtered by the pivotal player.

Second, knowledge could flow within the set of agents irrespective of their structural position (Fig. 1, panel B). This complete structure reflects the absence of hierarchy within the network (indeed all centralization indexes are equal to zero) and knowledge flows freely among all the actors. There is no coordination and/or brokerage of knowledge and information, and all agents have the equal status of ‘member’.

The assumption of tie reciprocity could easily be relaxed were we to assume the existence of different levels of knowledge stocks between the coordinator and participants in terms of emission of knowledge and absorptive capacity, and two structures would emerge according to the hierarchy within the network.

If knowledge flows involve exclusive relations between the coordinator and each single participant as in the star structure, but, in contrast to Fig. 1, panel A, there is no mutual and balanced exchange of knowledge between them, two alternative structures emerge: a bottom-up structure (i.e. knowledge flowing from participants to coordinator), as in Fig. 1, panel C, and a top-down structure (i.e. knowledge flowing from coordinator to participants), as in Fig. 1, panel D.

A further network structure is characterized by no reciprocity of links and no hierarchy (Fig. 1 panels E and F): in this case, each member exchanges knowledge locally and exclusively with his/her next neighbour (clockwise CW or anticlockwise ACW direction), and two wheel structures of knowledge flows emerge, where all members are interchangeable and there is no central node. A wheel structure achieves global transmission of knowledge only through multiple passages of local links. Wheel structures may provide micro-economic advantages, as shown by Jackson (2008); however, FP5 contracts were designed to promote knowledge diffusion across all members. Since wheel-like structures seem most unlikely to describe effective knowledge flows within a FP5 research network, we exclude them from the econometric analysis.

Various reasons have been proposed in the literature (Cowan and Jonard 2003, 2004; Vega-Redondo 2007; Goyal 2007; Jackson 2008) for why knowledge flows follow a given structure within a research contract. A complete network, because of its resilience, yields maximum effective knowledge transmission; a star structure ensures maximum efficiency given the small transmission costs; a bottom-up structure is able to combine different knowledge inputs from several sources; a top-down structure allows rapid diffusion of content from a central node.

Here, we do not provide a detailed discussion of the strengths and weaknesses of the different network structures, nor do we rank them according to given criteria. The aim of this exploratory analysis is to identify which of the different layouts (combinations of network structures and link weights) best describes the relational dependence between the innovative activities of different European regions.

We acknowledge that by introducing a spatial autocorrelation matrix based on the relational behaviour of regional scientists and technicians, along the ‘communication’ channel of social influence (Leenders 2002), we are introducing potential endogeneity within the weights matrix. However, we believe that the level of potential endogeneity is acceptable, since the relational behaviour at regional level is based on the joint participation of regional actors in R&D contracts financed by FP5, and the dependent variable is the innovative performance of the region, measured by patenting intensity.

2.3 How to weight knowledge flows

There is also the problem of the value to be attributed to the links within a research network, and the use of binary versus weighted networks to measure the existence and amount of knowledge exchanged (and/or transferred) within the network. This is part of a more general problem related to social network analysis (SNA) which has been thoroughly discussed in the literature (Fagiolo et al. 2007; Fagiolo 2010; Opsahl and Panzarasa 2009; Opsahl et al. 2010; Barigozzi et al. 2010).

Figure 2 (derived from Fagiolo et al. 2007) represents a taxonomy of link typologies: a link value can be binary (B), reflecting the presence or absence of a relation; or weighted (W), if the link presents a value different from 0. A link can be undirected (U) if there exists symmetry of relations (as in Fig. 1, panels A and B) or directed (D) if the direction of the relation is relevant (as in Fig. 1, panels C and D).

Fig. 2 A taxonomy of network types based on weights and direction of links. Source: Fagiolo et al. (2007)

These four network typologies can be ranked in ascending order of analytical difficulty: BUN, BDN, WUN and WDN.Footnote 7 While most relevant economic applications of SNA should be treated as WDN, most research analyses are based on BUN because of dichotomization and symmetrization procedures which are far from neutral.

In hypothesizing about how to use the membership information recorded in the CORDIS database, to represent actual knowledge flows, we formulated three alternative ways to assign weights to each link in a given contract:

  • we count as 1 each and every link described by the chosen network structure irrespective of the number of nodes in the network. Thus, we assume that a greater amount of knowledge is exchanged and/or transferred within a large network compared with a small network, and indirectly assume that there are no ‘budget constraints’ on the relational capacity of a node. We indicate this modality as 1;

  • we count as \(1/N\) (where \(N\) is the total number of nodes in a given network) each and every link described by the chosen network structure so as to take account of the limited relational capacity of a node within a network. We indicate this modality as N;

  • we count as \(1/L\) (where \(L\) is the number of links in a given network) each and every link described by the chosen network structure so as to take account of the limited relational capacity of a network, which may depend nonlinearly on the number of nodes. We indicate this modality as L.
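The combination of the four structures of Fig. 1 (panels A to D) with the three weighting modalities can be sketched as follows. This is only an illustration under stated assumptions: node 0 is taken to be the coordinator, a reciprocal tie is stored as two directed links, and \(L\) is counted over directed links; the function name `contract_links` is hypothetical and does not come from the paper.

```python
from itertools import permutations


def contract_links(n_nodes, structure, weighting):
    """Return {(sender, receiver): weight} for a single contract.

    Assumptions (not from the source): node 0 is the coordinator,
    nodes 1..n_nodes-1 are participants, and L counts directed links.
    structure: 'A' reciprocal star, 'B' complete, 'C' bottom-up, 'D' top-down.
    weighting: '1' (unit weight), 'N' (1/N) or 'L' (1/L).
    """
    participants = range(1, n_nodes)
    if structure == "A":
        links = [(0, p) for p in participants] + [(p, 0) for p in participants]
    elif structure == "B":
        links = list(permutations(range(n_nodes), 2))
    elif structure == "C":
        links = [(p, 0) for p in participants]
    elif structure == "D":
        links = [(0, p) for p in participants]
    else:
        raise ValueError(f"unknown structure: {structure}")

    if weighting == "1":
        weight = 1.0
    elif weighting == "N":
        weight = 1.0 / n_nodes
    elif weighting == "L":
        weight = 1.0 / len(links)
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    return {link: weight for link in links}


# The five-node contract of Fig. 1 under a top-down layout with 1/N weights.
example = contract_links(5, "D", "N")
```

Under the L modality the weights of a contract sum to one whatever its size, which is one way of reading the ‘limited relational capacity of a network’ described above.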

On the basis of these assumptions, it is possible to build 12 different knowledge flow layouts (4 structures \(\times \) 3 link weights) for each joint research network funded by the EU. However, since the paper focuses on innovative performance at the regional level, we aggregate the joint research contracts established among research institutions (and, less frequently, firms) and transform them into region-based networks.

This procedure follows three steps:

  • first, we geo-localize (according to NUTS2 classification) each actor involved in the selected network contracts, distinguishing between coordinators and participants within each contract;

  • second, we re-code individual contract data on a regional basis;Footnote 8

  • third, for each region, we sum all contracts included in FP5 which involve the region’s institutions.Footnote 9
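The three steps above can be sketched in a few lines. The sketch assumes a top-down layout with unit weights and drops links between institutions located in the same region; the treatment of such links is not specified in the text, and the region codes, the helper `top_down_star` and the function `aggregate_to_regions` are hypothetical illustrations rather than the authors' procedure.

```python
from collections import defaultdict


def top_down_star(n_nodes):
    """Unit-weight links from the coordinator (node 0) to each participant."""
    return {(0, p): 1.0 for p in range(1, n_nodes)}


def aggregate_to_regions(contracts, link_builder=top_down_star):
    """Sum contract-level links into region-to-region flows.

    contracts: list of contracts, each a list of region codes with the
    coordinator's region first (steps 1 and 2: geo-localize and re-code).
    Returns {(region_i, region_j): total weight} (step 3: sum over contracts).
    """
    z = defaultdict(float)
    for regions in contracts:
        for (sender, receiver), weight in link_builder(len(regions)).items():
            i, j = regions[sender], regions[receiver]
            if i != j:  # assumption: within-region links are ignored
                z[(i, j)] += weight
    return dict(z)


# Two hypothetical contracts; codes are illustrative NUTS2-style labels.
contracts = [
    ["ITC4", "FR10", "DE21"],          # coordinator in ITC4
    ["FR10", "ITC4", "ITC4", "UKI1"],  # coordinator in FR10, two partners in ITC4
]
z = aggregate_to_regions(contracts)
```

Each entry of `z` plays the role of one element \(z^{m}_{ij}\) for the chosen layout; repeating the exercise with other structures and weights would give the remaining layouts.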

The final resultFootnote 10 is a square matrix \(\mathbf{Z}^\mathbf{m}\) (\(171 \times 171\)) for each network layout (i.e. 12 combinations of network structure and link weights), where the rows and columns are European regions and the generic element \(z^{m}_{ij}\) measures the extent of the knowledge flows between region \(i\) and region \(j\) as described by a given \(m\) network layout.

Since we want to account for both the relational and the geographical dimensions of knowledge flows, we use the above-mentioned 12 \(\mathbf{Z}^\mathbf{m}\) matrices as relational weight matricesFootnote 11 and the first-order contiguity matrix as a geographical weight matrix, in the spatial econometric analysis of a regional knowledge production function performed in Sects. 4 and 5.

3 Research questions

The main hypothesis in this paper is that region \(i\)’s innovative output, measured by patenting intensity, is explained by regional innovative inputs and structural characteristics, and by some ‘spatial’ autocorrelation effects, which may arise from geographical knowledge spillovers and/or relational knowledge barter exchanges mediated by specific network layouts.

Through a series of spatial econometric exercises described in Sects. 4 and 5, we test the existence and extent of this relational autocorrelation.

More generally, the empirical analysis tests the following research questions (RQ):

RQ1: Does each of the theoretically imposed layouts at the individual contract level univocally determine the inter-regional network structure of knowledge flows as defined by FP5?

Despite the high variance in databases, levels of analysis and estimation methods, the economic literature on research networks financed under the EU Framework programmes (Autant-Bernard et al. 2007; Balland 2012; Caloghirou et al. 2004; Breschi and Cusmano 2004; Protogerou et al. 2010; Lata and Scherngell 2010; Scherngell and Barber 2009, 2011) shows the existence of an oligopolistic structure in which a restricted number of institutions located in central, high-income regions play a major role along a core-periphery pattern. Thus, we can expect the interregional network structure also to be heavily influenced by the spatial distribution of members and coordinators in the different European regions. We tackle this research question in Sect. 3.1.

RQ2: Do actual knowledge flows in FP5 joint research networks follow a complete or a hub-and-spoke structure?

The theoretical literature on network structure arising from the micro-based game-theoretical approach (surveyed in three books: Goyal 2007; Vega-Redondo 2007; Jackson 2008) or from the heterogeneous agents, simulation and/or experimental approaches (Cowan and Jonard 2003, 2004; Maggioni 2004; Callander and Plott 2005; Cassi and Zirulia 2008; Morone and Taylor 2004; Goeree et al. 2009), and the large literature on the effects of network structure on the innovative performance of individual nodes (i.e. individual scientists, firms and regions), provide evidence of advantages related to different network structures. In particular, while a small-world structure is considered the most efficient layout to maximize the average content of a scientific network, it may be unfit for equity reasons and not preferred by voluntary aggregations of research institutions. The hub-and-spoke structure is an efficient layout and is easily implemented when the balance of power among network members is very unequal.Footnote 12 Due to the lack of consensus in this literature, we do not have clear ex-ante expectations of a prevailing structure. We discuss this further in Sect. 5.

RQ3: Is there a trade-off between the size of the scientific and technological network of a region and its effectiveness in influencing the innovative performance of the same region? How can we measure this trade-off?

While it is clear that a large network provides advantages related to the number of knowledge sources that can be accessed, there is theoretical evidence that size advantages may be constrained by time and by the number of relations manageable by network members and by the network coordinator in a hub position (Jackson and Wolinsky 1996; Goyal and Joshi 2003). In this case, a hierarchical network layout with decreasing returns to the number of nodes (or links) should perform better in relation to knowledge flows within regional joint research networks for innovative performance. We discuss this in Sect. 5.

3.1 Network structures at contract and regional level

The process of passing from individual contract level to regional level is driven by two main factors: the attribution of a given theoretical network layout and the spatial distribution of FP5 coordinators and participants across European regions. Thus, starting from a very hierarchical layout at the contract level, the regional level network structure could be egalitarian if the distribution of coordinators is sufficiently equal across regions.

Table 1 and Fig. 3 formalize and extend the analysis by comparing the network structures in the entire FP5 at regional level with ideal–typical representations of simulated networks with the same number of nodes for each given layout. For example, the actual network derived from the aggregation of contracts along a bottom-up layout, as in Fig. 1, panel C (C henceforth), is compared to a 171 node star-shaped network in which all coordinators are located in the same region; or the actual network derived from the aggregation of contracts along a complete layout, as in Fig. 1, panel B (B henceforth), is compared to a complete network of 171 nodes.

Fig. 3 Distribution of FP5 research contracts at the regional level

Table 1 Simulated and actual networks statistics at the regional level

The similarity between the two networks (actual and simulated) is crucially dependent on the spatial distribution of institutions and organization members in the joint research contracts financed by FP5, across the European regions. In particular, for coincidence between the actual and the simulated networks, the C layout requires that the coordinators of all contracts should be located in one region, while the B layout requires that every contract should involve one institution per European region and, therefore, that contracts are equally distributed across all regions.

Figure 3 describes the extremely skewed and highly correlated spatial distribution of FP5 coordinators and participants across the 171 European regions.Footnote 13

In order to measure the similarity of actual regional networks based on different layouts, to their corresponding ideal types (i.e. hub and spokeFootnote 14 versus complete networks), for each theoretical layout, we simulate an ideal-type 171 node network and compute some key network indexes—i.e. density, degree and betweenness centralizationFootnote 15—for both actual and simulated networks (see Table 1).
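The two ideal types can be generated and summarized with a short sketch. It assumes binary, undirected networks and uses Freeman's degree centralization, which may differ from the exact index definitions adopted in the paper; betweenness centralization is omitted for brevity, and all function names are hypothetical.

```python
def star(n):
    """Binary undirected star: node 0 is tied to every other node."""
    adj = [[0] * n for _ in range(n)]
    for j in range(1, n):
        adj[0][j] = adj[j][0] = 1
    return adj


def complete(n):
    """Binary undirected complete network on n nodes."""
    return [[int(i != j) for j in range(n)] for i in range(n)]


def density(adj):
    """Share of possible ties (off-diagonal cells) that are present."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def degree_centralization(adj):
    """Freeman degree centralization: 1 for a star, 0 for a complete network."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Ideal-type networks with one node per region (171 regions).
hub_and_spoke = star(171)
fully_connected = complete(171)
summary = {
    "star": (density(hub_and_spoke), degree_centralization(hub_and_spoke)),
    "complete": (density(fully_connected), degree_centralization(fully_connected)),
}
```

The same two functions could then be applied to a dichotomized version of an actual regional network in order to place it between the two benchmarks.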

Using a procedure proposed by Snijders and Borgatti (1999), we can test whether the densities of actual and simulated networks differ to a statistically significant extent. In particular, while actual networks derived from star-shaped layouts (A, C and D in Fig. 1) record a higher number of links than their simulated version, the actual B network displays a smaller value of the density index with respect to the simulated complete network where all possible links are established.

The values of degree and betweenness centralization of actual networks for symmetrical layouts (A and B) mimic the ranking in the simulated networks: actual A networks are more hierarchical than actual B networks. Results are mixed for asymmetrical layouts (C and D).

4 The model and estimation strategy

The empirical analysis consists of testing a knowledge production function, KPF (Griliches 1979; Romer 1990; Jones 1995), which describes the innovative output of a region as a function of different innovative inputs (i.e. different sources of R&D expenditure), other control variables characterizing the innovative and productive structure of each region, geographical accessibility and the role of the given region in the FP5 network. The implicit form of an enlarged KPF at the regional level is defined as follows:

$$\begin{aligned} \mathrm{PAT}_{i}^{t}=f\left( {\mathrm{BIZRD}_i^\tau ,\mathrm{GOVRD}_i^\tau ,\mathrm{UNIRD}_i^\tau ,\mathrm{PROD}_i^\tau ,\mathrm{INN}_i^\tau ,\mathrm{ACCESS}_i^\tau ,\mathrm{BETW}_i^\tau } \right) \end{aligned}$$
(1)

where the dependent variable (\(\mathrm{PAT}_{i}^{t}\)) is the number of patent applications to the European Patent Office (whose geographical location is recorded based on inventor’s location) per million labour force registered at time \(t\) in region \(i\) (source: OECD 2010). This variable is the average value for period \(t\), i.e. 2005 and 2006, for 171 European regions at different NUTS levels (see Appendix Table 6).

Since we are interested in analysing how different sources of knowledge production affect patenting activity, we considered three different types of R&D intensity, expressed as share of regional GDP: business R&D (\(\mathrm{BIZRD}_{i}^{\tau }\)), government R&D (\(\mathrm{GOVRD}_{i}^{\tau }\)) and university R&D expenditure (\(\mathrm{UNIRD}_{i}^{\tau }\)) (source: Eurostat 2010a).

Following Glaeser et al. (1992), in order to test whether specialization, or differentiation, of the region’s productive and the innovative structure positively influences its innovative output, we included \(\mathrm{PROD}_{i}^{\tau }\) and \(\mathrm{INN}_{i}^{\tau }\), that is, location quotients calculated, respectively, for local units in high-tech sectors and for high-tech patents (source: Eurostat 2010a, b).
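A location quotient compares a region's share of activity in a given class with the corresponding share in a reference area. The sketch below assumes that the reference area is the sum of all regions in the sample, which the text does not state; the function name and the figures are hypothetical.

```python
def location_quotients(high_tech, total):
    """Location quotient per region: regional share / reference share.

    high_tech, total: lists with one value per region (e.g. high-tech local
    units and all local units). Assumption: the benchmark is the sample total.
    """
    reference_share = sum(high_tech) / sum(total)
    return [(h / t) / reference_share for h, t in zip(high_tech, total)]


# Three hypothetical regions.
lq = location_quotients(high_tech=[30, 10, 20], total=[100, 100, 200])
```

Values above 1 indicate relative specialization in the high-tech class, and values below 1 indicate relative under-representation.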

\(\mathrm{ACCESS}_{i}^{\tau }\) is the multimodal accessibility index, a measure of combined (air, road and rail) accessibility of a region. The index derives from the transformation of absolute values for each region so that the European value is 100 (source: Espon 2010).

All regressors are computed for each region \(i\) at time \(\tau \), that is, as an average for the period 1999–2004. This allows us to smooth away individual years’ variations, to take account of the time lag between R&D expenditure and patent application and to cope with the relevant problem of missing values at regional level.

In order to detect the relevance of the structural position of regions within FP5 research networks, we included \(\mathrm{BETW}_{i}^{\tau }\), betweenness centralityFootnote 16 of each region \(i\). This variable is a proxy for regional control, thanks to the bridging position of the region, of the diffusion of scientific and technical knowledge across research networks stretching across Europe.Footnote 17

As described in Sect. 2, innovative activity and several other economic phenomena are characterized by agglomeration and spillovers; hence, simple OLS estimations could be biased and spatial econometric techniques are required.

In order to detect whether spatial autocorrelation is relevant to this analysis, as a preliminary investigation, we compute Moran’s I on the dependent variable, \(\mathrm{PAT}_{i}^{t}\), with respect to a geographical (in this case first-order geographical contiguity, henceforth GEO)Footnote 18 matrix and different relational weight matrices.Footnote 19
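For reference, Moran's I for a variable \(x\) and a weight matrix with elements \(w_{ij}\) is \(I=(n/S_0)\sum_i\sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})/\sum_i (x_i-\bar{x})^2\), where \(S_0\) is the sum of all weights. A minimal sketch follows; the four-region example and the function name are hypothetical, and inference (permutation or normal approximation) is left out.

```python
def morans_i(values, weights):
    """Moran's I of `values` with respect to a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical regions on a line: 0-1, 1-2 and 2-3 are contiguous.
contiguity = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], contiguity)       # similar neighbours
alternating = morans_i([1.0, 4.0, 1.0, 4.0], contiguity)  # dissimilar neighbours
```

The same function accepts a relational matrix in place of the contiguity matrix, which is how a ‘relational’ Moran's I can be read.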

Having detected the presence of positive and significant ‘spatial’ autocorrelation (see Appendix Table 8 and Sect. 5 for further details), we proceed to estimate a double-log specification in order to estimate the autocorrelation effects:

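$$\begin{aligned} \ln \mathrm{PAT}_i^t =\beta _0 +\beta _1 \ln \mathrm{BIZRD}_i^\tau +\beta _2 \ln \mathrm{GOVRD}_i^\tau +\beta _3 \ln \mathrm{UNIRD}_i^\tau +\beta _4 \ln \mathrm{PROD}_i^\tau +\beta _5 \ln \mathrm{INN}_i^\tau +\beta _6 \ln \mathrm{ACCESS}_i^\tau +\beta _7 \ln \mathrm{BETW}_i^\tau +\zeta _i^\tau \end{aligned}$$

where all variables are the natural logarithms of the above variables, \(\beta _0 ,\ldots ,\beta _7\) are the coefficients to be estimated, and \(\zeta _i^\tau \) is the error term.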

Based on the seminal contribution of Anselin (1988), the spatial econometric literature proposes several models to deal with the problem of spatial spillovers in cross-sectional data.Footnote 20 A general spatial autoregressive model including both the spatially autoregressive error term (indicated by the coefficient \(\lambda \)) and the spatial lag on the dependent variable (indicated by the coefficient \(\rho \)) can be defined as follows:

$$\begin{aligned} y=\rho W_{1} y+X\beta +u\quad \hbox { and }\quad u=\lambda W_2 u+\varepsilon \end{aligned}$$
(2)

where \(W_{1}\) and \(W_{2}\) are square spatial weight matrices and could be identical, and \(\varepsilon \) is the error term.

Imposing some restrictions on the weights of the previous model, that is, \(W_{1}=0\) or \(W_{2}=0\), two different spatial autoregressive models can be tested—the spatial error model (SEM), if \(W_{1}=0\):

$$\begin{aligned} y=X\beta +u\quad \hbox { and }\quad u=\lambda W_2 u+\varepsilon \end{aligned}$$
(3)

and the spatial autoregressive modelFootnote 21 (SAR), if \(W_{2}=0\):

$$\begin{aligned} y=\rho W_{1} y+X\beta +u \end{aligned}$$
(4)

Since spillover effects could also be attributed to the explanatory variables, a spatial Durbin model (SDM) can be defined:

$$\begin{aligned} y=\rho W_1 y+X\beta _1 +W_1 X\beta _2 +\varepsilon \end{aligned}$$
(5)

Model selection is complex and remains an open issue; however, several econometric tests, in conjunction with explicit theoretical assumptions about the mechanisms of transmission of knowledge spillovers, help to select the most suitable model specification (Elhorst 2010).

In this analysis, spatial weight matrices (usually indicated with \(W\)) are, alternatively, the geographical matrix (GEO) and one of several relational matrices, derived from different structures and link weights (REL) as defined in Sects. 2.2 and 2.3. Spatial econometric procedures require normalization of weight matrices. Here, we adopt row normalization.Footnote 22

Since the objective of the analysis is to consider the joint presence of geographical and relational effects, we estimate a two-weight SAR model along the lines of Lacombe (2004) and LeSage and Pace (2009), defined as follows:

$$\begin{aligned} y=\rho _\mathrm{GEO} W_\mathrm{GEO} y+\rho _\mathrm{REL} W_\mathrm{REL} y+X\beta +\varepsilon \end{aligned}$$
(6)

where \(W_\mathrm{GEO}\) and \(W_\mathrm{REL}\) are the geographical and the relational weight matrices, so that the geographical lag (\(\rho _\mathrm{GEO}\)) and the relational lag (\(\rho _\mathrm{REL}\)) can be estimated jointly.Footnote 23
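As an illustrative step that follows directly from equation 6 (it is simple algebra and not an additional estimation result), solving for \(y\) shows how the two lags operate jointly through a single spatial multiplier:

$$\begin{aligned} y=\left( I-\rho _\mathrm{GEO} W_\mathrm{GEO} -\rho _\mathrm{REL} W_\mathrm{REL} \right) ^{-1}\left( X\beta +\varepsilon \right) \end{aligned}$$

With row-normalized weight matrices, \(|\rho _\mathrm{GEO}|+|\rho _\mathrm{REL}|<1\) is a sufficient (though not necessary) condition for the inverse to exist. A shock in one region is then transmitted both to its contiguous regions and to its research partners, and from these onwards to their own neighbours.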

5 Results

Since simple computation of Moran’s I shows the presence of spatial autocorrelation in patenting activity (Appendix Table 8), we need to identify its source and control for it. Table 2 includes values for Moran’s I on the residuals in model 1 and presents some diagnostic tools, that is, Lagrange multiplier (LM) and robust LM, used to detect the presence of the spatially autoregressive error term (and its coefficient \(\lambda \)) or the spatial lag of the dependent variable (and its coefficient \(\rho \)).

Table 2 Moran’s I calculated on residuals for different weight matrices and LM for SAR and for SEM

Following Florax et al. (2003) and adopting the ‘classical’ approach, the LM test indicates the presence of spatial autocorrelation for the geographical weight matrix (GEO) and all \(C\) layouts, among all possible layouts with different link weights and directions. In particular, the classical approach for \(C\) and GEO suggests that the model should be estimated including a spatial lag term; for the remaining relational weights matrices (\(A,\, B\) and \(D\)), LM tests do not detect any spatial dependence.

Thus, according to this first econometric exercise, none of the research network structures at the individual contract level (except the bottom-up \(C\) structure) produces an autocorrelation effect on the region’s innovative performance.

Adoption of a ‘hybrid’ specification strategy based on LM robust values produces very similar results: \(C\) layouts are the only weights matrices that correct for the effect of spatial autocorrelation, in a spatial lag specification. However, if we use LM robust values, we cannot choose between a SAR and SEM model specification when a GEO weights matrix is used.

We decided to include GEO in the following estimations since spatial autocorrelation is detected in the residuals (the ratio of Moran’s I over the degrees of freedom is positive, 0.094, and significant at the 10 % level, while the same does not apply to \(A,\, B\) and \(D\)).

As highlighted above, these results suggest that the research network layout at the individual contract level is relevant for the effects at the aggregate level of knowledge flows on the innovative performance of a given region. In particular, if knowledge flows are described in terms of \(A\) (i.e. a star hierarchic structure with mutual knowledge exchange), \(B\) (i.e. a complete a-hierarchical structure with no core region) and \(D\) (i.e. a top-down hierarchical structure with flows of knowledge stemming from the coordinator of the joint research contract towards the other members), it is not possible to detect any effect of a relational knowledge barter exchange phenomenon influencing the level of regional innovative activity.

Following Marrocu et al. (2013b, p. 1490), ‘we rule out the spatial Durbin model on substantive grounds, for this specification implies that the influence of neighbouring territories on the innovative performance of a certain region is mediated also by their R&D investments, conditional on a given connectivity structure’.Footnote 24 This would require neighbours’ R&D investments to be productive across NUTS2 regions. Since this assumption is not realistic in the European context, we argue that it is reasonable to assume that innovation spillovers work through the effective level of knowledge achieved by neighbouring regions, proxied by the level of patenting intensity.

As a robustness check, we estimate the model employing a Spatial Durbin specification. The results clearly show no spatial autocorrelation for the independent variables, but the spillover effect on the dependent variable remains significant.Footnote 25

Therefore, throughout the empirical analysis, we adopt the spatial autoregressive specification, described in general terms in equation 4 and applied here as follows:

$$\begin{aligned} \hbox {Pat}_i^t&= \beta _0 +\beta _1 \hbox {BizRD}_i^\tau +\beta _2 \hbox {GovRD}_i^\tau +\beta _3 \hbox {UniRD}_i^\tau +\beta _4 \hbox {Prod}_i^\tau +\beta _5 \hbox {Inn}_i^\tau \nonumber \\&+\beta _6 \hbox {Access}_i^\tau +\beta _7 \hbox {Betw}_i^\tau +\rho _Z \hbox {WPat}_j^t +\nu _i^\tau \end{aligned}$$
(7)

where \(\rho _{Z}\) is the coefficient of the spatially lagged dependent variable patents, which can be computed alternatively for one geographical (GEO) and three relational weight matrices arising from \(C\) layouts, and three different link weights (\(C_{1},\, C_{N}\) and \(C_{L}\)).

From the coefficients of the independent variables in Table 3, it is evident that the only positive and significant R&D coefficient is related to business activity, BizRD. The coefficients of GovRD and UniRD are not significantly different from zero, probably because these two sources of finance mainly support basic research that is not directly patentable, or because of the institutional difficulties that differing national legislation imposes on individual scientists working in public universities and/or research centres who want to patent an innovation.

Table 3 Estimation results with one weight matrix at a time (SAR specification) (ML estimations)

The coefficient of the Access variable is never significant, showing that the relevant ‘centrality’ in European networks is probably related to socio-economic factors rather than mere logistics. The coefficients of Prod and Inn (which measure the high-tech specialization of the regional production and innovation system) are positive and significant, hinting at a role for specialization rather than differentiation as a source of innovation advantages, along the lines suggested indirectly by Glaeser et al. (1992).

The coefficient of Betw for any region in any joint research network according to the \(C\) layout of knowledge flows is negative and significant in all the models using GEO and \(C\) weight matrices. This slightly puzzling result is explained by considering this variable as signalling the ‘degree of interdisciplinarity’ of the regional scientific and technological population (universities, research institutions, firms, etc.) as suggested by Leydesdorff (2007) in the context of bibliometrics.Footnote 26 This result confirms that the region’s innovative performance depends on the specialization of its scientific and technological base.

Table 3 shows that both geographical spillovers and relational barter exchange, that is, the \(\rho _{Z}\) coefficients of the spatially lagged dependent variables for all the weight matrices included, significantly and positively influence the innovative activity of a given region. In relation to indirect effects,Footnote 27 we can see that, despite the smaller value of the coefficients of direct effects, R&D expenditure and regional productive specialization affect neighbouring regions through the spatial multiplier mechanism; this does not apply to other control variables.
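The spatial multiplier mechanism behind the direct and indirect effects can be sketched as follows, using the summary measures of LeSage and Pace (2009) for a SAR model: the average direct effect is the mean diagonal element of \((I-\rho W)^{-1}\beta _k\), the average total effect is the mean row sum, and the indirect effect is their difference. The weight matrix, \(\rho \) and \(\beta _k\) below are hypothetical values. As a simplifying assumption, the inverse is approximated by a truncated power series \(I+\rho W+\rho ^2 W^2+\cdots \), which converges for a row-normalized \(W\) and \(|\rho |<1\).

```python
def mat_mul(A, B):
    """Plain matrix product of two lists of lists."""
    m, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(len(A))]

def spatial_multiplier(W, rho, terms=200):
    """Approximate (I - rho W)^(-1) by I + rho W + rho^2 W^2 + ... (truncated)."""
    n = len(W)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for _ in range(terms):
        # After k iterations, term holds rho^k W^k.
        term = [[rho * v for v in row] for row in mat_mul(term, W)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def sar_effects(W, rho, beta_k):
    """Average direct, indirect and total effects of covariate k in a SAR model."""
    n = len(W)
    S = spatial_multiplier(W, rho)
    direct = beta_k * sum(S[i][i] for i in range(n)) / n
    total = beta_k * sum(sum(row) for row in S) / n
    return direct, total - direct, total

# Hypothetical inputs: four regions on a line, row-normalized contiguity.
W_toy = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
rho_toy, beta_toy = 0.4, 0.5
direct, indirect, total = sar_effects(W_toy, rho_toy, beta_toy)
```

Because \(W\) is row-normalized, the total effect equals \(\beta _k/(1-\rho )\). The direct effect exceeds \(\beta _k\) because part of the impulse returns to the originating region through its neighbours.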

Since the innovative performance of a region is influenced by its geographical and relational neighbouring regions, we move a step forward. We test the joint effect of two weights matrices, given that any estimation, based on a model specifying only one definition of contiguity (i.e. either relational or geographical), would produce biased coefficients due to omitted variables.

Hence, we estimate a SAR model including both weight matrices, as follows:

$$\begin{aligned} \hbox {Pat}_i ^{t}&= \beta _0 +\beta _1 \hbox {BizRD}_i^\tau +\beta _2 \hbox {GovRD}_i^\tau +\beta _3 \hbox {UniRD}_i^\tau +\beta _4 \hbox {Prod}_i^\tau +\beta _5 \hbox {Inn}_i^\tau \nonumber \\&+\beta _6 \hbox {Access}_i^\tau +\beta _7 \hbox {Betw}_i^\tau +\rho _C W_\mathrm{REL} \hbox {Pat}_j^t +\rho _G W_\mathrm{GEO} \hbox {Pat}_j^t +\varepsilon _i^\tau \end{aligned}$$
(8)

where the terms \(\rho _{C}W_\mathrm{REL}\mathrm{PAT}^{t}_{j}\) and \(\rho _{G}W_\mathrm{GEO}\, \mathrm{PAT}^{t}_{j}\) represent the spatial lags of the dependent variable for the relational and geographical weight matrices, respectively.

Although the main aim of the paper was to investigate the internal structure of scientific and technological knowledge flows within the regional networks activated in Europe by FP5 and, thus, the econometric exercises focus on the autocorrelation coefficients, it is worth looking at the sign and significance of the coefficients of the covariates presented in Table 4 and their direct, indirect and total effects. These are similar to those obtained in Table 3: BizRD is the only research input that positively influences regional innovative activity; the coefficients of Prod and Inn are positive and significant, while that of Betw is negative and significant.

Table 4 Estimation results with two weight matrices jointly considered (SAR specification) (ML estimations)

Table 4 includes direct, indirect and total effects for all the significant variables in model 8. As expected, direct effects are always larger than indirect effects. In particular, the only positive and significant indirect effects are those associated with BizRD and Prod. This means that the innovative performance of a region (measured by patenting intensity) is explained mainly by own business sector R&D investment and specialization of both its production and innovation systems in high-tech sectors. It is also positively influenced by R&D investments and the high-tech specialization of neighbouring (both geographical and relational) regions.

Of more interest are the values of the spatially lagged dependent variables, \(\rho \). In column 1 of Table 4, with the combination of \(C_{1}\)-GEO—where the regional relational weight matrix is built on the basis of an assigned bottom-up research contract structure in which every link has the same weight (equal to 1)—both \(\rho _{C}\) and \(\rho _{G}\) are positive and significant, that is, both knowledge transmission mechanisms are in place.

When we model the structure of individual contracts with link weights reflecting the opportunity costs of a coordinator for establishing a new link (as in the models \(C_{N}\)-GEO, column 2 of Table 4, in which each link is valued as inversely related to the number of nodes in the network, and \( C_{L}\)-GEO, column 3 of Table 4, in which each link is valued as inversely related to the number of links in the network), geographical contiguity becomes insignificant, while relational proximity remains significant.

In order to compare the different \(\rho \) values (recorded in Tables 3 and 4), which are computed in different non-nested models, we follow Burnham and Anderson (2002) and compute Akaike weights, \(\mathrm{prob}_{j}\), as follows:

$$\begin{aligned} \mathrm{prob}_j =\frac{\exp \left[ {-\frac{1}{2}\left( {\mathrm{AIC}_{Cj} -\mathrm{AIC}_{C\,\mathrm{MIN}}} \right) } \right] }{\sum _{r=1}^R {\exp \left[ {-\frac{1}{2}\left( {\mathrm{AIC}_{Cr} -\mathrm{AIC}_{C\,\mathrm{MIN}} } \right) } \right] } } \end{aligned}$$

where \(j\) is the model, \(R\) is the number of models, and \(\hbox {AIC}_{C}\) is the bias-adjusted AIC value.
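The formula above can be sketched in a few lines. The bias-adjusted value is computed with the standard small-sample correction \(\mathrm{AIC}_{C}=\mathrm{AIC}+2k(k+1)/(n-k-1)\), where \(k\) is the number of estimated parameters and \(n\) the number of observations. All numerical inputs are hypothetical, and the probability-weighted mean of \(\rho \) is only an assumption about how a weighted average of the lag coefficients could be formed.

```python
import math

def aicc(aic, k, n):
    """Bias-adjusted AIC: AIC + 2k(k+1)/(n-k-1)."""
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

def akaike_weights(aicc_values):
    """prob_j = exp(-0.5 * delta_j) / sum_r exp(-0.5 * delta_r), delta = AICc - min(AICc)."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    norm = sum(rel)
    return [r / norm for r in rel]

def weighted_average(values, weights):
    """Probability-weighted mean of a coefficient across models."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical AIC values, parameter counts and lag coefficients for three models.
n_obs = 100
aic_values = [100.0, 101.5, 109.0]
n_params = [9, 10, 10]
rho_values = [0.30, 0.25, 0.10]

aicc_values = [aicc(a, k, n_obs) for a, k in zip(aic_values, n_params)]
prob = akaike_weights(aicc_values)
rho_bar = weighted_average(rho_values, prob)
```

The weights sum to one, and the model with the lowest \(\mathrm{AIC}_{C}\) receives the largest weight, so its \(\rho \) dominates the weighted average.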

These results are presented in Table 5; the main diagonal displays the estimated spatial lag coefficients in SAR specification (7) with a single weight matrix, and the off-diagonal values are the estimated spatial lag coefficients for SAR model (8), which jointly considers the geographical matrix and one relational matrix. The last column (weighted average) enables comparison of \(\rho \) across different matrices.

Table 5 Spatial lags coefficients (a weighted average comparison)

These results show that relational effects are stronger than geographical proximity: \(C_{1}\) is 2.5 times larger than GEO, \(C_{N}\) more than doubles the previous effect, and \(C_{L }\) has the highest value—eight times the geographical effect,Footnote 28 hinting at a more relevant role of intended knowledge barter exchange over unintended geographical spillovers across European regions, as suggested by Breschi and Lissoni (2001).

Also, the higher values of the weighted average of \(\rho \) associated with relational matrices suggest that the positive net effects related to knowledge flows enjoyed by the coordinator with the inclusion of a new member in the research network are counterbalanced by the coordination costs and budget constraints related to time and relational activity (along the lines of the co-authorship model in Jackson and Wolinsky 1996).

6 Conclusion

Regional innovation activity is a complex phenomenon where several (internal and external) forces are at play. A knowledge production function, which relates regional innovative inputs to regional innovative output, was employed to take account of the effects of both geographical and relational proximities.

In this paper, we modelled geographical proximity in terms of contiguity, as a measure of unintended knowledge spillovers, and relational proximity in terms of FP5 research contracts, as a measure of inter-regional intentional knowledge exchange among research institutions.

The analysis was not limited to detecting the presence of relational autocorrelation; we also designed an exploratory research methodology to look inside the ‘black box’ of joint research contracts, in order to identify which structures of knowledge flows are more effective for relational autocorrelation of innovative performance at regional level. We employed a spatial econometric specification to jointly consider the effects of geographical and relational autocorrelation (Doreian 1989; Lacombe 2004; LeSage and Pace 2009) and a statistical procedure that enabled us to compare \(\rho \) across non-nested models (Burnham and Anderson 2002).

There are many methodological issues that are not addressed; however, we consider this analysis to be a first exploratory attempt to jointly consider the geographical and the relational effects influencing the innovative performance of regions. First, our results confirm that relational autocorrelation is at work in influencing the innovative performance of European NUTS2 regions. Second, although relational autocorrelation theoretically could apply to all hypothesized layouts, only one typology of contract structure (i.e. the bottom-up layout \(C\)) appears to be relevant for influencing the patenting activity of a relationally defined neighbour region. This result suggests that, on the one hand, intentional knowledge exchanges mainly follow a hierarchical network structure, probably for efficiency reasons, and on the other hand, that research framework programmes may be good policy instruments to sustain the innovative performance of certain regions, but not to foster regional cohesion, since most coordinators are located in core regions.

When geography is included in the model in order to capture the spatial autocorrelation influencing innovative activity, all the standard results on this issue apply. As far as the estimated knowledge production function is concerned, innovative activity, proxied by patenting intensity, is supported mainly by the private sector (BizRD), production specialization (Prod) and an innovative (Inn) structure in high-tech sectors.

Third, when we consider C structures with more realistic weight links (such as 1/N and 1/L), which include the opportunity costs of enlarging the size of a research network, the pure geographical effect (deriving from pure unintended spillovers) loses its significance.

Several methodological issues concerning the possible endogeneity of relational weight matrices remain to be solved in the spatial econometric literature. Therefore, we believe that this paper contributes to an increasingly intense debate and may result in a fruitful research stream.