Introduction

This paper aims to address a gap in the literature studying innovation clusters by introducing an indicator which incorporates two phases of economic activity that have been treated as distinct in related studies: the creation of knowledge and the use of knowledge in production or, to put it in Schumpeterian terms, the ‘invention’ phase and the ‘innovation and diffusion’ phase.

By creating an indicator which captures clustering in both patents and manufacturing employment, we attempt to depict the presence of an environment with the potential to serve as a fertile context for interactive learning (Asheim, 2001; Lundvall, 1985, 1992a), where knowledge exploration can co-exist, and co-evolve, with knowledge exploitation. Such a context incorporates both the ‘formal’ and the ‘informal’ types of innovation processes, and the complementarity between the two can be expected to provide potential for the enhancement of regional competitiveness (Isaksen & Nilsson, 2011a; Karlsen et al., 2011).

Previous empirical attempts to operationalise the cluster concept have centred on inter-industry linkages based on employment and establishment co-location, skill use, and supplier relationships via input–output measures (Czamanski & Ablas, 1979; Delgado et al., 2014; European Cluster Observatory, 2014a; Feser & Bergman, 2000).Footnote 1

Porter (2003) identified clusters based on the statistically significant pairwise locational correlation between industries, which indicates industry relatedness. Ellison et al. (2010) examined a broad range of Marshallian forces shaping co-agglomeration using pairwise indices. While this methodology allows for the incorporation of multiple dimensions of cluster dynamics, the study of pairwise co-agglomeration limits the scope of cross-sectoral co-location that can be captured. Delgado et al. (2016) built on the aforementioned work and developed a novel cluster algorithm that incorporated measures of inter-industry linkages captured by co-location patterns, input–output links, and similarities in labour occupations. This approach has been used in the U.S. Cluster Mapping Project. As the authors noted, however, their methodology did not explicitly account for knowledge linkages.

Delgado (2020) underlined the need to account for the colocation of innovation and production in clusters and developed a methodology to measure it ex post in the cases of U.S. clusters defined by the aforementioned model. It is this dimension that our methodology seeks to introduce by explicitly measuring the colocation of innovation and production. To our knowledge, there has been no previous attempt to define clusters based on patterns of colocation of patenting and manufacturing. Our approach is, therefore, differentiated by its attempt to study the role of clusters that are not limited to production-related concentration but combine innovation and production.

Within this context, we can expect the presence of spillovers which may vary in direction, e.g. going back and forth between different modes of innovation across the innovation chain, which in the work of Srholec and Verspagen (2012) are identified as four distinct ‘ingredients’ of innovation strategies: research, user, external, and production. Spillovers may also occur between producer and final or intermediate user. Changes in intermediate demand contribute, according to Lorentz and Savona (2008), along with technical change, to the evolution of economies’ structural change and, consequently, to macroeconomic growth. Our indicator seeks to embody these circular cumulative growth dynamics based on the interaction between knowledge supply and demand.

By introducing a cluster-mapping approach which is free of any a priori assumptions regarding the types of activities that are ‘expected’ to be co-located, we allow for cluster patterns to emerge organically from our data and cut across different sectors, while also overcoming artificial boundaries between the generation of knowledge and the use of said knowledge in production. With the use of patent microdata, our aim is to capture concentration patterns that tend to have knowledge at their core, moving away from a strict focus on industry-related metrics. The resulting cluster indicator points towards the presence of a cognitive context which can be expected to be conducive to the generation, diffusion and absorption of innovation.

Literature Review

Attempting to incorporate the process of innovation in any type of economic analysis presents a fundamental challenge, since it is a broad and rather fluid conceptFootnote 2 whose only defining characteristic, as Schumpeter (1947, p. 151) noted, is ‘simply the doing of new things or the doing of things that are already being done in a new way’.

We seek to systematise the related literature by underlining three basic dimensions of economic and innovation activity we seek to capture with our cluster indicator.

Agglomeration Dynamics

Marshall’s (1890) work on local spillovers, which underlined the importance of positive externalities between agglomerated firms belonging to the same sector, identified agglomeration as a major factor influencing innovation and economic growth, based inter alia on the ‘industrial atmosphere’ present in a specific location, where the ‘secrets of industry are in the air’. Utilising patent data, Jaffe (1986) identified the presence of localised R&D spillovers and their potential impact on firms’ knowledge generation and profitability. The innovation systems approach (Freeman, 1987; Lundvall, 1992a, b; Nelson, 1993) drew on the concept of Marshallian externalities, but emphasised that a variety of actors affect the patterns of production, diffusion, and use of knowledge in economic activity within a specific geographic location, with its focus increasingly placed on the regional level (Asheim et al., 2005). The literature on clusters, which has grown rapidly following the influential work of Porter (1990, 1998), also underlined the role of the region as a key driver of growth and innovation due to localised spillovers, as do similar conceptual frameworks such as ‘learning regions’ (Morgan, 1997) and ‘innovative milieux’ (Aydalot, 1988). All the aforementioned terms are applied to illustrate a local context that favours the development of a learning-based economy (Doloreux & Parto, 2004). Focusing on geographically concentrated activity, therefore, can be viewed as the first step in the attempt to detect systems of enhanced innovation and productivity dynamics.

Coexistence of Knowledge Creation and Use

The second step is the identification of a local context where the creation and use of knowledge coexist. Traditionally, invention and innovation were often viewed as parts of a linear process, where one step distinctly follows the other. However, as Kline and Rosenberg (1986) noted, these two phases of the innovation cycle generate feedback mechanisms, referred to by Lundvall (1992a, b) as interactive learning between producers and users of knowledge. To return to Schumpeter’s aforementioned quote, doing new things may induce new ways, and vice versa. Along those lines, Cooke (2005, p. 3) described regional innovation systems as ‘interacting knowledge generation and exploitation subsystems’ at the regional level. As is the case with innovation, however, ‘knowledge’ is not a uniform concept. Polanyi (1958) distinguished between ‘tacit’ and codified knowledge, pointing out that knowledge is often not explicitly articulated but, like Marshall’s industry secrets, may exist ‘in the air’. Jensen et al. (2007) used this distinction to contrast two corresponding modes of innovation: the Science, Technology and Innovation (STI) mode, which is ‘based on the production and use of codified scientific and technical knowledge’, and the Doing, Using and Interacting (DUI) mode, which ‘relies on informal processes of learning and experience-based know-how’ (p. 680). The authors found that firms that combine both modes appear to be more innovative, while Isaksen and Nilsson (2011a, b) drew similar conclusions, noting that the complementarity of ‘formal’ and ‘informal’ types of innovation potentially contributes to increased innovative capacity and competitiveness at the level of regional innovation systems.
So far, most related empirical research on the operationalisation of innovation systems — a term which we will henceforth use interchangeably with the term ‘cluster’ — has failed to account for the combination of these two modes of innovation (Cruz & Teixeira, 2010; Lazzeretti et al., 2014).

Technology Relatedness and Spillovers

A third step in the identification and examination of clusters is to decide on how narrowly or widely to frame the cognitive space of such systems of innovation and production in terms of technologies and industries. Marshall’s aforementioned influential work — which was later built upon by Arrow (1962) and Romer (1987) — underlined the importance of externalities between firms belonging to the same sector. Jacobs (1969), on the other hand, emphasised the role of knowledge flows between different sectors mainly within the context of urbanisation economies. Similarly, Jaffe et al. (1993, p. 596) observed that knowledge spillovers are probably ‘not confined to closely related regions of technology space’. Literature findings have pointed towards the presence of both specialisation and diversification effects on regional economic performance (Beaudry & Schiffauerova, 2009). In regard to the evolution of technologies, Dosi's (1982) work has focused on the path dependent nature of technological change, with recent studies suggesting that regions branch into industries related to their existing activities (Corradini & Vanino, 2022; Neffke et al., 2011). Heimeriks et al. (2018) noted that the growing global technological base increases technological diversity, but also linkages between technologies, hence leading to increased complexity of knowledge ecosystems. Balland et al. (2019) attempted to depict technology relatedness and complexity in EU regions via the use of network-based techniques on patent data. Buccellato and Corò (2019) also depicted relatedness and complexity, but in terms of statistical industry classifications. 
When it comes to the methodological implications of the diminishing importance of fixed traditional sectoral boundaries on cluster mapping, Martin and Sunley (2003) noted that a significant limitation of ‘top-down’ cluster mapping exercises has to do with the fact that they study concentrations of economic activity on an industry-by-industry basis, hence disregarding linkages across industries which are central to the cluster concept. Along these lines, Srholec and Verspagen (2012, p. 1248) warned against a ‘mechanistic replication of taxonomies based on sectoral data’.

Operationalising the Literature

The indicator developed and presented in this paper incorporates the three aforementioned dimensions of the related literature as follows: the spatial agglomeration dimension is introduced via the use of location quotients, in order to capture the concentration of activity. The combined use of data on patenting and manufacturing helps embody different stages of the innovation process and consequently both formal and informal modes of interactive learning. And, finally, the use of principal component analysis on pooled data allows for the emergence of patterns of colocation that transcend traditional taxonomies of patenting and manufacturing activity, hence allowing for the inclusion of different branches that form part of the complex structure of innovation ecosystems.

Methodology: Patterns of Manufacturing and Patenting Co-Location

The first step of our analysis is to generate clusters for the year 2010,Footnote 3 based on the co-located concentration of manufacturing and patenting activity at the regional (NUTS 2) level. For manufacturing data, we utilise Eurostat’s Structural Business Statistics database. For patents, we use the OECD REGPAT database, which contains detailed regionalised patent data.Footnote 4

We use the location quotient (LQ) as an index of spatial concentration. The location quotient is an analytical statistic often used to measure the concentration of a certain economic activity in a region compared to a broader geographical entity. The European Cluster Observatory has applied this method in order to define employment-based clusters in NUTS regions in Europe (European Cluster Observatory, 2014b; European Commission, 2007). The widespread use of this type of methodology by researchers in related fields is facilitated by the relatively easy access to employment data. Apart from its simplicity, the location quotient has several advantages when it comes to spatial pattern analysis (Lu, 2000), including its ability to depict concentration in relation to a different ‘standard’ area, in our case different counties. In the context of the present study, the LQ is particularly appropriate for an additional reason: it is a metric which is comparable across different types of data, in this case data on employment and patenting. The patent LQs were constructed from the patent data of the OECD REGPAT database, which have been linked to regions according to the inventors’ and applicants’ addresses. The patent applications under examination in the present paper are the ones made to the European Patent Office. Regarding the year, address, and way of counting each patent application, certain choices were made, in accordance with the related guidelines set out in the OECD Patent Statistics Manual (OECD, 2009). The year was defined according to the priority date, which indicates the first date of filing of the patent application and therefore can be considered the one closest to the actual invention date.
The address considered was that of the inventor, since it gives information about innovation activity in the specific region, while the applicant’s address, which refers to the location of the company that owns the patent, may be in a different country. In cases of patents with multiple inventors, the method used was that of fractional counting, which attributes to each region the percentage which reflects its contribution to the patent. Equal weights were assigned to each contribution.
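As an illustration of the fractional counting described above, the following is a minimal sketch, assuming each patent record carries one region code per inventor and that each inventor receives an equal share. The function name, the patent identifiers, and the region codes are hypothetical and are not taken from REGPAT.

```python
from collections import defaultdict

def fractional_counts(patents):
    """Attribute each patent to regions with equal weight per inventor.

    `patents` maps a patent id to the list of its inventors' region codes
    (one entry per inventor). All ids and codes used here are hypothetical.
    """
    totals = defaultdict(float)
    for inventor_regions in patents.values():
        if not inventor_regions:
            continue  # no inventor address: nothing to attribute
        share = 1.0 / len(inventor_regions)  # equal weight per inventor
        for region in inventor_regions:
            totals[region] += share
    return dict(totals)

# Two illustrative applications: one with three inventors, one with a single inventor.
example = {
    "EP0000001": ["DE11", "DE11", "FR10"],
    "EP0000002": ["FR10"],
}
counts = fractional_counts(example)
```

Under this scheme the regional counts always sum to the number of patents, so multi-inventor applications are not double counted.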

Manufacturing employment LQFootnote 5Footnote 6:

$$\frac{\;\frac{\text{Manufacturing subsector regional employment}}{\text{Manufacturing total regional employment}}\;}{\;\frac{\text{Manufacturing subsector EU employment}}{\text{Manufacturing total EU employment}}\;}$$

Patent LQ:

$$\frac{\;\frac{\text{IPC class regional patents}}{\text{Total regional patents}}\;}{\;\frac{\text{IPC class EU patents}}{\text{Total EU patents}}\;}$$
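Both quotients share the same structure: a regional share divided by a reference-area share. A minimal sketch of that computation follows, assuming a regions-by-categories array of counts whose rows together cover the whole reference area (so that the EU totals are simply the column sums). The function name and the numbers are illustrative only.

```python
import numpy as np

def location_quotients(counts):
    """Return the LQ matrix for a (regions x categories) array of counts.

    The reference area is taken to be the sum over all rows, i.e. the rows
    are assumed to cover the whole reference area (here, the EU).
    """
    counts = np.asarray(counts, dtype=float)
    regional_share = counts / counts.sum(axis=1, keepdims=True)
    reference_share = counts.sum(axis=0) / counts.sum()
    return regional_share / reference_share

# Hypothetical employment counts: 3 regions x 2 manufacturing subsectors.
employment = np.array([
    [80.0, 20.0],
    [50.0, 50.0],
    [20.0, 80.0],
])
lq = location_quotients(employment)
```

A value above 1 indicates that the category is over-represented in the region relative to the reference area. The same function can be applied to fractional patent counts by IPC class, which is what makes the two sets of LQs comparable.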

Having produced a set of 129 LQs for EU-15 NUTS 2 regionsFootnote 7Footnote 8, we proceed to implement principal component analysis (PCA) in order to capture the co-location of different types of activity.

PCA is a method for reducing the dimensions of a multivariate dataset while preserving a significant portion of its variability by producing a set of uncorrelated factors (principal components) which are linear combinations of the initial correlated variables. In the context of studying innovation dynamics, this methodology has been utilised recently by Kleszcz (2021) in order to aggregate the dimensions of the indicators constituting the European Innovation Scoreboard. PCA provides a particularly good fit to the theoretical underpinnings of our approach, since we seek to produce cluster indicators based on patterns that emerge organically from the data and not on a priori assumptions regarding cluster composition (Jolliffe & Cadima, 2016), while also having a clear view of the most important elements that comprise each cluster.

Given a dataset X consisting of n observations and p variables, the goal of PCA is to find the k principal components that maximise the variance of the data. The principal components are computed by finding the eigenvectors of the covariance matrix of X, and the amount of variance explained by each principal component is equal to the corresponding eigenvalue.

The transformed data can be represented by \(Y=X[{V}_{1}, {V}_{2}, \dots , {V}_{k}]\), where Y is a \(n\times k\) matrix and \({V}_{1}, {V}_{2}, \dots , {V}_{k}\) are the eigenvectors of the covariance matrix of X, arranged in descending order of eigenvalue. The amount of variance explained by the i-th principal component is equal to the corresponding eigenvalue, \({\lambda }_{i}\).
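The eigen-decomposition described above can be sketched in a few lines of NumPy. This is an illustration rather than the implementation used in the paper; it assumes that X is centred before projection, and the function name and simulated data are hypothetical.

```python
import numpy as np

def pca_eig(X, k):
    """Project X (n x p) onto its first k principal components."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Y = Xc @ eigvecs[:, :k]                 # n x k matrix of component scores
    return Y, eigvals, eigvecs

# Simulated data with some correlation between the first two variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.8 * X[:, 0]
Y, eigvals, eigvecs = pca_eig(X, k=2)
```

In this sketch the sample variance of each column of Y equals the corresponding eigenvalue, and the columns of Y are mutually uncorrelated.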

In order to generate the factors, we used Bartlett’s method (Bartlett, 1937) which minimises the sums of squares of factors using least squares. It has been argued in the relevant literature that this process produces factor scores that are highly correlated with their related factors (Gorsuch, 1983) and are unbiased (Hershberger, 2005). We applied the Kaiser-Guttman criterion (eigenvalues > 1) in order to select the number of principal components.
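The component-selection rule can be illustrated as follows. The sketch covers only the Kaiser-Guttman criterion, not Bartlett’s scoring method, and assumes the eigenvalues are taken from the correlation matrix (equivalent to PCA on standardised variables). The function name and the simulated data are hypothetical.

```python
import numpy as np

def kaiser_guttman(X):
    """Return the number of components with eigenvalue > 1, and the eigenvalues.

    Eigenvalues are taken from the correlation matrix, which is equivalent
    to running PCA on standardised variables.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # descending order
    return int((eigvals > 1.0).sum()), eigvals

# Simulated data: two latent drivers, each observed through three noisy variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
noise = 0.3 * rng.normal(size=(500, 6))
X = np.repeat(latent, 3, axis=1) + noise
n_keep, eigvals = kaiser_guttman(X)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the threshold of 1 retains only components that account for more variance than a single standardised variable.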

We implemented a three-step PCA: in the first step, we performed PCA on standardised LQs in each IPC class category. In the second step, we performed PCA on all factors generated via Bartlett’s method in the first step, and, in the third, final step (whose output is presented in Table 1), we pooled the new Bartlett patent factors with standardised manufacturing employment LQs, in order to implement PCA to produce 5 factors capturing co-located activity, henceforth referred to as cluster indicators. We applied a cut-off value of 0.5 (as indicated by the highlighted values) and labelled these indicators based on their composition (Table 2) as follows: motor and electronics, wood and metal, computer, textiles, chemicals.
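Reading a component’s composition through a loading cut-off can be sketched as below. This does not reproduce the three-step procedure; it only illustrates the final reading of loadings, under the assumption that loadings are eigenvectors of the correlation matrix scaled by the square root of their eigenvalues. Function names, variable names, and data are hypothetical.

```python
import numpy as np

def component_loadings(X, k):
    """Loadings of the first k components of the correlation-matrix PCA."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:k]
    # Scaling eigenvectors by sqrt(eigenvalue) gives variable-component correlations.
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def high_loading_variables(loadings, names, cutoff=0.5):
    """For each component, list variables whose absolute loading reaches the cut-off."""
    return [
        [name for name, value in zip(names, column) if abs(value) >= cutoff]
        for column in loadings.T
    ]

# Simulated pooled data: four variables driven by one latent factor, two by another.
rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))
X = latent[:, [0, 0, 0, 0, 1, 1]] + 0.3 * rng.normal(size=(500, 6))
names = ["patent_a", "patent_b", "emp_a", "emp_b", "patent_c", "emp_c"]
loadings = component_loadings(X, k=2)
composition = high_loading_variables(loadings, names)
```

Absolute values are used because the sign of an eigenvector is arbitrary; the label attached to a component then follows from the variables that pass the cut-off.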

Table 1 Cluster generation — PCA third step output
Table 2 Descriptive statistics for cluster indicators

In Fig. 1, we illustrate the cumulative percentage of the sample’s variance captured by our first 5 principal components, which is 60%. This percentage is close to that of the principal components chosen, for instance, in the aforementioned work of Kleszcz (2021), namely 68%. It should be noted, however, that in the context of the present paper, the primary goal is not to maximise the variance explained by the specific principal components, but rather to interpret the patterns of co-location depicted by them.
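The cumulative share of variance is obtained directly from the eigenvalues. A short sketch, using hypothetical eigenvalues rather than those underlying Fig. 1:

```python
import numpy as np

def cumulative_explained_variance(eigvals):
    """Cumulative share of total variance, components sorted by eigenvalue."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return np.cumsum(eigvals) / eigvals.sum()

# Hypothetical eigenvalues, not those underlying Fig. 1.
shares = cumulative_explained_variance([4.0, 3.0, 2.0, 1.0])
```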

Fig. 1 Percentage of sample variance explained by principal components

In Fig. 2, we illustrate the 5 principal components via a parallel coordinates plot, a well-established tool for visualising multidimensional data (Xyntarakis & Antoniou, 2019). The plot reveals that no particular region exhibits exceptionally high or low scores across all indicators, and no clear correlations between variables are observed. This is consistent with the orthogonal nature of principal components, which capture the maximum amount of variation in the original data while minimising the correlation between them.

Fig. 2 Parallel coordinates plot for principal components

In Table 2, we present the descriptive statistics for the cluster indicators. Two elements stand out: mean and median values are close to zero in all cluster indicators, while the standard deviation is particularly high, ranging from 82 to 87. This indicates a high level of disparity among regions when it comes to cluster scores, and in the next section, we will examine more closely the nature of these disparities.

Table 3 presents the detailed cluster composition, based on the first two steps of our PCA. The picture that emerges is one that presents clear elements of the ‘related variety’ and ‘complexity’ concepts, i.e. clusters that are not narrowly defined in industry terms but include activity in different industries that are connected in terms of research and production.

Table 3 Cluster composition

In the component we label ‘Motor & Electronics Cluster’, we observe high loadings from two employment categories (manufacture of motor vehicles; manufacture of electrical equipment) and three patent categories (performing operations and transporting; mechanical engineering; chemistry and metallurgy). Our ‘Computer Cluster’ and ‘Textile Cluster’ components also contain high loadings from three different types of patents (performing operations and transporting; physics; electricity in the computer cluster, and performing operations and transporting; chemistry and metallurgy; textiles and paper in the textile cluster). In the ‘Wood & Metal’ and ‘Chemical’ components, we see two patent categories loading highly (fixed construction; mechanical engineering in the wood and metal cluster, and chemistry and metallurgy; performing operations and transporting in the chemical cluster), as well as two employment categories in the case of the wood and metal cluster (manufacture of wood products; manufacture of fabricated metal products except machinery).

It is worth noting that in the textile cluster, which centres around an industry usually viewed as ‘traditional’, we observe a high loading of the patent component ‘Organic Macromolecular Compounds and their Composition’ which relates to the shift of the textile industry toward technical textile production, an area of rapid innovation in which Europe has a leading role (McCarthy, 2016).

Cluster Geography

After having produced these cluster indicators, we proceed to examine the spatial distribution of cluster scores, both when it comes to concentration patterns at the European and inter-regional level, but also in regard to specific high-scoring regions, in order to detect indications of the historical evolution of industry specialisation.

In order to examine the degree of EU-wide spatial concentration of our cluster indicator scores, we first utilise Moran’s coefficient, after having created a first-order queen contiguity weight matrix.Footnote 9 Moran’s I is a statistic used to measure spatial autocorrelation, i.e. the correlation of characteristics of proximal locations, and its values range from −1 (perfect dispersion) to 1 (perfect concentration). It is defined as:

$$I=\frac{N}{W}\cdot \frac{\sum_{i=1}^{N}\sum_{j=1}^{N}w_{ij}\left(x_{i}-\overline{x}\right)\left(x_{j}-\overline{x}\right)}{\sum_{i=1}^{N}\left(x_{i}-\overline{x}\right)^{2}}$$

where:

  • N: the number of spatial units indexed by i and j

  • \(x\): the variable of interest

  • \(\overline{x }\): the mean of \(x\)

  • \(w_{ij}\): a matrix of spatial weights with zeroes on the diagonal

  • \(W\): the sum of all \(w_{ij}\)
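The statistic defined above can be sketched as follows. For simplicity, the example builds a binary queen contiguity matrix on a regular grid, which is an assumption made for illustration; actual NUTS 2 regions are irregular polygons and would require a contiguity matrix derived from their boundaries. Function names are hypothetical.

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """Binary first-order queen contiguity matrix for a regular grid.

    Cells are neighbours if they share an edge or a corner. A regular grid
    is a simplification; real NUTS 2 regions are irregular polygons.
    """
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        w[r * n_cols + c, rr * n_cols + cc] = 1.0
    return w

def morans_i(x, w):
    """Moran's I for values x and a spatial weight matrix w with zero diagonal."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return (len(x) / w.sum()) * (dev @ w @ dev) / (dev @ dev)

w = queen_weights(1, 4)                     # four units in a row
trend = morans_i([1, 2, 3, 4], w)           # neighbours have similar values
alternating = morans_i([1, -1, 1, -1], w)   # neighbours have opposite values
```

In this toy example the smoothly increasing series yields a positive value, while the alternating series yields the lower bound of −1.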

We observe (Table 4) moderate levels of concentration which are significantly higher in the case of the motor and electronics cluster.

Table 4 Spatial autocorrelation of cluster indicator values

Before examining the regional characteristics of the cluster indicators’ geographical patterns, it is worth providing some context at the national level through a metric often used as a proxy for innovation ‘input’, namely expenditure on R&D. Figure 3 presents Eurostat data for two years: 2000 and 2010. What instantly stands out is a clear dichotomy between the so-called core and periphery countries of EU-15. The four southern countries (Greece, Italy, Spain, and Portugal) are the four worst performers, an observation which reflects the well-documented gap in technological capabilities between core and periphery (Graebner & Hafele, 2020).

Fig. 3 Expenditure on R&D (percentage of GDP)

Turning our attention to the NUTS 2 maps, this observation is reaffirmed at first glance, since regions with high motor and electronics cluster scores are visibly concentrated in Germany (Fig. 4). Other spatial patterns that stand out, albeit to a smaller degree, include the concentration of wood and metal in the wider Austria and Northern Italy region (Fig. 5), computer in the south of the UK (Fig. 6), textiles in Northern Italy (Fig. 7), and chemicals in several Dutch regions (Fig. 8).

Fig. 4 Motor and electronics cluster

Fig. 5 Wood and metal cluster

Fig. 6 Computer cluster

Fig. 7 Textile cluster

Fig. 8 Chemical cluster

In Table 5, we present the top-10 regions in each cluster according to their indicator score. As the maps suggest, 9 out of the 10 top-scoring regions in the motor and electronics cluster are in Germany, directly reflecting the country’s dominance in the industry. Four of the top 10 motor and electronics cluster regions are present in other cluster top 10s as well: Mittelfranken in the computer cluster, Düsseldorf in the chemical cluster, Chemnitz in the textile cluster and Arnsberg (where the logistics hub of Dortmund is located) in the wood and metal cluster. In several of the other top regions, we find headquarters and/or plants of major automotive companies: Mercedes-Benz and Porsche in Stuttgart, Renault and PSA (maker of Peugeot, Citroën, DS, Opel, and Vauxhall) in Île de France, and Ford Europe in Köln.

Table 5 Top cluster regions

Regarding the wood and metal cluster, the presence of natural resources can be expected to be heavily connected to this type of activity. Norrland, for example, has been known for centuries as a region rich in resources (Hermele, 2013). It is worth noting that other top-scoring regions with traditions in steel industries, such as Arnsberg and País Vasco, have, in recent decades, been branching towards related sectors (González, 2005; van Winden et al., 2010).

UK regions score highly in the computer cluster indicator. The top 3 regions are located in the UK and specifically in the area surrounding London (Surrey, East and West Sussex; Hampshire and Isle of Wight; Essex). We can observe the presence of metropolitan centres in other top regions, such as Edinburgh (in Eastern Scotland) and Wien, as well as other established clusters of high-tech economic activity, such as Eindhoven (in the Noord-Brabant region).

When examining the top-scoring regions in the textile cluster, one can observe trajectories of economic activity which, as in the case of wood and metal, date back centuries. In particular, Flanders (where the top 2 regions are located) has dominated the textile export market since 1200 and textiles from Lombardy (the region which is at number 5 on the list) constituted a significant part of the Levant trade (Chorley, 1987), while the region of Valencia was a centre for silk production from the eighth century (Boyd-Bowman, 1973).

Turning to the chemical cluster, Hainaut, the top-scoring region, is where the first industrial production of ammonia soda based on the process patented by Ernest Solvay (co-founder of the chemicals giant Solvay) took place in 1864 (Aftalion, 2001). In the top 10, we also find the Zuid-Holland region (where Rotterdam is located) and its neighbouring Utrecht region. As Smit notes (van den Bosch & Man, 2013), historically the accessibility of Rotterdam to huge vessels played a major role in the development of a petrochemical cluster (which included what was to become the Shell Pernis petrochemical complex), while later on, from the mid-1960s onwards, the location attracted basic chemical companies since ‘oil products constitute the most important input for these industries’. They were followed by chemical companies and the subsequent development of a network of suppliers of related goods or services. Not all regions appearing in the top 10 lists are, of course, widely recognisable as hosts to significant innovation activity. Drenthe, which appears to have a high chemical cluster indicator, is arguably such a case. However, the chemical cluster Emmen, a European leader in specialised chemistry, is located in the region and provides the base for facilities of globally competitive companies such as Teijin Aramid, DSM Engineering Plastics and Low & Bonar.

Cluster Characteristics

Region Characteristics

In Table 6, we proceed to examine a set of metrics for top-10 scoring technology-production clusters concerning gross value added (GVA), R&D spending, and gross fixed capital formation (GFCF) and we compare the median values of the top regions’ data with the median values of all the regions in our dataset.Footnote 10 GVA is often used as a metric for sectoral added value at the regional level (e.g. Montoya & de Haan, 2008), while R&D spending has been traditionally viewed as a proxy for ‘innovation input’ (Maclaurin, 1953), which tends to generate knowledge spillovers (Jaffe, 1986; Nelson, 1959). Such spillovers, Acs et al. (1994) argue, are more crucial for small firms. Finally, gross fixed capital formation is used to illustrate sectoral investment at the regional level (Stirböck, 2002).

Table 6 Top-10 regions’ GVA, R&D, and gross fixed capital formation data

Regarding our descriptive data, we observe that the regions that score highly in the motor and electronics cluster indicator tend to have significantly higher values in all metrics except those concerning the agricultural sector, indicating that this type of cluster is located in highly competitive regions.

By contrast, regions with high wood and metal cluster scores appear to have significantly lower levels of GVA and GFCF in every sector apart from agriculture, as well as low levels of R&D spending. This is in accordance with the observations on resource-based clusters outlined in the first section, which point out that such economies risk falling victim to lock-in due to their focus on specific types of activity which, as time passes, rely less on knowledge creation and more on standardised production patterns.

In the top computer cluster regions, we observe high levels of GVA and GFCF in the information and communication and the trade, transport, accommodation and food service activities sectors, and low levels of the same metrics in agriculture. The textile cluster regions appear to have higher GVA and GFCF in agriculture, while the regions with high chemical cluster indicator scores do not have median values which vary significantly from the EU regional median, with the most noteworthy differences being observed in information and communication.

Industry Characteristics

In this section, we take a closer look at the sector characteristics of the top-scoring regions based on our cluster indicators. Specifically, we examine descriptive statistics concerning wages, number of employees, number of firms, and firm size.

A first observation (Table 7) that can be made is that regions with the highest scores in motor and electronics, computer and chemicals tend to have wages that are significantly (≥ 20%) above the EU median in manufacturing, as well as in each sector which is included in their composition. On the other hand, the top regions in wood and metal and textile clustering appear to have manufacturing wages around the EU median and, with the exception of the wood sector, this is also the case with each sub-sector.

Table 7 Top 10 regions’ wages, employees, firms, and firm size

A second observation is that these regions, apart from scoring high in regard to relative concentration, also tend to have a high number of employees in each relevant sector. Kemeny and Storper (2015) pointed towards different productivity dynamics underlying absolute and relative types of specialisation. In the case of the former, they argue, the three main mechanisms that increase productivity are ‘sharing of input suppliers; matching of specialised labour demand and labour supply […] and technological learning or spillovers’ (p. 1006). When it comes to relative concentration, the authors underlined the potential dominant role of an agglomeration in regional demand for resources, as well as in commanding political attention. One can expect, based on the aforementioned dichotomy, that our indicators capture the presence of dynamics connected both to relative and absolute specialisation.

Our findings indicate that the regions where the highest cluster indicator scores are observed also tend to have a larger average firm size in the sectors related to each cluster (measured as total regional sector employment divided by the number of units/firms in that regional sector). The greatest differences between top cluster and EU median values are observed in motor vehicles manufacturing (+ 61%) and chemicals (+ 56%), while the smallest is in fabricated metal (+ 10%). The two regions that stand out in regard to size in the motor sector are Ile de France (home of PSA Peugeot Citroën) and Stuttgart (home of Mercedes-Benz and Porsche). In computer equipment manufacturing, the Southern and Eastern Ireland region has by far the largest average firm size (more than twice that of the second-ranked region), as does the Rheinhessen-Pfalz region in chemical manufacturing.
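The firm size measure described above, and its comparison with a median benchmark, can be illustrated with a minimal sketch. The region names and figures below are hypothetical placeholders rather than values from Table 7, and the function names are our own; the sketch only assumes that average firm size is sector employment divided by the number of firms, and that the reported differences are percentage deviations from a median.

```python
from statistics import median


def average_firm_size(employment, firms):
    """Total regional sector employment divided by the number of firms."""
    if firms <= 0:
        raise ValueError("number of firms must be positive")
    return employment / firms


def pct_diff_from_median(value, reference_values):
    """Percentage difference of `value` from the median of `reference_values`."""
    ref = median(reference_values)
    return 100.0 * (value - ref) / ref


# Hypothetical (sector employment, number of firms) pairs per region.
regions = {
    "region_a": (12000, 100),
    "region_b": (2000, 100),
    "region_c": (3000, 200),
}

# Average firm size per region.
sizes = {name: average_firm_size(e, f) for name, (e, f) in regions.items()}

# Deviation of one region from the median across all regions in the sample.
gap_a = pct_diff_from_median(sizes["region_a"], list(sizes.values()))
```

In this toy sample the benchmark is the median across the three listed regions; in the paper the benchmark is the EU regional median, which would be computed over all regions in the dataset.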

A positive effect of firm size on wages has been consistently observed in related literature, including in studies of the European manufacturing sector (Lallemand et al., 2007). While this phenomenon has often been linked to productivity differentials, it can arguably also be attributed to other factors underlying large firms’ capacity and willingness to offer higher wages (Oi & Idson, 1999). In regard to the debate on the relationship between firm size and innovation-related performance, size advantages of large firms once again come into play, in the form, inter alia, of financial resources, internal knowledge and market power. However, small firms have different types of strengths, such as flexibility and effective communication (Pla-Barber & Alegre, 2007; Rogers, 2004). In a sample similar to that of the present study, Vaona and Pianta (2008) found that large European manufacturing firms perform better than medium-sized and small ones in both product and process innovation. On the other hand, Maskell (2001) argued that the number of firms in a cluster matters for innovation dynamics. For example, the birth of new firms and the attraction of firms from elsewhere are also important for the innovative dynamics of a cluster, since co-location of firms within related industries enhances the ability to create knowledge by variation and a deepened division of labour.

Regions with the highest cluster scores do not always have more and bigger firms. The top-scoring regions in the chemical cluster actually have a 3% lower median number of chemical manufacturing firms than the EU total. In the case of the wood and metal cluster, four top regions have a lower median than the EU total median in fabricated metal product manufacturing, and three top regions have a lower median in wood product manufacturing.

Within the same cluster category one can observe a great degree of variance. Düsseldorf has 750 chemicals manufacturing firms, while Drenthe has 18. In the computer manufacturing cluster, the top region (Surrey, East & West Sussex) has an average firm size which is below the EU median (21 employees), and a number of firms which is more than 4 times the EU median (488). Southern and Eastern Ireland, on the other hand, has an average computer manufacturing firm size of 138 employees (more than 6 times the EU median) and has fewer firms in the sector than the EU median. In a nutshell, in the regional innovation systems depicted by our cluster indicator it is not always the case that ‘(absolute) size matters’.

Discussion

When we started out on this attempt to operationalise the concept of innovation systems by creating a novel cluster indicator, there were many reasons to believe it would lead to a dead end. Maybe patenting activity and manufacturing did not co-locate in a way that would be observable via the methodology applied. Maybe the cluster types identified would resemble existing sectoral taxonomies so closely that our approach would essentially offer no added value. Maybe, on the contrary, by using such an open-ended approach the picture that emerged would be so convoluted that no discernible patterns would be identified.

Yet, what instantly emerged from the data was a picture that corresponded, to a significant extent, to the theoretical foundations on which the methodology was constructed: Certain patterns of co-location of concentrated patenting and manufacturing that were homogeneous enough to be classified into distinct groups, but heterogeneous enough to highlight the need to overcome the confines of narrow sectoral taxonomies when studying innovation dynamics.

What was depicted was the presence, on different occasions, of a local context where patenting and manufacturing are co-located in activities that are linked across the value chain. While our methodology does not explicitly account for spillovers and network effects, the assumption, based on the related literature, is that the context depicted tends to provide fertile ground for the development of such dynamics. This has to do with spillovers occurring within regions with a strong concentration of knowledge production and use, but it also relates to the capacity of such regions to attract, absorb and transform external spillovers.

The composition of the cluster groups generated points toward the need to move beyond strict sectoral taxonomies when studying innovation systems, in order to capture the branching to new sectors that may not have been as strongly related previously. Most components produced contained high loadings from three or more different patent categories and it was often the case that they contained loadings from two different employment sector categories. Hence, a priori categorisations, while convenient, fail to capture the complexity of modern knowledge and production eco-systems.

Conclusions

The results of this paper indicate that in today’s complex and evolving economy, the study of innovation can benefit from moving past artificial boundaries regarding the nature and structure of innovation systems. In future research, the fundamental principle on which this methodology has been based can be extended and applied to many types of data linked to the innovation process. Given the rapidly growing availability of data and pattern recognition techniques, there is no reason to limit oneself to static assumptions. It is easy to understand why studying the automotive industry without taking into account electronics would not make sense, yet this is exactly what one would do if relying on previously applied taxonomies. A main direction in which this research can be extended is to address one of its main limitations, namely the absence of explicit modelling of connections between actors in the innovation systems. In the model presented, such connections have been assumed to exist based on co-location, so that the methodology could be applied on a large scale. However, using data on, for instance, co-citations and input–output links, together with network analysis methods, one can focus on specific clusters produced and provide a more complete picture by depicting intra-regional and inter-regional linkages. Furthermore, while the current analysis provides a single ‘snapshot’ of cluster composition, the same methodology can be applied to data spanning a wider time-frame, in order to depict the evolution of cluster dynamics in more detail. This will potentially allow for the study of ways in which regional clusters follow path-dependent trajectories and also create new paths by branching to related sectors.

This point has direct policy implications, since it is imperative for policy-makers to have a real-time view of the geography of innovative activity. Much can be lost in translation if policy is designed based on models that fail to illustrate emerging and evolving innovation ecosystems. The evolution of an economy is a complex process whose effects have many dimensions. Adapting to it in a way that benefits society the most requires constantly recalibrating our assumptions in accordance with the economy’s rapid transformations. This can translate into tailor-made policy initiatives that will help build on regional advantages while also generating the potential for diverse evolutionary trajectories.