Data and sample
Our empirical strategy enables us to capture a time dimension between founder work experience and KIE firm performance. We measure survival and performance at the firm level in 2015, and link these data to data from 2011 about the characteristics of founders and firms, as developed in a large-scale European survey of knowledge-intensive entrepreneurial firms (AEGIS Project 2013).
The AEGIS survey is our starting point, as it was an exploratory attempt to map the activity and characteristics of knowledge-intensive entrepreneurship in Europe. The survey was designed by the research teams in the AEGIS project, and implemented through telephone interviews subcontracted to Global Data Collection Company. The survey drew largely from the Amadeus database, with supplementation from a few other databases. Amadeus is a massive firm database owned by Bureau van Dijk, a privately owned business intelligence conglomerate, covering over 12 million European-based business entities.
The sectoral query used to identify the sampling frame, which was designed to maximize the frequency of KIE firms, originally returned 547,678 companies. After cleaning, this number dropped to 338,725 firms.Footnote 3 Contact information was found for 180,215 firms, and in order to reach the target sample for each country and each sectoral grouping, the dataset was complemented by a few other databases (Dun & Bradstreet, Kompass, and others). This resulted in a final sampling frame of 202,286 firms. The target sample size was set at 4000 firms, and firms were drawn at random using stratified sampling within each country (Croatia, Czech Republic, Denmark, France, Germany, Greece, Italy, Portugal, Sweden, and the United Kingdom). At survey completion, 4004 firms had been surveyed. The KIE definitions employed in the EU project (as later described in Malerba et al. 2016) served as a screening mechanism for the survey: firms needed to be less than 8 years old at the time, involved in market activities (exploiting innovative opportunities), and independent (e.g. not subsidiaries, and not existing firms that had merely changed their legal status). A number of questions related to knowledge, innovation, opportunities, and barriers. The survey aimed to find KIE firms across sectors; Table 1 shows the NACE classifications. Regarding the choice of certain low and medium tech sectors, a pilot project to AEGIS identified a subset of sectors in Europe that were crucial for growth in the region/nation and for employment (EC 2006), in addition to containing high degrees of entrepreneurial activity in the countries sampled. Regarding service sectors, KIBS classifications from the OECD were applied, as well as other knowledge-rich service sectors at the NACE level (OBS in the data).
Empirically, the sample therefore includes the frequently studied NTBF-heavy sectors like pharmaceuticals, medical devices, and machinery, but also sectors that fall outside these, so as to cover more types of knowledge intensity and innovation. In other words, the AEGIS survey aimed at exploring the broader population of KIE firms by surveying diverse domains with high potential for KIE (Malerba et al. 2016). These include high tech sectors, where small, new firms are often innovators; low and medium tech sectors, where users and appliers of new scientific, technological and creative knowledge may be more prevalent; and finally knowledge-intensive service sectors, where more types of co-creation driven innovation and innovative applications of new knowledge (Tether 2000) are known to occur.Footnote 4 In addition, micro firms (firms with 10 or fewer employees) constitute the majority of the firms sampled in the AEGIS survey (64%), a population that has so far received little empirical attention in other large-scale surveys. The survey included questions about the founder and founder team, and is ambitious in its depth of analysis of core constructs to isolate the knowledge-intensive and innovative components of the KIE venture. Many items in the AEGIS survey draw upon questions from validated existing surveys, such as the GEM and the Community Innovation Survey (CIS), alongside new questions.
A limitation of the original AEGIS survey is that it does not include data about performance, nor does it allow longitudinal or hierarchical modeling, due to its cross-sectional nature. Prior analysis based upon data from this survey either gives a broad overview of the characteristics of these firms (Malerba et al. 2016; Protogerou et al. 2017) or defines specific characteristics of KIE firms as compared to non-KIE (or less KIE) firms (Malerba and McKelvey 2018a). To analyze performance in terms of both survival and growth rates, we have supplemented the original survey with data from the same Amadeus database from which the firms were originally drawn, which in the time since the survey was administered was merged into the larger Orbis database. This was possible for 2978 of the 4004 firms in the AEGIS sample. Due to the matching of these two datasets, as well as further complications of missing values in the data, the sample size in the growth and survival regressions was considerably reduced. However, the descriptive statistics below show that the variables do not drastically change in value, and our sample size is still adequate for the type of modelling used. Having tested the characteristics of the missing data, which we assess as being missing at random (MAR), we are confident that the models are not compromised by the missingness in the data (see Tables 2, 3, 4, 5, and 6 for details).Footnote 5
Firm exit is modeled with a Cox proportional hazards model, in which the event of interest is the firm being marked as inactive at some point between its founding and the present date.Footnote 6 This model is often applied to studying the exit of entrepreneurial ventures (Agarwal and Audretsch 2001; Esteve-Pérez and Mañez-Castillejo 2008; Segarra and Callejón 2002; Strotmann 2007). The Cox model is useful in research problems where the baseline hazard function is unknown or otherwise hard to estimate, as is often the case with firm-level survival data. This is because the Cox model leaves the baseline hazard function unspecified, taking on a semi-parametric character. The flexibility of the Cox model also makes it well suited to our hybrid cross-sectional dataset, which draws on Orbis data as well as the AEGIS survey and thus combines two datasets that largely contain time-invariant independent variables. To combat potential survival bias in the data, we applied left truncation to the model by specifying the point in each firm’s lifespan at which the survey took place.
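The model described above can be written compactly. The notation here is ours and is not taken from the source: $x_i$ denotes the covariates of firm $i$, $t_i$ its age at exit or censoring, $\delta_i$ an indicator equal to 1 if the firm was marked inactive, and $e_i$ its age when surveyed. This is the standard textbook form of the partial likelihood with delayed entry, assuming no tied exit times; it is a sketch and not necessarily the exact estimator implemented for the paper.

```latex
h_i(t) = h_0(t)\,\exp\!\left(x_i^{\top}\beta\right),
\qquad
L(\beta) = \prod_{i:\,\delta_i = 1}
\frac{\exp\!\left(x_i^{\top}\beta\right)}
     {\sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)},
\qquad
R(t) = \{\, j : e_j < t \le t_j \,\}.
```

The baseline hazard $h_0(t)$ cancels from each ratio, which is why it can remain unspecified. Left truncation enters only through the risk set $R(t)$: a firm counts as at risk at age $t$ only once it has passed the age $e_j$ at which it was surveyed.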
To derive survival information from the data, the firms in the AEGIS survey were combined with the firm status variables used in Orbis, matched by ID number. Accordingly, the survival analysis dataset is limited to those 2978 firms in the AEGIS survey that were taken from Amadeus/Orbis. Since the survival indicator was collected via Orbis at a later date (October 2015) than the administration of the survey (ranging from the fall of 2010 to the spring of 2011), it serves as a valid indicator of survival after the survey.
The firm growth analysis is based upon the same 2978 firms, using the additional financial reporting data. Specifically, we calculate the average year-on-year growth in the natural logarithm of the number of employees and of operating revenue of the firm over 6 years (2010–2015) (cf. Bracker et al. 1988; Coad et al. 2016; McKiernan and Morris 1994). Firm growth is a heterogeneous concept that is best measured using multiple indicators (Delmar et al. 2003; Coad et al. 2016; Haltiwanger et al. 2013).
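A minimal sketch of the averaged year-on-year logarithmic growth measure follows. It assumes one strictly positive value per consecutive year with no gaps; the function name and the employee figures are hypothetical, and the source does not state how zeros or missing years were handled.

```python
import math


def mean_log_growth(yearly_values):
    """Average of year-on-year differences in natural logs.

    Assumes strictly positive values for consecutive years (an assumption
    of this sketch, not a rule stated in the source).
    """
    if len(yearly_values) < 2:
        raise ValueError("need at least two yearly observations")
    if any(v <= 0 for v in yearly_values):
        raise ValueError("log growth is undefined for non-positive values")
    diffs = [
        math.log(later) - math.log(earlier)
        for earlier, later in zip(yearly_values, yearly_values[1:])
    ]
    return sum(diffs) / len(diffs)


# Hypothetical employee counts for 2010-2015 (six observations, five differences).
employees = [10, 12, 12, 15, 18, 20]
growth = mean_log_growth(employees)
```

Because the log differences telescope, the average depends only on the first and last observations: here it equals (ln 20 - ln 10) / 5.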
In addition, we use a quantile regression approach for growth. Figs. 1 and 2, the quantile comparison plots of the two growth variables, and Figs. 3 and 4, the corresponding density plots, make it apparent that many firms in the sample have not experienced any growth at all, but also that something interesting is occurring in the tails of the distribution. We adopt a quantile regression approach because the technique is highly robust to skewed data and helps us assess statistical associations with different ‘centers of gravity’ in the data space.
Therefore, in order to estimate the growth regressions appropriately, we chose to employ quantile regression, with inclusion of observations conditional on survival. Quantile regression is an alternative to mean regression, which has proven problematic in assessing firm growth outcomes due to inherent biases of firm size and growth, including growth outcomes for small and micro firms (Haltiwanger et al. 2013). Table 2 conveys the summarized descriptive statistics of all four outcome variables used in our models. When reporting our statistical results for the growth regressions, we follow previous research on firm growth (Coad et al. 2016) by making use of the 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, and 0.95 quantiles. However, we view ‘high growth’ as growth at or above the 0.90 quantile, so in the subsequent text we focus our results on the 0.90 and 0.95 quantiles.
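To illustrate why quantile regression targets different parts of a skewed distribution, the sketch below minimizes the check (pinball) loss over a single constant. This is only the intercept-only special case, not the estimation code used for the paper; a full quantile regression minimizes the same loss over a linear function of the covariates. The growth values are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Summed check loss when every observation is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Constant minimizing the summed check loss.

    A minimizer always lies at one of the observed values, so it is enough
    to search over them.
    """
    return min(sorted(set(values)), key=lambda c: total_loss(values, c, tau))


# Hypothetical growth rates: many zeros and a long right tail.
growth_rates = [-0.20, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05, 0.10, 0.25, 0.40, 1.20]
median_fit = best_constant(growth_rates, 0.50)  # centre of the distribution
upper_fit = best_constant(growth_rates, 0.90)   # upper tail
```

With these values the 0.50 fit sits at zero growth while the 0.90 fit sits at 0.40, which is the sense in which the higher quantiles describe high-growth firms separately from the stagnant majority.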
Entrepreneurial experience is a binary variable taking the value 1 if one or more of the founders had prior experience either owning an existing firm, owning a firm that has ceased operations, or being self-employed, and taking the value 0 otherwise. Academic experience is a binary variable taking the value 1 if one or more of the founders of the venture had prior work experience at a university or research institute, and 0 otherwise. This serves as a commonly applied proxy for the firm being an academic spinoff (Perkmann and Walsh 2007; Perkmann et al. 2013). Industry experience is a binary variable taking the value 1 if one or more of the founders of the venture had prior experience working in the same industry as the focal firm, and 0 otherwise. Lastly, Total experience is a categorical variable (0–3) that counts how many of the above types of experience are present among the founder(s) of any given firm.
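The coding rules above can be sketched as follows. The dictionary keys are hypothetical field names chosen for illustration and are not the AEGIS item codes.

```python
def code_experience(founders):
    """Return the four experience variables for one firm.

    `founders` is a list of dicts, one per founder, holding boolean flags.
    The flag names are hypothetical.
    """
    def any_founder(*flags):
        # True if at least one founder has at least one of the given flags set.
        return any(f.get(flag, False) for f in founders for flag in flags)

    entrepreneurial = int(
        any_founder("owned_existing_firm", "owned_ceased_firm", "self_employed")
    )
    academic = int(any_founder("worked_at_university_or_institute"))
    industry = int(any_founder("worked_in_same_industry"))
    return {
        "entrepreneurial_experience": entrepreneurial,
        "academic_experience": academic,
        "industry_experience": industry,
        # Count of experience types present in the team, ranging from 0 to 3.
        "total_experience": entrepreneurial + academic + industry,
    }


# Hypothetical two-founder team.
team = [
    {"self_employed": True, "worked_in_same_industry": True},
    {"worked_in_same_industry": True},
]
coded = code_experience(team)
```

Note that Total experience counts types of experience at the team level, so two founders with the same type of experience still contribute only 1.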
We also include a set of control variables in all regressions. To assess the association of the main explanatory variables with performance for KIE firms, we control for the richness of knowledge intensity in two ways. First, following Malerba and McKelvey’s ‘Super-KIE’ coding method (Malerba and McKelvey 2018a), an entrepreneurial firm is coded as ‘Super knowledge intensive’ if: (i) the firm has introduced an innovation in the past 3 years; (ii) at least one member of the founding team has completed at least a bachelor’s degree; and (iii) at least one member of the founding team has stated their main area of competence to be technical and engineering knowledge or product design skills. This variable takes the value 1 if the firm satisfies all these prerequisites and 0 otherwise. Second, we address employee knowledge intensity using a binary variable denoting whether or not 33% or more of the firm’s employees possess at least a tertiary degree (the European Commission (2006, 2013) defines industries with 33% tertiary education averages as knowledge intensive activities). Using this indicator, we control for knowledge intensity through non-founder human capital levels, as previous studies have shown this aspect of human capital to also be influential for firm performance and growth (Delmar and Wennberg 2010; Siepel et al. 2017; Smith et al. 2005). Together, these two variables contain components of the education level of the founder as well as of the employees, so in a sense we control for this aspect of human capital investments (Becker 1964). Firm age: For growth regressions, we include a variable calculated as the year of data collection (2015) minus the year the venture was established (screened for change in legal status of an existing firm). Team size: The size of the founding team, following Agarwal and Audretsch (2001).
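The three rule-based controls defined above can be sketched as below. Function and argument names are hypothetical, and the employee share is assumed to be expressed as a fraction between 0 and 1.

```python
def super_kie(innovated_past_3_years, founder_has_bachelor, founder_technical_or_design):
    """1 only if all three Super-KIE criteria hold, else 0."""
    return int(
        innovated_past_3_years
        and founder_has_bachelor
        and founder_technical_or_design
    )


def employee_knowledge_intensive(share_tertiary):
    """1 if at least 33% of employees hold a tertiary degree (share given as a fraction)."""
    return int(share_tertiary >= 0.33)


def firm_age(year_established, reference_year=2015):
    """Firm age in years at the reference year of data collection."""
    return reference_year - year_established


# Hypothetical firm: innovative, degree-holding technical founder, founded in 2006.
example = {
    "super_kie": super_kie(True, True, True),
    "employee_ki": employee_knowledge_intensive(0.40),
    "age": firm_age(2006),
}
```

Because the Super-KIE indicator requires all three conditions at once, a firm that innovates but has no founder with technical or design competence is coded 0.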
International sales: The percentage of international sales, as estimated by survey respondents, was included because previous research has found associations between early internationalization and knowledge intensity in new ventures concerning both growth (Autio et al. 2000) and survival (Sapienza et al. 2006; Mudambi and Zahra 2007). The logit, or log-odds, transformation was applied to this variable following graphical inspection of the distribution of the data. R&D intensity: Self-estimated R&D intensity (percentage of sales invested in R&D), included in order to control for the effect of R&D on our variables of interest (Hagedoorn and Cloodt 2003) and the effect of the absorptive capacity of the firm (Cohen and Levinthal 1990; Tsai 2001). This variable is also transformed via a logit (log-odds) function.
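A minimal sketch of the log-odds transformation applied to a percentage follows. The logit is undefined at exactly 0% and 100%, and the source does not say how such values were treated; the clamping constant `eps` below is purely an assumption of this example.

```python
import math


def logit_percent(pct, eps=0.5):
    """Log-odds of a percentage in [0, 100].

    Values are clamped to [eps, 100 - eps] before transforming so that 0%
    and 100% stay finite. The clamp is an assumption of this sketch, not a
    procedure described in the source.
    """
    clamped = min(max(pct, eps), 100.0 - eps)
    p = clamped / 100.0
    return math.log(p / (1.0 - p))


# Hypothetical self-reported shares of international sales (percent).
shares = [0, 25, 50, 75, 100]
transformed = [logit_percent(s) for s in shares]
```

The transformation maps 50% to zero and stretches values near the bounds, which makes a bounded and skewed percentage behave more like an unbounded continuous covariate.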
We further control for the firm’s industrial Sector: HTMS (High-tech and medium-to-high-tech manufacturing sectors); LTMS (Low and low-to-medium tech manufacturing sectors); KIBS (Knowledge intensive business services); OBS (Other business services). See Table 1 for sectors included as well as sectoral composition of the data used. Similar to the country identifiers, the sectors were derived from combining categorically the different sampled sectors. This was done to smooth out the effects of the control variable, and to match the categories assigned to the sectors in the AEGIS project and survey.
To control for the firm’s degree of resilience during the economic crisis of 2008–2009, a principal components analysis was conducted on the rating scale question estimating the degree of fluctuation (in percentile categories) of sales, exports, employment, profits and investments. The most influential component was extracted as a variable to control for the effects of the crisis, and is labeled Crisis.Footnote 7 Self-reported past estimated fluctuation in sales (Past sales growth) as well as employees (Past employee growth) from 2007 to 2009 were also included as controls.
Lastly, we control for different types of growth barriers experienced by the firm. To this end two additional principal components analyses were conducted, respectively, on summated rating scales assessing competitive barriers to growth (including technology and market risk/uncertainties, initial investments, funding, partnerships, hiring, and technological know-how), and regulatory barriers to growth (including tax regulations and rates, time consuming bureaucracies, poorly enforced laws and IPR regulations, government favor, bankruptcy and insolvency proceedings, and labor market legislation). For each of these types of barriers, we selected the most influential component and added it as a control variable: Competitive barriers and Regulatory barriers, respectively.Footnote 8
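Extracting the most influential component from a set of rating-scale items can be sketched with a small, dependency-free power iteration. Several choices here are assumptions of the example rather than details given in the source: items are standardized (so the analysis runs on the correlation matrix), no item has zero variance, and the ratings are hypothetical.

```python
import math


def first_principal_component(rows, iterations=500):
    """Return (loadings, scores, share of variance) for the first component.

    Uses power iteration on the correlation matrix. Starting from a uniform
    vector works when items are positively correlated, as assumed here.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(k)
    ]
    # Standardize each item to mean 0 and unit sample variance.
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    corr = [
        [sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w_new = [sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    # Component score per firm, later usable as a single control variable.
    scores = [sum(zr[j] * w[j] for j in range(k)) for zr in z]
    eigenvalue = sum(
        w[a] * sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)
    )
    return w, scores, eigenvalue / k


# Hypothetical 5-point ratings on three barrier items for six firms.
ratings = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4], [2, 1, 2]]
loadings, scores, share = first_principal_component(ratings)
```

The returned scores play the role of a single control such as Competitive barriers or Regulatory barriers, and the share of variance indicates how much of the items' joint variation that one component summarizes.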
Table 3 shows the summary descriptive statistics of the explanatory and control variables in a manner similar to the response variables, both for the quantile and the Cox proportional hazard model samples. This is to show the comparability of the samples despite the change in number of observations. Table 4 shows the inter-correlations of the principal components used as control variables with their respective summative rating scales, while Tables 5 and 6 show inter-correlations between the variables in both sub-samples.