Microfirms and innovation in the service sector

In the context of microfirms, this paper analyzes whether the link between the three aspects of innovative activities (R&D, innovative output, and productivity) holds for knowledge-intensive services. With especially high start-up rates and the majority of employees in microfirms, knowledge-intensive services (KIS) have a starkly different profile from manufacturing. Results from our structural models indicate that KIS firms benefit from innovation activities through increased labor productivity, with highly skilled employees being about as important as R&D for creating innovation output in microfirms. Moreover, the firm size advantage of large firms found for manufacturing almost disappears in KIS, with start-ups and young firms having a higher probability of initiating innovation activities and of successfully turning knowledge into innovation output than mature firms.


Introduction
A robust literature confirms that firm size is positively associated with the decision to invest in R&D (see, inter alia, Hall et al. 2009; Baumann and Kritikos 2016). Yet the literature primarily studies manufacturing firms, while knowledge-intensive services (KIS), which have a large innovation potential within the service sector, are analyzed to a lesser extent. However, firms in KIS industries are playing an increasingly important role, as reflected by their start-up rates (Fritsch et al. 2015; Konon et al. 2018) and job growth when compared to manufacturing. Moreover, while the majority of employees in the manufacturing sector work in large firms, the opposite is true for KIS industries. As such, it is important to better understand how microfirms in the KIS industries, in comparison with larger and mature firms, engage in and benefit from innovation activities.
Empirical research on whether firms in KIS industries innovate and are able to turn innovative products and services into higher productivity typically concentrates on firms with at least 10, if not 20, employees (see, e.g., Lööf and Heshmati 2006). The nascent research on the innovative activities of microfirms in the KIS industries, which account for 90% of all firms in this sector, is yet to yield significant results. Obvious questions arise. Given that a positive relationship between firm size and the probability of engaging in innovation activities is consistently found in previous research, does this conversely mean that microfirms in KIS industries abstain from innovation?
A second research question relates to causality issues between innovation outcomes and firm productivity. The established empirical approaches applying structural models (see Crepon et al. 1998) assume, without being able to establish a causal effect, that the introduction of an innovative product or service to the market is expected to increase firm productivity (Hall 2011). However, recent research points to issues of reverse causality, according to which more productive firms are associated with stronger innovation activities (Aspara et al. 2018). Thus, it remains unclear how innovation activities and firm productivity interact in service firms.
Therefore, in this study, we take issues of causality into account and analyze to what extent microfirms in KIS industries conduct activities to become innovative and, ultimately, more productive. By comparing microfirms with large firms, we also analyze to what extent firm size and firm age matter in the KIS industries with respect to the decision to invest in new knowledge. For our analysis, we use the IAB Establishment Panel, which is a representative annual German firm survey that offers information on all industries in all firm size and firm age classes. Subjecting these data to systematic analysis enables us to contribute to the existing literature in three important ways. First, we provide empirical evidence on the triad relationship between innovation input (like R&D activities), innovation output (like a new product or service), and productivity in the KIS part of the service sector, also in comparison with manufacturing. In contrast to previous research, we separately analyze firms with fewer than 10 employees and differentiate them by their firm age. Thus, secondly, by comparing microfirms with larger firms, we investigate whether small scale imposes a burden in terms of potential threshold levels for the success of R&D investments. Thirdly, as we extend the structural model of Crepon et al. (1998) by incorporating the structural model developed by Ackerberg et al. (2015) at the third stage of the innovation process, our analysis adds more generally to the discussion of a causal relationship between innovation and firm productivity.
We find that KIS firms of all age and size classes are able to turn innovative input into innovation output, with both R&D investment and highly skilled employees being responsible for innovation output in microfirms. We further show that this newly produced knowledge causally increases productivity. However, the role of firm age and size differs across industry sectors. With respect to firm age, we observe that, in contrast to the manufacturing sector, start-ups and young firms in KIS industries are more likely to engage in innovation activities and, even more importantly, are more likely to successfully turn innovation input into a product or service innovation than mature firms are. And while larger firm size bestows advantages for innovation and productivity in manufacturing, our findings suggest that micro and small firms in knowledge-intensive services are less burdened by an inherent firm size disadvantage. This outcome is important for economies with low shares of large firms, such as those in Southern Europe. Innovation activities enhance firm productivity and ultimately economic growth through knowledge-intensive services, even if firms tend to be mostly micro or small.
The rest of the paper is organized as follows. In Section 2, we document the relevance of KIS industries in the economy, consider size-related differences in R&D decisions between manufacturing and KIS industries, and review the empirical findings. Data and summary statistics are reported in Section 3. Section 4 outlines the estimation strategy. The empirical results, robustness checks, and the discussion of limitations are presented in Section 5. A summary and conclusions are provided in Section 6.
10,000 employees are established annually in KIS industries, consisting of firms in the industries of "ICT" and "scientific and technical services." By contrast, there are around two start-ups per 10,000 employees in the manufacturing sector, of which every fourth (i.e., 0.5 per 10,000 employees) has the potential to innovate, showing that the number of new businesses with an innovation potential is significantly higher in KIS industries (Konon et al. 2018).
Secondly, when putting all firms together, the knowledge-intensive service firms comprise an important part of the German economy. 2 In 2014, this part of the service sector covered 23% of all establishments in Germany, contributed 17% to German gross value added, and accounted for 13% of all employees. For comparison, 26% of all jobs were in the manufacturing sector, which contributed 34% to the nation's gross value added. The importance of microfirms in terms of their sheer number is well documented. While 64% of all firms in manufacturing have fewer than 10 employees, the dominance of microfirms is striking in the knowledge-intensive services: in these industries, 90% of all firms are microfirms with fewer than 10 employees (Federal Statistical Office Germany 2018).
How relevant microfirms in this sector are becomes clear when focusing on employment in the knowledge-intensive services (KIS), as depicted in Fig. 1. While the majority of employees in manufacturing work for firms with at least 250 employees, the opposite is true for KIS. In KIS, the highest share of individuals (over 30%) work for a firm with 10 or fewer employees. Thus, figures on microfirms suggest that a considerable and important part of the economy remained unexplored in previous research with respect to potential innovation activities.
2.2 Firm size and innovation processes in the service sector
Firms engage in R&D activities in order to introduce new products and services or to improve the quality of their products or services, increase sales, or reduce production costs, ultimately fueling productivity increases. In this context, in innovation economics, the most prevalent approach to analyzing innovation at the firm level traces back to Griliches (1979), who introduces an augmented Cobb-Douglas production function that explicitly includes knowledge as an input, along with capital and labor, linking it to output and productivity. The framework describes the process from investment in research by using past and present R&D expenditures to approximate the state of technical knowledge and to estimate its effect. A vast empirical literature confirms the validity of the knowledge production function and shows that R&D investment is positively related to firm productivity (see surveys by Griliches 1998; Griffith et al. 2004; Hall et al. 2010), and there is good reason to assume that the general innovation pattern is the same in knowledge-intensive services as in manufacturing (Tether 2005).
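As a compact illustration of the augmented production function described above, the relationship can be sketched as follows. The symbols are assumptions chosen for this sketch and are not taken from Griliches (1979) directly: $Y$ is output, $L$ labor, $K$ physical capital, $G$ the knowledge stock approximated from past and present R&D expenditures, $A$ a constant, and lowercase letters denote logs with $\alpha_0 = \ln A$.

```latex
\begin{align*}
Y_{jt} &= A \, L_{jt}^{\alpha_l} \, K_{jt}^{\alpha_k} \, G_{jt}^{\alpha_g} \, e^{\nu_{jt}} \\
y_{jt} &= \alpha_0 + \alpha_l \, l_{jt} + \alpha_k \, k_{jt} + \alpha_g \, g_{jt} + \nu_{jt}
\end{align*}
```

In this form, $\alpha_g$ would be read as the elasticity of output with respect to the knowledge stock.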
Beyond the well-established research that analyzes the triad relationship between R&D, innovation, and productivity, we concentrate in this section, from a conceptual point of view, on the question of why firm size and firm age play a key role (Acs and Audretsch 1987; Acs et al. 1994; Cohen and Klepper 1996a) and how innovation processes may differ in this context when comparing firms in KIS industries and manufacturing. There are compelling reasons for a positive relationship between the decision to invest in R&D and firm size. Theory revolves around the two conditions driving the investment decision: opportunity and appropriability.
In terms of opportunity, there are two reasons why access to investment funds for R&D is limited for smaller firms. The first is the lower level of profitability associated with smaller firms and the limited amount of internal capital available for investing in R&D (Mairesse and Mohnen 2002). The second is that smaller firms are more informationally opaque for financial institutions than larger ones, making it more difficult for providers of external finance to assess the quality of the projects proposed for funding (Berger and Udell 2002). Thus, smaller firms are more likely to face financial constraints when seeking external finance to invest in innovation activities (Stiglitz and Weiss 1981; Czarnitzki and Hottenrott 2011).
2 As we use German data for our analysis, we follow the German definition of this sector, which has minor differences to the European definition. According to the German definition, KIS are made up of all of the ICT services (J), the financial industry (K), all of the scientific and technical services (M), human health activities (Q), plus the 2-digit industries of creative, art, and entertainment activities (R90) and libraries, archives, museums, and other cultural activities (R91). The financial sector is excluded in our analysis, because the general concept of inputs and outputs does not fit this industry, which makes it very difficult to estimate production functions for this sector. We also exclude public services (Q, R90, and R91). In Germany, prices and quantities in these services are determined by administrative and social insurance entities. In fact, output, especially value added, is calculated in the statistics as the sum of inputs. Thus, a production function approach with its underlying concept of competition, market prices, efficiency, and technological progress is not applicable.
In terms of the second dimension, smaller firms are limited in their ability to appropriate the returns accruing from R&D investments since the scale of their production and sales is inherently limited (Cohen and Klepper 1996a). This holds for start-ups and firms that engage in R&D for the first time, due to sunk start-up costs (Peters et al. 2017) or missing management experience, explaining why R&D investments might be more limited in these firms relative to their larger counterparts.
From these two conditions, it would seemingly follow that smaller firms are burdened by an inherent innovation disadvantage. However, theory and most of the empirical findings apply to manufacturing, and both the production and the innovation processes in KIS firms differ, to an extent, from those in manufacturing firms. Firms in KIS industries have different capital requirements for starting a business as well as different labor qualification requirements, and differ with respect to the physical production locations and to output tangibility. These differences may also influence the process of innovation activities.
More specifically, focusing on the production process per se, KIS firms generate more customized knowledge products and services than manufacturing firms (Gallouj and Weinstein 1997). Therefore, the role of scale economies in producing such intangible goods is less relevant in much of the knowledge-intensive services (except for network-based services). Further, higher real capital requirements (e.g., in terms of machinery) may facilitate a larger scale of output that, in turn, is more conducive to innovative activity and enhances productivity in manufacturing, where scale and efficiency are positively related. This may not necessarily hold for knowledge-intensive services in the same way, as there are lower requirements in terms of physical capital. Hence, in most parts of knowledge-intensive services, firm size and firm age play a different role in influencing R&D investment decisions than they do in manufacturing.
To some extent, there are also differences between firms in KIS industries and manufacturing that are directly related to innovation processes (see also Forsman 2011). First, the process of creating innovative products and services is different in KIS industries. Because physical capital requirements are generally lower among KIS firms than in manufacturing, producing innovative products and services, or developing new processes that improve the delivery of products and services, also tends to be less capital intensive and, therefore, requires relatively smaller investments. It is also less resource intensive in terms of the R&D work force, and there might be no need for a physical production site to produce a new service product or to implement a new process.
A further issue relates to the question of the extent to which formal R&D investments in the service sector produce new knowledge in the same way they do in manufacturing. First, we observe that, in knowledge-intensive services, the majority of R&D expenditures (75%) are used to finance R&D workers (Stifterverband 2017), which makes clear that R&D spending is invested in highly educated "brains" and less in machines or equipment. Moreover, there is a discussion concerning the extent to which innovation output is produced without formal investment in R&D. Empirical research shows that, even in manufacturing, a certain share of firms produce innovative output without a formal R&D budget. This issue of formality might be more important in the context of professional KIS firms. Individuals producing knowledge on a daily basis may observe opportunities for innovation and, thus, are more frequently able to contribute to the generation of new knowledge during their routine work or in exchange with their customers. This knowledge then needs to be effectively managed through appropriate knowledge management strategies (Storey and Kahn 2010).
Overall, these considerations make clear that both opportunity and appropriability may influence R&D decisions in KIS firms differently from manufacturing. There might be lower threshold levels and reduced financial constraints, allowing us to posit the hypothesis that differences in firm age and firm size should be less important in the decision to engage in innovative activities.

Previous research and research questions
Crepon et al. (1998) introduce a structural model (the "CDM model") that connects the approach of Griliches (1979) with a knowledge production function similar to Pakes and Griliches (1984). The model, which relates R&D effort to its determinants, includes an innovation equation linking R&D effort to innovation output and a production function linking innovation output to labor productivity. Their framework is now the workhorse model for empirical analyses and is used to examine the elasticity of labor productivity to R&D investment through innovation at the firm level, by harnessing data collected as part of the Community Innovation Surveys (CIS). The majority of research on the relationship between R&D, innovation, and labor productivity is confined to manufacturing firms (see Hall 2011, Mohnen and Hall 2013, and Lööf et al. 2017). Recently, Peters et al. (2017) analyze the relationships between research, innovation, and productivity using a dynamic structural model of a firm's decision to engage in R&D that is contingent on R&D expenditure and prospective payoff. As for the effect of firm size on innovation, Cohen and Klepper (1996b), Hall et al. (2009), and Baumann and Kritikos (2016) examine the relationships between R&D, innovation, and labor productivity in SMEs in the manufacturing sector. All studies find that SMEs produce substantial innovation output, but show that firm size is positively associated with a firm's ability to produce innovation output. By contrast, the few studies on the relationship between firm age and innovation activities, again in the manufacturing sector, point to rather inconclusive results (Huergo and Jaumandreu 2004).
A first set of studies analyzes the service sector separately from manufacturing in developed economies by making use of structural models and by following Griliches (1979). Lööf and Heshmati (2006) use data from Sweden for the 1996 to 1998 period and find homogeneity for the two sectors in the key elasticities between innovation input, innovation output, and labor productivity. Mairesse and Robin (2010) and Musolesi and Huiban (2010), relying on various French data from 1998 to 2000 and 2002 to 2004, show that KIS firms are able to produce innovation outcomes and that product innovation is positively correlated with labor productivity (while process as well as non-technological innovation is not). Also using CIS data, Segarra-Blasco (2010) estimates a CDM model that considers product, process, and organizational innovation as dichotomous variables. The study points to heterogeneity between manufacturing and service firms. It is the only study that reveals an age effect in the sense that young firms in the KIS sector with more than 10 employees carry out R&D more often. Yet, given its data, the study is limited to a cross-sectional setting. Peters et al. (2018) estimate a CDM model using CIS data on the service sector in Germany, Ireland, and the UK, covering the 2006 to 2008 period and including firms with at least 10 employees. Measuring innovation input in terms of innovation investment, they find that innovation in the service sector is associated with higher productivity.
These studies face data limitations and do not identify a causal relationship between innovation output and productivity. As for the first limitation, previous studies lack information on a number of aspects, such as firms with fewer than 10 employees or even fewer than 20 employees, as in the case of Lööf and Heshmati (2006), Mairesse and Robin (2010), and Musolesi and Huiban (2010). The huge number of microfirms in this sector, including most start-ups, is not analyzed. The studies also lack information on materials, on high-skilled employees (with the exception of Lööf and Heshmati (2006) and Segarra-Blasco (2010)), and, in the case of Peters et al. (2018), on capital. As for the second issue, all studies of the service sector (except Lööf and Heshmati 2006), similar to nearly all CDM studies on manufacturing, are restricted to cross-sectional data and can therefore make statements only about correlation, not causation, between innovation and productivity. The main issue with CIS data is that R&D expenditures are observed in $t_0$ while innovation output is observed in $t_{-1}$ to $t_{-3}$; thus, innovation input (R&D) is observed subsequent to innovation output. Therefore, studies based on CIS data rest on the strong assumption that firms continuously invest in R&D and that the R&D observed in $t_0$ is not different from the innovation input between $t_{-1}$ and $t_{-3}$.
Our study deviates from previous analyses in three ways: firstly, by including microfirms, where the majority of employees in knowledge-intensive services work, and by analyzing how microfirms perform in terms of innovation activities in comparison with larger firms; secondly, by forgoing the assumption that the observed innovation input in $t_0$ explains the innovation output prior to $t_0$; and thirdly, by estimating a causal relationship between innovation output and labor productivity in the service sector. More specifically, we are able to analyze the following research questions: Are firms in KIS industries that engage in innovative input more likely to create an innovation output, such as a new product or service, than firms that do not engage in such innovative activities? When conducting this analysis, we have a special focus on microfirms. Moreover, to what extent do these firms generate innovation output without formal R&D? We furthermore investigate the extent to which firm size is a burden in the service sector, namely, (a) when the decision is made to engage in innovation and (b) when firms aim to translate innovation input into innovation output. Last but not least, we causally examine whether the link between innovation and productivity works in this part of the service sector, i.e., whether the ability to innovate causally increases firm productivity.

Data
This study uses the IAB Establishment Panel (IAB-EP), an annual survey of approximately 16,000 establishments with at least one employee liable to social security. The survey is conducted by the Federal Employment Agency. The establishments are drawn from the BA establishment file, which comprises roughly two million establishments that have notified the social security agencies of their employees, as stipulated by law. The survey is representative within size and age classes, as well as at the industry-group and federal state levels. In addition to questions directly concerned with employment, the survey also inquires about business performance, investment, and R&D engagement. Using dichotomous variables, as defined in the Oslo Manual (OECD/Eurostat 2005), the survey takes account of the introduction of new products and services, as well as the implementation of new processes that helped to improve the production process or the provision of services within the firm. However, it does not ask whether innovations are successful in terms of increased sales or reduced costs.
The IAB-EP covers start-ups and microfirms with one employee across all industries, a distinct advantage over other German panel surveys on innovation (Maaß and Führmann 2012). One disadvantage of the IAB-EP is that it considers establishments instead of firms. Yet, the overwhelming majority of micro, small, and medium-sized enterprises (MSMEs) and virtually all start-ups are single-establishment firms, which is why this disadvantage is a minor issue. Therefore, we also speak of firms throughout the rest of the paper.
In line with the MSME definition of the European Commission (European Commission 2003), we restrict the sample to establishments with no more than 249 employees. We distinguish them by the number of employees, i.e., between micro (1-9 employees), small (10-49), and medium-sized establishments (50-249), and do not introduce further restrictions based on revenue. We further differentiate them by firm age and build three categories. We define start-ups as firms in an age class between 0 and 5 years (Robehmed 2013), middle-aged firms as firms in an age class between 6 and 19 years, and mature firms as firms aged 20 years and older. We retain sole proprietorships, partnerships, and private limited liability companies. Moreover, only observations with complete information for all variables are included and used in the analysis. We concentrate on observations for firms active in the manufacturing industry or knowledge-intensive services (Gehrke et al. 2013). The latter contains firms in the areas of information and communication, as well as in professional, scientific, and technical activities. Establishments in the financial services and insurance sector are excluded.
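The size and age classes defined above can be written down as two small helper functions. This is a minimal sketch: the function names and the returned labels are hypothetical, and only the cut-off values come from the text.

```python
from typing import Optional


def size_class(employees: int) -> Optional[str]:
    """Map an employee count to the size class used in the text.

    Returns None for establishments outside the MSME range (250 or more).
    """
    if employees < 1:
        raise ValueError("the panel covers establishments with at least one employee")
    if employees <= 9:
        return "micro"
    if employees <= 49:
        return "small"
    if employees <= 249:
        return "medium"
    return None  # excluded from the sample


def age_class(age_years: int) -> str:
    """Map firm age in years to the three age categories used in the text."""
    if age_years < 0:
        raise ValueError("firm age cannot be negative")
    if age_years <= 5:
        return "start-up"
    if age_years <= 19:
        return "middle-aged"
    return "mature"
```

For example, an establishment with 12 employees that is 3 years old would be classified as a small start-up under these rules.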
The dataset covers the waves from 2009 to 2014. The questionnaire of each wave addresses two different points in time. Questions regarding inputs and the economic output, such as sales or innovation, refer to the previous year (t − 1). Labor-related questions such as education of workers, but also the question regarding R&D, refer to the current year (t). Therefore, the data are rearranged so that all variables refer to the same year, avoiding disadvantages of CIS data where innovation input is aligned with innovation output from the previous 3 years. Thus, unlike CIS data, we do not need to assume that firms continuously invest in R&D. The disadvantage, however, is that we always need two consecutive observations, which reduces the total number of available observations. After this initial data preparation, the final sample comprises 12,297 observations, of which 9317 belong to manufacturing industries and 2980 are in the knowledge-intensive services.
Table 1 shows the descriptive statistics for those variables used in the analysis for both manufacturing and service firms, also separated by firm size and age classes (see also Table 7). It reveals that KIS firms are considerably different from manufacturing firms. On the one hand, they include more start-ups and are, on average, younger. On the other hand, they have lower sales, make smaller investments, use less material and intermediate inputs, and also employ less labor per firm. Table 7 shows that this holds when we compare the two industries within size classes.
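The rearrangement of the waves described above can be illustrated with a short sketch. The field names (`sales_prev`, `innovation_prev`, `rd_now`, `skill_now`) and the dictionary layout are hypothetical and do not reflect the actual IAB-EP variable names; the sketch only shows why two consecutive waves are needed to obtain one aligned observation.

```python
def align_waves(waves):
    """Align one firm's survey answers so that all variables refer to the same year.

    `waves` maps a wave year to that wave's answers. Items ending in `_prev`
    refer to the year before the wave, items ending in `_now` to the wave year.
    An aligned record for year t combines the `_now` items of wave t with the
    `_prev` items of wave t + 1, so firms need two consecutive waves.
    """
    aligned = {}
    for year, current in waves.items():
        following = waves.get(year + 1)
        if following is None:
            continue  # no consecutive wave, so the observation is lost
        aligned[year] = {
            "rd": current["rd_now"],
            "skill_share": current["skill_now"],
            "sales": following["sales_prev"],
            "innovation": following["innovation_prev"],
        }
    return aligned


# Hypothetical firm observed in 2009, 2010, and 2012 (the 2011 wave is missing).
example = {
    2009: {"rd_now": 1, "skill_now": 0.30, "sales_prev": 100.0, "innovation_prev": 0},
    2010: {"rd_now": 1, "skill_now": 0.35, "sales_prev": 120.0, "innovation_prev": 1},
    2012: {"rd_now": 0, "skill_now": 0.40, "sales_prev": 150.0, "innovation_prev": 1},
}
result = align_waves(example)
```

In this example, only 2009 yields an aligned record: it pairs the R&D answer given in the 2009 wave with the sales and innovation answers for 2009 that are reported in the 2010 wave.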
While 28% of all manufacturing units in the dataset report R&D engagement, only 19% of KIS firms do. In these services, the share of large SMEs (50-249 employees) engaging in R&D is almost 4 times higher than that of microenterprises (1-4 employees). Moreover, in both the manufacturing and services sectors, another 30% of all firms report an innovation without stating a formal R&D engagement, such that the overall innovator share is a little less than 60% among manufacturing, while among all KIS firms, nearly every second firm innovates. This differs across firm size classes, where 36% of the very microfirms and 73% of medium-sized firms in the KIS sector report a successful innovation (Table 1).
Nearly all service firms innovating report a product or service innovation. Process innovation relating to the implementation of new processes that improved the service provision within the firm is reported less frequently, ranging between 10% for the very microfirms and 40% for the largest ones. In this context, we should also emphasize that, in KIS industries, a product often denotes a customized service process. Therefore, incremental product or service innovation and process innovation tend to be to a certain extent "synonymous" (Gallouj and Weinstein 1997, p. 542), which is why it is difficult to differentiate between these kinds of innovations in the KIS industry.
Firms from the knowledge-intensive services also differ with respect to the share of high-skilled employees: at 22%, this share is significantly higher than in manufacturing. Compared to manufacturing firms, KIS firms face less competitive pressure, which matches with the observations that they are less active in international markets. The reduced competitive pressure also seems to lead to more companies in KIS assessing their earning situation as good (the variable "profitable") than in manufacturing industries. The share of KIS firms investing in the training of their employees is 65%, compared with 58% in manufacturing. In addition, almost 80% of firms in KIS report modern technical equipment, in contrast to less than 60% in manufacturing. This is in accord with the fact that investment cycles and necessary equipment are quite different between, for instance, a mechanical engineering firm and a consulting firm. Overall, Table 1 reveals that firms in manufacturing and in the knowledge-intensive services differ in several ways from each other.

The CDM model
We start by briefly describing the CDM model (Crepon et al. 1998) in a variant proposed by Mairesse et al. (2005). This variant uses the occurrence rather than the intensity of R&D engagement. Thus, the strict selectivity issue concerning R&D and innovation intensity, for which Crepon et al. (1998) had to correct in their specification, does not arise. The model breaks down the innovative process between a firm's decision to invest in R&D and its productivity into three recursive steps. The first step uses a probit model to estimate the probability of engaging in R&D. The R&D decision $r^*_{jt}$ of firm $j$ at time $t$ is modeled as follows:
$$r_{jt} = \begin{cases} 1 & \text{if } r^*_{jt} = X'_{jt}\gamma + e_{jt} > \hat{c} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
where $r_{jt}$ is the observed binary variable for R&D activities, $r^*_{jt}$ is an unobserved latent variable 11 that defines the probability of engaging in R&D, $X'_{jt}$ is a vector of determinants affecting firms' decision to undertake R&D investment, $\gamma$ is the associated coefficient vector, and $e_{jt}$ is the error term. If the unobserved latent variable $r^*_{jt}$ is larger than a certain threshold level $\hat{c}$, the observed $r_{jt}$ will equal one and zero otherwise.
The second step models the "knowledge production" (Pakes and Griliches 1984), which is the transformation from innovation input to innovation output, as follows:
$$i_{jt} = r^*_{jt}\beta + Z'_{jt}\delta + u_{jt} \qquad (2)$$
where $i_{jt}$ is the observed binary variable for innovation (as mentioned in Section 3, regardless of whether it is a product, service, or process innovation), $r^*_{jt}$ is the latent R&D decision that will be proxied by the predicted value from the first step, 12 $Z'_{jt}$ contains further determinants influencing the knowledge production, $\beta$ and $\delta$ are the coefficients to be estimated, and $u_{jt}$ is the error term. The first two steps of the model are each estimated by means of a probit.
Using the predicted R&D decision from the first step takes into account that firms may engage in innovative effort without reporting R&D engagement, like the majority of microfirms. 13 It also helps to overcome the selectivity and endogeneity issue. Such an issue would arise if the innovative effort (r) and produced knowledge (i) were determined by the same unobservable firm characteristics. In such a case, r j and u j are (potentially positively) correlated and parameter β would be (upward) biased. Using the predicted probability as instrument instead of the observed R&D engagement variable avoids the potential endogeneity bias, assuming that X 0 jt and u jt are uncorrelated. In its third step, the CDM model uses a productivity function that includes the predicted probability for innovation as a proxy of knowledge input. Using the predicted value seeks to alleviate the potential endogeneity issue with respect to innovation. The function to be estimated in the third stage is the Grilichestype production function, which is a plain Cobb-Douglas production function augmented by the knowledge stock. In the CDM case, this stock is replaced with the results from the second stage of the CDM approach, hence, the predicted probability for innovating. As we make use of a sales production function, we also have to include intermediates and material as explanatory variables. Thus, the respective estimation equation in logs is where y jt is sales; l jt is the labor input; hence, y jt − l jt is labor productivity; k jt is the capital input variable 14 ; m jt is the intermediate and material input; i * jt is the predicted probability for having innovation output; and ν jt is the observed error term. Additional control variables are also included in Eq. (3), such as time and sector dummies. The model outlined is extensively used in the literature (see Lööf et al. 2017). 15 11 An asterisk denotes latent variables, while all other variables apart from the error terms are observed. 
12 This is in line with Griffith et al. (2006), Hall et al. (2009), and Baumann and Kritikos (2016), who similarly include the predicted R&D intensity as an explanatory variable in the knowledge production function. Griffith et al. (2006) argue that firms may report R&D effort only if it exceeds a certain threshold so that innovative effort, such as workers investing a small amount of their working time to improve the process they are performing, would not be reported. This is based on the assumption that the relationship between innovation input and output is the same for firms that report R&D activities and those that do not.
13 This also allows for imputing values for observations with missing values on the R&D decision (see Section 3).
14 Due to the lack of information on capital stocks in the data, investments are used as a capital variable. Non-investment is controlled for by a dummy variable.
15 We deviate from previous studies by not imposing the constant returns to scale (CRS) assumption on the combination of labor and capital ($\alpha_l + \alpha_k = 1$), as it is a very restrictive assumption. If CRS is assumed, labor and capital are replaced by capital intensity ($K/L$). Under such an assumption, Eq. (3) would change to $y_{jt} - l_{jt} = \alpha_0 + \alpha_k c_{jt} + \alpha_m m_{jt} + \alpha_i i^*_{jt} + \nu_{jt}$, with $\alpha_k c_{jt}$ being the log of $(K/L)^{\alpha_k}$. Note that a negative labor coefficient must show up when Eq. (3) is estimated, as long as labor productivity is the dependent variable and labor is kept as an explanatory variable. This is because of $(\alpha_l - 1)$ and because $\alpha_l$ is usually smaller than 1, especially in sales production functions.
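The sign of the labor coefficient noted in footnote 15 follows from a short rearrangement. The lines below are a sketch that assumes the unrestricted log Cobb-Douglas sales function underlying Eq. (3); the intermediate steps are written out here for illustration only.

```latex
\begin{align*}
y_{jt} &= \alpha_0 + \alpha_l l_{jt} + \alpha_k k_{jt} + \alpha_m m_{jt}
          + \alpha_i i^*_{jt} + \nu_{jt} \\
y_{jt} - l_{jt} &= \alpha_0 + (\alpha_l - 1)\, l_{jt} + \alpha_k k_{jt}
          + \alpha_m m_{jt} + \alpha_i i^*_{jt} + \nu_{jt} \\
\text{under CRS } (\alpha_l + \alpha_k = 1):\quad
y_{jt} - l_{jt} &= \alpha_0 + \alpha_k (k_{jt} - l_{jt})
          + \alpha_m m_{jt} + \alpha_i i^*_{jt} + \nu_{jt}
\end{align*}
```

With $\alpha_l < 1$, the coefficient on $l_{jt}$ in the second line is negative. For instance, a purely hypothetical $\alpha_l = 0.4$ would appear in the regression output as $-0.6$. In the third line, $k_{jt} - l_{jt}$ is the log of capital intensity, $c_{jt}$.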
In previous studies, Eq. (3) is estimated by means of OLS. However, consistently estimating the production function is not trivial, due to the unobserved total factor productivity (TFP). Researchers can directly estimate an intercept, which is the average total factor productivity of all firms under the production function. However, even in such estimations, the observed error $\nu_{jt}$ still contains not only the true error term that captures the measurement errors ($\varepsilon_{jt}$) but also the firm-specific total factor productivity ($\omega_{jt}$).
Because TFP is unobserved and, therefore, part of the error term $\nu_{jt}$, estimations are subject to the simultaneity problem first emphasized by Marschak and Andrews (1944). Simply put, while TFP is unobserved by researchers, firms know, or at least have a vague idea, about their productivity; thus, they will choose all inputs accordingly. Consequently, the inputs are correlated with the error term $\nu_{jt}$, as it contains the firm-specific part of TFP, and, in turn, the estimated coefficients are potentially biased. Hence, even the use of cross-sectional data does not avoid the simultaneity problem.
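The simultaneity problem can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' estimation: it assumes a one-input log production function, made-up variances, and a firm that sets its labor input in response to its own productivity. All function and variable names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with intercept: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate_simultaneity(n=5000, alpha_l=0.5, seed=0):
    """Simulate y = alpha_l * l + omega + eps, where l is chosen knowing omega."""
    rng = random.Random(seed)
    labor, output = [], []
    for _ in range(n):
        omega = rng.gauss(0.0, 1.0)      # firm-specific TFP, unobserved by the researcher
        l = omega + rng.gauss(0.0, 1.0)  # input choice responds to TFP (assumed rule)
        eps = rng.gauss(0.0, 0.1)        # pure measurement error
        labor.append(l)
        output.append(alpha_l * l + omega + eps)
    return ols_slope(labor, output)


estimate = simulate_simultaneity()
print(f"true alpha_l = 0.5, OLS estimate = {estimate:.3f}")
```

In this design the covariance between labor and TFP is 1 and the variance of labor is 2, so OLS overstates the labor coefficient by about 0.5 in large samples. The direction of the bias matches the upward bias in OLS estimates discussed in the productivity literature.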
The productivity literature has been dealing with this issue. 16 One approach is the structural model of Ackerberg et al. (2015), hereafter ACF, which builds upon the studies of Olley and Pakes (1996) and Levinsohn and Petrin (2003). We extend the CDM model by incorporating the ACF approach into its third stage. The ACF method is used to properly estimate the production function and to allow for a causal interpretation of the estimated coefficients. For details on how we apply the ACF model, see Appendix B.

Estimation strategy
Our estimation strategy reaps the benefits of both models. We analyze the relationship between R&D and innovation using the first two stages of the CDM model. While most studies ignore the simultaneity issue of the production function estimation in the third stage, we make use of the ACF model instead. We solve the endogeneity issue in the ACF approach with respect to R&D by using the predicted probabilities to innovate from the second stage of the CDM model. By using CDM results in the ACF model, we solve the selectivity problem regarding innovating firms that we would face if only ACF is employed. Note that we conduct the full multi-stage procedure separately for each group of firms. Thus, we allow for different production functions per size class and industry group. We consistently use bootstrapped standard errors after the first step of the estimation procedure. 17 Given the indication (see Section 3) according to which it is difficult to differentiate between product, service, and process innovation in KIS industries, our output variable of interest in the main estimation approach at the second stage of the CDM model is the likelihood of being an innovator. Consequently, we also use the predicted probability of being an innovator as an explanatory variable in the final stage of our estimation approach. Thus, we do not separate between the effects for product, service, and process innovation in the main specification. However, these separate effects are then estimated in the robustness checks. As for the control variables at the three stages, we follow the existing literature on the CDM model and control for variables that may also influence the knowledge production (for an extensive discussion, see, inter alia, Hall et al. 2009).

Econometric results
We run separate regressions for the samples of manufacturing and knowledge-intensive service (KIS) firms, subsequently dividing KIS firms into microfirms (fewer than 10 employees) and larger firms (with 10 to 249 employees), while we control for the three age classes (start-ups, middle-aged, and mature firms). This allows us to investigate differences between sectors and between size and age classes within the KIS sector. 18 In each of the three sections, we first present findings with respect to the full samples of manufacturing and KIS firms before analyzing differences within the KIS sector.
16 We refer to Ackerberg et al. (2007) and Aguirregabiria (2009) for a comprehensive overview.
17 We use the non-standard bootstrapping as described in Cameron and Trivedi (2010, 417f). The unique firm identifier (id) is used as cluster or block variable, such that the sample drawn during each replication is a bootstrap sample over these ids, as is required for panel data.
18 Here, we do not discuss the outcomes for manufacturing differentiated by firm size classes any further, as this is beyond the scope of the present paper. A more detailed discussion is found in Baumann and Kritikos (2016).

First stage-innovative effort
Table 2 presents the estimation results of Eq. (1). The dependent variable takes on the value of one if the firm engages in R&D and zero otherwise. Columns 1 and 2 present the results for the full samples of manufacturing (column 1) and KIS industries (column 2), where we employ dummies for firm size and firm age, with large-sized firms and mature firms being the reference group. There is a notable difference for the size and age class dummies in the estimations of the two samples. The negative effect of the size class dummies is always significant in the manufacturing sample, indicating that firm size is positively related to the decision to start an R&D engagement (as is consistently found in the literature), while it is almost never significant for KIS firms, with the exception of the very small firms with 1-4 employees. Given that marginal effects are considerably smaller among KIS firms, we are confident that the insignificance does not result from a smaller sample size in this sector. This points to an important effect: there is a significant difference in R&D decisions between the two industries in that firm size is less relevant in services. Moreover, there is an important age class effect with respect to KIS firms as well. While in manufacturing, start-ups and mature firms start R&D activities with similar probabilities (and middle-aged firms with higher probability), in KIS industries, start-ups and middle-aged firms have a higher likelihood of engaging in R&D than firms that have been established for 20 or more years.
We observe several similarities between the two industries. First, the positive and statistically significant coefficient of internationalization suggests that having an international orientation is associated with a higher propensity to engage in R&D. This holds across all firm sizes in the sample, with the positive effect of exporting being more pronounced in the SME subsample of KIS firms compared to their smaller counterparts. Second, a positive and statistically significant relationship is also found to exist between firms with a limited liability structure and the likelihood of R&D engagement. This variable may capture aspects such as a firm's creditworthiness. Beyond these, no other observed variables show significant effects on the probability of engaging in R&D, be it the variable indicating group affiliation, strong competition, or a good profit situation (the variable "profitable"). The lack of a significant effect of a positive profit situation is worth highlighting: it may be interpreted in the sense that firms engage in R&D for strategic reasons. Overall, while the remaining variables are found to be quite similar between the two industries, the age class effect in favor of start-ups and the near absence of a firm size effect in influencing R&D engagement of firms in knowledge-intensive services are striking.
Clustered s.e. at the firm level in parentheses. Reference groups: medium-sized firms, mature firms in the age class ≥ 20 years; *significant at p < 0.05 level, **significant at p < 0.01 level

Second stage-knowledge production
Table 3 provides results for the second stage of the model, which estimates the likelihood of a firm being an innovator (the knowledge production function, Eq. (2)), i.e., of having introduced a product, service, or process innovation. Here, the predicted values of R&D engagement from the first step are used to correct for endogeneity. In order to take into account that we use the predicted probabilities of the R&D variables, we compute bootstrapped standard errors with 100 replications. 19 To do so, the two probit models for R&D engagement and innovation output are estimated sequentially on 100 random samples drawn from the data with replacement.
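The resampling logic behind such bootstrapped standard errors can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the field name `firm_id`, the toy panel, and the stand-in estimator are hypothetical. It resamples whole firms rather than single firm-years, since the paper uses the firm identifier as the cluster variable. In the actual procedure, the estimator would refit both probit stages on each resampled panel.

```python
import random
import statistics
from collections import defaultdict


def cluster_bootstrap_se(rows, estimator, n_reps=100, seed=0):
    """Bootstrap standard error of estimator(rows), resampling whole firms.

    rows: list of dicts, each carrying a "firm_id" key (hypothetical name).
    estimator: function mapping a list of rows to a float.
    """
    rng = random.Random(seed)
    by_firm = defaultdict(list)
    for row in rows:
        by_firm[row["firm_id"]].append(row)
    firm_ids = sorted(by_firm)

    estimates = []
    for _ in range(n_reps):
        # Draw firm ids with replacement; all firm-years of a drawn firm stay together.
        drawn = rng.choices(firm_ids, k=len(firm_ids))
        sample = [row for fid in drawn for row in by_firm[fid]]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)


def innovator_share(rows):
    """Stand-in estimator: share of firm-years flagged as innovators."""
    return sum(r["innovator"] for r in rows) / len(rows)


# Toy panel: 50 firms observed in 3 years, with made-up innovation indicators.
_rng = random.Random(1)
panel = [
    {"firm_id": f, "year": t, "innovator": int(_rng.random() < 0.4)}
    for f in range(50)
    for t in (2010, 2011, 2012)
]
se = cluster_bootstrap_se(panel, innovator_share, n_reps=100)
print(f"bootstrapped s.e. of innovator share: {se:.4f}")
```

Passing `n_reps=1000` corresponds to the larger number of replications used as a sensitivity check; the only cost is computation time.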
The first two columns include the entire sample of firms separated for the two sectors. As the positive and statistically significant coefficient of R&D engagement shows, the central tenet of the knowledge production model holds. Firms engaging in R&D exhibit a higher likelihood of being innovative in both manufacturing and knowledge-intensive services. As the next two columns show, this holds for larger firms and also for microfirms in the KIS industries, even if the marginal effect for microfirms is half that of SMEs. Importantly, among microfirms, the skill variable seems to "compensate" for the lower marginal effect of R&D on the probability of innovation: highly skilled employees unfold nearly the same marginal effect on the probability of introducing an innovation, as does R&D engagement. For SMEs, highly skilled employees are a much less powerful predictor of innovation.
The second striking result is observed with respect to start-ups: the positive, and for KIS firms statistically significant, coefficient of the age class dummy variables for start-ups implies that firm age and the likelihood of innovating are negatively related. Start-ups, in particular among the microfirms, are more likely to innovate successfully than mature firms in the KIS sector. This suggests that, after controlling for R&D engagement and other investments in knowledge, such as skilled employees, training, technical equipment, and investment, young firms enjoy an innovation advantage vis-à-vis their mature counterparts among microfirms in these industries.
A third remarkable result when comparing manufacturing with KIS industries again concerns firm size. The coefficients of all firm-size class dummies are negative and statistically significant for manufacturing, suggesting that, in this sector, after controlling for R&D engagement, firm size tends to be positively related to the likelihood of innovating. When the likelihood of innovating is estimated for knowledge-intensive service firms, the coefficients of the dummy variables for the different firm size classes are almost never statistically significant. For knowledge-intensive service firms, it seems that firm size does not adversely influence the likelihood of innovative activities, again, except for the very small firms with fewer than five employees.
There are also similarities between the two sectors. The positive and statistically significant coefficients of skilled labor, technical state of equipment, and training in both sectors are consistent with the knowledge production model in that they suggest that a higher amount of knowledge-generating inputs results in a greater likelihood of innovative activity. This also holds for the KIS sector, even if these variables are not significant for the SME size class, probably because variation among firms with 10 to 249 employees is low, with nearly all KIS firms in the SME size class appearing to be equipped with the newest apparatuses and offering training (Table 1).

Third stage-productivity
The production function estimation is conducted using the ACF model as described in Section 4.2. Given the data, we employ a sales production function with the log of revenue per employee used as the dependent variable. We do not impose the assumption of constant returns to scale, thereby keeping labor in the estimation. Note that cyclical effects on sales, sector differences, and regional differences, as well as differences due to age, are accounted for in the first stage of the ACF procedure. The predicted probability of being an innovating firm from the second stage of the CDM model serves as the innovation variable. The actual estimation is conducted by means of GMM (generalized method of moments). Given the discussion in Section 4.2, the following set of instruments is employed:
Age, region, year, and industry are controlled for in the first stage of the ACF procedure.
The results of the production function estimations are presented in Table 4. 21 Columns 1 and 2 display the results for manufacturing and knowledge-intensive services, and columns 3 and 4 show the effects for micro- and small firms in the KIS sector, respectively. As one would expect, labor has a much stronger effect in service firms than in manufacturing firms. 22 At the same time, investment in physical capital is insignificant for microfirms in knowledge-intensive services.
With respect to innovation, our main variable of interest, we find that innovating firms (in comparison to non-innovating firms) are causally able to increase their labor productivity. If the probability to innovate increases by 1%, labor productivity increases by 1.1% for all firms in KIS (column 2). This result is in line with earlier research by, for instance, Hall et al. (2009) for Italian manufacturing firms.

Robustness checks and limitations
Two robustness checks have been conducted. The first concerns the number of bootstrap replications (B). The main estimation uses B = 100, mainly to keep computation time practicable. To test the sensitivity of the significance of the results to the chosen B, the main estimation has also been run with B = 1000. While minor changes in the standard errors can be observed, they are not large enough to change the significance of the results. 23 The robustness check with B = 1000 thus largely confirms our main results.
Secondly, the main estimation examines the effect of being an innovator instead of differentiating between product, service, and process innovation. As an additional robustness check, the effects of the two types are disentangled. In line with Hall et al. (2009), we estimate process and product or service innovation output in a bivariate probit model, taking into account the assumption that both are determined by the same firm characteristics. Table 5 presents the results for the second stage, the knowledge production function. It shows that R&D engagement increases the probability of being innovative for both types of innovation. As in manufacturing, the marginal effects in the KIS industries are higher for product/service innovation than for process innovation. It also becomes clear that the age class dummy is only significant for product/service innovation: start-ups are more likely to create a new product or service than are mature firms. One effect should be emphasized: larger microfirms, with 5 to 9 employees, are able to transform innovation inputs into outputs with a similar likelihood (see column 3 in Table 3), but this does not hold when we estimate for product or service and process innovations separately. By contrast, small firms (10-49 employees) are not adversely influenced by their size in comparison with medium-sized firms when it comes to the likelihood of turning R&D investments into new products or services.
Table 6 presents the results of the third stage of the innovation process, the production function, again estimated (as for Table 4) with the ACF approach. Disentangling the influence of product or service and process innovation on labor productivity confirms a well-established effect found in many empirical papers on manufacturing. While the positive influence of the predicted probability of a product or service innovation on labor productivity is highly significant, the effect of process innovation is insignificant. 24 This also holds for knowledge-intensive services. Overall, this robustness check confirms our main results and provides a clear answer to our main research question: KIS firms benefit from investments in R&D in the sense that their innovation outcomes causally increase their labor productivity.
20 Specifications with additional instruments, such as $k_{jt-1}$ or $i^*_{jt-1}$, lead to similar results. However, in that case, the model is overidentified, and the p value of the Hansen test is not always larger than 0.1.
21 The results of the standard CDM with OLS in the third stage (available on request) show that the coefficients for innovation are slightly higher with OLS. The same holds for the labor and the capital coefficients. This issue of an upward bias in OLS estimations is well documented in the productivity literature.
22 Note that because labor productivity is the dependent variable and labor is kept as an explanatory variable, due to not imposing the assumption of constant returns to scale, the coefficient of labor is actually $(\alpha_l - 1)$. Thus, the "pure" labor coefficient ($\alpha_l$) is the estimated value, as shown in Table 4, plus 1. Consequently, because $(\alpha_l - 1)$ is estimated, the closer the coefficient in the table is to zero, the higher the actual labor coefficient is. Further note that we estimate a sales production function. While the labor coefficient in value-added production functions is usually somewhere between 0.5 and 0.9, depending on the industry, it is usually around 0.5 or smaller in the case of sales production functions.
23 As results are virtually identical, we refrain from providing the tables here. Results are available from the authors on request.
Our analysis still faces a number of data-driven limitations that we address here. First, we are able to use only a dichotomous variable to measure R&D effort. As shown by Mairesse et al. (2005), such a dichotomous variable has less explanatory power than a continuous one. However, using qualitative information on R&D has an advantage over censored quantitative information, namely, as Mairesse et al. (2005) clarify, that we do not have to account for the selectivity issue concerning R&D and innovation intensity that Crepon et al. (1998) must correct for in their specification. Secondly, the measurement of innovation output remains rudimentary in two ways. Innovation output is still restricted to product, service, and process innovation, although other types of innovation are receiving attention, including organizational, marketing, and social innovation. Future research, therefore, needs to differentiate with better data between technological and non-technological innovation outputs. Another measurement issue concerns the fact that innovation output is measured as a dichotomous variable. As larger firms tend to have more R&D activities than smaller ones, they are more likely to realize a higher number of innovative outcomes. Therefore, the knowledge production function estimates for the measures of firm size might be biased. A last data limitation of this study concerns the lack of information on firm capital stock. This, however, is a common shortcoming in the literature (see, inter alia, Griffith et al. 2006; Hall et al. 2009). Most research faces this issue and, thus, uses investment to proxy for the capital stock. We follow the literature by also making use of information on investment. 25
24 See Hall (2011) for a more general discussion on why product innovation, more often than process innovation, increases labor productivity among firms.
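Using investment as a capital proxy raises a practical question: the log is undefined for firms that do not invest. The paper's footnote 25 describes replacing the log with a constant (zero) and adding a no-investment dummy, following Crass and Peters (2014). The sketch below illustrates that construction for a single firm-year; the function name and the example values are hypothetical.

```python
import math


def investment_regressors(investment, fill_value=0.0):
    """Return (log_investment, no_investment_dummy) for one firm-year.

    investment: investment in currency units (made-up input, must be >= 0).
    For non-investing firms the log is undefined, so it is replaced by a
    constant and the observation is flagged with a dummy equal to 1.
    """
    if investment < 0:
        raise ValueError("investment must be non-negative")
    if investment == 0:
        return fill_value, 1
    return math.log(investment), 0


observations = [0.0, 1.0, 2500.0]  # made-up investment values
for inv in observations:
    log_inv, no_invest = investment_regressors(inv)
    print(f"investment={inv:>7.1f}  log_inv={log_inv:.3f}  no_invest={no_invest}")
```

A firm investing exactly one currency unit also has a log value of zero; the dummy is what separates it from a non-investing firm, which is why the constant and the dummy are used together.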

Discussion and conclusions
This paper analyzes the triad relationship between innovation input, principally R&D engagement, innovation output, and its impact on productivity in a non-manufacturing context, and focusses on microfirms in knowledge-intensive services (KIS). By using a comprehensive database including both knowledge-intensive service firms and manufacturing firms, the paper also probes whether services are different from manufacturing in terms of what influences firm innovation.
The first finding of this study is that knowledge-intensive services are able to generate innovative output not only by investing in R&D but also by having highly skilled employees, the latter being germane for microfirms. We reveal that while a small firm size places a distinct burden on the innovation performance of manufacturing firms, the role of firm size in innovation for knowledge-intensive services is decidedly different. Microfirms are willing to engage in innovation activities (R&D) with similar probabilities as larger firms and, importantly, have a similar ability to transform innovation inputs into innovation output. One reason for these findings could be that firms have become able to reap the advantages of lower industry-specific minimum efficient firm sizes. In this context, it is also important to highlight that we find an important age effect, which is more relevant for the KIS sector than for manufacturing: micro-start-ups are better able to turn innovation inputs into new knowledge than are mature microfirms. As such, with the inclusion of microfirms, this paper contributes to the understanding of innovative patterns and activities in firms of all size and age classes.
25 We follow Crass and Peters (2014) by replacing the log values of investment for non-investing firms with a constant (here, zero) and by adding a dummy variable for no-investment observations. By doing so, the estimated output elasticity of investment is unaffected by the value of the constant and the estimation is not restricted to investing firms.
To answer the question related to issues of causality between innovation output and productivity levels of firms, we incorporate the structural model of Ackerberg et al. (2015) into the third stage of the model of Crepon et al. (1998). We provide evidence that firms in KIS industries, as much as manufacturing firms, are causally able to turn their innovation output into higher productivity.
Overall, our findings have policy implications. Expectations that start-ups are associated with introducing innovative products and services receive strong support for knowledge-intensive services (in contrast to the manufacturing sector). This finding is particularly significant as it confirms the view that entrants into KIS industries tend to have an innovative advantage vis-à-vis more established incumbents. Together with the fact that in KIS industries the probability of successfully turning innovation input into innovation output is barely affected by firm size, this can be seen as an encouragement for entrepreneurs in microfirms seeking to develop innovation activities. Making such investments may increase the probability of innovation success, which pays off in the form of higher productivity. At the same time, this finding does not mean that such investment is riskless per se. These observations about microfirms are also good news for economies suffering from a low share of large firms, as is the case in Southern Europe. Earlier research found that KIS industries are among the most dynamic sectors (see Gebauer et al. 2019) and are becoming crucial for economic growth and prosperity. Our results show that, in contrast to the manufacturing sector, a considerable number of large firms is not a prerequisite for realizing such growth. Even if firms tend to remain small, they will be able to successfully introduce innovations that in turn increase firm productivity and ultimately spur economic growth. A second important insight is that, as the innovative output of KIS firms might be used as input for other productive activities, KIS firms may also have the potential to foster innovation through knowledge diffusion in the downstream industries.
Several consequences can be drawn from these observations for the policy side. On the one hand, there exist prejudices at the level of the ministerial bureaucracy assuming that microfirms are marginal in their contribution to the economic performance of a country. These preconceptions can be clearly rejected: microfirms not only employ the majority of employees in the KIS industries, they are also able to successfully innovate and thus are important drivers of technological progress.
On the other hand, policy needs to be cognizant that sweeping generalizations about promoting R&D, innovation, and productivity, across all types of firms and sectors, may be less efficient. Rather, the relationships between R&D, innovation, and productivity are specific to the firm size and industry context. This suggests that policy efforts to stimulate innovation and, thus, increases in firm productivity need to be sensitive to both aspects. Consider, as one example, the fact that the majority of micro-and small firms in KIS industries-although successful innovators-do not formally budget for R&D. At the same time, Germany, for instance, is planning to introduce tax credits for R&D investments, a benefit that already exists in a number of other countries. In the German context, politicians specifically emphasized that such benefits will help MSMEs in their innovation efforts, because these are said to refrain more strongly from R&D investments than larger firms. However, such tax benefits are futile, as the majority of these firms would not gain any tax advantages from benefits for R&D because they do not formally employ R&D workers.
Future research needs to analyze what kind of instruments successfully incentivize MSMEs to become more innovative. For this, it is important to determine to what extent MSMEs currently refrain from formal innovation activities (see also Hottenrott and Peters 2012). Is it that they perceive these activities as too risky or is it that MSMEs would like to innovate but are not able to do so because they face financing constraints and lack external funding? Moreover, innovative start-ups and microfirms add to the level of competitiveness of an economy by bringing in their own new product ideas, thereby pushing established firms to improve their performance. In this sense, future research also needs to investigate to what extent the entry of new firms into and their exit from markets is impeded through prohibitive over-regulation and red tape.

Appendix B The ACF model
The ACF model is aimed at splitting the observed error term ($\nu_{jt}$) such that the unobserved firm-specific factor productivity ($\omega_{jt}$) can be "observed," as one can control for it in the estimations; these approaches are referred to as control function approaches. Since Levinsohn and Petrin (2003), control function approaches utilize the assumption that an intermediate input demand function with certain characteristics exists: $m_{jt} = h_t(\cdot)$. Inter alia, it is assumed that such a function contains all observed variables relevant for material and TFP ($\omega_{jt}$), that the function is strictly monotonic in $\omega_{jt}$, and that TFP is the only unobserved state variable in that function (Ackerberg et al. 2015). Given these assumptions, the function $h_t(\cdot)$ is invertible, which allows for replacing the unobserved TFP with a function of observables in the production function. Using Eq. (3) as starting point, this leads to the following:
$$y_{jt} - l_{jt} = \alpha_0 + (\alpha_l - 1)\, l_{jt} + \alpha_k k_{jt} + \alpha_m m_{jt} + \alpha_i i^*_{jt} + h_t^{-1}(l_{jt}, k_{jt}, m_{jt}, i^*_{jt}) + \varepsilon_{jt}$$
or
$$y_{jt} - l_{jt} = \varphi_t(l_{jt}, k_{jt}, m_{jt}, i^*_{jt}) + \varepsilon_{jt} \qquad (4)$$
with $\varphi_t(l_{jt}, k_{jt}, m_{jt}, i^*_{jt}) = \alpha_0 + (\alpha_l - 1)\, l_{jt} + \alpha_k k_{jt} + \alpha_m m_{jt} + \alpha_i i^*_{jt} + h_t^{-1}(l_{jt}, k_{jt}, m_{jt}, i^*_{jt})$.
The function $h_t^{-1}(\cdot)$ is approximated by a polynomial as its functional form is unknown (Levinsohn and Petrin 2003).
Even when controlling for TFP, the coefficients are not identified when estimating Eq. (4), e.g., by means of OLS, because of the functional dependency between the regressors and $h_t^{-1}(\cdot)$, which also contains the regressors (see Ackerberg et al. 2015 for the proof). Nevertheless, estimating Eq. (4) dislodges the TFP from the error term $\nu_{jt}$ (see Eq. (3)) and is needed as the first stage in the two-stage ACF procedure. The identification strategy in the second stage relies on the assumption that TFP follows a first-order Markov process (Olley and Pakes 1996; Levinsohn and Petrin 2003; Bond and Söderbom 2005; Ackerberg et al. 2007; Ackerberg et al. 2015). Hence, the firm's productivity expectation is derived from its past experience, contained in the information set $\Upsilon_{jt-1}$, and a random productivity shock $\xi_{jt}$ in $t$ that is independent of all past information. This model, formally $\omega_{jt} = E(\omega_{jt} \mid \Upsilon_{jt-1}) + \xi_{jt} = g(\omega_{jt-1}) + \xi_{jt}$, can be approximated by a polynomial in $\omega_{jt-1}$ of order $n$. Following Petrin et al. (2004), we set $n = 3$, hence, where $\epsilon_{jt}$ is an error term that contains true measurement error and the unobserved productivity shock $\xi_{jt}$. Given Eq. (4), we infer $\omega_{jt}$ from rearranging that function:
$$\varphi_t(l_{jt}, k_{jt}, m_{jt}, i^*_{jt}) - \alpha_0 - (\alpha_l - 1)\, l_{jt} - \alpha_k k_{jt} - \alpha_m m_{jt} - \alpha_i i^*_{jt} = h_t^{-1}(l_{jt}, k_{jt}, m_{jt}, i^*_{jt}) = \omega_{jt}$$
This rearranged function is substituted into Eq. (5), which is then estimated by means of GMM using $\hat{\varphi}_{jt}$, as estimated in the first step, and the starting values for the coefficients. 26 Identification further relies on timing assumptions regarding the firms' decisions for the different inputs. Since the seminal study of Olley and Pakes (1996), it is assumed that the decision to invest is taken in $t - 1$ but carried out in $t$, which is why investment in $t$ is independent of the productivity shock $\xi_{jt}$. Ackerberg et al. (2015) show that the labor variable is correlated with $\xi_{jt}$. This holds even if labor is considered "less flexible" than material and when firms decide about labor before they decide about material, e.g., at $t - b$ with $0 < b < 1$. As also shown, the decision for $l_{jt-1}$ exploits only the information the firms possess at $t - 1$, which is contained in information set $\Upsilon_{jt-2}$. Consequently, $l_{jt-1}$, which was decided upon at $t - b - 1$, is not correlated with the productivity shock in $t$. The same holds for the material variable. The use of the predicted innovation probability as an instrument for the observed innovation variable ensures orthogonality with $\xi_{jt}$.
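The third-order Markov approximation and the moment conditions implied by the timing assumptions can be written out as follows. This is a sketch rather than the authors' exact specification: the coefficient symbols $\rho_0, \dots, \rho_3$ and the instrument vector $z_{jt}$ are notation assumed here for illustration, and the instrument set is one reading of the timing assumptions described above.

```latex
\begin{align*}
\omega_{jt} &= g(\omega_{jt-1}) + \xi_{jt}
  \approx \rho_0 + \rho_1\, \omega_{jt-1} + \rho_2\, \omega_{jt-1}^{2}
  + \rho_3\, \omega_{jt-1}^{3} + \xi_{jt}, \\
E\!\left[\, \xi_{jt}(\alpha)\, z_{jt} \,\right] &= 0, \qquad
z_{jt} \in \{\, k_{jt},\; l_{jt-1},\; m_{jt-1},\; i^*_{jt} \,\}.
\end{align*}
```

For each candidate coefficient vector $\alpha$, $\omega_{jt}(\alpha)$ is computed from the first-stage fit $\hat{\varphi}_{jt}$, the polynomial is fitted to obtain the implied shock $\xi_{jt}(\alpha)$, and GMM searches for the $\alpha$ that brings the sample analogues of these moments closest to zero.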