Replication (in a narrow sense)
Future citation-weighted patent counts are used as a proxy for the dependent variable innovation, which motivates the use of count data models. As a starting point (and following Aghion et al. 2013), a log link is implemented. In the spirit of a quasi-maximum likelihood approach, the Poisson model is employed for the mean equation along with clustered standard errors. The conditional expectation function is
$$\begin{aligned} \mathrm {E}(\mathrm {Cites}_{it} \mid \varvec{x}_{it}, \gamma _{i}, \delta _{t}) = \exp \left( \varvec{x}_{it}^{\mathsf {T}}\varvec{\beta }+ \gamma _{i} + \delta _{t}\right) , \end{aligned} \tag{1}$$
where \(\mathrm {Cites}_{it}\) is the citation-weighted patent count for company i in year t, the vector \(\varvec{x}_{it}\) contains all explanatory variables of the model for firm i in year t, \(\gamma _{i}\) is a firm-specific fixed effect, and \(\delta _{t}\) is a time dummy (Aghion et al. 2013, pp. 280–281). All models include the following explanatory variables: institutional ownership, measured as the percentage of outstanding shares held by institutions, the capital-to-labor ratio and sales (in logs), time dummies, and four-digit industry dummies (Aghion et al. 2013). Some models additionally include the stock of R&D expenditures (in logs) and the presample mean-scaling estimator (Footnote 2) developed by Blundell and Powell (2004).
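To make the log link in Eq. (1) concrete, the following minimal Python sketch evaluates the conditional mean for a single firm–year. The function name `conditional_mean` and all numerical values are illustrative assumptions; they are neither estimates from Aghion et al. (2013) nor taken from the data analyzed here.

```python
import math

def conditional_mean(x, beta, gamma_i, delta_t):
    """E(Cites_it | x_it, gamma_i, delta_t) = exp(x_it' beta + gamma_i + delta_t)."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear_predictor = sum(xk * bk for xk, bk in zip(x, beta)) + gamma_i + delta_t
    return math.exp(linear_predictor)

# Hypothetical regressor values and coefficients, for illustration only.
x_it = [0.45, 1.2, 6.0]
beta = [0.5, 0.3, 0.2]
base = conditional_mean(x_it, beta, gamma_i=0.1, delta_t=-0.05)

# Under the log link, raising the first regressor by one unit
# multiplies the conditional mean by exp(beta[0]).
shifted = conditional_mean([x_it[0] + 1.0] + x_it[1:], beta, gamma_i=0.1, delta_t=-0.05)
ratio = shifted / base
```

The multiplicative interpretation is the practical consequence of the log link: coefficients act on the expected count proportionally rather than additively.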
Table 1 Basic models—replication
Table 1 presents the (successful) replication of Table 1 in Aghion et al. (2013, p. 283). The results show that the coefficient of institutional ownership is consistently positive and significant.
In the negative binomial specification, some additional effort is necessary to resolve differences between Stata (StataCorp. 2011) and R (R Core Team 2013) regarding starting values and standard errors. As for the former, the function glm.nb() (Venables and Ripley 2002), used for negative binomial regression in R, sometimes has difficulty finding starting values for very high-dimensional regressions. Hence, we obtain the negative binomial parameter estimates using a quasi-Newton optimization based on analytical gradients and a numerical approximation of the Hessian, starting from the Poisson coefficient estimates. Concerning the clustered sandwich standard errors, Stata and R give different results by default. The root of this difference is that the standard errors in Stata are based on the observed information matrix, while R employs the expected information by default. When the numerical approximation of the observed information is employed in R, Stata’s results can be replicated except for small numerical differences that are likely due to the large number of industry dummies as well as the model specification.
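The distinction between observed and expected information can be illustrated with a deliberately stripped-down sketch: a negative binomial (NB2) model with a single regressor, no intercept, and the dispersion parameter held fixed. This is not the code used for the replication; the function names, the data, and the parameter values below are hypothetical. For the coefficient \(\beta\), the observed information is \(\sum_i x_i^2 \mu_i (1 + \alpha y_i)/(1 + \alpha \mu_i)^2\), whereas the expected information replaces \(y_i\) by \(\mu_i\) and simplifies to \(\sum_i x_i^2 \mu_i/(1 + \alpha \mu_i)\).

```python
import math

def nb2_loglik(beta, alpha, xs, ys):
    """NB2 log-likelihood with mean mu_i = exp(beta * x_i), variance mu_i + alpha * mu_i**2."""
    inv_alpha = 1.0 / alpha
    total = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(beta * x)
        total += (math.lgamma(y + inv_alpha) - math.lgamma(inv_alpha) - math.lgamma(y + 1)
                  + y * math.log(alpha * mu) - (y + inv_alpha) * math.log1p(alpha * mu))
    return total

def observed_information(beta, alpha, xs, ys):
    """Negative second derivative of the log-likelihood w.r.t. beta (alpha held fixed)."""
    info = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(beta * x)
        info += x * x * mu * (1.0 + alpha * y) / (1.0 + alpha * mu) ** 2
    return info

def expected_information(beta, alpha, xs):
    """Expectation of the observed information: y_i is replaced by mu_i."""
    info = 0.0
    for x in xs:
        mu = math.exp(beta * x)
        info += x * x * mu / (1.0 + alpha * mu)
    return info

def numerical_information(beta, alpha, xs, ys, h=1e-4):
    """Central finite-difference approximation of the observed information."""
    ll = lambda b: nb2_loglik(b, alpha, xs, ys)
    return -(ll(beta + h) - 2.0 * ll(beta) + ll(beta - h)) / (h * h)

# Hypothetical toy data and parameter values.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0, 3, 1, 12]
beta, alpha = 0.8, 0.5

obs = observed_information(beta, alpha, xs, ys)
exp_info = expected_information(beta, alpha, xs)
num = numerical_information(beta, alpha, xs, ys)
```

In this sketch `num` approximates `obs`, while `exp_info` generally differs from both; standard errors built from one or the other therefore need not coincide in finite samples.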
Extended analysis
However, the data show overdispersion as well as excess zeros. About \(35.2\,\%\) (2183 out of 6208) of the firm–year observations have zero citation-weighted patents. On the one hand, the zeros can stem either from the decision to keep potentially patentable discoveries secret or from the lack of any patentable finding (see Crepon and Duguet 1997), both of which result in a lack of any patents. On the other hand, the zeros can stem from holding patents that receive no citations. In the data analyzed, about \(32.6\,\%\) of the firm–year observations have zero patents and \(2.6\,\%\) have patents but no citations. In other words, only a few patents in our data do not get cited. In summary, the number of zeros in the dependent variable is higher than expected under the Poisson distribution, which casts doubt on the distributional assumption and suggests that potentially different determinants drive the zero and nonzero citations.
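The arithmetic behind the zero share, and the kind of comparison that signals excess zeros, can be sketched as follows. The two counts 2183 and 6208 are those reported above; the toy sample and the function names are hypothetical, and the comparison uses the marginal Poisson zero probability \(\exp(-\text{mean})\), which ignores covariates.

```python
import math

def zero_share(counts):
    """Fraction of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)

def poisson_zero_probability(mean):
    """P(Y = 0) for a Poisson distribution with the given mean."""
    return math.exp(-mean)

# Share of zero firm-year observations, from the counts reported in the text.
reported_share = 2183 / 6208
reported_percent = round(100 * reported_share, 1)

# Hypothetical toy sample: many zeros next to a few large counts.
toy_counts = [0, 0, 0, 1, 2, 40, 0, 7]
toy_mean = sum(toy_counts) / len(toy_counts)
observed_zero_share = zero_share(toy_counts)
implied_zero_share = poisson_zero_probability(toy_mean)
```

An observed zero share far above the Poisson-implied one, as in the toy sample, is what is meant by excess zeros relative to the Poisson distribution.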
Furthermore, overdispersion, meaning that the conditional variance exceeds the conditional mean, is a common characteristic of count data in economics. The ratio \(\frac{\mathrm {Var}(\mathrm {Cites})}{\mathrm {Mean}(\mathrm {Cites})} = 4836.2\) reveals a substantial amount of overdispersion for citation-weighted patents (note that covariates are not taken into account here). A negative binomial model offers some remedy in such a situation (see, e.g., Hausman et al. 1984). As a likelihood model, it explicitly accounts for dispersion. Aghion et al. (2013) consider negative binomial models only in their basic models in Table 1 (columns 6, 7, and 8). However, it is worth pointing out that the negative binomial model also does not explain the high proportion of zero citations discussed above.
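A minimal sketch of the unconditional variance-to-mean ratio used above, together with the variance function of the negative binomial model in the parametrization of glm.nb(), \(\mu + \mu^2/\theta\), is given below. The function names and the toy sample are hypothetical, and the use of the sample variance (divisor \(n-1\)) is an assumption of this sketch.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Unconditional variance-to-mean ratio; equal to 1 in expectation for Poisson data."""
    return variance(counts) / mean(counts)

def nb2_variance(mu, theta):
    """Negative binomial (NB2) variance function: mu + mu**2 / theta."""
    return mu + mu ** 2 / theta

# Hypothetical toy sample with a long right tail.
toy_counts = [0, 0, 0, 1, 2, 40, 0, 7]
index = dispersion_index(toy_counts)
```

A ratio well above one indicates overdispersion. In the negative binomial variance function, smaller values of \(\theta\) correspond to stronger overdispersion, and the Poisson case is approached as \(\theta \rightarrow \infty\).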
Table 2 Basic—hurdle models
Finally, the Poisson model assumes independent occurrences over time (see, e.g., Cameron and Trivedi 1998) and it may also be the case that the first innovation (the first citation-weighted patent count) is especially hard to obtain in comparison with succeeding innovations, such that ‘...the innovation process is characterized by nonlinearities’ (Crepon and Duguet 1997, p. 360). For example, in case of the discovery of a seminal innovation, some further discoveries of minor importance can follow more easily (Crepon and Duguet 1997).
Our investigation is motivated less by the large number of zeros or the overdispersion in the data than by the aim of allowing for potentially different processes driving innovation, for which we use hurdle models. A Poisson model for the mean equation with clustered standard errors cannot accommodate such differing processes.
In summary, these considerations concerning excess zeros, overdispersion, and potentially different determinants in the innovation process can be addressed by two-part hurdle models. Specifically, these models allow for two different processes, one driving the ‘first innovation’ (does a company own at least one citation-weighted patent?) and one driving the ‘continuing innovation’ decision (if a company has a positive number of citation-weighted patents, how many of them does it possess?) (Footnote 3).
Positive outcomes are observed if the zero hurdle is crossed and are modeled through a left-truncated negative binomial model, whereas the probability of crossing the hurdle is modeled via a censored negative binomial model (see, e.g., Cameron and Trivedi 1998). As we use a negative binomial model for the count part as well as for the binary part of the hurdle model, we restrict both dispersion parameters to the same value (as recommended by Winkelmann 2010, p. 183).
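The structure of such a hurdle model can be sketched as a probability mass function. The sketch below assumes the standard construction in which the zero part is a negative binomial right-censored at one, the count part is a negative binomial truncated at zero, and both share one dispersion parameter \(\theta\). The function names and the parameter values are hypothetical; in the estimated models, the two means depend on covariates through a log link.

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / theta."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(y, mu_zero, mu_count, theta):
    """Hurdle pmf with a shared dispersion parameter theta in both parts."""
    # Zero part: censored negative binomial, only "0" versus "at least 1" matters.
    p_zero = nb_pmf(0, mu_zero, theta)
    if y == 0:
        return p_zero
    # Count part: negative binomial truncated at zero, rescaled to sum to one over y >= 1.
    truncated = nb_pmf(y, mu_count, theta) / (1.0 - nb_pmf(0, mu_count, theta))
    return (1.0 - p_zero) * truncated

# Hypothetical parameter values, for illustration only.
mu_zero, mu_count, theta = 1.5, 4.0, 0.8
total_mass = sum(hurdle_nb_pmf(y, mu_zero, mu_count, theta) for y in range(2000))
```

If both parts share the same mean (and the same \(\theta\)), the hurdle pmf collapses to the ordinary negative binomial pmf. This nesting is what makes a test of equal coefficients across the two parts a test for the presence of a hurdle.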
Table 2 shows the same models as Table 1, but using negative binomial hurdle models instead of the (single-equation) Poisson models. As in the original analysis, standard errors are clustered at the firm level.
Furthermore, a test for the presence of a zero hurdle is conducted. It is a Wald test of pairwise equality between all reported coefficients from the two parts of the hurdle model, where the null hypothesis states that no hurdle is needed (Zeileis et al. 2008). If the two processes were identical, the estimated parameters in both model parts should be similar as well. In contrast, we find evidence that the estimated parameters differ between the two model parts, which leads us to conclude that a single equation is not sufficient (Footnote 4).
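The logic of the test can be sketched for a single coefficient pair. The test reported here is a joint test over all coefficient pairs and uses clustered covariances; the scalar version below is a simplification, and the function name and the numerical inputs are hypothetical. For one restriction, the Wald statistic is compared with a \(\chi^2\) distribution with one degree of freedom, whose upper tail probability equals \(\mathrm{erfc}(\sqrt{W/2})\).

```python
import math

def wald_equality_test(b_zero, b_count, var_zero, var_count, cov=0.0):
    """Wald test of H0: b_zero == b_count for one coefficient pair.

    Returns the statistic W and its p-value under a chi-squared(1) reference.
    """
    var_diff = var_zero + var_count - 2.0 * cov
    if var_diff <= 0.0:
        raise ValueError("variance of the coefficient difference must be positive")
    w = (b_zero - b_count) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical estimates and variances, for illustration only.
w, p_value = wald_equality_test(b_zero=0.9, b_count=0.3, var_zero=0.04, var_count=0.05)
```

A small p-value speaks against equal coefficients in the two parts and hence in favor of the hurdle specification.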
The results indicate that the coefficient of institutional ownership is positive and significant in most of the estimated hurdle models. The only exception is the count part of the Hurdle NegBin (3) model, where the coefficient of institutional ownership is no longer significant. Quantitatively, the coefficient of institutional ownership in the count part of the model is dampened, and increasingly so as more variables are included in the model. In the binary part of the model, the coefficient of institutional ownership is quite stable. For most of the other explanatory variables, the coefficients in both parts of the model are likewise dampened as more variables are included.