Count data stochastic frontier models, with an application to the patents–R&D relationship

Journal of Productivity Analysis

Abstract

This article introduces a new count data stochastic frontier model that researchers can use to study efficiency in production when the output variable is a count (so that its conditional distribution is discrete). We discuss parametric and nonparametric estimation of the model and present a Monte Carlo study to evaluate its merits and applicability in small samples. Finally, we use these methods to estimate a production function for the number of patents awarded to a firm given its expenditure on R&D.

Notes

  1. For instance, let y = 3 and f = 7; then there is no integer value of d that solves the assumed identity.

  2. A procedure illustrated by Denuit and Lambert (2005), albeit in a different context.

  3. Fé (2008) only considered the case of economic bads, that is, commodities with a negative marginal utility. In this case, the ALS model relies on the distribution of ɛ = v + u, where v is, say, symmetric and u has a non-negative domain, so that the distribution of log(y) should exhibit a long right tail; however, simulations in Fé (2008) show that, if y is discrete, skewness can often be negative.

  4. In that context, efficiency scores amount to a measure of relative inequality in the distribution of these deaths, given socio-economic and environmental factors.

  5. Addressing unobserved heterogeneity and dynamics presents non-trivial technical challenges in terms of identification of parameters, and tackling these issues is left for future work.

  6. In Aigner et al. (1977) output has a normal distribution with mean log λ, given x and ɛ.

  7. For example, assume that ɛ has gamma distribution with parameters α > 0 and δ > 0 such that δ = α. Then y is conditionally distributed as:

    $$ {\mathbb{P}}(y|x) = \frac{1}{\Upgamma(\alpha)}\int_0^{\infty} \frac{({\hbox{exp}}(-{\hbox{exp}}(x^{\prime}\beta + \frac{\epsilon}{\delta} ))){\hbox{exp}}(y(x^{\prime}\beta + \frac{\epsilon}{\delta}))}{y!} \epsilon^{\alpha-1} e^{-\epsilon} d\epsilon. $$
    (2.2)

    Under the assumed gamma distribution, the density function of the transformation is

    $$ f(\nu)=f(\hbox{exp}(\varepsilon))=\frac{\delta^{\alpha}}{\Upgamma(\alpha)} \frac{(\log(\nu))^{\alpha-1}}{\nu^{\delta+1}}\quad{\rm for }\,\nu\in[1, \infty). $$
    (2.3)

    However, under the assumption δ = α, moments of order δ or above do not exist, since integrals of the form \( \int\nolimits_1^{\infty} (\log x)^{\alpha-1}/x \,\text{d}x\) do not converge. This affects the distribution of y, whose moments depend on those of ν through the expression

    $$ E(y^r)=\sum_{j=1}^{r} S(r,j)E(\nu^j)\quad\hbox{ for }r=1,2,\ldots $$

    (see Karlis and Xekalaki 2005) where S(., .) are Stirling’s numbers of the second kind.

  8. Gaussian quadrature methods (see Press et al. 1992 or Judd 1998) are unlikely to work well with this model. The accuracy of quadrature methods depends on the smoothness of the integrand f(x), in the sense of its being well approximated by a polynomial (see Press et al. 1992; DeVuyst and Preckel 2007).

  9. This occurs because the variance of the sum of any two draws is less than the variance of the sum of two independent draws (Gentle 2003). A clear drawback of Quasi-Monte Carlo methods is the deterministic nature of the sequence, which results in negatively correlated draws (even though the strength of the correlation might be small). This contradicts the assumption of independent random draws on which the above asymptotic results rely. It is possible, however, to generate randomized Halton draws without compromising the properties of the original sequence. In particular, Bhat (2003) advocates shifting the original terms of the Halton sequence, say \(s_h\), by a quantity μ drawn randomly from a uniform distribution on [0, 1]. In the resulting sequence, those terms exceeding 1 are transformed so that \(s_h^{*} = s_h + \mu - 1\).

  10. This is not a special feature of our model and is shared, in general, by Mixed Poisson models—see Karlis and Xekalaki (2005).

  11. See, for instance, Gourieroux and Monfort (1995).

  12. In the multivariate case, for \(m=1\), \(\tilde{\theta}=\theta_0+\Uptheta_1(X_i-x)\), where \(\Uptheta_1\) is a p × d matrix of parameters.

  13. Broyden-Fletcher-Goldfarb-Shanno (see, for instance, Press et al. 1992).

  14. MaxSa is available for download at Bos’ website: http://www.tinbergen.nl/~cbos/.

  15. We used up to 200 Halton draws in our simulations, but did not observe fundamental variation in our results.

  16. Longer sequences could be used, at the cost of increased computer time.

  17. In the literature, this ratio is usually denoted by the Greek letter λ. Here we had to use η to avoid clashes of notation.

  18. We thank the authors for kindly providing us with their data set.

  19. Note that, among other relations,

    $$ \begin{aligned} {\rm Erf}(x)&=2\Upphi(x\sqrt{2})-1 \\ {\rm Erfc}(x)&=2\left[1-\Upphi(x\sqrt{2})\right] \end{aligned} $$

    where \(\Upphi(.)\) is the standard normal distribution function.
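
The random shift described in Note 9 can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `halton` and `randomized_halton` are hypothetical, the base and number of draws are arbitrary, and terms equal to 1 are wrapped along with those exceeding 1 so that every draw stays in [0, 1).

```python
import random


def halton(n, base=2):
    """Return the first n terms of the one-dimensional Halton sequence."""
    terms = []
    for index in range(1, n + 1):
        k, fraction, value = index, 1.0, 0.0
        # Radical inverse: mirror the base-b digits of index around the point.
        while k > 0:
            fraction /= base
            value += fraction * (k % base)
            k //= base
        terms.append(value)
    return terms


def randomized_halton(n, base=2, rng=None):
    """Shift every Halton term by one common uniform draw mu, wrapping at 1."""
    rng = rng or random.Random()
    mu = rng.random()  # single shift, drawn once for the whole sequence
    shifted = []
    for s in halton(n, base):
        t = s + mu
        if t >= 1.0:   # wrapped terms become s + mu - 1, as in Note 9
            t -= 1.0
        shifted.append(t)
    return mu, shifted
```

Calling `randomized_halton(100, base=3, rng=random.Random(123))` returns the shift and 100 shifted draws. Because one common shift is applied to all terms, the spacing of the original sequence on the unit circle is unchanged.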

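The moment expression in Note 7 can also be evaluated numerically. The sketch below assumes the Poisson mean is the mixing variable ν itself, as in that expression, and takes the moments of ν as a user-supplied function; the names `stirling2` and `mixed_poisson_raw_moment` are illustrative and do not come from the article.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(r, j):
    """Stirling number of the second kind S(r, j), via the usual recurrence."""
    if r == j:
        return 1
    if j == 0 or j > r:
        return 0
    return j * stirling2(r - 1, j) + stirling2(r - 1, j - 1)


def mixed_poisson_raw_moment(r, mixing_moment):
    """E(y^r) = sum_{j=1}^{r} S(r, j) E(nu^j), with mixing_moment(j) = E(nu^j)."""
    return sum(stirling2(r, j) * mixing_moment(j) for j in range(1, r + 1))


# Degenerate mixing at nu = 2 reduces to an ordinary Poisson with mean 2.
second = mixed_poisson_raw_moment(2, lambda j: 2.0 ** j)  # 2 + 2^2
third = mixed_poisson_raw_moment(3, lambda j: 2.0 ** j)   # 2 + 3*2^2 + 2^3
```

If some \(E(\nu^j)\) is infinite, every moment of y of order j or above is infinite as well, which is the point made in Note 7.
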
References

  • Aigner D, Lovell C, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Economet 6:21–37

  • Bhat C (2003) Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transport Res Part B (Methodol) 37:837–855

  • Caflisch R, Morokoff W, Owen A (1997) Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension. J Comput Finance 1:27–46

  • Cameron C, Johansson P (1997) Count data regression using series expansions: with applications. J Appl Economet 12:203–224

  • Cameron C, Trivedi P (1986) Econometric models based on count data: comparison and applications of some estimators. J Appl Economet 1:29–53

  • Cramer J (1986) Econometric applications of maximum likelihood methods. Cambridge University Press, Cambridge, MA

  • Delaporte P (1962) Sur l’efficacite des criteres de tarification de l’assurance contre les accidents d’automobiles. ASTIN Bull 2:84–95

  • Denuit M, Lambert P (2005) Constraints on concordance measures in bivariate discrete data. J Multivar Anal 93:40–57

  • DeVuyst E, Preckel P (2007) Gaussian cubature: a practitioner’s guide. Math Comput Modell 45:787–794

  • Doornik JA (2007) Ox: an object-oriented matrix programming language. Timberlake Consultants Ltd, London

  • Fan J, Farmen M, Gijbels I (1998) Local maximum likelihood estimation and inference. J R Stat Soc Ser B (Stat Methodol) 60:591–608

  • Fan J, Lin H, Zhou Y (2006) Local partial-likelihood estimation for lifetime data. Ann Stat 34:290–325

  • Fé E (2007) Exploring a stochastic frontier model when the dependent variable is a count. The school of economics discussion paper series, University of Manchester 0725

  • Fé E (2008) On the production of economic bads and the estimation of production efficiency when the dependent variable is a count. The school of economics discussion paper series, University of Manchester 0812

  • Gentle JE (2003) Random number generation and Monte Carlo methods, 2nd edn. Springer, New York

  • Goffe W (1996) Simann: a global optimization algorithm using simulated annealing. Stud Nonlinear Dyn Economet 1:169–176

  • Goffe W, Ferrier G, Rogers J (1994) Global optimization of statistical functions with simulated annealing. J Economet 60:65–99

  • Gourieroux C, Monfort A (1995) Statistics and econometric models. Cambridge University Press, Cambridge, MA

  • Gourieroux C, Monfort A, Trognon A (1984a) Pseudo maximum likelihood methods: applications to Poisson models. Econometrica 52:701–720

  • Gourieroux C, Monfort A, Trognon A (1984b) Pseudo maximum likelihood methods: theory. Econometrica 52:681–700

  • Gozalo P, Linton O (2000) Local nonlinear least squares: using parametric information in nonparametric regression. J Economet 99:63–106

  • Greene W (2003) Simulated likelihood estimation of the normal-gamma stochastic frontier function. J Prod Anal 19:179–190

  • Greene W (2004) Distinguishing between heterogeneity and inefficiency: stochastic frontier analysis of the World Health Organization’s panel data on national health care systems. Health Econ 13:959–980

  • Greene W (2005) Reconsidering heterogeneity in panel data estimators of the stochastic frontier model. J Economet 126:269–303

  • Griliches Z (1990) Patent statistics as economic indicators: a survey. J Econ Lit 28:1661–1707

  • Hall B, Cummins C, Laderman E, Murphy J (1986a) The R&D master file documentation. Technical Working Paper, NBER 72

  • Hall B, Griliches Z, Hausman JA (1986b) Patents and R&D: is there a lag? Int Econ Rev 27:265–283

  • Hausman J, Hall B, Griliches Z (1984) Econometric models for count data with an application to the patents R&D relationship. Econometrica 52:909–938

  • Hofler R, Scrogin D (2008) A count data stochastic frontier. Discussion paper, University of Central Florida

  • Jondrow J, Materov I, Lovell K, Schmidt P (1982) On the estimation of technical inefficiency in the stochastic frontier production function model. J Economet 19:233–238

  • Judd KL (1998) Numerical methods in economics. The MIT Press, Cambridge, MA

  • Karlis D, Xekalaki E (2005) Mixed Poisson distributions. Int Stat Rev 73:35–58

  • Kumbhakar S, Park B, Simar L, Tsionas E (2007) Nonparametric stochastic frontiers: a local maximum likelihood approach. J Economet 137:1–27

  • Kumbhakar SC, Lovell K (2003) Stochastic frontier analysis. Cambridge University Press, Cambridge, MA

  • Lancaster A (2000) The incidental parameter problem since 1948. J Economet 95:391–413

  • Loader C (1996) Local likelihood density estimation. Ann Stat 24:1602–1618

  • Meeusen W, van den Broeck J (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. Int Econ Rev 18:435–444

  • Neyman J, Scott E (1948) Consistent estimation from partially consistent observations. Econometrica 16:1–32

  • Pakes A (1986) Patents as options: some estimates of the value of holding European patent stocks. Econometrica 54:755–784

  • Pakes A, Griliches Z (1980) Patents and R&D at the firm level: a first report. Econ Lett 5:377–381

  • Park B, Simar L, Zelenyuk V (2008) Local likelihood estimation of truncated regression and its partial derivatives: theory and application. J Economet 146:185–198

  • Press W, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press, Cambridge, MA

  • Ruohonen M (1989) On a model for the claim number process. ASTIN Bull 18:57–68

  • Tibshirani R, Hastie T (1987) Local likelihood estimation. J Am Stat Assoc 82:559–567

  • Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge, MA

  • Wang P, Cockburn IM, Puterman ML (1998) Analysis of patent data: a mixed-Poisson-regression-model approach. J Bus Econ Stat 16:27–41

  • Wang W, Schmidt P (2009) On the distribution of estimated technical efficiency in stochastic frontier models. J Economet 148:36–45

  • White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50:1–25

  • Winkelmann R (2008) Econometric analysis of count data, 5th edn. Springer, New York

  • Zheng J (1996) A consistent test of functional form via non-parametric estimation techniques. J Economet 75:263–289

Acknowledgments

The authors thank the associate editor and two anonymous referees for helpful comments. We also thank E. DeVuyst and Bill Greene who provided helpful advice at several stages of this work and express gratitude for the many helpful comments from participants of the North American Productivity Workshop VI, June 2010, especially Giannis Karagiannis. Finally, we thank Peiming Wang, Iain Cockburn and Martin Puterman for kindly providing the data set used in Sect. 5.

Author information

Correspondence to Richard Hofler.

Appendix: moments of the density function \(f(\hbox{exp}(\pm\varepsilon))\)

We now show how to calculate the first and second moments of the transformation \(u=\hbox{exp}(\pm\varepsilon)\), where \(\varepsilon\) follows a half-normal distribution with scale parameter \(\sigma\). Consider first the case \(u=\hbox{exp}(-\varepsilon)\). Then,

$$ \begin{aligned} {\mathcal{E}}(u)&=\frac{2}{\sigma\sqrt{2\pi}} \int_0^1 e^{-\log^2(u)/2\sigma^2}du =\frac{2}{\sigma\sqrt{2\pi}}\int^0_{-\infty}e^{-(\frac{t^2}{2\sigma^2}-t)}dt=e^{\sigma^2/2}\frac{2}{\sqrt{\pi}}\int^{-\frac{\sigma}{\sqrt{2}}}_{-\infty}e^{-s^2}ds \\ &=e^{\sigma^2/2}{\rm Erfc}\left(\frac{\sigma}{\sqrt{2}}\right) \in \left[0,1\right] \end{aligned} $$
(5)

where Erfc(.) is the complementary error function (see Note 19) and we used the changes of variable \(\log(u)=t\) and \(s=\frac{t}{\sqrt{2}\sigma}-\frac{\sigma}{\sqrt{2}}\), together with the fact that \(\frac{t^2}{2\sigma^2}-t=\left(\frac{t}{\sigma\sqrt{2}}-\frac{\sigma\sqrt{2}}{2}\right)^2-\frac{\sigma^2}{2}.\) Similar steps show that

$$ {\mathcal{E}}(u^2)=e^{2\sigma^2}{\rm Erfc}\left(\sigma\sqrt{2}\right) $$
(6)
$$ {\mathcal{V}}(u)= e^{2\sigma^2}{\rm Erfc}\left(\sigma\sqrt{2}\right)-\left\{e^{\sigma^2/2}{\rm Erfc}\left(\frac{\sigma}{\sqrt{2}}\right)\right\}^2 .$$
(7)
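
The intermediate steps behind Eq. (6) are not spelled out. One way to fill them in, assuming the same substitution \(t=\log(u)\) as in Eq. (5) and a change of variable \(s=\frac{t}{\sigma\sqrt{2}}-\sigma\sqrt{2}\) (our choice, not stated in the text), is

$$ \begin{aligned} {\mathcal{E}}(u^2)&=\frac{2}{\sigma\sqrt{2\pi}}\int_0^1 u\,e^{-\log^2(u)/2\sigma^2}du =\frac{2}{\sigma\sqrt{2\pi}}\int^0_{-\infty}e^{-(\frac{t^2}{2\sigma^2}-2t)}dt \\ &=e^{2\sigma^2}\frac{2}{\sqrt{\pi}}\int^{-\sigma\sqrt{2}}_{-\infty}e^{-s^2}ds =e^{2\sigma^2}{\rm Erfc}\left(\sigma\sqrt{2}\right), \end{aligned} $$

which uses \(\frac{t^2}{2\sigma^2}-2t=\left(\frac{t}{\sigma\sqrt{2}}-\sigma\sqrt{2}\right)^2-2\sigma^2\). Equation (7) then follows from \({\mathcal{V}}(u)={\mathcal{E}}(u^2)-{\mathcal{E}}(u)^2\).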

For the case \(u=\hbox{exp}(\varepsilon)\) the method is identical; only the range of integration changes, to [1, ∞). Thus,

$$ \begin{aligned} {\mathcal{E}}(u)&=\frac{2}{\sigma\sqrt{2\pi}}\int_1^{\infty} e^{-\log^2(u)/2\sigma^2}du=e^{\sigma^2/2}\frac{2}{\sqrt{\pi}} \int_{-\frac{\sigma}{\sqrt{2}}}^{\infty}e^{-s^2}ds\\ &=e^{\sigma^2/2}\left\{1+{\rm Erf}\left(\frac{\sigma}{\sqrt{2}}\right)\right\} \geq 1 \end{aligned} $$
(8)
$$ \mathcal{E}(u^{2} ) = e^{{2\sigma ^{2} }} \left\{ {1 + {\text{Erf}}\left( {\sigma \sqrt 2 } \right)} \right\}. $$
(9)
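
As a numerical sanity check on the closed forms in Eqs. (5) to (9), they can be compared with a plain Monte Carlo average. The sketch below is illustrative only: the function names are ours, σ = 0.5 and the number of draws are arbitrary, and half-normal draws are generated as the absolute value of a N(0, σ²) draw. It also evaluates Eq. (5) through the normal distribution function, using the Erfc relation in Note 19.

```python
import math
import random
from statistics import NormalDist


def moments_exp_minus(sigma):
    """Closed forms (5)-(7) for u = exp(-eps), eps half-normal with scale sigma."""
    mean = math.exp(sigma ** 2 / 2) * math.erfc(sigma / math.sqrt(2))
    second = math.exp(2 * sigma ** 2) * math.erfc(sigma * math.sqrt(2))
    return mean, second, second - mean ** 2


def moments_exp_plus(sigma):
    """Closed forms (8)-(9) for u = exp(eps)."""
    mean = math.exp(sigma ** 2 / 2) * (1 + math.erf(sigma / math.sqrt(2)))
    second = math.exp(2 * sigma ** 2) * (1 + math.erf(sigma * math.sqrt(2)))
    return mean, second


def mean_exp_minus_via_phi(sigma):
    """Eq. (5) rewritten with Erfc(x) = 2[1 - Phi(x * sqrt(2))] from Note 19."""
    return 2 * math.exp(sigma ** 2 / 2) * (1 - NormalDist().cdf(sigma))


def simulate(sigma, sign, n=200_000, seed=0):
    """Monte Carlo estimates of E(u) and E(u^2) for u = exp(sign * eps)."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        eps = abs(rng.gauss(0.0, sigma))  # half-normal draw
        u = math.exp(sign * eps)
        total += u
        total_sq += u * u
    return total / n, total_sq / n


sigma = 0.5
closed_minus = moments_exp_minus(sigma)
closed_plus = moments_exp_plus(sigma)
sim_minus = simulate(sigma, sign=-1)
sim_plus = simulate(sigma, sign=+1)
```

Printing `closed_minus[:2]` next to `sim_minus`, and `closed_plus` next to `sim_plus`, should show agreement up to simulation noise. The bounds stated in Eqs. (5) and (8) can be read off `closed_minus[0]` and `closed_plus[0]`.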

Cite this article

Fé, E., Hofler, R. Count data stochastic frontier models, with an application to the patents–R&D relationship. J Prod Anal 39, 271–284 (2013). https://doi.org/10.1007/s11123-012-0286-y
