Abstract
This article introduces a new count data stochastic frontier model that researchers can use to study efficiency in production when the output variable is a count (so that its conditional distribution is discrete). We discuss parametric and nonparametric estimation of the model, and present a Monte Carlo study to evaluate the merits and applicability of the new model in small samples. Finally, we use the methods discussed in this article to estimate a production function for the number of patents awarded to a firm given expenditure on R&D.
Notes
For instance, let y = 3 and f = 7; then there is not an integer value for d solving the assumed identity.
A procedure illustrated by Denuit and Lambert (2005), albeit in a different context.
Fé (2008) only considered the case of economic bads, that is, commodities with a negative marginal utility. In this case, the ALS model relies on the distribution of ɛ = v + u where v is, say, symmetric and u has non-negative domain, so that the distribution of log(y) should exhibit a long right tail; however, simulations in Fé (2008) show that, if y is discrete, skewness can often be negative.
In that context efficiency scores amount to a measure of relative inequality in the distribution of these deaths, given socio-economic and environmental factors.
Addressing unobserved heterogeneity and dynamics presents non-trivial technical challenges in terms of identification of parameters, and tackling these issues is left for future work.
In Aigner et al. (1977) output has a normal distribution with mean log λ, given x and ɛ.
For example, assume that ɛ has gamma distribution with parameters α > 0 and δ > 0 such that δ = α. Then y is conditionally distributed as:
$$ {\mathbb{P}}(y|x) = \frac{1}{\Upgamma(\alpha)}\int_0^{\infty} \frac{({\hbox{exp}}(-{\hbox{exp}}(x^{\prime}\beta + \frac{\epsilon}{\delta} ))){\hbox{exp}}(y(x^{\prime}\beta + \frac{\epsilon}{\delta}))}{y!} \epsilon^{\alpha-1} e^{-\epsilon} d\epsilon. $$ (2.2)
Under the assumed gamma distribution, the density function of the transformation is
$$ f(\nu)=f(\hbox{exp}(\varepsilon))=\frac{\delta^{\alpha}}{\Upgamma(\alpha)} \frac{(\log(\nu))^{\alpha-1}}{\nu^{\delta+1}}\quad{\rm for }\,\nu\in[1, \infty). $$ (2.3)
However, under the assumption δ = α, moments of order δ or above do not exist, since the integral \( \int\nolimits_1^{\infty} \log (x)/x \,\text{d}x\) is not convergent. This affects the distribution of y, whose moments depend on those of ν through the expression
$$ E(y^r)=\sum_{j=1}^{r} S(r,j)E(\nu^j)\quad\hbox{ for }r=1,2,\ldots $$
(see Karlis and Xekalaki 2005), where S(., .) are Stirling numbers of the second kind.
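The moment identity above can be illustrated with a minimal sketch. It makes a simplifying assumption that is not in the article: the mixing variable is degenerate, so y is plain Poisson with a fixed mean `lam` and E(ν^j) is replaced by `lam**j`. The function names are illustrative only.

```python
import math


def stirling2(r, j):
    """Stirling number of the second kind S(r, j), built from the
    recurrence S(n, k) = k * S(n-1, k) + S(n-1, k-1)."""
    table = [[0] * (j + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for n in range(1, r + 1):
        for k in range(1, min(n, j) + 1):
            table[n][k] = k * table[n - 1][k] + table[n - 1][k - 1]
    return table[r][j]


def poisson_raw_moment_stirling(r, lam):
    """E(y^r) for y ~ Poisson(lam) via sum_j S(r, j) * lam^j.
    Assumption: fixed mean, i.e. a degenerate mixing distribution."""
    return sum(stirling2(r, j) * lam ** j for j in range(1, r + 1))


def poisson_raw_moment_direct(r, lam, y_max=200):
    """E(y^r) by summing y^r * P(y) over a truncated support."""
    total, pmf = 0.0, math.exp(-lam)
    for y in range(y_max + 1):
        total += (y ** r) * pmf
        pmf *= lam / (y + 1)  # Poisson recursion P(y+1) = P(y) * lam / (y+1)
    return total
```

With a non-degenerate mixing distribution, `lam ** j` would be replaced by the j-th moment of the mixing variable, which is where non-existent moments of ν carry over to y.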
This occurs because the variance of the sum of any two draws is less than the variance of the sum of two independent draws (Gentle 2003). A clear drawback of quasi-Monte Carlo methods is the deterministic nature of the sequence, which results in negatively correlated random draws (even though the strength of the correlation might be small). This contradicts the assumptions of independent random draws on which the above asymptotic results hold. It is possible, however, to generate randomized Halton draws without compromising the properties of the original sequence. In particular, Bhat (2003) advocates shifting the original terms of the Halton sequence, say \(s_h\), by a quantity μ that has been drawn randomly from a uniform distribution on [0, 1]. In the resulting sequence those terms exceeding 1 are transformed so that \(s_h^{*} = s_h + \mu - 1\).
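The random shift described in this note can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions, not the code used in the article: the names `halton` and `shifted_halton`, the default base of 2, and the `seed` argument are all hypothetical.

```python
import random


def halton(index, base):
    """Term number `index` (1-based) of the Halton sequence in `base`,
    computed as the radical inverse of the index."""
    result, fraction = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)
        i //= base
        fraction /= base
    return result


def shifted_halton(n, base=2, mu=None, seed=None):
    """First n Halton terms shifted by mu ~ U[0, 1].
    Terms exceeding 1 after the shift are mapped to s + mu - 1."""
    if mu is None:
        mu = random.Random(seed).random()  # assumption: one shift per sequence
    draws = []
    for h in range(1, n + 1):
        s = halton(h, base) + mu
        if s > 1.0:
            s -= 1.0
        draws.append(s)
    return draws, mu
```

For example, `shifted_halton(3, base=2, mu=0.4)` shifts the terms 0.5, 0.25, 0.75 to approximately 0.9, 0.65, 0.15. A single shift leaves the spacing between terms unchanged, so the coverage of the original sequence is preserved.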
This is not a special feature of our model and is shared, in general, by Mixed Poisson models—see Karlis and Xekalaki (2005).
See, for instance, Gourieroux and Monfort (1995).
In the multivariate case, for \(m=1, \tilde{\theta}=\theta_0+\Uptheta_1(X_i-x)\), where \(\Uptheta_1\) is a p × d matrix of parameters.
Broyden-Fletcher-Goldfarb-Shanno (see, for instance, Press et al. 1992).
MaxSa is available for download at Bos’ website: http://www.tinbergen.nl/~cbos/.
We used up to 200 Halton draws in our simulations, but did not observe fundamental variation in our results.
Longer sequences could be used, at the cost of increased computer time.
In the literature, this ratio is usually denoted by the Greek letter λ. Here we had to use η to avoid clashes of notation.
We thank the authors for kindly providing us with their data set.
Note that, among other relations,
$$ \begin{aligned} Erf(x)&=2\Upphi(x\sqrt{2})-1 \\ Erfc(x)&=2\left[1-\Upphi(x\sqrt{2})\right] \end{aligned} $$
where \(\Upphi(.)\) is the standard normal distribution function.
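These two relations can be checked numerically with the Python standard library. The sketch below is illustrative only; the helper names `erf_via_phi` and `erfc_via_phi` are not from the article.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal distribution function


def erf_via_phi(x):
    """Erf(x) written as 2 * Phi(x * sqrt(2)) - 1."""
    return 2.0 * _PHI(x * math.sqrt(2.0)) - 1.0


def erfc_via_phi(x):
    """Erfc(x) written as 2 * [1 - Phi(x * sqrt(2))]."""
    return 2.0 * (1.0 - _PHI(x * math.sqrt(2.0)))
```

Comparing these against `math.erf` and `math.erfc` on a handful of points is a quick guard against misplacing the factor of the square root of 2.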
References
Aigner D, Lovell C, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Economet 6:21–37
Bhat C (2003) Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transport Res Part B (Methodol) 37:837–855
Caflisch R, Morokoff W, Owen A (1997) Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension. J Comput Finance 1:27–46
Cameron C, Johansson P (1997) Count data regression using series expansions: with applications. J Appl Economet 12:203–224
Cameron C, Trivedi P (1986) Econometric models based on count data: comparison and applications of some estimators. J Appl Economet 1:29–53
Cramer J (1986) Econometric applications of maximum likelihood methods. Cambridge University Press, Cambridge, MA
Delaporte P (1962) Sur l’efficacite des criteres de tarification de l’assurance contre les accidents d’automobiles. ASTIN Bull 2:84–95
Denuit M, Lambert P (2005) Constraints on concordance measures in bivariate discrete data. J Multivar Anal 93:40–57
DeVuyst E, Preckel P (2007) Gaussian cubature: a practitioner’s guide. Math Comput Modell 45:787–794
Doornik JA (2007) Ox: an object-oriented matrix programming language. Timberlake Consultants Ltd, London
Fan J, Farmen M, Gijbels I (2002) Local maximum likelihood estimation and inference. J R Stat Soc Ser B (Stat Methodol) 60:591–608
Fan J, Lin H, Zhou Y (2006) Local partial-likelihood estimation for lifetime data. Ann Stat 34:290–325
Fé E (2007) Exploring a stochastic frontier model when the dependent variable is a count. The school of economics discussion paper series, University of Manchester 0725
Fé E (2008) On the production of economic bads and the estimation of production efficiency when the dependent variable is a count. The school of economics discussion paper series, University of Manchester 0812
Gentle JE (2003) Random Number Generation and Monte Carlo Methods, 2nd edn. Springer, New York
Goffe W (1996) Simann: a global optimization algorithm using simulated annealing. Stud Nonlinear Dyn Economet 1:169–176
Goffe W, Ferrier G, Rogers J (1994) Global optimization of statistical functions with simulated annealing. J Economet 60:65–99
Gourieroux C, Monfort A (1995) Statistics and econometric models. Cambridge University Press, Cambridge, MA
Gourieroux C, Monfort A, Trognon A (1984a) Pseudo maximum likelihood methods: applications to Poisson models. Econometrica 52:701–720
Gourieroux C, Monfort A, Trognon A (1984b) Pseudo maximum likelihood methods: theory. Econometrica 52:681–700
Gozalo P, Linton O (2000) Local nonlinear least squares: using parametric information in nonparametric regression. J Economet 99:63–106
Greene W (2003) Simulated likelihood estimation of the normal-gamma stochastic frontier function. J Prod Anal 19:179–190
Greene W (2004) Distinguishing between heterogeneity and inefficiency: stochastic frontier analysis of the World Health Organization’s panel data on national health care systems. Health Econ 13:959–980
Greene W (2005) Reconsidering heterogeneity in panel data estimators of the stochastic frontier model. J Economet 126:269–303
Griliches Z (1990) Patent statistics as economic indicators: a survey. J Econ Lit 28:1661–1707
Hall B, Cummins C, Laderman E, Murphy J (1986a) The R&D master file documentation. Technical Working Paper, NBER 72
Hall B, Griliches Z, Hausman JA (1986b) Patents and R&D: is there a lag? Int Econ Rev 27:265–283
Hausman J, Hall B, Griliches Z (1984) Econometric models for count data with an application to the patents R&D relationship. Econometrica 52:909–938
Hofler R, Scrogin D (2008) A count data stochastic frontier. Discussion paper, University of Central Florida
Jondrow J, Materov I, Lovell K, Schmidt P (1982) On the estimation of technical inefficiency in the stochastic frontier production function model. J Economet 19:233–238
Judd KL (1998) Numerical methods in economics. The MIT Press, Cambridge, MA
Karlis D, Xekalaki E (2005) Mixed Poisson distributions. Int Stat Rev 73:35–58
Kumbhakar S, Park B, Simar L, Tsionas E (2007) Nonparametric stochastic frontiers: a local maximum likelihood approach. J Economet 137:1–27
Kumbhakar SC, Lovell K (2003) Stochastic frontier analysis. Cambridge University Press, Cambridge, MA
Lancaster A (2000) The incidental parameter problem since 1948. J Economet 95:391–413
Loader C (1996) Local likelihood density estimation. Ann Stat 24:1602–1618
Meeusen W, van den Broeck J (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. Int Econ Rev 18:435–444
Neyman J, Scott E (1948) Consistent estimation from partially consistent observations. Econometrica 16:1–32
Pakes A (1986) Patents as options: some estimates of the value of holding European patent stocks. Econometrica 54:755–784
Pakes A, Griliches Z (1980) Patents and R&D at the firm level: a first report. Econ Lett 5:377–381
Park B, Simar L, Zelenyuk V (2008) Local likelihood estimation of truncated regression and its partial derivatives: theory and application. J Economet 146:185–198
Press W, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical Recipes in C (The art of Scientific Computing), 2nd edn. Cambridge University Press, Cambridge, MA
Ruohonen M (1989) On a model for the claim number process. ASTIN Bull 18:57–68
Tibshirani R, Hastie T (1987) Local likelihood estimation. J Am Stat Assoc 82:559–567
Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge, MA
Wang P, Cockburn IM, Puterman ML (1998) Analysis of patent data: a mixed-poisson-regression-model approach. J Bus Econ Stat 16:27–41
Wang W, Schmidt P (2009) On the distribution of estimated technical efficiency in stochastic frontier models. J Economet 148:36–45
White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50:1–25
Winkelmann R (2008) Econometric analysis of count data, 5th edn. Springer, New York
Zheng J (1996) A consistent test of functional form via non-parametric estimation techniques. J Economet 76:263–289
Acknowledgments
The authors thank the associate editor and two anonymous referees for helpful comments. We also thank E. DeVuyst and Bill Greene who provided helpful advice at several stages of this work and express gratitude for the many helpful comments from participants of the North American Productivity Workshop VI, June 2010, especially Giannis Karagiannis. Finally, we thank Peiming Wang, Iain Cockburn and Martin Puterman for kindly providing the data set used in Sect. 5.
Appendix: moments of the density function \(f(\hbox{exp}(\pm\varepsilon))\)
We now show how to calculate the first and second moments of the transformation \(u=\hbox{exp}(\pm\varepsilon)\), with density \(f(\hbox{exp}(\pm\varepsilon))\), where \(\varepsilon\) follows a half-normal distribution. Consider first the case \(u=\hbox{exp}(-\varepsilon)\). Then,
where Erfc(.) is the complementary error function (see footnote 19) and we used the changes of variable \(\log(x)=t, s=\frac{t}{\sqrt{2}\sigma}-\frac{\sigma}{\sqrt{2}}\) and the fact that \(\frac{t^2}{2\sigma^2}-t=\left(\frac{t}{\sigma\sqrt{2}}-\frac{\sigma\sqrt{2}}{2}\right)^2-\frac{\sigma^2}{2}.\) Similar steps show that
For the case \(u=\hbox{exp}(\varepsilon)\) the method is identical, but only the range of integration changes to [1, ∞). Thus,
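Moments of this kind can be cross-checked numerically. The sketch below assumes that ε is half-normal with scale σ, in which case completing the square gives E[exp(kε)] = exp(k²σ²/2) · Erfc(−kσ/√2), with k = −1, −2 for u = exp(−ε) and k = 1, 2 for u = exp(ε). This closed form is derived here under that assumption rather than taken from the article's displayed equations, and the function names are illustrative.

```python
import math


def halfnormal_exp_moment(k, sigma):
    """Closed form for E[exp(k * eps)], eps half-normal with scale sigma.
    Assumption: density 2 / (sigma * sqrt(2*pi)) * exp(-eps^2 / (2*sigma^2))."""
    return math.exp(0.5 * (k * sigma) ** 2) * math.erfc(-k * sigma / math.sqrt(2.0))


def halfnormal_exp_moment_numeric(k, sigma, n=20000):
    """Same expectation by composite Simpson integration on [0, upper].
    n must be even; upper is far enough out that the tail is negligible."""
    upper = max(0.0, k * sigma ** 2) + 12.0 * sigma
    h = upper / n
    const = 2.0 / (sigma * math.sqrt(2.0 * math.pi))

    def integrand(e):
        return const * math.exp(k * e - e * e / (2.0 * sigma ** 2))

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0
```

Calling both functions with k = −1 and k = −2 gives the first and second moments of exp(−ε); k = 1 and k = 2 give those of exp(ε), and the two routes should agree to many digits.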
Fé, E., Hofler, R. Count data stochastic frontier models, with an application to the patents–R&D relationship. J Prod Anal 39, 271–284 (2013). https://doi.org/10.1007/s11123-012-0286-y
DOI: https://doi.org/10.1007/s11123-012-0286-y
Keywords
- Discrete data
- Stochastic frontier analysis
- Local maximum likelihood
- Maximum simulated likelihood
- Halton sequence