Abstract
The effect of aggregation on estimates of stochastic frontier functions is considered. Inefficiency is assumed to be associated with the individual units being aggregated. In this case, the aggregated data have a closed skew normal distribution. Because estimating the parameters of a closed skew normal distribution is difficult, we focus mostly on the biases created by ignoring the fact that the data are aggregated. The conclusions are based on both analytical and Monte Carlo results. When data for firms are aggregates over smaller units and the inefficiency is associated with the units rather than the firm, empirical work that ignores aggregation will attribute the inefficiency of large firms to diseconomies of scale.
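One ingredient of this argument can be checked with a short simulation. The sketch below is an illustration, not the paper's Monte Carlo design: it assumes a normal noise term w ~ N(0, σ_w²) and a half-normal unit-level inefficiency v = |N(0, σ_v²)|, uses the illustrative values σ_w = 0.5 and σ_v = 1, and averages w + v over the n units of a firm. The function names are hypothetical. The mean inefficiency stays at σ_v√(2/π) for every n, while the skewness of the averaged error falls in proportion to 1/√n.

```python
import math
import random


def sample_skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def averaged_error(n_units, sigma_w, sigma_v, rng):
    """Average over n_units of w + v, where w ~ N(0, sigma_w^2) is noise and
    v = |N(0, sigma_v^2)| is unit-level inefficiency (it raises cost)."""
    total = 0.0
    for _ in range(n_units):
        total += rng.gauss(0.0, sigma_w) + abs(rng.gauss(0.0, sigma_v))
    return total / n_units


def simulate(n_units, reps=20_000, sigma_w=0.5, sigma_v=1.0, seed=1):
    """Return (mean, skewness) of the averaged error across simulated firms."""
    rng = random.Random(seed)
    draws = [averaged_error(n_units, sigma_w, sigma_v, rng) for _ in range(reps)]
    return sum(draws) / reps, sample_skewness(draws)


if __name__ == "__main__":
    sigma_v = 1.0
    # Mean unit-level inefficiency, the same whatever the number of units
    print("E[v] =", sigma_v * math.sqrt(2.0 / math.pi))
    for n_units in (1, 4, 16):
        mean, skew = simulate(n_units)
        print(f"n = {n_units:2d}  mean = {mean:.3f}  skewness = {skew:.3f}")
```

A frontier estimator that separates noise from inefficiency through the skewness of the residuals therefore has less to work with for firms made up of many units, even though their average inefficiency is unchanged.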
References
Adkins LC, Moomaw RL (2003) The impact of local funding on the technical efficiency of Oklahoma schools. Econ Lett 81:31–37
Aigner D, Lovell CAK, Schmidt P (1977) Formulation and estimation of stochastic frontier production models. J Econom 6:21–37
Arellano-Valle RB, Azzalini A (2006) On the unification of families of skew normal distributions. Scand J Stat 33:561–574
Aziz MAS (2011) Study of unified multivariate skew normal distribution with applications in finance and actuarial science. PhD dissertation, Bowling Green State University, Available at http://etd.ohiolink.edu/view.cgi?acc_num=bgsu1306504618
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Azzalini A (2005) The skew-normal distribution and related multivariate families. Scand J Stat 32:159–188
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew-normal distribution. J R Stat Soc B 61:579–602
Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726
Branco MD, Dey DK (2001) A general class of multivariate skew-elliptical distributions. J Multivar Anal 79:99–113
Caudill SB, Ford JM (1993) Biases in frontier estimation due to heteroscedasticity. Econ Lett 41:17–20
Caudill SB, Ford JM, Gropper DM (1995) Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. J Bus Econ Stat 13(1):105–111
Colombi R (2010) A skew normal stochastic frontier model for panel data. Scientific meetings of SIS, 45th scientific meeting of the Italian statistical society. Available at http://homes.stat.unipd.it/mgri/SIS2010/Program/contributedpaper/486-1310-1-DR.pdf
Dickens WT (1990) Error components in grouped data: is it ever worth weighting? Rev Econ Stat 72(2):328–333
Dominguez-Molina JA, Gonzalez-Farias G, Ramos-Quiroga R (2003) Skew-normality in stochastic frontier analysis. Comunicacion Tecnica I-03-18:1–13
Flecher C, Naveau P, Allard D (2009) Estimating the closed skew-normal distribution using weighted moments. Stat Probab Lett 79:1977–1984
Genton MG (2004) Skew-elliptical distributions and their applications: a journey beyond normality. Chapman & Hall/CRC, Florida
Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities. Springer, New York
Gonzalez-Farias G, Dominguez-Molina A, Gupta AK (2004) Additive properties of skew normal random vectors. J Stat Plan Inference 126:512–534
Greene W (2005) Reconsidering heterogeneity in panel data estimators of the stochastic frontier model. J Econom 126:269–303
Hadri K (1999) Estimation of a doubly heteroscedastic stochastic frontier cost function. J Bus Econ Stat 17(3):359–363
Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica 20:303–322
Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265
MacDonald JM, Ollinger ME (2000) Scale economies and consolidation in hog slaughter. Am J Agric Econ 82(2):334–346
Richter FGC, Brorsen BW (2006) Aggregate versus disaggregate data in measuring school quality. J Prod Anal 25(3):279–289
Weinstein MA (1964) The sum of values from a normal and a truncated normal distribution. Technometrics 6:104–105 and 469–470
Acknowledgments
Partial funding was provided by the Oklahoma Agricultural Experiment Station.
Appendix
This appendix shows how to derive the distribution of aggregated data when a firm-level random effect and a firm-level inefficiency term are added to Eq. (7). With the two extra terms, the cost function becomes
where \( \gamma_{j} \sim iid\, N\left( 0, \sigma_{\gamma}^{2} \right) \), \( \lambda_{j} \sim iid \left| N\left( 0, \sigma_{\lambda}^{2} \right) \right| \), \( \text{cov}(\gamma_{j}, \lambda_{j}) = 0 \), and \( \gamma_{j} \) and \( \lambda_{j} \) are independent of \( w_{ij} \) and \( v_{ij} \). The average cost function in Eq. (8) then becomes
The term \( (w_{\cdot j} + v_{\cdot j}) \) follows a closed skew normal distribution as defined in Eq. (15). The term \( (\gamma_{j} + \lambda_{j}) \) follows a closed skew normal distribution as in Eq. (12):
The terms \( (w_{\cdot j} + v_{\cdot j}) \) and \( (\gamma_{j} + \lambda_{j}) \) are independent by assumption. Because they are independent, their sum also follows a closed skew normal distribution by Theorem 4 in Gonzalez-Farias et al. (2004). The distribution of their sum is
where \( \mathbf{0}_{n+1} \) is an \( (n+1) \times 1 \) vector of zeros, \( \mathbf{1}_{n} \) is an \( n \times 1 \) vector of ones, \( D_{M} = \left[ \sigma_{v}^{2} \mathbf{1}_{n}', \; n\sigma_{\lambda}^{2} \right] \left( \sigma_{w}^{2} + \sigma_{v}^{2} + n\sigma_{\gamma}^{2} + n\sigma_{\lambda}^{2} \right)^{-1} \), and \( \Delta_{M} \) can be derived following Theorem 4 in Gonzalez-Farias et al. (2004) and their n = 2 example.
Although the distribution can be derived, estimating its parameters in a realistic problem with several explanatory variables is difficult. Disaggregate data are sometimes available, but they do not remove the difficulty, because the disaggregate data also follow a closed skew normal distribution. Colombi (2010) suggests a two-step procedure for estimating the disaggregate model. Greene (2005) uses simulated maximum likelihood to estimate fixed-effects and random-effects stochastic frontier models with only one of the two possible inefficiency terms. Others have proposed maximum likelihood procedures based on the EM algorithm for general skew normal distributions and mixed models, but with only the firm-level random effect (Lin 2009; Lachos et al. 2010).
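To make \( D_M \) and \( \Delta_M \) concrete, the Python sketch below computes them for a given n and evaluates the resulting density at one point. It builds the parameters from the selection (conditioning) representation of the closed skew normal rather than by applying Theorem 4 of Gonzalez-Farias et al. (2004) step by step. The two routes describe the same distribution, and this one reproduces the \( D_M \) given above, but we have not checked the resulting \( \Delta_M \) against their n = 2 example. The sketch assumes \( w_{ij} \sim N(0, \sigma_w^2) \), \( v_{ij} \sim |N(0, \sigma_v^2)| \), zero means throughout, and that \( w_{\cdot j} \) and \( v_{\cdot j} \) are averages over the n units. The function names `csn_aggregate_params` and `csn_density_mc` are hypothetical. The density step shows where the difficulty lies: it needs an (n + 1)-dimensional normal probability, \( \Phi_{n+1}(D_M y; \mathbf{0}, \Delta_M) \).

```python
import math
import random


def csn_aggregate_params(n, s2_w, s2_v, s2_gamma, s2_lambda):
    """Return (Sigma_M, D_M, Delta_M) for Y = mean_i(w_ij + v_ij) + gamma_j + lambda_j,
    with mu_M = 0 and nu_M = 0.

    Assumed selection representation: w_ij ~ N(0, s2_w), gamma_j ~ N(0, s2_gamma),
    latent U = (Z_1, ..., Z_n, Z_lambda) independent zero-mean normals with
    variances s2_v and s2_lambda; v_ij = |Z_i| and lambda_j = |Z_lambda| arise
    by conditioning Y on U >= 0.
    """
    sigma_m = (s2_w + s2_v) / n + s2_gamma + s2_lambda  # Var(Y) before selection
    cov_uy = [s2_v / n] * n + [s2_lambda]  # Cov(U, Y): each Z_i enters Y with weight 1/n
    var_u = [s2_v] * n + [s2_lambda]
    d_m = [c / sigma_m for c in cov_uy]  # D_M = Cov(U, Y) Var(Y)^{-1}
    q = n + 1
    # Delta_M = Var(U) - Cov(U, Y) Var(Y)^{-1} Cov(Y, U), the covariance of U given Y
    delta_m = [[(var_u[r] if r == c else 0.0) - cov_uy[r] * cov_uy[c] / sigma_m
                for c in range(q)] for r in range(q)]
    return sigma_m, d_m, delta_m


def csn_density_mc(y, n, s2_w, s2_v, s2_gamma, s2_lambda, draws=100_000, seed=0):
    """Monte Carlo estimate of the closed skew normal density
    f(y) = phi(y; 0, Sigma_M) * P(U >= 0 | Y = y) / P(U >= 0).
    Draws of U given Y = y use Matheron's rule: U0 + D_M * (y - Y0),
    where (U0, Y0) are drawn from the joint prior.
    """
    sigma_m, d_m, _ = csn_aggregate_params(n, s2_w, s2_v, s2_gamma, s2_lambda)
    rng = random.Random(seed)
    sd_u = [math.sqrt(s2_v)] * n + [math.sqrt(s2_lambda)]
    weights = [1.0 / n] * n + [1.0]
    sd_e = math.sqrt(s2_w / n + s2_gamma)  # averaged noise plus random effect
    hits = 0
    for _ in range(draws):
        u0 = [rng.gauss(0.0, s) for s in sd_u]
        y0 = sum(wt * u for wt, u in zip(weights, u0)) + rng.gauss(0.0, sd_e)
        if all(u + d * (y - y0) >= 0.0 for u, d in zip(u0, d_m)):
            hits += 1
    normal_pdf = math.exp(-y * y / (2.0 * sigma_m)) / math.sqrt(2.0 * math.pi * sigma_m)
    # The latents are independent with mean zero, so P(U >= 0) = 2^{-(n+1)}
    return normal_pdf * (hits / draws) / 0.5 ** (n + 1)


if __name__ == "__main__":
    sigma_m, d_m, delta_m = csn_aggregate_params(2, 0.25, 1.0, 0.25, 0.5)
    print("Sigma_M =", sigma_m)
    print("D_M =", d_m)
    print("f(1.0) approx", csn_density_mc(1.0, 2, 0.25, 1.0, 0.25, 0.5))
```

The Monte Carlo step stands in for dedicated routines for multivariate normal probabilities (see Genz and Bretz 2009). Inside a likelihood, that probability would have to be recomputed for every firm at every trial parameter value, which is one concrete reason estimation is hard when n is large.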
Cite this article
Brorsen, B.W., Kim, T. Data aggregation in stochastic frontier models: the closed skew normal distribution. J Prod Anal 39, 27–34 (2013). https://doi.org/10.1007/s11123-012-0274-2