Data aggregation in stochastic frontier models: the closed skew normal distribution

Journal of Productivity Analysis

Abstract

The effect of aggregation on estimates of stochastic frontier functions is considered. Inefficiency is assumed to be associated with the individual units being aggregated. In this case, the aggregated data follow a closed skew normal distribution. Because estimating the parameters of a closed skew normal distribution is difficult, we focus mostly on the biases created by ignoring the fact that the data are aggregated. The conclusions are based on both analytical and Monte Carlo results. When firm data are aggregates over smaller units and the inefficiency is associated with the units rather than the firm, empirical work that does not consider the effect of aggregation will attribute the inefficiency of large firms to diseconomies of scale.

References

  • Adkins LC, Moomaw RL (2003) The impact of local funding on the technical efficiency of Oklahoma schools. Econ Lett 81:31–37

  • Aigner D, Lovell CAK, Schmidt P (1977) Formulation and estimation of stochastic frontier production models. J Econom 6:21–37

  • Arellano-Valle RB, Azzalini A (2006) On the unification of families of skew normal distributions. Scand J Stat 33:561–574

  • Aziz MAS (2011) Study of unified multivariate skew normal distribution with applications in finance and actuarial science. PhD dissertation, Bowling Green State University, Available at http://etd.ohiolink.edu/view.cgi?acc_num=bgsu1306504618

  • Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178

  • Azzalini A (2005) The skew-normal distribution and related multivariate families. Scand J Stat 32:159–188

  • Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew-normal distribution. J R Stat Soc B 61:579–602

  • Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83:715–726

  • Branco MD, Dey DK (2001) A general class of multivariate skew-elliptical distributions. J Multivar Anal 79:99–113

  • Caudill SB, Ford JM (1993) Biases in frontier estimation due to heteroscedasticity. Econ Lett 41:17–20

  • Caudill SB, Ford JM, Gropper DM (1995) Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. J Bus Econ Stat 13(1):105–111

  • Colombi R (2010) A skew normal stochastic frontier model for panel data. Scientific meetings of SIS, 45th scientific meeting of the Italian statistical society. Available at http://homes.stat.unipd.it/mgri/SIS2010/Program/contributedpaper/486-1310-1-DR.pdf

  • Dickens WT (1990) Error components in grouped data: is it ever worth weighting? Rev Econ Stat 72(2):328–333

  • Dominguez-Molina JA, Gonzalez-Farias G, Ramos-Quiroga R (2003) Skew-normality in stochastic frontier analysis. Comunicacion Tecnica I-03-18:1–13

  • Flecher C, Naveau P, Allard D (2009) Estimating the closed skew-normal distribution using weighted moments. Stat Probab Lett 79:1977–1984

  • Genton MG (2004) Skew-elliptical distributions and their applications: a journey beyond normality. Chapman & Hall/CRC, Florida

  • Genz A, Bretz F (2009) Computation of multivariate normal and t probabilities. Springer, New York

  • Gonzalez-Farias G, Dominguez-Molina A, Gupta AK (2004) Additive properties of skew normal random vectors. J Stat Plan Inference 126:512–534

  • Greene W (2005) Reconsidering heterogeneity in panel data estimators of the stochastic frontier model. J Econom 126:269–303

  • Hadri K (1999) Estimation of a doubly heteroscedastic stochastic frontier cost function. J Bus Econ Stat 17(3):359–363

  • Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica 20:303–322

  • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100:257–265

  • MacDonald JM, Ollinger ME (2000) Scale economies and consolidation in hog slaughter. Am J Agric Econ 82(2):334–346

  • Richter FGC, Brorsen BW (2006) Aggregate versus disaggregate data in measuring school quality. J Prod Anal 25(3):279–289

  • Weinstein MA (1964) The sum of values from a normal and a truncated normal distribution. Technometrics 6:104–105 and 469–470

Acknowledgments

Partial funding was provided by the Oklahoma Agricultural Experiment Station.

Author information

Correspondence to B. Wade Brorsen.

Appendix

This appendix shows how to derive the distribution of aggregated data when a firm-level random effect and a firm-level inefficiency term are added to Eq. (7). With the two extra terms, the cost function becomes

$$ C_{ij} = \mathbf{x}_{ij}\boldsymbol{\beta} + w_{ij} + v_{ij} + \gamma_{j} + \lambda_{j} \tag{21} $$

where \( \gamma_{j} \sim \text{iid } N\left( 0, \sigma_{\gamma}^{2} \right) \), \( \lambda_{j} \sim \text{iid } \left| N\left( 0, \sigma_{\lambda}^{2} \right) \right| \), \( \text{cov}(\gamma_{j}, \lambda_{j}) = 0 \), and \( \gamma_{j} \) and \( \lambda_{j} \) are independent of \( w_{ij} \) and \( v_{ij} \). The average cost function in Eq. (8) then becomes

$$ \text{AC}_{j} = \mathbf{x}_{\cdot j}^{\prime} \boldsymbol{\beta} + (w_{\cdot j} + v_{\cdot j}) + \gamma_{j} + \lambda_{j}. \tag{22} $$
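As a rough illustration of Eq. (22), the composite error of the average cost function can be simulated directly. The sketch below assumes, since this appendix does not restate it, that \( w_{ij} \) is zero-mean normal noise and \( v_{ij} \) is a half-normal unit-level inefficiency term, and that the firm-level terms have zero location. The function names, parameter names, and numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_average_errors(n_units, n_firms, sd_w, sd_v, sd_gamma, sd_lambda, seed=0):
    """Draw the composite error of Eq. (22) for n_firms firms of n_units units each.

    Assumption (not stated in this appendix): w is N(0, sd_w^2) noise and
    v is |N(0, sd_v^2)| unit-level inefficiency.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_firms):
        # Unit-level terms, averaged over the firm's units.
        w_bar = sum(rng.gauss(0.0, sd_w) for _ in range(n_units)) / n_units
        v_bar = sum(abs(rng.gauss(0.0, sd_v)) for _ in range(n_units)) / n_units
        # Firm-level random effect and firm-level inefficiency.
        gamma = rng.gauss(0.0, sd_gamma)
        lam = abs(rng.gauss(0.0, sd_lambda))
        errors.append(w_bar + v_bar + gamma + lam)
    return errors


def expected_mean(sd_v, sd_lambda):
    """Mean of the composite error: a half-normal |N(0, s^2)| has mean s * sqrt(2/pi)."""
    return (sd_v + sd_lambda) * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    draws = simulate_average_errors(5, 20000, 1.0, 1.0, 1.0, 1.0)
    print("simulated mean:", sum(draws) / len(draws))
    print("expected mean: ", expected_mean(1.0, 1.0))
```

Under these assumptions the mean of the composite error does not depend on the number of units per firm, while its spread and skewness do, which is one way to see why the aggregate needs its own distribution.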

The term \( (w_{\cdot j} + v_{\cdot j}) \) follows a closed skew normal distribution as defined in (15). The term \( (\gamma_{j} + \lambda_{j}) \) follows a closed skew normal distribution as in (12):

$$ (\gamma_{j} + \lambda_{j}) \sim \text{CSN}_{1,1} \left( 0, \sigma_{\gamma}^{2} + \sigma_{\lambda}^{2}, \frac{\sigma_{\lambda}^{2}}{\sigma_{\gamma}^{2} + \sigma_{\lambda}^{2}}, 0, \frac{\sigma_{\gamma}^{2} \sigma_{\lambda}^{2}}{\sigma_{\gamma}^{2} + \sigma_{\lambda}^{2}} \right) \tag{23} $$
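The parameters in Eq. (23) can be checked numerically. The sketch below assumes the usual closed skew normal density, \( f(y) = \phi(y; \mu, \Sigma)\, \Phi(D(y - \mu); \nu, \Delta) / \Phi(0; \nu, \Delta + D \Sigma D) \) in the univariate case, and verifies that with the Eq. (23) parameters the density integrates to one and has mean \( \sigma_{\lambda}\sqrt{2/\pi} \), the mean of the half-normal term. The helper names and the variance values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x, mean, var):
    """Distribution function of N(mean, var) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def csn11_pdf(y, mu, sigma2, d, nu, delta):
    """Univariate closed skew normal density CSN_{1,1}(mu, sigma2, d, nu, delta)."""
    norm_const = normal_cdf(0.0, nu, delta + d * sigma2 * d)
    return normal_pdf(y, mu, sigma2) * normal_cdf(d * (y - mu), nu, delta) / norm_const


def eq23_params(var_gamma, var_lambda):
    """Parameters of Eq. (23) for gamma_j + lambda_j."""
    total = var_gamma + var_lambda
    return (0.0, total, var_lambda / total, 0.0, var_gamma * var_lambda / total)


def integrate(f, lo, hi, steps=20000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


if __name__ == "__main__":
    params = eq23_params(var_gamma=0.5, var_lambda=2.0)  # illustrative values
    mass = integrate(lambda y: csn11_pdf(y, *params), -15.0, 15.0)
    mean = integrate(lambda y: y * csn11_pdf(y, *params), -15.0, 15.0)
    print("total mass:", mass)
    print("mean:", mean, "half-normal mean:", math.sqrt(2.0) * math.sqrt(2.0 / math.pi))
```

The agreement of the two means is a consistency check on the third and fifth parameters of Eq. (23), not a substitute for the derivation in Gonzalez-Farias et al. (2004).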

The terms \( (w_{\cdot j} + v_{\cdot j}) \) and \( (\gamma_{j} + \lambda_{j}) \) are defined to be independent. Since they are independent, their sum must also follow a closed skew normal distribution by Theorem 4 in Gonzalez-Farias et al. (2004). The distribution of their sum is

$$ \left( w_{\cdot j} + v_{\cdot j} \right) + \left( \gamma_{j} + \lambda_{j} \right) \sim \text{CSN}_{1, n+1} \left( 0, \frac{1}{n}\left( \sigma_{w}^{2} + \sigma_{v}^{2} \right) + \sigma_{\gamma}^{2} + \sigma_{\lambda}^{2}, D^{M}, \mathbf{0}_{n+1}, \Delta^{M} \right) \tag{24} $$

where \( \mathbf{0}_{n+1} \) is an (n + 1) × 1 vector of zeroes, \( D^{M} \) is \( \left[ \sigma_{v}^{2} \mathbf{1}_{n}^{\prime}, n\sigma_{\lambda}^{2} \right] \cdot \left( \sigma_{w}^{2} + \sigma_{v}^{2} + n\sigma_{\gamma}^{2} + n\sigma_{\lambda}^{2} \right)^{-1} \), and \( \Delta^{M} \) can be derived following Theorem 4 in Gonzalez-Farias et al. (2004) and their n = 2 example.

While it is possible to derive the distribution, estimating its parameters for a realistic problem with several explanatory variables is not an easy task. In some instances, disaggregate data may be available. Estimation with disaggregate data remains difficult, however, because the disaggregate data also follow a closed skew normal distribution. Colombi (2010) has suggested a two-step procedure for estimating the disaggregate model. Greene (2005) uses simulated maximum likelihood to estimate fixed effects and random effects stochastic frontier models with only one of the two possible inefficiency terms. Researchers have also suggested maximum likelihood procedures using the EM algorithm with general skew normal distributions and mixed models, but with only the firm-level random effect (Lin 2009; Lachos et al. 2010).
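For concreteness, the scale parameter in Eq. (24) and the vector \( D^{M} \) defined after it can be evaluated for given variances. The following is a minimal sketch; the function name and the numeric values are hypothetical, and \( \Delta^{M} \) is deliberately left out because its form is not given here.

```python
def aggregate_csn_parameters(n, var_w, var_v, var_gamma, var_lambda):
    """Return the scale parameter of Eq. (24) and the n + 1 entries of D^M.

    n is the number of units aggregated within a firm; the remaining
    arguments are the variances sigma_w^2, sigma_v^2, sigma_gamma^2,
    and sigma_lambda^2.
    """
    # Second argument of the CSN distribution in Eq. (24).
    scale = (var_w + var_v) / n + var_gamma + var_lambda
    # Common denominator of D^M.
    denom = var_w + var_v + n * var_gamma + n * var_lambda
    # First n entries are sigma_v^2 / denom; the last is n * sigma_lambda^2 / denom.
    d_m = [var_v / denom] * n + [n * var_lambda / denom]
    return scale, d_m


if __name__ == "__main__":
    scale, d_m = aggregate_csn_parameters(4, 1.0, 1.0, 0.5, 0.5)  # illustrative values
    print("scale:", scale)
    print("D^M:", d_m)
```

With these illustrative values the denominator is 6, so the first four entries of \( D^{M} \) equal 1/6 and the last equals 1/3, while the scale parameter is 1.5. Each of the first n entries also equals \( (\sigma_{v}^{2}/n) \) divided by the scale parameter, which matches the pattern of the third parameter in Eq. (23).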

Cite this article

Brorsen, B.W., Kim, T. Data aggregation in stochastic frontier models: the closed skew normal distribution. J Prod Anal 39, 27–34 (2013). https://doi.org/10.1007/s11123-012-0274-2
